A Rudimentary Hardware MPEG Video Decoder
David Crandall
CSE 471 Honors Project
April 29, 1999
Dr. Das
Overview
For my honors project, I designed and implemented a rudimentary MPEG decoder using Synopsys VHDL. Given an MPEG bitstream, the decoder performs the following functions:
Parse and decode the headers for the various MPEG layers (sequence layer, group of pictures layer, picture layer, slice layer, and macroblock layer)
Parse the Discrete Cosine Transform (DCT) coefficients for each block
Reconstruct the DCT coefficients for each block
Dequantize the coefficients
Perform an Inverse Discrete Cosine Transform (IDCT) to obtain the original pixel values for each block
A full implementation of the MPEG decoding standard would have been very difficult to build in one semester. This decoder is a minimal implementation, and makes some assumptions about the MPEG bitstream. These assumptions and limitations include:
Only intra-coded (I) frames are supported. MPEG files typically also contain predicted (P) frames and bidirectionally predicted (B) frames, which use motion compensation from one or more reference frames. Support for these frame types could be added in the future.
The default quantization table is assumed. The standard allows MPEG sequences to override the default quantization coefficients used to reconstruct the DCT coefficients, but this decoder always assumes the default.
Currently, the decoder only decodes the luminance plane without the chrominance planes, so its output is a grayscale image.
Figure 1a shows a sample frame output by my decoder. For comparison purposes, Figure 1b shows the same frame, as decoded by the Berkeley mpeg_play program.
Figure 1a (left) shows a sample frame decoded by my MPEG decoder;
Figure 1b (right) shows the same frame decoded by Berkeley mpeg_play
The MPEG-1 Video Standard
This section briefly describes the MPEG-1 video standard.
An MPEG video sequence is composed of several layers, and each layer has headers and data associated with it. The MPEG-1 layers are:
Sequence layer: This layer encompasses the whole video sequence. The sequence header contains general information about the video file, including the vertical and horizontal size of each frame and timing information.
Group of Pictures (GOP) layer: A new GOP header appears typically every 12 frames in the MPEG. It contains timing information for the decoder.
Picture layer: A picture is an individual image frame. The picture header contains information about the frame's temporal number, timing, and motion vectors.
Slice layer: A frame is divided into one or more slices.
Macroblock layer: A macroblock is a 16x16 pixel area of the image. Each macroblock contains four luminance blocks, and two chrominance blocks. The macroblock header contains information about its spatial location, its quantization, and its motion vectors, when appropriate.
Block layer: The block layer is where the actual image data appears. Each block is an 8x8 pixel area of the image.
Instead of storing actual pixel values, MPEG streams contain the Discrete Cosine Transform (DCT) of each block. Although the DCT of an 8x8 pixel array still results in 64 coefficients, many of the coefficients become zero, so they can be more efficiently stored. Also, the coefficients can be quantized and still maintain an acceptable image quality.
To compress the images, MPEG uses various Variable Length Coding (VLC) schemes. VLC codes are used to encode some of the header information. The DCT coefficients themselves are encoded with Run Length Encoding (RLE) and then encoded as VLC. While these codes drastically reduce the amount of data in an MPEG video, they complicate the decoding process.
For reference, Appendix B of this report contains the structure of the layer headers, as well as some of the VLC encoding tables.
Implementation Details
This section describes the general implementation of my MPEG decoder.
With the exception of the D flip-flop, all portions of the decoder were implemented using structural VHDL code. The Synopsys design_analyzer program was used to generate VHDL source files for the PLAs.
The MPEG decoder itself is a VHDL entity called mpegdecode. Its inputs are an asynchronous reset, a clock, and a serial input line. Its outputs are an error line, which goes high if mpegdecode encounters an error, and the hold signal. The decoder expects the MPEG bitstream to be sent synchronously, one bit per clock cycle, via the serial input line. At times, the decoder may be unable to keep up with the input. In this case, it raises the hold signal, and the device providing the input stream must wait until hold returns low before sending more bits.
The following describes the individual circuit blocks that make up the MPEG decoder. All of the source code described is attached to this report as Appendix A.
input
The input unit provides the bitstream for the MPEG decoder. In an actual hardware implementation of the decoder, the bitstream might come from a digital camera, a hard drive, a digital TV connection, etc. For the VHDL simulation, the input unit contains a 64K RAM which is loaded at simulation time with the contents of an MPEG file. The input unit steps through each bit of the RAM and sends one bit per cycle to the MPEG decoder. A simple UNIX shell script is used to convert an MPEG file into a form suitable for a Synopsys RAM.
mpegdecode
The mpegdecode block contains a control unit and blocks responsible for parsing each of the layers of the MPEG file. The control unit monitors the bitstream for MPEG header start codes for each of the layers. When a start code is detected, the control unit raises the go signal of the appropriate block. For example, when a slice start code is detected, the control unit raises slicehead's go input signal.
The subcircuit parses the header and, when finished, raises its done output signal. Each subcircuit also has an error output, which it raises if it encounters an error. For example, the picthead unit raises its error output if it encounters a non-intracoded frame, since only intracoded frames are supported by this decoder.
control
The control unit monitors the input stream until it sees the start code prefix (0x000001). The next byte of the input then contains a code indicating the layer type that follows, and control raises the go signal of the unit responsible for parsing that layer.
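The start code search can be modeled in software as a 24-bit shift register watching the serial input. The following Python sketch is only an illustration, not the VHDL in Appendix A. It assumes the standard MPEG-1 layout, in which a single code byte follows the 24-bit prefix, and the start code values in the table are taken from the MPEG-1 video syntax and should be checked against Appendix B. The names find_start_codes, classify, and bytes_to_bits are hypothetical.

```python
# Start code values as defined by the MPEG-1 video syntax (assumption: verify
# against Appendix B). Codes 0x01 through 0xAF are slice start codes.
START_CODE_NAMES = {
    0x00: "picture",
    0xB2: "user_data",
    0xB3: "sequence_header",
    0xB4: "sequence_error",
    0xB5: "extension",
    0xB7: "sequence_end",
    0xB8: "group_of_pictures",
}


def classify(code):
    """Map the byte that follows the 0x000001 prefix to a layer name."""
    if 0x01 <= code <= 0xAF:
        return "slice"
    return START_CODE_NAMES.get(code, "reserved")


def bytes_to_bits(data):
    """Expand bytes into a list of 0/1 ints, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def find_start_codes(bits):
    """Yield (bit_position_of_prefix, layer_name) for each start code found.

    Models a 24-bit shift register that is compared against 0x000001
    after every incoming bit.
    """
    window = 0   # the last 24 bits seen
    seen = 0     # bits shifted in since the last start code
    i = 0
    n = len(bits)
    while i < n:
        window = ((window << 1) | bits[i]) & 0xFFFFFF
        seen += 1
        i += 1
        if seen >= 24 and window == 0x000001 and i + 8 <= n:
            code = 0
            for b in bits[i:i + 8]:
                code = (code << 1) | b
            yield (i - 24, classify(code))
            i += 8          # skip over the code byte
            window = 0
            seen = 0
```

In this model, the layer name returned by classify plays the role of selecting which unit's go signal is raised.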
For simplicity, the control unit ignores the group of pictures (GOP), extended data, and user data start codes. The GOP header contains only timing information, which is not necessary for this decoder. The latter two layers are used only in MPEG-2 files.
Header parsers: seqhead, picthead, slicehead
These units parse the sequence headers, picture headers, and slice headers, respectively. Most of the data contained in these headers is not necessary for this decoder. The seqhead unit extracts the MPEG frame size (number of pixels horizontally and vertically). The picthead unit extracts the frame number and the frame coding type (I, P, or B). The slicehead unit extracts the quantization factor for this slice.
Each of these units uses counters and shift registers to load in the appropriate number of bits for each data field.
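As a rough software analogue of the counter and shift register arrangement, the Python sketch below reads fixed-width fields from a byte buffer. It is an illustration only. The field widths for the start of the sequence header (12, 12, 4, and 4 bits) follow the MPEG-1 video syntax as I understand it and should be checked against Appendix B; the names BitReader and parse_sequence_header are hypothetical.

```python
class BitReader:
    """Reads fields of arbitrary width from a byte buffer, MSB first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position, playing the role of the counter

    def read(self, nbits):
        value = 0  # plays the role of the shift register
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_sequence_header(reader):
    """Parse the first fields that follow the sequence header start code.

    Assumption: the reader is positioned just after the 32-bit start code.
    Only the leading fields are read; the rest of the header is ignored here.
    """
    return {
        "horizontal_size": reader.read(12),
        "vertical_size": reader.read(12),
        "pel_aspect_ratio": reader.read(4),
        "picture_rate": reader.read(4),
    }
```

For example, the bytes 0x16 0x00 0xF0 0x13 would be read as a 352x240 frame, followed by an aspect ratio code of 1 and a picture rate code of 3.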
macroblock
The macroblock unit is where the pixel data is actually read in and processed. The macroblock unit is composed of several state machines which are responsible for parsing certain parts of the macroblock's data. Once a state machine finishes parsing its data, it outputs a pulse on its done signal, which is fed into the next unit's go signal. In this way, the state machines cooperate to decode the macroblock.
First, the macroblock's address increment and coding type are read from the macroblock header. This information is stored in VLC codes, so the VLC unit (explained below) and a PLA are used to decode them. The quantization coefficient for this macroblock follows. The address increment field is always 1 in I-coded frames, but, for future expansion, the MPEG decoder parses and decodes it anyway.
Next in the bitstream appear the DCT coefficients themselves, stored in VLC-encoded RLE codes. The DC (zeroth) coefficient is stored separately from the other 63 coefficients, so one state machine is responsible for reading it. Again, a VLC unit is employed to read this code. The DC coefficient is stored as the difference from the DC coefficient of the previous block, so the two must be added to obtain the new DC coefficient.
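The differential DC decoding can be sketched in Python as follows. This is a simplified model under stated assumptions rather than the circuit itself: it follows the MPEG-1 convention that a leading 1 bit in the differential field marks a positive value and a leading 0 a negative one, it works in quantized DC units (the standard's scaling of the DC term by 8 is left out), and the reset value of 128 for the predictor is an assumption corresponding to the standard's reset at the start of a slice. The names decode_dc_differential and DcPredictor are hypothetical.

```python
def decode_dc_differential(size, raw_bits):
    """Convert the DC differential field into a signed value.

    size     -- number of bits in the field (decoded from a VLC code)
    raw_bits -- the field itself, read as an unsigned integer
    """
    if size == 0:
        return 0
    if raw_bits & (1 << (size - 1)):
        return raw_bits                     # leading 1: positive value
    return raw_bits - ((1 << size) - 1)     # leading 0: negative value


class DcPredictor:
    """Keeps the previous block's DC value and adds each new difference."""

    def __init__(self, reset_value=128):    # assumed reset value
        self.previous = reset_value

    def next_dc(self, size, raw_bits):
        self.previous += decode_dc_differential(size, raw_bits)
        return self.previous
```

With a 3-bit field, the bit pattern 101 decodes to +5 and the pattern 010 decodes to -5.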
The AC coefficients follow. Appendix B contains the table for these VLC codes. Each VLC code corresponds to a pair of numbers: the run, which is the number of zeros that occur between the current coefficient and the next nonzero coefficient, and the quantized level of that next coefficient. They are encoded in the MPEG "zig-zag" order, in which 8x8 array indexes are numbered in order of increasing frequency. The coefficients are stored in a special register array, regarray, which maps the zig-zag indices to row-column indices. As each RLE code is read, a register stores the current zigzag index and writes the appropriate level to that index. The DCT coefficients are quantized, so they must be multiplied by the appropriate quantization coefficient for that index.
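The following Python sketch illustrates the zig-zag mapping, the placement of run/level pairs, and a simplified intra dequantization. It is a software illustration under several assumptions, not the contents of zigzag_pla or quant_pla: the default intra quantization matrix is reproduced from the MPEG-1 standard as I recall it and should be verified against Appendix B, and the dequantization omits the standard's final rounding-to-odd and clamping steps. All function names are hypothetical.

```python
# Default intra quantization matrix (assumption: verify against Appendix B).
DEFAULT_INTRA_QUANT = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def zigzag_order(n=8):
    """Return the (row, col) position of each zig-zag index, in scan order."""
    order = []
    for s in range(2 * n - 1):               # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                       # even diagonals run bottom to top
            rows = reversed(rows)
        for r in rows:
            order.append((r, s - r))
    return order


def place_run_levels(dc_level, run_levels):
    """Build an 8x8 block from a DC level and a list of (run, level) pairs."""
    block = [[0] * 8 for _ in range(8)]
    order = zigzag_order()
    block[0][0] = dc_level
    index = 0                                # current zig-zag index
    for run, level in run_levels:
        index += run + 1                     # skip `run` zeros, then one level
        if index > 63:
            raise ValueError("run/level pairs overflow the block")
        r, c = order[index]
        block[r][c] = level
    return block


def dequantize_intra(block, quantizer_scale, matrix=DEFAULT_INTRA_QUANT):
    """Simplified intra dequantization (no rounding-to-odd, no clamping)."""
    out = [[0] * 8 for _ in range(8)]
    for r in range(8):
        for c in range(8):
            if r == 0 and c == 0:
                out[r][c] = block[r][c] * 8  # DC uses a fixed step of 8
            else:
                product = block[r][c] * quantizer_scale * matrix[r][c]
                magnitude = abs(product) // 8    # truncate toward zero
                out[r][c] = magnitude if product >= 0 else -magnitude
    return out
```

For instance, the pairs (0, 3) and (2, -1) place a level of 3 at zig-zag index 1, which is row 0, column 1, and a level of -1 at zig-zag index 4, which is row 1, column 1.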
After the coefficients are decoded and reconstructed, a state machine performs a two-dimensional Inverse Discrete Cosine Transformation. It applies the idct1d unit, which performs a one-dimensional IDCT on 8 values, to each row and each column of the 8x8 coefficient matrix. With 8 row passes and 8 column passes at one pass per cycle, this process takes 16 cycles. When the process is complete, regarray is cleared and the decoding of the next block begins.
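The row-column decomposition can be illustrated with a short Python sketch. It uses the direct floating-point definition of the 1-D IDCT rather than the fast algorithm of Appendix C, so it shows only the structure of the 16 passes, not the arithmetic of the idct1d unit. The function names are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Direct-form 1-D IDCT (orthonormal scaling) of a list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for u, f in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            total += scale * f * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        out.append(total)
    return out


def idct_2d(block):
    """2-D IDCT of an 8x8 block as 8 row passes followed by 8 column passes."""
    # Passes 1-8: transform each row.
    rows = [idct_1d(row) for row in block]
    # Passes 9-16: transform each column of the intermediate result.
    cols = [idct_1d([rows[r][c] for r in range(8)]) for c in range(8)]
    # cols[c][r] holds the pixel at row r, column c; put it back in row order.
    return [[cols[c][r] for c in range(8)] for r in range(8)]
```

A block whose only nonzero coefficient is a DC value of 64 transforms to a flat block in which every pixel equals 8, since each 1-D pass scales the DC term by 1/sqrt(8).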
idct1d
The idct1d unit takes 8 DCT coefficients, and outputs the IDCT transform of them on its 8 output signals. The unit contains several cascaded multipliers, adders, and shifters which perform the IDCT algorithm described in Appendix C of this report. Since the adders and multipliers are implemented as combinational circuits, this unit can perform the IDCT in one cycle.
The arithmetic is performed using 24-bit fixed-point numbers, where the 16 most significant bits are the integer part, and the lower 8 bits are the fractional part. Allowing only 8 bits for the fractional part means that some precision is lost during the transform, but this choice was made to reduce the design time, simulation time, and amount of hardware required.
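The effect of keeping only 8 fractional bits can be seen in a small Python model of the fixed-point format. This is a sketch of the number representation only; it does not model the 24-bit word width or overflow, and the helper names are hypothetical.

```python
FRAC_BITS = 8  # number of fractional bits in the 16.8 format


def to_fixed(x):
    """Convert a real number to fixed point, rounding to the nearest step."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(f):
    """Convert a fixed-point value back to a real number."""
    return f / (1 << FRAC_BITS)


def fixed_mul(a, b):
    """Multiply two fixed-point values and drop the extra fractional bits.

    The product of two 16.8 values carries 16 fractional bits, so it is
    shifted right by 8 to return to the 16.8 format.
    """
    return (a * b) >> FRAC_BITS
```

For example, cos(pi/4) = 0.70710678 is stored as 181, which represents 0.70703125. Multiplying 100 by this constant gives 70.703125 instead of 70.710678, an error of roughly 0.008 from a single multiplication.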
regarray
The regarray unit stores the DCT coefficients as they are decoded and as the successive 1-D IDCTs are performed. It contains 64 16-bit registers, arranged as an 8x8 matrix. An individual coefficient can be written to the register array, with the zigzag PLA converting zigzag indices into row-column indices. Alternatively, a whole row or a whole column can be written to the array in one cycle. Elements can also be read from the array a whole row or a whole column at a time. In this way, the macroblock module can read an entire row or column, find the IDCT using idct1d, and write the row or column back to the regarray, all in one clock cycle.
VLC
The vlc units are used to decode the VLC codes in the MPEG file. The VLC unit itself is very simple: it contains just a counter and a register. While enabled, it counts the number of bits it has seen in the counter, and stores these bits in its register. It outputs this data. The block using the VLC is expected to connect these outputs to a PLA, and when the code corresponds to a valid VLC code, the PLA outputs the decoded result.
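The counter-plus-register scheme can be modeled in Python as shown below. The lookup table stands in for a PLA keyed on the pair (bit count, register contents). The table used here is a hypothetical toy prefix code chosen for illustration; it is not one of the MPEG tables from Appendix B. The names VlcUnit, TOY_PLA, and decode_stream are likewise hypothetical.

```python
# Hypothetical toy prefix code, keyed by (number of bits, bit pattern).
# This is NOT an MPEG table; it only illustrates the lookup.
TOY_PLA = {
    (1, 0b1): "A",
    (2, 0b01): "B",
    (3, 0b001): "C",
    (3, 0b000): "D",
}


class VlcUnit:
    """A counter and a shift register, as described in the text."""

    def __init__(self):
        self.count = 0      # number of bits seen so far
        self.register = 0   # the bits themselves

    def clear(self):
        self.count = 0
        self.register = 0

    def shift_in(self, bit):
        """Shift one bit in and return the (count, register) outputs."""
        self.register = (self.register << 1) | bit
        self.count += 1
        return (self.count, self.register)


def decode_stream(bits, pla):
    """Decode a bit sequence by presenting the unit's outputs to the table."""
    unit = VlcUnit()
    symbols = []
    for bit in bits:
        outputs = unit.shift_in(bit)
        if outputs in pla:          # the PLA recognizes a complete code
            symbols.append(pla[outputs])
            unit.clear()
    return symbols
```

Including the bit count in the key matters because patterns such as 0, 00, and 000 all have the same numeric value but are different codes.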
PLAs
Several PLAs are used in this design, including:
mb_mai_pla, for decoding the VLC codes for each macroblock's address increment
mb_type_pla, for decoding each macroblock's coding type
bl_y_size_pla, for decoding the size of the DC coefficient for luminance blocks
bl_c_size_pla, for decoding the size of the DC coefficient for chrominance blocks
quant_pla, containing the quantization factors for each DCT coefficient
zigzag_pla, a PLA used for controlling the load signals of the registers in the regarray, and for converting zig-zag-ordered array indices to row-column order
runlevel_pla, for decoding the DCT coefficients
Multipliers, adders, shift registers, counters, multiplexers, etc.
To support all of the above units, numerous fundamental blocks, such as multipliers, adders, shift registers, etc., were required. These blocks were implemented using gates and other blocks.
Results and Conclusions
Figure 1 shows a sample output image from the decoder. To obtain this, a simulation file, attached in Appendix A, loads the MPEG data into the input unit's memory, and then begins the simulation. Whenever the macroblock unit's signals indicate that a block has been parsed and reconstructed, the simulation file dumps the contents of the register array to a file. A simple C program, sim2rgb.c, converts these pixel values into an image in RGB format.
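The conversion from dumped blocks to an RGB image can be sketched in Python as follows. This is not sim2rgb.c, whose input format is not described here. The sketch assumes that the blocks arrive as 8x8 arrays in macroblock order (four luminance blocks per macroblock, ordered top-left, top-right, bottom-left, bottom-right, with macroblocks in raster order) and that the frame dimensions are multiples of 16. The function name blocks_to_rgb is hypothetical.

```python
def blocks_to_rgb(blocks, width, height):
    """Assemble 8x8 luminance blocks into raw RGB bytes (grayscale image).

    Assumptions: blocks are in macroblock order as described above, and
    width and height are multiples of 16.
    """
    frame = [[0] * width for _ in range(height)]
    mb_per_row = width // 16
    for i, block in enumerate(blocks):
        mb, sub = divmod(i, 4)                  # macroblock number, block in it
        mb_row, mb_col = divmod(mb, mb_per_row)
        top = mb_row * 16 + (sub // 2) * 8
        left = mb_col * 16 + (sub % 2) * 8
        for r in range(8):
            for c in range(8):
                value = int(round(block[r][c]))
                frame[top + r][left + c] = max(0, min(255, value))  # clamp
    rgb = bytearray()
    for row in frame:
        for y in row:
            rgb += bytes((y, y, y))             # R = G = B = luminance
    return bytes(rgb)
```

Because only the luminance plane is decoded, each pixel's luminance is simply copied into all three color channels.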
This decoder would be very fast if implemented in hardware. However, it is extremely slow to simulate using Synopsys VHDL. Processing the single image frame shown in Figure 1 requires nearly three hours of processing time on the Sun Sparcstations in 101 Pond Building. This made debugging the decoder very tedious and time-consuming. If I were to continue working on this project, I might rewrite the IDCT code to use behavioral logic instead of structural logic, since this would probably significantly reduce processing time.
The project could be improved with more time. With very little additional work, output images could be in color instead of in grayscale. Support for P- and B-coded frames could be added. This would involve adding RAMs to store the previous and next reference frames so that they could be used for motion compensation. I could also make the decoder compatible with more MPEG sequences by removing some of the assumptions that it currently makes.
I learned a lot in this project. First, I gained a lot of experience using VHDL, and I learned many new things about it. Obviously, I also gained an intimate knowledge of the MPEG standard, and of basic concepts of compression and data coding. I learned more about how the Discrete Cosine Transform works, why it is used, and how it can be implemented. Finally, I learned a lot from struggling with how to implement parts of the decoder in hardware. For example, I spent a lot of time deciding on the best way to implement the VLC decoders. In short, this was the most difficult and time-consuming hardware project I've worked on, but it was also the most rewarding.
References
"Inverse Discrete Cosine Transform for MPEG Stream." http://www.cs.uow.edu.au/people/nabg/MPEG/IDCT.html
Mitchell, Joan, William Pennebaker, Chad Fogg, and Didier LeGall. MPEG Video Compression Standard. New York: Chapman & Hall, 1996.
Appendix A: Source Code
Appendix B: MPEG Coding Tables
Appendix C: Algorithm for Inverse Discrete Cosine Transformation