For my project, I will design and simulate a VLSI circuit to compute the 1-D Inverse Discrete Cosine Transformation. The IDCT is the basis of many lossy compression algorithms, such as the JPEG standard for still images, the MPEG video standard, and the MPEG audio standards. My IDCT circuit will be targeted for use in an MPEG video decompression board.

Specifications

In MPEG video (as well as JPEG images), the video frame is divided into 8x8 pixel blocks, and the Discrete Cosine Transform is applied to each block individually. Because the DCT and IDCT are separable, the 2-D transform is obtained by applying the 1-D DCT or IDCT to each row and then to each column of the 8x8 pixel block. My circuit will be an implementation of an 8-pixel 1-D IDCT, which can be used successively by the external circuitry to perform a 2-D transform.

The I/O signals to the circuit will be:

As shown, the chip will have the standard power, ground, and clock signals. An ENABLE input allows the external circuitry to control the operation of the decoder. On the next eight clock cycles after ENABLE goes high, the chip will read in the 8 DCT coefficients. It will then compute the transform, lowering BUSY to indicate when the results are ready to be placed on the output lines. In MPEG the largest DCT coefficients are 12 bits, and the largest pixel values are 8 bits wide.

Design

My circuit will probably use the IDCT algorithm published by Loeffler, Ligtenberg & Moschytz[1] (a C implementation of this algorithm is available at [3]). I have chosen this algorithm because it is simple and I have worked with it in the past for performing the IDCT in software. However I will also research other IDCT algorithms in case there is one that would be faster or more natural when implemented in VLSI. The algorithm works by successively computing on an array of 8 numbers, and then replacing the array with the output of each computation. To obtain an item in the output array, two (or fewer) elements are multiplied by a constant, then added or subtracted together, and then a shift is performed on the result.

Figure 1 shows a block diagram of the circuit. The circuit contains 8x12 bit register files which store the inputs and outputs of each succession of the algorithm, respectively. The input register file has the special requirement that it is capable of outputing two different registers simultaneously. There are two multipliers for multiplying array items by constants, an addition/subtraction block, a 1-bit right/left shifter, and multiplexers to route data signals appropriately. The CONTROL block is a state machine which coordinates all of the control signals to the other blocks. The addition blocks will be simple 12-bit ripple carry adders, and the multiplication block will use a successive shift-and-add algorithm. If I have time, I can work on improving the performance of these blocks.

Figure 1: Block diagram for system
Figure 1: Block diagram for system

References

[1] Loeffler, Ligtenberg and Moschytz, "Practical Fast 1-D DCT Algorithms with 11 Multiplications", Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 1989, pp. 988-991.

[2] Gonzalez, Rafael C. and Richard E. Woods. Digital Image Processing. New York: Addison-Wesley Publishing Company, 1993.

[3] Web Page: http://www.cs.uow.edu.au/people/nabg/MPEG/IDCT.html
David J Crandall
Last modified: Sun Sep 12 14:40:50 EDT 1999