Understanding MPEG-2 Compression
  February 6, 2001

Why Compression?

Video compression means using fewer bits to store and transmit digital video. The data rate of uncompressed "studio quality" digital video (ITU-R BT.601-5) is upwards of 100 megabits per second (Mbps), which vastly exceeds the maximum rate at which a DVD player can read video from a disc (9.8 Mbps). Storing a two-hour program at this rate takes over 90 gigabytes (GB), while the storage capacity of a DVD ranges from 4.7 to 17 GB. The answer to this dilemma is MPEG compression. DVD-Video supports both MPEG-1 (also used for Video CD in Asia) and MPEG-2. MPEG-2 is widely regarded as yielding higher image quality, and is the norm for most DVD-Video titles.
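The storage figure above can be checked with a little arithmetic. The sketch below is only illustrative: it assumes a constant 100 Mbps stream and decimal units (1 GB = 10^9 bytes), and the function name is hypothetical.

```python
def storage_gigabytes(bitrate_mbps: float, hours: float) -> float:
    """Storage needed, in decimal gigabytes, for a constant-rate stream."""
    total_bits = bitrate_mbps * 1_000_000 * hours * 3600  # bits per second x seconds
    return total_bits / 8 / 1_000_000_000                 # bits -> bytes -> GB


# Two hours of uncompressed video at an assumed flat 100 Mbps.
uncompressed_gb = storage_gigabytes(100, 2)

# Reduction needed to fit that program on a single 4.7 GB disc.
required_ratio = uncompressed_gb / 4.7

print(f"Uncompressed: {uncompressed_gb:.1f} GB")
print(f"Reduction needed for 4.7 GB: about {required_ratio:.0f} to 1")
```

Under these assumptions the two-hour program comes to 90 GB, so fitting it on the smallest DVD calls for a reduction of roughly 19 to 1.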

One underlying assumption of MPEG-2 compression is that motion pictures contain a great deal of redundancy, both within each frame and across consecutive frames. Another is that each frame contains some information that may be discarded without noticeably affecting how the picture is perceived when played back. MPEG-2 reduces the overall volume of data both by discarding such "unneeded" information and by storing redundant data more efficiently.

To realize these efficiencies, MPEG-2 first performs intra-frame compression, using techniques similar to those of still-image formats such as JPEG. Next comes inter-frame compression, in which a series of adjacent frames is compared and only the information needed to describe the differences between successive frames is retained. When the encoded material is played back, a decoder uses the stored information to reconstruct a complete set of discrete frames.
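The idea of keeping only frame-to-frame differences can be sketched in a few lines. This is a deliberately simplified illustration, not the MPEG-2 algorithm: real encoders work on blocks of pixels with DCT-based coding and motion-compensated prediction, whereas this sketch treats each frame as a flat list of pixel values and stores plain per-pixel differences. All names and sample values are hypothetical.

```python
def encode_sequence(frames):
    """Keep the first frame whole, then only per-pixel differences."""
    first = frames[0]
    deltas = []
    previous = first
    for frame in frames[1:]:
        deltas.append([cur - old for cur, old in zip(frame, previous)])
        previous = frame
    return first, deltas


def decode_sequence(first, deltas):
    """Rebuild every frame by adding each difference to the prior frame."""
    frames = [list(first)]
    for delta in deltas:
        frames.append([old + d for old, d in zip(frames[-1], delta)])
    return frames


# Three tiny 4-pixel "frames" in which very little changes over time.
frames = [
    [10, 10, 10, 10],
    [10, 10, 12, 10],
    [10, 10, 12, 11],
]

first, deltas = encode_sequence(frames)
print(deltas)                                    # mostly zeros
print(decode_sequence(first, deltas) == frames)  # round trip is exact
```

Because successive frames are so similar, the differences are mostly zeros, which is the kind of redundancy a later coding stage can store very compactly.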

The result of the MPEG-2 encoding process is a video stream. The basic unit of the stream is a "Group of Pictures" (GOP), made up of three picture types: I, B, and P. I-pictures (intra) are compressed using intra-frame techniques only, meaning that the information stored is complete enough to decode the frame without reference to any adjacent frames. For B (bi-directional) and P (predictive) pictures, however, only "difference information" (frame-to-frame changes) is stored, which takes far less data. These pictures can only be reconstructed by referring to other pictures: a P-picture depends on an earlier I- or P-picture, and a B-picture depends on the I- or P-pictures before and after it. This is why the different picture types are grouped into GOPs.
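The dependencies between picture types in a GOP can be made concrete with a small sketch. It relies on the MPEG-2 rule that a P-picture is predicted from the nearest preceding I- or P-picture, and a B-picture from the nearest I- or P-pictures on either side of it. The GOP pattern "IBBPBBP" and the function name are hypothetical examples chosen for illustration, and pictures are listed in display order.

```python
def reference_pictures(gop: str):
    """For each picture (display order), list the indices it is predicted from."""
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    refs = []
    for i, kind in enumerate(gop):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if kind == "I":
            refs.append([])                        # self-contained
        elif kind == "P":
            refs.append(earlier[-1:])              # nearest earlier I or P
        else:
            refs.append(earlier[-1:] + later[:1])  # B: nearest anchor each side
    return refs


gop = "IBBPBBP"
refs = reference_pictures(gop)
for index, (kind, needed) in enumerate(zip(gop, refs)):
    print(index, kind, "depends on", needed)
```

In this example only picture 0 (the I-picture) can be decoded on its own; every other picture ultimately traces back to it, which is why a decoder starts from the I-picture of a GOP.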