Motion JPEG - Encoding

Encoding

Motion JPEG uses a lossy form of intraframe compression based on the discrete cosine transform (DCT). This mathematical operation converts each frame or field of the video source from the spatial (2D) domain into the frequency domain (also known as the transform domain). A perceptual model based loosely on the human psychovisual system discards high-frequency information, i.e. sharp transitions in intensity and color hue. In the transform domain, the process of reducing information is called quantization. In layman's terms, quantization is a method for optimally reducing a large number scale (with different occurrences of each number) into a smaller one. The transform domain is a convenient representation of the image because the high-frequency coefficients, which contribute less to the overall picture than other coefficients, are characteristically small values that compress well. The quantized coefficients are then sequenced and losslessly packed into the output bitstream. Nearly all software implementations of M-JPEG permit user control over the compression ratio (as well as other optional parameters), allowing the user to trade off picture quality for smaller file size. In embedded applications (such as miniDV, which uses a similar DCT compression scheme), the parameters are pre-selected and fixed for the application.
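The transform and quantization steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the encoder any particular M-JPEG implementation uses: it applies an orthonormal 8x8 DCT to one block and quantizes with a single uniform step size, whereas real JPEG encoders use a per-frequency quantization table. The names `dct_2d`, `quantize`, and `count_nonzero`, and the sample block, are assumptions made for this example.

```python
import math

N = 8  # baseline JPEG transforms 8x8 blocks of samples


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block (list of lists of numbers)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization (a simplification): divide by step, then round."""
    return [[round(c / step) for c in row] for row in coeffs]


def count_nonzero(levels):
    """Number of quantized coefficients that survive as nonzero values."""
    return sum(1 for row in levels for c in row if c != 0)


# Hypothetical sample block: a gentle diagonal gradient, already level-shifted.
block = [[10 * x + 5 * y - 60 for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)

fine = quantize(coeffs, step=4)     # small step: more detail kept, more data
coarse = quantize(coeffs, step=32)  # large step: less detail kept, less data

print("nonzero coefficients, step 4: ", count_nonzero(fine))
print("nonzero coefficients, step 32:", count_nonzero(coarse))
```

A larger step zeroes more coefficients, which is the mechanism behind the quality versus file size control mentioned above; the zeroed coefficients cost very little in the lossless packing stage.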

M-JPEG is an intraframe-only compression scheme (compared with the more computationally intensive technique of interframe prediction). Whereas modern interframe video formats, such as MPEG-1, MPEG-2 and H.264/MPEG-4 AVC, achieve real-world compression ratios of 1:50 or better, M-JPEG's lack of interframe prediction limits its efficiency to 1:20 or lower, depending on the tolerance for spatial artifacts in the compressed output. Because frames are compressed independently of one another, M-JPEG imposes lower processing and memory requirements on hardware devices.
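To put the 1:20 and 1:50 ratios above in concrete terms, the following sketch computes the resulting bitrates for one source. The frame size, bit depth, and frame rate (720x480, 24 bits per pixel, 30 frames per second) are hypothetical values chosen only for illustration, and `compressed_bitrate` is a helper invented for this example.

```python
def raw_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed video bitrate in bits per second."""
    return width * height * bits_per_pixel * fps


def compressed_bitrate(raw_bps, ratio):
    """Bitrate after compression at 1:ratio (e.g. ratio=20 means 1:20)."""
    return raw_bps / ratio


# Hypothetical source: 720x480 pixels, 24 bits per pixel, 30 frames per second.
raw = raw_bitrate(720, 480, 24, 30)

mjpeg = compressed_bitrate(raw, 20)       # intraframe-only, about 1:20
interframe = compressed_bitrate(raw, 50)  # interframe formats, about 1:50

print(f"raw:        {raw / 1e6:.2f} Mbit/s")
print(f"1:20 ratio: {mjpeg / 1e6:.2f} Mbit/s")
print(f"1:50 ratio: {interframe / 1e6:.2f} Mbit/s")
```

Under these assumptions, the 1:20 stream needs 2.5 times the bandwidth of the 1:50 stream for the same source.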

As a purely intraframe compression scheme, the image quality of M-JPEG is directly a function of each video frame's static (spatial) complexity. Frames with large smooth transitions or monotone surfaces compress well, and are more likely to hold their original detail with few visible compression artifacts. Frames exhibiting complex textures, fine curves and lines (such as writing on a newspaper) are prone to exhibit DCT artifacts such as ringing, smudging, and macroblocking. M-JPEG compressed video is also insensitive to motion complexity, i.e. variation over time. It is neither hindered by highly random motion (such as the surface-water turbulence in a large waterfall), nor helped by the absence of motion (such as a static landscape shot from a tripod), which are two opposite extremes commonly used to test interframe video formats.
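The dependence on spatial complexity can be illustrated by comparing two 8x8 blocks after a DCT and quantization. This is a rough sketch under stated assumptions: it uses a separable orthonormal DCT, a single uniform quantization step of 16 (real JPEG uses a quantization table), and two made-up blocks, a smooth gradient and a one-pixel checkerboard standing in for fine texture. All function names are hypothetical.

```python
import math

N = 8  # block size used by the DCT


def dct_1d(values):
    """Orthonormal 1D DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
        out.append(scale * sum(
            values[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)))
    return out


def dct_2d(block):
    """2D DCT computed separably: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[x][y] for x in range(N)]) for y in range(N)]
    # cols is indexed [v][u]; transpose so the result is indexed [u][v].
    return [[cols[v][u] for v in range(N)] for u in range(N)]


def nonzero_after_quantization(block, step=16):
    """Count coefficients that remain nonzero after uniform quantization."""
    return sum(1 for row in dct_2d(block) for c in row if round(c / step) != 0)


# Smooth block: a horizontal gradient (large, slow transition).
smooth = [[16 * y - 128 for y in range(N)] for _ in range(N)]

# Textured block: a one-pixel checkerboard (sharp transitions everywhere).
checker = [[100 if (x + y) % 2 == 0 else -100 for y in range(N)]
           for x in range(N)]

print("smooth block: ", nonzero_after_quantization(smooth))
print("checker block:", nonzero_after_quantization(checker))
```

The smooth block concentrates its energy in a handful of low-frequency coefficients, while the checkerboard spreads energy into high-frequency coefficients that either cost bits to keep or produce visible artifacts when quantized away. Because each frame is coded on its own, this holds regardless of what the neighboring frames contain.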

For QuickTime formats, Apple has defined two types of coding: MJPEG-A and MJPEG-B. MJPEG-B no longer retains valid JPEG Interchange Files within it, hence it is not possible to extract a frame into a JPEG file without slightly modifying the headers.
