
MPEG-1 was an early standard for lossy compression of video and audio. It was designed to compress VHS-quality raw digital video and CD audio down to 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively)[1] without obvious quality loss, making Video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) possible.[2][3]
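The quoted compression ratios can be sanity-checked with simple arithmetic. The sketch below rests on assumptions the text does not state: Video CD-style parameters of 352×240 video at 30000/1001 frames/s with 8-bit 4:2:0 sampling (12 bits per pixel on average) compressed to 1,150 kbit/s, and 16-bit stereo 44.1 kHz CD audio compressed to 224 kbit/s.

```python
from fractions import Fraction

def raw_video_bitrate(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bitrate in bit/s.

    12 bits per pixel assumes 8-bit samples with 4:2:0 chroma subsampling
    (8 bits of luma plus 2 + 2 bits of chroma per pixel on average).
    """
    return width * height * bits_per_pixel * fps

def raw_audio_bitrate(sample_rate=44_100, bits=16, channels=2):
    """Uncompressed PCM bitrate in bit/s (CD audio by default)."""
    return sample_rate * bits * channels

# Assumed Video CD-style targets; these figures are not taken from the text.
VIDEO_TARGET = 1_150_000   # bit/s
AUDIO_TARGET = 224_000     # bit/s

video_ratio = raw_video_bitrate(352, 240, Fraction(30000, 1001)) / VIDEO_TARGET
audio_ratio = raw_audio_bitrate() / AUDIO_TARGET
print(f"video about {float(video_ratio):.1f}:1, audio about {audio_ratio:.1f}:1")
```

Under these assumptions the results land close to the 26:1 and 6:1 figures; other resolutions or target bitrates would shift them.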

Today, MPEG-1 has become the most widely compatible lossy audio/video format in the world, and is used in a large number of products and technologies. Perhaps the best-known part of the MPEG-1 standard is the MP3 audio format it introduced.

Despite its age, MPEG-1 is not necessarily obsolete or substantially inferior to newer technologies. According to Leonardo Chiariglione (co-founder of MPEG): "the idea that compression technology keeps on improving is a myth."[4]

The MPEG-1 standard is published as ISO/IEC-11172.

Moving Picture Experts Group Phase 1 (MPEG-1)
File name extensions: .mpg, .mpeg, .mp1, .mp2, .mp3, .m1v, .m1a, .m2a, .mpa, .mpv
Internet media type: audio/mpeg, video/mpeg
Developed by: ISO, IEC
Type of format: audio, video, container
Extended from: JPEG, H.261
Extended to: MPEG-2


History

Modeled on the successful collaborative approach and the compression technologies developed by the Joint Photographic Experts Group and CCITT's Experts Group on Telephony (creators of the JPEG image compression standard and the H.261 standard for video conferencing, respectively), the Moving Picture Experts Group (MPEG) working group was established in January 1988. MPEG was formed to address the need for standard video and audio formats, and to build on H.261 to get better quality through the use of more complex (non-real-time) encoding methods.[2][5]

Development of the MPEG-1 standard began in May 1988. Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human-perceived) quality, at data rates of 1.5 Mbit/s. This specific bitrate was chosen for transmission over T-1/E-1 lines and as the approximate data rate of audio CDs.[4] The codecs that excelled in this testing were used as the basis for the standard and refined further, with additional features and other improvements being incorporated in the process.[6]

After 20 meetings of the full group in various cities around the world, and 4½ years of development and testing, the final standard (for parts 1-3) was approved in early November 1992 and published a few months later.[7] The reported completion date of the MPEG-1 standard varies greatly: a largely complete draft standard was produced in September 1990, and from that point on, only minor changes were introduced.[2] The standard was finished with the 6 November 1992 meeting.[8] In July 1990, before the first draft of the MPEG-1 standard had even been written, work began on a second standard, MPEG-2,[9] intended to extend MPEG-1 technology to provide full broadcast-quality video (as per CCIR 601) at high bitrates (3-15 Mbit/s), and support for interlaced video.[10] Due in part to the similarity between the two codecs, the MPEG-2 standard includes full backward compatibility with MPEG-1 video, so any MPEG-2 decoder can play MPEG-1 videos.[11]

Notably, the MPEG-1 standard very strictly defines the bitstream and decoder function, but does not define how MPEG-1 encoding is to be performed (although a reference implementation is provided in ISO/IEC-11172-5).[1] This means that MPEG-1 coding efficiency can vary drastically depending on the encoder used, and generally that newer encoders perform significantly better than their predecessors.[12] The first three parts (Systems, Video and Audio) of ISO/IEC 11172 were published in August 1993.[13]

Applications

Systems

Part 1 of the MPEG-1 standard covers systems, and is defined in ISO/IEC-11172-1.

MPEG-1 Systems specifies the logical layout and methods used to store the encoded audio, video, and other data into a standard bitstream, and to maintain synchronization between the different contents. This file format is specifically designed for storage on media, and transmission over data channels, that are considered relatively reliable. Only limited error protection is defined by the standard, and small errors in the bitstream may cause noticeable defects.

This structure was later named a program stream: "The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure."[20] This terminology is more popular and more precise (it differentiates the structure from a transport stream), and will be used here.

Elementary Streams

Elementary streams (ES) are the raw bitstreams of MPEG-1 audio and video, output by an encoder. These files can be distributed on their own, as is the case with MP3 files.

Additionally, elementary streams can be made more robust by packetizing them, i.e. dividing them into independent chunks, and adding a cyclic redundancy check (CRC) checksum to each segment for error detection. This is the Packetized Elementary Stream (PES) structure.
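The packetize-and-check idea can be sketched in a few lines. This is an illustration under stated assumptions, not the PES packet syntax: the 2048-byte chunk size and the choice of the CRC-32/MPEG-2 variant (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, no final XOR) are assumptions, and `packetize`/`find_corrupt` are hypothetical helpers.

```python
def crc32_mpeg2(data: bytes) -> int:
    """Bitwise CRC-32/MPEG-2: poly 0x04C11DB7, init 0xFFFFFFFF, no reflection, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def packetize(stream: bytes, chunk_size: int = 2048) -> list[tuple[bytes, int]]:
    """Split an elementary stream into chunks, each paired with its CRC (hypothetical layout)."""
    chunks = [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]
    return [(chunk, crc32_mpeg2(chunk)) for chunk in chunks]

def find_corrupt(packets: list[tuple[bytes, int]]) -> list[int]:
    """Return the indices of packets whose payload no longer matches its stored CRC."""
    return [i for i, (payload, crc) in enumerate(packets) if crc32_mpeg2(payload) != crc]

packets = packetize(bytes(range(256)) * 20)        # 5120 bytes -> 3 packets
payload, crc = packets[1]
packets[1] = (bytes([payload[0] ^ 0x01]) + payload[1:], crc)   # flip a single bit
print(find_corrupt(packets))  # [1]
```

A CRC detects every single-bit error, so the damaged packet is identified while the others are still usable; the decoder can then conceal or skip only that chunk.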

System Clock Reference (SCR) is a 33-bit timing value stored in the header of each ES, at a frequency/precision of 90 kHz, with an extra 9-bit extension that stores additional timing data with a precision of 27 MHz.[21][22] These values are inserted by the encoder, derived from the system time clock (STC). Simultaneously encoded audio and video streams will not have identical SCR values, however, due to buffering, encoding, jitter, and other delay.
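Converting such a timestamp to seconds is a small calculation. The sketch assumes the MPEG-2-style combination of the two fields, full value = base × 300 + extension, where the extension counts 27 MHz ticks within one 90 kHz tick (0 to 299); `scr_to_seconds` is a hypothetical helper name.

```python
SCR_BASE_HZ = 90_000          # 33-bit base counter frequency
SCR_FULL_HZ = 27_000_000      # 90 kHz * 300
SCR_BASE_MODULUS = 1 << 33    # the base counter wraps at 2**33

def scr_to_seconds(base: int, extension: int = 0) -> float:
    """Combine a 33-bit 90 kHz base and a 27 MHz extension (0..299) into seconds."""
    if not 0 <= base < SCR_BASE_MODULUS:
        raise ValueError("base must fit in 33 bits")
    if not 0 <= extension < 300:
        raise ValueError("extension must be in 0..299")
    return (base * 300 + extension) / SCR_FULL_HZ

# How long before the 33-bit base counter wraps around (about 26.5 hours).
WRAP_HOURS = SCR_BASE_MODULUS / SCR_BASE_HZ / 3600
print(scr_to_seconds(90_000), WRAP_HOURS)
```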

Program Streams

Program Streams (PS) are concerned with combining multiple packetized elementary streams (usually just one audio and video PES) into a single stream, ensuring simultaneous delivery, and maintaining synchronization. The PS structure is known as a multiplex, or a container format.

Presentation time stamps (PTS) exist in PS to correct this disparity between audio and video SCR values (time-base correction). 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values.[21] PTS determines when to display a portion of an MPEG program, and is also used by the decoder to determine when data can be discarded from the buffer.[23] Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded.

PTS handling can be problematic. Decoders must accept multiple program streams that have been concatenated (joined sequentially). This causes PTS values in the middle of the video to reset to zero, which then begin incrementing again. Such PTS wraparound disparities can cause timing issues that must be specially handled by the decoder.
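One way a decoder might handle such resets is to rebuild a monotonic timeline. The sketch below is a simplified, hypothetical approach: it assumes a constant frame duration in 90 kHz ticks and treats any backward jump (a concatenation point or a 33-bit wraparound) as a discontinuity to be bridged.

```python
def unwrap_pts(pts_values, frame_ticks):
    """Turn raw PTS values (90 kHz ticks) into a monotonically increasing timeline.

    Any backward jump is treated as a discontinuity; the new segment is shifted
    so that it continues one frame after the last output timestamp.
    """
    timeline = []
    offset = 0
    previous = None
    for pts in pts_values:
        if previous is not None and pts < previous:
            offset = timeline[-1] + frame_ticks - pts
        timeline.append(pts + offset)
        previous = pts
    return timeline

# Two concatenated clips at 29.97 frames/s (3003 ticks per frame).
print(unwrap_pts([0, 3003, 6006, 0, 3003], 3003))  # [0, 3003, 6006, 9009, 12012]
```

Real players use more robust heuristics (for example, tolerating small backward jitter), so this only illustrates the bookkeeping involved.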

Decoding Time Stamps (DTS), additionally, are required because of B-frames. With B-frames in the video stream, adjacent frames have to be encoded and decoded out of order (re-ordered frames). DTS is similar to PTS, but it tells the decoder when to decode each frame rather than when to display it: an anchor (I- or P-) frame must be decoded before the B-frames that precede it in display order, because those B-frames are predicted from it. Without B-frames in the video, PTS and DTS values are identical.[24]
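The re-ordering that makes DTS necessary can be shown with a short sketch. The frame labels (type letter plus display index) are a hypothetical notation; the function moves each anchor frame ahead of the B-frames that depend on it.

```python
def decode_order(display):
    """Reorder frames from display order to decode (bitstream) order.

    Each anchor (I or P) is emitted before the B-frames that precede it in
    display order, because those B-frames are predicted from it.
    """
    result, waiting_b = [], []
    for frame in display:
        if frame.startswith("B"):
            waiting_b.append(frame)
        else:
            result.append(frame)
            result.extend(waiting_b)
            waiting_b.clear()
    # Simplification: trailing B-frames would really wait for the next GOP's I-frame.
    result.extend(waiting_b)
    return result

print(decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In the output, P3 must be decoded before B1 and B2 but displayed after them, so its decoding time and presentation time differ; without B-frames the two orders coincide.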

Multiplexing

To generate the PS, the multiplexer will interleave the (two or more) packetized elementary streams. This is done so the packets of the simultaneous streams can be transferred over the same channel and are guaranteed to both arrive at the decoder at precisely the same time. This is a case of time-division multiplexing.
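A minimal sketch of the interleaving step, assuming each packetized stream is already a list of `(timestamp, stream_id, payload)` tuples sorted by timestamp (a hypothetical representation, not the MPEG-1 pack/packet syntax). Python's `heapq.merge` performs the time-ordered interleave; choosing packet sizes and writing pack headers are omitted.

```python
import heapq

def interleave(*streams):
    """Time-division multiplex per-stream packet lists into one time-ordered sequence."""
    return list(heapq.merge(*streams, key=lambda packet: packet[0]))

video = [(0, "video", b"v0"), (3003, "video", b"v1")]
audio = [(1080, "audio", b"a0"), (3240, "audio", b"a1")]
for timestamp, stream_id, _payload in interleave(video, audio):
    print(timestamp, stream_id)
```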

Determining how much data from each stream should be in each interleaved segment (the size of the interleave) is complicated, yet an important requirement. Improper interleaving will result in buffer underflows or overflows, as the receiver gets more of one stream than it can store (e.g. audio) before it gets enough data to decode the other simultaneous stream (e.g. video). The MPEG Video Buffer Verifier (VBV) assists in determining if a multiplexed PS can be decoded by a device with a specified data throughput rate and buffer size.[25] This offers feedback to the muxer and the encoder, so that they can change the mux size or adjust bitrates as needed for compliance.
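The kind of check a buffer verifier performs can be illustrated with a simplified constant-rate ("leaky bucket") model. This is a sketch, not the normative VBV algorithm: it assumes bits arrive at a constant rate, one frame is removed instantly at each frame interval, and the 327,680-bit buffer and other figures in the tests are hypothetical.

```python
def check_buffer(frame_bits, bitrate, fps, buffer_size, initial_bits):
    """Simulate a decoder buffer filled at `bitrate` bit/s and drained once per frame.

    `initial_bits` is the buffer fullness when the first frame is decoded.
    Returns ("ok", None), or ("underflow" | "overflow", frame_index).
    """
    fullness = initial_bits
    per_frame = bitrate / fps          # bits arriving between two decode instants
    for i, size in enumerate(frame_bits):
        if fullness > buffer_size:     # buffer overflowed before this frame was removed
            return ("overflow", i)
        if size > fullness:            # frame not fully received by its decode time
            return ("underflow", i)
        fullness += per_frame - size
    return ("ok", None)

print(check_buffer([40_000, 200_000], 1_200_000, 30, 327_680, 120_000))
# ('underflow', 1)
```

An encoder or muxer receiving such feedback could shrink the oversized frame or raise the initial delay, which mirrors the adjustment described above.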

The PS, additionally, stores aspect ratio information which tells the decoder how much to stretch the height or width of a video when displaying it. Different display devices (such as computer monitors and televisions) have different pixel heights/widths, which will result in video encoded for one appearing "squished" when played on the other, unless the aspect ratio information in the PS is used to compensate.
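The compensation itself is a rescale of one dimension. The sketch assumes the pixel aspect ratio is expressed as pixel width divided by pixel height; the MPEG-1 header's own code table and convention are not reproduced here, and the 10:11 value is only an illustrative example.

```python
from fractions import Fraction

def display_size(width, height, pixel_aspect):
    """Stretch the stored width by the pixel aspect ratio (pixel width / pixel height)."""
    return round(width * pixel_aspect), height

print(display_size(352, 240, Fraction(10, 11)))  # (320, 240)
```

Stretching the width keeps every decoded line intact; a player could equally scale the height instead.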

Video

Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC-11172-2.

MPEG-1 Video exploits perceptual compression methods to significantly reduce the data rate required by a video stream. It reduces or completely discards information in certain frequencies and areas of the picture that the human eye has limited ability to fully perceive. It also utilizes effective methods to exploit temporal (over time) and spatial (across a picture) redundancy common in video, to achieve better data compression than would be possible otherwise. (See: Video compression)

Color Space

Example of 4:2:0 subsampling.  The 2 overlapping center circles represent chroma blue and chroma red (color) pixels, while the 4 outside circles represent the luma (brightness).

Before encoding video to MPEG-1, the color space is transformed to Y'CbCr (Y' = Luma, Cb = Chroma Blue, Cr = Chroma Red). Luma (brightness, resolution) is stored separately from chroma (color, hue, phase), and chroma is further separated into blue-difference and red-difference components. The chroma is also subsampled to 4:2:0, meaning it is reduced to half resolution vertically and half resolution horizontally, leaving just one quarter as many chroma samples as luma samples.[1]
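The color-space transform can be sketched per pixel. This assumes the BT.601 luma weights and the full 0-255 range used by JFIF; MPEG-1 video normally uses the reduced studio range, so treat the exact output values as illustrative. `rgb_to_ycbcr` is a hypothetical helper.

```python
def _clamp(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, round(value)))

def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit R'G'B' to full-range Y'CbCr with BT.601 weights (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.5 * (b - y) / (1 - 0.114)   # scaled blue-difference
    cr = 128 + 0.5 * (r - y) / (1 - 0.299)   # scaled red-difference
    return _clamp(y), _clamp(cb), _clamp(cr)

print(rgb_to_ycbcr(255, 255, 255), rgb_to_ycbcr(255, 0, 0))
```

Any neutral gray maps to Cb = Cr = 128, which shows that all color information lives in the two difference channels.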

Because the human eye is much less sensitive to small changes in color than in brightness, chroma subsampling is a very effective way to reduce the amount of video data that needs to be compressed. On videos with fine detail (high spatial complexity), this can manifest as chroma aliasing artifacts. Compared to other digital compression artifacts, this issue is rarely a source of annoyance.

Because of subsampling, Y'CbCr video must always be stored using even dimensions (divisible by 2); otherwise chroma mismatch ("ghosts") will occur, and it will appear as if the color is ahead of, or behind, the rest of the video, much like a shadow.
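A minimal sketch of 4:2:0 subsampling, assuming a chroma plane stored as a list of rows and plain 2×2 averaging with rounding (real encoders may use other filters and chroma siting). Odd dimensions are rejected, matching the even-dimension requirement.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane into a single sample."""
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("4:2:0 subsampling needs even width and height")
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
         for x in range(0, width, 2)]
        for y in range(0, height, 2)
    ]

print(subsample_420([[10, 20, 30, 40],
                     [30, 40, 50, 60]]))  # [[25, 45]]
```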

Y'CbCr is often inaccurately called YUV, a term that properly applies only to analog video signals. Similarly, the terms luminance and chrominance are often used instead of the (more accurate) terms luma and chroma.

Resolution/Bitrate

MPEG-1 supports resolutions up to 4095×4095 (12-bits), and bitrates up to 100 Mbit/s. [5]

MPEG-1 videos are most commonly seen using Source Input Format (SIF) resolution: 352×240, 352×288, or 320×240. These low resolutions, combined with a bitrate less than 1.5 Mbit/s, make up what is known as a constrained parameters bitstream (CPB), later renamed the "Low Level" (LL) profile in MPEG-2. These are the minimum video specifications any decoder should be able to handle to be considered MPEG-1 compliant. They were selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of the time.[2][5]

Frame/Picture/Block Types

MPEG-1 has several frame/picture types that serve different purposes. The most important, yet simplest are I-frames.

I-Frames

I-frame is an abbreviation for intra-frame, so called because I-frames can be decoded independently of any other frames. They may also be known as I-pictures, or keyframes due to their somewhat similar function to the key frames used in animation. I-frames can be considered effectively identical to baseline JPEG images.[5]

High-speed seeking through an MPEG-1 video is only possible to the nearest I-frame. When cutting a video it is not possible to start playback of a segment of video before the first I-frame in the segment (at least not without computationally-intensive re-encoding). For this reason, I-frame-only MPEG videos are used in editing applications.
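Seeking therefore starts at the last I-frame at or before the requested frame, then decodes forward. A sketch, assuming the player has a sorted index of I-frame positions (a hypothetical structure built while parsing the stream):

```python
import bisect

def seek_start(i_frame_indices, target):
    """Return the last I-frame index at or before `target` (list must be sorted)."""
    pos = bisect.bisect_right(i_frame_indices, target) - 1
    if pos < 0:
        raise ValueError("no I-frame at or before the target frame")
    return i_frame_indices[pos]

print(seek_start([0, 15, 30, 45], 37))  # 30
```

Here reaching frame 37 means decoding frames 30 through 37; a fast seek would simply show frame 30.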

I-frame-only compression is very fast, but produces very large file sizes: a factor of 3× (or more) larger than normally encoded MPEG-1 video, depending on how temporally complex a specific video is.[2] I-frame-only MPEG-1 video is very similar to MJPEG video, so much so that very high-speed and theoretically lossless (in reality, there are rounding errors) conversion can be made from one format to the other, provided a couple of restrictions (color space and quantization matrix) are followed in the creation of the bitstream.[26]

The distance between I-frames is known as the group of pictures (GOP) size. MPEG-1 most commonly uses a GOP size of 15-18, i.e. 1 I-frame for every 14-17 non-I-frames (some combination of P- and B-frames). With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit.[5]
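A fixed GOP structure is often described by two numbers: the I-frame spacing and the spacing between anchor frames. The sketch below generates the display-order frame types for one GOP; the parameter names and the fixed-pattern approach are assumptions for illustration, since adaptive encoders choose the structure dynamically.

```python
def gop_pattern(gop_size: int, anchor_distance: int) -> str:
    """Display-order frame types for one GOP.

    `gop_size` is the I-frame spacing; `anchor_distance` is the spacing between
    anchor frames, so `anchor_distance - 1` B-frames sit between anchors.
    """
    return "".join(
        "I" if i == 0 else ("P" if i % anchor_distance == 0 else "B")
        for i in range(gop_size)
    )

print(gop_pattern(15, 3))  # IBBPBBPBBPBBPBB
```

With these settings there is 1 I-frame, 4 P-frames and 10 B-frames per GOP; the final two B-frames reference the next GOP's I-frame.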

Limits are placed on the maximum number of frames between I-frames due to decoding complexity, decoder buffer size, recovery time after data errors, seeking ability, and accumulation of IDCT errors in low-precision implementations most common in hardware decoders (see IEEE-1180).

P-frames

P-frame is an abbreviation for Predicted-frame. They may also be called forward-predicted frames, or inter-frames (B-frames are also inter-frames).

P-frames exist to improve compression by exploiting the temporal (over time) redundancy in a video. P-frames store only the difference in image from the closest preceding I-frame or P-frame (this reference frame is also called the anchor frame).

The difference between a P-frame and its anchor frame is calculated using motion vectors on each macroblock of the frame (see below). Such motion vector data will be embedded in the P-frame for use by the decoder.

A P-frame can contain any number of intra-coded blocks, in addition to any forward-predicted blocks. [27]

If a video drastically changes from one frame to the next (such as a scene change), it is more efficient to encode it as an I-frame.

B-frames

B-frame stands for bidirectional-frame. They may also be known as backwards-predicted frames or B-pictures. B-frames are quite similar to P-frames, except they can make predictions using both the previous and future frames (i.e. two anchor frames).

It is therefore necessary for the player to first decode the next I- or P- anchor frame sequentially after the B-frame, before the B-frame can be decoded and displayed. This makes B-frames very computationally complex, requires larger data buffers, and increases delay during both decoding and encoding. This also necessitates the decoding time stamps (DTS) feature in the container/system stream (see above). As such, B-frames have long been the subject of much controversy; they are often avoided in videos, and are sometimes not fully supported by hardware decoders.

No other frames are predicted from a B-frame. Because of this, a very low bitrate B-frame can be inserted, where needed, to help control the bitrate. If this were done with a P-frame, future P-frames would be predicted from it, lowering the quality of the entire sequence. However, the future P-frame must still encode all the changes between it and the previous I- or P- anchor frame, so much of what was coded in the B-frame is coded a second time. B-frames can also be beneficial in videos where the background behind an object is being revealed over several frames, or in fading transitions, such as scene changes.[2][5]

A B-frame can contain any number of intra-coded blocks and forward-predicted blocks, in addition to backwards-predicted, or bidirectionally predicted blocks. [5] [27]

D-frames

MPEG-1 has a unique frame type not found in later video standards. D-frames or DC-pictures are independent images (intra-frames) that have been encoded DC-only (AC coefficients are removed; see DCT below) and hence are very low quality. D-frames are never referenced by I-, P- or B-frames. D-frames are only used for fast previews of video, for instance when seeking through a video at high speed.[2]
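What a DC-only picture shows can be approximated by replacing every 8×8 block with its mean, since the DC coefficient is proportional to the block average. The sketch below assumes a luma plane stored as a list of rows with dimensions that are multiples of 8; it models the visual result, not D-frame bitstream decoding.

```python
def dc_preview(image, block=8):
    """Return one rounded mean value per block, i.e. a DC-only thumbnail."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("dimensions must be multiples of the block size")
    n = block * block
    return [
        [(sum(image[y][x]
              for y in range(by, by + block)
              for x in range(bx, bx + block)) + n // 2) // n
         for bx in range(0, width, block)]
        for by in range(0, height, block)
    ]

image = [[0] * 8 + [255] * 8 for _ in range(16)]   # left half black, right half white
print(dc_preview(image))  # [[0, 255], [0, 255]]
```

A 352×240 picture shrinks to a 44×30 grid of block averages, which is why D-frames are cheap but very low quality.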

Given moderately higher-performance decoding equipment, this feature can be approximated by decoding I-frames instead. This provides higher-quality previews without D-frames taking up space in the stream, space that does nothing to improve the quality of normal playback.

Macroblocks

MPEG-1 operates on video in a series of 8x8 blocks for quantization. However, because chroma (color) is subsampled by a factor of 4, each pair of (red and blue) chroma blocks corresponds to 4 different luma blocks. This set of 6 blocks, with a resolution of 16x16, is called a macroblock.

A macroblock is the smallest independent unit of (color) video. Motion vectors (see below) operate solely at the macroblock level.

If the height and/or width of the video are not exact multiples of 16, a full row (or column) of macroblocks must still be encoded (though not displayed) to store the remainder of the picture (macroblock padding). This wastes a significant amount of data in the bitstream, and should be strictly avoided.
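The padding cost is easy to compute. This sketch rounds each dimension up to a whole number of 16×16 macroblocks and reports the fraction of coded area that is never displayed; the 352×200 size in the example is hypothetical.

```python
def macroblock_grid(width: int, height: int, mb: int = 16):
    """Return (columns, rows, coded_width, coded_height, wasted_fraction)."""
    cols = -(-width // mb)    # ceiling division
    rows = -(-height // mb)
    coded_w, coded_h = cols * mb, rows * mb
    wasted = 1 - (width * height) / (coded_w * coded_h)
    return cols, rows, coded_w, coded_h, wasted

print(macroblock_grid(352, 240))  # (22, 15, 352, 240, 0.0)
print(macroblock_grid(352, 200))  # 13 rows, coded as 352x208, about 3.8% wasted
```

SIF dimensions divide evenly into macroblocks, so they incur no padding at all.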

Some decoders will also improperly handle videos with partial macroblocks, resulting in visible artifacts.

Motion Vectors

To exploit temporal redundancy in a video, only blocks that change are updated (up to the maximum GOP size). This is known as conditional replenishment. However, this is not very effective by itself. Movement of objects and/or the camera may result in large portions of the frame needing to be updated, even though only the position of the previously encoded objects has changed. Through motion estimation, the encoder can compensate for this movement and remove a large amount of redundant information.

The encoder compares the current frame with adjacent parts of the video from the anchor frame (previous I- or P-frame) in a diamond pattern, up to an (encoder-specific) predefined radius limit from the area of the current macroblock. If a match is found, only the direction and distance (i.e. the vector of the motion) from the previous video area to the current macroblock need to be encoded into the inter-frame (P- or B-frame). The reverse of this process, performed by the decoder to reconstruct the picture, is called motion compensation.
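Block matching can be sketched with a sum-of-absolute-differences (SAD) cost. Note the swap: instead of the diamond pattern described above, this hypothetical helper uses an exhaustive search over the whole radius, which is simpler and finds the same best integer-pel vector at higher cost. Frames are lists of rows of luma samples.

```python
import random

def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the current block at (bx, by) and the reference block moved by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size) for x in range(size)
    )

def full_search(cur, ref, bx, by, size=16, radius=7):
    """Exhaustive integer-pel search; returns ((dx, dy), sad) for the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate would read outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
# Build a current frame whose content matches ref shifted by (dx, dy) = (3, -2).
cur = [[ref[(y - 2) % 32][(x + 3) % 32] for x in range(32)] for y in range(32)]
print(full_search(cur, ref, 8, 8, size=8, radius=4))  # ((3, -2), 0)
```

A diamond search visits far fewer candidates by stepping toward lower cost, at the risk of stopping in a local minimum.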

A predicted macroblock rarely matches the current picture perfectly, however. The differences between the estimated matching area, and the real frame/macroblock is called the prediction error. The larger the error, the more data must be additionally encoded in the frame. For efficient video compression, it is very important that the encoder is capable of effectively and precisely performing motion estimation.

Motion vectors record the distance between two areas on screen based on the number of pixels (called pels). MPEG-1 video uses a motion vector (MV) precision of one half of one pixel, or half-pel. The finer the precision of the MVs, the more accurate the match is likely to be, and the more efficient the compression. There are trade-offs to higher precision, however. Finer MVs result in larger data size, as larger numbers must be stored in the frame for every single MV, increased coding complexity as increasing levels of interpolation on the macroblock are required for both the encoder and decoder, and diminishing returns (minimal gains) with higher precision MVs. Half-pel was chosen as the ideal trade-off. (See: qpel)
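Half-pel positions do not exist in the stored picture, so they are interpolated from neighboring pixels. The sketch below samples a reference frame at coordinates given in half-pixel units using bilinear averaging with round-half-up integer arithmetic; treat the exact rounding rule as an assumption rather than a quotation of the standard.

```python
def half_pel_sample(ref, x2, y2):
    """Sample `ref` at (x2 / 2, y2 / 2), where x2 and y2 are in half-pixel units."""
    x, fx = divmod(x2, 2)
    y, fy = divmod(y2, 2)
    if fx and fy:   # centre of four pixels
        return (ref[y][x] + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4
    if fx:          # between two horizontal neighbours
        return (ref[y][x] + ref[y][x + 1] + 1) // 2
    if fy:          # between two vertical neighbours
        return (ref[y][x] + ref[y + 1][x] + 1) // 2
    return ref[y][x]  # full-pel position: no interpolation needed

ref = [[10, 20],
       [30, 41]]
print([half_pel_sample(ref, x2, y2) for x2, y2 in [(0, 0), (1, 0), (0, 1), (1, 1)]])
# [10, 15, 20, 25]
```

Every half-pel prediction needs these extra additions per pixel, which is the interpolation cost mentioned above; quarter-pel would require further interpolation stages.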

Because neighboring macroblocks are likely to have very similar motion vectors, this redundant information can be compressed quite effectively by being stored DPCM-encoded. Only the (smaller) amount of difference between the MVs for each macroblock needs to be stored in the final bitstream.
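DPCM simply stores each value as its difference from the previous one. This simplified sketch starts the predictor at (0, 0) and never resets it; the real bitstream has its own predictor reset rules, which are not modeled here.

```python
def dpcm_encode(vectors):
    """Replace each (dx, dy) motion vector with its difference from the previous one."""
    previous = (0, 0)
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - previous[0], dy - previous[1]))
        previous = (dx, dy)
    return diffs

def dpcm_decode(diffs):
    """Rebuild the motion vectors by accumulating the differences."""
    x = y = 0
    vectors = []
    for ddx, ddy in diffs:
        x, y = x + ddx, y + ddy
        vectors.append((x, y))
    return vectors

mvs = [(4, 2), (4, 2), (5, 2), (5, 3)]
print(dpcm_encode(mvs))  # [(4, 2), (0, 0), (1, 0), (0, 1)]
```

The differences cluster around zero, so a variable-length code can represent them with fewer bits than the raw vectors. The same idea applies to the DC coefficients of neighboring blocks.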

P-frames have 1 motion vector per macroblock, relative to the previous anchor frame. B-frames, however, can use 2 motion vectors; one from the previous anchor frame, and one from the future anchor frame. [27]

Partial macroblocks, and black borders/bars encoded into the video that do not fall exactly on a macroblock boundary, cause havoc with motion prediction. The block padding/border information prevents the macroblock from closely matching with any other area of the video, and so significantly larger prediction error information must be encoded for every one of the several dozen partial macroblocks along the screen border. DCT encoding and quantization (see below) are also much less effective when there is large/sharp picture contrast in a block.

An even more serious problem exists with macroblocks that contain significant, random, edge noise, where the picture transitions to (typically) black. All the above problems also apply to edge noise. In addition, the added randomness is simply impossible to compress significantly. All of these effects will lower the quality (or increase the bitrate) of the video substantially.

DCT

Each 8x8 block is encoded using the Forward Discrete Cosine Transform (FDCT).[28] This process by itself is theoretically lossless, and is reversed by the Inverse DCT (IDCT) upon playback to produce the original values. In reality, there are some (sometimes large) rounding errors. The minimum allowed accuracy of an IDCT implementation is defined by IEEE-1180.

The FDCT process converts the 64 uncompressed pixel values (brightness) into 64 different frequency values: one (large) DC coefficient, which is proportional to the average of the entire 8x8 block, and 63 smaller AC coefficients, which can be positive or negative and describe how the block varies at progressively higher spatial frequencies.
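A direct (slow but readable) 2-D DCT makes the DC/AC split concrete. This sketch uses the orthonormal DCT-II scaling, under which the DC coefficient equals 8 × the block mean; production codecs use fast factorizations instead. The negative DC value in the example that follows implies the samples were centred around zero first (for instance by subtracting 128); this sketch does not apply such a shift.

```python
import math

N = 8

def _c(k: int) -> float:
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block; result[u][v] has vertical frequency u, horizontal v."""
    return [[_c(u) * _c(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                for y in range(N) for x in range(N))
             for v in range(N)]
            for u in range(N)]

def idct_8x8(coeffs):
    """Inverse of dct_8x8 (2-D DCT-III with the same scaling)."""
    return [[sum(
                _c(u) * _c(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                for u in range(N) for v in range(N))
             for x in range(N)]
            for y in range(N)]

flat = [[128] * N for _ in range(N)]
print(round(dct_8x8(flat)[0][0]))  # 1024, i.e. 8 x the mean of 128; all AC terms are ~0
```

Rounding the inverse transform's output recovers the original integers here, illustrating the "theoretically lossless" behavior before quantization.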

An example FDCT encoded 8x8 block:

\begin{bmatrix} -415 & -30 & -61 & 27 & 56 & -20 & -2 & 0 \\ 4 & -22 & -61 & 10 & 13 & -7 & -9 & 5 \\ -47 & 7 & 77 & -25 & -29 & 10 & 5 & -6 \\ -49 & 12 & 34 & -15 & -10 & 6 & 2 & 2 \\ 12 & -7 & -13 & -4 & -2 & 2 & -3 & 3 \\ -8 & 3 & 2 & -6 & -2 & 1 & 4 & 2 \\ -1 & 0 & 0 & -2 & -1 & -3 & 4 & -1 \\ 0 & 0 & -1 & -4 & -1 & 0 & 1 & 2\end{bmatrix}

Since the DC coefficient remains mostly consistent from one block to the next, it can be compressed quite effectively with DPCM-encoding. Only the (smaller) amount of difference between each DC value needs to be stored in the final bitstream. Additionally, this DCT frequency conversion is necessary for quantization (see below).

Quantization

Quantization (of digital data) is, essentially, the process of reducing the accuracy of a signal by dividing it by some larger step size (i.e. finding the nearest multiple, and discarding the remainder/modulus).

The frame-level quantizer is a number from 1 to 31 (although encoders will usually omit/disable some of the extreme values) which determines how much information will be removed from a given frame. The frame-level quantizer is either dynamically selected by the encoder to maintain a certain user-specified bitrate, or (much less commonly) directly specified by the user.

Contrary to popular belief, a fixed frame-level quantizer (set by the user) does not deliver a constant level of quality. Instead, it is an arbitrary metric that will provide a somewhat varying level of quality, depending on the contents of each frame. Given two files of identical sizes, the one encoded at an average bitrate should look better than the one encoded with a fixed quantizer (variable bitrate). Constant quantizer encoding can be used, however, to accurately determine the minimum and maximum bitrates possible for encoding a given video.

A quantization matrix is a string of 64 numbers (each from 0 to 255) which tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the matrix corresponds to a certain frequency component of the video image.

An example quantization matrix:

\begin{bmatrix} 16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\ 12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\ 14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\ 14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\ 18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\ 24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\ 49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\ 72 & 92 & 95 & 98 & 112 & 100 & 103 & 99\end{bmatrix}

Quantization is performed by taking each of the 64 frequency values of the DCT block, dividing it by the frame-level quantizer, and then dividing it by its corresponding value in the quantization matrix. Finally, the result is rounded to the nearest integer. This significantly reduces, or completely eliminates, the information in some frequency components of the picture. Typically, high-frequency information is less visually important, so high frequencies are much more strongly quantized (drastically reduced). MPEG-1 actually uses two separate quantization matrices, one for intra-blocks (I-blocks) and one for inter-blocks (P- and B-blocks), so quantization of the different block types can be done independently and more effectively. [2]

This quantization process usually reduces a significant number of the AC coefficients to zero (known as sparse data). These can then be more efficiently compressed by entropy coding (lossless compression) in the next step.

An example quantized DCT block:

\begin{bmatrix} -26 & -3 & -6 & 2 & 2 & -1 & 0 & 0 \\ 0 & -2 & -4 & 1 & 1 & 0 & 0 & 0 \\ -3 & 1 & 5 & -1 & -1 & 0 & 0 & 0 \\ -4 & 1 & 2 & -1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}
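The simplified quantization described above can be sketched in Python. Dividing the example FDCT block by the example quantization matrix (with a frame-level quantizer of 1) and rounding to the nearest integer reproduces the example quantized block, provided exact halves are rounded away from zero. The function name `quantize_block` and that rounding rule are assumptions chosen to match these examples. The exact intra/inter arithmetic in ISO/IEC 11172-2 differs in detail.

```python
# Example FDCT block and quantization matrix, copied from the text.
DCT_BLOCK = [
    [-415, -30, -61,  27,  56, -20, -2,  0],
    [   4, -22, -61,  10,  13,  -7, -9,  5],
    [ -47,   7,  77, -25, -29,  10,  5, -6],
    [ -49,  12,  34, -15, -10,   6,  2,  2],
    [  12,  -7, -13,  -4,  -2,   2, -3,  3],
    [  -8,   3,   2,  -6,  -2,   1,  4,  2],
    [  -1,   0,   0,  -2,  -1,  -3,  4, -1],
    [   0,   0,  -1,  -4,  -1,   0,  1,  2],
]
QUANT_MATRIX = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]


def quantize_block(block, matrix, qscale=1):
    """Divide each coefficient by qscale * matrix entry and round to the nearest
    integer, rounding exact halves away from zero (pure integer arithmetic)."""
    out = []
    for coef_row, weight_row in zip(block, matrix):
        row = []
        for c, w in zip(coef_row, weight_row):
            d = qscale * w
            magnitude = (2 * abs(c) + d) // (2 * d)   # == round(|c| / d), halves up
            row.append(-magnitude if c < 0 else magnitude)
        out.append(row)
    return out


quantized = quantize_block(DCT_BLOCK, QUANT_MATRIX)
for row in quantized:
    print(row)
print("zero coefficients:", sum(row.count(0) for row in quantized))  # 44
```

Raising `qscale` enlarges every divisor, so more coefficients round to zero. This is the mechanism a rate-controlling encoder uses to trade quality for bitrate.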

Quantization eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. It is also the primary source of most MPEG-1 video compression artifacts, such as blockiness, color banding, noise, ringing and discoloration. These artifacts appear when video is encoded with an insufficient bitrate, because the encoder is then forced to use high frame-level quantizers (strong quantization) through much of the video.

Entropy Coding

Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed upon decoding to produce exactly the same (original) values. These lossless data compression steps don't add noise to, or otherwise change, the contents (unlike quantization), so they are sometimes referred to as noiseless coding. [19] Since lossless compression aims to remove as much redundancy as possible, it is known as entropy coding in the field of information theory.

The DC coefficients and motion vectors are DPCM-encoded.

Run-length encoding (RLE) is a very simple method of compressing repetition. A sequential string of characters, no matter how long, can be replaced with a few bytes noting the value that repeats and how many times it repeats. For example, if someone were to say "five nines", you would know they mean the number 99999.
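The general idea can be shown with a short Python sketch built on the standard library's `itertools.groupby`. It is a generic character-level RLE for illustration only (the function names are hypothetical). The coefficient-specific variant used by MPEG-1 is different.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs):
    """Expand (character, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)


print(rle_encode("99999"))       # [('9', 5)], i.e. "five nines"
print(rle_encode("aaabccdddd"))  # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
```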

RLE is particularly effective after quantization, because a significant number of the AC coefficients are now zero (called sparse data) and can be represented with just a couple of bytes. The runs are coded with a special two-dimensional Huffman table that codes the run length together with the run-ending value.
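A Python sketch of how the example quantized block above can be turned into (run, level) pairs. The AC coefficients are read in zigzag order (low to high frequency) and the DC term is skipped, since it is DPCM-coded separately. Each non-zero value is recorded together with the number of zeros that preceded it, and an end-of-block marker replaces the trailing zeros. The classic 8×8 zigzag scan, the helper names and the `"EOB"` placeholder are assumptions made for illustration. Mapping each pair to its actual variable-length code from the standard's tables is not shown.

```python
QUANTIZED = [  # example quantized DCT block from the text
    [-26, -3, -6,  2,  2, -1, 0, 0],
    [  0, -2, -4,  1,  1,  0, 0, 0],
    [ -3,  1,  5, -1, -1,  0, 0, 0],
    [ -4,  1,  2, -1,  0,  0, 0, 0],
    [  1,  0,  0,  0,  0,  0, 0, 0],
    [  0,  0,  0,  0,  0,  0, 0, 0],
    [  0,  0,  0,  0,  0,  0, 0, 0],
    [  0,  0,  0,  0,  0,  0, 0, 0],
]


def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):            # s = row + col identifies an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order


def run_level_pairs(block):
    """AC coefficients in zigzag order as (zero-run, level) pairs, then an EOB marker."""
    coeffs = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for level in coeffs[1:]:              # coeffs[0] is the DC term, coded separately
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")                   # trailing zeros are implied by end-of-block
    return pairs


print(run_level_pairs(QUANTIZED))
# starts [(0, -3), (1, -3), (0, -2), ...] and ends [..., (5, -1), (0, -1), 'EOB']
```

The 63 AC coefficients collapse to 19 pairs, and the 38 trailing zeros cost nothing beyond the end-of-block code.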

Huffman coding is a very popular method of entropy coding, and is used in MPEG-1 video to reduce the data size. It assigns the shortest codes to the values (strings) that occur most frequently and longer codes to rarer ones, which keeps the data as small as possible with this form of compression. [19] In MPEG-1 video the code tables are predefined in the standard, so each value in the data is simply replaced with its (much shorter) code from the appropriate table. The decoder reverses this process by looking each code up in the same table to produce the original data.

This is the final step in the video encoding process, so the result of Huffman coding is known as the MPEG-1 video "bitstream."
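The general Huffman construction can be sketched in Python with the standard `heapq` module. The two least frequent subtrees are repeatedly merged, with a 0 prepended to the codes on one side and a 1 on the other, so frequent symbols end up with short codes. This builds a table from the data itself. MPEG-1 video instead uses fixed code tables defined in the standard, so this is an illustration of the principle, not of the MPEG-1 tables. The function names and sample string are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code {symbol: bit string}; frequent symbols get short codes."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # a single distinct symbol still needs 1 bit
        return {next(iter(freq)): "0"}
    # Heap items: (total frequency, unique tie-breaker, {symbol: code so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)     # the two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_decode(bits, code):
    """Walk the bit string, emitting a symbol whenever a complete code is matched."""
    lookup = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:                # prefix-free, so the first match is correct
            out.append(lookup[current])
            current = ""
    return out


text = "aaaaaaaabbbccd"
code = huffman_code(text)
encoded = "".join(code[s] for s in text)
print(code)   # {'b': '00', 'd': '010', 'c': '011', 'a': '1'}
print(len(encoded), "bits vs", 8 * len(text), "bits as 8-bit characters")  # 23 bits vs 112 ...
```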

Audio

Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC-11172-3.

MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that the human ear can't hear, either because they are in frequencies where the ear has limited sensitivity, or because they are masked by other (typically louder) sounds. [29]

Channel Encoding:

MPEG-1 Audio is divided into 3 layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates than the previous one. [5] The layers are also backwards compatible, so a Layer II decoder can also play Layer I audio, but not Layer III audio. [29]

Layer I

MPEG-1 Layer I is nothing more than a simplified version of Layer II. [6] Layer I uses a smaller 384-sample frame size for very low delay and finer resolution. [12] This is advantageous for applications like teleconferencing and studio editing. It has lower complexity than Layer II, to facilitate real-time encoding on the hardware available circa 1990. [19]
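The practical effect of the 384-sample frame can be checked with simple arithmetic. A frame covers `samples / sample_rate` seconds, and at a constant bitrate it carries `bitrate × duration / 8` bytes on average. The Python sketch below uses hypothetical helper names and ignores padding and header details. It compares Layer I and Layer II frames at 48 kHz.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Playback time covered by one audio frame, in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz


def average_frame_bytes(bitrate_bps, samples_per_frame, sample_rate_hz):
    """Average coded size of one frame in bytes (padding and headers not modeled)."""
    return bitrate_bps * samples_per_frame / (8 * sample_rate_hz)


# Layer I (384 samples) vs. Layer II (1152 samples), both at 48 kHz
print(frame_duration_ms(384, 48_000), frame_duration_ms(1152, 48_000))  # 8.0 24.0
print(average_frame_bytes(384_000, 384, 48_000))    # 384.0 (Layer I at 384 kbit/s)
print(average_frame_bytes(192_000, 1152, 48_000))   # 576.0 (Layer II at 192 kbit/s)
```

A Layer I frame spans a third of the time of a Layer II frame. This is where its lower delay and finer time resolution come from.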

Layer I saw limited adoption in its time, and was most notably used on Philips' defunct Digital Compact Cassette at a bitrate of 384 kbit/s. [1] With the substantial performance improvements in digital processing since its introduction, Layer I quickly became unnecessary and obsolete.

Layer I audio files typically use the extension .mp1 or sometimes .m1a.

Layer II

MPEG-1 Layer II (MP2, often incorrectly called MUSICAM)[29] is a lossy audio format designed to provide high quality at about 192 kbit/s for stereo sound. Decoding MP2 audio is computationally simple relative to MP3, AAC, etc.

History/MUSICAM

MPEG-1 Layer II was derived from the MUSICAM (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing) audio codec, developed by Centre commun d'études de télévision et télécommunications (CCETT), Philips, and Institut für Rundfunktechnik (IRT/CNET)[30] [6] [5] as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.

Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Layer II audio standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons. [29]

Technical Details

Layer II/MP2 is a time-domain encoder. It uses a low-delay 32 sub-band polyphase filter bank for time-frequency mapping, with overlapping ranges (i.e. polyphased) to prevent aliasing. [31] The psychoacoustic model is based on the principles of auditory masking, simultaneous masking effects, and the absolute threshold of hearing (ATH). The size of a Layer II frame is fixed at 1152 samples (coefficients).

Time domain refers to how analysis and quantization are performed: on short, discrete samples/chunks of the audio waveform. This offers low delay, as only a small number of samples are analyzed before encoding. Frequency-domain encoding (like MP3), by contrast, must analyze many times more samples before it can decide how to transform and output encoded audio. Time-domain encoding also offers higher performance on complex, random and transient impulses (such as percussive instruments and applause), avoiding artifacts like pre-echo.

Visualization of the 32 sub-band filter bank used by MPEG-1 Audio, showing the disparity between the equal band-size of MP2 and the varying width of critical bands ("barks").

The 32 sub-band filter bank returns 32 amplitude coefficients, one for each equal-sized frequency band/segment of the audio. Each band is about 700 Hz wide (depending on the audio's sampling frequency). The encoder then utilizes the psychoacoustic model to determine which sub-bands contain less important audio information, and so where quantization will be inaudible, or at least much less noticeable. [19]
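The "about 700 Hz" figure follows from splitting the range from 0 Hz up to the Nyquist frequency (half the sampling rate) into 32 equal bands. The Python sketch below (hypothetical helper name) computes the band edges for the three MPEG-1 sampling rates.

```python
def subband_edges_hz(sample_rate_hz, num_bands=32):
    """(low, high) edges of each equal-width sub-band from 0 Hz to Nyquist."""
    width = (sample_rate_hz / 2) / num_bands
    return [(k * width, (k + 1) * width) for k in range(num_bands)]


for rate in (32_000, 44_100, 48_000):
    edges = subband_edges_hz(rate)
    print(f"{rate} Hz: each band is {edges[0][1]} Hz wide, top band {edges[-1]}")
```

The widths come out at 500, 689.0625 and 750 Hz. Because every band has the same width while the ear's critical bands widen with frequency, the lowest sub-bands each span several critical bands.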

Example FFT analysis on an audio wave sample.

The psychoacoustic model is applied using a 1024-point Fast Fourier Transform (FFT). Of the 1152 samples per frame, the 64 samples at each end of the frame are left out of this analysis; they are presumably not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the masking threshold, and how much quantization noise each can contain without being perceived. Any sounds below the absolute threshold of hearing (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly. [29] [31]

Typically, sub-bands are less important if they contain quieter sounds (smaller coefficient) than a neighboring (i.e. similar frequency) sub-band with louder sounds (larger coefficient). Also, "noise" components typically have a more significant masking effect than "tonal" components. [30]
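One common way to "assign the available bits accordingly" is a greedy loop. The encoder repeatedly gives one more bit to the sub-band whose quantization noise is furthest above its masking threshold, using the rule of thumb that each extra bit improves a uniform quantizer's signal-to-noise ratio by about 6.02 dB. The Python sketch below is a simplified illustration under those assumptions. The signal-to-mask ratios (SMR) are hypothetical inputs, and each unit of `bit_budget` stands for one extra bit per sample in a band. Real Layer II encoders count actual bitstream bits and choose among the quantizer classes defined in the standard.

```python
def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Greedy allocation: give one more bit to the sub-band whose quantization
    noise is furthest above its masking threshold, until noise is masked."""
    alloc = [0] * len(smr_db)
    for _ in range(bit_budget):
        # Each extra bit improves a uniform quantizer's SNR by ~6.02 dB,
        # so noise-to-mask ratio (NMR) ~= SMR - 6.02 * bits.
        nmr = [smr - 6.02 * bits if bits < max_bits else float("-inf")
               for smr, bits in zip(smr_db, alloc)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # all remaining noise is already masked; stop spending bits
        alloc[worst] += 1
    return alloc


# Hypothetical signal-to-mask ratios (dB) for four sub-bands
print(allocate_bits([20.0, 5.0, -3.0, 12.0], bit_budget=10))  # [4, 1, 0, 2]
```

The band whose signal is already below its masking threshold (negative SMR) receives no bits at all. The loop also stops early once every band's noise is masked, leaving budget unspent.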

Less significant sub-bands are reduced in accuracy by quantization. This basically involves compressing the amplitude range of the coefficients, i.e. raising the noise floor. The encoder then computes an amplification factor (scale factor) that the decoder uses to re-expand each sub-band to its proper range. [32] [33]
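A simplified Python sketch of the scale-factor idea. The samples of one sub-band are divided by their peak (the scale factor), uniformly quantized to a small number of levels, and multiplied back by the scale factor on decoding. The function names, the peak-based scale factor and the plain uniform quantizer are assumptions for illustration. Layer II actually picks scale factors and quantizer classes from fixed tables in the standard.

```python
def encode_subband(samples, bits):
    """Normalize a sub-band by its peak (the scale factor), then quantize
    each sample uniformly to 2**bits - 1 levels centered on zero."""
    assert bits >= 2, "need at least 3 levels"
    scale = max(abs(s) for s in samples) or 1.0   # avoid dividing by zero on silence
    half = 2 ** (bits - 1) - 1                     # positive levels available
    codes = [round(s / scale * half) for s in samples]  # Python rounds halves to even
    return scale, codes


def decode_subband(scale, codes, bits):
    """Re-expand the quantized codes to the original range with the scale factor."""
    half = 2 ** (bits - 1) - 1
    return [c / half * scale for c in codes]


scale, codes = encode_subband([0.5, -0.25, 0.1, -0.5], bits=3)
print(scale, codes)                        # 0.5 [3, -2, 1, -3]
print(decode_subband(scale, codes, bits=3))
```

With fewer bits the steps between levels grow, so the reconstruction error (the raised noise floor) grows with them. It never exceeds half a step, i.e. `scale / (2 * half)`.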

Layer II can also optionally use intensity stereo coding, a form of joint stereo. This means that the frequencies above 6 kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound. [30] [19] This perceptual trick is known as stereo irrelevancy. This can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but is generally not used with higher bitrates as it does not provide very high quality (transparent) audio. [19] [31] [34]
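The intensity-stereo trick can be illustrated with a small Python sketch. Both channels of one high-frequency sub-band share a single down-mixed, peak-normalized waveform, and only a level per channel is kept. The peak-based levels, the plain (L+R)/2 down-mix and the function names are simplifying assumptions, not the exact Layer II bitstream procedure.

```python
def intensity_encode(left, right):
    """Keep one shared, peak-normalized waveform plus one level per channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]   # down-mix
    peak = max(abs(m) for m in mono) or 1.0
    shape = [m / peak for m in mono]                     # shared samples
    return shape, max(abs(l) for l in left), max(abs(r) for r in right)


def intensity_decode(shape, level_left, level_right):
    """Play the shared waveform on both channels, each at its own level."""
    return [s * level_left for s in shape], [s * level_right for s in shape]


# Same waveform, right channel about 6 dB quieter: reconstructed almost exactly.
shape, lvl_l, lvl_r = intensity_encode([0.8, -0.4, 0.2], [0.4, -0.2, 0.1])
print(intensity_decode(shape, lvl_l, lvl_r))

# Different waveforms per channel: only the levels survive (information is lost).
shape, lvl_l, lvl_r = intensity_encode([1.0, 0.0], [0.0, 1.0])
print(intensity_decode(shape, lvl_l, lvl_r))  # ([1.0, 1.0], [1.0, 1.0])
```

The second case shows why intensity stereo is not transparent. Any difference in waveform between the channels is discarded, and only their relative loudness is preserved.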

Quality

Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbit/s for 16-bit 44.1 kHz CD audio using the earliest reference implementation (more recent encoders should presumably perform even better). [35] [1] [31] [30] That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual entropy, at just over 1:8. [36] [37] Achieving much higher compression is simply not possible without discarding some perceptible information.
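The compression ratio can be checked with simple arithmetic. Uncompressed CD audio is 44,100 samples/s × 16 bits × 2 channels = 1,411,200 bit/s. The Python sketch below (hypothetical helper names) divides that by a coded bitrate.

```python
def pcm_bitrate(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Uncompressed PCM bitrate in bit/s (defaults: CD audio)."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(coded_bitrate_bps):
    """How many times smaller the coded stream is than CD audio."""
    return pcm_bitrate() / coded_bitrate_bps


print(pcm_bitrate())                # 1411200
print(compression_ratio(256_000))   # 5.5125
print(compression_ratio(192_000))   # 7.35
```

At 256 kbit/s the exact ratio is about 5.5:1, which the text rounds to roughly 1:6.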

Despite some 20 years of progress in the field of digital audio coding, MP2 remains the preeminent lossy audio coding standard due to its especially high audio coding performance on highly critical audio material. Examples include castanets, symphonic orchestras, male and female voices, and particularly complex and high-energy transients (impulses) like percussive sounds (triangle, glockenspiel) and audience applause. This is quite the opposite of MP3. [12] More recent testing has shown that MPEG Multichannel (based on MP2) rates just slightly lower than much more recent audio codecs, such as Dolby Digital (AC-3) and Advanced Audio Coding (AAC), despite being compromised by an inferior matrixed mode (for the sake of backwards compatibility)[1] [31]. The differences are mostly within the margin of error, and MPEG Multichannel is substantially superior in some cases, such as audience applause. [38] [39] This is one reason that MP2 audio continues to be used extensively. The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate. [40] The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from this test.

Layer II audio files typically use the extension .mp2 or sometimes .m2a.

Layer III/MP3

MPEG-1 Layer III (MP3) is a lossy audio format designed to provide acceptable quality at about 64 kbit/s for monaural audio over single-channel (BRI) ISDN links, and 128 kbit/s for stereo sound.

History/ASPEC

Layer III/MP3 was derived from the Adaptive Spectral Perceptual Entropy Coding (ASPEC) codec developed by Fraunhofer as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. ASPEC was adapted to fit in with the Layer II/MUSICAM model (frame size, filter bank, FFT, etc.) to become Layer III. [6]

ASPEC was itself based on Multiple adaptive Spectral audio Coding (MSC) by E. F. Schroeder, Optimum Coding in the Frequency domain (OCF), the doctoral thesis by Karlheinz Brandenburg at the University of Erlangen-Nuremberg, Perceptual Transform Coding (PXFM) by J. D. Johnston at AT&T Bell Labs, and Transform coding of audio signals by Y. Mahieux and J. Petit at Institut für Rundfunktechnik (IRT/CNET). [41]

Technical Details

MP3 is a frequency-domain audio transform encoder. Even though it utilizes some of the lower layer functions, MP3 is quite different from Layer II/MP2.

MP3 works on 1152 samples like Layer II, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining 1152-sample output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place. [31]

MP3 does not benefit from the 32 sub-band polyphase filter bank. Instead, it applies an 18-point MDCT to each sub-band output, splitting the data into 576 (32 × 18) frequency components that are processed in the frequency domain. [30] This extra granularity allows MP3 to have a much finer psychoacoustic model, and to apply appropriate quantization to each band more carefully, providing much better low-bitrate performance.

Frequency-domain processing imposes some limitations as well, causing 12 or 36 times worse temporal resolution than Layer II. This causes quantization artifacts, because transient sounds like percussive events and other high-frequency events spread over a larger window. The result is audible smearing and pre-echo. [31] MP3 uses pre-echo detection routines and VBR encoding, which allows it to temporarily increase the bitrate during difficult passages, in an attempt to reduce this effect. It can also switch from the normal 36-sample quantization window to three short 12-sample windows, to reduce the temporal (time) length of quantization artifacts. [31] Yet in choosing a fairly small window size to make MP3's temporal response adequate enough to avoid the most serious artifacts, MP3 becomes much less efficient in frequency-domain compression of stationary, tonal components.

Being forced to use a hybrid time-domain (filter bank)/frequency-domain (MDCT) model to fit in with Layer II simply wastes processing time and compromises quality by introducing aliasing artifacts. MP3 has an aliasing cancellation stage specifically to mask this problem, but it instead produces frequency-domain energy which must be encoded in the audio. This energy is pushed to the top of the frequency range, where most people have limited hearing, in the hope that the distortion it causes will be less audible.

Layer II's 1024 point FFT doesn't entirely cover all samples, and would omit several entire MP3 sub-bands, where quantization factors must be determined. MP3 instead uses two passes of FFT analysis for spectral estimation, to calculate the global and individual masking thresholds. This allows it to cover all 1152 samples. Of the two, it utilizes the global masking threshold level from the more critical pass, with the most difficult audio.

In addition to Layer II's intensity encoded joint stereo, MP3 can use middle/side (mid/side, m/s, MS, matrixed) joint stereo. With mid/side stereo, certain frequency ranges of both channels are merged into a single (middle, mid, L+R) mono channel, while the sound difference between the left and right channels is stored as a separate (side, L-R) channel. Unlike intensity stereo, this process does not discard any audio information. When combined with quantization, however, it can exaggerate artifacts.

If the difference between the left and right channels is small, the side channel will be small, which will offer as much as a 50% bitrate savings, and associated quality improvement. If the difference between left and right is large, standard (discrete, left/right) stereo encoding may be preferred, as mid/side joint stereo will not provide any benefits. An MP3 encoder can switch between m/s stereo and full stereo on a frame-by-frame basis. [30] [42] [34]
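A small Python sketch of mid/side matrixing and its exact inverse. When the two channels are nearly identical, the side channel carries very little energy, which is where the bitrate savings come from. The 1/√2 normalization is assumed here, following the MS stereo matrix of the MPEG-1 Audio specification. The function names and sample values are hypothetical.

```python
import math

K = 1 / math.sqrt(2)   # assumed normalization; keeps the transform energy-preserving


def to_mid_side(left, right):
    """Matrix L/R into mid (L+R) and side (L-R) channels."""
    mid = [(l + r) * K for l, r in zip(left, right)]
    side = [(l - r) * K for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert the matrix: L = (M+S)/sqrt(2), R = (M-S)/sqrt(2)."""
    left = [(m + s) * K for m, s in zip(mid, side)]
    right = [(m - s) * K for m, s in zip(mid, side)]
    return left, right


left = [1.0, 0.5, -0.25, 0.8]
right = [0.98, 0.52, -0.24, 0.79]       # nearly identical channels
mid, side = to_mid_side(left, right)
print([round(s, 4) for s in side])      # side channel is tiny, so cheap to code
```

The transform itself is lossless apart from floating-point rounding. Any loss comes only from quantizing `mid` and `side` afterwards, which is why mid/side can exaggerate artifacts.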

Unlike Layers I/II, MP3 uses variable-length Huffman coding (after perceptual coding) to further reduce the bitrate, without any further quality loss. [29] [31]

Quality

These technical limitations inherently prevent MP3 from providing critically transparent quality at any bitrate. This makes Layer II sound quality actually superior to MP3 audio, when it is used at a high enough bitrate to avoid noticeable artifacts. The term "transparent" often gets misused, however. The quality of MP3 (and other codecs) is sometimes called "transparent," even at impossibly low bitrates, when what is really meant is "good quality on average/non-critical material," or perhaps "exhibiting only non-annoying artifacts."

MP3's more fine-grained and selective quantization does prove notably superior to Layer II/MP2 at lower bitrates, however. It is able to provide nearly equivalent audio quality to Layer II at an (approximately) 15% lower bitrate. [39] [40] 128 kbit/s is considered the "sweet spot" for MP3, meaning it provides generally acceptable quality stereo sound on most music, and there are diminishing quality improvements from increasing the bitrate further. MP3 is also regarded as exhibiting less annoying artifacts than Layer II when both are used at bitrates that are too low to possibly provide faithful reproduction.

Layer III audio files use the extension .mp3.


MPEG-2 Audio Extensions

The MPEG-2 standard includes several extensions to MPEG-1 Audio. [31] MPEG-2 Audio is defined in ISO/IEC-13818-3.

The sampling rates added by MPEG-2 are exactly half those originally defined for MPEG-1 Audio. They were introduced to maintain higher-quality sound when encoding audio at lower bitrates. [11] The additional, even lower bitrates were introduced because tests showed that MPEG-1 Audio could provide higher quality than any existing (circa 1994) very low bitrate (i.e. speech) audio codecs. [43]

Conformance Testing

Part 4 of the MPEG-1 standard covers conformance testing, and is defined in ISO/IEC-11172-4.

Conformance: Procedures for testing conformance.

Provides two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as the bitstreams produced by an encoder. [5] [9]

Reference Software

Part 5 of the MPEG-1 standard includes reference software, and is defined in ISO/IEC-11172-5.

Simulation: Reference software.

C language reference code for encoding and decoding of audio and video, as well as multiplexing and demultiplexing. [5] [9]

This includes the ISO Dist10 audio encoder code, which LAME and TooLAME were based upon.

See also

Implementations

References

  1. ^ a b c d e f Adler, Mark; Popp, Harald & Hjerde, Morten (November 09, 1996), MPEG-FAQ: multimedia compression [1/9], faqs.org, <http://www.faqs.org/faqs/mpeg-faq/part1/>. Retrieved on 9 April 2008 
  2. ^ a b c d e f g h Le Gall, Didier (April, 1991), MPEG: a video compression standard for multimedia applications, Communications of the ACM, <http://www.cis.temple.edu/~vasilis/Courses/CIS750/Papers/mpeg_6.pdf>. Retrieved on 9 April 2008 
  3. ^ Chiariglione, Leonardo (October 21, 1989), Kurihama 89 press release, ISO/IEC, <http://www.chiariglione.org/mpeg/meetings/kurihama89/kurihama_press.htm>. Retrieved on 9 April 2008 
  4. ^ a b Chiariglione, Leonardo (March, 2001), Open source in MPEG, Linux Journal, <http://www.chiariglione.org/leonardo/publications/linux/linux00.htm>. Retrieved on 9 April 2008 
  5. ^ a b c d e f g h i j k l Fogg, Chad (April 2, 1996), MPEG-2 FAQ, University of California, Berkeley, <http://bmrc.berkeley.edu/research/mpeg/faq/mpeg2-v38/faq_v38.html>. Retrieved on 9 April 2008 
  6. ^ a b c d Chiariglione, Leonardo; Le Gall, Didier; Musmann, Hans-Georg & Simon, Allen (September, 1990), Press Release - Status report of ISO MPEG, ISO/IEC, <http://www.chiariglione.org/mpeg/meetings/santa_clara90/santa_clara_press.htm>. Retrieved on 9 April 2008 
  7. ^ Meetings, ISO/IEC, <http://www.chiariglione.org/mpeg/meetings.htm>. Retrieved on 9 April 2008 
  8. ^ International Organisation For Standardisation Organisation Internationale De Normalisation Iso
  9. ^ a b c Achievements, ISO/IEC, <http://www.chiariglione.org/mpeg/achievements.htm>. Retrieved on 3 April 2008 
  10. ^ Chiariglione, Leonardo (November 06, 1992), MPEG Press Release, London, 6 November 1992, ISO/IEC, <http://www.chiariglione.org/mpeg/meetings/london/london_press.htm>. Retrieved on 9 April 2008 
  11. ^ a b c Wallace, Greg (April 02, 1993), Press Release, ISO/IEC, <http://www.chiariglione.org/mpeg/meetings/sydney93/sydney_press.htm>. Retrieved on 9 April 2008 
  12. ^ a b c d Popp, Harald & Hjerde, Morten (November 09, 1996), MPEG-FAQ: multimedia compression [2/9], faqs.org, <http://www.faqs.org/faqs/mpeg-faq/part2/>. Retrieved on 10 April 2008 
  13. ^ International Organisation For Standardisation Organisation Internationale De Normalisation Iso
  14. ^ Ozer, Jan (October 12, 2001), Choosing the Optimal Video Resolution: The MPEG-2 Player Market, extremetech.com, <http://www.extremetech.com/article2/0,1697,1153916,00.asp>. Retrieved on 9 April 2008 
  15. ^ Comparison between MPEG 1 & 2, snazzizone.com, <http://www.snazzizone.com/TP09.html>. Retrieved on 9 April 2008 
  16. ^ MPEG 1 And 2 Compared, Pure Motion Ltd., 2003, <http://213.130.34.82/resources/technical/mpegcompared/index.htm>. Retrieved on 9 April 2008 
  17. ^ [homework] summary of the video (and audio) codec discussion from Dave Singer on 2007-11-09 (public-html@w3.org from November 2007)
  18. ^ MPEG-1 Video Coding (H.261)
  19. ^ a b c d e f g Grill, B. & Quackenbush, S. (October, 2005), MPEG-1 Audio, ISO/IEC, <http://www.chiariglione.org/mpeg/technologies/mp01-aud/index.htm>. Retrieved on 3 April 2008 
  20. ^ Chiariglione, Leonardo, MPEG-1 Systems, ISO/IEC, <http://www.chiariglione.org/mpeg/faq/mp1-sys/mp1-sys.htm>. Retrieved on 9 April 2008 
  21. ^ a b Pack Header, <http://dvd.sourceforge.net/dvdinfo/packhdr.html>. Retrieved on 7 April 2008 
  22. ^ Fimoff, Mark & Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, <http://www.bretl.com/mpeghtml/STC.HTM>. Retrieved on 9 April 2008
  23. ^ Fimoff, Mark & Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, <http://www.bretl.com/mpeghtml/PTS.HTM>. Retrieved on 9 April 2008
  24. ^ Fimoff, Mark & Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, <http://www.bretl.com/mpeghtml/DTS.HTM>. Retrieved on 9 April 2008
  25. ^ Fimoff, Mark & Bretl, Wayne E. (December 1, 1999), MPEG2 Tutorial, <http://www.bretl.com/mpeghtml/VBV.HTM>. Retrieved on 9 April 2008
  26. ^ Acharya, Soam & Smith, Brian (1998), Compressed Domain Transcoding of MPEG, Cornell University, IEEE Computer Society, ICMCS, pp. 3, <http://citeseer.ist.psu.edu/acharya98compressed.html>. Retrieved on 9 April 2008 (Read with care: the paper states that the quantization matrices differ, but those are only the defaults, and other matrices can be selected.)
  27. ^ a b c Wee, Susie J.; Vasudev, Bhaskaran & Liu, Sam (March 13, 1997), Transcoding MPEG Video Streams in the Compressed Domain, HP, <http://web.archive.org/web/20070817191927/http://www.hpl.hp.com/personal/Susie_Wee/PAPERS/hpidc97/hpidc97.html>. Retrieved on 1 April 2008
  28. ^ That is, centered around 0 by subtracting half the number of possible values (i.e. 128 for 8-bit samples) from each value.
  29. ^ a b c d e f Thom, D. & Purnhagen, H. (October, 1998), MPEG Audio FAQ Version 9, ISO/IEC, <http://www.chiariglione.org/MPEG/faq/mp1-aud/mp1-aud.htm>. Retrieved on 9 April 2008 
  30. ^ a b c d e f Church, Steve, Perceptual Coding and MPEG Compression, NAB Engineering Handbook, Telos Systems, <http://www.telos-systems.com/techtalk/mpeg/default.htm>. Retrieved on 9 April 2008
  31. ^ a b c d e f g h i j Pan, Davis (Summer, 1995), A Tutorial on MPEG/Audio Compression, IEEE Multimedia Journal, pp. 8, <http://www.cs.columbia.edu/~coms6181/slides/6R/mpegaud.pdf>. Retrieved on 9 April 2008 
  32. ^ Smith, Brian (1996), A Survey of Compressed Domain Processing Techniques, Cornell University, pp. 7, <http://citeseer.ist.psu.edu/257196.html>. Retrieved on 9 April 2008 
  33. ^ Cheng, Mike, Psychoacoustic Models in TooLAME/TwoLAME, twolame. org, <http://www.twolame.org/doc/psycho.html>. Retrieved on 9 April 2008 
  34. ^ a b Herre, Jurgen (October 5, 2004), From Joint Stereo to Spatial Audio Coding, Conference on Digital Audio Effects, pp. 2, <http://dafx04.na.infn.it/WebProc/Proc/P_157.pdf>. Retrieved on 17 April 2008
  35. ^ C. Grewin and T. Ryden, Subjective Assessments on Low Bit-rate Audio Codecs, Proceedings of the 10th International AES Conference, pp. 91-102, London 1991
  36. ^ J. Johnston, Estimation of Perceptual Entropy Using Noise Masking Criteria, in Proc. ICASSP-88, pp. 2524-2527, May 1988.
  37. ^ J. Johnston, Transform Coding of Audio Signals Using Perceptual Noise Criteria, IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314-323, Feb. 1988.
  38. ^ Wustenhagen et al, Subjective Listening Test of Multi-channel Audio Codecs, AES 105th Convention Paper 4813, San Francisco 1998
  39. ^ a b B/MAE Project Group (September, 2007), EBU evaluations of multichannel audio codecs, European Broadcasting Union, <http://www.ebu.ch/CMSimages/en/tec_doc_t3324-2007_tcm6-53801.pdf>. Retrieved on 9 April 2008
  40. ^ a b Meares, David; Watanabe, Kaoru & Scheirer, Eric (February, 1998), Report on the MPEG-2 AAC Stereo Verification Tests, ISO/IEC, pp. 18, <http://sound.media.mit.edu/mpeg4/audio/public/w2006.pdf>. Retrieved on 16 April 2008
  41. ^ Painter, Ted & Spanias, Andreas (April, 2000), Perceptual Coding of Digital Audio, Proceedings of the IEEE, vol. 88, no. 4, <http://www.ee.columbia.edu/~marios/courses/e6820y02/project/papers/Perceptual%20coding%20of%20digital%20audio%20.pdf>. Retrieved on 1 April 2008
  42. ^ Amorim, Roberto (September 19, 2006), GPSYCHO - Mid/Side Stereo, LAME, <http://lame.sourceforge.net/ms_stereo.php>. Retrieved on 17 April 2008
  43. ^ Chiariglione, Leonardo (November 11, 1994), Press Release, ISO/IEC, <http://www.chiariglione.org/mpeg/meetings/singapore94/singapore_press.htm>. Retrieved on 9 April 2008

External links


© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org