A prefix code is a code, typically a variable-length code, with the "prefix property": no code word is a prefix of any other code word in the set. In Communications a code is a rule for converting a piece of Information (for example a letter, Word, Phrase, or In Coding theory a variable-length code is a Code which maps source symbols to a variable number of bits In Telecommunication, a code word is an element of a Code. Each code word is a Sequence of symbols assembled in accordance with the specific rules of A subsequence, substring, prefix or suffix of a string is a subset of the symbols in a string where the order of the elements is preserved A code with code words {0, 10, 11} has the prefix property; a code consisting of {0, 1, 10, 11} does not, because "1" is a prefix of both "10" and "11".
Prefix codes are also known as prefix-free codes, prefix condition codes, comma-free codes (although this is incorrect), and instantaneous codes. Although Huffman coding is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. History In 1951 David A Huffman and his MIT information theory classmates were given
Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any out-of-band markers to frame the words in the message. Out-of-band is a technical term with different uses in Communications and Telecommunication. The recipient can decode the message unambiguously, by repeatedly finding and removing prefixes that form valid code words. This is not possible with codes that lack the prefix property, such as our example of {0, 1, 10, 11}: a receiver reading a "1" at the start of a code word would not know whether that was the complete code word "1", or merely the prefix of the code word "10" or "11".
The variable-length Huffman codes, country calling codes, the country and publisher parts of ISBNs, and the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless Standard are prefix codes. History In 1951 David A Huffman and his MIT information theory classmates were given This is a list of country calling codes defined by ITU-T recommendation E W-CDMA ( Wideband Code Division Multiple Access) is a type of 3G Cellular network. Prefix codes are also a form of entropy encoding used in lossless data compression. In Information theory an entropy encoding is a lossless Data compression scheme that is independent of the specific characteristics of the medium Lossless data compression is a class of Data compression Algorithms that allows the exact original data to be reconstructed from the compressed data
Prefix codes are not error-correcting codes. In Mathematics, Computer science, Telecommunication, and Information theory, error detection and correction has great practical importance in In actual practice, a message might first be compressed with a prefix code, and then encoded again (with an error-correcting code) before transmission.
This article is partly derived from Federal Standard 1037C, which uses the term comma-free code. Federal Standard 1037C, entitled Telecommunications Glossary of Telecommunication Terms is a United States Federal Standard issued by the General Services Administration
Contents |
Techniques for constructing a prefix code can be simple, or quite complicated.
If every word in the code has the same length, the code is called a fixed-length code. For example, ISO 8859-15 letters are always 8 bits long. ISO 8859-15 is part 15 of ISO 8859, a standard Character encoding defined by International Organization for Standardization. UTF-32/UCS-4 letters are always 32 bits long. UTF-32 (or UCS-4) is a protocol for encoding Unicode characters that uses exactly 32 Bits for each Unicode Code point. ATM packets are always 424 bits long. In electronic digital data transmission systems the Network protocol Asynchronous Transfer Mode (ATM encodes data traffic into small fixed-sized cells Prefixes cannot exist in a fixed-length code. Unfortunately, fixed length encodings are inefficient in situations where some words are much more likely to be transmitted than others.
Some codes mark the end of a code word with a special "comma" symbol, different from normal data. [1] This is somewhat analogous to the period at the end of a sentence; it marks where one sentence ends and another begins. If every code word ends in a comma, and the comma does not appear elsewhere in a code word, the code is prefix-free. However, modern communication systems send everything as sequences of "1" and "0" – adding a third symbol would be expensive, and using it only at the ends of words would be inefficient. Morse code is an everyday example of a variable-length code with a comma. Morse code is a Character encoding for transmitting telegraphic information using standardized sequences of short and long elements to represent the letters numerals The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins. Similarly, Fibonacci coding uses a "11" to mark the end of every code word. In Mathematics, Fibonacci coding is a universal code which encodes positive integers into binary Code words All tokens end with "11" and have
Huffman coding is a more sophisticated technique for constructing variable-length prefix codes. History In 1951 David A Huffman and his MIT information theory classmates were given The Huffman coding algorithm takes as input the frequencies that the code words should have, and constructs a prefix code that minimizes the weighted average of the code word lengths.
Kraft's inequality characterizes the sets of code word lengths that are possible in a prefix code. In Coding theory, Kraft's inequality gives a necessary and sufficient condition for the existence of a uniquely decodable code for a given set of Codeword
Examples of prefix codes include:
Commonly used techniques for constructing prefix codes include Huffman codes and the earlier Shannon-Fano codes, and universal codes such as: