Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides extremely accurate estimates of speech parameters.

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and plosive sounds). Although apparently crude, this model is actually a close approximation to the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances. These resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
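The analysis step can be sketched in code. The article does not say how the filter is estimated; the sketch below assumes the autocorrelation method solved with the Levinson-Durbin recursion, which is one common choice. The function names `lpc_autocorrelation` and `inverse_filter`, the test coefficients, and the synthetic input are all illustrative assumptions, not part of any particular coder.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate predictor coefficients a[1..order] by Levinson-Durbin.

    Convention assumed here: x[n] is predicted as sum_k a[k] * x[n-k].
    The frame is assumed to be non-silent (nonzero energy).
    """
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy so far
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] - k * a_prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(frame, a):
    """Residue e[n] = x[n] - sum_k a[k] * x[n-k]; samples before the frame count as zero."""
    taps = np.concatenate(([1.0], -np.asarray(a)))
    return np.convolve(frame, taps)[: len(frame)]

# Synthetic stand-in for speech: white noise shaped by a known two-pole "tube".
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])  # hypothetical coefficients, chosen to be stable
excitation = rng.standard_normal(20000)
signal = np.zeros_like(excitation)
for n in range(len(signal)):
    signal[n] = excitation[n]
    if n >= 1:
        signal[n] += true_a[0] * signal[n - 1]
    if n >= 2:
        signal[n] += true_a[1] * signal[n - 2]

est_a = lpc_autocorrelation(signal, order=2)
residue = inverse_filter(signal, est_a)
print("estimated coefficients:", est_a)
print("residue / signal variance:", np.var(residue) / np.var(signal))
```

On this synthetic input the estimated coefficients should land close to the ones used to generate it, and the residue should carry much less energy than the signal, which is what makes it cheaper to encode.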

The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: it uses the buzz parameters and the residue to create a source signal, uses the formants to create a filter (which represents the tube), and runs the source through the filter, resulting in speech.
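A minimal sketch of the synthesis direction follows, assuming the filter is stored as predictor coefficients `a[k]` with the convention that x[n] is predicted as the sum of `a[k] * x[n-k]`. The coefficient values, the 8 kHz sampling rate, the 100 Hz pitch, and the name `synthesize` are illustrative assumptions. The first part checks that running the residue through the filter gives back the original samples; the second drives the same filter with a plain impulse-train buzz.

```python
import numpy as np

def synthesize(source, a):
    """Run a source signal through the all-pole filter 1 / (1 - sum_k a[k] z^-k)."""
    order = len(a)
    out = np.zeros(len(source))
    for n in range(len(source)):
        acc = source[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                acc += a[k - 1] * out[n - k]
        out[n] = acc
    return out

# Hypothetical predictor coefficients for one frame (a stable two-pole filter).
a = np.array([1.3, -0.6])

# Round trip: inverse-filter a signal to get its residue, then resynthesize it.
rng = np.random.default_rng(1)
original = rng.standard_normal(200)
taps = np.concatenate(([1.0], -a))  # inverse (analysis) filter taps
residue = np.convolve(original, taps)[: len(original)]
rebuilt = synthesize(residue, a)
max_error = np.max(np.abs(rebuilt - original))
print("largest reconstruction error:", max_error)

# Buzz source: one impulse every 80 samples, i.e. 100 Hz at an assumed 8 kHz rate.
buzz = np.zeros(800)
buzz[::80] = 1.0
voiced = synthesize(buzz, a)
```

When the exact residue is used as the source, the output matches the input up to rounding error. Replacing the residue with a parametric buzz trades that fidelity for a much smaller description of the source.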

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally 30 to 50 frames per second give intelligible speech with good compression.
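As a rough illustration of the frame arithmetic, the sketch below cuts a signal into non-overlapping frames at 50 frames per second. The 8 kHz sampling rate, the placeholder signal, and the helper name `split_into_frames` are assumptions made for the example. Practical coders often use overlapping, windowed frames instead, which this sketch does not attempt.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frames_per_second):
    """Cut a 1-D signal into equal, non-overlapping frames, dropping any leftover tail."""
    frame_len = sample_rate // frames_per_second
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

sample_rate = 8000                                # assumed telephone-style rate
signal = np.arange(2 * sample_rate, dtype=float)  # two seconds of placeholder samples
frames = split_into_frames(signal, sample_rate, frames_per_second=50)
print(frames.shape)  # (number of frames, samples per frame)
```

Each row of `frames` would then be analyzed on its own, so one set of filter and source parameters is produced per frame.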

Early history of LPC

According to Robert M. Gray of Stanford University, the first ideas leading to LPC started in 1966 when S. Saito and F. Itakura of NTT described an approach to automatic phoneme discrimination that involved the first maximum likelihood approach to speech coding. In 1967, John Burg outlined the maximum entropy approach. In 1969 Itakura and Saito introduced partial correlation, in May Glen Culler proposed realtime speech encoding, and B. S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971 realtime LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.

In 1972 Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to Voice over IP technology. In 1973, according to an informal Lincoln Laboratory history, the first realtime 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974 the first realtime two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976 the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s. Finally, in 1978, Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.

There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures stability of the predictor, and spectral errors are local for small coefficient deviations.
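To make one of these representations concrete, here is a small sketch of the mapping between reflection coefficients and log area ratios. It assumes the definition LAR = log((1 + k) / (1 - k)); sign and ordering conventions differ between texts, so treat this as one possible convention. The coefficient values, the error size, and the function names are illustrative.

```python
import numpy as np

def reflection_to_lar(k):
    """Map reflection coefficients with |k| < 1 to log area ratios (assumed convention)."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))

def lar_to_reflection(lar):
    """Inverse mapping; the result always lies strictly between -1 and 1."""
    return np.tanh(np.asarray(lar, dtype=float) / 2.0)

k = np.array([0.95, -0.5, 0.1])   # hypothetical reflection coefficients
lar = reflection_to_lar(k)

# Simulate a transmission error applied in the LAR domain.
noisy_lar = lar + 0.05
k_received = lar_to_reflection(noisy_lar)
print("sent:    ", k)
print("received:", k_received)
```

The synthesis filter is stable when every reflection coefficient has magnitude below one. Because the inverse mapping is a hyperbolic tangent, an error added to a log area ratio cannot push the decoded coefficient outside that range, whereas the same error added directly to a coefficient near 1 could.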

Applications

LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders where musical instruments are used as the excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. [1] A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

Waveform ROM in digital sample-based music synthesizers made by Yamaha Corporation is compressed using an LPC algorithm.

LPC predictors of order 0 to 32 are used in the FLAC audio codec.

References

© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org