
Speech coding is the application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

The two most important applications of speech coding are mobile telephony and Voice over IP.

The techniques used in speech coding are similar to those in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in narrowband speech coding, only information in the frequency band 400 Hz to 3500 Hz is transmitted, but the reconstructed signal is still adequate for intelligibility.

Speech coding differs from other forms of audio coding in that speech is a much simpler signal than most other audio signals, and that there is a lot more statistical information available about the properties of speech. As a result, some auditory information which is relevant in audio coding can be unnecessary in the speech coding context. In speech coding, the most important criterion is preservation of intelligibility and "pleasantness" of speech, with a constrained amount of transmitted data.

It should be emphasised that the intelligibility of speech includes, besides the actual literal content, also speaker identity, emotions, intonation, timbre etc., all of which are important for perfect intelligibility. The more abstract concept of pleasantness of degraded speech is a different property from intelligibility, since degraded speech can be completely intelligible but subjectively annoying to the listener.

In addition, most speech applications require low coding delay, as long coding delays interfere with speech interaction.

Sample companding viewed as a form of speech coding

From this viewpoint, the A-law and μ-law algorithms used in traditional PCM digital telephony can be seen as a very early precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 bits of resolution. Although this would generate unacceptable distortion in a music signal, the peaky nature of speech waveforms, combined with the simple frequency structure of speech as a periodic waveform with a single fundamental frequency with occasional added noise bursts, makes these very simple instantaneous compression algorithms acceptable for speech.
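
The companding idea can be illustrated with a minimal sketch. It assumes the continuous μ-law formula with μ = 255 followed by a plain uniform 8-bit quantizer; deployed telephony codecs use a segmented, bit-exact approximation of this curve, so the function names and the quantizer below are illustrative only.

```python
import math

MU = 255  # value conventionally used for 8-bit mu-law


def mu_law_compress(x, mu=MU):
    """Map a sample x in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def encode_8bit(x):
    """Compand, then quantize uniformly to an integer code 0..255."""
    y = mu_law_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8bit(code):
    """Map an 8-bit code back to an approximate sample in [-1, 1]."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)


def uniform_8bit_roundtrip(x):
    """Plain 8-bit uniform quantization without companding, for comparison."""
    code = int(round((x + 1.0) / 2.0 * 255))
    return code / 255 * 2.0 - 1.0


if __name__ == "__main__":
    # Quiet samples keep much more relative precision with companding.
    for x in (0.001, 0.01, 0.1, 0.9):
        print(x, decode_8bit(encode_8bit(x)), uniform_8bit_roundtrip(x))
```

Because the logarithmic curve spends most of the 256 codes on small amplitudes, quiet samples are reproduced more accurately than with uniform 8-bit quantization, at the cost of coarser steps near full scale.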

A wide variety of other algorithms were tried at the time, mostly variants on delta modulation, but after careful consideration, the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. At the time of their design, their 33% bandwidth reduction (8 bits per sample instead of 12) at very low complexity made them an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the stationary phone network.

Modern speech compression

Much of the later work in speech compression was motivated by military research into digital communications for secure military radios, where very low data rates were required to allow effective operation in a hostile radio environment. At the same time, far more processing power was available, in the form of VLSI integrated circuits, than was available for earlier compression techniques. As a result, modern speech compression algorithms could use far more complex techniques than were available in the 1960s to achieve far higher compression ratios.

These techniques were available through the open research literature to be used for civilian applications, allowing the creation of digital mobile phone networks with substantially higher channel capacities than the analog systems that preceded them.

The most common speech coding scheme is Code Excited Linear Prediction (CELP) coding, which is used for example in the GSM standard. In CELP, the modelling is divided into two stages: a linear predictive stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model.
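
The two stages can be sketched roughly as follows. This is a simplified illustration, not a CELP implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion for the linear predictive stage, and it matches codebook vectors directly against the residual, whereas practical CELP coders search the codebook through the synthesis filter with perceptual weighting. The test signal and all function names are hypothetical.

```python
import random


def synthetic_frame(n=400, seed=0):
    """Hypothetical test signal: noise shaped by a stable 2-pole filter."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] and the final error energy.

    The prediction is x_hat[n] = sum over k of a[k] * x[n-k].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - x_hat[n], assuming zero history."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k - 1]
                   for k, c in enumerate(coeffs) if n - k - 1 >= 0)
        out.append(x[n] - pred)
    return out


def search_codebook(target, codebook):
    """Return (index, gain) of the best least-squares match to target."""
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for i, c in enumerate(codebook):
        cc = sum(v * v for v in c)
        tc = sum(t * v for t, v in zip(target, c))
        score = tc * tc / cc  # error reduction achieved with optimal gain
        if score > best_score:
            best_index, best_gain, best_score = i, tc / cc, score
    return best_index, best_gain


if __name__ == "__main__":
    frame = synthetic_frame()
    coeffs, _ = levinson_durbin(autocorrelation(frame, 2), 2)
    res = residual(frame, coeffs)
    print("predictor coefficients:", coeffs)
    print("signal energy:", sum(v * v for v in frame))
    print("residual energy:", sum(e * e for e in res))
```

In this sketch a decoder would need only the predictor coefficients, a codebook index and a gain for each block of residual samples, rather than the samples themselves, which is where the compression comes from.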

In addition to the actual speech coding of the signal, it is often necessary to use channel coding for transmission, to avoid losses due to transmission errors. Usually, speech coding and channel coding methods have to be chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding, in order to get the best overall coding results.
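
A toy sketch of protecting the more important bits more strongly is given below. It assumes a 3x repetition code with majority voting as a stand-in for the stronger channel code and leaves the remaining bits unprotected; real systems use far more efficient codes, and the function names and bit layout here are hypothetical.

```python
def protect(bits, n_important, repeat=3):
    """Repeat each of the first n_important bits; send the rest as-is."""
    out = []
    for i, b in enumerate(bits):
        out.extend([b] * (repeat if i < n_important else 1))
    return out


def recover(received, n_important, n_total, repeat=3):
    """Majority-vote the repeated bits; copy the unprotected bits."""
    out = []
    pos = 0
    for i in range(n_total):
        if i < n_important:
            group = received[pos:pos + repeat]
            out.append(1 if sum(group) * 2 > repeat else 0)
            pos += repeat
        else:
            out.append(received[pos])
            pos += 1
    return out


if __name__ == "__main__":
    frame = [1, 0, 1, 1, 0, 0]   # first two bits treated as "important"
    sent = protect(frame, n_important=2)
    sent[1] ^= 1                 # channel error in a protected bit
    sent[8] ^= 1                 # channel error in an unprotected bit
    print(recover(sent, n_important=2, n_total=len(frame)))
```

A single error in a protected group is corrected by the vote, while the same error in the unprotected part passes through to the decoder, which is the trade-off that motivates choosing the speech coder and channel coder together.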

The Speex project is an attempt to create a free software speech coder, unencumbered by patent restrictions.

Major subfields:

See also

Adaptive Multi Rate – WideBand (AMR-WB): a speech coding standard developed after AMR, using the same technology, such as ACELP.
W-CDMA (Wideband Code Division Multiple Access): a type of 3G cellular network.
Variable Multi Rate – WideBand (VMR-WB): a source-controlled variable-rate multimode codec designed for robust encoding/decoding of wideband/narrowband speech.
CDMA2000: a hybrid 2.5G/3G technology of mobile telecommunications standards that use CDMA.
SCIP: the US Government's standard for secure voice and data communication.
Code division multiple access (CDMA): a channel access method utilized by various radio communication technologies.
Full Rate (FR, GSM-FR): the first digital speech coding standard used in the GSM digital mobile phone system.
Half Rate (HR, GSM-HR): a speech encoding system for GSM developed in the early 1990s.
Enhanced Full Rate (EFR, GSM-EFR): a speech coding standard developed to improve on the quite poor quality of GSM Full Rate.
Adaptive Multi-Rate (AMR): an audio data compression scheme optimized for speech coding.
GSM (Global System for Mobile communications, originally Groupe Spécial Mobile): a standard for mobile phones.
Time-compressed speech: processes which reduce the amount of time it takes to listen to and understand a recording.
Audio signal processing: the processing of a representation of auditory signals, or sound.
Digital signal processing (DSP): the representation of signals by a sequence of numbers or symbols, and the processing of these signals.
Psychoacoustics: the study of subjective human perception of sounds.
Speech processing: the study of speech signals and the processing methods of these signals.
Vector quantization: a classical quantization technique from signal processing.
Vocoder: an analysis/synthesis system, mostly used for speech.
© 2009 citizendia.org; parts available under the terms of GNU Free Documentation License, from http://en.wikipedia.org