Acoustic Model - Telephony-based Speech Recognition

Telephony-based Speech Recognition

The limiting factor for telephony based speech recognition is the bandwidth at which speech can be transmitted. For example, a standard land-line telephone only has a bandwidth of 64 kbit/s at a sampling rate of 8 kHz and 8-bits per sample (8000 samples per second * 8-bits per sample = 64000 bit/s). Therefore, for telephony based speech recognition, acoustic models should be trained with 8 kHz/8-bit speech audio files.

In the case of Voice over IP, the codec determines the sampling rate/bits per sample of speech transmission. Codecs with a higher sampling rate/bits per sample for speech transmission (which improve the sound quality) necessitate acoustic models trained with audio data that matches that sampling rate/bits per sample.

Read more about this topic:  Acoustic Model

Famous quotes containing the words speech and/or recognition:

    Money clarifies one’s speech greatly, while no one listens to the poor.
    Chinese proverb.

    Democracy and equality try to deny ... the mystic recognition of difference and innate priority, the joy of obedience and the sacred responsibility of authority.
    —D.H. (David Herbert)