Acoustic Model - Telephony-based Speech Recognition

Telephony-based Speech Recognition

The limiting factor for telephony-based speech recognition is the rate at which speech can be transmitted over the channel. For example, a standard digital land-line telephone channel (G.711 PCM) carries only 64 kbit/s, because speech is sampled at 8 kHz with 8 bits per sample (8000 samples/s × 8 bits/sample = 64,000 bit/s). Therefore, acoustic models for telephony-based speech recognition should be trained on 8 kHz, 8-bit speech audio files.
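The bit-rate arithmetic above can be checked with a one-line helper. This is a minimal sketch: `pcm_bit_rate` is a hypothetical name, and it applies only to uncompressed PCM, where the bit rate is simply sampling rate × bits per sample × channels. The 16 kHz, 16-bit case is an arbitrary wideband format included only for comparison.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate in bit/s of uncompressed PCM audio (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels


# Narrowband telephone channel: 8000 samples/s * 8 bits/sample
print(pcm_bit_rate(8000, 8))    # 64000
# Assumed wideband format, for comparison only
print(pcm_bit_rate(16000, 16))  # 256000
```

The wideband example needs four times the telephone channel's bit rate, which is why audio recorded at higher quality must be downsampled before it matches what a telephone line actually delivers.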

In Voice over IP, the codec determines the sampling rate and bits per sample used to transmit speech. Codecs with a higher sampling rate or more bits per sample improve sound quality, but they require acoustic models trained on audio data with the same sampling rate and bits per sample.
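One practical consequence is that training audio should be checked against the target channel's format before use. The sketch below does this with Python's standard `wave` module. The names `wav_format` and `matches_channel_format` are hypothetical. The sketch assumes the training data is stored as uncompressed PCM WAV, because `wave` does not read compressed encodings such as μ-law, so such files would need converting first.

```python
import os
import tempfile
import wave


def wav_format(path):
    """Return (sample_rate_hz, bits_per_sample, channels) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getsampwidth() * 8, wf.getnchannels()


def matches_channel_format(path, sample_rate_hz, bits_per_sample):
    """True if the file's sampling rate and sample width match the target channel."""
    rate, bits, _ = wav_format(path)
    return rate == sample_rate_hz and bits == bits_per_sample


if __name__ == "__main__":
    # Write one second of 8 kHz, 8-bit silence (8-bit WAV is unsigned; 0x80 is the midpoint)
    path = os.path.join(tempfile.mkdtemp(), "silence_8k.wav")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(1)  # 1 byte = 8 bits per sample
        wf.setframerate(8000)
        wf.writeframes(b"\x80" * 8000)
    print(matches_channel_format(path, 8000, 8))    # True: matches a telephone channel
    print(matches_channel_format(path, 16000, 16))  # False: does not match a wideband codec
```

A training pipeline could apply this check to every file and either reject mismatches or resample them to the codec's format.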
