Speech Segmentation - Phonetic Segmentation

Phonetic Segmentation

The lowest level of speech segmentation is the division and classification of the sound signal into a string of phones. The difficulty of this problem is compounded by the phenomenon of co-articulation of speech sounds, where one sound may be modified in various ways by the adjacent sounds: it may blend smoothly with them, fuse with them, split, or even disappear. This phenomenon may happen between adjacent words just as easily as within a single word.

The notion that speech is produced like writing, as a sequence of distinct vowels and consonants, is a relic of our alphabetic heritage. In fact, the way we produce vowels depends on the surrounding consonants, and the way we produce consonants depends on the surrounding vowels. For example, when we say 'kit', the [k] is articulated farther forward than when we say 'caught'. Likewise, the vowel in 'kick' is phonetically different from the vowel in 'kit', though we normally do not hear the difference. In addition, language-specific changes occur in casual speech that make it quite different from spelling. For example, in English, the phrase 'hit you' could often be more appropriately spelled 'hitcha'. Therefore, even with the best algorithms, the result of phonetic segmentation will usually be very distant from the standard written language. For this reason, the lexical and syntactic parsing of spoken text normally requires specialized algorithms, distinct from those used for parsing written text.

Statistical models can be used to segment and align recorded speech to words or phones. Applications include automatic lip-synch timing for cartoon animation, follow-the-bouncing-ball video sub-titling, and linguistic research. Automatic segmentation and alignment software is commercially available.
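As a rough illustration of how such alignment can work, the sketch below assumes that some acoustic model has already scored every audio frame against every phone of a known transcript. It then uses dynamic programming (a Viterbi-style search) to find the best left-to-right assignment of frames to phones. The function names and the toy scores are invented for this example; real aligners typically use hidden Markov models or neural acoustic models and are considerably more elaborate.

```python
import math


def force_align(frame_scores, n_phones):
    """Assign each frame to one phone of a known phone sequence.

    frame_scores[t][p] is an assumed log-likelihood that frame t belongs
    to the p-th phone of the transcript. The path must start at phone 0,
    end at the last phone, and at each frame either stay on the current
    phone or advance to the next one.
    """
    n_frames = len(frame_scores)
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")

    neg_inf = -math.inf
    best = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    best[0][0] = frame_scores[0][0]

    for t in range(1, n_frames):
        for p in range(n_phones):
            stay = best[t - 1][p]
            advance = best[t - 1][p - 1] if p > 0 else neg_inf
            if advance > stay:
                best[t][p] = advance + frame_scores[t][p]
                back[t][p] = p - 1
            else:
                best[t][p] = stay + frame_scores[t][p]
                back[t][p] = p

    # Trace back from the last frame, which must belong to the last phone.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def segment_starts(path):
    """Return the index of the first frame of each phone segment."""
    return [t for t in range(len(path)) if t == 0 or path[t] != path[t - 1]]


# Toy scores for 6 frames and a 3-phone transcript (made-up numbers).
toy_scores = [
    [-0.1, -3.0, -3.0],
    [-0.2, -2.5, -3.0],
    [-2.0, -0.3, -2.5],
    [-3.0, -0.2, -2.0],
    [-3.0, -0.4, -1.5],
    [-3.0, -2.0, -0.1],
]

alignment = force_align(toy_scores, 3)
print(alignment)                  # phone index for each frame
print(segment_starts(alignment))  # frame index where each phone begins
```

Multiplying each start index by the frame duration (often around 10 ms in practice, though that is an assumption here) gives the time stamps that applications such as lip-synch timing or subtitle highlighting would consume.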
