Speech Recognition - Further Information

Further Information

Popular speech recognition conferences held every year or two include SpeechTEK and SpeechTEK Europe, ICASSP, Interspeech/Eurospeech, and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important journals include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication.

Books like "Fundamentals of Speech Recognition" (1993) by Lawrence Rabiner can be useful for acquiring basic knowledge, but may not be fully up to date. Other good sources are "Statistical Methods for Speech Recognition" by Frederick Jelinek and "Spoken Language Processing" (2001) by Xuedong Huang et al. More up to date is "Computer Speech" by Manfred R. Schroeder, second edition, published in 2004. The updated textbook "Speech and Language Processing" (2008) by Jurafsky and Martin presents the basics and the state of the art for ASR. Speaker recognition uses the same features, most of the same front-end processing, and the same classification techniques as speech recognition. A recent comprehensive textbook, "Fundamentals of Speaker Recognition" by Homayoon Beigi, is an in-depth source for up-to-date details on the theory and practice.

Good insight into the techniques used in the best modern systems can be gained by following government-sponsored evaluations such as those organised by DARPA (as of 2007, the largest ongoing speech recognition-related project was the GALE project, which involves both speech recognition and translation components).

In terms of freely available resources, Carnegie Mellon University's Sphinx toolkit is one place to start, both to learn about speech recognition and to begin experimenting. Another resource (free as in free beer, not as in free speech) is the HTK book and the accompanying HTK toolkit. AT&T's GRM and DCD libraries are also general software libraries for large-vocabulary speech recognition.

For more software resources, see List of speech recognition software.

A useful review of the area of robustness in ASR is provided by Junqua and Haton (1995).
