Microsoft Speech API - Basic Architecture

Basic Architecture

Broadly, the Speech API can be viewed as an interface or piece of middleware that sits between applications and speech engines (recognition and synthesis). In SAPI versions 1 to 4, applications could communicate directly with engines. The API included an abstract interface definition to which both applications and engines conformed. Applications could also use simplified higher-level objects rather than calling methods on the engines directly.

In SAPI 5, however, applications and engines do not communicate directly with each other. Instead, each talks to a runtime component (sapi.dll). This component implements one API for applications to use and a separate set of interfaces for engines.

Typically, SAPI 5 applications issue calls through the API, for example to load a recognition grammar, start recognition, or provide text to be synthesized. The sapi.dll runtime component interprets and processes these commands, calling on the engine through the engine interfaces where necessary. For example, loading a grammar from a file is done in the runtime, but the grammar data is then passed to the recognition engine, which uses it during recognition. The recognition and synthesis engines also generate events while processing, for example to indicate that an utterance has been recognized or to mark word boundaries in the synthesized speech. These events pass in the reverse direction: from the engines, through the runtime DLL, and on to an event sink in the application.
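The call and event flow described above can be sketched as a small model. This is an illustration of the mediation pattern only, not the SAPI API: the class and method names (Runtime, RecognitionEngine, load_grammar, add_event_sink, and so on) are hypothetical, the "grammar" is just a list of phrases, and grammar text is passed in directly instead of being read from a file.

```python
from typing import Callable, List, Tuple

# All names below are hypothetical; none of them are real SAPI identifiers.
EventSink = Callable[[str, str], None]


class RecognitionEngine:
    """Toy engine: it only ever sees data handed to it by the runtime."""

    def __init__(self, emit: EventSink) -> None:
        self._emit = emit              # callback into the runtime, not the app
        self._phrases: List[str] = []

    def set_grammar(self, phrases: List[str]) -> None:
        self._phrases = list(phrases)

    def process(self, utterance: str) -> None:
        # Engines raise events while processing; here, one per utterance.
        kind = "recognition" if utterance in self._phrases else "false_recognition"
        self._emit(kind, utterance)


class Runtime:
    """Plays the role the text assigns to sapi.dll: the sole intermediary."""

    def __init__(self) -> None:
        self._sinks: List[EventSink] = []
        self._engine = RecognitionEngine(self._forward_event)

    def add_event_sink(self, sink: EventSink) -> None:
        self._sinks.append(sink)

    def load_grammar(self, text: str) -> None:
        # Parsing happens in the runtime; only the parsed data reaches the engine.
        phrases = [line.strip().lower() for line in text.splitlines() if line.strip()]
        self._engine.set_grammar(phrases)

    def recognize(self, utterance: str) -> None:
        self._engine.process(utterance.strip().lower())

    def _forward_event(self, kind: str, payload: str) -> None:
        # Events travel engine -> runtime -> application event sinks.
        for sink in self._sinks:
            sink(kind, payload)


def demo() -> List[Tuple[str, str]]:
    events: List[Tuple[str, str]] = []
    runtime = Runtime()
    runtime.add_event_sink(lambda kind, payload: events.append((kind, payload)))
    runtime.load_grammar("open file\nclose file\n")
    runtime.recognize("Open File")
    runtime.recognize("delete everything")
    return events
```

In this sketch the application never holds a reference to the engine: it registers a sink with the runtime, and every call and every event crosses the runtime in one direction or the other.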

In addition to the API definition and runtime DLL, SAPI ships with other components that together make up a complete Speech Software Development Kit (SDK). The following components are among those included in most versions of the Speech SDK:

  • API definition files - in MIDL and as C or C++ header files.
  • Runtime components - e.g. sapi.dll.
  • Control Panel applet - to select and configure the default speech recognizer and synthesizer.
  • Text-To-Speech engines in multiple languages.
  • Speech Recognition engines in multiple languages.
  • Redistributable components to allow developers to package the engines and runtime with their application code to produce a single installable application.
  • Sample application code.
  • Sample engines - implementations of the necessary engine interfaces with no true speech processing, which can serve as a starting point for those porting an engine to SAPI.
  • Documentation.
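To illustrate what a sample engine of the kind listed above might look like, here is a minimal sketch of a stub synthesis engine. It assumes a hypothetical engine interface (SynthesisEngine with a single speak method); the real SAPI engine interfaces are COM interfaces and differ from this. The stub satisfies the interface and reports word boundaries but produces no audio.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple

# Hypothetical interface for illustration; not the real SAPI engine interface.
WordBoundaryCallback = Callable[[int, int], None]   # (character offset, length)


class SynthesisEngine(ABC):
    @abstractmethod
    def speak(self, text: str, on_word_boundary: WordBoundaryCallback) -> bytes:
        """Return audio for `text`, reporting each word boundary as it goes."""


class SampleSynthesisEngine(SynthesisEngine):
    """Stub engine: honours the interface but does no speech processing."""

    def speak(self, text: str, on_word_boundary: WordBoundaryCallback) -> bytes:
        offset = 0
        for word in text.split():
            start = text.index(word, offset)   # locate the word in the original text
            on_word_boundary(start, len(word))
            offset = start + len(word)
        return b""                              # a real engine would return audio here


def word_boundaries(text: str) -> Tuple[List[Tuple[int, int]], bytes]:
    seen: List[Tuple[int, int]] = []
    audio = SampleSynthesisEngine().speak(text, lambda s, n: seen.append((s, n)))
    return seen, audio
```

Someone porting an engine would replace the body of speak with real synthesis while keeping the same interface and event reporting.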
