Non-native Speech Database - Overview of Non-native Databases

Overview of Non-native Databases

Table 1: Abbreviations for languages used in Table 2
Language | Abbr. | Language | Abbr.
Arabic | A | Japanese | J
Chinese | C | Korean | K
Czech | Cze | Malaysian | M
Danish | D | Norwegian | N
Dutch | Dut | Portuguese | P
English | E | Russian | R
French | F | Spanish | S
German | G | Swedish | Swe
Greek | Gre | Thai | T
Indonesian | Ind | Vietnamese | V
Italian | I | |


Table 2 lists the individual databases together with their main properties.

Table 2: Overview of non-native databases
Corpus | Author | Available at | Language(s) | #Speakers | Native language(s) | #Utterances | Duration | Year | Remarks
AMI | | EU | E | | Dut and other | | 100h | | meeting recordings
ATR-Gruhn | Gruhn | ATR | E | 96 | C G F J Ind | 15000 | | 2004 | proficiency rating
BAS Strange Corpus I+II | | ELRA | G | 139 | 50 countries | 7500 | | 1998 |
Berkeley Restaurant | | ICSI | E | 55 | G I H C F S J | 2500 | | 1994 |
Broadcast News | | LDC | E | | | | | 1997 |
Cambridge-Witt | Witt | U. Cambridge | E | 10 | J I K S | 1200 | | 1999 |
Cambridge-Ye | Ye | U. Cambridge | E | 20 | C | 1600 | | 2005 |
Children News | Tomokiyo | CMU | E | 62 | J C | 7500 | | 2000 | partly spontaneous
CLIPS-IMAG | Tan | CLIPS-IMAG | F | 15 | C V | | 6h | 2006 |
CLSU | | LDC | E | | 22 countries | 5000 | | 2007 | telephone, spontaneous
CMU | | CMU | E | 64 | G | 452 | 0.9h | | not available
Cross Towns | Schaden | U. Bochum | E F G I Cze Dut | 161 | E F G I S | 72000 | 133h | 2006 | city names
Duke-Arslan | Arslan | Duke University | E | 93 | 15 countries | 2200 | | 1995 | partly telephone speech
ERJ | Minematsu | U. Tokyo | E | 200 | J | 68000 | | 2002 | proficiency rating
Fischer | | LDC | E | | many | | 200h | | telephone speech
Fitt | Fitt | U. Edinburgh | F I N Gre | 10 | E | 700 | | 1995 | city names
Fraenki | | U. Erlangen | E | 19 | G | 2148 | | |
Hispanic | Byrne | | E | 22 | S | | 20h | 1998 | partly spontaneous
IBM-Fischer | | IBM | E | 40 | S F G I | 2000 | | 2002 | digits
ISLE | Atwell | EU/ELDA | E | 46 | G I | 4000 | 18h | 2000 |
Jupiter | Zue | MIT | E | unknown | unknown | 5146 | | 1999 | telephone speech
K-SEC | Rhee | SiTEC | E | unknown | K | | | 2004 |
LDC WSJ1 | | LDC | | 10 | | 800 | 1h | 1994 |
LeaP | Gut | University of Münster | E G | 127 | 41 languages | 73,941 words | 12h | 2003 |
MIST | | ELRA | E F G | 75 | Dut | 2200 | | 1996 |
NATO HIWIRE | | NATO | E | 81 | F Gre I S | 8100 | | 2007 | clean speech
NATO M-ATC | Pigeon | NATO | E | 622 | F G I S | 9833 | 17h | 2007 | heavy background noise
NATO N4 | | NATO | E | 115 | unknown | | 7.5h | 2006 | heavy background noise
Onomastica | | | D Dut E F G Gre I N P S Swe | | | (121000) | | 1995 | lexicon only
PF-STAR | | U. Erlangen | E | 57 | G | 4627 | 3.4h | 2005 | children's speech
Sunstar | | EU | E | 100 | G S I P D | 40000 | | 1992 | parliament speech
TC-STAR | Heuvel | ELDA | E S | unknown | EU countries | | 13h | 2006 | multiple data sets
TED | Lamel | ELDA | E | 40 (188) | many | | 10h (47h) | 1994 | recorded at Eurospeech 93
TLTS | | DARPA | A E | | | | 1h | 2004 |
Tokyo-Kikuko | | U. Tokyo | J | 140 | 10 countries | 35000 | | 2004 | proficiency rating
Verbmobil | | U. Munich | E | 44 | G | | 1.5h | 1994 | very spontaneous
VODIS | | EU | F G | 178 | F G | 2500 | | 1998 | car navigation domain
WP Arabic | Rocca | LDC | A | 35 | E | 800 | 1h | 2002 |
WP Russian | Rocca | LDC | R | 26 | E | 2500 | 2h | 2003 |
WP Spanish | Morgan | LDC | S | | E | | | 2006 |
WSJ Spoke | | | E | 10 | unknown | 800 | | 1993 |

Read more about this topic:  Non-native Speech Database