Quantitative Comparative Linguistics - Databases

Databases

Swadesh originally published a 200 word list but later refined it into a 100 word one. A commonly used IE database is that by Dyen, Kruskal and Black which contains data for 95 languages, though the original is known to contain a few errors. Besides the raw data it also contains cognacy judgements. This is available online. The database of Ringe, Warnow and Taylor has information on 24 IE languages, with 22 phonological characters, 15 morphological characters and 333 lexical characters. Gray and Atkinson used a database of 87 languages with 2449 lexical items, based on the Dyen set with the addition of three ancient languages. They incorporated the cognacy judgements of a number of scholars. Other databases have been drawn up for African, Australian and Andean language families, amongst others.

Coding of the data may be in binary form or in multistate form. The former is often used but does result in a bias. It has been claimed that there is a constant scale factor between the two coding methods, and that allowance can be made for this. However, another study suggests that the topology may change

Read more about this topic:  Quantitative Comparative Linguistics