Author Citation (zoology) - Implications For Information Retrieval

Implications For Information Retrieval

For a computer, O. F. Müller, O. Müller and Müller are different strings; even the differences between O. F. Müller, O.F. Müller and OF Müller can be problematic. Fauna Europaea is a typical example of a database in which the combined initials O.F. and O. F. are read as entirely different strings. Anyone who tries to search for all taxonomic names described by Otto Friedrich Müller has to know (1) that the data submitted by the various data providers contained several versions (O. F. Müller, O.F. Müller, Müller and O. Müller), and (2) that in many databases the search function will not find O.F. Müller if you search for O. F. Müller or Müller, not to mention alternative orthographies of this name such as Mueller or Muller.
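The spacing and orthography variants described above can be reduced before searching. The following is a minimal sketch in Python, not the method of any particular database: the function names (`fold_surname`, `author_key`, `same_author_candidate`) are hypothetical, and the code assumes that the author string is a single-word surname optionally preceded by initials. Multi-part surnames and spelled-out given names would need extra handling.

```python
import unicodedata


def fold_surname(surname: str) -> str:
    """Fold a surname to a crude ASCII key: Müller, Mueller, Muller -> 'muller'."""
    s = surname.lower()
    # Assumption: German transliterations (ae, oe, ue) are folded to the base
    # vowel. This helps recall but can over-merge unrelated surnames.
    s = s.replace("ae", "a").replace("oe", "o").replace("ue", "u")
    # Decompose accented characters and drop the combining marks.
    s = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in s if not unicodedata.combining(ch))


def author_key(author: str) -> tuple[str, str]:
    """Return (initials, folded surname) for strings such as 'O.F. Müller'."""
    # Put a space after every full stop so 'O.F.' and 'O. F.' tokenize alike.
    tokens = author.replace(".", ". ").split()
    if not tokens:
        return "", ""
    # Assumption: the last token is the surname, everything before is initials.
    initials = "".join(ch for tok in tokens[:-1] for ch in tok if ch.isalpha())
    return initials.upper(), fold_surname(tokens[-1])


def same_author_candidate(a: str, b: str) -> bool:
    """True if two citations could refer to the same person (not proof that they do)."""
    initials_a, surname_a = author_key(a)
    initials_b, surname_b = author_key(b)
    if surname_a != surname_b:
        return False
    # Missing or shortened initials are compatible if one is a prefix of the other.
    return initials_a.startswith(initials_b) or initials_b.startswith(initials_a)
```

With this key, "O. F. Müller", "O.F. Müller" and "OF Müller" compare as equal, and a bare "Müller" is treated as a possible match for any of them. A bare surname therefore still needs manual checking, since it is equally compatible with other authors of that name.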

Thus, the use of combinations such as genus-species-author-year, genus-author-year or family-author-year as "de facto" unique identifiers for biodiversity informatics purposes can present problems. The causes include variation in cited author surnames, the presence, absence or variation of cited initials, and minor variants in style of presentation, as well as variant cited authors (responsible person/s) and sometimes variant cited dates for what may in fact be the same nomenclatural act in the same work. In addition, in a small number of cases, the same author may have created the same name more than once in the same year for different taxa, which can then only be distinguished by reference to the title, page and sometimes line of the work in which each name appears.

TAXAMATCH, a program created in Australia, provides a helpful tool for indicating in a preliminary manner whether two variants of a taxon name should be accepted as identical or not, according to the similarity of the cited author strings. Its authority matching function assigns a moderate-to-high similarity to author strings with minor orthographic and/or date differences, such as "Medvedev & Chernov, 1969" vs. "Medvedev & Cernov, 1969", or "Schaufuss, 1877" vs. "L. W. Schaufuss, 1877", or even "Oshmarin, 1952" vs. "Oschmarin in Skrjabin & Evranova, 1952". It assigns a low similarity to author citations which are very different (for example "Hyalesthes Amyot, 1847" vs. "Hyalesthes Signoret, 1865") and are therefore more likely to represent different publication instances, and possibly also different taxa. The program also understands standardized abbreviations as used in botany and sometimes in zoology as well, for example "Rchb." for Reichenbach. It may still fail, however, for non-standard abbreviations (such as "H. & A. Ad." for H. & A. Adams, where the normal citation would in fact be "Adams & Adams"); these must be picked up by manual inspection after the algorithmic approach has pre-sorted the names to be matched into groups of more or less similar names and cited authorities.

The program will also not adequately separate author names which are spelled very similarly but in fact represent different persons who independently authored identical taxon names. Examples include "O. F. Müller 1776" vs. "P. L. S. Müller 1776", "G. B. Sowerby I 1850" vs. "G. B. Sowerby III 1875" and "L. Pfeiffer 1856" vs. "K. L. Pfeiffer 1956". Additional manual inspection is therefore required, especially for known problem cases such as these.
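The general idea of scoring author strings by similarity can be illustrated with a short Python sketch. This is not the TAXAMATCH algorithm, whose details are not given here; it swaps in the generic `difflib.SequenceMatcher` ratio from the Python standard library, and the function name `authority_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def authority_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two author citations.

    Illustrative only: a generic character-based ratio, not TAXAMATCH itself.
    """
    def norm(s: str) -> str:
        # Lower-case, treat commas as spaces, and collapse runs of whitespace.
        return " ".join(s.lower().replace(",", " ").split())

    return SequenceMatcher(None, norm(a), norm(b)).ratio()


# Minor orthographic variants of the same citation score high.
close = authority_similarity("Medvedev & Chernov, 1969", "Medvedev & Cernov, 1969")

# Clearly different authorities score low.
far = authority_similarity("Amyot, 1847", "Signoret, 1865")

# Different persons with similar citations also score high, which is the
# limitation noted in the text: string similarity cannot separate them.
misleading = authority_similarity("O. F. Müller 1776", "P. L. S. Müller 1776")
```

A score of this kind is only suitable for pre-sorting candidate pairs. Any threshold chosen to separate "same" from "different" is an assumption to be tuned against manually checked data.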

Further causes of error that would not be detected by such a program include authors with multi-part surnames, which are sometimes inconsistently applied in the literature, and works for which the accepted attribution has changed over time. For example, genera published in the anonymously authored work "Museum Boltenianum sive catalogus cimeliorum..." published in 1798 were for a long time ascribed to Bolten, but are now considered to have been authored by Röding according to a ruling by the ICZN in 1956. Analogous problems are encountered when attempting to cross-link medical records by patient name; for relevant discussion see record linkage.
