Sequence Database - Many Inputs Create Inconsistencies

A major problem with all the large genetic sequence databases is that records are deposited from a wide range of sources, from individual researchers to large genome sequencing centers. As a result, the sequences themselves, and especially the biological annotations attached to them, vary tremendously in quality. There is also considerable redundancy, because multiple labs often submit sequences that are identical, or nearly identical, to others already in the databases.
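
The exact-duplicate part of this redundancy can be illustrated with a minimal sketch. It assumes records are available as simple (identifier, sequence) pairs; the function name `group_identical` and the record identifiers are hypothetical and do not come from any particular database or library. Nearly identical sequences are not handled here, since detecting them requires similarity clustering rather than exact matching.

```python
from collections import defaultdict


def group_identical(records):
    """Group record identifiers that share exactly the same sequence.

    `records` is an iterable of (record_id, sequence) pairs (an assumed,
    simplified layout). Case and whitespace differences are ignored.
    Only sequences submitted more than once are returned.
    """
    groups = defaultdict(list)
    for record_id, sequence in records:
        # Normalize so that formatting differences do not hide duplicates.
        key = "".join(sequence.split()).upper()
        groups[key].append(record_id)
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


# Hypothetical submissions from three labs.
records = [
    ("lab1_001", "ATGGCC"),
    ("lab2_017", "atggcc"),
    ("lab3_004", "ATGGCA"),
]
duplicates = group_identical(records)
```

In this toy input the first two records collapse into one group, while the third, which differs by a single base, is left alone. That single-base case is the "nearly identical" situation that exact matching cannot catch.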

Many annotations are based not on laboratory experiments but on sequence similarity searches against previously annotated sequences. Once a sequence has been annotated by similarity to others and deposited in the database, it can itself become the basis for future annotations. This gives rise to the transitive annotation problem: several such annotation transfers may lie between a particular database record and the actual wet-lab experimental information, so an error introduced at any step can propagate to every record derived from it. One should therefore regard the biological annotations in major sequence databases with considerable skepticism, unless they can be verified by reference to published papers describing high-quality experimental data, or at least by reference to a human-curated sequence database.
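
The chain of transfers described above can be made concrete with a small sketch. It assumes a hypothetical provenance table in which each record names the record its annotation was copied from, or `None` if the annotation rests on direct experimental evidence. Real databases do not necessarily expose provenance in this form, and the name `transfer_depth` is illustrative only.

```python
from typing import Dict, Optional


def transfer_depth(record_id: str,
                   annotated_from: Dict[str, Optional[str]]) -> Optional[int]:
    """Count similarity-based transfers between a record and experimental data.

    Returns 0 for an experimentally characterized record, and None when the
    chain never reaches one (unknown record or circular provenance).
    """
    seen = set()
    depth = 0
    current = record_id
    while current in annotated_from and current not in seen:
        seen.add(current)
        parent = annotated_from[current]
        if parent is None:
            # Reached a record backed by wet-lab evidence.
            return depth
        current = parent
        depth += 1
    return None


# Hypothetical provenance: SEQ_A is experimental; SEQ_D and SEQ_E
# were annotated from each other and have no experimental root.
provenance = {
    "SEQ_A": None,
    "SEQ_B": "SEQ_A",
    "SEQ_C": "SEQ_B",
    "SEQ_D": "SEQ_E",
    "SEQ_E": "SEQ_D",
}
depth_c = transfer_depth("SEQ_C", provenance)
depth_d = transfer_depth("SEQ_D", provenance)
```

Here `SEQ_C` sits two transfers away from experimental evidence, and `SEQ_D` has no experimental support at all. The larger the depth, or the more often it is undefined, the more skepticism the annotation deserves.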
