dbSNP - Problems

The quality of the data in dbSNP has been questioned by many research groups, which suspect high false positive rates caused by genotyping and base-calling errors. Such errors can easily enter dbSNP if the submitter relies on (1) uncritical bioinformatic alignments of highly similar but distinct DNA sequences, and/or (2) PCRs with primers that cannot discriminate between similar but distinct DNA sequences. Mitchell et al. (2004) reviewed four studies and concluded that dbSNP has a false positive rate of 15-17% for SNPs, and that the minor allele frequency is greater than 10% for approximately 80% of the SNPs that are not false positives. Similarly, Musumeci et al. (2010) state that as many as 8.32% of the biallelic coding SNPs in dbSNP are artifacts of highly similar DNA sequences (i.e. paralogous genes), and refer to these entries as single nucleotide differences (SNDs).

The high error rates in dbSNP may not be surprising: of the 23.7 million refSNP entries for humans, only 14.5 million have been validated, leaving the remaining 9.2 million as candidate SNPs. However, according to Musumeci et al. (2010), even the validation code provided in the refSNP record is only partially useful: HapMap validation was the only method that reduced the proportion of SNDs (3% vs. 8%), but accepting only HapMap-validated entries would remove more than half of the real SNPs in dbSNP. These authors also note that one source of submissions, the Lee group, is plagued with errors: 20% of its submissions are SNDs (vs. 8% for submissions overall). As the authors point out, however, ignoring all of these submissions would also remove many real SNPs.

Errors in dbSNP can hamper candidate gene association studies and haplotype-based investigations. They may also lead to false conclusions in association studies: every false SNP included in a study adds a hypothesis test, and correcting for multiple testing then lowers the per-test significance threshold (alpha). Because false SNPs cannot actually be associated with traits, the threshold ends up stricter than a rigorous test of the true SNPs alone would require, and the false negative rate increases. Musumeci et al. (2010) suggested that authors of negative association studies re-inspect their earlier work for false SNPs (SNDs), which could then be removed from the analysis.
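The effect of false SNPs on the significance threshold can be illustrated with a minimal sketch. The source does not name a specific multiple-testing correction, so a Bonferroni correction is assumed here purely for illustration. The study size of 1,000 candidate SNPs and the helper name `bonferroni_alpha` are hypothetical, and the 8% SND fraction is borrowed from the figure quoted above rather than measured for any particular study.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_alpha / n_tests


# Hypothetical study: 1,000 candidate SNPs, 8% of which are assumed to be SNDs.
n_candidates = 1000
assumed_snd_fraction = 0.08
n_true_snps = round(n_candidates * (1 - assumed_snd_fraction))

# Threshold when every candidate is tested, false SNPs included.
alpha_all = bonferroni_alpha(0.05, n_candidates)
# Threshold when the assumed SNDs are removed before testing.
alpha_filtered = bonferroni_alpha(0.05, n_true_snps)

print(f"threshold with SNDs included: {alpha_all:.2e}")
print(f"threshold with SNDs removed:  {alpha_filtered:.2e}")
```

Under these assumptions, removing the SNDs relaxes the per-test threshold slightly, so a true association of a given strength is somewhat less likely to be missed. The size of the effect grows with the share of false SNPs among the tested markers.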
