Genome Sequencing and Statistics
The T. vaginalis genome was found to be approximately 160 megabases in size – ten times larger than predicted from earlier gel-based chromosome sizing. (The human genome is ~3.5 gigabases by comparison.) As much as two-thirds of the T. vaginalis sequence consists of repetitive and transposable elements, reflecting a massive, evolutionarily recent expansion of the genome. The total number of predicted protein-coding genes is ~98,000, which includes ~38,000 'repeat' genes (virus-like, transposon-like, retrotransposon-like, and unclassified repeats, all with high copy number and low polymorphism). Approximately 26,000 of the protein-coding genes have been classed as 'evidence-supported' (similar either to known proteins or to ESTs), while the remainder have no known function. These extraordinary genome statistics are likely to be revised downward as the genome sequence, currently very fragmented because of the difficulty of ordering repetitive DNA, is assembled into chromosomes, and as more transcription data (ESTs, microarrays) accumulate. Even so, it appears that the gene number of the single-celled parasite T. vaginalis is, at minimum, on par with that of its host, H. sapiens.
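The figures quoted above imply a few derived quantities that are easy to check by hand. The short Python sketch below only restates that arithmetic; the constant and function names are illustrative, all inputs are the approximate values given in the text, and it assumes that "the remainder" refers to all predicted genes other than the evidence-supported ones.

```python
# Approximate figures as quoted in the text; none are exact measurements.
GENOME_SIZE_MB = 160              # T. vaginalis genome size, megabases
REPEAT_FRACTION_MAX = 2 / 3       # "as much as two-thirds" is repetitive
TOTAL_GENES = 98_000              # predicted protein-coding genes
REPEAT_GENES = 38_000             # 'repeat' genes included in the total
EVIDENCE_SUPPORTED_GENES = 26_000
HUMAN_GENOME_MB = 3_500           # ~3.5 gigabases, as quoted in the text


def summarize():
    """Return quantities derived from the quoted genome statistics."""
    return {
        # The genome is "ten times larger than predicted".
        "earlier_size_estimate_mb": GENOME_SIZE_MB / 10,
        # Upper bound on repetitive and transposable sequence.
        "repeat_sequence_max_mb": GENOME_SIZE_MB * REPEAT_FRACTION_MAX,
        # Predicted genes that are not 'repeat' genes.
        "non_repeat_genes": TOTAL_GENES - REPEAT_GENES,
        # Assumption: "the remainder" is taken relative to the full total.
        "genes_without_evidence": TOTAL_GENES - EVIDENCE_SUPPORTED_GENES,
        # How many times larger the quoted human genome is.
        "human_to_tvag_size_ratio": HUMAN_GENOME_MB / GENOME_SIZE_MB,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, roughly 60,000 predicted genes fall outside the repeat families, and the earlier gel-based estimate corresponds to a genome of about 16 megabases.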
In late 2007, TrichDB.org was launched as a free, public genomic data repository and retrieval service devoted to genome-scale trichomonad data. The site currently contains all of the T. vaginalis sequence project data, several EST libraries, and tools for data mining and display. TrichDB is part of the NIH/NIAID-funded EuPathDB functional genomics database project.