Escherichia coli - Genomics

Genomics

The first complete DNA sequence of an E. coli genome (laboratory strain K-12 derivative MG1655) was published in 1997. It was found to be a circular DNA molecule 4.6 million base pairs in length, containing 4,288 annotated protein-coding genes (organized into 2,584 operons), seven ribosomal RNA (rRNA) operons, and 86 transfer RNA (tRNA) genes. Although E. coli had been the subject of intensive genetic analysis for approximately 40 years, a large number of these genes were previously unknown. The coding density was found to be very high, with a mean distance between genes of only 118 base pairs. The genome was also found to contain a significant number of transposable genetic elements, repeat elements, cryptic prophages, and bacteriophage remnants.

Today, over 60 complete genomic sequences of Escherichia and Shigella species are available. Comparison of these sequences shows a remarkable amount of diversity: only about 20% of each genome consists of sequences present in every one of the isolates, while approximately 80% of each genome can vary among isolates. Each individual genome contains between 4,000 and 5,500 genes, but the total number of different genes among all of the sequenced E. coli strains (the pan-genome) exceeds 16,000. This very large variety of component genes has been interpreted to mean that two-thirds of the E. coli pan-genome originated in other species and arrived through the process of horizontal gene transfer.
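
The distinction between the genes shared by every isolate and the pan-genome can be illustrated with plain set operations. The following is a minimal sketch, not a real analysis pipeline: the strain names, gene identifiers, and function names are hypothetical, and it assumes each genome has already been reduced to a set of gene identifiers that are comparable across strains.

```python
# Hypothetical toy data: strain names and gene identifiers are illustrative
# placeholders, not real E. coli annotations.
genomes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g3", "g6", "g7"},
}


def pan_genome(genomes):
    """Return every distinct gene found in at least one strain (set union)."""
    return set().union(*genomes.values())


def core_genome(genomes):
    """Return the genes present in every strain (set intersection)."""
    if not genomes:
        return set()
    return set.intersection(*genomes.values())


def core_fraction(genomes):
    """For each strain, return the share of its genes that are shared by all strains."""
    core = core_genome(genomes)
    return {name: len(core) / len(genes) for name, genes in genomes.items()}


if __name__ == "__main__":
    print("pan-genome size:", len(pan_genome(genomes)))
    print("shared by all strains:", sorted(core_genome(genomes)))
    for name, fraction in core_fraction(genomes).items():
        print(f"{name}: {fraction:.0%} of genes shared by all strains")
```

In this toy input the pan-genome (7 genes) is larger than any single genome (at most 5 genes), which mirrors on a small scale the pattern described above, where individual genomes hold 4,000 to 5,500 genes while the pan-genome exceeds 16,000.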
