Sequence Assembly - Influence of Technological Changes

Influence of Technological Changes

The complexity of sequence assembly is driven by two major factors: the number of fragments and their lengths. While more and longer fragments allow better identification of sequence overlaps, they also pose problems as the underlying algorithms show quadratic or even exponential complexity behaviour to both number of fragments and their length. And while shorter sequences are faster to align, they also complicate the layout phase of an assembly as shorter reads are more difficult to use with repeats or near identical repeats.

In the earliest days of DNA sequencing, scientists could only gain a few sequences of short length (some dozen bases) after weeks of work in laboratories. Hence, these sequences could be aligned in a few minutes by hand.

In 1975, the Dideoxy termination method (also known as Sanger sequencing) was invented and until shortly after 2000, the technology was improved up to a point where fully automated machines could churn out sequences in a highly parallelised mode 24 hours a day. Large genome centers around the world housed complete farms of these sequencing machines, which in turn led to the necessity of assemblers to be optimised for sequences from whole-genome shotgun sequencing projects where the reads

  • are about 800–900 bases long
  • contain sequencing artifacts like sequencing and cloning vectors
  • have error rates between 0.5 and 10%

With the Sanger technology, bacterial projects with 20,000 to 200,000 reads could easily be assembled on one computer. Larger ones like the human genome with approximately 35 million reads needed already large computing farms and distributed computing.

By 2004 / 2005, pyrosequencing had been brought to commercial viability by 454 Life Sciences. This new sequencing methods generated reads much shorter than from Sanger sequencing: initially about 100 bases, now 400-500 bases. However, due to the much higher throughput and lower cost than Sanger sequencing, the adoption of this technology by genome centers pushed development of sequence assemblers to deal with this new type of sequences. The sheer amount of data coupled with technology specific error patterns in the reads delayed development of assemblers, at the beginning in 2004 only the Newbler assembler from 454 was available. Presented in mid 2007, the hybrid version of the MIRA assembler by Chevreux et al. was the first freely available assembler who could assemble 454 reads and mixtures of 454 reads and Sanger reads; using sequences from different sequencing technologies was subsequently coined hybrid assembly.

Since 2006, the Illumina (previously Solexa) technology is available and able to generate about 100 million reads per run on a single sequencing machine. Compare this to the 35 million reads of the human genome project which needed several years to be produced on hundreds of sequencing machines. Illumina initially was limited to a length of only 36 bases, making it less suitable for de novo assembly (such as de novo transcriptome assembly), but newer iterations of the technology achieve read lengths above 100 bases from both ends of a 3-400bp clone. Presented by the end of 2007, the SHARCGS assembler by Dohm et al. was the first published assembler that was used for an assembly with Solexa reads, quickly followed by a number of others.

Later, new technologies like SOLiD from Applied Biosystems are released, and new technologies (e.g. IonTorrent, PacBio) continue to emerge at a rapid rate.

Read more about this topic:  Sequence Assembly

Famous quotes containing the words influence of and/or influence:

    Imagination is always the fabric of social life and the dynamic of history. The influence of real needs and compulsions, of real interests and materials, is indirect because the crowd is never conscious of it.
    Simone Weil (1909–1943)

    ... even I am growing accustomed to slavery; so much so that I cease to think of its accursed influence and calmly eat from the hands of the bondman without being mindful that he is such. O, Slavery, hateful thing that thou art thus to blunt the keen edge of conscience!
    Susan B. Anthony (1820–1907)