Contig - Sequence Contigs

A sequence contig is a contiguous stretch of sequence reconstructed from overlapping reads, produced by reassembling the small DNA fragments generated by bottom-up sequencing strategies. This meaning of contig is consistent with the original definition by Rodger Staden (1979). The bottom-up DNA sequencing strategy involves shearing genomic DNA into many small fragments ("bottom"), sequencing these fragments, and reassembling them into contigs and eventually the entire genome ("up").

Because current technology allows direct sequencing of only relatively short DNA fragments (300–1000 nucleotides), genomic DNA must be fragmented into small pieces before sequencing. In bottom-up sequencing projects, amplified DNA is sheared randomly into fragments of a size suitable for sequencing. The resulting sequence reads, which are the data containing the sequence of each fragment, are assembled into contigs. The contigs are finally connected by sequencing the gaps between them, yielding a sequenced genome.

The ability to assemble contigs depends on the overlap of reads. Because shearing is random and performed on multiple copies of the DNA, each portion of the genome should be represented multiple times in different fragment frames. In other words, the sequences of the fragments (and thus the reads) should overlap. After sequencing, assembly software assembles the overlapping reads into contigs.
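The role of read overlap can be illustrated with a toy example. The sketch below is a minimal greedy merger, not the method used by any particular assembly program: it repeatedly joins the two sequences with the longest suffix-to-prefix overlap until no overlap of at least `min_overlap` bases remains. The function names, the `min_overlap` threshold, and the example reads are assumptions made for illustration. Real assemblers also have to handle sequencing errors, reverse-complement reads, and repeats, all of which are ignored here.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Merge reads into contigs by always joining the best-overlapping pair.

    Toy model only: assumes error-free reads that all come from one strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining overlaps: these are the contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads taken from the made-up sequence ATGCGTACGTTAGC.
reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

A read that overlaps nothing else stays as its own contig, which mirrors why poorly covered regions leave gaps between contigs.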

Today, it is common to use paired-end sequencing, in which both ends of longer DNA fragments of consistent size are sequenced. Here, a contig still refers to any contiguous stretch of sequence data created by read overlap. Because the fragments are of known length, the distance between the two end reads from each fragment is known. This gives additional information about the orientation of contigs constructed from these reads and allows for their assembly into scaffolds.
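The arithmetic behind this can be sketched as follows. If a fragment of known length starts in one contig and ends in the next, the part of the fragment not accounted for inside either contig must span the gap between them. The code below is a simplified illustration under stated assumptions: the `PairLink` record, the function names, and the coordinate convention (forward read in the left contig, reverse mate in the right contig, 0-based positions) are hypothetical, and real fragment lengths vary around a mean rather than being exact.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class PairLink:
    """One read pair whose reads map to two different contigs (hypothetical record)."""
    left_read_start: int  # 0-based start of the forward read in the left contig
    right_read_end: int   # 0-based exclusive end of the reverse mate in the right contig


def gap_from_link(left_contig_len: int, link: PairLink, fragment_len: int) -> int:
    """Gap size implied by a single read pair."""
    # Part of the fragment inside the left contig (read start to contig end).
    in_left = left_contig_len - link.left_read_start
    # Part of the fragment inside the right contig (contig start to mate end).
    in_right = link.right_read_end
    # The rest of the fragment spans the gap; a negative value would
    # suggest that the two contigs actually overlap.
    return fragment_len - in_left - in_right


def estimate_gap(left_contig_len: int, links: List[PairLink], fragment_len: int) -> float:
    """Average the gap estimates from all pairs linking the two contigs.

    Raises statistics.StatisticsError if `links` is empty.
    """
    return mean(gap_from_link(left_contig_len, link, fragment_len) for link in links)


# Two pairs from 500-base fragments linking a 1000-base contig to its neighbour.
links = [PairLink(left_read_start=900, right_read_end=150),
         PairLink(left_read_start=950, right_read_end=200)]
print(estimate_gap(1000, links, fragment_len=500))  # 250
```

Averaging over several linking pairs is one simple way to dampen the effect of fragment-length variation; other summary statistics would work as well.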

Scaffolds consist of contigs separated by gaps of known length. The new constraints placed on the orientation of the contigs allow for the placement of highly repeated sequences in the genome. If one end read has a repetitive sequence, its placement is known as long as its mate pair is located within a contig. The remaining gaps between the contigs in the scaffolds can then be sequenced by a variety of methods, including PCR amplification followed by sequencing (for smaller gaps) and BAC cloning followed by sequencing (for larger gaps).
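The way an anchored mate resolves a repetitive read can be sketched in a few lines. The read could match several copies of the repeat, but only a copy lying roughly one fragment length away from its uniquely placed mate is consistent with the pair. This is an illustrative sketch only: the function name, the scaffold coordinate convention, and the `tolerance` parameter are assumptions, not part of any specific scaffolding tool.

```python
from typing import List, Optional


def place_repeat_read(anchor_start: int, fragment_len: int, read_len: int,
                      candidate_starts: List[int], tolerance: int) -> Optional[int]:
    """Choose where a repetitive read belongs, using its anchored mate.

    anchor_start     -- scaffold position where the uniquely placed mate
                        (and therefore the fragment) begins
    candidate_starts -- scaffold positions of every repeat copy the read matches
    tolerance        -- allowed deviation from the expected position (assumed)
    Returns the single consistent position, or None if zero or several
    candidates fit.
    """
    # The repetitive read occupies the far end of the fragment.
    expected_start = anchor_start + fragment_len - read_len
    consistent = [s for s in candidate_starts
                  if abs(s - expected_start) <= tolerance]
    return consistent[0] if len(consistent) == 1 else None


# The mate is anchored at 1000; fragments are 500 bases, reads 100 bases,
# so the repetitive read is expected near position 1400.
print(place_repeat_read(1000, 500, 100, [200, 1410, 5200], tolerance=50))  # 1410
```

Returning `None` when more than one copy fits keeps the sketch conservative: an ambiguous placement is left unresolved rather than guessed.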
