Next generation sequencing

Massively parallel sequencing

Sanger sequencing was dominant for 30 years because it produced reasonably long, high-quality sequence reads. However, it was not scalable. It sequenced only one molecule at a time, which made it expensive. Reducing costs required other sequencing technologies.

A key aim was massively parallel sequencing, so that many molecules could be sequenced at the same time. Several technologies came to market in the late 2000s before Illumina became the dominant platform. These technologies significantly reduced the cost of sequencing a genome.

Illumina sequencing

A single run on the HiSeq (an Illumina sequencing instrument) produces ~8 billion reads, with a maximum read length of 125 bp in high-output mode. This technology accounts for >70% of the sequencing market.
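As a rough worked figure, the yield of one run can be estimated by multiplying the number of reads by the read length. This assumes every read reaches the 125 bp maximum, which real runs need not do:

```latex
8 \times 10^{9}\ \text{reads} \times 125\ \tfrac{\text{bp}}{\text{read}} = 1.0 \times 10^{12}\ \text{bp} \approx 1\ \text{Tb}
```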

Illumina produces shorter reads than Sanger sequencing but generates millions of reads simultaneously. Like Sanger sequencing, it uses sequencing by synthesis, with fluorescently labelled nucleotides used to detect each base. Optical sensors cannot detect the fluorescent signal from a single molecule, so the template needs to be amplified while the different template molecules are kept separate. This problem is solved by carrying out PCR on the surface of a glass slide, using a process called bridge amplification.

Key stages:

Genomic DNA extraction
DNA fragmentation
Library prep (addition of adapter molecules)
Cluster generation (bridge amplification)
Sequencing by synthesis
Data analysis

Preparation for sequencing

First, extract the DNA.
Then fragment the DNA and size-select fragments of ~500 bp.
Short DNA molecules of known sequence, called adapters, are attached to both ends of every fragment. These allow the fragment to be amplified and sequenced.
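The three preparation steps can be sketched as string operations on a toy genome. This is a minimal illustration under stated assumptions: random cut points stand in for physical shearing, the ±50 bp size-selection window is an arbitrary choice, and `ADAPTER_LEFT`/`ADAPTER_RIGHT` are hypothetical placeholder sequences, not real Illumina adapters.

```python
import random

# Hypothetical adapter sequences (placeholders, NOT real Illumina adapters).
ADAPTER_LEFT = "ACGTACGTAC"
ADAPTER_RIGHT = "TGCATGCATG"


def fragment(genome: str, n_breaks: int, rng: random.Random) -> list[str]:
    """Break the genome at random distinct positions (a stand-in for shearing)."""
    cuts = sorted(rng.sample(range(1, len(genome)), n_breaks))
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: list[str], target: int = 500, tolerance: int = 50) -> list[str]:
    """Keep only fragments whose length is within target +/- tolerance bp."""
    return [f for f in fragments if abs(len(f) - target) <= tolerance]


def add_adapters(fragments: list[str]) -> list[str]:
    """Attach an adapter to both ends of every fragment."""
    return [ADAPTER_LEFT + f + ADAPTER_RIGHT for f in fragments]


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    library = add_adapters(size_select(fragment(genome, 200, rng)))
    print(f"{len(library)} fragments kept after size selection")
```

Only a minority of the random fragments fall inside the window, which mirrors why size selection discards much of the sheared DNA.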

Bridge amplification

First, melt (denature) the DNA so it is single-stranded and flow it across the flow cell (a glass slide), whose surface carries many primers complementary to the adapter sequences.
The strands then hybridise to these primers, and DNA polymerase is added to synthesise complementary strands. The original strand is then melted off and washed away, leaving the newly synthesised strand attached to the flow cell through its primer.
The free end of this strand then bends over and hybridises to a nearby primer, forming a "bridge". This is possible because the strand is long enough to reach neighbouring primers as it moves about on the flow cell.
DNA polymerase is added again to synthesise a new strand, the strands are separated, and the hybridisation and synthesis steps repeat. Each cycle roughly doubles the number of copies, giving an exponential increase in copy number.
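The exponential growth described above can be modelled with a one-line formula. This is an idealised sketch: `efficiency` is an assumed parameter for the fraction of strands copied per cycle, and real clusters stop growing once they run out of space and primers.

```python
def bridge_copies(cycles: int, efficiency: float = 1.0, start: int = 1) -> float:
    """Idealised copy number after `cycles` rounds of bridge amplification.

    efficiency is the assumed fraction of strands copied per cycle:
    1.0 means perfect doubling, so copies = start * 2 ** cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start * (1 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (1, 5, 10, 12):
        print(n, bridge_copies(n))
```

With perfect doubling, 10 cycles give 1,024 copies and 12 cycles give 4,096, the "thousands of identical molecules" range that a cluster needs to be visible.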

Cluster generation

Template molecules attach to the flow cell surface at random positions and then undergo bridge amplification to form clusters.
By controlling the concentration of template fragments, it is possible to get millions of separate clusters on the flow cell.
Each cluster derives from a different template molecule, from a different part of the genome, and will produce sequence data for that region.
Each cluster consists of thousands of identical molecules and is large enough to be detected by a digital camera when it fluoresces.
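Why concentration matters can be illustrated with a simple probability model. This sketch assumes, for illustration only, that templates land independently at random, so the number seeding any cluster-sized patch of surface follows a Poisson distribution with mean `lam`, which rises with loading concentration.

```python
import math


def poisson(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def cluster_outcomes(lam: float) -> dict[str, float]:
    """Fraction of cluster-sized patches seeded by 0, 1 or >=2 templates.

    lam is the assumed mean number of templates landing per patch.
    """
    empty = poisson(0, lam)
    single = poisson(1, lam)
    return {"empty": empty, "single": single, "mixed": 1.0 - empty - single}


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0, 2.0):
        out = cluster_outcomes(lam)
        print(lam, {k: round(v, 3) for k, v in out.items()})
```

In this model the single-template fraction peaks at `lam = 1` (about 0.37), while the mixed fraction keeps rising with concentration. Too little template wastes surface, and too much produces overlapping clusters that cannot be read cleanly.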

Sequencing process

Once the clusters are generated, sequencing can begin. The approach uses reversible terminator nucleotides. Each carries a removable fluorescent label and, like a ddNTP, is blocked at the 3' end to prevent chain extension. Unlike a ddNTP, however, this block can be chemically removed so that extension can continue.

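Before sequencing, each cluster contains both complementary strands, but only one strand can be sequenced at a time, so an endonuclease removes one of them. A sequencing primer is then hybridised and the reversible terminators are added. All four labelled nucleotides are supplied together, but only one base is incorporated per cycle. After each cycle the fluorescence is recorded, and the label and block are chemically cleaved off before the next base is added.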
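The cycle of incorporate, image, cleave can be sketched as a toy simulation for one cluster. It assumes four imaging channels in A, C, G, T order with uniform background noise, and it calls each base as the brightest channel. Real Illumina base calling corrects for effects this model ignores, such as phasing and channel cross-talk.

```python
import random

BASES = "ACGT"  # assumed channel order for each image
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_cycles(template: str, rng: random.Random, noise: float = 0.1) -> list[list[float]]:
    """Toy reversible-terminator cycles: one base incorporated and imaged per cycle."""
    images = []
    for base in template:
        incorporated = PAIR[base]                           # complementary terminator added
        signal = [rng.uniform(0.0, noise) for _ in BASES]   # background in every channel
        signal[BASES.index(incorporated)] += 1.0            # cluster glows in that base's colour
        images.append(signal)
        # here the label and 3' block would be cleaved before the next cycle
    return images


def call_bases(images: list[list[float]]) -> str:
    """Call each cycle as the brightest of the four channels."""
    return "".join(BASES[max(range(4), key=lambda i: img[i])] for img in images)


if __name__ == "__main__":
    rng = random.Random(0)
    print(call_bases(simulate_cycles("ACGTTGCA", rng)))  # prints TGCAACGT
```

The read is the newly synthesised strand, so it is the complement of the template.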

After the first read, the newly synthesised strand is melted away and a single round of bridge amplification is used to regenerate the complementary strand. The original strand is then removed and the new strand is sequenced, giving the sequence from the other end of the fragment. The two reads are expected to map about 500 bp apart, on opposite strands, because this is the size of the fragment.
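The paired-end geometry can be checked with a small simulation. This sketch assumes a 500 bp fragment and 125 bp reads, and its `insert_size` helper is a hypothetical toy that "maps" reads by exact string search, not a real aligner. Read 2 comes from the opposite strand, so it must be reverse-complemented before it matches the forward genome sequence.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(genome: str, start: int, fragment_len: int = 500,
                 read_len: int = 125) -> tuple[str, str]:
    """Simulate read 1 and read 2 from a fragment starting at `start`."""
    fragment = genome[start:start + fragment_len]
    read1 = fragment[:read_len]                       # one end, forward strand
    read2 = reverse_complement(fragment[-read_len:])  # other end, opposite strand
    return read1, read2


def insert_size(genome: str, read1: str, read2: str) -> int:
    """Toy exact-match mapping with str.find, returning the outer distance."""
    pos1 = genome.find(read1)                      # read 1 on the forward strand
    pos2 = genome.find(reverse_complement(read2))  # read 2 lies on the reverse strand
    return pos2 + len(read2) - pos1


if __name__ == "__main__":
    rng = random.Random(7)
    genome = "".join(rng.choice("ACGT") for _ in range(20_000))
    r1, r2 = paired_reads(genome, start=3_000)
    print(insert_size(genome, r1, r2))
```

For a random genome this is expected to print 500, the fragment length.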

Sometimes a third primer is used to sequence an index, a short barcode sequence within the adapter. This allows reads from different samples to be distinguished, so several samples can be pooled and sequenced in the same run.
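Sorting pooled reads back into samples (demultiplexing) can be sketched as follows. The index sequences are hypothetical. Tolerating one mismatch and discarding ambiguous matches are assumptions of this sketch, not a description of any particular Illumina tool.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[tuple[str, str]], sample_indices: dict[str, str],
                max_mismatches: int = 1) -> tuple[dict[str, list[str]], list[str]]:
    """Sort (index_read, sequence) pairs into per-sample bins."""
    bins: dict[str, list[str]] = {name: [] for name in sample_indices}
    undetermined: list[str] = []
    for index_read, seq in reads:
        hits = [name for name, idx in sample_indices.items()
                if len(idx) == len(index_read)
                and hamming(idx, index_read) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(seq)
        else:
            undetermined.append(seq)  # no match, or matches more than one sample
    return bins, undetermined


if __name__ == "__main__":
    samples = {"sample1": "ACGTAC", "sample2": "TGCATG"}  # hypothetical indices
    reads = [("ACGTAC", "AAA"), ("ACGTAA", "CCC"), ("TGCATG", "GGG"), ("GGGGGG", "TTT")]
    print(demultiplex(reads, samples))
```

Indices are normally chosen to differ at several positions, so a single sequencing error in the index read still points to one sample.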

Advantages and disadvantages

A single run generates millions of sequences
Cheaper per base
Shotgun sequencing without a cloning step

Library prep is quite expensive and slow
Amplification of DNA fragments is required, which can introduce biases; GC-rich fragments often don't amplify efficiently
Read lengths are short
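GC bias is usually discussed in terms of each fragment's GC fraction, which is simple to compute. This sketch flags fragments above a cut-off. The 0.65 threshold is a hypothetical value chosen for illustration, not a published limit.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_gc_rich(fragments: list[str], threshold: float = 0.65) -> list[str]:
    """Return fragments whose GC fraction exceeds a hypothetical threshold."""
    return [f for f in fragments if gc_fraction(f) > threshold]


if __name__ == "__main__":
    library = ["ATGCATGCAT", "GGCGCCGCGA", "ATATATATGC"]
    for frag in library:
        print(frag, round(gc_fraction(frag), 2))
    print("possible under-amplification:", flag_gc_rich(library))
```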
