Next-Generation Sequencing (NGS): Definition, Types
Next-Generation Sequencing (NGS) is a term that refers to a group of technologies that enable rapid and high-throughput sequencing of DNA or RNA molecules. NGS can be used for various applications, such as gene expression profiling, variant/mutation detection, epigenetic profiling, and molecular analysis. NGS has revolutionized genomic research and personalized medicine by allowing labs to study biological systems at a level never before possible.
NGS is different from the previous Sanger sequencing technology, which was slower and less scalable. Sanger sequencing relies on the chain-termination method, which uses dideoxynucleotides to terminate the synthesis of DNA strands and produces fragments of different lengths that are separated by electrophoresis and detected by fluorescence. Sanger sequencing is best for analyzing small numbers of gene targets and samples and can be accomplished in a single day. It is also considered the gold-standard sequencing technology, so NGS results are often verified using Sanger sequencing.
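The chain-termination idea can be illustrated with a small sketch. It assumes an idealized reaction in which synthesis stops once at every position, and the function names are hypothetical; real base calling from electropherograms is considerably more involved.

```python
import random

def chain_terminated_fragments(new_strand):
    """Return one (length, terminal_base) pair per possible stop position.

    Assumption: new_strand is the strand being synthesized, written 5'->3',
    and a dideoxynucleotide terminates synthesis at every position in
    at least one molecule.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_by_size(fragments):
    """Sort fragments by length (the job of electrophoresis) and read the
    terminal base of each fragment, shortest first."""
    return "".join(base for _, base in sorted(fragments))

fragments = chain_terminated_fragments("GATTACA")
random.shuffle(fragments)        # the fragments start out mixed together
print(read_by_size(fragments))   # GATTACA
```

Ordering the fragments by length is what turns a mixture of terminated strands into a readable sequence.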
NGS, on the other hand, uses a massively parallel approach, which means that millions of DNA or RNA fragments are sequenced simultaneously in a single run. NGS technologies vary in the way they prepare, amplify, and sequence the DNA or RNA molecules, but they all share some common steps: library preparation, sequencing, and data analysis.
- Library preparation involves fragmenting the DNA or RNA sample and attaching adapter sequences that are specific to each technology. These adapters may also have unique molecular barcodes that allow for multiplexing, which means that multiple samples can be pooled and sequenced together in the same run.
- Sequencing involves adding fluorescently labeled nucleotides to the DNA or RNA fragments and detecting the incorporation of each nucleotide by a camera or a sensor. Sequencing chemistry can be based on synthesis (adding one nucleotide at a time), ligation (adding short oligonucleotides at a time), or hybridization (probing with complementary oligonucleotides) methods.
- Data analysis involves processing the raw images or signals into sequences of nucleotides (called reads) and aligning them to a reference genome or assembling them into contigs. The data analysis can also include quality control, variant calling, annotation, and interpretation steps, depending on the application.
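Multiplexing with barcodes, mentioned in the library preparation step, can be sketched as follows. This is a simplified illustration that assumes the barcode sits at the start of each read and must match exactly; the sample names and sequences are made up, and production pipelines usually read the index separately and tolerate mismatches.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group pooled reads by sample using the barcode at the start of each read.

    reads:    iterable of read sequences (barcode followed by insert)
    barcodes: dict mapping barcode sequence -> sample name
    Assumption: all barcodes have the same length and match exactly.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes equal-length barcodes")
    k = lengths.pop()
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:k], "undetermined")
        by_sample[sample].append(read[k:])   # keep the insert, drop the barcode
    return dict(by_sample)

barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}   # hypothetical barcodes
pooled = ["ACGTGGGA", "TGCATTTC", "ACGTCCCA", "NNNNAAAA"]
print(demultiplex(pooled, barcodes))
# {'sample_A': ['GGGA', 'CCCA'], 'sample_B': ['TTTC'], 'undetermined': ['AAAA']}
```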
NGS offers several advantages over Sanger sequencing, such as lower cost per base, higher throughput per run, lower sample input requirements, and the ability to detect variants at lower allele frequencies. NGS also enables new applications that were not possible with Sanger sequencing, such as whole-genome sequencing, transcriptome sequencing, metagenomic sequencing, epigenomic sequencing, and single-cell sequencing. NGS has transformed the fields of genomics and clinical research, reproductive health, and environmental, agricultural, and forensic science.
In this article, we will provide an overview of the generations of sequencing technologies and describe some of the most common types of NGS platforms available today. We will also discuss their advantages and disadvantages, as well as their applications and challenges.
DNA sequencing is the process of determining the order of nucleotides in a DNA molecule. Since the discovery of the structure of DNA by Watson and Crick in 1953, various methods have been developed to sequence DNA and reveal its genetic information. These methods can be classified into three main generations, each with different characteristics and applications.
- First-generation sequencing refers to the original methods that were based on the chain termination or chemical degradation of DNA fragments. The most widely used first-generation method was Sanger sequencing, developed by Frederick Sanger and colleagues in 1977. Sanger sequencing uses dideoxynucleotides (ddNTPs) to terminate DNA synthesis by DNA polymerase and generate fragments of different lengths that can be separated by gel electrophoresis and detected by autoradiography or fluorescence. Sanger sequencing was the main method for sequencing the first genomes, such as bacteriophage φX174, human mitochondrial DNA, and human chromosome 22. However, Sanger sequencing is limited by its low throughput, high cost, and labor-intensive procedure.
- Second-generation sequencing or next-generation sequencing (NGS) refers to the methods that emerged in the 2000s and revolutionized the field of genomics by increasing the throughput and reducing the cost of sequencing. NGS methods are based on massively parallel sequencing of millions of DNA molecules attached to a solid surface or beads. NGS methods use different strategies to amplify, label, and detect the DNA sequences, such as pyrosequencing, reversible terminator sequencing, ligation sequencing, or sequencing by synthesis. NGS methods have enabled many applications, such as whole-genome sequencing, transcriptome profiling, metagenomics, epigenomics, and personalized medicine.
- Third-generation sequencing or single-molecule sequencing refers to the methods that are currently being developed and aim to sequence DNA molecules without amplification or modification. Third-generation methods use novel technologies to directly measure the sequence of single DNA molecules in real time, such as nanopores or zero-mode waveguides. Third-generation methods have the potential to overcome some of the limitations of NGS methods, such as short read lengths, amplification biases, and sequencing errors. Third-generation methods also offer new possibilities for detecting DNA modifications, such as methylation or hydroxymethylation.
Each generation of sequencing has its own advantages and disadvantages, and the choice of the best method depends on the research question, the available resources, and the desired outcome. In this article, we will focus on NGS methods and describe their types in more detail.
Next-generation sequencing (NGS) is a term that encompasses various methods and technologies that enable rapid and high-throughput sequencing of DNA molecules. NGS can be used for various applications, such as genome sequencing, transcriptome profiling, epigenetic analysis, metagenomics, and more. NGS technologies differ in their sequencing chemistry, library preparation, imaging, and data analysis methods. Some of the most common types of NGS are:
- Lynx Therapeutics’ massively parallel signature sequencing (MPSS): This was the first NGS technology, developed in the 1990s. It is a bead-based method that uses adapter ligation and decoding to sequence cDNA fragments in increments of four nucleotides. It can reveal almost every transcript in the sample and provide its accurate expression level. However, it is susceptible to sequence-specific bias or loss of specific sequences.
- Polony sequencing: This is an inexpensive and accurate multiplex sequencing technique that uses an in vitro paired-tag library, emulsion PCR, an automated microscope, and ligation-based sequencing chemistry to sequence millions of immobilized DNA sequences in parallel. It was developed by George Church at Harvard Medical School and used to sequence an E. coli genome at a low cost and high accuracy.
- Pyrosequencing: The NGS implementation is a parallelized version of pyrosequencing developed by 454 Life Sciences, which was later acquired by Roche Diagnostics. It uses luciferase to generate light for the detection of the individual nucleotides added to the nascent DNA. The DNA is amplified inside water droplets in an oil solution (emulsion PCR) and attached to primer-coated beads that form clonal colonies in picolitre-volume wells. This technology provides intermediate read length and price per base compared to other NGS technologies.
- Illumina (Solexa) sequencing: This is a sequencing technology based on dye terminators and bridge amplification. The DNA molecules are attached to primers on a slide and amplified to form clusters. The DNA can only be extended one nucleotide at a time, and a camera takes images of the fluorescently labeled nucleotides. The dye, along with the terminal 3′ blocker, is chemically removed from the DNA, allowing the next cycle to commence. This technology provides high throughput and low cost per base but short read length.
- SOLiD sequencing: This is a sequencing technology based on oligonucleotide ligation and detection. The DNA fragments are ligated with adapters and hybridized to a slide with complementary probes. A pool of all possible oligonucleotides of fixed length is labeled according to the sequenced position and ligated to the template. The signal is detected, and the ligated oligonucleotide is removed. The process is repeated until all positions are sequenced. This technology provides high accuracy and low cost per base but short read length and complex data analysis.
- DNA nanoball sequencing: This is a high-throughput sequencing technology that uses rolling circle replication to amplify fragments of genomic DNA molecules into DNA nanoballs (DNBs). The DNBs are then hybridized into a patterned array and sequenced by cyclic ligation and detection. This technology allows a large number of DNBs to be sequenced per run and at a low reagent cost compared to other NGS platforms. However, only short sequences of DNA are determined from each DNB, which makes mapping the short reads to a reference genome difficult.
- HeliScope single molecule sequencing: This is a single molecule sequencing technology that uses DNA fragments with added poly(A) tail adapters, which are attached to the surface of the flow cell. The following steps involve extension-based sequencing with cyclic washes of fluorescently labeled nucleotides in the flow cell. The reads are performed by the HeliScope sequencer. The reads are short, up to 55 bases per run, but recent improvements in the methodology allow more accurate reads of homopolymers and RNA sequencing.
- Single molecule SMRT sequencing: This is a single molecule sequencing technology based on the sequencing by synthesis approach. The DNA is synthesized in so-called zero-mode waveguides (ZMWs), small well-like containers with the capturing tools located at the bottom of the well. The sequencing is performed with an unmodified polymerase and fluorescently labeled nucleotides flowing freely in the solution. The fluorescent label is detached from the nucleotide when it is incorporated into the DNA strand, leaving an unmodified DNA strand. The SMRT technology allows the detection of nucleotide modifications through the observation of polymerase kinetics. This approach allows reads of thousands of nucleotides.
- Single molecule real-time (RNAP) sequencing: This is a single molecule sequencing technology based on RNA polymerase (RNAP) attached to a polystyrene bead, with the distal end of the sequenced DNA attached to another bead and both beads placed in optical traps. RNAP motion during transcription brings the beads closer, and the change in their relative distance can be recorded at single-nucleotide resolution. The sequence is deduced from four read-outs with lowered concentrations of each of the four nucleotide types.
- MPSS is considered the first of the "next-generation" sequencing technologies.
- MPSS was developed in the 1990s at Lynx Therapeutics, a company founded in 1992 by Sydney Brenner and Sam Eletr.
- MPSS is an ultra-high throughput sequencing technology that can sequence millions of DNA molecules in parallel.
- MPSS uses a complex method of adapter ligation, adapter decoding, and sequencing by hybridization to generate short reads of 16-20 nucleotides.
- MPSS is mainly used for transcriptome analysis, as it can measure the expression level of almost every transcript in a sample.
- MPSS has some advantages over other sequencing methods, such as high accuracy, low bias, and low cost per base.
- MPSS also has some limitations, such as the need for large amounts of starting material, the difficulty of assembling short reads, and the dependence on prior knowledge of the genome.
MPSS is a pioneering next-generation sequencing technology that can sequence millions of DNA molecules in parallel using a complex method of adapter ligation, decoding, and hybridization. It is mainly used for transcriptome analysis, as it can measure the expression level of almost every transcript in a sample. It has some advantages and limitations compared to other sequencing methods.
Polony sequencing is an inexpensive but highly accurate multiplex sequencing technique that can be used to read millions of immobilized DNA sequences in parallel. This technique was first developed by Dr. George Church at Harvard Medical School.
Polony sequencing involves the following steps:
- An in vitro paired-tag library is constructed from fragmented DNA, so that each library molecule carries two short genomic tags flanked by common adapter sequences.
- The library molecules are clonally amplified on beads by emulsion PCR.
- The template-bearing beads are immobilized in a dense layer on a glass slide.
- An anchor primer is hybridized next to the position to be read, and fluorescently labeled oligonucleotide probes are ligated to it; the probe that ligates reveals the base at that position.
- An automated microscope images the slide and records the color at each bead.
- The ligated primer-probe complex is stripped off, allowing the next cycle to interrogate a different position.
- The sequence of each bead is determined by analyzing the colors recorded across the cycles.
Polony sequencing has several advantages over other next-generation sequencing methods, such as:
- It relies on inexpensive, off-the-shelf reagents and instrumentation, which reduces the cost of the procedure.
- It can generate longer reads (up to 170 bp) with high accuracy (> 99.999%).
- It can be multiplexed by using different adapters or barcodes for each sample, allowing simultaneous sequencing of multiple genomes or regions of interest.
- It can be used for various applications, such as whole-genome sequencing, transcriptome analysis, metagenomics, and epigenomics.
Polony sequencing also has some limitations, such as:
- It requires a large amount of DNA input (about 1 μg) and a high-density array of oligonucleotides on the slide, which may increase the risk of cross-contamination or hybridization errors.
- It has a low throughput compared to other next-generation sequencing platforms, as it can only sequence about 2 million clusters per slide.
- It has a high error rate in homopolymer regions due to incomplete removal of the fluorescent label.
Polony sequencing was one of the first next-generation sequencing technologies to be developed and used for various genomic projects. However, it has been largely replaced by newer and more efficient methods, such as Illumina or SOLiD sequencing. Polony sequencing is still used for some specialized applications, such as digital gene expression or single-cell analysis.
Pyrosequencing is a method of DNA sequencing that relies on the detection of light emitted during the synthesis of a complementary strand of DNA. The principle of pyrosequencing was first described in 1993 by Bertil Pettersson, Mathias Uhlén, and Pål Nyrén. A parallelized version of pyrosequencing was developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics.
The method involves amplifying DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many picolitre-volume wells, each containing a single bead and sequencing enzymes. Pyrosequencing uses luciferase to generate light for the detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs.
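How light signals become a read-out can be sketched with an idealized model. The sketch assumes that one nucleotide type is offered per flow, that the flows cycle in the order T, A, C, G (an assumption, not taken from this article), and that the light signal equals the number of bases incorporated in that flow. Real signals are noisy, which is why long homopolymers are hard to call.

```python
def flowgram(new_strand, flow_order="TACG", n_flows=8):
    """Simulate ideal pyrosequencing signals for the strand being synthesized.

    Each flow offers a single nucleotide type; the signal recorded for that
    flow is the number of consecutive bases incorporated (0 if none).
    """
    pos = 0
    signals = []
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == nuc:
            run += 1
            pos += 1
        signals.append((nuc, run))
    return signals

def call_bases(signals):
    """Turn (nucleotide, signal) pairs back into a sequence."""
    return "".join(nuc * run for nuc, run in signals)

signals = flowgram("TTAGGC")
print(signals)
# [('T', 2), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1), ('G', 0)]
print(call_bases(signals))   # TTAGGC
```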
Pyrosequencing provides intermediate read length and price per base compared to Sanger sequencing on one end and Illumina and SOLiD on the other. Pyrosequencing has been used for various applications, such as whole genome sequencing, metagenomics, transcriptomics, epigenetics, and forensics.
Illumina (Solexa) sequencing is a widely adopted next-generation sequencing technology that uses a proprietary method called sequencing by synthesis (SBS) to determine the order of nucleotides in DNA molecules.
The basic steps of Illumina (Solexa) sequencing are:
- Library preparation: DNA samples are fragmented, and adapters are ligated to both ends of the fragments. The adapters provide the binding sites for the amplification and sequencing primers.
- Bridge amplification: The DNA library is attached to a flow cell, where the fragments are amplified by solid-phase PCR to form clusters of identical DNA strands.
- Sequencing: The flow cell is loaded into a sequencing instrument, where the four fluorescently labeled, reversibly terminated nucleotides are supplied in each cycle and a single one is incorporated into every growing strand. Each nucleotide incorporation is detected by a camera and recorded as a color-coded signal. The fluorescent label and the 3′ blocker are then removed, allowing the next cycle to proceed.
- Data analysis: The color-coded signals are converted into a sequence of bases by base-calling software. The sequences are then aligned to a reference genome or assembled de novo.
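The conversion from color-coded signals to bases can be sketched as picking the brightest channel in every cycle. This is a deliberately simplified illustration: it assumes one intensity value per base (A, C, G, T) per cycle for a single cluster, and the numbers are invented. Real base callers also correct for effects such as channel cross-talk and phasing.

```python
CHANNELS = "ACGT"   # assumed order of the four intensity values

def call_cluster(intensities):
    """Call one base per cycle for a single cluster.

    intensities: list of cycles, each a 4-tuple of signals in A, C, G, T order.
    Returns the brightest channel per cycle, or 'N' when the two brightest
    channels tie and no confident call is possible.
    """
    bases = []
    for cycle in intensities:
        ranked = sorted(zip(cycle, CHANNELS), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        bases.append(base if top > second else "N")
    return "".join(bases)

cycles = [
    (0.9, 0.1, 0.0, 0.1),    # A is brightest
    (0.1, 0.2, 0.8, 0.1),    # G is brightest
    (0.3, 0.3, 0.1, 0.1),    # tie between A and C
    (0.0, 0.1, 0.1, 0.95),   # T is brightest
]
print(call_cluster(cycles))  # AGNT
```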
Illumina (Solexa) sequencing offers several advantages, such as:
- High throughput: It can generate millions of reads per run, enabling large-scale genomic applications.
- High accuracy: It can achieve over 99.9% accuracy by using error-correction algorithms and paired-end reads.
- Low cost: It has a low cost per base compared to other NGS technologies.
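Accuracy figures such as the 99.9% above are commonly expressed as Phred quality scores, where Q = -10 · log10(error probability). The sketch below shows the conversion, assuming the widely used FASTQ convention of storing Q as an ASCII character with an offset of 33; the function names are illustrative.

```python
import math

def phred_from_accuracy(accuracy):
    """Phred quality score for a given base-call accuracy (0 < accuracy < 1)."""
    return -10 * math.log10(1 - accuracy)

def error_prob_from_ascii(char, offset=33):
    """Error probability encoded by one FASTQ quality character."""
    q = ord(char) - offset
    return 10 ** (-q / 10)

print(round(phred_from_accuracy(0.999)))   # 30
print(error_prob_from_ascii("?"))          # 0.001
```

An accuracy of 99.9% therefore corresponds to Q30, or one expected error per 1,000 bases.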
Illumina (Solexa) sequencing also has some limitations, such as:
- Short read length: It can only produce reads of up to 300 bp, which may pose challenges for the assembly and mapping of complex regions.
- GC bias: It may have reduced coverage and quality for GC-rich or GC-poor regions due to PCR amplification and sequencing chemistry.
Illumina (Solexa) sequencing is suitable for a variety of applications, such as whole-genome sequencing, whole-exome sequencing, targeted sequencing, RNA sequencing, methylation sequencing, and more.
SOLiD sequencing stands for Sequencing by Oligonucleotide Ligation and Detection. It is a second-generation DNA sequencing technology developed by Applied Biosystems that uses a novel method of sequencing by ligation. Unlike sequencing-by-synthesis, which relies on the addition of nucleotides with DNA polymerase, sequencing-by-ligation uses DNA ligase to join DNA fragments and determine the underlying sequence of the target DNA.
The process of SOLiD sequencing involves the following steps:
- Fragmentation and denaturation of the target DNA and ligation of two different adapters at both ends
- Binding of the DNA fragments to magnetic beads coated with complementary oligonucleotides
- Emulsion PCR to amplify the DNA fragments on the beads
- Deposition of the beads onto a glass slide
- Hybridization and ligation of fluorescently labeled 8-mers (short ssDNA fragments) to the DNA fragments on the beads
- Detection and cleavage of the fluorescent labels
- Repeated cycles of hybridization, ligation, detection, and cleavage with different primers
The key feature of SOLiD sequencing is the use of two-base encoding, which assigns one of four colors to each of the 16 possible pairs of bases at the 3′ end of the 8-mer probe, so that each color stands for four different pairs. For example, "AA" is encoded as blue, "AC" as green, and so on for all 16 pairs. During sequencing, each base in the template is interrogated twice, and the resulting data are decoded according to this scheme. This provides an internal accuracy check and reduces sequencing errors.
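Two-base encoding can be sketched in a few lines. The sketch assumes the commonly published color table, in which the color of a base pair is the XOR of two-bit base codes (A=0, C=1, G=2, T=3); this reproduces "AA" as blue and "AC" as green, while the yellow and red assignments are an assumption beyond what is stated above. Decoding needs one known starting base, which in practice comes from the adapter.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"
COLOR_NAMES = ["blue", "green", "yellow", "red"]   # colors 0..3 (assumed table)

def encode_colors(seq):
    """Color for each overlapping pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)

colors = encode_colors("ATGGC")
print([COLOR_NAMES[c] for c in colors])   # ['red', 'green', 'blue', 'red']
print(decode_colors("A", colors))         # ATGGC
print(encode_colors("ATCGC"))             # [3, 2, 3, 3]
```

Because every base contributes to two colors, a true single-base variant (ATGGC to ATCGC) changes two adjacent colors, whereas an isolated color change is more likely a measurement error. This is the internal accuracy check described above.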
SOLiD sequencing can generate up to 10^9 small sequence reads (35-75 bp) at one time. It has been used for various applications such as gene expression profiling, transcriptome analysis, genome resequencing, and epigenetic studies. However, it also has some limitations, such as difficulty in sequencing palindromic sequences and high complexity and cost of data analysis.
DNA nanoball sequencing is a high-throughput sequencing technology that is used to determine the entire genomic sequence of an organism. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Fluorescent nucleotides bind to complementary nucleotides and are then polymerized to anchor sequences bound to known sequences on the DNA template. The base order is determined via the fluorescence of the bound nucleotides. This DNA sequencing method allows large numbers of DNA nanoballs to be sequenced per run at lower reagent costs compared to other next-generation sequencing platforms. However, a limitation of this method is that it generates only short sequences of DNA, which presents challenges to mapping its reads to a reference genome.
The procedure of DNA nanoball sequencing involves the following steps:
- DNA isolation, fragmentation, and size capture: Cells are lysed, and DNA is extracted from the cell lysate. The high-molecular-weight DNA, often several megabase pairs long, is fragmented by physical or enzymatic methods to break the DNA double-strands at random intervals. Bioinformatic mapping of the sequencing reads is most efficient when the sample DNA contains a narrow length range. For small RNA sequencing, the selection of the ideal fragment lengths for sequencing is performed by gel electrophoresis; for sequencing of larger fragments, DNA fragments are separated by bead-based size selection.
- Attaching adapter sequences: Adapter DNA sequences must be attached to the unknown DNA fragment so that DNA segments with known sequences flank the unknown DNA. In the first round of adapter ligation, right and left adapters are attached to the right and left flanks of the fragmented DNA, and the DNA is amplified by PCR. A splint oligo then hybridizes to the ends of the fragments, which are ligated to form a circle. An exonuclease is added to remove all remaining linear single-stranded and double-stranded DNA products. The result is a completed circular DNA template.
- Rolling circle replication: Once a single-stranded circular template is formed, it undergoes rolling circle replication with Phi29 DNA polymerase to create many single-stranded copies of each fragment. The DNA copies concatenate head to tail in a long strand and are compacted into a DNA nanoball.
- DNB loading onto patterned nanoarrays: The nanoballs are then adsorbed onto a sequencing flow cell that contains millions of wells with known anchor sequences. Each well can accommodate one nanoball, ensuring that each nanoball is sequenced independently.
- Combinatorial probe anchor synthesis (cPAS) sequencing: The cPAS sequencing uses four fluorescently labeled nucleotides that are complementary to the anchor sequences on the flow cell. The nucleotides bind to their respective anchors and are then polymerized by DNA polymerase. The color of the fluorescence at each interrogated position is recorded through a high-resolution camera. Bioinformatics is used to analyze the fluorescence data and make a base call.
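The rolling circle replication step above can be pictured as a polymerase travelling around a circular template again and again, producing one long strand of head-to-tail copies. The sketch below is only a string-level illustration; the template sequence and the `n_copies` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def rolling_circle(circular_template, n_copies):
    """Return the concatemer made by copying a circular template n_copies times.

    The new strand is complementary to the template, so the product is the
    reverse complement of the circle repeated head to tail.
    """
    return reverse_complement(circular_template) * n_copies

print(rolling_circle("AACG", 3))   # CGTTCGTTCGTT
```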
DNA nanoball sequencing has been used for multiple genome sequencing projects and is scheduled to be used for more. It has been shown to produce reliable data with rapid turnaround times at reduced per-base cost compared to previous sequencing techniques. It also has advantages such as a low error rate, reduced risk of optical duplicates, reduced duplication rates, high base calling accuracy, reduced adapter carryover, and negligible index hopping. However, it also has some disadvantages, such as a longer workflow until sequencing-ready library, few third-party library prep solutions, shorter read lengths, more hands-on time, and multiplexing requiring complicated adapter setup.
HeliScope single-molecule sequencing is a next-generation sequencing technology that does not require DNA amplification and needs only minimal library preparation. It directly sequences single DNA molecules that are attached to a glass surface through poly(A) tails. The technology was developed by Helicos Biosciences, a company founded in 2003 by Stephen Quake and others.
The HeliScope sequencing process involves the following steps:
- The DNA molecules are fragmented and given poly(A) tails. The tails let the fragments hybridize to the oligo(dT) probes on the flow cell, which prime the sequencing reaction.
- The DNA molecules are hybridized to a flow cell that has oligo(dT) probes covalently attached to the glass surface. Each DNA molecule occupies a distinct spot on the flow cell, which can accommodate millions of molecules.
- The flow cell is washed with fluorescently labeled nucleotides that have a reversible terminator group at the 3` end. The nucleotides compete for incorporation into the complementary strand of the DNA molecule by a DNA polymerase.
- A CCD camera captures the fluorescence signal from each spot on the flow cell, indicating which nucleotide was added. The terminator group prevents further extension of the strand until it is removed by a cleavage step.
- The cycle of nucleotide addition, fluorescence detection, and cleavage is repeated until the entire DNA molecule is sequenced.
The advantages of HeliScope single-molecule sequencing include the following:
- It eliminates the need for PCR amplification, which can introduce biases and errors in the sequence.
- It reduces the cost and complexity of library preparation, which can be time-consuming and labor-intensive.
- It enables direct sequencing of RNA molecules without the need for reverse transcription or cDNA synthesis.
- It yields reads of up to 55 bases per run, obtained directly from the original, unamplified molecules.
The limitations of HeliScope single-molecule sequencing include the following:
- It has a high error rate, especially for homopolymer regions and low-complexity regions, due to incomplete cleavage of the terminator group or misincorporation of nucleotides.
- It has a low throughput, compared to other next-generation sequencing technologies, due to the limited number of spots on the flow cell and the slow speed of the sequencing reaction.
- It has a high sensitivity to contamination since any DNA or RNA molecules in the sample or reagents can hybridize into the flow cell and interfere with the sequencing.
HeliScope single-molecule sequencing is suitable for applications that require direct sequencing of single DNA or RNA molecules, such as transcriptome analysis, small RNA discovery, epigenetic analysis, and metagenomics. However, it faces competition from other next-generation sequencing technologies that offer higher throughput, lower error rates, and lower cost per base.
Single-molecule SMRT (single-molecule real-time) sequencing is a parallelized single-molecule DNA sequencing method developed by Pacific Biosciences (PacBio) of California, Inc. It is based on the sequencing-by-synthesis approach, which means that the DNA sequence is determined by monitoring the activity of a DNA polymerase enzyme as it synthesizes a complementary strand of DNA from a single-stranded template.
The key component of SMRT sequencing is the zero-mode waveguide (ZMW), a nanostructure that creates an illuminated observation volume that is small enough to observe only a single nucleotide of DNA being incorporated by the DNA polymerase. The ZMW consists of a metal-coated hole on a glass substrate with a diameter of about 70 nanometers and a depth of about 100 nanometers. A single DNA polymerase enzyme is immobilized at the bottom of the ZMW with a single molecule of DNA template attached to it.
The DNA template is prepared by circularizing a DNA fragment using hairpin adapters, which allows the DNA polymerase to synthesize multiple copies of the same sequence in a rolling-circle fashion. The nucleotides used for the DNA synthesis carry a fluorescent label on their phosphate chain rather than on the base. While the polymerase holds a nucleotide at the bottom of the ZMW during incorporation, the label is excited by the laser light and emits a fluorescent signal that is detected by a camera. The signal indicates which base (A, C, G, or T) has been incorporated by the DNA polymerase, and the base call is made according to the corresponding fluorescence of the dye. The fluorescent label and the phosphate chain are then cleaved off from the nucleotide and diffuse out of the ZMW, leaving behind an unmodified DNA strand.
The SMRT sequencing technology produces long reads, typically thousands to tens of thousands of nucleotides, and reaches high accuracy (>99.9%) for HiFi reads, which are generated by circular consensus sequencing (CCS), a method that uses multiple passes of the same template to correct errors. The SMRT sequencing technology also enables the detection of nucleotide modifications, such as methylation, by observing the kinetics of the DNA polymerase, which may slow down or stall when encountering a modified base. This feature provides additional information about the epigenetic state of the DNA sample.
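The idea behind circular consensus sequencing can be sketched as a majority vote across repeated passes of the same template. This is a minimal illustration, not the actual CCS algorithm: it assumes the passes are already aligned, have equal length, and contain only substitution errors, whereas real single-molecule reads also contain insertions and deletions.

```python
from collections import Counter

def consensus(passes):
    """Majority-vote consensus over equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch assumes equal-length, aligned passes")
    return "".join(
        Counter(column).most_common(1)[0][0]   # most frequent base per column
        for column in zip(*passes)
    )

passes = [
    "ACGTAC",
    "ACCTAC",   # error at position 3
    "ACGTAC",
    "ACGTTC",   # error at position 5
    "ACGTAC",
]
print(consensus(passes))   # ACGTAC
```

Because random errors rarely hit the same position in most passes, the consensus is more accurate than any single pass.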
SMRT sequencing has been widely used for various applications, such as genome assembly, genome variation detection, transcriptome analysis, metagenomics, and epigenomics. It has also contributed to several landmark projects, such as the Telomere-to-Telomere (T2T) Consortium that presented the first complete human genome in 2021.
Single-molecule real-time (RNAP) sequencing is based on RNA polymerase (RNAP), which is attached to a polystyrene bead, with the distal end of the sequenced DNA attached to another bead and both beads placed in optical traps. RNAP motion during transcription brings the beads closer, and the change in their relative distance can be recorded at single-nucleotide resolution. The sequence is deduced from four read-outs with lowered concentrations of each of the four nucleotide types. This method can sequence long DNA molecules up to 10 kb in length and can also detect modified bases. However, it has a high error rate of about 5% and requires high DNA concentrations.