Structure and Genome of SARS-CoV-2 (COVID-19) with diagram
Coronaviruses are a group of related RNA viruses that cause diseases in mammals and birds. In humans, they can cause respiratory tract infections that can range from mild to lethal, such as the common cold, severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and coronavirus disease 2019 (COVID-19).
Coronaviruses are named for the spikes that protrude from their surfaces, resembling a crown or the sun's corona. They belong to the subfamily Orthocoronavirinae, which is divided into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus. Each genus includes different species of coronaviruses that infect different hosts.
Coronaviruses are enveloped, positive-sense, single-stranded RNA viruses with a genome size of about 30 kilobases. They have four structural proteins: spike (S), envelope (E), membrane (M), and nucleocapsid (N). The S protein is responsible for binding to the host cell receptor and mediating viral entry. The E and M proteins form the viral envelope and are involved in virus assembly and release. The N protein binds to the viral RNA and forms the nucleocapsid inside the envelope.
Coronaviruses can infect both animals and humans, and can cause zoonotic diseases that jump from one species to another. Some of the known animal reservoirs of coronaviruses include bats, camels, civets, pangolins, and minks. The transmission of coronaviruses between animals and humans can occur through direct contact, aerosols, or contaminated surfaces.
The most recent coronavirus outbreak is caused by a novel coronavirus called SARS-CoV-2, which was first detected in Wuhan, China in December 2019. This virus causes COVID-19, a disease that can range from mild to severe respiratory illness with symptoms such as fever, cough, shortness of breath, loss of taste or smell, and sometimes pneumonia or organ failure. As of June 2023, more than 760 million confirmed cases of COVID-19 and more than 6.9 million deaths had been reported worldwide.
The best way to prevent and control the spread of coronaviruses is to be well informed about the disease and how the virus spreads. Preventive measures include staying at least 1 metre apart from others, wearing a properly fitted mask, washing hands or using an alcohol-based rub frequently, getting vaccinated when a vaccine is available, and following local guidance. It is also important to practice respiratory etiquette, such as coughing into a flexed elbow, and to stay home and self-isolate if feeling unwell.
In this article, we will focus on the detailed structure and genomic organization of SARS-CoV-2, the causative agent of COVID-19. We will also discuss the role of the spike glycoprotein in viral infection and its implications for vaccine development and treatment. Finally, we will conclude with some future research directions on coronaviruses.
SARS-CoV-2 is a member of the Betacoronavirus genus, which also includes SARS-CoV and MERS-CoV. These viruses have a spherical to pleomorphic shape, with a diameter of about 80 to 160 nm. They are enveloped by a lipid bilayer derived from the host cell membrane, and have spike (S) glycoproteins protruding from the surface, giving them a crown-like appearance (hence the name coronavirus). The S protein mediates the attachment and entry of the virus into the host cells by binding to the angiotensin-converting enzyme 2 (ACE2) receptor.
The S protein is composed of two subunits: S1 and S2. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and a receptor-binding domain (RBD). The RBD is responsible for recognizing and binding to the ACE2 receptor on the host cell surface. The NTD may also contribute to receptor binding and immune evasion. The S2 subunit contains a fusion peptide, two heptad repeat regions (HR1 and HR2), a transmembrane domain, and a cytoplasmic tail. The fusion peptide inserts into the host cell membrane, while the HR1 and HR2 regions form a six-helix bundle that brings the viral and host membranes closer together, facilitating membrane fusion.
The S protein can exist in two conformations: a prefusion state and a postfusion state. In the prefusion state, the RBD is mostly hidden by the NTD and other parts of the S1 subunit, making it less accessible to neutralizing antibodies. Upon binding to the ACE2 receptor, the S protein undergoes a conformational change that exposes the RBD and activates the S2 subunit for membrane fusion. This transition from the prefusion to the postfusion state is irreversible and essential for viral entry.
Besides the S protein, SARS-CoV-2 also has three other structural proteins: envelope (E), membrane (M), and nucleocapsid (N). The E protein is the smallest structural protein, with only 75 amino acids. It forms an ion channel in the viral envelope and plays a role in virus assembly, release, and pathogenesis. The M protein is the most abundant structural protein, accounting for about 40% of the viral mass. It has three transmembrane domains and a short N-terminal ectodomain. It determines the shape of the viral envelope and interacts with the S, E, and N proteins during virus assembly. The N protein binds to the viral RNA genome and forms a helical nucleocapsid inside the envelope. It also interacts with other viral and host proteins and modulates various aspects of viral replication, transcription, assembly, and immune response.
The genome of SARS-CoV-2 is a single-stranded positive-sense RNA molecule of about 30 kb in length. It encodes 29 proteins, including 16 non-structural proteins (NSPs) that are involved in viral replication and transcription, four structural proteins (S, E, M, and N) that form the virion structure, and nine accessory proteins that have various functions in viral infection. The genome organization of SARS-CoV-2 is similar to that of other coronaviruses, with a 5′-leader-UTR-replicase-S-E-M-N-3′UTR-poly(A) tail arrangement. The replicase gene occupies about two-thirds of the genome and encodes two large polyproteins (pp1a and pp1ab) that are cleaved by viral proteases into 16 NSPs. The remaining one-third of the genome encodes four structural proteins and nine accessory proteins in different open reading frames (ORFs).
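The counts in the paragraph above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, assuming only the figures stated in the text (16 NSPs, four structural proteins, nine accessory proteins, a genome of about 30 kb, and a replicase gene covering about two-thirds of it). The names `PROTEIN_CLASSES`, `total_proteins`, and `replicase_span_kb` are hypothetical and chosen for illustration.

```python
# Protein counts as stated in the text (not derived from sequence data).
PROTEIN_CLASSES = {
    "non-structural (NSP1-NSP16)": 16,
    "structural (S, E, M, N)": 4,
    "accessory": 9,
}

GENOME_LENGTH_KB = 30  # approximate genome length given in the text


def total_proteins(classes):
    """Sum the number of proteins across all classes."""
    return sum(classes.values())


def replicase_span_kb(genome_kb, fraction=2 / 3):
    """Approximate replicase gene length, assuming it covers `fraction` of the genome."""
    return genome_kb * fraction


if __name__ == "__main__":
    print(total_proteins(PROTEIN_CLASSES))             # 29
    print(round(replicase_span_kb(GENOME_LENGTH_KB)))  # 20
```

Under these assumptions, the replicase gene spans roughly 20 kb, leaving roughly 10 kb for the structural and accessory genes.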
The structure of SARS-CoV-2 provides important insights into its biology, evolution, pathogenesis, and therapeutic development. Understanding how the virus interacts with its host cells and evades immune recognition is crucial for designing effective vaccines and antiviral drugs. Furthermore, comparing the structure of SARS-CoV-2 with other coronaviruses may reveal common features as well as unique adaptations that enable its high infectivity and transmissibility.
The spike (S) glycoprotein is one of the four structural proteins of SARS-CoV-2, the causative agent of COVID-19. The S protein protrudes from the surface of the viral envelope and forms a crown-like appearance, hence the name coronavirus. The S protein plays a crucial role in mediating viral entry into the host cell by first interacting with the receptor on the cell surface and then fusing the viral and cellular membranes. The S protein is also the main target of neutralizing antibodies and vaccine development.
Structure of Spike Glycoprotein
The S protein is a type I transmembrane glycoprotein that consists of two subunits, S1 and S2, which are cleaved by host proteases during biosynthesis and maturation. The S1 subunit contains a signal peptide, followed by an N-terminal domain (NTD) and a receptor-binding domain (RBD). The NTD and RBD are involved in recognizing and binding to the host receptor, which is angiotensin-converting enzyme 2 (ACE2) for SARS-CoV-2. The RBD can adopt two conformations, up and down, which differ in their accessibility to ACE2. The up conformation exposes the receptor-binding motif (RBM), which directly contacts ACE2, while the down conformation hides the RBM from ACE2. A furin cleavage site lies at the boundary between S1 and S2. This site is absent from SARS-CoV and other closely related SARS-like coronaviruses, although comparable sites occur in more distantly related coronaviruses such as MERS-CoV. The cleavage site is thought to enhance the infectivity and pathogenicity of SARS-CoV-2 by facilitating the activation of the S protein by host proteases.
The S2 subunit contains conserved fusion peptide (FP), heptad repeat 1 (HR1) and 2 (HR2), a transmembrane domain (TM), and a cytoplasmic domain (CT). The FP, HR1, and HR2 are responsible for mediating membrane fusion between the virus and the host cell. The TM anchors the S protein to the viral envelope, while the CT interacts with other viral proteins during assembly and budding.
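The domain organization described above can be written down as a small lookup structure. This is a minimal Python sketch that records only the order of domains named in the text. Residue coordinates are deliberately left out because the text does not give them, and the names `SPIKE_LAYOUT`, `subunit_of`, and `linear_order` are hypothetical.

```python
from typing import List, Optional

# Domain order within each subunit, N-terminus to C-terminus, as listed in the text.
# No residue coordinates are included; the text does not provide them.
SPIKE_LAYOUT = {
    "S1": ["signal peptide", "NTD", "RBD"],
    "S2": ["FP", "HR1", "HR2", "TM", "CT"],
}


def subunit_of(domain: str) -> Optional[str]:
    """Return the subunit ('S1' or 'S2') containing `domain`, or None if unknown."""
    for subunit, domains in SPIKE_LAYOUT.items():
        if domain in domains:
            return subunit
    return None


def linear_order() -> List[str]:
    """Return all domains in order along the polypeptide, S1 first and then S2."""
    return SPIKE_LAYOUT["S1"] + SPIKE_LAYOUT["S2"]


if __name__ == "__main__":
    print(subunit_of("RBD"))   # S1
    print(subunit_of("HR2"))   # S2
    print(" -> ".join(linear_order()))
```

Keeping the receptor-binding elements (S1) and the fusion machinery (S2) in separate lists mirrors the functional split the text describes between attachment and membrane fusion.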
The S protein forms a trimer on the viral surface, with each monomer adopting a prefusion conformation that is metastable and prone to undergo conformational changes upon receptor binding or proteolytic cleavage. The prefusion conformation consists of three domains: domain I (DI), domain II (DII), and domain III (DIII). DI contains most of the NTD, DII contains most of the RBD and FP, and DIII contains most of the HR1. Upon receptor binding or proteolytic cleavage, the S protein undergoes a series of structural rearrangements that expose the FP and HR1, allowing the FP to insert into the host cell membrane. HR1 then forms a coiled-coil structure that brings together the FP regions of the three monomers, forming a trimeric postfusion core. Finally, HR2 folds back and packs against HR1, forming a six-helix bundle that brings the viral and cellular membranes into close proximity for fusion.
Role of Spike Glycoprotein in Infection
The spike glycoprotein is essential for viral infection as it mediates both attachment and fusion of the virus to the host cell. The attachment process involves the recognition and binding of the RBD in the S1 subunit to ACE2 on the surface of host cells, mainly in the respiratory tract. This interaction determines the host range and cell tropism of SARS-CoV-2, as well as its affinity and specificity for human cells. The binding of RBD to ACE2 also triggers conformational changes in the S protein that expose the cleavage sites for host proteases, such as TMPRSS2 or furin, which activate the S protein for membrane fusion.
The fusion process involves the insertion of FP in the S2 subunit into the host cell membrane, followed by the formation of a postfusion core that brings together the viral and cellular membranes for fusion. This process releases the viral genome into the cytoplasm of the host cell, where it can initiate replication and transcription. The fusion process also depends on other factors, such as pH, temperature, cholesterol, and membrane curvature.
Antigenicity of Spike Glycoprotein
The spike glycoprotein is also highly immunogenic as it elicits both humoral and cellular immune responses in the host. The humoral response involves the production of neutralizing antibodies that bind to epitopes on the S protein and block its interaction with ACE2 or its conformational changes for membrane fusion. The cellular response involves the activation of T cells that recognize and eliminate infected cells or secrete cytokines that modulate the immune response. The S protein is the main target of vaccine development as it can induce protective immunity against SARS-CoV-2 infection.
However, the spike glycoprotein also faces several challenges in eliciting effective and durable immunity. One challenge is the extensive glycosylation of the S protein, which shields it from the recognition by antibodies or T cells. The glycans on the S protein can also affect its folding, stability, and function. Another challenge is the conformational variability of the S protein, which can affect its antigenicity and immunogenicity. The RBD, for example, can switch between up and down conformations, exposing or hiding the RBM from antibodies. The S protein can also undergo mutations or recombination events that alter its sequence, structure, or function, resulting in antigenic drift or shift. These challenges require careful design and optimization of the S protein-based vaccines and therapeutics.
The genome of SARS-CoV-2 is a single-stranded positive-sense RNA of about 30 kb (29891 nucleotides) encoding 9860 amino acids. The G + C content is 38%.
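The two figures quoted above, G + C content and the number of encoded amino acids, correspond to simple calculations. The following is a minimal Python sketch. The helper names `gc_content` and `coding_nucleotides` are hypothetical, and the short sequence is a made-up fragment used only to show the calculation, not real SARS-CoV-2 sequence. The coding estimate assumes three nucleotides per amino acid and ignores stop codons and overlapping reading frames, so it is only a rough figure.

```python
GENOME_LENGTH_NT = 29891   # genome length stated in the text
ENCODED_AMINO_ACIDS = 9860 # amino acid count stated in the text


def gc_content(sequence: str) -> float:
    """Return the fraction of bases in `sequence` that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = sum(1 for base in seq if base in ("G", "C"))
    return gc / len(seq)


def coding_nucleotides(amino_acids: int) -> int:
    """Nucleotides needed to encode `amino_acids` residues (three per codon)."""
    return amino_acids * 3


if __name__ == "__main__":
    toy_rna = "AUGGCUAAUC"  # made-up 10-nt fragment, for illustration only
    print(gc_content(toy_rna))  # 0.4

    coding = coding_nucleotides(ENCODED_AMINO_ACIDS)
    print(coding)  # 29580
    print(f"approximate coding fraction: {coding / GENOME_LENGTH_NT:.3f}")
```

Applied to the full genome sequence, `gc_content` would be expected to return a value near the 38% quoted in the text. The rough coding estimate of 29,580 nucleotides is consistent with nearly all of the 29,891-nucleotide genome being protein-coding.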
There are 12 functional open reading frames (ORFs) along with a set of nine subgenomic mRNAs carrying a conserved leader sequence, nine transcription-regulatory sequences, and two terminal untranslated regions.
The genome of this virus lacks the haemagglutinin-esterase gene, which is characteristically found in lineage A βCoV.
About two-thirds of the viral RNA, mainly the first ORF, is translated into two polyproteins, pp1a and pp1ab, which are processed into 16 non-structural proteins (NSPs), while the remaining ORFs encode accessory and structural proteins.
The 16 non-structural proteins include two viral cysteine proteases, namely, NSP3 (papain-like protease) and NSP5 (main protease), NSP12 (RNA-dependent RNA polymerase), NSP13 (helicase), and other NSPs which are likely involved in the transcription and replication of the virus.
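The assignments named above can be kept in a simple lookup table. This is a minimal Python sketch containing only the four NSPs whose functions the text states. The remaining NSPs are deliberately left unannotated rather than filled in, and the names `NSP_FUNCTIONS`, `describe`, and `all_nsp_names` are hypothetical.

```python
# Functions of the NSPs explicitly named in the text; others are left unannotated.
NSP_FUNCTIONS = {
    "NSP3": "papain-like protease",
    "NSP5": "main protease",
    "NSP12": "RNA-dependent RNA polymerase",
    "NSP13": "helicase",
}

# The two viral cysteine proteases identified in the text.
VIRAL_PROTEASES = ("NSP3", "NSP5")


def all_nsp_names():
    """Return the names NSP1 through NSP16."""
    return [f"NSP{i}" for i in range(1, 17)]


def describe(nsp: str) -> str:
    """Return the stated function of `nsp`, or a placeholder if the text gives none."""
    return NSP_FUNCTIONS.get(nsp.upper(), "function not specified in the text")


if __name__ == "__main__":
    for name in all_nsp_names():
        print(f"{name}: {describe(name)}")
```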
The rest of the viral genome codes for the four structural proteins (S, E, M, and N) along with a number of accessory proteins that interfere with the host immune response.
The organization of the coronavirus genome is 5′-leader-UTR-replicase-S (Spike)-E (Envelope)-M (Membrane)-N (Nucleocapsid)-3′UTR-poly(A) tail, with accessory genes interspersed among the structural genes at the 3′ end of the genome.
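The linear arrangement above can be represented as an ordered list. This is a minimal Python sketch, assuming only the element order given in the text. Accessory genes are omitted because the text does not give their exact positions, and the names `GENOME_ORDER`, `structural_gene_order`, and `downstream_of` are hypothetical.

```python
# Genome elements from the 5' end to the 3' end, in the order given in the text.
# Accessory genes are omitted: the text says only that they are interspersed
# among the structural genes, without giving positions.
GENOME_ORDER = [
    "5' leader", "5' UTR", "replicase",
    "S", "E", "M", "N",
    "3' UTR", "poly(A) tail",
]

STRUCTURAL_GENES = {"S", "E", "M", "N"}


def structural_gene_order(order):
    """Return the structural genes in their 5'-to-3' order."""
    return [element for element in order if element in STRUCTURAL_GENES]


def downstream_of(element, order):
    """Return every element located 3' of `element`."""
    index = order.index(element)  # raises ValueError if the element is absent
    return order[index + 1:]


if __name__ == "__main__":
    print(structural_gene_order(GENOME_ORDER))       # ['S', 'E', 'M', 'N']
    print(downstream_of("replicase", GENOME_ORDER))
```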
SARS-CoV-2 is closest to the SARS-like bat CoVs in terms of whole-genome sequence. However, mutations are observed in NSP2, NSP3, and the spike protein, which may play a significant role in the infectivity of SARS-CoV-2 and in its differentiation from related viruses.
In addition, two lineages, termed L-type and S-type, have been described. The L lineage was found to be more prevalent than the S lineage among the limited patient samples that were examined. The study states that “The implication of these evolutionary changes on disease etiology remains unclear”.
SARS-CoV-2 is a novel coronavirus that has caused a global pandemic of COVID-19, a respiratory disease with severe complications and high mortality. The structure and genome of SARS-CoV-2 have been extensively studied to understand its origin, evolution, transmission, pathogenesis, and potential targets for diagnosis, treatment, and prevention. The spike glycoprotein is the key viral protein that mediates the entry of the virus into host cells by binding to the ACE2 receptor. The spike protein also elicits neutralizing antibodies and cellular immune responses in the host, making it a promising candidate for vaccine development. The genome of SARS-CoV-2 is a positive-sense single-stranded RNA that encodes 16 non-structural proteins and four structural proteins, as well as several accessory proteins that modulate the host immune response. The genome of SARS-CoV-2 is highly similar to those of SARS-like bat coronaviruses, but with some mutations and deletions that confer distinct features and functions.
Despite the rapid progress in the characterization of SARS-CoV-2, many questions remain unanswered and challenges persist. For example:
- What are the molecular mechanisms of viral replication, transcription, and translation in host cells?
- How does the virus evade or manipulate the innate and adaptive immune responses of the host?
- What are the factors that determine the virulence, transmissibility, and tropism of the virus?
- How does the virus interact with other co-infections or co-morbidities in the host?
- How genetically diverse is the virus, and how does it evolve in different populations and environments?
- How can we develop effective and safe vaccines and therapeutics against the virus?
- How can we improve the surveillance, diagnosis, and prevention of COVID-19 outbreaks?
These are some of the important research directions that need to be pursued in order to combat this emerging threat to global health. The scientific community has shown remarkable collaboration and innovation in responding to this unprecedented challenge. It is hoped that through continued efforts and cooperation, we can overcome this pandemic and prepare for future ones.