
source link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7217285/
RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses

Introduction

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has caused a rapidly expanding global pandemic, with the COVID-19 outbreak responsible at this time for over 600,000 cases and 25,000 deaths. The emergence of this pandemic has revealed an urgent need for diagnostic and antiviral strategies targeting SARS-CoV-2. Like other coronaviruses, SARS-CoV-2 is a positive-sense RNA virus with a large RNA genome approaching 30 kilobases in length. Its RNA genome contains protein-coding open reading frames (ORFs) for the viral replication machinery, structural proteins, and accessory proteins. The genome additionally harbors various cis-acting RNA elements, with structures in the 5´ and 3´ untranslated regions (UTRs) guiding viral replication, RNA synthesis, and viral packaging.1 Conserved RNA elements offer compelling targets for diagnostics. In addition, such RNA elements may be useful targets for antivirals, a concept supported by the recent development of antisense oligonucleotide therapeutics and small-molecule RNA-targeting drugs for a variety of targets across infectious and chronic diseases.2–4

Conserved structured RNA regions have already been shown to play critical functional roles in the life cycles of coronaviruses. Most coronavirus 5´ UTRs harbor at least four stem loops, with many showing heightened sequence conservation across betacoronaviruses, and various stems demonstrating functional roles in viral replication.5 Furthermore, RNA secondary structure in the 5´ UTR exposes a critical sequence motif, the transcriptional regulatory sequence (TRS), that forms long-range RNA interactions necessary for facilitating the discontinuous transcription characteristic of coronaviruses.6 Beyond the 5´ UTR, the frame-shifting element (FSE) in the first protein-coding ORF (ORF1ab) includes a pseudoknot structure that is necessary for the production of ORF1a and ORF1b from two overlapping reading frames via programmed −1 ribosomal frame-shifting.7 In the 3´ UTR, mutually exclusive RNA structures including the 3´ UTR pseudoknot control various stages of the RNA synthesis pathway.8

Beyond these canonical structured regions, the RNA structure of the SARS-CoV-2 genome remains mostly unexplored. Unbiased discovery of other conserved regions and/or structured regions in the virus has the potential to uncover further functional cis-acting RNA elements. Here, we analyze RNA sequence conservation across SARS-related betacoronaviruses and currently available SARS-CoV-2 sequences, and we identify structured and unstructured regions that are conserved in each sequence set; these intervals can provide starting points for a variety of diagnostic and antiviral development strategies (Fig. 1). To identify structured regions, we predict maximum expected accuracy structures around conserved regions and report the support of these single structures from predictions of each RNA’s structural ensemble. We additionally identify thermodynamically stable secondary structures across the whole genome, finding that currently known structures fall within these predictions, but also identifying various new candidate structured regions. We pinpoint unstructured genome intervals by identifying bases with low average base-pairing probabilities. Finally, we present secondary structure models for key RNA structural elements of SARS-CoV-2 annotated in the betacoronavirus family.

Figure 1:

We aim to provide a series of genome regions in SARS-CoV-2 that are useful for a variety of diagnostic and therapeutic strategies, including regions that are (A) conserved in SARS-related betacoronaviruses and SARS-CoV-2 sequences (Table 1), (B) structured and conserved in SARS-CoV-2 sequences (Table 2), and (C) unstructured and conserved in SARS-CoV-2 sequences (Table 3).

Results

RNA sequence conservation in SARS-related betacoronaviruses and SARS-CoV-2

To identify potential regions of conserved RNA secondary structure in the virus, we located stretches of the SARS-CoV-2 genome with high RNA sequence conservation across SARS-related betacoronavirus full genome sequences. By identifying regions with high RNA sequence conservation as a first step, we reasoned that we would be more likely to filter for functionally relevant structures that must be conserved through virus evolution and thereby discover targets that are potentially less likely to develop resistance against therapeutics or to escape diagnosis as the virus evolves. To ensure reasonable numbers of sequences while still focusing on conservation and structure patterns most relevant to the current pandemic, we chose to analyze not all betacoronaviruses but a subgroup of SARS-related betacoronaviruses. These include SARS, SARS-CoV-2, and SARS-related bat coronaviruses, but not MERS, MHV, or other betacoronaviruses, which have been classified into distinct subgroups based on different sequence and structure features in, for example, their 5´ UTRs.9

We carried out this analysis beginning with three different sequence alignments. Each captures a range of complete genome sequences across the SARS-related betacoronaviruses, but they differ in the total number of sequences and in the redundancy of those sequences, as follows:

  1. The first multiple sequence alignment (SARSr-MSA-1) was computed by aligning sequences curated by Ceraolo and Giorgi,10 filtered by including only the reference genome sequence (NC_045512.2)11 from the SARS-CoV-2 sequence set, removing the two MERS sequences, and leaving in all remaining betacoronavirus whole genome sequences. This alignment captures a range of SARS-related bat coronavirus and SARS sequences with only 11 sequences. These sequences correspond well to the SARS-related group defined in Gorbalenya, Baker, et al.12

  2. The second MSA (SARSr-MSA-2) was obtained from BLAST by searching for the 100 complete genome sequences closest to the SARS-CoV-2 reference genome. This alignment captures a larger set of SARS-CoV-2, SARS, and bat coronavirus sequences than SARSr-MSA-1 but includes many sequences with high pairwise similarity.

  3. The final MSA (SARSr-MSA-3) was obtained by locating all complete genome betacoronavirus sequences from the NCBI database, and removing mutually similar sequences with a 99% sequence conservation cutoff. With 180 sequences with at most 99% pairwise sequence similarity, this MSA captures a broader set of betacoronaviruses than SARSr-MSA-1 and SARSr-MSA-2 but is more challenging to align due to higher sequence diversity.

We computed conserved regions as contiguous stretches of 15 nucleotides or longer that were 100% conserved (cutoff for SARSr-MSA-1), 98% conserved (cutoff for SARSr-MSA-2), or 54% conserved (cutoff for SARSr-MSA-3). Searching for conserved regions of 15 nucleotides or more enables the design of antisense oligonucleotides that fall within these stretches. The sequence conservation cutoffs were chosen so that at least 75 candidate conserved stretches were available for further structure analysis for each MSA. When calculating sequence conservation at the 5´ and 3´ ends of the alignment, we excluded sequences that contained only leading or trailing deletions up to that position, to avoid sequencing artefacts.
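The stretch-finding step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `column_conservation` and `conserved_stretches` are hypothetical, conservation is assumed to mean the fraction of sequences sharing the most common non-gap base in a column, and the special handling of leading and trailing deletions is not implemented. Intervals are half-open alignment columns, not reference genome coordinates.

```python
from collections import Counter

def column_conservation(alignment):
    """Per-column fraction of sequences sharing the most common non-gap base.

    `alignment` is a list of equal-length aligned strings; '-' marks a gap.
    Gaps count against conservation (an assumption of this sketch).
    """
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n_seqs)
    return scores

def conserved_stretches(alignment, cutoff=1.0, min_len=15):
    """Half-open (start, end) runs of columns whose conservation >= cutoff."""
    scores = column_conservation(alignment)
    stretches, start = [], None
    # A sentinel score below any cutoff closes a run that reaches the end.
    for i, score in enumerate(scores + [-1.0]):
        if score >= cutoff:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                stretches.append((start, i))
            start = None
    return stretches

# Toy alignment: 16 identical columns, one variable column, 3 identical columns.
toy = [
    "ACGUACGUACGUACGUAAAA",
    "ACGUACGUACGUACGUCAAA",
    "ACGUACGUACGUACGUGAAA",
]
print(conserved_stretches(toy, cutoff=1.0, min_len=15))  # [(0, 16)]
```

Lowering `cutoff` to 0.98 or 0.54 would correspond to the thresholds quoted for SARSr-MSA-2 and SARSr-MSA-3, under the same assumed definition of conservation.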

In Fig. 2, we depict conserved regions (100% conservation cutoff, SARSr-MSA-1) alongside the genome coordinates for the reference SARS-CoV-2 sequence. We observe intervals of conservation in the 5´ UTR and 3´ UTR genome regions, as expected based on prior work demonstrating sequence conservation surrounding structured RNA elements in these regions,13 but we also noted stretches of RNA sequence conservation within some viral ORFs.

Figure 2:

In black we annotate SARSr-MSA-1 conserved regions of the genome, superimposed on SARS-CoV-2 genome ORFs. We depict the top secondary structures as ranked by Matthews correlation coefficient that overlap with these conserved regions, ordered from A to E. Regions A to E are annotated on the genome in yellow and are located at genome positions: A:13743–13798, B:17511–17566, C:28990–29054, D:172–236, E:26–109. Secondary structures are colored by sequence conservation in SARSr-MSA-1 (cyan = more conserved, purple = less conserved). In magenta are depicted curated Rfam families present in coronaviruses, including the frame-shifting element (FSE), the 3´ UTR pseudoknot (PK3), and the 3´ stem-loop II-like motif (s2m). Figures prepared in Geneious41 and draw_rna (https://github.com/DasLab/draw_rna).

Interestingly, in SARSr-MSA-1 and SARSr-MSA-2 we found that conserved stretches overlapped with previously curated Rfam14 families for Coronaviridae RNA secondary structures: the frameshifting stimulation element (Rfam family RF00507), the 3´ UTR pseudoknot (Rfam family RF00165), and the 3´ stem-loop II-like motif (Rfam family RF00164) (Fig. 2). Locations for the frameshifting stimulation element, 3´ UTR pseudoknot, and 3´ stem-loop II-like motif were confirmed using Infernal,15 with all regions discovered at an E < 1e-4 threshold. We also found overlap between conserved stretches and additional 5´ UTR structures that have been established for previous coronaviruses, including the original SARS virus: stem loops 2–3 (SL2–3) and stem loop 5 (SL5).16 These five known RNA structures overlap with conserved regions more than expected by chance; in 10,000 random trials, the chance that five randomly chosen intervals of these lengths all overlap with the conserved regions from SARSr-MSA-1 or SARSr-MSA-2 is less than 0.0003. The enrichment of known RNA structures in these conserved regions suggests that other conserved regions may also harbor RNA structures.
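A random-placement test of this kind can be sketched as follows. This is an illustrative Monte Carlo estimate under stated assumptions, not the authors' procedure: intervals are placed uniformly along the genome, "overlap" means sharing at least one position, and the function names and the example coordinates are hypothetical.

```python
import random

def overlaps_any(start, end, regions):
    """True if half-open [start, end) shares a position with any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)

def all_overlap_probability(lengths, regions, genome_len, n_trials=10_000, seed=0):
    """Estimate the chance that randomly placed intervals ALL hit a region.

    Each trial places one interval per entry in `lengths` uniformly at random
    and counts the trial as a hit only if every interval overlaps `regions`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        every_interval_overlaps = True
        for length in lengths:
            start = rng.randrange(genome_len - length + 1)
            if not overlaps_any(start, start + length, regions):
                every_interval_overlaps = False
                break
        hits += every_interval_overlaps
    return hits / n_trials

# Made-up conserved regions and structure lengths, for illustration only.
conserved = [(100, 130), (500, 520)]
estimate = all_overlap_probability([50] * 5, conserved, genome_len=30_000)
print(estimate)
```

With only 10,000 trials the estimate cannot resolve probabilities much below 1e-4, which is consistent with reporting the result as an upper bound.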

To further tighten this list of conserved sequences to ones most relevant to the current COVID-19 outbreak, we analyzed whether sequence regions conserved across SARS and bat coronaviruses remain conserved in the SARS-CoV-2 strains, most of which emerged after our analysis above (Fig. 1A). We determined the conservation of conserved genome regions from SARSr-MSA-1 across SARS-CoV-2 sequences deposited as of March 18, 2020. For this analysis, we obtained two whole-genome multiple sequence alignments, keeping only full-length genome sequences of at least 29,000 nucleotides in both cases: the first includes 103 NCBI sequences (SARS-CoV-2-MSA-1), and the second includes 739 sequences deposited to GISAID17 (SARS-CoV-2-MSA-2). We noted that conserved regions in the betacoronavirus alignment SARSr-MSA-1 were more likely to be at least 99% conserved in both SARS-CoV-2-MSA-1 and SARS-CoV-2-MSA-2 than random intervals of the same size (binomial test p-value < 1e-5). Table 1 lists these regions, which we term the SARS-related-conserved regions. These genome regions are conserved across the betacoronavirus sequences in SARSr-MSA-1 and have at least 99% sequence conservation across whole-genome sequences from the SARS-CoV-2 outbreak as of March 18, 2020 (SARS-CoV-2-MSA-2).

Table 1:

SARS-related-conserved.

Conserved regions across SARSr-MSA-1 and SARS-CoV-2-MSA-2. All intervals are at least 90% conserved across the SARS and bat coronavirus sequences in SARSr-MSA-1, have length at least 15 nucleotides, and have every position at least 99% conserved in current GISAID SARS-CoV-2 sequences (SARS-CoV-2-MSA-2). Sequence intervals are relative to the reference genome NC_045512.2.

Name | Interval | Sequence | Conservation in SARS-CoV-2
SARS-related-conserved-1 | 14060–14075 | UAGAUAAUCAAGAUCU | 0.995896
SARS-related-conserved-2 | 15838–15857 | UGGACUGAGACUGACCUUAC | 0.995896
SARS-related-conserved-3 | 28554–28569 | UUCGUGGUGGUGACGG | 0.995868
SARS-related-conserved-4 | 28513–28546 | AGAUGACCAAAUUGGCUACUACCGAAGAGCUACC | 0.995868
SARS-related-conserved-5 | 16153–16169 | GUUAUGCUUACUAAUGA | 0.994528
SARS-related-conserved-6 | 27183–27212 | GUACAGUAAGUGACAACAGAUGUUUCAUCU | 0.99449
SARS-related-conserved-7 | 27165–27181 | GACAAUAUUGCUUUGCU | 0.99449
SARS-related-conserved-8 | 25511–25530 | CACUCCCUUUCGGAUGGCUU | 0.99449
SARS-related-conserved-9 | 25393–25409 | AUGGAUUUGUUUAUGAG | 0.99449
SARS-related-conserved-10 | 12905–12924 | AGGUUUGUUACAGACACACC | 0.99316
SARS-related-conserved-11 | 13346–13361 | GUGGGUUUUACACUUA | 0.99316
SARS-related-conserved-12 | 15496–15518 | ACAACUGCUUAUGCUAAUAGUGU | 0.99316
SARS-related-conserved-13 | 28799–28818 | AGCAGAGGCGGCAGUCAAGC | 0.993113
SARS-related-conserved-14 | 27457–27473 | GAGUGUGUUAGAGGUAC | 0.993113
SARS-related-conserved-15 | 25547–25562 | UUCUUGCUGUUUUUCA | 0.993113
SARS-related-conserved-16 | 17089–17105 | GGUACUGGUAAGAGUCA | 0.991792
SARS-related-conserved-17 | 17956–17975 | UGCAUAAUGUCUGAUAGAGA | 0.991792
SARS-related-conserved-18 | 18034–18050 | UUACAAGCUGAAAAUGU | 0.991792
SARS-related-conserved-19 | 704–723 | GACGAGCUUGGCACUGAUCC | 0.991792
SARS-related-conserved-20 | 25376–25392 | UACACAUAAACGAACUU | 0.991736
SARS-related-conserved-21 | 10406–10422 | UACAAUGGUUCACCAUC | 0.990437
SARS-related-conserved-22 | 16364–16388 | UGUCUGUUAAUCCGUAUGUUUGCAA | 0.990424
SARS-related-conserved-23 | 15622–15644 | UAUGAGUGUCUCUAUAGAAAUAG | 0.990424
SARS-related-conserved-24 | 15349–15367 | GUUCUUGCUCGCAAACAUA | 0.990424
SARS-related-conserved-25 | 15301–15323 | AAAUGUGAUAGAGCCAUGCCUAA | 0.990424
SARS-related-conserved-26 | 14077–14099 | AAUGGUAACUGGUAUGAUUUCGG | 0.990424
SARS-related-conserved-27 | 741–756 | AAAACUGGAACACUAA | 0.990424
SARS-related-conserved-28 | 25106–25128 | GAAAUUGACCGCCUCAAUGAGGU | 0.990358
SARS-related-conserved-29 | 26232–26267 | UGAGUACGAACUUAUGUACUCAUUCGUUUCGGAAGA | 0.990358
SARS-related-conserved-30 | 28270–28293 | UAAAAUGUCUGAUAAUGGACCCCA | 0.990358

Conservation percentages for SARSr-MSA-1, SARSr-MSA-2, SARS-CoV-2-MSA-1, and SARS-CoV-2-MSA-2 are included in Supplementary File 1. We expect that some diagnostic and therapeutic strategies will benefit from focusing on conserved regions across a broad range of betacoronaviruses, whereas others may benefit from focusing on regions conserved only in SARS-CoV-2; we revisit the latter category of SARS-CoV-2-unique regions below.

Predictions for structured regions in SARS-CoV-2

The intrinsic RNA structure of a conserved genome region is of interest in current medical research (Fig. 1B). On one hand, stable secondary structure domains are candidates for harboring stereotyped 3D RNA folds that present targets for small-molecule drug therapeutics. On the other hand, if an RNA region is sufficiently unstructured to allow binding by hybridization probes, antisense oligonucleotides may be used to disrupt these functional structures. Such unstructured stretches may also be more likely to be accessible to diagnostic and antiviral interventions including standard RT-PCR assays.

We used two approaches to make predictions for conserved structured regions in SARS-CoV-2. First, we predicted RNA structures centered on the most sequence-conserved regions of SARS-related betacoronavirus genomes (alignment SARSr-MSA-1). For each conserved stretch (at least 15 nucleotides long, 100% sequence conservation) along with 20-nucleotide flanking windows, we predicted maximum expected accuracy (MEA) secondary structures using Contrafold (version 2.0).18 We then sought to rank sequences based on the predicted probability that the RNA folds into the MEA structure and not other structures. For this ranking, we used the estimated Matthews correlation coefficient (MCC) from each construct’s base-pairing probability matrix.19 We note here that while MCC is often used in the RNA structure modeling literature to assess agreement of a prediction with a reference structure, we here use the metric to assess how tightly the ensemble of predicted secondary structures is concentrated around a single predicted secondary structure, the MEA structure. An MEA structure with a higher estimated MCC is expected to have unpaired and paired bases that better align with the construct’s predicted ensemble base-pairing probabilities, lending support to the single-structure MEA prediction. In Fig. 2 regions A–E, we display the five conserved regions with the top MEA secondary structures as ranked by the estimated MCC (all regions listed in Supplementary File 1). Regions D and E occurred within the 5´ UTR and correspond to known SARS-related virus stem loops SL5a and SL2, respectively. Interestingly, region A is close to but does not overlap with the frameshifting stimulation element; it lies 200 nucleotides downstream of the FSE and could perhaps be involved in a more elaborate structure, as has been described for human coronavirus 229E and other coronaviruses.20
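One plausible way to compute an estimated MCC from a base-pairing probability matrix is sketched below. It is an assumption of this sketch, not a statement of the exact definition used in the cited work, that the expected confusion-matrix entries are taken over all candidate base pairs (i, j) with i < j: a predicted pair contributes its probability to the expected true positives and the remainder to the expected false positives. The function names and the dictionary input format are hypothetical, and pseudoknot brackets are not handled.

```python
import math

def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs, i < j, in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs

def estimated_mcc(structure, bpp):
    """Expected MCC of a single structure under pairing probabilities `bpp`.

    `bpp` maps (i, j) with i < j to the ensemble probability of that pair;
    missing keys are treated as probability zero.
    """
    n = len(structure)
    pairs = pairs_from_dot_bracket(structure)
    total_candidates = n * (n - 1) / 2
    tp = sum(bpp.get(pair, 0.0) for pair in pairs)  # predicted and probable
    fp = len(pairs) - tp                             # predicted, improbable
    fn = sum(bpp.values()) - tp                      # probable, not predicted
    tn = total_candidates - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

# A hairpin whose two pairs are certain in the ensemble scores 1.0;
# halving the pair probabilities lowers the score.
print(estimated_mcc("((...))", {(0, 6): 1.0, (1, 5): 1.0}))
print(estimated_mcc("((...))", {(0, 6): 0.5, (1, 5): 0.5}))
```

Used this way, the score measures how much of the ensemble's pairing probability is captured by the single MEA structure, which matches the ranking purpose described above.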

We also sought independent methods to identify thermodynamically stable and conserved RNA structures, without initially guiding the search to focus on extremely sequence-conserved genome regions. We made predictions for structured regions using RNAz21, beginning with the betacoronavirus alignment SARSr-MSA-1. RNAz predicts structured regions that are more thermodynamically stable than expected by comparison to random sequences of the same length and sequence composition (z-score), and additionally assesses regions by the support of compensatory and consistent mutations in the sequence alignment (SCI score). These two criteria are combined into a single P-score, which when tested empirically on a set of ncRNAs produced a false-positive rate of 4% at a P>0.5 cutoff and 1% at a P>0.9 cutoff. To predict structured regions across the full viral genome, we scanned the SARSr-MSA-1 alignment in windows of length 120 nucleotides sliding by 40 nucleotides, predicted all RNAz hits in the plus strand at a P>0.5 cutoff, clustered the resulting hits to generate maximally contiguous loci of the genome with predicted structure, and filtered results to only include loci with at least one window with a P>0.9 structure prediction.
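The windowing, clustering, and filtering logic around the RNAz calls can be sketched as below. This does not run RNAz itself: each hit is assumed to be already available as a `(start, end, p)` tuple, the P values in the example are invented, and the function names are hypothetical. Merging windows that touch or overlap is this sketch's reading of "maximally contiguous loci".

```python
def sliding_windows(aln_len, window=120, step=40):
    """Half-open window coordinates of fixed length, sliding by `step`."""
    if aln_len <= window:
        return [(0, aln_len)]
    return [(s, s + window) for s in range(0, aln_len - window + 1, step)]

def cluster_hits(hits, p_hit=0.5, p_locus=0.9):
    """Merge scored windows into loci and keep the well-supported ones.

    `hits` is a list of (start, end, p). Windows with p > p_hit are merged
    when they overlap or touch; a locus is kept only if at least one of its
    windows has p > p_locus.
    """
    kept = sorted(h for h in hits if h[2] > p_hit)
    loci = []
    for start, end, p in kept:
        if loci and start <= loci[-1][1]:
            prev_start, prev_end, prev_p = loci[-1]
            loci[-1] = (prev_start, max(prev_end, end), max(prev_p, p))
        else:
            loci.append((start, end, p))
    return [(s, e) for s, e, best_p in loci if best_p > p_locus]

# Invented scores for illustration; real values would come from RNAz output.
example_hits = [
    (0, 120, 0.95), (40, 160, 0.60), (80, 200, 0.30),
    (200, 320, 0.70), (400, 520, 0.99),
]
print(sliding_windows(200))        # [(0, 120), (40, 160), (80, 200)]
print(cluster_hits(example_hits))  # [(0, 160), (400, 520)]
```

In the example, the locus at 200–320 is dropped because its only window scores 0.70, below the 0.9 requirement for at least one window per locus.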

The RNAz approach led to the prediction of 44 structured genome loci comprising 117 windows with predicted structure (P>0.9), with these loci covering 46% of the SARS-CoV-2 genome (Fig. 3). We found that five canonical RNA structures (the frameshifting element, the 3´ UTR pseudoknot, the 3´ UTR hypervariable region, 5´ UTR SL2–3, and 5´ UTR SL5) were present in these loci. Additionally, conserved SARS-CoV-2 regions overlap significantly with predicted RNAz loci, with 62 of 78 SARS-CoV-2 conserved intervals at a 97% sequence conservation cutoff overlapping by at least 15 nucleotides with RNAz loci. This enrichment is statistically significant (p-value < 0.001 from comparisons to 10,000 random placements of conserved intervals). This enrichment also holds when considering overlaps with conserved regions from SARSr-MSA-1; 124 of the 229 SARSr-MSA-1 conserved intervals at a 90% conservation cutoff overlap by at least 15 nucleotides with RNAz loci (p-value 0.0038). This analysis potentially expands the set of conserved structural regions of SARS-CoV-2 beyond known Rfam families and those noted in the literature (full set of RNAz loci in Supplementary File 1). Top-scoring structured windows from RNAz that overlap with conserved sequence regions in SARS-CoV-2-MSA-2 for at least 15 nucleotides are included in Table 2; we termed these SARS-CoV-2-conserved-structured regions. Overlapping intervals between the RNAz predictions and conserved sequence regions in SARSr-MSA-1 are included in Supplementary File 1.

Figure 3:

Structured (cyan) and unstructured (yellow) intervals on the genome ORFs for SARS-CoV-2, predicted from RNAz and a Contrafold 2.0 analysis, respectively. (A-C) highlight the three secondary structures for windows that do not overlap with known Rfam or literature-annotated structures with the highest P-value scores from RNAz (all P>0.9). These windows are located at genome positions 14207–14366 (A), 17126–17245 (B), and 26176–26295 (C). Secondary structures are colored by sequence conservation (cyan = more conserved, purple = less conserved). Figures prepared in Geneious41 and draw_rna (https://github.com/DasLab/draw_rna).

Table 2:

SARS-CoV-2-conserved-structured.

RNAz windows as scored by the P-value (P>0.9) that overlap with conserved intervals from SARS-CoV-2-MSA-2 (97% conservation cutoff) by at least 15 nucleotides. Sequence intervals are relative to the reference genome NC_045512.2.

Name | Sequence interval | Sequence | Secondary structure | z-score | P value
SARS-CoV-2-conserved-structured-1 | 918–1037 | ACUGGACUUUAUUGACACUAAGAGGGGUGUAUACUGCUGCCGUGAACAUGAGCAUGAAAUUGCUUGGUACACGGAACGUUCUGAAAAGAGCUAUGAAUUGCAGACACCUUUUGAAAUUAA | .....................(((((((((...((((..(((((.((.((((((......)))))))).)))))...(((((....)))))........)))))))))))))........ | −4.38 | 0.999
SARS-CoV-2-conserved-structured-2 | 14206–14325 | AUGUUGACACUGACUUAACAAAGCCUUACAUUAAGUGGGAUUUGUUAAAAUAUGACUUCACGGAAGAGAGGUUAAAACUCUUUGACCGUUAUUUUAAAUAUUGGGAUCAGACAUACCACC | (((((((..(..(.((((((((..(((((.....))))).))))))))...........((((.((((((.......))))))..))))...........)..)..))).))))...... | −1.99 | 0.999
SARS-CoV-2-conserved-structured-3 | 17125–17244 | UCUACUACCCUUCUGCUCGCAUAGUGUAUACAGCUUGCUCUCAUGCCGCUGUUGAUGCACUAUGUGAGAAGGCAUUAAAAUAUUUGCCUAUAGAUAAAUGUAGUAGAAUUAUACCUGCAC | ((((((((...(((((((((((((((((((((((..((......)).)))))..)))))))))))))).(((((..........))))).)))).....))))))))............. | −6.85 | 0.999
SARS-CoV-2-conserved-structured-4 | 7011–7130 | UUUAGGUGUUUUAAUGUCUAAUUUAGGCAUGCCUUCUUACUGUACUGGUUACAGAGAAGGCUAUUUGAACUCUACUAAUGUCACUAUUGCAACCUACUGUACUGGUUCUAUACCUUGUAG | ..(((((((..(((((((((...)))))))(((((((..(((((.....)))))))))))).......................))..)).)))))...(((.(((.....)))..))). | −1.36 | 0.999
SARS-CoV-2-conserved-structured-5 | 26135–26254 | AAUCGACGGUUCAUCCGGAGUUGUUAAUCCAGUAAUGGAACCAAUUUAUGAUGAACCGACGACGACUACUAGCGUGCCUUUGUAAGCACAAGCUGAUGAGUACGAACUUAUGUACUCAUU | ..(((.(((((((((..((((((....((((....))))..))))))..))))))))).))).......((((((((........))))..))))(((((((((......))))))))). | −5.44 | 0.999
SARS-CoV-2-conserved-structured-6 | 26175–26294 | CCAAUUUAUGAUGAACCGACGACGACUACUAGCGUGCCUUUGUAAGCACAAGCUGAUGAGUACGAACUUAUGUACUCAUUCGUUUCGGAAGAGACAGGUACGUUAAUAGUUAAUAGCGUA | ..................(((..(((((.((((((((((.....(((....)))((((((((((......)))))))))).(((((....))))))))))))))).))))).....))). | −4.44 | 0.999
SARS-CoV-2-conserved-structured-7 | 6251–6370 | AUGCAACUAAUAAAGCCACGUAUAAACCAAAUACCUGGUGUAUACGUUGUCUUUGGAGCACAAAACCAGUUGAAACAUCAAAUUCGUUUGAUGUACUGAAGUCAGAGGACGCGCAGGGAA | .(((......((((((.(((((((.((((......)))).))))))).).)))))..((.......((((....((((((((....))))))))))))..(((....))))))))..... | −2.52 | 0.999
SARS-CoV-2-conserved-structured-8 | 40–157 | UUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGAACUUUAAAAUCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAAUUACUGUCGUUGACA | ....(((((.....))))).........(((((..........(((((((..((.((((.(((.....))).))))))..)))))))..((((((.....))))))...))))).... | −2.91 | 0.998
SARS-CoV-2-conserved-structured-9 | 26491–26607 | AGUUUUUCUGUUUGGAACUUUAAUUUUAGCCAUGGCAGAUUCCAACGGUACUAUUACCGUUGAAGAGCUUAAAAAGCUCCUUGAACAAUGGAACCUAGUAAUAGGUUUCCUAUUCCU | .((((.(((((((((..((........))))).))))))...((((((((....))))))))..(((((.....)))))...))))...((((((((....))))))))........ | −3.52 | 0.998
SARS-CoV-2-conserved-structured-10 | 25895–26014 | CAUUACUUCAGGUGAUGGCACAACAAGUCCUAUUUCUGAACAUGACUACCAGAUUGGUGGUUAUACUGAAAAAUGGGAAUCUGGAGUAAAAGACUGUGUUGUAUUACACAGUUACUUCAC | ..(((((((((((..((......))..((((((((.(.(..(((((((((.....)))))))))..).).)))))))))))))))))))..((((((((......))))))))....... | −4.4 | 0.998
SARS-CoV-2-conserved-structured-11 | 5211–5330 | AUUAAAUCACACUAAAAAGUGGAAAUACCCACAAGUUAAUGGUUUAACUUCUAUUAAAUGGGCAGAUAACAACUGUUAUCUUGCCACUGCAUUGUUAACACUCCAACAAAUAGAGUUGAA | .........((.(((...((((......))))...))).)).(((((((.(((((.....((((((((((....)))))).)))).......((((........)))))))))))))))) | −2.76 | 0.998
SARS-CoV-2-conserved-structured-12 | 26215–26334 | UGUAAGCACAAGCUGAUGAGUACGAACUUAUGUACUCAUUCGUUUCGGAAGAGACAGGUACGUUAAUAGUUAAUAGCGUACUUCUUUUUCUUGCUUUCGUGGUAUUCUUGCUAGUUACAC | ((((((((.((((.((((((((((......)))))))))).)))).((((((((..(((((((((........))))))))))))))))).))).....(((((....))))).))))). | −3.49 | 0.998
SARS-CoV-2-conserved-structured-13 | 6211–6330 | GCUAAAUUGUUACAUAAACCUAUUGUUUGGCAUGUUAACAAUGCAACUAAUAAAGCCACGUAUAAACCAAAUACCUGGUGUAUACGUUGUCUUUGGAGCACAAAACCAGUUGAAACAUCA | (((..((((((((((...((........)).))).)))))))........((((((.(((((((.((((......)))).))))))).).))))).)))..................... | −1.03 | 0.998
SARS-CoV-2-conserved-structured-14 | 16445–16564 | UUAUUGUAAAUCACAUAAACCACCCAUUAGUUUUCCAUUGUGUGCUAAUGGACAAGUUUUUGGUUUAUAUAAAAAUACAUGUGUUGGUAGCGAUAAUGUUACUGACUUUAAUGCAAUUGC | ....(((((((((...((((...((((((((............))))))))....)))).)))))))))........(((..((..(((((......)))))..))....)))....... | −1.17 | 0.997
SARS-CoV-2-conserved-structured-15 | 7051–7170 | UGUACUGGUUACAGAGAAGGCUAUUUGAACUCUACUAAUGUCACUAUUGCAACCUACUGUACUGGUUCUAUACCUUGUAGUGUUUGUCUUAGUGGUUUAGAUUCUUUAGACACCUAUCCU | (((.(((....))).(((((..((((((((...((....))(((((..(((((((((......(((.....)))..)))).).))))..))))))))))))))))))..)))........ | 0.99 | 0.996
SARS-CoV-2-conserved-structured-16 | 158–277 | GGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUG | .....((((...(((.((((((((.....((((((.(((((......)))))..)))))).........(((((((.((......)))))))))(((....)))))))))))))).)))) | −2.87 | 0.996
SARS-CoV-2-conserved-structured-17 | 23977–24096 | UUACCAGAUCCAUCAAAACCAAGCAAGAGGUCAUUUAUUGAAGAUCUACUUUUCAACAAAGUGACACUUGCAGAUGCUGGCUUCAUCAAACAAUAUGGUGAUUGCCUUGGUGAUAUUGCU | (((((((...............(((((..(((((((.(((((((.....)))))))..))))))).))))).......(((.((((((.......))))))..))))))))))....... | −2.82 | 0.994
SARS-CoV-2-conserved-structured-18 | 25855–25974 | ACUAUUGUAUACCUUACAAUAGUGUAACUUCUUCAAUUGUCAUUACUUCAGGUGAUGGCACAACAAGUCCUAUUUCUGAACAUGACUACCAGAUUGGUGGUUAUACUGAAAAAUGGGAAU | (((((((((.....)))))))))..............(((((((((.....))))))))).......((((((((.(.(..(((((((((.....)))))))))..).).)))))))).. | −4.25 | 0.994
SARS-CoV-2-conserved-structured-19 | 2943–3062 | GGGCAUUGAUUUAGAUGAGUGGAGUAUGGCUACAUACUACUUAUUUGAUGAGUCUGGUGAGUUUAAAUUGGCUUCACAUAUGUAUUGUUCUUUCUACCCUCCAGAUGAGGAUGAAGAAGA | ..((((((..((((((((((..((((((....))))))))))))))))((((.(..((........))..).)))))).))))....((((((....((((.....))))..)))))).. | −0.32 | 0.994
SARS-CoV-2-conserved-structured-20 | 14726–14845 | UAAGGAAGGAAGUUCUGUUGAAUUAAAACACUUCUUCUUUGCUCAGGAUGGUAAUGCUGCUAUCAGCGAUUAUGACUACUAUCGUUAUAAUCUACCAACAAUGUGUGAUAUCAGACAACU | ..((.(((((((...((((.......))))...))))))).)).....(((((..((((....))))(((((((((.......))))))))))))))....(((.((....)).)))... | −3.02 | 0.994
SARS-CoV-2-conserved-structured-21 | 26375–26490 | CAAUAUUGUUAACGUGAGUCUUGUAAAACCUUCUUUUUACGUUUACUCUCGUGUUAAAAAUCUGAAUUCUUCUAGAGUUCCUGAUCUUCUGGUCUAAACGAACUAAAUAUUAUAUU | .((((((....(((.((((..(((((((......)))))))...)))).)))(((........((((((.....))))))..((((....))))..)))......))))))..... | −2.73 | 0.994
SARS-CoV-2-conserved-structured-22 | 21365–21484 | AAUUCAGUUGUCUUCCUAUUCUUUAUUUGACAUGAGUAAAUUUCCCCUUAAAUUAAGGGGUACUGCUGUUAUGUCUUUAAAAGAAGGUCAAAUCAAUGAUAUGAUUUUAUCUCUUCUUAG | ..................((((((....((((((((((.....(((((((...)))))))...))))..))))))....))))))(((.((((((......)))))).)))......... | −2.27 | 0.994
SARS-CoV-2-conserved-structured-23 | 2158–2277 | GAAGAGAAGUUUAAGGAAGGUGUAGAGUUUCUUAGAGACGGUUGGGAAAUUGUUAAAUUUAUCUCAACCUGUGCUUGUGAAAUUGUCGGUGGACAAAUUGUCACCUGUGCAAAGGAAAUU | ........((((.((((((.(....).))))))..))))((((((((((((....))))..))))))))((..(..((((..(((((....)))))....))))..)..))......... | −1.62 | 0.993
SARS-CoV-2-conserved-structured-24 | 4174–4293 | AAAAAGGCUGGUGGCACUACUGAAAUGCUAGCGAAAGCUUUGAGAAAAGUGCCAACAGACAAUUAUAUAACCACUUACCCGGGUCAGGGUUUAAAUGGUUACACUGUAGAGGAGGCAAAG | .....(.(((.(((((((.((.((.....(((....))))).))...))))))).))).)..(((((((((((...((((......)))).....))))))...)))))........... | −1.33 | 0.993
SARS-CoV-2-conserved-structured-25 | 26528–26647 | GAUUCCAACGGUACUAUUACCGUUGAAGAGCUUAAAAAGCUCCUUGAACAAUGGAACCUAGUAAUAGGUUUCCUAUUCCUUACAUGGAUUUGUCUUCUACAAUUUGCCUAUGCCAACAGG | .....((((((((....))))))))..(((((.....)))))((((......((((((((....))))))))..........(((((..((((.....))))....)).)))....)))) | −3.29 | 0.993
SARS-CoV-2-conserved-structured-26 | 26255–26374 | CGUUUCGGAAGAGACAGGUACGUUAAUAGUUAAUAGCGUACUUCUUUUUCUUGCUUUCGUGGUAUUCUUGCUAGUUACACUAGCCAUCCUUACUGCGCUUCGAUUGUGUGCGUACUGCUG | .(((((....))))).(((((((..((((((...((((((...........((((.....)))).....(((((.....))))).........))))))..))))))..))))))).... | −2.31 | 0.992
SARS-CoV-2-conserved-structured-27 | 5651–5770 | AUCUAGUACAACAGGAGUCACCUUUUGUUAUGAUGUCAGCACCACCUGCUCAGUAUGAACUUAAGCAUGGUACAUUUACUUGUGCUAGUGAGUACACUGGUAAUUACCAGUGUGGUCACU | ...((((((((((((((....))))))))..(((((....((((..((((.(((....)))..))))))))))))).....))))))((((.(((((((((....))))))))).)))). | −2.77 | 0.992
SARS-CoV-2-conserved-structured-28 | 7131–7250 | UGUUUGUCUUAGUGGUUUAGAUUCUUUAGACACCUAUCCUUCUUUAGAAACUAUACAAAUUACCAUUUCAUCUUUUAAAUGGGAUUUAACUGCUUUUGGCUUAGUUGCAGAGUGGUUUUU | .((((((..((((.(((((((...)))))))..(((........)))..)))).))))))..((((((........)))))).....(((..(((...((......)).)))..)))... | −0.12 | 0.992
SARS-CoV-2-conserved-structured-29 | 10970–11089 | AAAGUGCAGUGAAAAGAACAAUCAAGGGUACACACCACUGGUUGUUACUCACAAUUUUGACUUCACUUUUAGUUUUAGUCCAGAGUACUCAAUGGUCUUUGUUCUUUUUUUUGUAUGAAA | ...((((((.(((((((((((.((..(((....)))..))((((.(((((...(((..((((........))))..)))...)))))..)))).....))))))))))).)))))).... | −1.44 | 0.992
SARS-CoV-2-conserved-structured-30 | 28838–28957 | GUAGUCGCAACAGUUCAAGAAAUUCAACUCCAGGCAGCAGUAGGGGAACUUCUCCUGCUAGAAUGGCUGGCAAUGGCGGUGAUGCUGCUCUUGCUUUGCUGCUGCUUGACAGAUUGAACC | .(((((.....((((.((....)).)))).((((((((((((((((.....)))))))).....(((.(((((.(((((.....))))).)))))..)))))))))))...))))).... | −2.64 | 0.992
SARS-CoV-2-conserved-structured-31 | 26095–26214 | AAAUUGUUGAUGAGCCUGAAGAACAUGUCCAAAUUCACACAAUCGACGGUUCAUCCGGAGUUGUUAAUCCAGUAAUGGAACCAAUUUAUGAUGAACCGACGACGACUACUAGCGUGCCUU | .....((((.((..(.((.....)).)..))...........(((.(((((((((..((((((....((((....))))..))))))..))))))))).))))))).............. | −1.91 | 0.992
SARS-CoV-2-conserved-structured-32 | 2903–3022 | UAAAAACUUUGCAACCAGUAUCUGAAUUACUUACACCACUGGGCAUUGAUUUAGAUGAGUGGAGUAUGGCUACAUACUACUUAUUUGAUGAGUCUGGUGAGUUUAAAUUGGCUUCACAUA | ..............((((((.......)))(((.(((((..(((.((...((((((((((..((((((....)))))))))))))))).)))))..))).)).)))..)))......... | −0.97 | 0.991
SARS-CoV-2-conserved-structured-33 | 29550–29669 | AAGGCAGAUGGGCUAUAUAAACGUUUUCGCUUUUCCGUUUACGAUAUAUAGUCUACUCUUGUGCAGAAUGAAUUCUCGUAACUACAUAGCACAAGUAGAUGUAGUUAACUUUAAUCUCAC | .(((.((.(((((((((((..(((...((......))...))).))))))))))))))))(((.(((.((((......(((((((((.((....))..)))))))))..)))).)))))) | −1.83 | 0.990
SARS-CoV-2-conserved-structured-34 | 80–197 | AAUCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAAUUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACG | ...(((((((..((.((((.(((.....))).))))))..)))))))..((((((.....))))))..((.(((..(((((.(((((...))))).....)))))..))).))..... | −0.8 | 0.990
SARS-CoV-2-conserved-structured-35 | 13886–14005 | CAAUUGUUGUGAUGAUGAUUAUUUCAAUAAAAAGGACUGGUAUGAUUUUGUAGAAAACCCAGAUAUAUUACGCGUAUACGCCAACUUAGGUGAACGUGUACGCCAAGCUUUGUUAAAAAC | ...((((((.(((((...))))).))))))((((..((((..................)))).........(((((((((...((....))...)))))))))....))))......... | −0.37 | 0.989
SARS-CoV-2-conserved-structured-36 | 120–237 | CACGCAGUAUAAUUAAUAACUAAUUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGU | ...((((((..((((.....)))))))))).((((((((((.(((((...))))).....)))))..((((((.(((((......)))))..)))))).....))))).......... | −2.33 | 0.989
SARS-CoV-2-conserved-structured-37 | 23697–23816 | UGCCAUACCCACAAAUUUUACUAUUAGUGUUACCACAGAAAUUCUACCAGUGUCUAUGACCAAGACAUCAGUAGAUUGUACAAUGUACAUUUGUGGUGAUUCAACUGAAUGCAGCAAUCU | .......................((((((((((((((((...(((((..((((((.......))))))..))))).(((((...))))))))))))))))...)))))............ | −2.64 | 0.989
SARS-CoV-2-conserved-structured-38 | 25655–25774 | AACAGUUUACUCACACCUUUUGCUCGUUGCUGCUGGCCUUGAAGCCCCUUUUCUCUAUCUUUAUGCUUUAGUCUACUUCUUGCAGAGUAUAAACUUUGUAAGAAUAAUAAUGAGGCUUUG | ..((((...............((.....)).))))((((((((((...................))))).......((((((((((((....)))))))))))).......))))).... | −1.15 | 0.989
SARS-CoV-2-conserved-structured-39 | 15606–15724 | UUACAACACAGACUUUAUGAGUGUCUCUAUAGAAAUAGAGAUGUUGACACAGACUUUGUGAAUGAGUUUUACGCAUAUUUGCGUAAACAUUUCUCAAUGAUGAUACUCUCUGACGAUGC | ........((((......(((((((..(((.((....((((((((..(((((...))))).........((((((....)))))))))))))))).)))..)))))))))))....... | −1.81 | 0.988
SARS-CoV-2-conserved-structured-40 | 14526–14645 | AGCUCUAGACUUAGUUUUAAGGAAUUACUUGUGUAUGCUGCUGACCCUGCUAUGCACGCUGCUUCUGGUAAUCUAUUACUAGAUAAACGCACUACGUGCUUUUCAGUAGCUGCACUUACU | ...........((((((....))))))...(((((.((((((((.........(((((.(((.((((((((....)))))))).....)))...)))))...))))))))))))).....  | −1.64 | 0.986
SARS-CoV-2-conserved-structured-41 | 3808–3897 | CUCUAUGACAAACUUGUUUCAAGCUUUUUGGAAAUGAAGAGUGAAAAGCAAGUUGAACAAAAGAUCGCUGAGAUUCCUAAAGAGGAAGUU | (((..(((..(((((((((...(((((((......)))))))...)))))))))..........)))..))).(((((....)))))... | −1.32 | 0.983
SARS-CoV-2-conserved-structured-42 | 25935–26054 | CAUGACUACCAGAUUGGUGGUUAUACUGAAAAAUGGGAAUCUGGAGUAAAAGACUGUGUUGUAUUACACAGUUACUUCACUUCAGACUAUUACCAGCUGUACUCAACUCAAUUGAGUACA | .(((((((((.....))))))))).(((...(((((...((((((((((..((((((((......))))))))..)).)))))))))))))..)))..((((((((.....)))))))). | −4.97 | 0.983
SARS-CoV-2-conserved-structured-43 | 15645–15764 | GAUGUUGACACAGACUUUGUGAAUGAGUUUUACGCAUAUUUGCGUAAACAUUUCUCAAUGAUGAUACUCUCUGACGAUGCUGUUGUGUGUUUCAAUAGCACUUAUGCAUCUCAAGGUCUA | (((.((((..((((..(..(.(.((((.((((((((....)))))))).....)))).).)..).....))))..(((((...((.(((((.....))))).)).))))))))).))).. | −2.07 | 0.983
SARS-CoV-2-conserved-structured-44 | 2518–2625 | UUAACAGAGGAAGUUGUCUUGAAAACUGGUGAUUUACAACCAUUAGAACAACCUACUAGUGAAGCUGUUGAAGCUCCAUUGGUUGGUACACCAGUUUGUAUUAACGGG | ............((((......(((((((((.((((.......))))...(((.(((((((.((((.....)))).))))))).))).)))))))))....))))... | −1.93 | 0.982
SARS-CoV-2-conserved-structured-45 | 26568–26687 | UCCUUGAACAAUGGAACCUAGUAAUAGGUUUCCUAUUCCUUACAUGGAUUUGUCUUCUACAAUUUGCCUAUGCCAACAGGAAUAGGUUUUUGUAUAUAAUUAAGUUAAUUUUCCUCUGGC | ..(((((((((.((((((((....))))))))....(((......))).))))....(((((...((((((.((....)).))))))..))))).....)))))................ | −2.13 | 0.982
SARS-CoV-2-conserved-structured-46 | 1238–1357 | AUUGUGGUGAAACUUCAUGGCAGACGGGCGAUUUUGUUAAAGCCACUUGCGAAUUUUGUGGCACUGAGAAUUUGACUAAAGAAGGUGCCACUACUUGUGGUUACUUACCCCAAAAUGCUG | .((((.((((....)))).))))...(((.((((((.....(((((...........)))))....................(((((((((.....)))).)))))....))))))))). | −0.75 | 0.981
SARS-CoV-2-conserved-structured-47 | 6451–6570 | GAGUGUAAUGUGAAAACUACCGAAGUUGUAGGAGACAUUAUACUUAAACCAGCAAAUAAUAGUUUAAAAAUUACAGAAGAGGUUGGCCACACAGAUCUAAUGGCUGCUUAUGUAGACAAU | (((((((((((.....((((.......))))...))))))))))).........................(((((....((((.(((((...........))))))))).)))))..... | −0.83 | 0.981
SARS-CoV-2-conserved-structured-48 | 29078–29197 | AUGUAACACAAGCUUUCGGCAGACGUGGUCCAGAACAAACCCAAGGAAAUUUUGGGGACCAGGAACUAAUCAGACAAGGAACUGAUUACAAACAUUGGCCGCAAAUUGCACAAUUUGCCC | .................(((...(.((((((.(......)(((((.....))))))))))).)...(((((((........)
)))))).........)))(((((((....)))))))..−1.240.980SARS-CoV-2-conserved-structured-4923497–23616CGUGCAGGCUGUUUAAUAGGGGCUGAACAUGUCAACAACUCAUAUGAGUGUGACAUACCCAUUGGUGCAGGUAUAUGCGCUAGUUAUCAGACUCAGACUAAUUCUCCUCGGCGGGCACGU(((((..((((.......(((.((((..((((((...((((....)))).))))))....(((((((((......)))))))))..)))).)))(((.....)))...))))..))))).−2.420.979SARS-CoV-2-conserved-structured-5013686–13805GAAGAAACAAUUUAUAAUUUACUUAAGGAUUGUCCAGCUGUUGCUAAACAUGACUUCUUUAAGUUUAGAAUAGACGGUGACAUGGUACCACAUAUAUCACGUCAACGUCUUACUAAAUAC....................(((((((((.(((..(((....)))..))).....)))))))))((((...(((((.((((.(((((.......))))).)))).)))))..))))....−3.10.978SARS-CoV-2-conserved-structured-5112610–12729AUGGACAAUUCACCUAAUUUAGCAUGGCCUCUUAUUGUAACAGCUUUAAGGGCCAAUUCUGCUGUCAAAUUACAGAAUAAUGAGCUUAGUCCUGUUGCACUACGACAGAUGUCUUGUGCU..((((((((((............(((((.((((..((....))..)))))))))((((((.((......))))))))..)))).)).))))....((((...(((....)))..)))).−1.940.978SARS-CoV-2-conserved-structured-5219645–19764UUGUAAAUAAGGGACACUUUGAUGGACAACAGGGUGAAGUACCAGUUUCUAUCAUUAAUAACACUGUUUACACAAAAGUUGAUGGUGUUGAUGUAGAAUUGUUUGAAAAUAAAACAACAU...(((((((...(((...(((((((.(((.((........)).))))))))))....((((((((((.((......)).)))))))))).)))....)))))))...............−0.890.977SARS-CoV-2-conserved-structured-5311050–11169CAGAGUACUCAAUGGUCUUUGUUCUUUUUUUUGUAUGAAAAUGCCUUUUUACCUUUUGCUAUGGGUAUUAUUGCUAUGUCUGCUUUUGCAAUGAUGUUUGUCAAACAUAAGCAUGCAUUU(((((.(((....))))))))..........((((((.....((..(..(((((........)))))..)..))(((((.(((....))).((((....)))).)))))..))))))...−0.330.977SARS-CoV-2-conserved-structured-5417445–17564CAAUUACCUGCACCACGCACAUUGCUAACUAAGGGCACACUAGAACCAGAAUAUUUCAAUUCAGUGUGUAGACUUAUGAAAACUAUAGGUCCAGACAUGUUCCUCGGAACUUGUCGGCGU.........((.((((((((..((((.......))))...........((((......)))).)))))).((((((((.....))))))))..((((.((((....)))).)))))).))−2.140.974SARS-CoV-2-conserved-structured-5524256–24375GCUAUGCAAAUGGCUUAUAGGUUUAAUGGUAUUGGAGUUACACAGAAUGUUCUCUAUGAGAACCAAAAAUUGAUUGCCAACCAAUUUAAUAGUGCUAUUGGCAAAAUUCAAGACUCACUU(((((....)))))....
................(((((.........((((((...))))))......(((((((((((...(((....)))....)))))))...)))))))))....−0.320.974SARS-CoV-2-conserved-structured-5617525–17644AACUAUAGGUCCAGACAUGUUCCUCGGAACUUGUCGGCGUUGUCCUGCUGAAAUUGUUGACACUGUGAGUGCUUUGGUUUAUGAUAAUAAGCUUAAAGCACAUAAAGACAAAUCAGCUCA......(((.((.((((.((((....)))).))))))......)))(((((..(((((.......((.((((((((((((((....)))))).)))))))).))..))))).)))))...−2.220.973SARS-CoV-2-conserved-structured-5727248–27367AAUUAUUAUGAGGACUUUUAAAGUUUCCAUUUGGAAUCUUGAUUACAUCAUAAACCUCAUAAUUAAAAAUUUAUCUAAGUCACUAACUGAGAAUAAAUAUUCUCAAUUAGAUGAAGAGCA....(((((((((.......(((.((((....)))).)))(((...))).....)))))))))...........((...(((((((.(((((((....))))))).)))).)))..))..−3.720.972SARS-CoV-2-conserved-structured-584574–4693CACUUAACGAUCUAAAUGAAACUCUUGUUACAAUGCCACUUGGCUAUGUAACACAUGGCUUAAAUUUGGAAGAAGCUGCUCGGUAUAUGAGAUCUCUCAAAGUGCCAGCUACAGUUUCUG..........(((((((..((..(.(((((((..(((....)))..)))))))...)..))..)))))))((((((((((.(((((.((((....))))..))))))))...))))))).−3.730.971SARS-CoV-2-conserved-structured-5921325–21444UCAUGCAUGCAAAUUACAUAUUUUGGAGGAAUACAAAUCCAAUUCAGUUGUCUUCCUAUUCUUUAUUUGACAUGAGUAAAUUUCCCCUUAAAUUAAGGGGUACUGCUGUUAUGUCUUUAA.(((((((((((((.....((...((((((..((............))..)))))).)).....))))).))))((((.....(((((((...)))))))...))))..)))).......−1.240.970SARS-CoV-2-conserved-structured-607771–7890CUUUACUUUGAUAAAGCUGGUCAAAAGACUUAUGAAAGACAUUCUCUCUCUCAUUUUGUUAACUUAGACAACCUGAGAGCUA 
AUAACACUAAAGGUUCAUUGCCUAUUAAUGUUAUAGUU.....((((.((((..((.......))..)))).)))).........((((((..(((((......)))))..))))))...((((((.(((((((.....)))).))).))))))....−1.930.967SARS-CoV-2-conserved-structured-6119765–19884UACCUGUUAAUGUAGCAUUUGAGCUUUGGGCUAAGCGCAACAUUAAACCAGUACCAGAGGUGAAAAUACUCAAUAAUUUGGGUGUGGACAUUGCUGCUAAUACUGUGAUCUGGGACUACA..(((.(((((((.((.(((.(((.....)))))).)).))))))).((((((.(((..(((...((((((((....))))))))...)))..)))....))))).)....)))......−0.840.965SARS-CoV-2-conserved-structured-626971–7090UAAGUGUUUGCCUAGGUUCUUUAAUCUACUCAACCGCUGCUUUAGGUGUUUUAAUGUCUAAUUUAGGCAUGCCUUCUUACUGUACUGGUUACAGAGAAGGCUAUUUGAACUCUACUAAUG..((((.(((..((((((....))))))..))).))))....((((.((((..(((((((...)))))))(((((((..(((((.....)))))))))))).....))))))))......−1.90.963SARS-CoV-2-conserved-structured-6328998–29117AGGCCAAACUGUCACUAAGAAAUCUGCUGCUGAGGCUUCUAAGAAGCCUCGGCAAAAACGUACUGCCACUAAAGCAUACAAUGUAACACAAGCUUUCGGCAGACGUGGUCCAGAACAAAC.((((......................((((((((((((...))))))))))))...((((.(((((...(((((................))))).)))))))))))))..........−4.190.963SARS-CoV-2-conserved-structured-645011–5130GUUGUGGACAUGUCAAUGACAUAUGGACAACAGUUUGGUCCAACUUAUUUGGAUGGAGCUGAUGUUACUAAAAUAAAACCUCAUAAUUCACAUGAAGGUAAAACAUUUUAUGUUUUACCU..((((((.(((((...)))))...((((.((((((.((((((.....)))))).)))))).))))....................))))))...(((((((((((...)))))))))))−4.190.963SARS-CoV-2-conserved-structured-654094–4213CUUUCUUAAAGAAAGAUGCUCCAUAUAUAGUGGGUGAUGUUGUUCAAGAGGGUGUUUUAACUGCUGUGGUUAUACCUACUAAAAAGGCUGGUGGCACUACUGAAAUGCUAGCGAAAGCUU..(((((.((.((...(((.((((.....)))))))...)).)).)))))(((((((((......(((((..((((..((.....))..))))..))))))))))))))(((....))).0.40.958SARS-CoV-2-conserved-structured-661598–1717GUCUUAAUGACAACCUUCUUGAAAUACUCCAAAAAGAGAAAGUCAACAUCAAUAUUGUUGGUGACUUUAAACUUAAUGAAGAGAUCGCCAUUAUUUUGGCAUCUUUUUCUGCUUCCACAA(((.....)))...((((((((....(((......)))(((((((.((.((....)).)).)))))))....)))).))))((((.((((......))))))))................−20.955SARS-CoV-2-conserved-structured-6711090–11209AUGCCUUUUUACCUUUUGCUAUGGGUAUUAU
UGCUAUGUCUGCUUUUGCAAUGAUGUUUGUCAAACAUAAGCAUGCAUUUCUCUGUUUGUUUUUGUUACCUUCUCUUGCCACUGUAGCUU.................(((((((.(((((((((...(....)....))))))))).......((((.(((((.(((......))).))))).))))..............)))))))..−1.630.955SARS-CoV-2-conserved-structured-6817965–18084CUGAUAGAGACCUUUAUGACAAGUUGCAAUUUACAAGUCUUGAAAUUCCACGUAGGAAUGUGGCAACUUUACAAGCUGAAAAUGUAACAGGACUCUUUAAAGAUUGUAGUAAGGUAAUCA..(((....(((((....((((((((((.(((...((.(((((((((((((((....)))))).)).))).)))))).))).)))))).....(((....)))))))...))))).))).−0.960.952SARS-CoV-2-conserved-structured-6925815–25934GAUGCCAACUAUUUUCUUUGCUGGCAUACUAAUUGUUACGACUAUUGUAUACCUUACAAUAGUGUAACUUCUUCAAUUGUCAUUACUUCAGGUGAUGGCACAACAAGUCCUAUUUCUGAA.((((((.(..........).)))))).......(((((.(((((((((.....)))))))))))))).........(((((((((.....)))))))))....................−4.170.952SARS-CoV-2-conserved-structured-7014446–14565AUGGUGUUCCAUUUGUAGUUUCAACUGGAUACCACUUCAGAGAGCUAGGUGUUGUACAUAAUCAGGAUGUAAACUUACAUAGCUCUAGACUUAGUUUUAAGGAAUUACUUGUGUAUGCUG..(((((..(((..((((((((..(((((......)))))(((((((.(((...(((((.......)))))....))).)))))))..............))))))))..))).))))).−1.410.948SARS-CoV-2-conserved-structured-712118–2237CACUGUUUAUGAAAAACUCAAACCCGUCCUUGAUUGGCUUGAAGAGAAGUUUAAGGAAGGUGUAGAGUUUCUUAGAGACGGUUGGGAAAUUGUUAAAUUUAUCUCAACCUGUGCUUGUGA(((.((((.(((.((((((..(((..(((((((...((((......))))))))))).)))...)))))).))).))))((((((((((((....))))..)))))))).......))).−2.750.948SARS-CoV-2-conserved-structured-7217205–17324UAUUUGCCUAUAGAUAAAUGUAGUAGAAUUAUACCUGCACGUGCUCGUGUAGAGUGUUUUGAUAAAUUCAAAGUGAAUUCAACAUUAGAACAGUAUGUCUUUUGUACUGUAAAUGCAUUG((((((....))))))((((((((.((((((((((((((((....))))))).))))(((((.....)))))...))))).))......((((((((.....))))))))...)))))).−2.610.947SARS-CoV-2-conserved-structured-737171–7290UCUUUAGAAACUAUACAAAUUACCAUUUCAUCUUUUAAAUGGGAUUUAACUGCUUUUGGCUUAGUUGCAGAGUGGUUUUUGGCAUAUAUUCUUUUCACUAGGUUUUUCUAUGUACUUGGA...(((((((((((..(((((.((((((........)))))))))))..((((..(((...)))..)))).)))))))))))...............((((((..........)))))).−0.530.945SARS-CoV-2-cons
erved-structured-7414766–14885GCUCAGGAUGGUAAUGCUGCUAUCAGCGAUUAUGACUACUAUCGUUAUAAUCUACCAACAAUGUGUGAUAUCAGACAACUACUAUUUGUAGUUGAAGUUGUUGAUAAGUACUUUGAUUGU(((...(((((((....))))))))))(((((((((.......)))))))))...(((((((((.((....)).))((((((.....))))))...))))))).................−2.130.943SARS-CoV-2-conserved-structured-7511730–11849UAUGAAUUCACAGGGACUACUCCCACCCAAGAAUAGCAUAGAUGCCUUCAAACUCAACAUUAAAUUGUUGGGUGUUGGUGGCAAACCUUGUAUCAAAGUAGCCACUGUACAGUCUAAAAU...((....((((((.(((((......((((...........((((.((((((((((((......)))))))).)))).))))...))))......))))))).))))....))......−1.590.942SARS-CoV-2-conserved-structured-7629238–29357GGAAGUCACACCUUCGGGAACGUGGUUGACCUACACAGGUGCCAUCAAAUUGGAUGACAAAGAUCCAAAUUUCAAAGAUCAAGUCAUUUUGCUGAAUAAGCAUAUUGACGCAUACAAAAC....((((.(((..((....)).)))))))........((((.......((((((.......))))))..............((((...((((.....))))...)))))))).......−1.580.941SARS-CoV-2-conserved-structured-771558–1677GGUUGUAACCAUACAGGUGUUGUUGGAGAAGGUUCCGAAGGUCUUAAUGACAACCUUCUUGAAAUACUCCAAAAAGAGAAAGUCAACAUCAAUAUUGUUGGUGACUUUAAACUUAAUGAA(((....))).....((.((..((.(((((((((......(((.....)))))))))))).))..)).))...(((..(((((((.((.((....)).)).)))))))...)))......−1.180.941SARS-CoV-2-conserved-structured-784214–4333UGAGAAAAGUGCCAACAGACAAUUAUAUAACCACUUACCCGGGUCAGGGUUUAAAUGGUUACACUGUAGAGGAGGCAAAGACAGUGCUUAAAAAGUGUAAAAGUGCCUUUUACAUUCUAC..((((.((((................((((((...((((......)))).....))))))))))(((((..(((((...(((.(........).))).....)))))))))).))))..−0.390.940SARS-CoV-2-conserved-structured-799410–9529CUGGUGGUAUUGUAGCUAUCGUAGUAACAUGCCUUGCCUACUAUUUUAUGAGGUUUAGAAGAGCUUUUGGUGAAUACAGUCAUGUAGUUGCCUUUAAUACUUUACUAUUCCUUAUGUCAU..((((((((.((.((((...)))).)))))))..)))........(((((((..(((.((((..((.((..(.((((....)))).)..))...))..)))).)))..)))))))....−1.740.940SARS-CoV-2-conserved-structured-8013806–13925ACAAUGGCAGACCUCGUCUAUGCUUUAAGGCAUUUUGAUGAAGGUAAUUGUGACACAUUAAAAGAAAUACUUGUCACAUACAAUUGUUGUGAUGAUGAUUAUUUCAAUAAAAAGGACUGG(((((.....((((((((.(((((....)))))...)))).)))).)))))............((((((.(..((((((((....)
.)))).)))..).))))))...............−1.730.940SARS-CoV-2-conserved-structured-8123537–23656CAUAUGAGUGUGACAUACCCAUUGGUGCAGGUAUAUGCGCUAGUUAUCAGACUCAGACUAAUUCUCCUCGGCGGGCACGUAGUGUAGCUAGUCAAUCCAUCAUUGCCUACACUAUGUCAC....(((((.(((.......(((((((((......)))))))))..))).)))))..........((.....))(.(((((((((((.((((.........)))).))))))))))))..−0.510.940SARS-CoV-2-conserved-structured-829210–9329GUACUGUAGGCACGGCACUUGUGAAAGAUCAGAAGCUGGUGUUUGUGUAUCUACUAGUGGUAGAUGGGUACUUAACAAUGAUUAUUACAGAUCUUUACCAGGAGUUUUCUGUGGUGUAGA(((((.(((((((.(..(((.(((....))).))).).)))))))..(((((((.....))))))))))))...........(((((((((.((((....))))...)))))))))....−0.310.939SARS-CoV-2-conserved-structured-836091–6210CAGUUAACUGGUUAUAAGAAACCUGCUUCAAGAGAGCUUAAAGUUACAUUUUUCCCUGACUUAAAUGGUGAUGUGGUGGCUAUUGAUUAUAAACACUACACACCCUCUUUUAAGAAAGGA..((.((((((((......)))).((((.....))))....)))))).......(((..((((((.((((.(((((((..(((.....)))..)))))))))))....))))))..))).−2.570.939SARS-CoV-2-conserved-structured-8429118–29237CCAAGGAAAUUUUGGGGACCAGGAACUAAUCAGACAAGGAACUGAUUACAAACAUUGGCCGCAAAUUGCACAAUUUGCCCCCAGCGCUUCAGCGUUCUUCGGAAUGUCGCGCAUUGGCAU((((........(((((.((((....(((((((........)))))))......))))..(((((((....))))))))))))((((....((((((....)))))).)))).))))...−3.140.938SARS-CoV-2-conserved-structured-8528958–29077AGCUUGAGAGCAAAAUGUCUGGUAAAGGCCAACAACAACAAGGCCAAACUGUCACUAAGAAAUCUGCUGCUGAGGCUUCUAAGAAGCCUCGGCAAAAACGUACUGCCACUAAAGCAUACA.((((.((.(((..((((..((((..((((...........))))...((.......)).....))))(((((((((((...)))))))))))....))))..)))..)).)))).....−2.780.935SARS-CoV-2-conserved-structured-8629278–29397GCCAUCAAAUUGGAUGACAAAGAUCCAAAUUUCAAAGAUCAAGUCAUUUUGCUGAAUAAGCAUAUUGACGCAUACAAAACAUUCCCACCAACAGAGCCUAAAAAGGACAAAAAGAAGAAG.........((((((.......)))))).((((.........(((...((((.....))))...))))..........................((((....))).).....))))...−1.580.935SARS-CoV-2-conserved-structured-8714046–14165GGUGUACUGACAUUAGAUAAUCAAGAUCUCAAUGGUAACUGGUAUGAUUUCGGUGAUUUCAUACAAACCACGCCAGGUAGUGGAGUUCCUGUUGUAGAUUCUUAUUAUUCAUUGUUAAUG.......(((((.(..(((((.((((
(((((((((.((((.((((((..((...))..))))))...((((........)))))))).)))))).))).)))))))))..).)))))...−2.050.935SARS-CoV-2-conserved-structured-889250–9369GUUUGUGUAUCUACUAGUGGUAGAUGGGUACUUAACAAUGAUUAUUACAGAUCUUUACCAGGAGUUUUCUGUGGUGUAGAUGCUGUAAAUUUACUUACUAAUAUGUUUACACCACUAAUU(((...((((((((((....))).)))))))..)))..........(((((.((((....))))...)))))(((((((((...((((......))))......))))))))).......−0.530.934SARS-CoV-2-conserved-structured-8926415–26527GUUUACUCUCGUGUUAAAAAUCUGAAUUCUUCUAGAGUUCCUGAUCUUCUGGUCUAAACGAACUAAAUAUUAUAUUAGUUUUUCUGUUUGGAACUUUAAUUUUAGCCAUGGCA........(((((((((((....((.....))(((((((((.((((....)))).(((((((((((........)))))))....))))))))))))).)))))).)))))..−1.940.934SARS-CoV-2-conserved-structured-9014846–14965ACUAUUUGUAGUUGAAGUUGUUGAUAAGUACUUUGAUUGUUACGAUGGUGGCUGUAUUAAUGCUAACCAAGUCAUCGUCAAC AACCUAGACAAAUCAGCUGGUUUUCCAUUUAAUAAAUG..........((((((...((((((..((.((..(.(((((..((((((((((...(((....)))...))))))))))))))))..))))..))))))(((....))))))))).....−0.790.934SARS-CoV-2-conserved-structured-914971–5090GGUGUUUACAACAGUAGACAACAUUAACCUCCACACGCAAGUUGUGGACAUGUCAAUGACAUAUGGACAACAGUUUGGUCCAACUUAUUUGGAUGGAGCUGAUGUUACUAAAAUAAAACC(((((((((....))))))(((((...(((((((((....).)))))).(((((...)))))..))....((((((.((((((.....)))))).)))))))))))...........)))−3.720.932SARS-CoV-2-conserved-structured-929610–9729CUUACUAAUGAUGUUUCUUUUUUAGCACAUAUUCAGUGGAUGGUUAUGUUCACACCUUUAGUACCUUUCUGGAUAACAAUUGCUUAUAUCAUUUGUAUUUCCACAAAGCAUUUCUAUUGG.......(((.((((........))))))).....((((((......)))))).((..(((...((((.((((..((((.((.......)).))))...)))).)))).....)))..))−0.850.932SARS-CoV-2-conserved-structured-935691–5810ACCACCUGCUCAGUAUGAACUUAAGCAUGGUACAUUUACUUGUGCUAGUGAGUACACUGGUAAUUACCAGUGUGGUCACUAUAAACAUAUAACUUCUAAAGAAACUUUGUAUUGCAUAGA...(((((((.(((....)))..)))).)))........((((((((((((.(((((((((....))))))))).)))))).....((((((.(((....)))...)))))).)))))).−3.180.931SARS-CoV-2-conserved-structured-9425016–25135UAGAUAAAUAUUUUAAGAAUCAUACAUCACCAGAUGUUGAUUUAGGUGACAUCUCUGGCAUUAAUGCUUCAGUUGUAAACAUUCAAAAAGAAAUUGACCGCC
UCAAUGAGGUUGCCAAGA.((((...........((((((.(((((....))))))))))).(....))))).(((((........((((((.................))))))..(((((...))))))))))...−1.160.926SARS-CoV-2-conserved-structured-956411–6530AGAAGUAGUGGAAAAUCCUACCAUACAGAAAGACGUUCUUGAGUGUAAUGUGAAAACUACCGAAGUUGUAGGAGACAUUAUACUUAAACCAGCAAAUAAUAGUUUAAAAAUUACAGAAGA....(((((((.....))................(((.(((((((((((((.....((((.......))))...)))))))))))))...)))................)))))......−1.10.922SARS-CoV-2-conserved-structured-9626608–26727UACAUGGAUUUGUCUUCUACAAUUUGCCUAUGCCAACAGGAAUAGGUUUUUGUAUAUAAUUAAGUUAAUUUUCCUCUGGCUGUUAUGGCCAGUAACUUUAGCUUGUUUUGUGCUUGCUGC..((..((.........(((((...((((((.((....)).))))))..)))))......((((((((.......((((((.....)))))).....)))))))).))..))........−1.640.922SARS-CoV-2-conserved-structured-9711010–11129GUUGUUACUCACAAUUUUGACUUCACUUUUAGUUUUAGUCCAGAGUACUCAAUGGUCUUUGUUCUUUUUUUUGUAUGAAAAUGCCUUUUUACCUUUUGCUAUGGGUAUUAUUGCUAUGUC..((.(((((...(((..((((........))))..)))...)))))..))(((((................((((....)))).....(((((........))))).....)))))...0.120.922SARS-CoV-2-conserved-structured-9810850–10969CUUCAUUAAAAGAAUUACUGCAAAAUGGUAUGAAUGGACGUACCAUAUUGGGUAGUGCUUUAUUAGAAGAUGAAUUUACACCUUUUGAUGUUGUUAGACAAUGCUCAGGUGUUACUUUCC((((....((((.((((((.(((.((((((((......)))))))).))))))))).))))....))))..(((...((((((...(.(((((.....))))))..))))))....))).−2.670.920SARS-CoV-2-conserved-structured-9914886–15005UACGAUGGUGGCUGUAUUAAUGCUAACCAAGUCAUCGUCAACAACCUAGACAAAUCAGCUGGUUUUCCAUUUAAUAAAUGGGGUAAGGCUAGACUUUAUUAUGAUUCAAUGAGUUAUGAG...((((((((((...(((....)))...))))))))))...............((((((....(((((((.....)))))))...)))).)).....(((((((((...))))))))).−1.330.919SARS-CoV-2-conserved-structured-10025216–25335GGUUUUAUAGCUGGCUUGAUUGCCAUAGUAAUGGUGACAAUUAUGCUUUGCUGUAUGACCAGUUGCUGUAGUUGUCUCAAGGGCUGUUGUUCUUGUGGAUCCUGCUGCAAAUUUGAUGAA((((.((((((.(((.((((((((((....))))...)))))).)))..)))))).))))..((((.((((..(((((((((((....))))))).)))).)))).))))..........−3.780.916SARS-CoV-2-conserved-structured-10111690–11809GUGUUUAUGAUUACUUAGUUUCUACACAGGAGUUUAGAU
AUAUGAAUUCACAGGGACUACUCCCACCCAAGAAUAGCAUAGAUGCCUUCAAACUCAACAUUAAAUUGUUGGGUGUUGGUG(((((((((....(((.............(((((((......)))))))...((((....))))....))).....)))))))))..((((((((((((......)))))))).))))..−1.170.913SARS-CoV-2-conserved-structured-1025251–5370GGUUUAACUUCUAUUAAAUGGGCAGAUAACAACUGUUAUCUUGCCACUGCAUUGUUAACACUCCAACAAAUAGAGUUGAAGUUUAAUCCACCUGCUCUACAAGAUGCUUAUUACAGAGCA((...((((((.(((.....((((((((((....)))))).)))).(((..(((((........))))).)))))).))))))....))....(((((..((........))..))))).−2.280.912SARS-CoV-2-conserved-structured-1031198–1317UGCAACCAAAUGUGCCUUUCAACUCUCAUGAAGUGUGAUCAUUGUGGUGAAACUUCAUGGCAGACGGGCGAUUUUGUUAAAGCCACUUGCGAAUUUUGUGGCACUGAGAAUUUGACUAAA......((((((((((...(((....((((((((.(.(((.....))).).))))))))((((...(((............)))..)))).....))).))))).....)))))......−0.420.910SARS-CoV-2-conserved-structured-10428758–28877UCAAGGAACAACAUUGCCAAAAGGCUUCUACGCAGAAGGGAGCAGAGGCGGCAGUCAAGCCUCUUCUCGUUCCUCAUCACGUAGUCGCAACAGUUCAAGAAAUUCAACUCCAGGCAGCAG.............(((((.....((..(((((.....(((((((((((.(((......))).))))).)))))).....)))))..))...((((.((....)).))))...)))))...−3.570.910SARS-CoV-2-conserved-structured-10517765–17884CUUUAUUUCACCUUAUAAUUCACAGAAUGCUGUAGCCUCAAAGAUUUUGGGACUACCAACUCAAACUGUUGAUUCAUCACAGGGCUCAGAAUAUGACUAUGUCAUAUUCACUCAAACCAC.....................((((....))))((((.....((..((((.....)))).))...((((.((....))))))))))..(((((((((...)))))))))...........−2.030.903SARS-CoV-2-conserved-structured-10622101–22218AGGAAAACAGGGUAAUUUCAAAAAUCUUAGGGAAUUUGUGUUUAAGAAUAUUGAUGGUUAUUUUAAAAUAUAUUCUAAGCACACGCCUAUUAAUUUAGUGCGUGAUCUCCCUCAGGGU.......................((((((((((....(((((((.((((((..................)))))))))))))(((((((......))).))))....))))).)))))−1.70.900

We sought to further check the structured windows reported by RNAz using orthogonal approaches. First, we explored using R-scape to make structure predictions from covariation signal in the sequence alignments. However, we found that the SARSr-MSA-1 alignment had insufficient variation to detect conserved base pairs through covariation, lacking alignment power for all genomic windows.22 Second, we validated structured window predictions with alifoldz, a program that calculates a z-score for an alignment window by comparing the window’s consensus minimum free energy structure to those of randomly shuffled alignments. To mirror the RNAz analysis above, we scanned through windows of length 120 nucleotides sliding by 40 nucleotides. We chose a z-score cutoff of −2.69, which retained only 1% of windows when running alifoldz on all shuffled windows across the genome. This approach predicted 228 alifoldz structured windows, which overlap with 104 of the 117 RNAz structured windows (P>0.9 cutoff). This overlap is statistically significant (p-value<1e-05). RNAz structured windows supported by alifoldz analysis are highlighted in Supplementary File 1.
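
The window scan and the empirical cutoff described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the analysis: `sliding_windows` and `empirical_cutoff` are hypothetical helper names, the null z-scores are assumed to have been computed elsewhere (for example, from shuffled alignment windows), and the quantile convention is one simple choice among several.

```python
def sliding_windows(genome_length, window=120, step=40):
    """Return 0-based, half-open (start, end) windows tiling the genome."""
    last_start = max(genome_length - window, 0)
    return [(start, min(start + window, genome_length))
            for start in range(0, last_start + 1, step)]


def empirical_cutoff(null_scores, fraction=0.01):
    """Return the score at or below which about `fraction` of null scores fall.

    More negative z-scores indicate more stable structure than expected,
    so windows are retained when their z-score is <= the returned cutoff.
    """
    ranked = sorted(null_scores)
    keep = max(int(len(ranked) * fraction), 1)
    return ranked[keep - 1]


# Toy usage with made-up numbers (not values from this study).
windows = sliding_windows(200)
cutoff = empirical_cutoff(list(range(-100, 100)))
```

With a cutoff chosen this way, roughly 1% of shuffled windows would pass by construction, which is what makes the number of passing native windows interpretable.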

Conserved unstructured regions of SARS-CoV-2

We additionally located conserved regions of the viral genome predicted to lack structure, as such regions may be desired targets for some diagnostic and therapeutic approaches (Fig. 1C). We scanned the SARS-CoV-2 reference genome in windows of length 120 nucleotides sliding by 40 nucleotides, and for each window, we predicted the base-pair probability matrix with Contrafold 2.0, using these probabilities to assemble average single-nucleotide base pairing probabilities across the genome. In Figure 3, we display the 76 stretches of the genome of length at least 15 nucleotides where every base has average base-pairing probability at most 0.4.
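
As a rough sketch of the procedure just described, the code below averages per-nucleotide pairing probabilities over overlapping windows and then extracts stretches in which every position stays at or below a threshold. The function names are hypothetical, and the per-window probabilities are assumed to have been derived already (for instance, by summing each row of a predicted base-pair probability matrix); the exact averaging used in the analysis may differ.

```python
def average_pairing(genome_length, window_probs):
    """Average per-nucleotide pairing probabilities over overlapping windows.

    window_probs: iterable of (start, probs), where probs[i] is the pairing
    probability of genome position start + i (0-based) within that window.
    Positions covered by no window are returned as None.
    """
    totals = [0.0] * genome_length
    counts = [0] * genome_length
    for start, probs in window_probs:
        for offset, p in enumerate(probs):
            totals[start + offset] += p
            counts[start + offset] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


def unpaired_stretches(avg_probs, max_prob=0.4, min_length=15):
    """Return 0-based, half-open runs where every position has p <= max_prob."""
    stretches, run_start = [], None
    for i, p in enumerate(list(avg_probs) + [None]):  # sentinel closes a final run
        ok = p is not None and p <= max_prob
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start >= min_length:
                stretches.append((run_start, i))
            run_start = None
    return stretches
```

For example, a profile with twenty consecutive positions at probability 0.1 flanked by positions at 0.9 yields a single stretch covering those twenty positions.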

It is interesting to note that some structured 120-nucleotide windows reported by RNAz include these unpaired stretches. A potential explanation for this observation is that such regions encode well-defined, conserved RNA structures that themselves harbor long unpaired loops to recruit proteins, distal RNA elements, or other molecular machinery.

Overall, we find that 58 of these unpaired stretches have at least 15 nucleotides of overlap with sequence regions that are at least 97% conserved in SARS-CoV-2-MSA-2 (Fig. 5). These unpaired stretches, termed SARS-CoV-2-conserved-unstructured regions, are listed in Table 3 (overlaps with SARSr-MSA-1 are included in Supplementary File 1).
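
The overlap criterion can be illustrated with a short sketch. The helper names and the coordinates in the usage lines are made up for illustration, and intervals are assumed here to be 1-based and inclusive; the coordinate convention of the actual analysis is not stated in the text.

```python
def overlap_length(a, b):
    """Number of positions shared by two 1-based, inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def with_overlap(queries, targets, min_overlap=15):
    """Keep query intervals sharing at least min_overlap positions with any target."""
    return [q for q in queries
            if any(overlap_length(q, t) >= min_overlap for t in targets)]


# Hypothetical intervals, not taken from the tables in this paper.
unpaired = [(1, 20), (100, 130)]
conserved = [(5, 50), (120, 125)]
kept = with_overlap(unpaired, conserved)
```

Here only the first interval is kept: it shares 16 positions with a conserved region, whereas the second shares only 6.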

Figure 5:

We depict the predicted number of structured, unstructured and conserved intervals for a choice of sequence conservation cutoffs. The SARS-related conserved intervals are all regions of at least 15 nucleotides with each position at least 90% conserved across an alignment of SARS, bat coronavirus, and SARS-CoV-2 sequences (SARSr-MSA-1). The SARS-CoV-2 intervals are regions of at least 15 nucleotides with each position at least 97% conserved across an alignment of currently available SARS-CoV-2 sequences (SARS-CoV-2-MSA-2). Structured intervals are loci predicted from RNAz with some loci containing multiple RNAz windows, and unstructured intervals are stretches of at least 15 nucleotides where all bases have base-pairing probability at most 0.4. All interval intersections are required to have at least 15 nucleotide overlaps, with the number of overlapping intervals listed for each interval type involved in the intersection. Top-scoring structured intervals conserved in SARS-CoV-2 sequences (green) are listed in Table 2. Top-scoring unstructured intervals conserved in SARS-CoV-2 sequences (blue) are listed in Table 3.
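
The conserved-interval definition used in this caption (every position meeting a conservation cutoff over at least 15 nucleotides) can be sketched as follows. This is an illustrative reading rather than the authors' code: conservation is taken here to be the fraction of aligned sequences carrying the most common character in a column, gap characters are counted like any other character, and both function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def column_conservation(alignment):
    """Fraction of sequences carrying the most common character in each column."""
    n = len(alignment)
    return [Counter(column).most_common(1)[0][1] / n
            for column in zip(*alignment)]


def conserved_intervals(conservation, cutoff=0.97, min_length=15):
    """Return 0-based, half-open runs in which every column meets the cutoff."""
    intervals, position = [], 0
    for passes, group in groupby(conservation, key=lambda c: c >= cutoff):
        length = len(list(group))
        if passes and length >= min_length:
            intervals.append((position, position + length))
        position += length
    return intervals
```

Calling `conserved_intervals(column_conservation(alignment), cutoff=0.90)` would correspond to the 90% cutoff quoted for the SARS-related alignment, under the same assumptions.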

Table 3:

SARS-CoV-2-conserved-unstructured.

Top unstructured regions (ranked by minimum unpaired probability over the interval, stretch of at least 15 nt) that overlap with conserved intervals from SARS-CoV-2 for at least 15 nt at a 97% sequence conservation cutoff. Sequence intervals are relative to the reference genome NC_045512.2.

Name | Sequence interval | Average unpaired probability | Minimum unpaired probability | Sequence
SARS-CoV-2-conserved-unstructured-1 | 29074–29087 | 0.891105 | 0.763759 | AUACAAUGUAACAC
SARS-CoV-2-conserved-unstructured-2 | 8078–8094 | 0.82497 | 0.752542 | CCAAUGGAAAAACUCAA
SARS-CoV-2-conserved-unstructured-3 | 1359–1374 | 0.836512 | 0.716868 | UUGUUAAAAUUUAUUG
SARS-CoV-2-conserved-unstructured-4 | 21626–21643 | 0.857235 | 0.713387 | ACUCAAUUACCCCCUGCA
SARS-CoV-2-conserved-unstructured-5 | 1420–1436 | 0.796631 | 0.697304 | CGAAUACCAUAAUGAAU
SARS-CoV-2-conserved-unstructured-6 | 18471–18484 | 0.779821 | 0.695284 | UCAAUUUAAACACC
SARS-CoV-2-conserved-unstructured-7 | 11910–11923 | 0.766973 | 0.683262 | AAUCAUCAUCUAAA
SARS-CoV-2-conserved-unstructured-8 | 23960–23981 | 0.787037 | 0.677563 | UUUAAUUUUUCACAAAUAUUAC
SARS-CoV-2-conserved-unstructured-9 | 13990–14003 | 0.796334 | 0.661868 | CAAGCUUUGUUAAA
SARS-CoV-2-conserved-unstructured-10 | 10009–10035 | 0.760204 | 0.656672 | UCUUUACCAACCACCACAAACCUCUAU
SARS-CoV-2-conserved-unstructured-11 | 23700–23718 | 0.822733 | 0.654787 | CCAUACCCACAAAUUUUAC
SARS-CoV-2-conserved-unstructured-12 | 18918–18934 | 0.832132 | 0.654059 | UAUUGAAUAUCCUAUAA
SARS-CoV-2-conserved-unstructured-13 | 27385–27402 | 0.809693 | 0.653733 | UAAACGAACAUGAAAAUU
SARS-CoV-2-conserved-unstructured-14 | 5773–5789 | 0.808284 | 0.65353 | UAAACAUAUAACUUCUA
SARS-CoV-2-conserved-unstructured-15 | 23910–23932 | 0.837777 | 0.652669 | CACAAGUCAAACAAAUUUACAAA
SARS-CoV-2-conserved-unstructured-16 | 17762–17785 | 0.767426 | 0.650277 | CUGUCUUUAUUUCACCUUAUAAUU
SARS-CoV-2-conserved-unstructured-17 | 25569–25582 | 0.825796 | 0.648774 | UUCCAAAAUCAUAA
SARS-CoV-2-conserved-unstructured-18 | 19569–19588 | 0.75387 | 0.648103 | UUACAAACAAUUUGAUACUU
SARS-CoV-2-conserved-unstructured-19 | 22552–22565 | 0.773461 | 0.646815 | UAAUAUUACAAACU
SARS-CoV-2-conserved-unstructured-20 | 25417–25437 | 0.747166 | 0.639634 | ACAAUUGGAACUGUAACUUUG
SARS-CoV-2-conserved-unstructured-21 | 12195–12210 | 0.745911 | 0.634404 | UUAAAAAGUUGAAGAA
SARS-CoV-2-conserved-unstructured-22 | 6757–6783 | 0.78979 | 0.633073 | UUGUACUAAUUAUAUGCCUUAUUUCUU
SARS-CoV-2-conserved-unstructured-23 | 15236–15257 | 0.747024 | 0.630481 | ACAACAUGUUAAAAACUGUUUA
SARS-CoV-2-conserved-unstructured-24 | 6225–6238 | 0.734349 | 0.628462 | AUAAACCUAUUGUU
SARS-CoV-2-conserved-unstructured-25 | 9578–9598 | 0.825697 | 0.627812 | UAUUCUGUUAUUUACUUGUAC
SARS-CoV-2-conserved-unstructured-26 | 21649–21662 | 0.820527 | 0.626762 | UAAUUCUUUCACAC
SARS-CoV-2-conserved-unstructured-27 | 23985–23998 | 0.799754 | 0.625346 | AUCCAUCAAAACCA
SARS-CoV-2-conserved-unstructured-28 | 7161–7174 | 0.774316 | 0.62504 | ACACCUAUCCUUCU
SARS-CoV-2-conserved-unstructured-29 | 6010–6029 | 0.738649 | 0.624805 | ACCAAACCAACCAUAUCCAA
SARS-CoV-2-conserved-unstructured-30 | 6515–6529 | 0.81084 | 0.624018 | UUAAAAAUUACAGAA
SARS-CoV-2-conserved-unstructured-31 | 18219–18232 | 0.749532 | 0.623586 | UAAAAUGAAUUAUC
SARS-CoV-2-conserved-unstructured-32 | 11659–11681 | 0.819159 | 0.623162 | UUUACUCAACCGCUACUUUAGAC
SARS-CoV-2-conserved-unstructured-33 | 24778–24797 | 0.771131 | 0.622469 | AAAGAACUUCACAACUGCUC
SARS-CoV-2-conserved-unstructured-34 | 21669–21683 | 0.788644 | 0.621176 | UUUAUUACCCUGACA
SARS-CoV-2-conserved-unstructured-35 | 6105–6122 | 0.777239 | 0.618997 | AUAAGAAACCUGCUUCAA
SARS-CoV-2-conserved-unstructured-36 | 28436–28452 | 0.80827 | 0.618884 | GCUCUCACUCAACAUGG
SARS-CoV-2-conserved-unstructured-37 | 16361–16375 | 0.772562 | 0.618808 | UCUUGUCUGUUAAUC
SARS-CoV-2-conserved-unstructured-38 | 28148–28162 | 0.734308 | 0.616622 | UUUUACAAUUAAUUG
SARS-CoV-2-conserved-unstructured-39 | 24165–24178 | 0.745439 | 0.616288 | AAAUGAUUGCUCAA
SARS-CoV-2-conserved-unstructured-40 | 26429–26447 | 0.741331 | 0.615047 | UUAAAAAUCUGAAUUCUUC
SARS-CoV-2-conserved-unstructured-41 | 6072–6086 | 0.727041 | 0.614595 | AAUUUGCUGAUGAUU
SARS-CoV-2-conserved-unstructured-42 | 1304–1319 | 0.821877 | 0.614536 | GAGAAUUUGACUAAAG
SARS-CoV-2-conserved-unstructured-43 | 1918–1933 | 0.751566 | 0.614279 | UCUUGAAACUGCUCAA
SARS-CoV-2-conserved-unstructured-44 | 27361–27375 | 0.740696 | 0.61226 | GAAGAGCAACCAAUG
SARS-CoV-2-conserved-unstructured-45 | 28853–28866 | 0.770189 | 0.611938 | UCAAGAAAUUCAAC
SARS-CoV-2-conserved-unstructured-46 | 14899–14913 | 0.741656 | 0.610443 | UGUAUUAAUGCUAAC
SARS-CoV-2-conserved-unstructured-47 | 19260–19273 | 0.760965 | 0.609632 | UAACCUUAACUUGC
SARS-CoV-2-conserved-unstructured-48 | 11724–11740 | 0.729639 | 0.607945 | UUAGAUAUAUGAAUUCA
SARS-CoV-2-conserved-unstructured-49 | 29008–29023 | 0.779731 | 0.607881 | UGUCACUAAGAAAUCU
SARS-CoV-2-conserved-unstructured-50 | 11537–11554 | 0.771241 | 0.607484 | AUUGUUUUUAUGUGUGUU
SARS-CoV-2-conserved-unstructured-51 | 11628–11645 | 0.85037 | 0.606842 | AUUUUUGUACUUGUUACU
SARS-CoV-2-conserved-unstructured-52 | 18681–18694 | 0.747334 | 0.605637 | CACAUGCUUUUCCA
SARS-CoV-2-conserved-unstructured-53 | 7366–7384 | 0.723602 | 0.605523 | AAUAAUUAAUCUUGUACAA
SARS-CoV-2-conserved-unstructured-54 | 1031–1047 | 0.727712 | 0.605352 | GAAAUUAAAUUGGCAAA
SARS-CoV-2-conserved-unstructured-55 | 14367–14380 | 0.714434 | 0.604803 | UUGUGCAAACUUUA
SARS-CoV-2-conserved-unstructured-56 | 3797–3816 | 0.741669 | 0.60258 | UUUGAUAAAAAUCUCUAUGA
SARS-CoV-2-conserved-unstructured-57 | 22281–22296 | 0.741833 | 0.600606 | CUUUACUUGCUUUACA
SARS-CoV-2-conserved-unstructured-58 | 16038–16053 | 0.780044 | 0.600438 | UUACCCACUUACUAAA

As an orthogonal check for the unstructured intervals predicted using Contrafold 2.0 base-pairing probabilities, we used Vienna’s RNAplfold to compute unpaired probabilities for each genome position. In general, we found that RNAplfold predicted lower unpaired probabilities than Contrafold 2.0, with only 10 intervals of length at least 15 nucleotides having at least 0.6 probability of being unpaired, in contrast with the 76 stretches predicted by Contrafold 2.0. Nevertheless, we found that 9 of the 10 intervals predicted by Vienna’s RNAplfold overlap with unpaired intervals predicted from our Contrafold 2.0 analysis (regions listed in Supplementary File 1).

Secondary structure models for canonical structured regions of SARS-CoV-2

Currently known RNA structures that recur across betacoronaviruses provide potential starting points for therapeutic development targeting the SARS-CoV-2 RNA genome. Here, we include secondary structures for the 5´ UTR (Figure 4a), frame-shifting element (Figure 4b), and 3´ UTR (Figure 4c) for SARS-CoV-2, built by analyzing homology to literature-annotated structures in related betacoronaviruses. We additionally include computer-readable secondary structures in Table 4 and Supplementary File 1. A brief review of salient secondary structure features in these regions and their putative functional roles in the betacoronavirus life cycle follows.

Figure 4:

Secondary structure diagrams for A) 5´ UTR, B) Frameshift element, C) 3´ UTR. Nucleotides are black if 100% conserved in the SARS, bat, and SARS-CoV-2 sequences in SARSr-MSA-1, and grey otherwise. Specially labeled domains are in boldface. Structures are based primarily on manual identification of homology with coronavirus structure models from the literature. Note that numbering in (C) is relative to the 3´ end of the virus sequence. Figures prepared in RiboDraw (https://github.com/ribokit/RiboDraw).

Table 4:

Sequences and secondary structure dot-brackets for key structured genome regions of SARS-CoV-2.

Name | Sequence | Secondary structure dot-bracket
5´ UTR | AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGAACUUUAAAAUCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAAUUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAACACACGUCCAACUCAGUUUGCCUGUUUUACAGGUUCGCGACGUGCUCGUACGUGGCUUUGGAGACUCCGUGGAGGAGGUCUUAUCAGAGGCACGUCAACAUCUUAAAGAUGGCACUUGUGGCUUAGUAGAAGUUGAAAAAGGCGUUUUGCC | ......(((((.(((((....)))))..)))))...........(((((.....))))).((((.......))))........((((((((.((.((((.(((.....))).)))))).))))))))(..(((((.....))))))...(((((((((((..(((((...(((.((((((((((((.((((((.(((((......)))))..))))))).....)))(((((((.((......)))))))))(((....)))))))))))))).))))).))))...))))))).......((((((.......((..((((((...))))))..))))))))(....(((((.(((((((((((((.....)))).))))..))))).)))))...(((((...)))))).......(((((.....)))))......(((.....)))
Frameshifting element | GUUUUUAAACGGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUG | ...............(((((((((((...[[[[[[[)))))))))))((((((((.........))).)))))...]].]]]]]....
3´ UTR | GACCACACAAGGCAGAUGGGCUAUAUAAACGUUUUCGCUUUUCCGUUUACGAUAUAUAGUCUACUCUUGUGCAGAAUGAAUUCUCGUAACUACAUAGCACAAGUAGAUGUAGUUAACUUUAAUCUCACAUAGCAAUCUUUAAUCAGUGUGUAACAUUAGGGAGGACUUGAAAGAGCCACCACAUUUUCACCGAGGCCACGCGGAGUACGAUCGAGUGUACAGUGAACAAUGCUAGGGAGAGCUGCCUAUAUGGAAGAGCCCUAAUGUGUAAAAUUAAUUUUAGUAGUGCUAUCCCCAUGUGAUUUUAAUAGCUUCUUAGGAGAAUGACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | ................(((((((((((..(((...((......))...))).)))))))))))..(((((((.......{{{{{{.[[[[[[[[[.)))))))...]]]]]]]]]...((((..((((((((((..((..(((.(((.....(((((((((.((.(((....)))))....(((((((((....((..((((.....))..))...))...))))).)))).((((........))))..........))))))))).....))).)))..))...)))))......)))))..))))...........}}}}}}....................................
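
Dot-bracket strings such as those in Table 4 use round brackets for nested helices and additional bracket types (here `[]` and `{}`) for pseudoknotted pairs. A minimal sketch of how such a string could be converted into a list of base pairs is given below. The function name is hypothetical, and the toy structure in the usage line is invented for illustration and is not one of the structures above.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string.

    Each bracket type is matched on its own stack, so pairs drawn with
    different bracket types may cross each other (pseudoknots).
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for j, char in enumerate(structure):
        if char in stacks:
            stacks[char].append(j)
        elif char in closers:
            stack = stacks[closers[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {j}")
            pairs.append((stack.pop(), j))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {j}")
    leftover = sorted(i for stack in stacks.values() for i in stack)
    if leftover:
        raise ValueError(f"unmatched opening brackets at {leftover}")
    return sorted(pairs)


# Toy pseudoknot: a hairpin whose loop pairs with a downstream segment.
toy_pairs = parse_dot_bracket("((..[[..))..]]")
```

Because the parser raises on unbalanced input, it can also serve as a quick sanity check that a sequence and its dot-bracket string were transcribed consistently.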

The 5´ UTR includes five confident stem-loop structures (SL1–SL5), with structures verified by chemical mapping experiments in related coronaviruses.9, 16 SL1 and SL2 are conserved across betacoronaviruses, with SL2 having the highest sequence conservation across the 5´ UTR.13 The high A-U base-pairing content in the SARS-CoV-2 SL1 sequence and the bulged nucleotides align with prior reports that SL1 is relatively thermodynamically unstable, which allows for the formation of long-range interactions.23 SL2 has been shown to be critical for subgenomic RNA synthesis, with mutations in its conserved pentaloop retaining the production of genome-sized RNA but not subgenomic RNA segments.24 SL3, conserved only in betacoronaviruses, presents the transcription-regulating sequence (TRS) that base pairs with one of several complementary sequences in nascent negative-sense strands in a ‘copy-choice mechanism’ that gives rise to discontinuous transcription of subgenomic mRNAs.6 SL4 contains a short upstream ORF, here labeled uORF, which precedes the first longer ORF1ab of the genome. The uORF attenuates translation of ORF1ab, an effect that appears helpful but is not essential for viral replication.13 SL5 has been implicated in viral packaging and contains the AUG start codon for the long ORF1ab, which encodes the viral replicase/transcriptase polyprotein. The SARS-CoV-2 SL5 domain shares features with the domain in other group IIb betacoronaviruses, including UUCGU pentaloops on SL5a and SL5b and a GNRA tetraloop on SL5c.9 Prior DMS-probing data for Stem 5 in SARS-CoV aligned with the proposed SL5a,b,c structures.9 Two additional stems (SL6 and SL7) are predicted from computer modeling here, but prior literature has not established whether such stems embedded in the coding region are functionally important across betacoronaviruses.

The frameshifting element (FSE) is located in ORF1ab and is involved in regulating a (−1) ribosomal frameshift event that is necessary for producing ORF1b. The FSE consists of a conserved pseudoknot structure that regulates the rate of ribosomal frameshifting at an upstream slippery site.7 This domain is nearly exactly conserved between SARS-CoV and SARS-CoV-2, suggesting a similar mechanism for ribosomal pausing and slippage between the two viruses.25

The 3´ UTR contains various domains critical for regulating viral RNA synthesis and potentially translation. The most 5´ region of the 3´ UTR includes a switch-like domain involving mutually exclusive formation of a pseudoknot and stem-loop, both of which are essential for viral replication with putative roles in establishing the kinetics of RNA synthesis.6, 26 The hyper-variable region (HVR) is not essential for viral RNA synthesis, as this can be removed while allowing for viral replication in tissue culture; however, viruses without this domain have lower pathogenicity in mice.27 This domain contains a completely conserved octonucleotide sequence with unconfirmed functional significance. The stem-loop II-like motif (s2m) is another subregion of the HVR that is conserved in SARS-CoV-2 and other coronaviruses. A crystal structure of the SARS s2m domain has been shown to be homologous to an rRNA loop that binds translation initiation proteins, leading this domain to have a proposed role in recruiting host translational machinery.28 The domain has been proposed to be a selfish element due to its recurrence in numerous virus families outside the Coronaviridae, but its function is not well understood.29

