In early 2020, a few months after the COVID-19 pandemic began, scientists sequenced the full genome of SARS-CoV-2, the virus that causes COVID-19. While many of its genes were already known at that point, the full complement of protein-coding genes remained unresolved. Now, after performing an extensive comparative genomics study, MIT researchers have generated what they describe as the most accurate and complete gene annotation of the SARS-CoV-2 genome. In their study, published online on May 11, 2021 in Nature Communications, the scientists confirmed several protein-coding genes and found that a few others that had been proposed as genes do not code for any proteins.

“We were able to use this powerful comparative genomics approach for evolutionary signatures to discover the true functional protein-coding content of this enormously important genome,” says Manolis Kellis, PhD, the senior author of the study, a Professor of Computer Science in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and a member of the Broad Institute of MIT and Harvard.

The research team also analyzed nearly 2,000 mutations that have arisen in different SARS-CoV-2 isolates since the virus began infecting humans, allowing them to rate how important those mutations may be in changing the virus’s ability to evade the immune system or become more infectious. The open-access Nature Communications article is titled “SARS-CoV-2 Gene Content and COVID-19 Mutation Impact by Comparing 44 Sarbecovirus Genomes.”

The SARS-CoV-2 genome consists of nearly 30,000 RNA bases. Scientists had identified several regions known to contain protein-coding genes, based on their similarity to protein-coding genes found in related viruses. A few other regions were suspected to encode proteins, but they had not been definitively classified as protein-coding genes.