Frequent Technical Bias Occurs in RNA-Seq Expression Studies, Leading to Widespread Misinterpretation of Gene Expression Data; Authors Present Approach to Removing This Bias

Reproducibility is a major challenge in experimental biology, and the increasing complexity of data generated by genomic-scale technologies greatly amplifies this concern. RNA-seq, one of the most widely used methods in modern molecular biology, allows the expression levels of all the genes in a given sample to be measured simultaneously in a single test. New research, published online on November 12, 2019 in the open-access journal PLOS Biology by Shir Mandelbaum, Zohar Manber, Orna Elroy-Stein, and Ran Elkon from Tel Aviv University, identifies a frequent technical bias in data generated by RNA-seq technology that recurrently leads to false results. The article is titled “Recurrent Functional Misinterpretation of RNA-Seq Data Caused By Sample-Specific Gene Length Bias.”

Analyzing dozens of publicly available RNA-seq datasets that profiled cellular responses to numerous different stresses, Dr. Mandelbaum and colleagues noticed that sets of particularly short or particularly long genes repeatedly showed changes in expression level (as measured by the apparent number of RNA transcripts from a given gene). Puzzled by this recurring pattern, the authors asked whether it reflects a universal biological response common to many different triggers or instead stems from an experimental artefact. To tackle this question, they compared replicate samples from the same biological condition. Because replicates share the same biological condition, differences in gene expression between them reflect technical effects rather than the experiment's biological factor of interest. Unexpectedly, the same pattern of particularly short or long genes showing changes in expression level was observed in these comparisons between replicates, demonstrating that the pattern results from a technical bias that appears to be coupled to gene length.
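
The replicate comparison described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline or their correction method; it only shows the general idea of testing whether expression differences between two replicates track gene length, here using a Spearman rank correlation. The function names, the pseudocount, and the toy counts and gene lengths are all hypothetical and chosen for illustration only.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def length_bias(counts_a, counts_b, gene_lengths, pseudocount=1.0):
    """Spearman correlation between gene length and log2(B / A).

    counts_a and counts_b are per-gene read counts from two replicates of
    the same condition. A value far from zero suggests that the differences
    between the replicates are coupled to gene length.
    """
    log_fold_changes = [
        math.log2((b + pseudocount) / (a + pseudocount))
        for a, b in zip(counts_a, counts_b)
    ]
    return pearson(ranks(gene_lengths), ranks(log_fold_changes))


# Hypothetical toy data: six genes of increasing length, two replicates.
gene_lengths = [500, 1000, 2000, 4000, 8000, 16000]
replicate_1 = [100, 100, 100, 100, 100, 100]
replicate_2 = [80, 90, 100, 110, 125, 140]  # longer genes appear "induced"

print(length_bias(replicate_1, replicate_2, gene_lengths))
```

In this toy example the second replicate shows progressively higher counts for longer genes, so the correlation is strongly positive even though both samples nominally represent the same condition. A real analysis would use properly normalized counts from thousands of genes rather than six invented values.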