Mathematical Advance at Yale Improves Speed & Resolution of RNA Data Visualization, Should Accelerate Work on Science Magazine’s 2018 “Breakthrough of the Year”—Following Embryonic Development Cell by Cell

Similar to going from a pinhole camera to a Polaroid, a significant mathematical update to the formula for a popular bioinformatics data visualization method will allow researchers to develop snapshots of single-cell gene expression not only several times faster, but also at much higher resolution. Published online on February 11, 2019, in Nature Methods, this innovation by Yale mathematicians will reduce the rendering time of a million-point single-cell RNA-sequencing (scRNA-seq) data set from over three hours to just fifteen minutes. The article is titled “Fast Interpolation-Based t-SNE for Improved Visualization of Single-Cell RNA-seq Data.”

Scientists say the existing decade-old method, t-distributed Stochastic Neighbor Embedding (t-SNE), is well suited to representing patterns in scRNA-seq data in two dimensions. "In this setting, t-SNE 'organizes' the cells by the genes they express and has been used to discover new cell types and cell states," said George Linderman, lead author and a Yale MD-PhD student specializing in applied mathematics.

By computational standards, however, t-SNE is quite slow. Researchers therefore often "down-sample" their scRNA-seq data set (that is, take a smaller sample from the initial sample) before applying t-SNE. Down-sampling is a poor compromise, however, because it makes t-SNE unlikely to capture rare cell populations, which are often what researchers most want to identify.

More than 30 years ago, another team of Yale mathematicians developed the fast multipole method (FMM), a revolutionary numerical technique that sped up the calculation of long-range forces in the n-body problem.