In “Technical Tour de Force,” Scientists Develop Explainable AI for Decoding Genome Regulatory Code

Researchers at the Stowers Institute for Medical Research in Kansas City, Missouri, in collaboration with colleagues at Stanford University and the Technical University of Munich, have developed advanced explainable artificial intelligence (AI) in a technical tour de force to decipher regulatory instructions encoded in DNA. In a report published online on February 18, 2021, in Nature Genetics, the team found that a neural network trained on high-resolution maps of protein-DNA interactions can uncover subtle DNA sequence patterns throughout the genome and provide a deeper understanding of how these sequences are organized to regulate genes. The article is titled “Base-Resolution Models of Transcription-Factor Binding Reveal Soft Motif Syntax.”

Neural networks are powerful AI models that can learn complex patterns from diverse types of data, such as images, speech signals, or text, and predict associated properties with impressively high accuracy. However, many see these models as uninterpretable because the learned predictive patterns are hard to extract from the model. This black-box nature has hindered the wide application of neural networks to biology, where interpretation of predictive patterns is of paramount importance.

One of the big unsolved problems in biology is the genome's second code: its regulatory code. DNA bases (commonly represented by the letters A, C, G, and T) encode not only the instructions for how to build proteins, but also when and where to make these proteins in an organism. The regulatory code is read by proteins called transcription factors, which bind to short stretches of DNA called motifs. However, how particular combinations and arrangements of motifs specify regulatory activity is an extremely complex problem that has been hard to pin down.