Distilling Out Prevalent Modes of Protein-Protein Complexes and Associated Lipids Using Deep Learning

Gautham Dharuman, Lawrence Livermore National Laboratory
About a third of all human cancers are driven by mutations in RAS genes.

Mutations in the KRAS gene drive nearly 30% of all human cancers. It is hypothesized that KRAS dimerization facilitates RAF clustering, which is known to be required for RAF activation. A recent study demonstrated the importance of KRAS dimerization for the oncogenic function of mutant KRAS and suggested that disrupting dimerization could be a potential therapeutic strategy. Consequently, it is crucial to identify the stable modes of KRAS-KRAS association. Conformational states of KRAS monomers can be identified using hand-engineered features such as tilt and rotation angles. A similar approach is difficult for a KRAS-KRAS complex, however, because the structure is complex enough that hand-engineering the right set of features is a nontrivial task.

By applying a machine-learning-based encoding and clustering approach to several thousand coarse-grained (CG) molecular dynamics (MD) simulations in which two KRAS proteins came into contact, we identified three prevalent modes of KRAS-KRAS association. I will describe our scalable and effective ML solution, which couples deep neural network (DNN)-based autoencoders with a data-parallel distributed training approach applied to ~1 million CG frames (~1 TB of data). The autoencoder reduced data dimensionality by orders of magnitude, yielding a latent space that retains the most essential features of the data. A combination of clustering techniques was then applied to this latent space to identify, in an unsupervised manner, the dominant modes of KRAS-KRAS association. I designed the DNNs using the TensorFlow API and performed the distributed training with the Horovod framework on NVIDIA V100 GPUs across several nodes of Lassen.
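The three-stage pipeline described above (autoencoder compression, data-parallel training, then clustering of the latent space) can be illustrated with a deliberately tiny, dependency-free Python sketch. It is not the speaker's TensorFlow/Horovod code. The following are all illustrative assumptions: the linear autoencoder with a one-unit bottleneck, the hypothetical two-mode 2-D toy data standing in for CG frames, the in-process `allreduce_mean` that mimics the gradient averaging Horovod performs across GPUs, and the use of k-means as the clustering step.

```python
import random

# --- Hypothetical toy data (NOT real CG frames) ------------------------------
def make_toy_frames(n=200, seed=1):
    """Two well-separated 'association modes' embedded in 2-D feature vectors."""
    rng = random.Random(seed)
    data, modes = [], []
    for i in range(n):
        t = 2.0 if i % 2 == 0 else -2.0
        data.append([t + rng.uniform(-0.1, 0.1), t + rng.uniform(-0.1, 0.1)])
        modes.append(0 if t > 0 else 1)
    return data, modes

# --- Linear autoencoder with a 1-D bottleneck (stand-in for the DNN) ---------
def encode(w_enc, x):
    return sum(w * xi for w, xi in zip(w_enc, x))  # latent code z = w_enc . x

def decode(w_dec, z):
    return [w * z for w in w_dec]  # reconstruction x_hat = w_dec * z

def loss(w_enc, w_dec, data):
    """Mean squared reconstruction error over a list of frames."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)

def local_gradients(w_enc, w_dec, shard):
    """Loss gradients on one worker's shard (what each GPU would compute)."""
    dim = len(w_enc)
    g_enc, g_dec = [0.0] * dim, [0.0] * dim
    for x in shard:
        z = encode(w_enc, x)
        r = [xh - xi for xh, xi in zip(decode(w_dec, z), x)]  # residual
        dz = sum(2 * ri * wd for ri, wd in zip(r, w_dec))      # dLoss/dz
        for i in range(dim):
            g_dec[i] += 2 * r[i] * z
            g_enc[i] += dz * x[i]
    n = len(shard)
    return [g / n for g in g_enc], [g / n for g in g_dec]

def allreduce_mean(grads_per_worker):
    """In-process stand-in for a Horovod-style allreduce: average across workers."""
    k = len(grads_per_worker)
    return [sum(vals) / k for vals in zip(*grads_per_worker)]

def train_data_parallel(data, n_workers=4, steps=300, lr=0.01, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    w_enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    w_dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    shards = [data[r::n_workers] for r in range(n_workers)]  # one shard per worker
    for _ in range(steps):
        local = [local_gradients(w_enc, w_dec, s) for s in shards]
        g_enc = allreduce_mean([g[0] for g in local])
        g_dec = allreduce_mean([g[1] for g in local])
        # Every replica applies the same averaged update, so weights stay in sync.
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# --- Clustering the latent codes (k-means chosen only for illustration) ------
def kmeans_1d(values, k=2, iters=20):
    s = sorted(values)
    cents = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]  # quantile init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda c: abs(v - cents[c]))].append(v)
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    labels = [min(range(k), key=lambda c: abs(v - cents[c])) for v in values]
    return cents, labels

if __name__ == "__main__":
    frames, true_modes = make_toy_frames()
    w_enc, w_dec = train_data_parallel(frames)
    codes = [encode(w_enc, x) for x in frames]
    centroids, labels = kmeans_1d(codes)
    print("reconstruction loss:", loss(w_enc, w_dec, frames))
    print("latent cluster centroids:", centroids)
```

Because every simulated worker holds an equal-sized shard, averaging the per-shard gradients reproduces the full-batch gradient, which is why all replicas can apply an identical update and remain synchronized. In a multi-GPU run, each shard would instead reside on its own GPU, and the averaging would happen over the interconnect.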

Bio: Gautham Dharuman joined LLNL in June 2018 as a Postdoctoral Research Staff Member in the Computational Materials Science Group. His research interests broadly span scientific machine learning, large-scale molecular dynamics simulations, high-performance computing, and multiscale modeling. He received a dual Ph.D. in Computational Mathematics, Science and Engineering and in Electrical Engineering from Michigan State University in 2018. At LLNL, he is part of the pilot project on RAS proteins, which aims to understand RAS-driven cancer initiation and growth through machine-learning-aided multiscale simulations that enable predictions at unprecedented length and time scales. His work includes physics-guided machine learning, consistency studies of simulations spanning multiple scales, and high-performance code development.

Please use this link to attend the virtual seminar.