OpenFold-Powered Machine Learning of Protein-Protein Interactions and Complexes

Project Summary

This project will use artificial intelligence to build tools that predict interactions between any two proteins and will make these tools widely available to the biology community.

Project Description

Protein-protein interactions (PPIs) underpin most biological processes. Despite the major role they play in disease, most human PPIs are not well understood. Biophysically, PPIs can be classified as idiosyncratic (driven by binding surfaces unique to individual proteins) or canonical (driven by surfaces that members of homologous protein families reuse to bind peptides on partner proteins). Idiosyncratic PPIs are often high-affinity and form stable complexes, while canonical PPIs are often transient and low-affinity, and vary minutely across domains to drive signaling logic. Although both classes of PPIs are studied with high-throughput experimental methods, the cost, complexity, and insensitivity of these methods, together with the enormity of PPI space, have resulted in <20% coverage of the human interactome and sparse coverage of most other species.

To advance our understanding of PPIs, this INCITE project will use artificial intelligence (AI) to develop tools that predict interactions between any two proteins and will make these tools widely available to the biology community. The research team will use DOE supercomputers to build computational methods for identifying novel idiosyncratic and canonical PPIs. These methods combine multiple tiers of direct and indirect binding data with supervised and unsupervised machine learning models that account for varying degrees of experimental evidence. To conduct this research, the team developed OpenFold, a trainable implementation of AlphaFold2 (an AI tool used for predicting protein structures).

The researchers will tackle PPI prediction by building three types of models: (1) a supervised model for predicting idiosyncratic PPIs; (2) a supervised model for predicting canonical peptide-mediated PPIs; and (3) an unsupervised model for predicting canonical peptide-mediated PPIs. The team has produced preliminary results for all three models that support their validity. The idiosyncratic PPI model aims to help identify novel protein complexes, as well as PPIs between human proteins and human-pathogen proteins, for drug targeting. Similarly, the canonical PPI models are designed to help unravel signaling networks and their dysregulation in disease by modeling the effects of mutations on PPIs. The proposed models thus have the potential to be as transformative to protein interactomes as AlphaFold2 has been to protein structure.

Argonne Leadership Computing Facility
