OpenFold-Powered Machine Learning of Protein-Protein Interactions and Complexes

PI Mohammed AlQuraishi, Columbia University
Co-PI Zhao Zhang, Rutgers University
AlQuraishi INCITE 2025

Generalization to new families. AUROCs for models trained on one PBD family and tested on another. Families of the same PTM type are highlighted in blocks. Polyproline-binding domains (WH1, WW, SH3, and GYF) show strong transferability.
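The train-on-one-family, test-on-another evaluation summarized in the caption can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names `auroc` and `cross_family_auroc`, the proline-counting scorers, and the peptide strings are all hypothetical stand-ins, and only the family names come from the caption.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a
    random negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def cross_family_auroc(scorers, test_sets):
    """Score every test family with every family's model.

    scorers:   family name -> function mapping an example to a score
               (assumed to have been trained on that family only)
    test_sets: family name -> list of (example, label) pairs
    Returns a dict keyed by (train_family, test_family).
    """
    matrix = {}
    for train_family, score in scorers.items():
        for test_family, examples in test_sets.items():
            labels = [y for _, y in examples]
            scores = [score(x) for x, _ in examples]
            matrix[(train_family, test_family)] = auroc(labels, scores)
    return matrix


# Toy stand-ins (hypothetical): "models" that count residues in a peptide.
scorers = {
    "SH3": lambda pep: pep.count("P"),
    "WW": lambda pep: pep.count("P") + pep.count("Y"),
}
# Toy labeled peptides (hypothetical): 1 = binder, 0 = non-binder.
test_sets = {
    "SH3": [("PPLPPR", 1), ("APPLPA", 1), ("GSGSGS", 0), ("AKPAEL", 0)],
    "WW": [("PPPY", 1), ("LPPY", 1), ("GSGY", 0), ("AAKE", 0)],
}

matrix = cross_family_auroc(scorers, test_sets)
for (train_family, test_family), value in sorted(matrix.items()):
    print(f"train={train_family} test={test_family} AUROC={value:.2f}")
```

The off-diagonal entries of such a matrix are the transfer scores: a high value for a pair of different families is what the caption describes as strong transferability.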

Project Description

Protein-protein interactions (PPIs) underpin most biological processes. Despite the major role they play in disease, most PPIs in humans are not well understood. Biophysically, PPIs can be classified as idiosyncratic (driven by binding surfaces unique to individual proteins) or as canonical (driven by surfaces reused by members of homologous protein families to bind peptides on partner proteins). Idiosyncratic PPIs are often high-affinity and form stable complexes, while canonical PPIs are often transient and low-affinity, and vary minutely across domains to drive signaling logic. While both classes of PPIs are studied by high-throughput experimental methods, the cost, complexity, and insensitivity of these methods, together with the enormity of PPI space, have resulted in <20% coverage of the human interactome and sparse coverage of most other species.

To advance our understanding of PPIs, this INCITE project is leveraging artificial intelligence (AI) to develop tools that predict interactions between any two proteins and make these tools widely available to the biology community. The research team is using DOE supercomputers to build computational methods for identifying novel idiosyncratic and canonical PPIs by combining multiple tiers of direct and indirect binding data with supervised and unsupervised machine learning models that account for varying degrees of experimental evidence. To conduct this research, the team developed OpenFold, a trainable implementation of AlphaFold2 (an AI tool used for predicting protein structures). 

The researchers are tackling PPI prediction by building three types of models: (1) a supervised model for predicting idiosyncratic PPIs; (2) a supervised model for predicting canonical peptide-mediated PPIs; and (3) an unsupervised model for predicting canonical peptide-mediated PPIs. The team has produced preliminary results for all three models that support their validity. Their idiosyncratic PPI model aims to help identify novel protein complexes and human/human pathogen PPIs for drug targeting. Similarly, their canonical PPI models are designed to help unravel signaling networks and their dysregulation in disease by modeling the effects of mutations on PPIs. Their models thus have the potential to be as transformative to protein interactomes as AlphaFold2 has been to protein structure.

Project Type
Allocations