Using DOE supercomputers, researchers are training powerful AI models that learn from text, images, and energy data across many institutions—without sharing sensitive data—to accelerate discovery, strengthen the power grid, and enable secure scientific collaboration.
This project advances privacy-preserving federated learning (PPFL) to enable the training of large-scale foundation models (FMs) on sensitive, multimodal scientific data distributed across institutions. By leveraging the Department of Energy’s (DOE) high-performance computing (HPC) facilities—including Frontier, Aurora, Polaris, and Perlmutter—the research team will train FMs in four key areas: extracting knowledge from scientific text, interpreting high-resolution imaging data from DOE light sources, forecasting building energy consumption using national building datasets, and modeling electric grid operations through graph-based learning. These models will be developed without centralizing data, preserving privacy while enabling collaborative AI development across national laboratories and universities.
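As a rough illustration of the federated setup described above, the following Python sketch shows federated averaging on a toy linear-regression problem: each client trains on data that never leaves its site and shares only model parameters with an aggregator. The function names, toy data, and hyperparameters are illustrative assumptions and do not describe the project's actual framework.

    # Minimal federated averaging (FedAvg) sketch: each institution ("client")
    # trains locally on its own data and shares only model parameters, never
    # raw records, with a central aggregator.
    import numpy as np

    def local_update(global_weights, local_data, lr=0.1, epochs=1):
        """Run a few full-batch gradient steps on data that never leaves the client."""
        w = global_weights.copy()
        X, y = local_data
        for _ in range(epochs):
            grad = X.T @ (X @ w - y) / len(y)   # gradient of mean-squared error
            w -= lr * grad
        return w

    def federated_round(global_weights, clients):
        """One communication round: aggregate client models by weighted average."""
        updates, sizes = [], []
        for X, y in clients:
            updates.append(local_update(global_weights, (X, y)))
            sizes.append(len(y))
        sizes = np.array(sizes, dtype=float)
        return np.average(updates, axis=0, weights=sizes / sizes.sum())

    # Toy example: three clients, each holding private linear-regression data.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for _ in range(3):
        X = rng.normal(size=(50, 2))
        clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

    w = np.zeros(2)
    for _ in range(100):
        w = federated_round(w, clients)
    print(w)  # approaches [2, -1] without any client sharing its data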
The project supports DOE's mission by delivering AI capabilities that enhance energy resilience, scientific discovery, and secure collaboration. The PPFL framework will integrate scalable optimization with privacy-preserving mechanisms such as adaptive compression, federated pruning, and differential privacy to reduce communication costs and safeguard sensitive data. The outcomes will establish a foundation for collaboratively training foundation models across multiple DOE exascale systems, contributing to breakthroughs in imaging science, biomedical science, and grid modernization, while setting a precedent for secure, multi-institutional research at leadership computing scale.
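The sketch below illustrates, under simplifying assumptions, two of the kinds of mechanisms named above as they might be applied to a client's model update before transmission: top-k sparsification as a simple stand-in for adaptive compression, and norm clipping with Gaussian noise as a basic form of differential privacy. The helper names and parameter values are illustrative only and are not taken from the project's implementation.

    # Two mechanisms applied (independently) to a client's model update before
    # it is sent to the aggregator: top-k sparsification (a simple form of
    # update compression) and Gaussian-noise differential privacy with norm
    # clipping. Parameter values are illustrative only.
    import numpy as np

    def top_k_sparsify(update, k):
        """Keep only the k largest-magnitude entries; transmit indices + values."""
        idx = np.argpartition(np.abs(update), -k)[-k:]
        sparse = np.zeros_like(update)
        sparse[idx] = update[idx]
        return sparse

    def privatize(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
        """Clip the update's L2 norm, then add calibrated Gaussian noise."""
        rng = rng or np.random.default_rng()
        scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
        clipped = update * scale
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
        return clipped + noise

    rng = np.random.default_rng(1)
    raw_update = rng.normal(size=1000)              # a client's local model delta
    compressed = top_k_sparsify(raw_update, k=100)  # ~10x fewer values to transmit
    protected = privatize(raw_update, clip_norm=1.0, noise_multiplier=0.5, rng=rng)

In practice, the noise scale and clipping norm would be calibrated to a stated privacy budget, the two mechanisms would be composed carefully rather than applied in isolation, and the aggregator would account for the compressed updates when averaging.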