This project will use computing allocations from the ASCR Leadership Computing Challenge (ALCC) to design and develop a scalable and resilient modeling and simulation framework for federated learning systems and applications. With ALCC allocations on Frontier, Aurora, Polaris, and Perlmutter, the project will conduct the detailed modeling and large-scale simulations needed to refine federated learning systems and to understand their dynamics in evolving real-world scenarios. ALCC resources will also support the development and evaluation of advanced federated learning systems, workflows, and applications.
This project is expected to have a significant impact on the mission-driven scientific applications and workflows of the Department of Energy (DOE). It will advance federated learning by developing more effective simulation and modeling techniques that address the critical challenges of scalability and resilience in distributed federated learning systems and their associated scientific workflows. The extensive computing resources provided by ALCC are crucial to this effort, enabling the project to explore and extend the current capabilities and applications of federated learning.