ALCF workshop equips researchers with AI and deep learning skills


Argonne’s Huihuo Zheng leads a session on distributed deep learning at the ALCF’s 2021 Simulation, Data, and Learning Workshop.

The ALCF's annual Simulation, Data, and Learning Workshop helps attendees advance their use of supercomputing resources for AI- and data-driven science.

As artificial intelligence (AI) continues to cement its place as a powerful tool for science, the Argonne Leadership Computing Facility (ALCF) remains committed to training researchers to use machine learning, deep learning, and other emerging AI techniques on its world-class supercomputing resources. The ALCF is a U.S. Department of Energy Office of Science user facility at Argonne National Laboratory.

“As part of Argonne’s efforts to advance AI for science, the lab is assembling an impressive array of AI technologies and expertise,” said Kyle Felker, Argonne computational scientist. “The goal of the ALCF’s training program is to grow the community of researchers who can use our advanced computing resources to accelerate science.”


Argonne’s Kyle Felker welcomes attendees to the ALCF’s 2021 Simulation, Data, and Learning Workshop.

This October, Felker helped organize the ALCF’s Simulation, Data, and Learning (SDL) Workshop, an annual event designed to help researchers improve the performance and productivity of simulation, data science, and machine learning applications on ALCF systems. Participants had the opportunity to learn about leading-edge AI methods and technologies while working directly with ALCF staff scientists during dedicated hands-on sessions.

“It was especially important to the workshop organizers to center the event around interactive, hands-on instruction,” said Felker. “With the breakneck speed of development of new deep learning models, software frameworks, and hardware, we are forced to make major updates to the event’s content each year, and we want attendees to immediately see how to map their science applications to these tools and ALCF resources. The addition of the Polaris and Aurora supercomputers next year presents a great challenge and opportunity to make SDL 2022 a useful stepping-stone to these new platforms.”

Over the course of the three-day virtual workshop, attendees learned how to use deep learning tools such as the Horovod framework, the DeepSpeed library, and the Argonne-developed DeepHyper package. They were able to test these tools in real time on ALCF computing resources, including ThetaGPU, an AI- and simulation-enabled extension of the Theta supercomputer.

“As the presenters were leading sessions, I was running the application codes presented simultaneously,” said Smeet Chheda, a PhD student studying computer science at Stony Brook University. “The hands-on experience with ThetaGPU was amazing, and the presenters were knowledgeable on a wide range of topics. They helped me with small problems as well as large theoretical problems.”

Chheda, who was among the 60 researchers to participate in this year’s workshop, attended the event to learn how to use large-scale systems like ThetaGPU to advance his machine learning research.

“I am already directly applying the distributed deep learning concepts that were presented during the workshop, and I am looking forward to experimenting with what I learned about neural architecture search as well,” said Chheda. “In the future, I hope to continue work with ALCF on various scientific machine learning problems.”

Like Chheda, most attendees had plans to continue their work at the ALCF after the event ended. The workshop concluded with a session detailing how researchers can apply for Director’s Discretionary projects to test and optimize their software and prepare for a future project through allocation programs, such as INCITE, ALCC, and the ALCF Data Science Program.

“As our facility continues to expand the use of AI and machine learning on leadership computing resources, it is critical that we bring the ALCF user community along with us,” said Ray Loy, ALCF lead for training, debuggers, and math libraries. “Our workshops and training events are key to educating current users and cultivating a new set of researchers who can leverage AI and supercomputers for science at ALCF.”


==========

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science