The team will use a sizeable INCITE allocation to explore efficient alternatives to transformer models for language modeling.
Artificial intelligence (AI), and deep learning (DL) in particular, is rapidly becoming pervasive in almost all areas of computer science and is even being used to assist computational science modeling and simulation. At the forefront of this development are large language models (LLMs). The challenges the team seeks to address in this project stem from the fact that large models do not fit on a single CPU or GPU and/or take a long time to train. Scaling the training of large neural networks to extreme levels of parallelism requires parallelizing and optimizing diverse computational and communication motifs, such as dense and sparse tensor computations and irregular communication patterns, while also mitigating load imbalance and providing fast filesystem access.
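To make one such communication motif concrete, the sketch below shows generic data-parallel training in PyTorch, where gradients are all-reduced across GPUs after every backward pass. It is an illustration only: the tiny model, synthetic data, and hyperparameters are placeholders, and it does not represent AxoNN or the team's actual training code; hybrid parallel frameworks additionally partition the model itself across GPUs, adding further communication patterns on top of this one.

```python
# Minimal sketch of multi-GPU data-parallel training in PyTorch.
# Illustrative only: the toy model and synthetic data are placeholders;
# this is not AxoNN and not the team's training code.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class TinyLM(torch.nn.Module):
    """A toy transformer language model used only for illustration."""

    def __init__(self, vocab=1000, d_model=256, nhead=4, layers=2):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, d_model)
        block = torch.nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(block, layers)
        self.head = torch.nn.Linear(d_model, vocab)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))


def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    device = torch.device(f"cuda:{rank}")
    torch.cuda.set_device(device)

    model = TinyLM().to(device)
    # DDP all-reduces gradients across all GPUs after each backward pass --
    # one of the communication motifs that must be overlapped with compute
    # for training to scale efficiently.
    model = DDP(model, device_ids=[rank])
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(10):                            # synthetic training steps
        tokens = torch.randint(0, 1000, (8, 128), device=device)
        logits = model(tokens[:, :-1])             # predict the next token
        loss = loss_fn(logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))
        optim.zero_grad()
        loss.backward()                            # gradient all-reduce happens here
        optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<gpus> this_script.py
```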
The team will use a sizeable INCITE allocation across the three platforms (Polaris, Aurora, and Frontier) to advance research in three directions. First, scaling the parallel training of deep learning models to large numbers of GPUs is non-trivial; the team plans to use its framework, AxoNN, to analyze and optimize the performance and portability of training, fine-tuning, and inference. Second, the team plans to explore efficient alternatives to transformer models for language modeling, training variants of modern language model architectures aimed directly at the usability constraints of smaller academic laboratories. The focus is on variants with smaller memory footprints and adaptive compute capabilities at deployment, to enable more research and development in machine learning and natural language processing (NLP). Third, the team proposes to fine-tune the trained models for several downstream tasks, including HPC-related tasks such as improving portability and studying performance explainability.
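As a rough sketch of what downstream fine-tuning can look like in practice, the example below uses the Hugging Face Transformers Trainer to adapt a pretrained causal language model to a task-specific text corpus. The base model name, data file, and hyperparameters are hypothetical placeholders; the team's actual models, downstream tasks, and tooling are not described here.

```python
# Minimal sketch of fine-tuning a pretrained language model on a downstream
# task with Hugging Face Transformers. The model name, data file, and
# hyperparameters are hypothetical placeholders, not the team's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "gpt2"                  # placeholder base model
DATA_FILE = "task_corpus.txt"        # hypothetical task-specific text corpus

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Tokenize the raw text corpus; truncate long lines to a fixed length.
dataset = load_dataset("text", data_files=DATA_FILE, split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=5e-5,
    ),
    train_dataset=dataset,
    # mlm=False sets up standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```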