Evaluating Large Language Models and Potential Pitfalls

An overview of how large language models are evaluated, including a discussion of potential pitfalls and limitations.

Bethany Lusch is a Computer Scientist in the data science group at the Argonne Leadership Computing Facility at Argonne National Laboratory. Her research expertise includes developing methods and tools to integrate AI with science, especially for dynamical systems and PDE-based simulations. Her recent work includes developing machine-learning emulators to replace expensive parts of simulations, such as computational fluid dynamics simulations of engines and climate simulations. She is also working on methods that incorporate domain knowledge into machine learning, on representation learning, and on using machine learning to analyze supercomputer logs. She holds a Ph.D. and an MS in applied mathematics from the University of Washington and a BS in mathematics from the University of Notre Dame.

Marieme Ngom is an Assistant Computer Scientist at the Argonne Leadership Computing Facility. Her research interests include probabilistic machine learning, high-performance computing, and dynamical systems modeling with applications in chemical engineering and materials science. Ngom received her Ph.D. in mathematics from the University of Illinois at Chicago (UIC) in 2019 under the supervision of Prof. David Nicholls. She holds an MSc in mathematics from the University of Paris-Saclay (formerly Paris XI), an MSc in computer science from the National Polytechnic Institute of Toulouse, and an MEng in computer science and applied mathematics from the École nationale supérieure d’électrotechnique, d’électronique, d’informatique, d’hydraulique et des télécommunications (ENSEEIHT) in Toulouse.

Sandeep Madireddy is an Assistant Computer Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. His research interests include machine learning, probabilistic modeling, and high-performance computing, with applications across science and engineering. His current research aims to develop deep learning algorithms and architectures tailored for scientific machine learning, with a particular focus on improving training efficiency, model robustness, uncertainty quantification, and feature representation learning. He has applied these approaches to problems in domains ranging from the physical sciences to computer systems modeling and neuromorphic computing. His talk covers Comprehensive Evaluation of Scientific Foundation Models.

Argonne Leadership Computing Facility

Training Assets: Data Science

Intro to AI Series: AI Accelerators

Parallel Training Methods for AI

Large Language Models: Embeddings and Tokenization

Introduction to Large Language Models

Advanced Topics in Neural Networks