Continual Pre-Training of Foundation Models



This talk will revolve around continual pre-training of foundation models. Foundation models are routinely pre-trained on billions of tokens, only for the process to be restarted from scratch once new data becomes available. A far cheaper and more efficient alternative is continual pre-training: updating pre-trained models with new data rather than re-training them from scratch. The talk will dive into the challenges and insights of continually pre-training these models as new data and modalities emerge, in order to further expand their capabilities. In particular, it will examine the role of learning rate schedules and replay strategies, covering scenarios such as training on new data, adding new languages, and integrating new modalities to build general-purpose foundation models.
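The two techniques the abstract highlights can be illustrated with a minimal sketch: re-warming the learning rate when resuming pre-training, and mixing a fraction of old (replay) data into each batch to mitigate forgetting. The function names and all numeric values below are illustrative assumptions, not the speaker's actual setup.

```python
import math
import random

def lr_with_rewarming(step, total_steps, peak_lr=3e-4, min_lr=3e-5, warmup_steps=100):
    """Cosine decay preceded by a linear re-warming phase, a common choice
    when resuming pre-training on new data (values are illustrative)."""
    if step < warmup_steps:
        # Linearly re-warm the learning rate from 0 back up to the peak.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def replay_batch(new_data, old_data, batch_size=8, replay_fraction=0.25, rng=None):
    """Mix a fraction of examples from the old corpus into each batch of
    new data (the replay fraction here is an illustrative choice)."""
    rng = rng or random.Random(0)
    n_old = int(batch_size * replay_fraction)
    return rng.sample(old_data, n_old) + rng.sample(new_data, batch_size - n_old)
```

For example, `lr_with_rewarming(0, 1000)` starts at 0 and climbs back to the peak by step 100, while `replay_batch` with the defaults draws 2 of every 8 examples from the old corpus.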


Kshitij Gupta is an MSc student at Mila through the Université de Montréal (UdeM), supervised by Prof. Irina Rish and Prof. Sarath Chandar. He completed his undergraduate degree in Computer Science at the University of Illinois Urbana-Champaign. He works toward building highly multimodal, generally intelligent agents, with research spanning multimodal models, reasoning, and memory-augmented neural networks. He is passionate about AGI, and his research interests include scaling laws and embodied agents. He has gained industry experience at Microsoft and DeepMind and has been recognized for his contributions with awards such as the Henry Ford Scholar Award.

For more information about upcoming speakers please visit the TPC Seminar Series Webpage:
