Bridging the Data Gap Between Children and Large Language Models

Michael C. Frank, Stanford University

While large language models require billions of words of text to show zero-shot generalization and in-context learning, children show the same emergent behaviors with just a few million words of language input. What accounts for this difference? I'll be discussing some of our attempts to measure and understand how language models and multimodal models can be compared productively with children's learning, using datasets and evaluations from developmental psychology.


Michael C. Frank is the Benjamin Scott Crocker Professor of Human Biology in the Department of Psychology at Stanford University and Director of the Symbolic Systems Program. He received his PhD in Brain and Cognitive Sciences from MIT in 2010. He studies children's language learning and development, with a focus on the use of large-scale datasets to understand the variability and consistency of learning across cultures. He is a founder of the ManyBabies Consortium and has led open-data projects including Wordbank and the ongoing LEVANTE project. His awards include the Troland Award from the National Academy of Sciences and the FABBS Early Career Impact Award. He served as President of the Cognitive Science Society, has edited for journals including Cognition and Child Development, and is currently co-Editor-in-Chief of the Open Encyclopedia of Cognitive Science.

For more information about upcoming speakers, please visit the TPC Seminar Series webpage: