Abstract: The expansive, interdisciplinary nature of astronomy, combined with its open-access culture, makes it an ideal testing ground for exploring how Large Language Models (LLMs) can accelerate scientific discovery. LLM reasoning capabilities have advanced substantially in recent years. Our work demonstrates that AI agents can now achieve gold-medal performance on problems from the International Olympiad on Astronomy and Astrophysics (IOAA), which points to their growing analytical abilities.
In this talk, I will present our recent advances in applying LLMs as agents to real-world astronomical challenges. Using self-play reinforcement learning, we show how LLM agents can carry out end-to-end research tasks in galaxy spectral fitting, including data analysis, strategy refinement, and outlier detection, with performance approaching the intuition and domain knowledge of human researchers.
However, limitations remain. Autonomous research agents like Mephisto could in principle help analyze every observed source, but the cost of closed-source solutions remains prohibitive for large-scale applications involving billions of objects. In addition, the Moravec paradox manifests clearly in astronomy: tasks requiring abstract reasoning may be easier for AI than seemingly simple perceptual tasks. Current models still struggle with chart reading, multimodal data interpretation, and other fundamental astronomical workflows.
To address the cost limitation, we developed lightweight, open-source specialized models (AstroSage and AstroLLaMA), trained on arXiv literature, and evaluated them on carefully curated astronomical benchmarks. Our results show that, when appropriately pretrained and fine-tuned, these specialized LLMs can outperform larger general-purpose models on astronomy Q&A tasks. This offers a route toward more capable and accessible astronomy-specific models.
Looking ahead, progress will depend not only on better models but also on integrating more function-calling tools and building a comprehensive ecosystem around them. The astronomical community's collaborative infrastructure will be important for scaling up automated inference and expanding the role of AI in astronomical research.