The team’s innovative framework lays the foundation for combining deep transfer learning at scale, data clustering, and recursive training to produce large-scale galaxy catalogs in the Large Synoptic Survey Telescope (LSST) era.
“We’re excited to work with the team at NCSA and Argonne as well as the researchers who drove the original Galaxy Zoo effort to pursue this important area of scientific discovery,” said Tom Gibbs, manager of developer relations at NVIDIA. “Using these new methods, we’re taking an important step to understanding the mystery of dark energy.”
Highlights of the study include:
- The first application of deep transfer learning using disparate datasets for galaxy classification. The team used deep transfer learning to transfer knowledge from Xception, a state-of-the-art neural network model for image classification trained with the ImageNet dataset, to classify SDSS galaxy images. In the computer science literature, transfer learning has traditionally been applied between similar datasets, such as images of human faces. In stark contrast, this study takes a model pre-trained for real-world object recognition and transfers its knowledge to classify galaxies.
- The researchers developed open-source software stacks to extract galaxy images from the SDSS and DES surveys at scale using the NCSA’s Blue Waters supercomputer. Deep learning algorithms were prototyped and trained using NVIDIA GPUs in the Bridges supercomputer at the Pittsburgh Supercomputing Center through the National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE). Finally, deep transfer learning was combined with distributed training to reduce the training stage of the Xception model with galaxy image datasets from five hours to just eight minutes using ALCF supercomputing resources.
- The researchers used deep neural network classifiers to label over 10,000 unlabeled DES galaxies that have not been observed in previous surveys. The neural network models were then turned into feature extractors to show that these unlabeled datasets can be clustered according to their morphology, forming two distinct datasets.
- ALCF researchers created a visualization to show the output of the penultimate layer of a deep neural network during training as it is learning to classify galaxies as spiral or elliptical.
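The clustering step in the highlights above can be illustrated with a small, self-contained sketch. It is not the team's code. It assumes that each galaxy has already been reduced to a feature vector (for example, the output of a network's penultimate layer) and uses plain k-means with two clusters as a stand-in, since the study's actual clustering method is not described here. The feature vectors are synthetic, and the names `kmeans` and `make_synthetic_features` are hypothetical.

```python
import random


def kmeans(points, k=2, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on equal-length feature vectors."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def make_synthetic_features(n_per_group=50, dim=4, seed=1):
    """Hypothetical stand-in for network features: two well-separated blobs."""
    rng = random.Random(seed)
    group_a = [[rng.gauss(0.0, 0.3) for _ in range(dim)] for _ in range(n_per_group)]
    group_b = [[rng.gauss(3.0, 0.3) for _ in range(dim)] for _ in range(n_per_group)]
    return group_a + group_b


# Assumption: in a real pipeline these vectors would come from the trained
# classifier used as a feature extractor, not from a random generator.
features = make_synthetic_features()
labels, centroids = kmeans(features, k=2)
print("cluster sizes:", labels.count(0), labels.count(1))
```

With real features, the two recovered groups would then be inspected to check whether they correspond to spiral and elliptical morphologies.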
Acknowledgments:
The ALCF is a DOE Office of Science User Facility. This research was carried out as part of the ALCF Data Science program (ADSP). The goal of ADSP is to accelerate discoveries across a broad range of scientific domains by supporting projects that require data-intensive and machine learning algorithms to address challenging research problems at scale.
This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the State of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). We gratefully acknowledge grant TG-PHY160053.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.
The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science