This session gives an overview of using TensorFlow, PyTorch, and JAX, the core deep learning frameworks on ALCF production resources. All three frameworks are accessible from Python (TensorFlow and PyTorch also support a few other languages) and offer the following core elements:
- Automatic differentiation;
- GPU offload and acceleration from python;
- Library of essential building blocks of machine learning operations;
- Performant ways to scale codes out to multiple devices;
- An ecosystem of extensions and custom tools to make your life easier;
- Export of trained models to open source inference engines (ONNX, etc.).

All three frameworks are open source, and all will be supported on Aurora. Pick the one that makes sense for your problem!
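To make the first item in the list above concrete, here is a toy sketch of what automatic differentiation means, written in plain Python using forward-mode dual numbers. It is not how TensorFlow, PyTorch, or JAX are implemented: for training they typically rely on reverse-mode differentiation and run on accelerators, exposed through interfaces such as `tf.GradientTape`, `torch.autograd`, and `jax.grad`. The names `Dual`, `grad`, `sin`, and `f` below are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (toy forward-mode autodiff)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def grad(f):
    """Return a function computing df/dx for a scalar function f."""
    def df(x):
        # Seed the input with derivative 1.0 and read the derivative back out.
        return f(Dual(float(x), 1.0)).deriv
    return df


def f(x):
    # Hypothetical example function: f(x) = 3x^2 + sin(x)
    return x * x * 3 + sin(x)


df = grad(f)  # analytically, f'(x) = 6x + cos(x)
```

Calling `df(2.0)` evaluates the derivative at `x = 2.0` without any hand-written derivative of `f`; the frameworks provide the same convenience for arbitrary tensor programs.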