In this session, we will cover how Data Parallel Python can be used to develop high-performance code for ALCF's upcoming Aurora supercomputer. The talk will introduce numba-dppy and show examples of how to write data-parallel code inside @numba.jit-decorated functions and offload it to a SYCL device. We will also provide examples of how to write an explicit kernel using the @numba_dppy.kernel decorator. Numba-dppy is packaged as part of the Intel Distribution for Python*, which is included with the Intel oneAPI AI Analytics Toolkit.
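As a preview of the explicit-kernel style, the sketch below follows the vector-addition example distributed with numba-dppy; the device filter string "opencl:gpu" is an assumption and should be adjusted to whatever SYCL backend and device are available on your system:

    import numpy as np
    import numba_dppy as dppy
    import dpctl

    @dppy.kernel
    def vector_add(a, b, c):
        # Each work-item computes one element of the result
        i = dppy.get_global_id(0)
        c[i] = a[i] + b[i]

    a = np.random.random(1024).astype(np.float32)
    b = np.random.random(1024).astype(np.float32)
    c = np.zeros_like(a)

    # "opencl:gpu" is an assumed filter string; pick the device you have
    with dpctl.device_context("opencl:gpu"):
        vector_add[1024, dppy.DEFAULT_LOCAL_SIZE](a, b, c)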
The talk will also cover dpctl, a companion library intended to make it easier to write Python native extensions based on DPC++. Dpctl provides Python bindings for the DPC++ runtime classes, an API to manage devices, and wrappers for the Unified Shared Memory (USM) allocators that enable the creation of Python objects backed by SYCL USM allocations.
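To illustrate the device-management and USM pieces of dpctl, the following sketch is based on dpctl's documented examples; names such as lsplatform and MemoryUSMShared reflect the dpctl releases current at the time of this session and may differ in later versions:

    import numpy as np
    import dpctl
    import dpctl.memory as dpmem

    # Print the SYCL platforms and devices visible to the runtime
    dpctl.lsplatform()

    # Allocate 1 KiB of USM-shared memory on the default SYCL queue
    usm_buf = dpmem.MemoryUSMShared(1024)

    # The allocation supports the buffer protocol, so NumPy can view it
    # without copying while the data remains accessible to SYCL devices
    arr = np.ndarray((256,), dtype=np.float32, buffer=usm_buf)
    arr[:] = 1.0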
As use cases, we will cover pairwise distance, Black-Scholes, and k-means examples to demonstrate CPU and GPU implementations with numba-dppy, and we will practice with live sample code on the Intel DevCloud and/or Argonne's Joint Laboratory for System Evaluation (JLSE).
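For context, the automatic-offload path used in such examples typically looks like the sketch below, adapted from numba-dppy's pairwise-distance sample; the filter string "level_zero:gpu" is an assumption and depends on the backends installed on the target machine:

    import numpy as np
    import dpctl
    from numba import njit, prange

    @njit(parallel=True)
    def pairwise_distance(X, D):
        M, N = X.shape
        # The prange loop runs on the CPU by default and is offloaded to a
        # SYCL device when called inside a dpctl device context.
        for i in prange(M):
            for j in range(M):
                d = 0.0
                for k in range(N):
                    tmp = X[i, k] - X[j, k]
                    d += tmp * tmp
                D[i, j] = np.sqrt(d)

    X = np.random.random((1024, 3))
    D = np.empty((1024, 1024))

    pairwise_distance(X, D)                       # CPU execution
    with dpctl.device_context("level_zero:gpu"):  # assumed filter string
        pairwise_distance(X, D)                   # offloaded execution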