The management software on Polaris has been successfully upgraded to HPCM 1.10, along with upgrades to Slingshot 2.1.2, Programming Environment (PE) 23.12, NVIDIA SDK 23.9, NVIDIA driver version 535.154.05, CUDA 12.2, and SUSE 15 SP5. Jobs that were queued before the upgrade have been restored to the appropriate queues but are on user hold, because the applications they run need to be rebuilt for the new PE and the major OS upgrade. Existing binaries are unlikely to run successfully. In addition to the system upgrades, several important changes have been made to the user software environment that may impact user workflows, including:
- /soft/modulefiles is no longer in the default $MODULEPATH. To access modules installed in /soft, users should run module use /soft/modulefiles.
- Many user codes will need to be rebuilt and/or relinked against the newer version of the Cray Programming Environment (23.12) and Spack-provided dependencies.
- New Spack deployments in /soft.
- PE versions older than 23.12 are deprecated.
- Deprecated software in /soft has been removed and new software has been installed.
- Due to resource contention, per-user limits on the login nodes were lowered to 8 GB of memory and 8 cores. User processes run on the login nodes may now fail with error messages indicating abnormal process termination.
- The datascience Anaconda module has been updated, with various packages and libraries built with CUDA 12.4.1 to be compatible with the new Polaris NVIDIA GPU hardware driver (CUDA 12.2) and to use the latest MPI, NCCL, cuDNN, TensorRT, and other libraries. PyTorch 2.3.0 and TensorFlow 2.16.1 are now available as part of this module.
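
Because /soft/modulefiles is no longer in the default $MODULEPATH, scripts that load modules from /soft may fail with "module not found" style errors until module use /soft/modulefiles has been run. The following is a minimal sketch, not an official ALCF tool, of how a workflow script could check for this before proceeding. The helper name modulepath_contains is hypothetical, and the sketch assumes $MODULEPATH is a colon-separated list of directories, which is the usual convention for environment modules.

```python
import os

# Path named in the announcement; everything else here is illustrative.
SOFT_MODULEFILES = "/soft/modulefiles"


def modulepath_contains(directory, modulepath=None):
    """Return True if `directory` is an entry of a colon-separated MODULEPATH.

    If `modulepath` is not given, the current $MODULEPATH environment
    variable is used (an empty string if it is unset).
    """
    if modulepath is None:
        modulepath = os.environ.get("MODULEPATH", "")
    # Normalize entries so that a trailing slash does not cause a mismatch.
    entries = [os.path.normpath(p) for p in modulepath.split(":") if p]
    return os.path.normpath(directory) in entries


if __name__ == "__main__":
    if modulepath_contains(SOFT_MODULEFILES):
        print(f"{SOFT_MODULEFILES} is already in MODULEPATH")
    else:
        print(f"{SOFT_MODULEFILES} is missing; run: module use {SOFT_MODULEFILES}")
```

Note that the check only reads the environment. The module use command itself still has to be run in the shell (or job script) that will load the modules, since a Python process cannot change the environment of its parent shell.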
For detailed information, please visit: https://docs.alcf.anl.gov/polaris/system-updates/#2024-04-22