Major OS Upgrade Complete

To make newer software versions available, to improve security and reliability, and to help pave the way for Theta to connect with the coming global filesystem, we updated Theta’s major OS versions between 12:00 CDT on Monday, July 13, and Wednesday, July 22, 2020.

Details of the programming environment (PE) updates on July 27 are posted separately.

We upgraded ALCF-Theta from its previous SLES12/CNL6 configuration to SLES15/CNL7. This necessary enhancement is the last major OS version update of Theta. We also upgraded Theta’s Sonexion Lustre filesystem.

Please note: 

  • Some user codes need to be rebuilt and/or relinked.
  • Statically linked applications should not be affected, unless they depend on the OS in some non-obvious way. The default will remain static linking.
  • Hugepages of various sizes will be available as loadable modules, with none loaded by default.
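
One way to find out which executables are dynamically linked, and therefore the likeliest candidates for a rebuild or relink, is to ask ldd. The following is a minimal sketch, not an ALCF-provided tool: the function names are hypothetical, and it assumes ldd is available on the node where the check is run.

```shell
# Return success (0) if ldd recognizes the file as a dynamically linked executable.
# Only run this on binaries you trust: some versions of ldd may execute the
# program in order to obtain its dependencies.
is_dynamically_linked() {
    ldd "$1" >/dev/null 2>&1
}

# Print the dynamically linked files among the arguments.
# Files that ldd rejects (static binaries, scripts, data files) are skipped.
list_dynamic() {
    for f in "$@"; do
        if is_dynamically_linked "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, list_dynamic ./bin/* (a hypothetical path) would print only the dynamically linked files in that directory.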

See details below.

==============================

Improvements

  • Newer versions of system software will offer better support for user software.
  • Default routing has changed from ADAPTIVE_0 to ADAPTIVE_3 to reduce congestion effects.

Known impacts and remediations

We are taking measures to minimize impacts and ensure continuity of available software. The current list of known impacts and remediations follows. (Note: some of the programming environment changes previously announced have been deferred to July 27.)

  • Intel compilers and software products are not expected to be affected.
  • Python support under Intel Python should be minimally affected, as should conda or user-provided Pythons. However, if any of a code's underlying dependencies rely on packages from the Cray programming environment or on specific versions of system libraries, they *may* be affected. The upgraded OS supports both Python 2.7.17 and 3.6.10.
  • Many user codes will need to be rebuilt and/or relinked against the programming environment and Spack-provided dependencies.
  • Some older versions of Cray-built packages will no longer be available. In some instances this may require migrating to a newer version of a dependency, or adopting a Spack-built replacement for the Cray package.

Changes to adaptive routing

The default adaptive routing mode has been changed from ADAPTIVE_0 to ADAPTIVE_3. This change will result in an overall performance improvement for applications, especially those that are sensitive to network latency. Although we don't recommend it, any application may return to the previous default, ADAPTIVE_0, by unloading the adaptive-routing-a3 module:

module unload adaptive-routing-a3
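
In a job script, the unload can be guarded so that it only runs when the module is actually loaded. This is a minimal sketch, assuming the system's modules installation maintains the standard colon-separated LOADEDMODULES variable; the helper name routing_module_loaded is hypothetical.

```shell
# Return success (0) if adaptive-routing-a3 appears in LOADEDMODULES,
# with or without a version suffix such as adaptive-routing-a3/<version>.
routing_module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":adaptive-routing-a3:"*|*":adaptive-routing-a3/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Revert to ADAPTIVE_0 only if the module is present in this environment.
if routing_module_loaded; then
    module unload adaptive-routing-a3
fi
```

Running the guard before launching the application keeps the script usable in environments where the module is not loaded.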

We appreciate any feedback on this change and how it has affected your application performance.

Theta (Cray XC40) uses packet-level adaptive routing†, which can steer packets away from congested links. This balances the network load across the available paths, achieving high network utilization even under heavy network load.

Adaptive routing on Aries comes in four flavors that differ in the weighting (bias) given to minimal versus nonminimal paths. The default adaptive routing mode used on Theta until now was ADAPTIVE_0, which has no bias towards minimal or nonminimal paths. Our recent research found that ADAPTIVE_3, which has a strong bias towards minimal routing, is optimal for the majority of workloads as well as for overall system-level congestion management. Hence a recommendation was made to switch the default routing mode on Theta to ADAPTIVE_3.
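
The effect of a bias can be illustrated with a toy model. This is not the Aries algorithm: the function choose_path, the load numbers, and the bias values are all hypothetical and only show how a bias term tilts the choice towards the minimal path when its load is somewhat higher.

```shell
# choose_path MIN_LOAD NONMIN_LOAD BIAS
# Toy decision rule: take the minimal path unless its load exceeds the
# nonminimal path's load by more than BIAS. BIAS=0 models "no bias";
# a larger BIAS models a stronger preference for minimal routing.
choose_path() {
    if [ "$1" -le $(( $2 + $3 )) ]; then
        echo minimal
    else
        echo nonminimal
    fi
}

# Same loads, different bias (illustrative values only).
choose_path 10 6 0   # no bias: the less loaded nonminimal path is chosen
choose_path 10 6 8   # strong minimal bias: the minimal path is chosen
```

With no bias the packet detours as soon as the minimal path is busier; with a strong bias it stays on the shorter path unless the minimal path is much more congested.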

† https://www.cray.com/sites/default/files/resources/CrayXCNetwork.pdf