Getting Started on Sunspot

* ACCESS TO SUNSPOT IS ENABLED FOR ESP AND ECP TEAMS ONLY *

Overview
Prerequisites for Access
Getting Help
- Known Issues
Logging into Sunspot
Home and Project Directories
- Quotas
Scheduling
- Queue Policies
Allocation usage
Data Transfer
Proxy Settings
Git SSH Protocol
- Using non-default SSH key for GitHub
Programming Environment Setup
- OpenMP Stack Size on the CPU
GPU Validation Check
MPI
- Aurora MPICH
- CrayMPI
  - Building on UAN
  - Running on compute nodes
Kokkos
Debugging Applications
- Running gdb-oneapi in batch mode
Conda
Spack and E4S
VTune
DAOS

Overview

The Sunspot Test and Development System (TDS) consists of 2 racks, each with 64 nodes, for a total of 128 nodes.
Each node consists of 2x Intel Xeon CPU Max Series (codename Sapphire Rapids or SPR) and 6x Intel Data Center GPU Max Series (codename Ponte Vecchio or PVC).
- Each Xeon has 52 physical cores supporting 2 hardware threads per core
Interconnect is provided via 8x HPE Slingshot-11 NICs per node.
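
As a quick sanity check, the system-wide totals implied by the figures above can be worked out with shell arithmetic. This is only a sketch of the arithmetic; the variable names are illustrative, and the per-node figures are taken directly from the description above.

```shell
#!/bin/bash
# Figures quoted in the overview above.
RACKS=2
NODES_PER_RACK=64
CPUS_PER_NODE=2
CORES_PER_CPU=52
THREADS_PER_CORE=2
GPUS_PER_NODE=6
NICS_PER_NODE=8

# Derived totals.
NODES=$((RACKS * NODES_PER_RACK))
CORES_PER_NODE=$((CPUS_PER_NODE * CORES_PER_CPU))
HW_THREADS_PER_NODE=$((CORES_PER_NODE * THREADS_PER_CORE))
TOTAL_GPUS=$((NODES * GPUS_PER_NODE))
TOTAL_NICS=$((NODES * NICS_PER_NODE))

echo "nodes:                 ${NODES}"
echo "physical cores/node:   ${CORES_PER_NODE}"
echo "hardware threads/node: ${HW_THREADS_PER_NODE}"
echo "GPUs (system):         ${TOTAL_GPUS}"
echo "NICs (system):         ${TOTAL_NICS}"
```

The per-node thread count is the number to keep in mind when choosing how many MPI ranks and OpenMP threads to place on a node.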

Sharing results from Sunspot publicly no longer requires review or approval from Intel. However, anyone publishing these results should include the following statement in their materials: "This work was done on a pre-production supercomputer with early versions of the Aurora software development kit." In addition, users should acknowledge the ALCF. Refer to the acknowledgement policy page for details: https://docs.alcf.anl.gov/policies/alcf-acknowledgement-policy/#alcf-only-acknowledgement. Please note that certain information about Sunspot hardware and software is covered by NDA and cannot be shared publicly.

Sunspot is a Test and Development System, and it is extremely early in the deployment of the system. Do not expect a production environment!

Expect to experience:

- Hardware instabilities – possible frequent downtimes
- Software instabilities – non-optimized compilers, libraries, and tools; frequent software updates
- Non-final configurations (e.g., storage, OS versions)
- Short notice for downtimes – scheduled downtimes will be announced with 4 hours' notice, but some downtimes may occur with only an email notice. Notices go to the sunspot-notify@alcf.anl.gov email list. All users with access are added to the list initially.

Prerequisites for Access to Sunspot/Aurora

* ACCESS TO SUNSPOT (and AURORA) IS ENABLED FOR ESP AND ECP TEAMS ONLY *

ECP:

Exascale Computing Project (ECP) team members must:

1. Request Aurora early hardware/software access through ECP by filling out the Jira form: https://jira.exascaleproject.org/servicedesk/customer/portal/10/create/254. If you have already submitted a request that was not rejected, and you have not changed institutions, skip this step; you do not need to submit a second request. Note that user access to the ECP Atlassian/Jira tool ends on December 31, 2023. After that date, the ECP project office will no longer accept Sunspot account requests, so all requests must be submitted before December 31, 2023.
   If you don’t have an ECP Atlassian/Jira account, follow the steps below. Questions regarding an ECP Jira account or access should be emailed to ecp-support@exascaleproject.org. Proceed to step 2 once you have submitted the ECP Jira form.
   a. Ask your PI or their representative to complete the onboarding form https://jira.exascaleproject.org/servicedesk/customer/portal/20/create/189 and be sure they select “Jira Project” in the tools access list (selecting “Confluence” as well is optional).
   b. Once the form is submitted, notifications are sent to initiate the ECP Atlassian account creation process. PI approval and PAS (personnel access system) approval must be completed before the account is created. PAS processing for foreign nationals can take 7-10 days or more after receipt of the required materials.
   c. The requestor will be notified when the ECP Atlassian account is created.
2. Read and acknowledge the latest Terms of Use (TOU) by filling out the following form. You are responsible for ensuring you are authorized by your institution to read and acknowledge the TOU: https://events.cels.anl.gov/event/147/surveys/7.
3. Have an active ALCF account and be a member of all the appropriate ECP projects on Polaris.
   a. If you do not have an account, request one: https://accounts.alcf.anl.gov/#/accountRequest. Search for your project(s) by the WBS number (for ECP) or by name with the right PI. Do not choose projects ending in _CNDA.
   b. If your account is inactive, reactivate it: https://accounts.alcf.anl.gov/#/accountReactivate. Search for your project by the WBS number (for ECP) or by name with the right PI. Do not choose projects ending in _CNDA.
   c. If you have an active account but are not on all the ESP/ECP projects on Theta/Polaris, request to join the missing projects: https://accounts.alcf.anl.gov/#/joinProject. Search for your project by the WBS number (for ECP) or by name with the right PI. Do not choose projects ending in _CNDA.

Team members who satisfy all the prerequisites listed above should then email support@alcf.anl.gov to request access to Sunspot/Aurora.

ESP:

Refer to this page for instructions: https://docs.alcf.anl.gov/aurora/getting-started-on-aurora/#for-aurora-early-science-program-esp-team-members

Getting Help:

- Email ALCF Support at support@alcf.anl.gov for bugs, technical questions, software requests, reservations, priority boosts, etc.
  - ALCF’s user support team will triage the tickets and forward them to the appropriate technical SME as needed.
  - Expect turnaround times to be slower than on a production system, as the technical team will be focused on stabilizing and debugging the system.
- For faster assistance, consider contacting your project’s POC at ALCF (project catalyst or liaison).
  - They are an excellent source of assistance during this early period and will be aware of common bugs and known issues.
- ECP and ESP users will be added to a CNDA Slack workspace, where CNDA discussions may occur. An invite to the Slack workspace will be sent when a user is added to the Sunspot resource.

Known Issues

A known issues page can be found in the JLSE Wiki space used for NDA content. Note that access to this page requires a JLSE Aurora early hw/sw resource account: https://wiki.jlse.anl.gov/display/inteldga/Known+Issues

Logging into Sunspot user access nodes

You access the system by first connecting via SSH to 'bastion.alcf.anl.gov'. This bastion is merely a pass-through set up for security purposes and is not meant to host files. Once on the bastion, SSH to 'sunspot.alcf.anl.gov'. Connections to this address are distributed round-robin across the UANs (user access nodes). To use ProxyJump, see the Data Transfer section below.

Note that Sunspot uses ALCF credentials (the same as Polaris and https://accounts.alcf.anl.gov) and not JLSE credentials.
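
The two-hop login described above can be sketched as follows. The username `myuser` is a placeholder, and the single-command variant assumes your local OpenSSH client supports the `-J` (ProxyJump) option. The script only assembles and prints the commands so you can review them before running them yourself.

```shell
#!/bin/bash
# Placeholder: replace with your ALCF username (not your JLSE username).
ALCF_USER="myuser"
BASTION="bastion.alcf.anl.gov"
SUNSPOT="sunspot.alcf.anl.gov"

# Two-step login: run the first command on your local machine,
# then run the second command once you are on the bastion.
STEP1="ssh ${ALCF_USER}@${BASTION}"
STEP2="ssh ${ALCF_USER}@${SUNSPOT}"

# Single-command variant using OpenSSH's -J (ProxyJump) option,
# which hops through the bastion automatically.
ONE_STEP="ssh -J ${ALCF_USER}@${BASTION} ${ALCF_USER}@${SUNSPOT}"

printf '%s\n' "$STEP1" "$STEP2" "$ONE_STEP"
```

Because the bastion is not meant to host files, the single-command form is often more convenient: nothing needs to be staged on the intermediate host.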

Home and project directories

Home is mounted as /home and is shared between the UANs and the compute nodes. The bastions have a different /home, which is on Swift (shared with Polaris, Theta, and Cooley). The default quota is 50 GB.
Project directories are on /lus/gila/projects.
- ALCF staff should use the /lus/gila/projects/Aurora_deployment project directory. ESP and ECP project members should use their corresponding project directories. The project name is similar to the name on Theta/Polaris with a _CNDA suffix (for example, projectA_aesp_CNDA or CSC250ADABC_CNDA). The default quota is 1 TB. The project PI should email support@alcf.anl.gov if their project requires additional storage.

Home and project directories are on a Lustre file system called Gila.
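
To illustrate the naming convention, the snippet below builds a project directory path from a Theta/Polaris project name. It is a sketch only: `projectA_aesp` is the example name from the text above, and because the Sunspot name is described as similar to (not necessarily identical to) the Theta/Polaris name plus the suffix, confirm the actual directory name before relying on it.

```shell
#!/bin/bash
# Base location of project directories on the Gila file system.
PROJECT_BASE="/lus/gila/projects"

# Example Theta/Polaris project name from the text; replace with your own.
POLARIS_PROJECT="projectA_aesp"

# Assumption: the Sunspot project name is the Polaris name plus _CNDA.
SUNSPOT_PROJECT="${POLARIS_PROJECT}_CNDA"
PROJECT_DIR="${PROJECT_BASE}/${SUNSPOT_PROJECT}"

echo "${PROJECT_DIR}"
# On a Sunspot UAN you could then change into it with: cd "${PROJECT_DIR}"
```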

Quotas

Default home quota is 50 GB. Use this command to view your home directory quota usage:

/soft/tools/alcf_quota/bin/myquota
