Storage and Networking

The ALCF provides high-performance data storage and networking capabilities to help users manage, store, and transfer large-scale datasets generated on the facility’s computing resources.

Data Storage

The ALCF’s data storage system is used to retain the data generated by simulations and visualizations. Disk storage provides intermediate-term storage for active projects, offering a means to access, analyze, and share simulation results. Tape storage is used to archive data from completed projects.

Disk Storage

The ALCF operates both Lustre and GPFS file systems for data storage:

  • Grand: A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 PB of usable capacity across 8,480 disk drives. This ClusterStor platform provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650 GB/s. Grand is used primarily for compute campaign storage. Also see Theta Disk Quota, Data Policy, and Data Transfer.
  • Eagle: A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 PB of usable capacity across 8,480 disk drives. This ClusterStor platform also provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650 GB/s. Eagle is used primarily for sharing data with the research community via Globus. Also see Theta Disk Quota and Data Policy.
  • theta-fs0: A Lustre file system residing on an HPE Sonexion 3000 storage array with a usable capacity of 9.2 PB and an aggregate data transfer rate of 240 GB/s. Also see Theta Disk Quota and Data Policy.
  • theta-fs1: A GPFS file system residing on an IBM Elastic Storage System (ESS) cluster with a usable capacity of 7.9 PB and an aggregate data transfer rate of 400 GB/s.
  • mira-fs0: A GPFS file system residing on 16 DDN SFA12Ke storage arrays, each containing 560 3 TB SATA hard drives, for a total of 8,960 disk drives and a raw capacity of 26.8 PB (approximately 19.2 PB usable). The aggregate data transfer rate is 240 GB/s.
    [mira-fs0 will be decommissioned on May 3, 2021]
  • mira-fs1: A GPFS file system residing on 6 DDN SFA12Ke storage arrays, each containing 560 3 TB SATA hard drives, for a total of 3,360 disk drives and a raw capacity of 10.1 PB (approximately 7.2 PB usable). The aggregate data transfer rate is 90 GB/s.
    [mira-fs1 will be decommissioned on May 3, 2021]
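
The drive counts and raw capacities quoted for mira-fs0 and mira-fs1 follow directly from the array counts, drives per array, and drive size. The short Python sketch below reproduces that arithmetic. It assumes decimal units (1 PB = 1,000 TB), and the function name raw_capacity is hypothetical, introduced only for illustration.

```python
# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def raw_capacity(arrays, drives_per_array, tb_per_drive):
    """Return (total drive count, raw capacity in PB)."""
    drives = arrays * drives_per_array
    return drives, drives * tb_per_drive / TB_PER_PB


# (name, number of DDN SFA12Ke arrays, usable PB quoted in the list above)
file_systems = [("mira-fs0", 16, 19.2), ("mira-fs1", 6, 7.2)]

for name, arrays, usable_pb in file_systems:
    drives, raw_pb = raw_capacity(arrays, drives_per_array=560, tb_per_drive=3)
    usable_fraction = usable_pb / raw_pb
    print(f"{name}: {drives:,} drives, {raw_pb:.2f} PB raw, "
          f"{usable_fraction:.0%} usable")
```

For both file systems the quoted usable capacity comes to roughly 71% of raw capacity. What accounts for the difference (for example, RAID parity or spare drives) is not stated here.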

Compute projects are mapped to the file system that best matches their storage and performance needs.

Mira, Cetus, and Vesta have been decommissioned. For additional details and instructions on how to transfer data, please visit Decommissioning Mira.

Tape Storage

ALCF computing resources share three 10,000-slot libraries using LTO-6 and LTO-8 tape technology. The LTO tape drives have built-in hardware compression with compression ratios typically between 1.25:1 and 2:1, depending on the data, giving an effective capacity of approximately 65 PB. Also see Data Transfer and HPSS.
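
To show how a compression ratio relates effective capacity to native (uncompressed) capacity, the Python sketch below back-solves the native capacity implied by the 65 PB figure at each end of the quoted ratio range. The native capacity of the libraries and the ratio behind the 65 PB figure are not stated above, so the results are illustrative bounds rather than facility specifications. The function name native_capacity_pb is hypothetical.

```python
def native_capacity_pb(effective_pb, compression_ratio):
    """Native capacity implied by an effective capacity at a given ratio.

    A ratio of 2.0 means 2:1, i.e. 2 PB of data fit in 1 PB of native tape.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return effective_pb / compression_ratio


EFFECTIVE_PB = 65  # approximate effective capacity quoted above

# Ends of the typical compression range quoted above.
for ratio in (1.25, 2.0):
    native = native_capacity_pb(EFFECTIVE_PB, ratio)
    print(f"At {ratio}:1, 65 PB effective implies {native:.1f} PB native")
```

Poorly compressible data sits near the 1.25:1 end of the range, so it consumes more native tape for the same stored volume.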

Networking

Networking is the fabric that ties all of the ALCF’s computing systems together.

The HPE Cray Theta system uses an internal proprietary network technology, known as Aries, for communicating between nodes. For more information about the Aries network, please see Theta/ThetaGPU Machine Overview.

InfiniBand enables communication between system I/O nodes and the various storage systems described above. The Production HPC SAN is built upon NVIDIA Mellanox High Data Rate (HDR) InfiniBand hardware. Two 800-port core switches provide the backbone links between eighty edge switches, yielding 1,600 total available host ports, each at 200 Gbps, in a non-blocking fat-tree topology. The full bisection bandwidth of this fabric is 320 Tbps. The HPC SAN is maintained by the NVIDIA Mellanox Unified Fabric Manager (UFM), providing Adaptive Routing to avoid congestion, as well as the NVIDIA Mellanox Self-Healing Interconnect Enhancement for InteLligent Datacenters (SHIELD) resiliency system for link fault detection and recovery.
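
The fabric figures above can be cross-checked with simple arithmetic. The Python sketch below shows that the quoted 320 Tbps equals the number of host ports multiplied by the per-port rate. The even split of host ports across edge switches is an assumption made for illustration, since the text does not state how ports are distributed.

```python
EDGE_SWITCHES = 80      # edge switches in the fat tree
HOST_PORTS = 1600       # total available host ports
PORT_RATE_GBPS = 200    # HDR InfiniBand rate per host port

# Assumption: host ports are spread evenly across the edge switches.
ports_per_edge = HOST_PORTS // EDGE_SWITCHES

# Aggregate host-port bandwidth, converted from Gbps to Tbps.
aggregate_tbps = HOST_PORTS * PORT_RATE_GBPS / 1000

print(f"Host ports per edge switch: {ports_per_edge}")
print(f"Aggregate host-port bandwidth: {aggregate_tbps:.0f} Tbps")
```

Under these assumptions each edge switch serves 20 host ports, and the aggregate comes to 320 Tbps, matching the figure quoted for the fabric.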

When external communications are required, Ethernet is the interconnect of choice. Remote user access, systems maintenance and management, and high-performance data transfers are all enabled by the Local Area Network (LAN) and Wide Area Network (WAN) Ethernet infrastructure. This connectivity is built upon a combination of Extreme Networks SLX and MLXe routers and NVIDIA Mellanox Ethernet switches.

ALCF systems connect to other research institutions over multiple 100 Gbps Ethernet circuits that link to many high-performance research networks, including local and regional networks like the Metropolitan Research and Education Network (MREN), as well as national and international networks like the Energy Sciences Network (ESnet) and Internet2.
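
For planning purposes, the time needed to move a dataset over one such circuit can be estimated from the link rate. The Python sketch below is a rough lower bound. It assumes decimal units (1 TB = 10^12 bytes) and uses a hypothetical efficiency parameter to stand in for protocol overhead, storage throughput limits, and competing traffic. Actual transfer rates are not stated above and will vary.

```python
def transfer_hours(size_tb, link_gbps, efficiency=1.0):
    """Estimate hours to move size_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of line rate actually achieved
    (a hypothetical tuning knob, not a measured value).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8                       # TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Example: a 100 TB dataset over a single 100 Gbps circuit.
for eff in (1.0, 0.5):
    hours = transfer_hours(100, 100, efficiency=eff)
    print(f"100 TB at {eff:.0%} of line rate: {hours:.1f} hours")
```

At full line rate the example takes a little over two hours. Halving the achieved rate doubles the estimate.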