The AI Testbed aims to help evaluate the usability and performance of machine learning-based high-performance computing (HPC) applications running on a range of AI accelerators. The goal is to better understand how these accelerators can be integrated with the facility's existing and upcoming supercomputers to accelerate scientific insights.
We are currently offering allocations on our Groq, Graphcore Bow IPUs, Cerebras CS-2, and SambaNova DataScale systems.
Systems
GroqRack (Available for Allocation Requests)
GroqRack Inference
- System Size: 72 accelerators (9 nodes x 8 accelerators per node)
- Compute Units per Accelerator: 5120 vector ALUs
- Performance of a single accelerator: >188 TFLOPS (FP16), >750 TOPS (INT8)
- Software Stack Support: GroqWare SDK, ONNX
- Interconnect: RealScale™
Cerebras CS-2 (Available for Allocation Requests)
Cerebras CS-2 Wafer-Scale Cluster WSE-2
- System Size: 2 nodes (each with a Wafer-Scale Engine), including Memory-X and Swarm-X
- Compute Units per Accelerator: 850,000 cores
- Performance of a single accelerator (TFLOPS): >5,780 (FP16)
- Software Stack Support: Cerebras SDK, TensorFlow, PyTorch
- Interconnect: Ethernet-based
SambaNova Dataflow (Available for Allocation Requests)
- System Size: 64 accelerators (8 nodes x 8 accelerators per node)
- Compute Units per Accelerator: 1280 programmable compute units
- Performance of a single accelerator (TFLOPS): >660 (BF16)
- Software Stack Support: SambaFlow, PyTorch
- Interconnect: Ethernet-based
Graphcore Bow Pod64 (Available for Allocation Requests)
Graphcore Intelligent Processing Unit (IPU)
- System Size: 64 accelerators (4 nodes x 16 accelerators per node)
- Compute Units per Accelerator: 1472 independent processing units
- Performance of a single accelerator (TFLOPS): >250 (FP16)
- Software Stack Support: PopART, TensorFlow, PyTorch, ONNX
- Interconnect: IPU Link
Habana Gaudi-1
Habana Gaudi Tensor Processing Cores
- System Size: 16 accelerators (2 nodes x 8 accelerators per node)
- Compute Units per Accelerator: 8 TPCs + GEMM engine
- Performance of a single accelerator (TFLOPS): >150 (FP16)
- Software Stack Support: SynapseAI, TensorFlow, PyTorch
- Interconnect: Ethernet-based
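Each system size above is simply nodes x accelerators per node. Multiplying that count by the quoted per-accelerator figure gives a rough nominal aggregate peak. The sketch below tabulates the figures from the lists above. It assumes one Wafer-Scale Engine per Cerebras node (as stated) and treats each ">" value as exact, so the results are lower bounds on nominal peak, not sustained application throughput. Precisions also differ (BF16 for SambaNova, FP16 elsewhere), so the aggregates are not strictly comparable. The class and field names are hypothetical and illustrative only; they are not part of any testbed or vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestbedSystem:
    """Spec-sheet figures for one AI Testbed system (hypothetical helper type)."""
    name: str
    nodes: int
    accelerators_per_node: int
    peak_tflops_per_accelerator: float  # quoted lower bound (">"), nominal peak
    precision: str

    @property
    def total_accelerators(self) -> int:
        # System size = nodes x accelerators per node.
        return self.nodes * self.accelerators_per_node

    @property
    def aggregate_peak_tflops(self) -> float:
        # Nominal lower bound: per-accelerator peak times accelerator count.
        # Ignores interconnect, memory bandwidth, and real-world utilization.
        return self.total_accelerators * self.peak_tflops_per_accelerator


# Figures copied from the system lists above; Cerebras is assumed to have
# one Wafer-Scale Engine per node.
SYSTEMS = [
    TestbedSystem("GroqRack", 9, 8, 188, "FP16"),
    TestbedSystem("Cerebras CS-2", 2, 1, 5780, "FP16"),
    TestbedSystem("SambaNova Dataflow", 8, 8, 660, "BF16"),
    TestbedSystem("Graphcore Bow Pod64", 4, 16, 250, "FP16"),
    TestbedSystem("Habana Gaudi-1", 2, 8, 150, "FP16"),
]

if __name__ == "__main__":
    for s in SYSTEMS:
        print(f"{s.name:<22} {s.total_accelerators:>3} accelerators  "
              f">{s.aggregate_peak_tflops:,.0f} TFLOPS ({s.precision})")
```

Running the script prints one line per system; for example, GroqRack works out to 72 accelerators and a nominal >13,536 TFLOPS (FP16).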