The AI Testbed helps evaluate the usability and performance of machine-learning-based high-performance computing applications running on next-generation AI accelerators. The goal is to better understand how these systems can be integrated with existing and upcoming supercomputers at the facility to accelerate science insights.
We are currently offering allocations on our Groq, Graphcore Bow IPU, Cerebras CS-3, and SambaNova DataScale systems.
NOTE: There is no need to request an allocation for Metis (SambaNova SN40L). Metis is available to all users through our AI Inference endpoints service.
Systems
Cerebras CS-3 (Available through an allocation request)
- System Size: 2 Nodes (Each with a Wafer-Scale Engine) Including MemoryX and SwarmX
- Compute Units per Accelerator: 900,000 cores
- Performance of a single accelerator (TFlops): 125,000 (FP16)
- Software Stack Support: Cerebras Model Zoo, PyTorch
- Interconnect: Ethernet-based
SambaNova DataScale (Available through an allocation request)
- System Size: 64 Accelerators (8 nodes x 8 Accelerators per node)
- Compute Units per Accelerator: 1280 Programmable compute units
- Performance of a single accelerator (TFlops): >660 (BF16)
- Software Stack Support: SambaFlow, PyTorch
- Interconnect: Ethernet-based
Metis: SambaNova SN40L (Available to all users)
- System Size: 32 Accelerators (16 Nodes and 2 Accelerators per Node)
- Compute Units per Accelerator: 1,040
- Estimated Performance of a Single Accelerator (TFlops): 637.5 (BF16)
- Software Stack Support: SambaStudio, SambaStack
- Interconnect: Ethernet-based
GroqRack (Available through an allocation request)
GroqRack Inference
- System Size: 72 Accelerators (9 nodes x 8 Accelerators per node)
- Compute Units per Accelerator: 5120 vector ALUs
- Performance of a single accelerator (TFlops): >188 (FP16), >750 (INT8)
- Software Stack Support: GroqWare SDK, ONNX
- Interconnect: RealScale™
Graphcore Bow Pod64 (Available through an allocation request)
Graphcore Intelligent Processing Unit (IPU)
- System Size: 64 Accelerators (4 nodes x 16 Accelerators per node)
- Compute Units per Accelerator: 1472 independent processing units
- Performance of a single accelerator (TFlops): >250 (FP16)
- Software Stack Support: PopArt, TensorFlow, PyTorch, ONNX
- Interconnect: IPU Link
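The system sizes above follow from the node and per-node accelerator counts, and the per-accelerator figures give a lower bound on aggregate peak throughput. A quick sanity check of that arithmetic, using only the numbers listed in this page (the dictionary below is illustrative, not an official system inventory):

```python
# Recompute total accelerator counts and lower-bound aggregate peak
# performance from the per-node figures listed on this page.
systems = {
    # name: (nodes, accelerators per node, min TFlops per accelerator)
    "SambaNova DataScale": (8, 8, 660),   # BF16
    "GroqRack":            (9, 8, 188),   # FP16
    "Graphcore Bow Pod64": (4, 16, 250),  # FP16
}

for name, (nodes, accels_per_node, tflops) in systems.items():
    total_accels = nodes * accels_per_node
    # TFlops per accelerator x accelerator count, converted to PFlops
    aggregate_pflops = total_accels * tflops / 1000
    print(f"{name}: {total_accels} accelerators, >{aggregate_pflops:.1f} PFlops aggregate")
```

For example, the GroqRack's 9 nodes x 8 accelerators give the 72 accelerators listed, for an aggregate lower bound of roughly 13.5 PFlops at FP16.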