Quick Start: Using Apache Spark for Large-Scale Data Processing

ALCF Dev Session

This interactive webinar focuses on using Apache Spark, a framework for parallel data processing, on ALCF computing resources. It presents a brief tutorial on Apache Spark, provides instructions for running the framework on ALCF systems, discusses the unique characteristics of Theta, and recommends a few tuning parameters for achieving optimal performance.

Xiao-Yong Jin, Argonne National Laboratory