A Performance Modeling Approach for Analyzing In-Memory MapReduce Workloads on Multi-core Architecture

Devesh Tiwari, Postdoctoral Candidate
Seminar

The MapReduce parallel programming model has been widely adopted, including for scientific data analysis and management. Recently, lightweight, fast, in-memory MapReduce runtime systems have been proposed for shared-memory systems. Such runtime systems have the potential to alleviate the challenges of parallel programming. However, which factors affect performance and which bottlenecks exist for a given program are not well understood. In this talk, I will present a practical performance model that captures the key performance factors, important trends, and behavior of in-memory MapReduce on multi-core architectures. I will discuss how our analytical model yields several important findings and their implications for system designers, performance tuners, and programmers.

If time permits, I will share my experiences in applying analytical models to understand performance and energy trade-offs in other execution paradigms, in particular performing data analysis on emerging storage devices such as Solid State Drives (SSDs). I will show how analytical models can be used to understand the feasibility and design trade-offs of such systems.