Streaming to the Cloud

Radu Tudoran
Seminar

In recent years, Big Data has become an important aspect of scientific discovery, a shift referred to as the Fourth Paradigm. Across the wide spectrum of applications and acquisition methods, the ones that will generate the largest amounts of data fall into the category of streaming data, e.g., networks of sensors, observatories, telescopes, or experiments such as the CERN LHC. As the amount of acquired information grows and data sources become increasingly geographically distributed, it becomes important to process the data in scalable and efficient ways. Cloud computing is an attractive option for a scalable processing platform. However, this raises the question of how best to use cloud computing capabilities for geographically distributed stream processing. In this work, we explore and analyze different approaches to streaming data to the cloud and evaluate them on multiple cloud offerings, including Microsoft Azure and FutureGrid's Nimbus and OpenStack installations. Using an ATLAS application, we show that choosing the right approach to streaming data can improve average data rates threefold.