As the data requirements of commercial and scientific applications continue to increase at an unprecedented rate, obtaining optimal end-to-end data transfer performance becomes crucial for a broad range of data-intensive applications. Achieving optimal end-to-end performance requires effectively utilizing the available network bandwidth and resources, yet in practice transfers seldom reach the utilization levels they potentially could. Tuning protocol parameters such as pipelining, parallelism, and concurrency can significantly increase network utilization and transfer performance; however, determining the best combination of these parameters is a challenging task, as real-time network conditions can vary greatly between any given pair of sites.
In this dissertation, we propose to explore novel algorithms for application-level tuning of protocol parameters to maximize data transfer throughput, especially in wide-area networks. The contributions of this research will include: (1) analysis and prediction of optimal protocol parameter combinations based on the dataset and network characteristics; (2) algorithms to cluster the datasets into comparable partitions, efficiently distribute the maximum allowed concurrency level among partitions, and transfer multiple partitions concurrently for maximum transfer throughput; (3) dynamic monitoring of the instantaneous data transfer throughput and online tuning of the protocol parameters to detect and remedy possible transfer slowdowns.
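To illustrate the third contribution, the sketch below shows one possible monitor-and-retune loop for detecting and reacting to transfer slowdowns. It is only an illustrative Python sketch under assumed hooks: the Params fields, the slowdown threshold, the probe interval, and the get_throughput/apply_params/is_finished callables are hypothetical stand-ins for a real transfer tool, not the proposed algorithms themselves.

    import time
    from dataclasses import dataclass

    @dataclass
    class Params:
        # Hypothetical starting values for the three protocol parameters.
        parallelism: int = 4
        pipelining: int = 16
        concurrency: int = 2

    SLOWDOWN_RATIO = 0.7   # illustrative: retune if throughput falls below 70% of the best observed
    PROBE_INTERVAL = 5.0   # illustrative: seconds between throughput samples

    def monitor_and_tune(get_throughput, apply_params, is_finished, params=Params()):
        # get_throughput() -> float : instantaneous throughput (e.g., Mbps)
        # apply_params(Params)      : push new protocol parameters to the running transfer
        # is_finished() -> bool     : whether the transfer has completed
        # All three callables are hypothetical hooks into a transfer tool.
        best = 0.0
        while not is_finished():
            time.sleep(PROBE_INTERVAL)
            current = get_throughput()
            best = max(best, current)
            if best > 0 and current < SLOWDOWN_RATIO * best:
                # Slowdown detected: one simple, illustrative reaction is to raise
                # parallelism and concurrency by one step and reapply the parameters.
                params = Params(parallelism=params.parallelism + 1,
                                pipelining=params.pipelining,
                                concurrency=params.concurrency + 1)
                apply_params(params)
                best = current   # reset the baseline after retuning
        return params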
As preliminary work, we have developed several application-level algorithms that dynamically tune the protocol parameters, including the number of parallel data streams per file (for large-file optimization), the level of control channel pipelining (for small-file optimization), and the level of concurrent file transfers to fill long fat network pipes (for all files). The developed algorithms employ novel techniques to group and transfer sets of files in order to yield maximum transfer throughput. To minimize the negative effect of “lots of small files” on average transfer throughput, we introduced the “multi-chunk concurrency” technique, which partitions the dataset into chunks considering the file sizes and the number of files in the dataset, and transfers multiple chunks concurrently. In the “proactive multi-chunk” technique, we dynamically change the allocation of chunks among TCP channels to improve the overall effectiveness of concurrency by balancing small and large chunks. In the “max-fair multi-chunk” technique, we aim to exploit concurrent chunk transfers while keeping network and end-system utilization at a fair level. The experimental results are very promising, and our algorithms outperform existing solutions in this area. Currently, we are developing a “hysteresis”-based optimization technique that combines real-time dynamic tuning with offline prediction based on historical data.
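The following minimal Python sketch conveys the spirit of the multi-chunk concurrency technique: partition the files of a dataset into chunks by size and transfer the chunks concurrently under a bounded concurrency level. The size thresholds, the three-way split, and the transfer_chunk placeholder are illustrative assumptions; the actual technique derives chunk boundaries from the dataset's file-size distribution and file counts, and tunes pipelining and parallelism per chunk.

    import concurrent.futures
    from typing import Dict, List

    # Illustrative size thresholds (bytes); the real technique derives chunk
    # boundaries from the dataset's file-size distribution and file counts.
    SMALL = 1 << 20       # treat files under ~1 MB as "small"
    LARGE = 256 << 20     # treat files over ~256 MB as "large"

    def partition_into_chunks(files: List[Dict]) -> Dict[str, List[Dict]]:
        # Each file is a dict such as {"path": "...", "size": 12345}.
        chunks = {"small": [], "medium": [], "large": []}
        for f in files:
            if f["size"] < SMALL:
                chunks["small"].append(f)
            elif f["size"] > LARGE:
                chunks["large"].append(f)
            else:
                chunks["medium"].append(f)
        return chunks

    def transfer_chunk(name: str, chunk: List[Dict]) -> int:
        # Placeholder for a real transfer: a small-file chunk would use heavy
        # pipelining, a large-file chunk would use multiple parallel streams.
        return sum(f["size"] for f in chunk)

    def multi_chunk_transfer(files: List[Dict], concurrency: int) -> int:
        # Transfer the non-empty chunks concurrently, bounded by the allowed level.
        chunks = partition_into_chunks(files)
        moved = 0
        with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
            futures = [pool.submit(transfer_chunk, name, chunk)
                       for name, chunk in chunks.items() if chunk]
            for done in concurrent.futures.as_completed(futures):
                moved += done.result()
        return moved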
Bio:
Engin Arslan received his BS degree in Computer Engineering from Bogazici University and his MS degree from the University of Nevada, Reno. Currently, he is pursuing his PhD in Computer Science at the University at Buffalo, SUNY, where he also works as a research assistant. His research interests include high-performance networks, data-intensive distributed computing, and cloud computing.