Big Data Analytics---A New Opportunity for Numerical Analysis and High Performance Computing

Jie Chen
Seminar

As the term ``big data'' appears more and more frequently in our daily life and research activities, it changes our sense of how large data can be and challenges our traditional practice of data handling. One such challenge occurs in statistical analysis, which drives the extraction of hidden information and helps us understand the principles underlying the data. Many analysis techniques rely on numerical algorithms that scale poorly and are therefore difficult to apply to data that are ``big enough.'' In this talk, we start from a basic statistical principle, maximum likelihood estimation, to illustrate why traditional numerical calculations fail to provide answers and how novel numerical algorithms (including matrix function evaluation, trace approximation, and the solution of fully dense linear systems) are derived to widen the applicability of the principle to large-scale data in practice. Big data provides a fresh opportunity for numerical analysts to develop algorithms with scalability as a central goal. Given the increasing computing power of high performance computers, parallelization is equally important in this process. The development of scalable and parallelizable numerical algorithms is key to convincing statisticians and data analysts to apply powerful statistical theories to large-scale data that they currently feel uncomfortable handling.
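To give a flavor of what ``trace approximation'' can mean, the following is a minimal sketch of one standard technique, the Hutchinson stochastic trace estimator, which estimates the trace of a matrix using only matrix-vector products. The abstract does not say which estimator the talk uses, so this is an illustrative assumption; the function name hutchinson_trace, its parameters, and the random test matrix are all hypothetical.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_probes=200, seed=0):
    """Estimate trace(A) using only products v -> A @ v (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe vector: entries are +1 or -1 with equal probability.
        z = rng.choice([-1.0, 1.0], size=n)
        # z^T A z is an unbiased estimate of trace(A).
        total += z @ matvec(z)
    return total / num_probes

# Hypothetical test problem: a symmetric positive definite matrix.
rng = np.random.default_rng(1)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)

estimate = hutchinson_trace(lambda v: A @ v, n)
exact = np.trace(A)
print(f"estimate = {estimate:.2f}, exact = {exact:.2f}")
```

The estimator never forms or factors the matrix; it only needs a routine that applies it to a vector, and the probes are independent of one another, so they can be evaluated in parallel.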