Historical Pandemic Musing on Big Data Summarization with Novel Applications
Amr El Abbadi
Department of Computer Science
University of California at Santa Barbara
Santa Barbara, CA 93106
During the past two decades we have seen an unprecedented increase in the amount of data generated by numerous internet-scale applications. As hundreds of millions to billions of users interact with these applications, a continuous flow of interaction or log data is collected by the internet companies hosting them. Before this data can be subjected to modeling and analysis, it is often necessary to obtain summary statistics such as the cardinality of unique visitors, frequency counts of users from different states or countries, and, more generally, quantile and median information from the dataset. Efficient algorithms exist for computing this information exactly. Unfortunately, these algorithms either require a considerable amount of time, scanning the data multiple times, or require additional storage that is linear in the size of the dataset itself. Approximation methods with guaranteed error bounds, developed in the context of streaming data, are extremely effective at extracting useful and relatively accurate knowledge from big data. In this talk, we will review the recent, and not so recent, advances in big data summarization. The main objective of this tutorial-style talk is to demonstrate the strong relationship between the mathematics of big data and the management of big data. We also show some of our recent results and how some of these approaches have diverse applications, even in system design.
Amr El Abbadi is a Professor of Computer Science at the University of California, Santa Barbara. He received his B. Eng. from Alexandria University, Egypt, and his Ph.D. from Cornell University. His research interests are in the fields of fault-tolerant distributed systems and databases, focusing recently on Cloud data management and blockchain-based systems. Prof. El Abbadi is an ACM Fellow, AAAS Fellow, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011. He has served as an editor for several database journals, including The VLDB Journal, IEEE Transactions on Computers, and The Computer Journal. He has been Program Chair for multiple database and distributed systems conferences. He currently serves on the executive committee of the IEEE Technical Committee on Data Engineering (TCDE) and was a board member of the VLDB Endowment from 2002 to 2008. In 2007, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. In 2013, his student Sudipto Das received the SIGMOD Jim Gray Doctoral Dissertation Award. Prof. El Abbadi is also a co-recipient of the Test of Time Award at EDBT/ICDT 2015. He has published over 300 articles in databases and distributed systems and has supervised over 35 PhD students.