Applications in diverse domains increasingly need techniques that can analyze very large collections of high-dimensional vectors. Examples come from scientific, manufacturing, and social domains, many of which apply machine learning techniques for knowledge extraction. These applications commonly involve collections of hundreds of millions to billions of vectors, which, because of their sheer size, are often not analyzed in full detail. In this talk, we describe examples of data sources that produce high-dimensional vectors, and focus on two popular types: data series and deep network embeddings. We discuss the solutions that have been developed independently for each of these types, and argue that the data series solutions are the overall winners, even on general high-dimensional datasets. Finally, we describe current efforts in this area, as well as open research problems.
Themis Palpanas is an elected Senior Member of the French University Institute (IUF), a distinction that recognizes excellence across all academic disciplines, and Distinguished Professor of computer science at Université Paris Cité (France), where he is director of the Data Intelligence Institute of Paris (diiP) and director of the data management group, diNo. He received the BS degree from the National Technical University of Athens, Greece, and the MSc and PhD degrees from the University of Toronto, Canada. He has previously held positions at the University of California at Riverside, the University of Trento, and the IBM T.J. Watson Research Center, and has visited Microsoft Research and the IBM Almaden Research Center. His interests include problems related to data science (big data analytics and machine learning applications). He is the author of 9 US patents (3 of which have been implemented in world-leading commercial data management products) and 2 French patents. He is the recipient of 3 Best Paper awards and the IBM Shared University Research (SUR) Award. He currently serves on the VLDB Endowment Board of Trustees, as an Associate Editor for the TKDE and IDA journals, on the Editorial Advisory Board of the IS journal, and on the Editorial Board of the TLDKS journal. He has served as Editor-in-Chief of the BDR journal (which he drove to an impact factor of 3.578 and a CiteScore of 8.6); General Chair for VLDB 2013; Associate Editor for VLDB 2017, 2019, and 2022; Research PC Vice Chair for ICDE 2020; Workshop Chair for EDBT 2016, ADBIS 2013, and ADBIS 2014; General Chair for the PDA@IOT International Workshop (in conjunction with VLDB 2014); and General Chair for the Event Processing Symposium 2009.