Prof. Samuel Kaski

Aalto University and University of Helsinki, Finland


Bio: Samuel Kaski is Academy Professor (research professor) of the Academy of Finland, Professor of Computer Science at Aalto University, and Director of the Finnish Center of Excellence in Computational Inference Research (COIN). His field is probabilistic machine learning, with applications involving multiple data sources in interactive information retrieval, data visualization, health, and biology.


Bayesian factorization of multiple data sources

Abstract: An increasingly common data analysis task is to factorize multiple data matrices together. The goal can be to borrow strength from related data sources for missing value imputation or prediction, or to find out what is shared between the different sources and what is unique to each. I will discuss an extension of factor analysis to this task, group factor analysis (GFA), and its generalization from the analysis of multiple coupled matrices to multiple coupled tensors and matrices. I will pick examples from molecular medicine and brain data analysis.
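
To make the factorization structure concrete, here is a minimal illustrative sketch of a GFA-style generative model, not the speaker's implementation: all dimensions, the activity pattern, and the noise level are hypothetical, and only numpy is used. All views share the latent factors Z, while a group-wise activity pattern determines which factors load on which view, so some factors capture shared variation and others are specific to a single view.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N samples, K latent factors, two coupled views.
N, K = 100, 4
view_dims = [30, 20]          # number of columns in each data matrix Y_m

# Latent factors shared across all views.
Z = rng.normal(size=(N, K))

# Group-wise activity: factor k is either active in view m or switched off.
# Factors 0-1 are shared, factor 2 is specific to view 0, factor 3 to view 1.
active = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 1]])

# View-specific loadings, masked by the activity pattern, plus observation noise.
Y = []
for m, D in enumerate(view_dims):
    W_m = rng.normal(size=(K, D)) * active[m][:, None]
    Y.append(Z @ W_m + 0.1 * rng.normal(size=(N, D)))

# Each Y[m] is an N x D_m matrix; factorizing them jointly separates the
# variation shared between the views from the variation unique to each.
print([y.shape for y in Y])

In practice the factors, loadings, and activity pattern are unknown and are inferred, in the Bayesian treatment, with group-wise sparsity priors; the sketch only illustrates the assumed structure of the coupled matrices.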

Dr. Sihem Amer-Yahia

CNRS at LIG, Grenoble, France


Bio: Sihem Amer-Yahia is DR1 CNRS at LIG in Grenoble, where she leads the SLIDE team. Her interests are at the intersection of large-scale data management and data analytics. Before joining CNRS, she was Principal Scientist at QCRI, Senior Scientist at Yahoo! Research, and Member of Technical Staff at AT&T Labs. Sihem served on the SIGMOD Executive Board, the VLDB Endowment, and the EDBT Board. She is the Editor-in-Chief of the VLDB Journal for Europe and Africa and is on the editorial boards of TODS and the Information Systems Journal. She was PC chair of SIGMOD Industrial 2015 and is currently chairing the VLDB Workshops 2016. Sihem received her Ph.D. in CS from Paris-Orsay and INRIA in 1999, and her Diplôme d’Ingénieur from INI, Algeria.


Worker-Centricity Could Be Today’s Disruptive Innovation in Crowdsourcing

Abstract: For over 40 years, organization studies have focused on understanding the human factors that influence an individual’s ability to perform a task, or a set of tasks, alone or in collaboration with others. The reason crowdsourcing platforms have been so successful is that tasks are small and simple, and do not require long engagement from workers. The crowd is typically volatile, its arrivals and departures asynchronous, and its levels of attention and accuracy diverse. Today, crowdsourcing platforms have plateaued and, despite high demand, they are not adequate for emerging applications such as citizen science and disaster management. I will discuss the need to make crowdsourcing worker-centric by accounting for human factors such as skills, expected wage, and motivation. This talk will cover published work on team formation for collaborative tasks and ongoing work on adaptive task assignment and task composition to help workers find useful tasks. This is joint work with Senjuti Basu Roy from the University of Washington and Dongwon Lee from Penn State University.
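
As a purely illustrative sketch of what accounting for such human factors could look like, and not the algorithms from the joint work above, the following scores each open task for a worker by skill match, wage fit, and a simple motivation term, then greedily proposes the best-scoring tasks; all class names, fields, and weights are hypothetical.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    required_skills: set
    pay: float             # payment offered for completing the task

@dataclass
class Worker:
    name: str
    skills: set
    expected_wage: float    # minimum pay the worker finds acceptable
    interests: set          # topics the worker is motivated to work on

def score(task: Task, worker: Worker) -> float:
    # Illustrative worker-centric score combining skill match, wage fit, motivation.
    skill_match = len(task.required_skills & worker.skills) / max(len(task.required_skills), 1)
    wage_fit = 1.0 if task.pay >= worker.expected_wage else task.pay / worker.expected_wage
    motivation = 1.0 if task.required_skills & worker.interests else 0.5
    return skill_match * wage_fit * motivation

def assign(tasks, worker, k=2):
    # Greedily propose the k best-scoring open tasks to this worker.
    return sorted(tasks, key=lambda t: score(t, worker), reverse=True)[:k]

tasks = [Task("label-images", {"vision"}, 0.10),
         Task("translate-fr", {"french"}, 0.50),
         Task("tag-tweets", {"nlp"}, 0.20)]
worker = Worker("w1", {"french", "nlp"}, expected_wage=0.15, interests={"nlp"})
print([t.name for t in assign(tasks, worker)])

An adaptive platform would re-score tasks as workers complete work and their observed skills, wages, and interests change, which is the spirit of the worker-centric view the talk argues for.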

Prof. Foster Provost

New York University, USA


Bio: Foster Provost is Professor of Data Science and Andre Meyer Faculty Fellow at New York University. He is coauthor of the best-selling data science book, Data Science for Business. His research focuses on modeling behavior data, modeling (social) network data, crowdsourcing for data science, aligning data science with application goals, and privacy-friendly methods. His research has won many awards, including the INFORMS Design Science Award and best paper awards at KDD across three decades. He cofounded several companies based on his research, including Dstillery, Integral Ad Science, and Detectica. Foster was previously Editor-in-Chief of the journal Machine Learning. His latest music album, Mean Reversion, is scheduled to be released in 2016.


The Predictive Power of Massive Data about our Fine-Grained Behavior

Abstract: What is it about “big data” that really makes it different from traditional data? In this talk I illustrate one important aspect: massive, ultra-fine-grained data on individuals’ behaviors holds remarkable predictive power. I examine several applications to marketing-related tasks, showing how machine learning methods can extract that predictive power and how the value of the data “asset” differs from the value of traditional data used for predictive modeling. I then dig deeper into explaining the predictions made from massive numbers of fine-grained behaviors, applying a counterfactual framework that explains model behavior by treating the individual behaviors as evidence combined by the model. This analysis shows that fine-grained behavior data incorporate various sorts of information that we traditionally have sought to capture by other means. For example, for marketing modeling, the behavior data effectively incorporate demographics, psychographics, category interest, and purchase intent. Finally, I discuss the flip side of the coin: the remarkable predictive power of fine-grained information on individuals raises new privacy concerns. In particular, I discuss privacy concerns based on inferences drawn about us, in contrast to privacy concerns stemming from violations of data confidentiality. The evidence counterfactual approach used to explain the predictions can also be used to give online consumers transparency into why inferences are drawn about them. In addition, it offers the possibility of designing novel solutions, such as a privacy-friendly “cloaking device” that inhibits inferences from being drawn based on particular behaviors. This talk draws on work from several papers.
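
One way to picture the evidence-counterfactual idea is the following simplified sketch for a linear scorer, which is an assumption here rather than the exact method from the underlying papers: behaviors are binary features, and we greedily remove the behaviors contributing the most positive evidence until the predicted label would flip; the removed set then explains the original positive prediction. The weights, bias, and example behaviors are all hypothetical.

import numpy as np

def evidence_counterfactual(weights, bias, x, threshold=0.0):
    # Greedily find a small set of active behaviors whose removal flips a
    # linear model's positive prediction (a counterfactual explanation).
    x = x.copy()
    removed = []
    # Consider active behaviors, strongest positive evidence first.
    order = sorted(np.flatnonzero(x), key=lambda j: weights[j], reverse=True)
    for j in order:
        if weights @ x + bias <= threshold:
            break                      # prediction has already flipped
        if weights[j] > 0:
            x[j] = 0                   # "remove" this behavior
            removed.append(j)
    return removed if weights @ x + bias <= threshold else None

# Hypothetical example: 5 binary behaviors (e.g., pages visited) and a linear scorer.
weights = np.array([1.5, 0.8, -0.4, 0.3, 0.1])
bias = -1.0
x = np.array([1, 1, 0, 1, 0], dtype=float)

print(evidence_counterfactual(weights, bias, x))
# -> indices of behaviors that, if absent, would change the positive prediction

The same machinery suggests the “cloaking” direction mentioned above: if a consumer suppresses exactly the behaviors in such a set, the inference in question can no longer be drawn from the remaining evidence.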