Data mining allows researchers to measure human interactions, draw conclusions, and make predictions for social good

Social scientists have long pondered the question: how do people get along? Until recently, studies addressing this question relied on controlled environments. However, with the explosion of social media and social networks on the Internet over the past 10 years, social scientists have found a window into human behavior that is natural, real-time, and large in scale. As Dr. Jure Leskovec of Stanford University puts it, "with this large scale data on social interactions, we can now let the data speak for itself." Dr. Leskovec's research applies large-scale data analytics to build computational models of human behavior.

Dr. Leskovec's principal research interest is in large-scale data mining, focusing on the analysis of networks. Networks allow us to study phenomena across the social, technological, and natural worlds. Networks frame numerous research problems that lead to high-impact applications. For example, social networks on the Internet generate revenue of billions of dollars; detection of virus outbreaks in human networks can save lives; anomaly detection in computer-traffic networks is vital for security. His long-term research goal is to harness large-scale social and information networks to understand, predict, and ultimately, enhance social and technological systems. Dr. Leskovec aims to create explanatory and predictive models of actions of large groups of people and societies, and large technological systems.

Only a few years ago, the goal of modeling large social and technological systems would have been unattainable. However, in less than a decade the World Wide Web has been transformed from a large static library that people only browse into a vast information resource where people interact with each other. Through the emergence of online social networking, social media and social gaming, the daily activities of hundreds of millions of people are migrating to the Web. Today the Web is a "sensor" that captures the pulse of humanity: what we are thinking, what we are doing, and what we know.

The activity of millions of humans on the Web leaves massive digital traces, which come in many forms and modalities (combining text, images, and video along spatial and temporal axes) and are connected in rich network structures. Such data can naturally be represented, studied and analyzed as complex dynamic interaction networks. Networks offer enormous potential both to address long-standing scientific questions and to inform the design of future social computing applications.

  • Networks pose interesting challenges and questions that motivate Dr. Leskovec's research: How is information in a social network created? How does it flow and mutate as it is passed from node to node? How will a community or a social network evolve in the future? And how do we develop algorithms that scale to massive dynamic networks?
  • Dr. Leskovec's research group strives to address the above challenges and harness the opportunities networks put forward. His group combines analysis of complex networks with large-scale data mining to develop computational models of networks. Their explorations consist of:

(1) Modeling the structure and evolution of networks and online communities.

(2) Developing methods for social media analytics and information diffusion.

(3) Working with massive datasets, as certain behaviors and patterns are observable only when the amount of data is large enough.

  • Dr. Leskovec collaborates with journalists, communication scientists, biologists, medical doctors, and linguists. This research also has an impact in industry: Facebook has deployed a version of Dr. Leskovec's link prediction engine, Samsung and Volkswagen are evaluating his social media recommendation engine, and the market research firm Ipsos and Allyes, the largest Chinese online advertiser, are considering his algorithms for online advertising. These industry collaborations show that his approaches are more than ideas: they get implemented and solve problems today.
  • Dr. Leskovec is currently expanding this research into Data Science for Social Good. The idea here is to use data analysis and computation to build models and understanding of societal problems. Two particular avenues he is pursuing are criminal courts and medical records.
  1. In collaboration with researchers at Yale, Harvard and the University of Chicago, Dr. Leskovec is analyzing over two million pretrial court cases to help develop tools that let judges make optimal pretrial decisions on a defendant's fate without introducing the biases that often disrupt due process.
  2. In the medical world, Dr. Leskovec is looking at ways to extract new medical knowledge from data and uncover connections between diseases that have not been studied before. For instance, such mining has revealed connections between diabetes and sleep apnea.

Dr. Leskovec's research on networks is theoretically grounded and spans areas of computer science as diverse as machine learning, theory and systems. Computation over massive data is at the heart of his research, which has direct applications well beyond computer science, in the social sciences, physics, economics and marketing. Dr. Leskovec is excited about the influence that his research has already had within industry and academia, and looks forward to continuing to make strides on both theoretical foundations and real-world applications.

As a kid, Dr. Leskovec was captivated by computers; he even used all his savings to purchase his first one. Soon he started programming and learning how to apply computation to extract patterns and models from data. Online social networks and social media fascinate Dr. Leskovec because he feels they provide a unique sensor into human behavior, something that was invisible to us until very recently.

Dr. Leskovec is currently an assistant professor of Computer Science at Stanford University. His research focuses on mining and modeling large social and information networks, their evolution, and the diffusion of information and influence over them. The problems he investigates are motivated by large-scale data, the Web and online media.

Best paper runner-up award, ACM Intl. Conf. on World Wide Web, WWW, 2014

Best paper award, ACM Intl. Conf. on World Wide Web, WWW, 2013

Okawa Foundation Fellowship, 2012

Alfred P. Sloan Fellowship, 2012

Kavli Fellow, National Academy of Sciences, Frontiers of Science, 2011