Automated sorting of complex information becomes a reality

We are experiencing a data boom across diverse domains, and the amount of available data has grown exponentially over the last twenty years. In fact, there is now so much data that it is nearly impossible to sift through it all without the assistance of technology. In the past, categorizing information relied upon human labor; now, however, advances in machine learning are making automated sorting of complex information a reality. Dr. Anima Anandkumar, of the University of California, Irvine, combines theoretical and applied machine learning research to develop methods that are applicable to social networks, computational biology, and document analysis. In doing so, she and her team are able to find useful information in large-scale data, a task that can be like finding “a needle in the haystack.”

Much of Dr. Anandkumar’s work relies upon identifying relationships among variables or unknowns based on observed data. These relationships can be represented as graphs and can also involve hidden variables, which are not directly observed. In other words, she and her interdisciplinary team of graduate students, biologists, neurologists, and sociologists use sophisticated algorithms to sift through large data sets and make subtle inferences about how to categorize the information within them. These novel algorithms, which scale to huge datasets with billions of variables, produce highly accurate estimates with strong relevance to a number of applications, including computer vision, natural language processing, social network analysis, and computational biology.

Current research includes:

  • Theoretical Research: Dr. Anandkumar’s theoretical research answers important questions that guide her applied work. For instance, she and her team ask: What kinds of phenomena can we learn? Can we analyze the behavior of the algorithms we develop? The answers enable the development of methods with real-world impact that are scalable while still capturing the complexities of the data.

  • Document Analysis: Dr. Anandkumar aims to develop probabilistic models that allow documents to be annotated without effort from human users. For instance, picture the growing number of articles published online each day. Rather than relying upon people to annotate and categorize articles, Dr. Anandkumar’s algorithms would quickly and effectively analyze their contents while picking up on the subtleties between the lines.

  • Neuroscience: Dr. Anandkumar is using data made available by the Allen Brain Institute to learn about the unknown gene profiles of individual cell types in the brains of mice. Living organisms have many different types of neurons, and scientists can currently take measurements mostly over groups of cells rather than on single cells. Dr. Anandkumar’s work uses sophisticated probabilistic models and advanced machine learning algorithms to identify what individual neurons do. Down the road, identifying the roles of individual neurons could lead to a better understanding of neuronal pathways, and of how brain diseases such as multiple sclerosis and Alzheimer’s develop and might be cured.

  • Social Media: Social media has become an important platform for the organization of communities, with both positive and negative outcomes. Dr. Anandkumar’s models are able to identify signs of malfeasance, like organized terrorism, as well as positive phenomena, like trending topics and collaborative communities, which have broad applications for public policy goals. Dr. Anandkumar has developed highly scalable methods to learn about communities on a network with billions of users.

  • Tracking Online Student Learning in MOOCs: MOOCs, or Massive Open Online Courses, are revolutionizing education and making it accessible to a large group of students. However, a major drawback is the lack of personalized attention and instruction. Dr. Anandkumar’s group has developed tools to automatically mine related concepts within a course (or across several courses) based on the past performance of all students on questions about those concepts. She has also developed scalable approaches for extracting hidden groups of students whose learning behavior evolves similarly across the various concepts in the course.

Since her childhood, Dr. Anandkumar has been curious about how the world works, how it came into being, and what mechanisms drive it. Mathematics thus became a language that allowed her to express and analyze the hidden structures of our world. This outlook has led her to develop tools to search through and learn about hidden structures in different kinds of data. This “needle in the haystack” problem has motivated Dr. Anandkumar and her team to solve incredibly challenging problems with diligence and passion.

Dr. Anandkumar is passionate about many things beyond research, which help her grow into a better person. She has been dancing since she was three years old and enjoys many different forms of dance, such as Indian, Middle Eastern, Flamenco, Tango, and Latin forms. She likes to spend time outdoors hiking, climbing, kayaking, swimming, and snowboarding. She is a wanderer at heart and likes to explore new places and cultures. She also cares deeply about helping the underprivileged and about animal welfare, and believes in giving back to society in several different ways.


AFOSR Young Investigator Award (YIP), 2015

Alfred P. Sloan Research Fellowship, 2014

Microsoft Faculty Fellowship, 2013

NSF CAREER Award, 2013

Best Thesis Award, ACM SIGMETRICS Society, 2009

A. Anandkumar, D. Agrawal, C. Bisdikian, T. He, and S. Perelman. Selective Instrumentation for Distributed Applications for Transaction Monitoring. US 8433786 B2, April 2013.