Meaning extraction from large-scale and diverse information and data sources

With the increasing availability of data and computational power at our fingertips, a growing number of organizations and individuals are able to analyze data and extract knowledge from it. Dr. Mohammed Zaki, of Rensselaer Polytechnic Institute, is working to create novel models, algorithms, and systems that enable more intuitive use and development of data mining and analysis in all areas, especially for applications in bioinformatics and social networks. Dr. Zaki hopes to use his research to make data mining and learning easier both for experts in the Computer Science field and for non-experts, who are usually professionals with no formal background in Computer Science or Statistics but are nevertheless compelled to analyze huge amounts of data. His research is focused on scalable new methods and novel tools and platforms for coding, mining, and analyzing massive, complex, and interconnected data. The applications of what he terms "intuitive data mining" will allow other researchers in social sciences, biology and life sciences, data journalism, and many other areas to better access, organize, and mine relevant information. They will also allow data science experts to make new advances with more sophisticated technology.

Dr. Zaki's highly technical research is grounded in real-world applications, especially in bioinformatics and in social and citation networks. For example, in bioinformatics, tremendous gains are possible when scientists simultaneously mine the data from genomes, protein structures and networks, metabolic pathways, drug interactions, and numerous other curated datasets. Dr. Zaki is developing novel methods for integrated and holistic mining of omics-scale data to better understand complex systems. Another project, on the Frontiers of Science, aims to develop mining techniques that identify ground-breaking ideas and trends, summarize open and solved problems, and display the summaries on a timeline by simultaneously mining the graph structure and content of citation networks.

In pursuing his research goals, Dr. Zaki is guided by three main themes:

  • Living in an Interconnected World: To make sense of diverse data and extract valuable insights from it, we need an integrated or holistic modeling and mining approach. Dr. Zaki is developing novel techniques for integrated graph analytics. His tools will allow experts and non-experts to make sense of the incredible amount of information produced by others to better inform their own work.

  • Rise of Complex Big Data: Massive amounts of data are being collected in different disciplines ranging from the natural and physical sciences to the social sciences and digital humanities. Dr. Zaki is scaling his existing research in order to mine and learn from big data. By building scalable and parallel data mining and graph mining algorithms, Dr. Zaki and his team are leveraging state-of-the-art hardware and architectural trends, such as many-core processors and cloud computing, to work with big data sets in a reasonable amount of time.

  • Data Analytics for the Masses: There is an increasing gap between the scale and complexity of data and the tools that currently exist to mine them for knowledge. Dr. Zaki is focusing on intuitive data mining methods and frameworks that will address the necessity for improved tools for data science.

Dr. Zaki is a Professor of Computer Science at RPI. He received his Ph.D. degree in computer science from the University of Rochester in 1998. He has over 225 publications in data mining and bioinformatics. He is currently on the Board of Directors of the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), and is the Information Director for the ACM Special Interest Group on Bioinformatics, Computational Biology, and Biomedical Informatics (SIGBIO). He is an Area Editor for Statistical Analysis and Data Mining, and an Associate Editor for Data Mining and Knowledge Discovery, ACM Transactions on Knowledge Discovery from Data, and Social Network Analysis and Mining. He has served as the program co-chair for all the main data mining conferences, such as ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09), IEEE International Conference on Data Mining (ICDM'12), SIAM International Conference on Data Mining (SDM'08), ACM International Conference on Knowledge Management (CIKM'12), Pacific-Asia Conference on KDD (PAKDD'10), and IEEE International Conference on Bioinformatics and Biomedicine (BIBM'11), and he is the founding co-chair for the BIOKDD (Data Mining in Bioinformatics) workshop. Dr. Zaki is also the general chair for the SIAM International Conference on Data Mining, 2014-2015.

Dr. Zaki has always been interested in understanding nature and in mathematics. Computer science has been the perfect way for him to combine both interests. He is deeply interested in discovering interesting and useful knowledge and extracting meaning from vast amounts of data, especially for applications in bioinformatics and enriched networks.

Dr. Zaki is a strong proponent of open source software and education. All of the algorithms and tools developed within his research group are available online as Open Source Software. As another example, his new textbook "Data Mining and Analysis: Fundamental Concepts and Algorithms" (Cambridge University Press, 2014) is available as a PDF download for online reading, as are video lectures and additional educational resources for learning data mining concepts. The book has been accessed/downloaded over 44,000 times since October 2013 from over 160 countries (based on Google Analytics).

In his free time, aside from research, Dr. Zaki enjoys playing squash, painting (mainly watercolor) and hiking. 

Website: http://www.cs.rpi.edu/~zaki

Board of Directors (Executive Committee), ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 2013-Present

ACM Distinguished Scientist, 2010

DOE Office of Science Early Career Principal Investigator Award in Applied Mathematics, Computer Science and High-Performance Networks, U.S. Department of Energy, 2002

NSF Faculty Early Development Award (CAREER Award), National Science Foundation, 2001

Google Faculty Research Award, 2011

HP Labs Innovation Award, 2010-2012

Best Paper Award, 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Bangkok, Thailand, April 2009

U.S. Patent #6230151: "System and Method for Scalable Parallel Classification for Data Mining on Shared-memory Systems"

Mohammed J. Zaki, Ching-Tien Ho, and Rakesh Agrawal, IBM Almaden Research Center. Granted on 5/8/200. https://www.google.com/patents/US6230151

U.S. Patent #20130097138: "Discovering Representative Composite CI Patterns in an IT System"

Omer Barkol, Ruth Bergman, Yifat Felder, Shahar Golan and Arik Sityon (HP), Mohammed J. Zaki and Pranay Anchuri (RPI), Hewlett-Packard Company, Granted on 4/18/2013, https://www.google.com/patents/US20130097138