How Small Data Footprints will Transform our World

Small data is to individuals what Big Data is to institutions. Today our digital traces are amalgamated by commercial and governmental institutions in what we commonly think of as Big Data practices; Big Data in this sense is the domain of big institutions. The Small Data Lab explores new techniques for individuals to harness their own disparate sources of small data: the myriad of information we each generate implicitly about ourselves across our cellphone mobility patterns and call data records, our online search and click-through histories, our personal shopping cart histories online and offline, the language patterns in our texts and tweets, and the games and media we consume. The goal is to enable individuals to be at the center of their own personal data universes and to stimulate a new ecosystem of apps and services that can create insightful, actionable, and, at times, delightful value.

The Small Data Lab at Cornell Tech explores the systems, data, and human-computer interaction challenges around small data. How can we design applications and services that interweave and make sense of multiple, diverse, noisy data streams? How can we enable apps that do not require us to warehouse every single grain of personal information, by intentionally building systems that create and share higher-order summaries of an individual's relevant behaviors? Exploring these questions through iterative design, implementation, and evaluation of small data applications is Dr. Deborah Estrin's pathway to understanding the capabilities and limitations of current algorithms and architectures, while also opening up new areas of research in systems design, machine learning, and behavioral economics.


Three example ongoing application projects are:

  • Pushcart automatically processes grocery receipts to provide personalized, ongoing, and configurable nudges toward healthier grocery shopping.

  • Ora processes everyday activity traces to support key relationships through the passive pairwise sharing of how you are doing, not what you are doing.

  • PainLess transforms passive measurements of everyday activities into behavioral biomarkers to inform personalized, precision clinical care and self-care.


To support these applications, several cross-cutting building blocks have been developed:

  • Lifestreams is a modular stream-processing framework for analyzing, combining, correlating, and synthesizing personal data streams.

  • Email Analysis Framework supports flexible extraction of language features from an individual's email and makes them available for processing by Lifestreams and inclusion in small data apps.

  • Ohmage is an open source platform for mHealth and small data end-to-end experimentation and evaluation.

  • Open mHealth is a non-profit initiative committed to developing, promoting and sustaining open architecture in mobile health.

Dr. Estrin's research interests extend beyond the technical exercise to the exploration of where we are headed as a society in our relationship to ourselves, each other, and the technologies and institutions that both generate and govern our personal data.


Dr. Estrin has been a Professor of Computer Science at Cornell NYC Tech and a Professor of Public Health at Weill Cornell Medical College in New York, with research focused on small data and mobile health. In 2011, she co-founded Open mHealth, an open-source software non-profit, with Dr. Ida Sim (UCSF). Recently her work has focused on the user-contributed data streams that are increasingly available on mobile phones, and on using that data to support self-monitoring programs with health outcomes.

Previously, Estrin was on the UCLA faculty, where she was the Founding Director of the NSF Center for Embedded Networked Sensing (CENS), pioneering the development of mobile and wireless systems to collect and analyze real-time data about the physical world and the people who occupy it.


Identifying preferences for mobile health applications for self-monitoring and self-management: focus group findings from HIV-po


Feasibility testing of an automated image-capture method to aid dietary recall


Making Sense of Mobile Health Data: An Open Architecture to Improve Individual- and Population-Level Health



ACM-W Athena Lecturer


WITI Hall of Fame


Anita Borg Institute


Doctor Honoris Causa from EPFL and Uppsala University


Elected to the National Academy of Engineering