Documenting all the places
personal data goes.

About theDataMap

theDataMap™ is an online portal for documenting flows of personal data. It tells you where your data goes. The goal is to produce a detailed description of personal data flows in the United States. The effort started with health data and is expanding to all other kinds of personal data. The motivation is to help journalists, advocates, regulators, policy makers and researchers understand the current state of personal data sharing so they can do their jobs better. Our aim is to help the helpers.

A comprehensive data map will encourage new uses of personal data, help innovators find new data sources, and inform the public and policy makers about data sharing practices, so society can act responsibly to reap the benefits of sharing while addressing the risks of harm. With funding from the Knight Foundation, we will launch a portal that engages members of the public in a game-like environment to submit and vet reports of personal data sharing and to participate in data visualization and analysis competitions.


When you interact with an organization, you often leave behind personal information. The receiving organization may then share your personal information with other organizations without your explicit knowledge. This hidden data sharing can harm you by making personal data available to third parties without your knowledge. At the same time, it can make it difficult for third parties with a legitimate interest in your data to obtain it in ways that benefit you, society, or both.

Who are we?

theDataMap™ operates as a research project in the Data Privacy Lab, a program in the Institute for Quantitative Social Science (IQSS) at Harvard University. The project leader is Professor Latanya Sweeney.

Our Health Data Origin

We started in 2010 with a targeted effort on health data because that year was a transformational moment for personal health records. Most Americans in 2010 paid bills electronically, emailed photographs, searched the Web for supplemental health information, and completed so many functions online that people rarely visited brick-and-mortar offices anymore for basic transactions. Yet chances were that a child's pediatrician used paper records like those used by his grandfather's pediatrician. The technology mismatch was striking. About 61% of Americans looked online for health information in 2009 [Fox and Jones 2009], but only 4% of American office-based physicians used fully functional electronic medical record systems in 2007 [Hing and Hsiao 2010].

Backed by political will and billions of dollars, the American Recovery and Reinvestment Act of 2009 ("ARRA") ignited a mass exodus from this prehistoric paper age into a tech-savvy, networked cosmos. By 2015, patient measurements, diagnoses, procedures, medications, and demographics, along with physician notes and lab results, were no longer stored on paper but in digital format, enabling widespread sharing beyond the doctor-patient encounter. The vision behind widespread sharing of medical information is clear: relevant medical information should flow seamlessly across computers, devices, organizations and locations as needed. Evidence exists that doing so can offer significant improvements to patient care and possibly reduce costs [Chaudhry et al. 2006]. Health data are flowing. For example, PricewaterhouseCoopers estimates the sharing of personal health information beyond the doctor-patient encounter is now more than a two-billion-dollar market.

With so much personal data readily available in today's data-rich, network-savvy world and the sharing of health data rising, it is reasonable to expect a litany of personal harms, yet reports of harm seem rare. There are many reasons for this, but perhaps the most important is the lack of transparency in data sharing arrangements. These hidden activities make personal harms difficult to detect. How then can policy makers and individuals make educated decisions about privacy and data utility in the absence of such knowledge? There are many worthy uses for personal data beyond the person, so the goal is not to stop data sharing but to understand the risks, so that society can address them responsibly and reap the benefits.

Beyond Health Data

Much of the initial attention given to theDataMap™ stems from health data, but the project is in no way limited to health data. It covers the full spectrum of personal information sharing.

While our project leader, Latanya Sweeney, was the Chief Technology Officer at the U.S. Federal Trade Commission, she and a group of summer research fellows surveyed popular mobile apps and recorded the personal information each app sent from the mobile device. We added these findings to theDataMap™.

With funding from the Knight Foundation, we will host a series of competitions and activities to broadly expand the nature and volume of documentation on data sharing.

Copyright © 2012-2016 President and Fellows Harvard University.