theDataMap™ is an online portal for documenting flows of personal data. The goal is to produce a detailed description of personal data flows in the United States.

A comprehensive data map will encourage new uses of personal data, help innovators find new data sources, educate the public, and inform policymakers about data sharing practices, so that society can act responsibly to reap the benefits of sharing while addressing the risks of harm. To accomplish this goal, the portal engages members of the public in a game-like environment to report, and vet reports of, personal data sharing.

When you interact with an organization, you often leave behind personal information. The receiving organization may then share your personal information with other organizations without your explicit knowledge. This hidden sharing can harm you by making your personal data available to third parties without your knowledge; at the same time, it can make it difficult for third parties with a legitimate interest in your data to obtain it in ways that benefit you and/or society.

A good example is health data. When you visit a doctor, you expect some organizations to receive information about your visit (e.g., your medical insurance company and your pharmacy), but you might be surprised by, and not even recognize, many of the other entities that may also receive identifiable information about your visit (e.g., a data mining company, your employer, your state government). If you then suffer economic harm or discrimination as a result of the hidden sharing, you would not know the information was used against you, and if the information was incorrect, you would have no way to correct it. If a data breach occurs, you would not know your information was stolen, because you would have no reason to believe the breached company held your information; yet you could become a victim of identity theft or medical identity theft as a result.

Here is another example. Imagine you find yourself visiting a city in another state, and before you know what happened, you are unconscious in an emergency room. The treating physician has no knowledge of your current medications or allergies. If only there were a way to retrieve that information from your local pharmacy and doctor's office, it could save your life! About 75,000 deaths may result each year from preventable medical errors [Kohn et al. 2000]. Suppose an innovative start-up has an ingenious way to deliver this information inexpensively and seamlessly, just in time, to save lives, and can do so with privacy protections, but cannot locate sources for the information even though many viable sources may exist. As a result, lives are needlessly lost.

Health Data Motivation

Health data has been slow to move from paper to electronic form, but the American Recovery and Reinvestment Act of 2009 ("ARRA"), through meaningful use incentives and health data exchanges, aims to dramatically increase the sharing of patient information in the United States. Most Americans pay bills electronically, email photographs, search the Web for supplemental health information, and complete so many other tasks online that they rarely visit brick-and-mortar offices for basic transactions anymore. Yet chances are a child's pediatrician uses paper records much like those of his grandfather's pediatrician. The technology mismatch is striking. About 61% of Americans looked online for health information in 2009 [Fox and Jones 2009], but only 4% of American office-based physicians used fully functional electronic medical record systems in 2007 [Hing and Hsiao 2010].

ARRA attempts to ignite a mass exodus from this prehistoric paper age into a tech-savvy networked cosmos by 2015 using political will and billions of dollars. If successful, patient measurements, diagnoses, procedures, medications, and demographics, along with physician notes and lab results, will be stored in digital format rather than on paper, enabling widespread sharing beyond the doctor-patient encounter. The vision for widespread sharing of medical information is clear: relevant medical information should flow seamlessly across computers, devices, organizations, and locations as needed. Evidence suggests that doing so can significantly improve patient care and possibly reduce costs [Chaudhry et al. 2006]. There are signs that health data are already flowing. For example, PricewaterhouseCoopers estimates that the sharing of personal health information beyond the doctor-patient encounter is now a multibillion-dollar market.

With so much personal data readily available in today's data-rich, network-savvy world and the sharing of health data rising, it is reasonable to expect a litany of personal harms, yet reports of such harms seem rare. There are many reasons for this, but perhaps the most important is the lack of transparency in data sharing arrangements: these hidden activities make personal harms difficult to detect. How, then, can policymakers and individuals make educated decisions about privacy and data utility without such knowledge? There are many worthy uses for personal data beyond the person, so the goal is not to stop data sharing but to understand its risks, so that society can address them responsibly and reap the benefits.

Beyond Health Data

Much of the initial attention given to theDataMap™ stems from health data, but the project is in no way limited to health data; it covers the full spectrum of personal information sharing. Even if there were a desire to limit attention to health data, the reality is that health data appears in all kinds of other data. As Bob Gellman points out, health data does not respect silos. For example, schools have records about a student's vaccinations, medications, special education needs, illnesses, and more. Motor vehicle departments have records about a driver's medical restrictions (eyeglasses, etc.) and about disabilities for special license plates. Gyms, websites, banks, casualty insurers, and many others hold health information, often mixed together with other data about individuals. Below are kinds of data other than health data that are likely to be included in theDataMap™:

  • driver license information
  • voter registration records
  • birth information
  • marriage information
  • death information
  • real estate property records
  • court records
  • divorce records
  • arrest records
  • postal address information

Copyright © 2012 President and Fellows Harvard University.