Documenting all the places
Health Data - all_health_data.zip
Mobile Apps Data - all_mobile_data.zip
There are two datasets available for theDataMap Visualization Contest. The health data highlights data sharing arrangements of personal health information in the United States, while the mobile apps data documents which third parties popular mobile applications are sending user information to. The data found in these files presents information you can find here on theDataMap; you may look at the pages here to acquire more detailed information. Your team only needs to submit one visualization. You can choose one dataset to use, or you can combine data from both datasets.
Below are detailed descriptions of each dataset along with images of the current corresponding visualizations.
HEALTH DATA
Hospitals collect a great deal of information about patients. For a standard visit, names, addresses, phone numbers, and even Social Security numbers are entered into an electronic database.
This data does not always stay in the hospitals, however. Forty-eight states in the United States collect patient data statewide for legal and governmental use. The Health Insurance Portability and Accountability Act (HIPAA), established by the U.S. government, mandates that every state follow a strict set of rules and regulations for releasing personal health information to the public, all so that the identity of the patient is not fully disclosed.
Many named organizations in the United States participate in this sort of patient data sharing, and the mix of organizations differs from state to state. This data is also sometimes sold to buyers.
The health data is organized across 6 files: a list of organizations (orgsindex), a list of categories that describe organizations (categories), an association of organizations and categories (catsorgs), relevant information associated with each category (categories_info), examples of breaches associated with different categories and organizations (prcbreaches2005-18), and a list of edges, or directions of data transfer, between categories (edges).
orgsindex.csv is a list of organizations and entities whose data sharing transaction(s) appear on theDataMap. The file has 5351 rows in total, not including the header row. The fields are as follows:
categories.csv is a list of categories of data holders of health data. The file has 54 rows, not including the header row. These correspond to the nodes on the graph itself. The fields are:
catsorgs.csv is an association list of categories (CatID) from the categories file and organizations (OrgID) from the orgsindex file. The file has 5336 rows, not including the header row. The fields are:
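The association in catsorgs.csv can be read as a many-to-many join between categories and organizations. The following is a minimal sketch, assuming only the CatID and OrgID fields named above. The sample rows and the helper name orgs_by_category are made up for illustration, and the real file may carry additional columns.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for catsorgs.csv. Only the CatID and OrgID
# column names come from the dataset description; the values are invented.
SAMPLE_CATSORGS = """CatID,OrgID
1,100
1,101
2,100
"""


def orgs_by_category(csv_file):
    """Map each CatID to the sorted list of OrgIDs associated with it."""
    groups = defaultdict(set)
    for row in csv.DictReader(csv_file):
        groups[row["CatID"]].add(row["OrgID"])
    return {cat: sorted(orgs) for cat, orgs in groups.items()}


mapping = orgs_by_category(io.StringIO(SAMPLE_CATSORGS))
print(mapping)
```

To try this on the downloaded file, pass an open file handle such as open("catsorgs.csv", newline="") in place of the in-memory sample. IDs are kept as strings here, since the description does not state their type.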
categories_info.xlsx contains a longer description of each category that appears when a particular node, or category, on the graph is clicked. The file has 54 rows, not including the header row. The fields are:
prcbreaches2005-18.csv contains a list of breaches associated with different categories and organizations. The file has 4126 rows, not including the header row. The fields are:
edges.xlsx contains paths/directions of data transfer from one category to another in order to form the edges seen on the current visualization for the health data. The file has 169 rows, not including the header row. The fields are:
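Because each row describes a direction of data transfer, the edge list can be treated as a directed graph over categories. The sketch below builds outgoing and incoming adjacency maps from rows that have already been loaded as dictionaries. The column names FromCatID and ToCatID are assumptions chosen for illustration, not the file's documented field names, and the sample rows are invented.

```python
from collections import defaultdict

# Hypothetical rows standing in for edges.xlsx. The column names "FromCatID"
# and "ToCatID" are assumptions; substitute the actual field names.
SAMPLE_EDGES = [
    {"FromCatID": "1", "ToCatID": "2"},
    {"FromCatID": "1", "ToCatID": "3"},
    {"FromCatID": "3", "ToCatID": "2"},
]


def build_adjacency(rows, source_key="FromCatID", target_key="ToCatID"):
    """Return (outgoing, incoming) adjacency maps for a directed edge list."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for row in rows:
        src, dst = row[source_key], row[target_key]
        outgoing[src].append(dst)  # categories that src sends data to
        incoming[dst].append(src)  # categories that dst receives data from
    return dict(outgoing), dict(incoming)


outgoing, incoming = build_adjacency(SAMPLE_EDGES)
print(outgoing)
print(incoming)
```

Keeping both maps makes it straightforward to answer "who does this category send data to" and "who does it receive data from" when a node is selected.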
Below are the images rendered for the current visualization of health data on theDataMap, and examples of the existing features.
Hovering over a node gives a popup description of the category, which can be found in the Hover field in categories.csv. See below.
Clicking on a node shows a list of organizations documented as sharing data on one of the edges incident to the node. Below the image, an additional description of the category is loaded, which can be found in the Information field of categories_info.xlsx. Below this are examples of breaches, which can be generated using the Example (HTML) field from prcbreaches2005-18.csv. See below.
MOBILE APPS DATA
The mobile apps data is organized across 2 files, each containing 2 sheets. These data were collected from a study that surveyed 110 popular apps and documented which domains the apps were sharing user information with. One file lists the results for the Android apps that were studied (20160614_android_apps), while the other file lists the results for the iOS apps (20160614_ios_apps).
20160614_android_apps.xlsx contains two sheets, android_canaries and android_third_party_designation.
20160614_ios_apps.xlsx contains two sheets, apple_canaries and apple_third_party_designation.
Below is the image currently rendered for mobile app data on theDataMap. Apps (left) are connected to various domains (right). The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with larger circles shared sensitive data with more domains, both primary and third-party.
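The encoding described above (line color by domain type, circle size by number of domains) can be sketched as follows. The record layout (app, domain, is_third_party), the sample values, and the linear radius formula are all assumptions made for illustration. The sheets' actual field names and the scaling used on theDataMap are not given in this description.

```python
from collections import defaultdict

# Hypothetical records: (app name, domain, is_third_party). These values are
# invented and do not come from the study's spreadsheets.
SAMPLE_FLOWS = [
    ("ExampleApp", "exampleapp.example", False),
    ("ExampleApp", "ads.example", True),
    ("ExampleApp", "metrics.example", True),
    ("OtherApp", "otherapp.example", False),
]


def line_color(is_third_party):
    """Orange for the app maker's own domain, black for a third party."""
    return "black" if is_third_party else "orange"


def domains_per_app(flows):
    """Count distinct domains (primary and third-party) contacted by each app."""
    domains = defaultdict(set)
    for app, domain, _ in flows:
        domains[app].add(domain)
    return {app: len(found) for app, found in domains.items()}


def circle_radius(count, base=4.0, step=2.0):
    """Assumed linear scaling: more domains gives a larger circle."""
    return base + step * count


counts = domains_per_app(SAMPLE_FLOWS)
radii = {app: circle_radius(n) for app, n in counts.items()}
print(counts)
print(radii)
```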
Copyright © 2012-2016 President and Fellows Harvard University.