Documenting all the places personal data goes.


How were the mobile app datasets acquired?

The mobile app data comes from a survey in which we tested 110 popular Android and iOS apps. We found that most of these apps share personal information, such as email address and location, with many third parties.

I see lots of rows with the same app and host in android_canaries and apple_canaries. What do they represent?

Each row in android_canaries and apple_canaries represents a flow of communication between a particular mobile app and a domain, covering information transmitted from the app to its destination and vice versa. Many of these apps communicate with the Internet at frequent intervals, sending not only personal data but also other content such as text and images. Facebook on iOS, for example, sends the user's address, birthday, name, gender, and other such personal information to … In essence, the repeated rows show how often a mobile app sends data. Each connection is distinct; the connections differ only in what data is being sent.

During the survey, a few background processes on the test phones could not be shut off. Android or iOS sometimes sent traffic to certain domains during our testing, and those connections may have been recorded as belonging to whichever app was running at the time.
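If you work with these datasets programmatically, a minimal Python sketch like the following can tally how many rows (flows) each app and host pair has, and which kinds of data travel over each pair. It assumes a CSV export with columns named app, host, and data_type; those column names, the sample rows, and the excluded host are hypothetical and are not the actual schema or contents of android_canaries or apple_canaries. The optional exclude_hosts parameter shows one way to set aside domains you believe are operating-system background traffic rather than app traffic.

```python
import csv
import io
from collections import Counter, defaultdict


def count_flows(rows, exclude_hosts=frozenset()):
    """Count rows (flows) per (app, host) pair, skipping excluded hosts."""
    return Counter(
        (row["app"], row["host"])
        for row in rows
        if row["host"] not in exclude_hosts
    )


def data_types_by_flow(rows):
    """Collect the distinct data types sent over each (app, host) pair."""
    types = defaultdict(set)
    for row in rows:
        types[(row["app"], row["host"])].add(row["data_type"])
    return types


# Hypothetical sample data; the column names and values are assumptions.
SAMPLE = """app,host,data_type
ExampleApp,tracker.example.com,email
ExampleApp,tracker.example.com,location
ExampleApp,cdn.example.net,image
ExampleApp,os-updates.example.org,device_id
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Hypothetical OS background domain to leave out of the app's tally.
counts = count_flows(rows, exclude_hosts={"os-updates.example.org"})
for (app, host), n in counts.most_common():
    print(f"{app} -> {host}: {n} flow(s)")
```

Repeated rows for the same pair show up as a higher count, while data_types_by_flow shows how those repeated connections differ in what they carry.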

Where can I find more background information on the data materials?

If you would like to learn more about the rationale behind the data materials, please take a look at the Health Data and Mobile Data pages here on theDataMap. We also have an article on Technology Science that discusses the survey we conducted to acquire the mobile app data.


If you have any questions that are not part of this FAQ, please email Ji Su Yoo at

Copyright © 2012-2016 President and Fellows of Harvard University.