What is Your Digital Fingerprint?

Prompted by Data Privacy Day last week, I have spent the past few days poking around online to see what data about me exists that I didn’t know about. Before I try to tame others’ data, perhaps I should try taming my own?

I searched under various versions of my name. I admit to engaging in “ego searches” before, but I had never gone through every major search engine and examined every page of the results. To be honest, most of it was boring. I’m just not that interesting. The links were about this conference, that conference, an old CV, some old presentation. However, some of what I found associated with my name was interesting, and all of it was new (and news) to me.

For example, I discovered that “someone” had taken a programming assignment from a course I took eight years ago and posted it on an assignment-sharing site. My name and the course number were still on it, and I was able to compare it to the original assignment. I immediately wrote to the company that owns the site, and they removed the assignment. I also tightened security on the public HTML directory provided by my graduate program.

I also discovered that someone had stored my master’s paper in a repository in…Argentina. I expected people to download my master’s paper for personal use; I did not expect it to be stored in a repository without my consent. I found that one a bit odd, but I left it alone. I also didn’t realize that Google tracks what I watch on YouTube via iGoogle if I am logged into my account. I can, however, delete most of the information Google stores about me. (See the Google Privacy Center for more information on how to view your account-related Google data.)

I also read “‘I’ve Got Nothing to Hide’ and Other Misunderstandings of Privacy” by Daniel J. Solove. In this article, he argues that whether or not you have something to hide isn’t the point. Privacy is about what is and isn’t someone else’s business. It is about the balance of power. It is about knowing what the government or a corporation is storing about you, having the right to opt out, and having the right to correct erroneous information. Even if data is anonymized and analyzed only by machines, what business do the government, corporations, or other organizations have holding this data in the first place? Can information gathered about you without your knowledge or consent be held against you in the future?

For example, today I learned that state governments mandate genetic testing of newborns. Because it is the law, parents are not required to give consent before the testing is performed. Did you know that some states will hold copies of your baby’s DNA indefinitely? In Minnesota, the DNA is stored along with identifying information in case the child goes missing or dies. Some states do allow you to opt out and will destroy the genetic material on request. Your DNA is very personal. I don’t mind the testing of babies, but I do mind any organization storing the DNA indefinitely.

In one instance, a baby tested positive for cystic fibrosis, and because insurance covered the cost of the test, the result will be stored in her records with the insurance company. (Note: the parents said that, had they known about the testing requirement beforehand, they would have paid for the test out of pocket to avoid this black mark on their child’s health insurance record.) Will this information be held against this child down the road? What if tests are developed for manic depression or other disorders? Will the indefinitely stored results of these tests prevent these babies from getting health insurance or a job in the future?

Michelle G. Hough wrote a fascinating article entitled “Keeping It to Ourselves: Technology, Privacy, and the Loss of Reserve.” She defines reserve as the “ability to control what information about us is disclosed, and what is not.” She cites an earlier study by Sweeney, which found, using 1990 census data, that the combination of ZIP code, birth date, and gender alone was enough to uniquely identify 87% of the U.S. population. Combine a data set like that with a third-party “anonymized” data set containing the same fields, and you can re-identify the people in the third-party data set. The conclusion? We need to think and talk more about privacy, about reserve, and about how much of them we are willing to give up in exchange for the advantages technological innovation brings us.
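To make the linkage idea concrete, here is a minimal sketch of how an “anonymized” data set can be joined to an identified one on the three fields Sweeney used. All records, names, and function names below are hypothetical, invented purely for illustration; this is not Sweeney’s actual method or data. The sketch only claims a match when a ZIP/birth date/gender triple points to exactly one person in the identified data.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only.
# "Anonymized" medical records: names removed, quasi-identifiers kept.
medical = [
    {"zip": "55401", "dob": "1970-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "55401", "dob": "1982-11-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "55104", "dob": "1965-07-21", "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical public list (think voter roll) that does include names.
public = [
    {"name": "Alice Example", "zip": "55401", "dob": "1970-03-14", "sex": "F"},
    {"name": "Bob Example", "zip": "55401", "dob": "1982-11-02", "sex": "M"},
    {"name": "Carol Example", "zip": "55104", "dob": "1965-07-21", "sex": "F"},
    {"name": "Dana Example", "zip": "55104", "dob": "1965-07-21", "sex": "F"},
]

# The quasi-identifiers: not names, but jointly very identifying.
QUASI_IDS = ("zip", "dob", "sex")


def key(record):
    """Project a record onto its (zip, dob, sex) triple."""
    return tuple(record[field] for field in QUASI_IDS)


def link(anonymized, identified):
    """Attach names to anonymized rows whose quasi-identifier triple
    matches exactly one person in the identified data set."""
    counts = Counter(key(r) for r in identified)
    name_by_key = {key(r): r["name"] for r in identified}
    matches = []
    for row in anonymized:
        k = key(row)
        if counts[k] == 1:  # unique match: the "anonymous" row gets a name
            matches.append((name_by_key[k], row["diagnosis"]))
    return matches


for name, diagnosis in link(medical, public):
    print(f"{name}: {diagnosis}")
```

In this toy example, two of the three “anonymous” diagnoses get a name attached. The third does not, because two people in the public list share that ZIP code, birth date, and gender, which is why a figure like Sweeney’s is 87% rather than 100%.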

Anonymized data is not as “anonymous” as one might hope. The Electronic Frontier Foundation estimates that you need only 32.6 bits of information to identify one individual among the entire population of the planet. The organization is running an experiment, a project called Panopticlick, to determine how unique browser configurations are and whether they allow corporations, organizations, and/or the government to track people effectively online. I went to the web site and let the software test my browser configuration.
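The 32.6-bit figure follows from simple arithmetic: each bit of information can, at best, halve the pool of candidates, so singling out one person among N people takes log2(N) bits. The sketch below assumes a world population of roughly 6.6 billion (my assumption for the figure in use at the time, not a number from the EFF). A fact shared by a fraction p of people contributes -log2(p) bits; the birthday example is only an illustration, and the function names are hypothetical.

```python
import math

# Assumption: a world population of roughly 6.6 billion.
WORLD_POPULATION = 6.6e9


def bits_to_single_out(population: float) -> float:
    """Bits needed to distinguish one person among `population` people:
    each bit can at best halve the remaining candidates."""
    return math.log2(population)


def bits_from_fact(probability: float) -> float:
    """Bits revealed by a fact shared by a fraction `probability` of people."""
    return -math.log2(probability)


print(f"{bits_to_single_out(WORLD_POPULATION):.1f} bits")  # 32.6 bits
# Knowing someone's birthday (about 1 in 365 people share it):
print(f"{bits_from_fact(1 / 365):.1f} bits")  # 8.5 bits
```

Independent facts add up: a birthday, a ZIP code, and a gender together can cover a large share of those 32.6 bits.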

I learned that my browser fingerprint appears to be unique among the 572,016 browsers tested so far, and that it conveys at least 19 bits of identifying information.
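As I understand it, Panopticlick reports uniqueness as “surprisal”: if k out of N tested browsers share your fingerprint, the fingerprint carries log2(N / k) bits. The sketch below assumes that definition (the function name is hypothetical, not Panopticlick’s code). A fingerprint seen only once among 572,016 browsers gives log2(572,016), about 19.1 bits. That number is only a floor, because a sample of that size cannot show that a fingerprint is even rarer in the full population.

```python
import math


def surprisal_bits(matching_browsers: int, total_browsers: int) -> float:
    """Identifying information, in bits, of a fingerprint shared by
    `matching_browsers` out of `total_browsers` tested browsers."""
    return math.log2(total_browsers / matching_browsers)


# A fingerprint seen exactly once among 572,016 tested browsers.
bits = surprisal_bits(1, 572_016)
print(f"at least {bits:.1f} bits")  # at least 19.1 bits
```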

I encourage you to poke around online and check your digital fingerprint. This was a time-consuming exercise for me, but an enlightening one.

Please let me know what you think....