In my profession, we recognize that we have given metadata a vague definition: “data about data”. My metadata could be your data, and vice versa.
Our definition of metadata is somewhat vague because context (how a person or organization uses a piece of information) determines whether that information is data or metadata.
In this post, I’ll do my best to explain why, if metadata is “data about data”, metadata is also data.
I’ll use smartphone data and metadata as an example. Then I’ll draw on my own earlier metadata analysis work to describe the context in which I think metadata does equal data. If you’d like a more detailed discussion of the general definition of metadata, please read a previous post on the topic.
Your smartphone leaves a trail behind you. That trail includes the time, date, and location of each action you complete, your name, the language you speak, and the type of phone you have, along with your current location. Technologists sometimes refer to this trail as your digital exhaust or your digital footprint.
This trail becomes more visible when you tweet or email a photo, for example. Not only are you sending a digital image of yourself (the data) over the Internet, but you are also sending your metadata (the time, date, your name, your spoken language, your phone type, and your location).
Image above via The Wall Street Journal, June 15, 2013.
Evan Perez and Siobhan Gorman at The Wall Street Journal recently wrote an excellent article, “Phone Metadata Proves a Powerful Tool for NSA, Police”, which details what the government can find out, and how, just by tracing your cell phone metadata.
The typical smartphone user can give off a total of nearly 100 pieces of highly technical data through calls, texts and other activities, according to research by Tracy Ann Kosa, a digital-privacy expert at the University of Ontario. This information includes the time that phones make contact with cellphone towers, the direction of the tower with respect to the phone and the signal strength at the time.
Ms. Kosa said much of the data is “insignificant on its own.” But “every little piece counts,” she said. “Think of it like footsteps—or calories.”
The authors went on to describe how metadata helped police arrest two robbers. They also detail how metadata brought the extramarital affair between Paula Broadwell and General David Petraeus to the attention of the FBI and, eventually, the public. (I discussed the Broadwell-Petraeus metadata in a previous post.)
Perez and Gorman then explained the legal authority under which the National Security Agency (NSA) gathers metadata, and why that collection concerns privacy advocates. They wrote that location data, in particular, raises issues of “unreasonable search and seizure”, which the Fourth Amendment to the U.S. Constitution prohibits. They close the article with a new term: dataveillance.
Dataveillance is “the ability to surveil people through their data trail”.
As we now know, the NSA is gathering metadata about people without warrants targeting specific individuals, a practice the Justice Department considers legal because what it collects is metadata.
The problem is that NSA analysts are using the metadata as data.
If the NSA is gathering metadata about U.S. citizens with the intent of analyzing it as data, they are, in fact, gathering data.
It is illegal to gather data about someone without a warrant that targets a specific individual. Why? Because the Fourth Amendment to the U.S. Constitution protects against “unreasonable search and seizure”; that is, it guarantees U.S. citizens a certain right to privacy. Many people are angry because the NSA is using semantics (calling the information it gathers “metadata” instead of “data”) to circumvent the Constitution’s restrictions on government surveillance of U.S. citizens, even though the NSA gathers that information in the interest of U.S. national security.
As part of my master’s paper research in 2002, I harvested roughly 1 million metadata records exposed in the Dublin Core Metadata Element Set (DCMES) format. I gathered these records from 100 Data Providers (DPs) registered with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). I wanted to understand which metadata elements the DPs used and which they did not.
The moment I gathered the metadata records to examine the DCMES usage, the DPs’ metadata became my data. I learned just as much from analyzing DP metadata usage as I would have learned from examining the actual content of the DCMES fields.
I am not saying that examining the metadata led me to the same conclusions that examining the content (data) would have. I am saying that I learned just as much. It was a different kind of analysis from content analysis, and it revealed details that examining the content alone would not have. Analyzing metadata can also give an analyst a much faster, more quantitative picture than a qualitative content analysis of the data would.
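To make the element-usage analysis concrete, here is a minimal sketch in Python. It is not drawn from the original study; it assumes the records have already been harvested as an OAI-PMH ListRecords response in the `oai_dc` format. The `element_usage` function and the `SAMPLE` response (with its invented titles, names, and URLs, and with record headers omitted) are hypothetical. The sketch counts how many records use each of the 15 DCMES elements at least once. That is the step at which someone else's metadata becomes my data: the analysis looks only at which elements are present, not at what they say.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH responses carrying unqualified Dublin Core.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set (DCMES).
DCMES_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)

# Hypothetical, trimmed ListRecords response (record headers omitted).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example report A</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:date>2001-05-01</dc:date>
          <dc:identifier>http://example.org/a</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example report B</dc:title>
          <dc:creator>Roe, Richard</dc:creator>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:subject>metadata</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def element_usage(list_records_xml):
    """Return (usage, total): for each DCMES element, the number of
    oai_dc records that use it at least once, plus the record count."""
    root = ET.fromstring(list_records_xml)
    usage = {name: 0 for name in DCMES_ELEMENTS}
    total = 0
    dc_prefix = "{" + DC_NS + "}"
    for record in root.iter("{" + OAI_DC_NS + "}dc"):
        total += 1
        # Element names present in this record; a repeated element counts once.
        present = {
            child.tag[len(dc_prefix):]
            for child in record
            if child.tag.startswith(dc_prefix)
        }
        for name in present:
            if name in usage:  # ignore anything outside the 15 DCMES elements
                usage[name] += 1
    return usage, total


if __name__ == "__main__":
    usage, total = element_usage(SAMPLE)
    for name in DCMES_ELEMENTS:
        print(f"{name:<12} {usage[name]}/{total}")
    print("never used:", [n for n in DCMES_ELEMENTS if usage[n] == 0])
```

Across a real harvest, running the same tally per Data Provider would show which elements each DP fills in and which it leaves out. That is a quantitative result obtained without reading a single title or description.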
Therefore, it is my professional opinion that metadata does equal data, depending on the context in which a person or organization gathers, uses, or analyzes it. Because the NSA is gathering metadata with the intent of analyzing it as data for national security purposes, it is gathering data, not metadata.
Do you think metadata can also be data? Should the Fourth Amendment to the U.S. Constitution cover your metadata, and not just your data?
[Thanks, @kboughida and @kvanmalssen for reminding me that metadata equals data, as I well know. Interested in learning about your 4th Amendment rights when crossing borders? Then check out this infographic.]