New blog post! Personal Data and 33 bits of Entropy

The concept of personal data revisited – the mathematical approach of identifying a person

For a lawyer working with data protection and the GDPR, one of the most fundamental and profound questions is: what is personal data? If you are a company or an organisation that has started laying the groundwork for GDPR compliance, you have surely pondered the same question.

Which information constitutes personal data and how much information is required to identify a person? As it turns out, there is an objective way to approach this question.

Personal data

First, let us look at the basics. According to the GDPR, “personal data means any information relating to an identified or identifiable natural person […]”. While this gives some guidance on how to determine whether specific identifiers, alone or jointly, constitute information from which the identity of a unique individual can be correctly deduced, it is clear that the definition still leaves ample space for interpretation. Whether the identifiers presented are enough to identify a single person depends heavily on the context they are used in.

For example, one of the most common identifiers is the name of a person. If you use the name Oscar, which is a relatively common name, it may not always be enough to identify a specific individual. However, if you add further identifiers such as address or telephone number, the chances of it being anyone other than a single person quickly decrease.

It is also possible to use other identifiers to identify an individual. If I say that a certain person is working at Amazon, this in itself is not sufficient information to identify anyone, as Amazon has over 500 000 employees. If I tell you that a certain person is a CEO, this is also not sufficient, as there are literally millions of companies with CEOs in the world. However, combine the two facts, that the person works at Amazon and is also the CEO there, and it is easy to deduce that I am talking about Jeff Bezos.

Usually, the situations presented are not as clear-cut, and there are many complex situations subject to interpretation where bits and pieces of information are sporadically available. While each piece of information might be partially revealing about a person, one might wonder whether it is possible to measure exactly how much information is needed to identify someone. Determining such a thing, one could argue, would resemble determining how many grains of sand you need to build a sandcastle.

Well, it seems that there is a way to measure the amount of information you need, and the answer hides behind 33 bits of entropy.

33 bits of entropy

There is a mathematical quantity called entropy, which is measured in bits (if you are a lawyer, like me, you might be squirming uncomfortably in your seat right now). Entropy can be thought of as a measure of the number of possibilities a random variable can take. If there are two equally likely possibilities, the entropy is one bit. If there are four, the entropy is two bits, and the number of possibilities doubles with each bit of entropy added. As there are more than seven billion people on this planet, it takes around 33 bits to tell them all apart (2 to the power of 33 is roughly 8,6 billion, the first power of two large enough to cover everyone). In plain language, this means that you need 33 bits of information (footnote 1) to objectively and definitively identify a specific individual.

In the same way, identifiers such as name, address and birthday carry bits of information that each partially reveal a person’s identity. By using a mathematical formula (footnote 2), you can work out how many bits a given fact provides. Someone’s birthday (day and month, one of 365 possibilities) is worth 8,51 bits of information, while a certain ZIP code might be worth 10-20 bits depending on the area it covers. According to the theory, if the facts are independent of one another, so that each contributes genuinely new information, their bits can be added together, and once the total reaches 33 bits it is possible to single out a specific person.
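The arithmetic behind these figures can be checked with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, the population figure is the 7,7 billion from footnote 1, and the birthday probability of 1/365 assumes that all birthdays are equally likely and ignores leap years.

```python
import math

# Population figure taken from footnote 1 of this post.
WORLD_POPULATION = 7_700_000_000


def surprisal_bits(probability: float) -> float:
    """Bits of information gained by learning a fact that is true
    with the given probability (the formula in footnote 2)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


def remaining_candidates(population: float, known_bits: float) -> float:
    """Approximate number of people still matching after known_bits
    of independent information have been learned."""
    return population / 2 ** known_bits


# Bits needed to single out one person among everyone on the planet.
bits_to_identify = math.log2(WORLD_POPULATION)

# Assumption: every day of the year is an equally likely birthday.
birthday_bits = surprisal_bits(1 / 365)

print(f"Bits needed to identify one person: {bits_to_identify:.2f}")
print(f"Bits revealed by a birthday: {birthday_bits:.2f}")
print(f"People sharing that birthday: "
      f"{remaining_candidates(WORLD_POPULATION, birthday_bits):,.0f}")
```

Read this way, a birthday alone still leaves roughly 21 million candidates, which is why several independent facts have to be combined before the total approaches 33 bits.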

With that said, a fact that does not tell you anything new cannot be counted towards the bits of entropy. For example, if you already know that someone lives in Stockholm, learning that they live in Sweden adds no new information.
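The Stockholm example can be made concrete by counting how many candidates each fact rules out. The sketch below uses rounded, purely illustrative population figures (roughly 10 million for Sweden and 1 million for Stockholm); these are assumptions for the sake of the example, not numbers from this post, and the function name is hypothetical.

```python
import math

# Illustrative round figures (assumptions, not exact statistics).
WORLD = 7_700_000_000
SWEDEN = 10_000_000
STOCKHOLM = 1_000_000


def bits_gained(candidates_before: int, candidates_after: int) -> float:
    """Bits of information gained when a fact shrinks the pool of
    possible individuals from candidates_before to candidates_after."""
    return math.log2(candidates_before / candidates_after)


# Learning "lives in Stockholm" shrinks the pool from the world to the city.
stockholm_bits = bits_gained(WORLD, STOCKHOLM)

# Everyone in Stockholm already lives in Sweden, so learning "lives in
# Sweden" afterwards does not shrink the pool at all.
sweden_after_stockholm = bits_gained(STOCKHOLM, STOCKHOLM)

# Naively adding the bits of "Sweden" on its own would overcount.
sweden_alone = bits_gained(WORLD, SWEDEN)
naive_total = stockholm_bits + sweden_alone

print(f"Stockholm: {stockholm_bits:.2f} bits")
print(f"Sweden, once Stockholm is known: {sweden_after_stockholm:.2f} bits")
print(f"Naive sum (overcounted): {naive_total:.2f} bits")
```

The naive sum overstates what is actually known, because the two facts overlap. Only the bits that narrow the pool further should be added.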

Is it really that simple?

Based on the above, it seems possible to simply gather different bits of information, insert them into a mathematical formula and get an answer as to whether the accumulated information is enough to identify an individual. Well, it turns out it is not that simple. In theory, there is no dispute that this is how you can effectively identify someone. In practice, however, there are several concerns that must be addressed. To begin with, it is difficult to know how much information a certain identifier actually provides. The example above, that Stockholm is in Sweden and that the country therefore adds no new information, presumes that specific knowledge. It is thus not easy to distinguish already known information from new information, and getting this wrong leads to an incorrect estimate of the information provided.

It is also necessary to understand that the approach described above must be placed in a legal context, which sets it apart from a purely mathematical approach. According to the GDPR, determining whether an individual is identifiable requires taking account of all the means reasonably likely to be used by the controller or by any other person. This includes factors such as cost, the amount of time required and the available technology, amongst other things. An objective assessment of whether a person can be identified and a relative one can therefore produce two different results. While blood, fingerprints and other types of unique biological samples might contain all the bits of entropy required to objectively identify a person, in most contexts there is simply no way of identifying the specific person behind the biological sample. Hence, although identifiers may contain all the information needed on an objective level to identify a specific individual, in most cases they would not, legally speaking, count as personal data.

With that said, it seems that privacy lawyers need to be around for a while longer in order to strike the correct balance between what does and does not constitute personal data. If it is within a legal context, that is.

[1] The number is closer to 32,84 in reality as the population today is 7,7 billion, but for simplicity’s sake we will round it up to 33.
[2] ΔS = −log2 Pr(X=x), where ΔS is the reduction in entropy and Pr(X=x) is the probability of a fact being true, e.g. for someone’s birthday Pr(X=x) would be 1/365, which gives ΔS ≈ 8,51 bits.

This blog post is written by Kenny Chung, lawyer at Synch. Kenny is passionate about privacy issues beyond the ordinary. Read his thoughts about Personal Data and 33 bits of Entropy.
