Personal Data and 33 bits of Entropy

The concept of personal data revisited – the mathematical approach of identifying a person

For a lawyer working with data protection and the GDPR, one of the most fundamental and profound questions is: what is personal data? If you are a company or an organisation that has started laying the groundwork for GDPR compliance, you have surely pondered the same question.

Which information constitutes personal data and how much information is required to identify a person? As it turns out, there is an objective way to approach this question.

Personal data

First, let us look at the basics. According to the GDPR, “personal data means any information relating to an identified or identifiable natural person […]”. While this provides some guidance on how to determine whether specific identifiers, alone or jointly, constitute information from which the identity of a unique individual can be deduced, it is clear that the definition still leaves ample room for interpretation. Whether the identifiers at hand are enough to identify a single person depends heavily on the context in which they are used. For example, one of the most common identifiers is a person’s name. If you use the name Oscar, which is a relatively common name, it may not always be enough to identify a specific individual. However, if you add further identifiers such as address or telephone number, the chances of it being anyone other than a single person quickly decrease. It is also possible to use other kinds of identifiers to identify an individual. If I say that a certain person works at Amazon, this in itself is not sufficient to identify anyone, as Amazon has over 500 000 employees. If I tell you that a certain person is a CEO, this is also not sufficient, as there are literally millions of companies with CEOs in the world. However, combine the two facts, that the person works at Amazon and is also its CEO, and it is easy to deduce that I am talking about Jeff Bezos.

Usually, situations are not as clear-cut, and there are many complex cases, subject to interpretation, where bits and pieces of information are only sporadically available. While each piece of information might be partially revealing about a person, one might wonder whether it is possible to measure exactly how much information is needed in order to identify someone. Determining such a thing, one could argue, would resemble determining how many grains of sand you need to build a sandcastle.

Well, it turns out that there is a way to measure the amount of information you need, and the answer hides behind 33 bits of entropy.

33 bits of entropy

There is a mathematical quantity called entropy, which is measured in bits (if you are a lawyer, like me, you might be squirming uncomfortably in your seat right now). For our purposes, entropy can be thought of as the number of yes-or-no questions needed to single out one outcome among a set of equally likely possibilities. If there are two possibilities, the entropy is one bit. If there are four possibilities, the entropy is two bits, and the number of possibilities doubles with each bit of entropy added. As there are more than seven billion people on this planet, the entropy is around 33 bits (2 to the power of 33 is roughly 8,6 billion, which is enough to cover everyone). In plain language, this means that you need 33 bits of information (footnote 1) to objectively single out a specific individual. In the same way, identifiers such as name, address and birthday carry bits of information that may be partially revealing about a person’s identity. By using a mathematical formula (footnote 2), you can work out how many bits of information a given fact provides. Someone’s birthday (day and month) is worth 8,51 bits of information, while a ZIP code might be worth 10-20 bits depending on the area it covers. According to the theory, if the pieces of information are independent of each other, their bits can be added together, and once the total reaches 33 bits it is, in principle, possible to identify a specific person.
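For readers who want to see the arithmetic, here is a minimal Python sketch of the calculation described above. It uses the formula from footnote 2 and the population figure from footnote 1. The function names and the ZIP code probability (1 in 10 000) are my own illustrative assumptions, not figures taken from any dataset.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Bits of information gained by learning a fact that holds with this probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


def bits_to_single_out(population: int) -> float:
    """Bits needed to pick one individual out of a population of this size."""
    return math.log2(population)


WORLD_POPULATION = 7_700_000_000  # figure used in footnote 1

needed = bits_to_single_out(WORLD_POPULATION)  # about 32.84 bits
birthday = surprisal_bits(1 / 365)             # about 8.51 bits

# Assumption for illustration only: a ZIP code shared by 1 person in 10 000.
zip_code = surprisal_bits(1 / 10_000)

# Adding the bits is only valid if the two facts are independent of each other.
known = birthday + zip_code
missing = needed - known

print(f"needed: {needed:.2f} bits, known: {known:.2f} bits, missing: {missing:.2f} bits")
```

Under these assumptions, a birthday and a ZIP code together still fall short of the roughly 33 bits required, so further identifiers would be needed to single someone out.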

With that said, a fact that adds nothing new cannot be counted towards the bits of entropy. For example, if you already know that someone lives in Stockholm, the fact that they live in Sweden does not constitute new information.
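The Stockholm example can be made concrete with a toy population. The eight people below are entirely hypothetical, and the helper name narrow is my own; the sketch simply measures how many bits each fact contributes by counting how far it shrinks the group of candidates.

```python
import math

# Hypothetical toy population of eight people (illustrative only).
population = (
    [{"city": "Stockholm", "country": "Sweden"}] * 2
    + [{"city": "Gothenburg", "country": "Sweden"}] * 2
    + [{"city": "Oslo", "country": "Norway"}] * 4
)


def narrow(candidates, key, value):
    """Keep the candidates matching a fact; return them with the bits gained."""
    remaining = [p for p in candidates if p[key] == value]
    if not remaining:
        raise ValueError("no candidate matches this fact")
    return remaining, math.log2(len(candidates) / len(remaining))


# Learning the city first, then the country.
in_stockholm, stockholm_bits = narrow(population, "city", "Stockholm")      # 2.0 bits
still_there, sweden_bits_after = narrow(in_stockholm, "country", "Sweden")  # 0.0 bits

# Taken on its own, "lives in Sweden" looks like it is worth one bit.
_, sweden_bits_alone = narrow(population, "country", "Sweden")  # 1.0 bit

# Naively adding the two gives 3 bits, which is log2(8) and would suggest
# that one person has been singled out, yet two candidates remain.
naive_total = stockholm_bits + sweden_bits_alone
print(naive_total, len(still_there))
```

The naive sum overstates what is known because the second fact is already implied by the first. Only the bits a fact contributes given everything already known may be added.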

Is it really that simple?

Based on the above, it seems possible to simply gather different pieces of information, insert them into a mathematical formula and get an answer as to whether the accumulated information is enough to identify an individual. Well, it turns out it is not that simple. In theory, there is no dispute that this is how you can effectively identify someone. In practice, however, there are several concerns that must be addressed. To begin with, it is difficult to know how much information a certain identifier actually carries. The example above, that Stockholm is in Sweden and that the country therefore adds no new information, presumes that specific knowledge. It is not always easy to distinguish already known information from new information, and getting it wrong leads to an incorrect estimate of the information provided.

It is also necessary to understand that the approach described above must be put in a legal context and therefore be distinguished from a purely mathematical one. According to the GDPR, to determine whether an individual is identifiable, account should be taken of all the means reasonably likely to be used by the controller or by another person to identify that individual. This includes factors such as cost, the amount of time required and the available technology, amongst other things. The objective question of whether a person can be identified at all and the relative question of whether a given party can reasonably do so yield two different results. While blood, fingerprints and other types of unique biological samples might contain all the bits of entropy required to objectively identify a person, in most contexts there is simply no practical way of identifying the specific person behind the sample. Hence, although identifiers may contain all the information necessary to identify a specific individual on an objective level, in most such cases they would not, legally speaking, count as personal data.

With that said, it seems that privacy lawyers need to be around for a while longer in order to strike the correct balance between what does and does not constitute personal data. In a legal context, that is.

[1] The number is closer to 32,84 in reality as the population today is 7,7 billion, but for simplicity’s sake we will round it up to 33.
[2] ΔS = -log2 Pr(X = x), where ΔS is the reduction in entropy and Pr(X = x) is the probability of the fact being true, e.g. the probability of someone having a given birthday is 1/365, which gives about 8,51 bits.

This blog post was written by Kenny Chung, lawyer at Synch. Kenny is passionate about privacy issues beyond the ordinary.
