Pseudo-pseudonymisation

Viewpoints
May 31, 2022

Most of us have words we struggle to say. Mine are “Sharing” and “Plates”. Pseudonymisation seems to be a common one within the privacy community, and not without reason.

Pseudonymised data can’t be attributed to an individual without the use of additional information that is kept separately. Anonymised data no longer identifies the individual. The GDPR applies to the former but not the latter. “De-identified” is used variously to mean pseudonymised and anonymised, so it’s best to avoid the term altogether if it can be helped.
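To make the distinction concrete, here is a minimal Python sketch of one common way to pseudonymise a record: replacing the direct identifier with a keyed hash. The key plays the role of the "additional information" kept separately. The field names, key and record are hypothetical, and this is an illustration under those assumptions rather than a compliance recipe.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of the identifier as a hex string."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, secret_key: bytes, id_field: str = "patient_id") -> dict:
    """Copy the record, replacing the direct identifier with its pseudonym."""
    out = dict(record)
    out[id_field] = pseudonymise(str(record[id_field]), secret_key)
    return out


# Hypothetical key and record, for illustration only.
# The key is the "additional information" and would be stored separately.
SECRET_KEY = b"kept-separately-from-the-dataset"
record = {"patient_id": "patient-0001", "age": 47, "region": "Leeds"}

shared = pseudonymise_record(record, SECRET_KEY)

# Whoever holds the key can recompute the pseudonym and re-link the record,
# which is why the shared data is pseudonymised rather than anonymised.
relinked = pseudonymise(record["patient_id"], SECRET_KEY) == shared["patient_id"]
```

Note that the age and region fields pass through untouched, so the shared record may still narrow down who it relates to even without the key.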

The use and protection of pseudonymised data is something of a hot topic in the UK.  To be clear, it’s probably just a hot topic for those in the privacy and medical research communities.  Other hot topics are available.

Last month, medConfidential, an organisation that campaigns for confidentiality and consent in health and social care, sent a letter to various NHS trusts requesting that they terminate their contracts with British data analytics firm Sensyne Health and require Sensyne to return or delete the data it processes for the trusts. According to medConfidential, Sensyne’s database of 13 million UK patients contained pseudonymised data rather than the anonymised data that Sensyne had claimed in its contracts. If that is right, both parties had likely been operating under a misapprehension about the data and their obligations in respect of it: anonymised means no GDPR obligations; pseudonymised means lots of GDPR obligations.

medConfidential’s concerns echo the findings of Professor Ben Goldacre’s recent review into the use of health data in the NHS. In a report that makes 185 recommendations to the UK Government, Goldacre argues that the current system of relying on pseudonymisation is inadequate to protect patient data, because as data sets grow larger and more detailed it becomes easier to re-identify individuals. As a case in point, Goldacre says that knowing the approximate date range in which someone had a medical intervention, together with their approximate age and approximate location, can be enough to re-identify them in a pseudonymised data set.
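The re-identification point can be illustrated with a short Python sketch that counts how many records share the same combination of approximate age, location and date. Any record whose combination is rare stands out, even though its direct identifier has been replaced. The field names, the threshold `k` and the sample records are all made up for illustration; this is not Goldacre's method.

```python
from collections import Counter

# Hypothetical indirect identifiers: none names a person on its own.
QUASI_IDENTIFIERS = ("age_band", "region", "treatment_month")


def group_sizes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def records_at_risk(records, k=2, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return records whose combination is shared by fewer than k records."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_identifiers)] < k]


# Made-up pseudonymised records, for illustration only.
SAMPLE = [
    {"pseudonym": "a1", "age_band": "40-49", "region": "Leeds", "treatment_month": "2021-03"},
    {"pseudonym": "b2", "age_band": "40-49", "region": "Leeds", "treatment_month": "2021-03"},
    {"pseudonym": "c3", "age_band": "70-79", "region": "Leeds", "treatment_month": "2021-03"},
    {"pseudonym": "d4", "age_band": "40-49", "region": "Bristol", "treatment_month": "2021-04"},
]

at_risk = records_at_risk(SAMPLE, k=2)
```

In this sample, the records with pseudonyms `c3` and `d4` are each the only one with their combination, so anyone who knows those three rough facts about a person could pick out their record.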

According to Goldacre, the answer is the rapid adoption of a small number of “Trusted Research Environments”, which would provide greater oversight of NHS data and fewer opportunities for re-identification and misuse. Additional measures he recommends in the medical context, but which also bear considering for pseudonymisation strategies more broadly, include the removal of sensitive codes prior to the dissemination of data sets, sub-sampling and data perturbation (now that’s tricky to say).
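For readers who want a feel for those three additional measures, the following Python sketch shows one simple interpretation of each. The function names, field names, codes and parameters are hypothetical, and real implementations would need careful statistical design; treat this as a rough illustration only.

```python
import random


def drop_sensitive_codes(records, sensitive_codes, code_field="codes"):
    """Remove any clinical codes on the sensitive list before release."""
    return [
        {**r, code_field: [c for c in r[code_field] if c not in sensitive_codes]}
        for r in records
    ]


def sub_sample(records, fraction, rng):
    """Release only a random fraction of records, so that the presence of
    any given person in the released data is uncertain."""
    return rng.sample(records, round(len(records) * fraction))


def perturb_age(records, max_shift, rng, field="age"):
    """Shift each age by a small random amount (never below zero)."""
    return [
        {**r, field: max(0, r[field] + rng.randint(-max_shift, max_shift))}
        for r in records
    ]


# Made-up records and codes, for illustration only.
RECORDS = [
    {"pseudonym": "a1", "age": 47, "codes": ["X01", "S99"]},
    {"pseudonym": "b2", "age": 52, "codes": ["X02"]},
    {"pseudonym": "c3", "age": 71, "codes": ["S99"]},
    {"pseudonym": "d4", "age": 33, "codes": ["X01", "X03"]},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
cleaned = drop_sensitive_codes(RECORDS, sensitive_codes={"S99"})
released = perturb_age(sub_sample(cleaned, fraction=0.5, rng=rng), max_shift=2, rng=rng)
```

Each step trades some analytical precision for a lower chance of re-identification, which is why they are usually considered alongside, rather than instead of, controls on who can access the data.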

At the same time, the ICO is currently consulting on an updated version of its anonymisation and pseudonymisation guidance, so it's a good time to be thinking about these issues as they apply to your business.  And if you're anything like me, also figuring out how to pronounce any tongue-twisting new words that enter the privacy and security lexicon.