ICO Consults on Anonymisation Code of Practice

May 30, 2012

The transparency agenda, and the increasing amount of information being released into the public domain, have brought the issues surrounding personal data anonymisation to the fore.  The ICO has now launched a public consultation on its draft code of practice on anonymisation.  The Information Commissioner stated: ‘Anonymisation can allow organisations to publish or share useful information derived from personal data, whilst protecting the privacy rights of individuals. Our code will aim to provide clear, practical advice on how data can be anonymised. We are now inviting individuals and organisations to submit their views on how this can best be achieved.’

Overview

Anonymisation means the conversion of personal data into a form in which individuals are no longer identifiable.  The code explains the need for, and the legal implications of, the anonymisation of personal data; advises on how to assess the risk of deanonymisation; and provides some examples of anonymisation techniques.  The underlying premise of the code is that publication of anonymous data does not amount to disclosure of personal data (and thus the Data Protection Act does not apply to that disclosure), even though the data controller still holds information that would permit re-identification (following the reasoning of Lord Hope in Common Services Agency v Scottish Information Commissioner [2008] UKHL 47).

The code ambitiously aims to cover public, private and third sector organisations, and includes a number of examples from a Freedom of Information Act (FOIA) perspective.  Parts of the code are written with a public interest agenda in mind, although much of the good practice guidance will be applicable in all circumstances in which anonymised data is to be released.

Has personal data been anonymised effectively?

The code addresses this question by advising on how a data controller might assess the risk of deanonymisation.  Could the anonymised data be combined with other publicly available information, such as the edited Electoral Roll, allowing an individual to be identified (so-called ‘jigsaw identification’)?  The code stresses that this assessment will be ‘unpredictable’ because ‘it can never be known what data is already available or what data may be released in the future.’
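The mechanics of a jigsaw attack can be sketched in a few lines of Python (the datasets, field names and values below are invented purely for illustration): an ‘anonymised’ release that retains quasi-identifiers such as postcode sector and age band can be joined against a public source, and any unique match re-identifies an individual and links them to the sensitive data.

```python
# Hypothetical illustration of 'jigsaw identification': joining an
# anonymised release with publicly available data on shared attributes.
# All records and field names here are invented.

anonymised_release = [
    {"postcode_sector": "SO22 4", "age_band": "40-44", "condition": "diabetes"},
    {"postcode_sector": "SO22 4", "age_band": "65-69", "condition": "asthma"},
]

public_register = [  # e.g. details gleaned from an edited electoral roll
    {"name": "A. Example", "postcode_sector": "SO22 4", "age_band": "65-69"},
    {"name": "B. Sample", "postcode_sector": "SO23 7", "age_band": "40-44"},
]

for record in anonymised_release:
    # Find everyone in the public source sharing the quasi-identifiers.
    matches = [p for p in public_register
               if p["postcode_sector"] == record["postcode_sector"]
               and p["age_band"] == record["age_band"]]
    if len(matches) == 1:
        # A unique match re-identifies the individual and attaches the
        # sensitive attribute from the 'anonymised' data to a name.
        print(matches[0]["name"], "->", record["condition"])
```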

The code suggests the adoption of a ‘motivated intruder’ test to determine whether ‘a) the anonymised information…will allow the re-identification of the individuals, or b) whether anyone would be likely to do this in practice.’  Issues to consider include what other ‘linkable’ information is publicly or easily available, and what technical measures might be used to achieve re-identification.  This will require a case-by-case risk analysis for each dataset, and close liaison between IT, legal and subject matter experts.  Indeed, the code may benefit from additional technical content, in particular to illustrate the effect that the Internet, social media and new technologies may have on the risks of deanonymisation, a gap that input from professional bodies may be able to fill.

The code advises that it is ‘good practice’ to review the risks of jigsaw identification periodically, ‘bearing in mind that subsequent data releases and the development of new techniques may facilitate re-identification that was impossible previously.’  Data protection obligations in these circumstances could include taking steps to withdraw access to the information (although, as the code mentions, this may be difficult to achieve if the information has been released under the transparency agenda or the FOIA, or is anywhere on the Internet).  If the data is released under licence, contractual conditions could be included to deal with this potential scenario.

Personal knowledge

The recently published NHS information strategy gives an example of how personal knowledge could impact on the effectiveness of anonymisation: ‘if data at hospital episode level were to be released – if someone knows the hospital, admission date and approximate age of the patient, they may well be able to deduce which record relates to that person’ (see The power of information: putting all of us in control of the health and care information we need, Department of Health, May 2012).   

Although, as the NHS strategy points out, the privacy risks are likely to be ‘low’, the ICO suggests that it is good practice to assess whether there are any individuals with the necessary knowledge to achieve re-identification, how likely it is that the anonymised information will come to their attention or be sought out, how they are likely to act, and what the consequences of re-identification are likely to be for the data subject concerned.  To these could be added the questions: ‘would the re-identification in fact increase the knowledge that the informed individual already has, and would this be to the data subject’s detriment?’

However, it is perhaps the question of the probability of this outcome occurring, and thus what weight should be given to the issue of personal knowledge, that may prove the most challenging for data controllers. 

The ‘educated guess’

The code covers the scenario where someone makes an educated guess that information is about a particular person.  On the one hand, the code says that ‘even where a guess based on anonymous information turns out to be correct, this does not mean that a disclosure of personal data has taken place.’  On the other, ‘the consequences of releasing the anonymised information may be such that a cautious approach should be adopted.’ 

An educated guess resulting in re-identification will surely be a possibility for many databases, but the probability of such a guess succeeding may again be a challenge to determine.  Could it have been anticipated, for example, that someone could return a digital camera lost underwater to its owners by deducing their identity from clues in the pictures on the undamaged memory card and linking those clues to other online information? (See The scuba detective: Diver finds camera in a French river… and uses snaps to find Welsh owner.)

Techniques

The code gives guidance on spatial information, including principles developed from the ICO’s crime mapping guidance, for instance that the larger the number of properties or occupants in a mapping area, the lower the privacy risk.  Appendix 1 to the code also contains some worked examples of perturbation techniques.  Such techniques should not be applied on a ‘one size fits all’ basis but ‘in discussion with the data recipient’; otherwise the utility of the data may be lost.  In the Data Swapping example, the swapped attribute is age, rendering the data useless if it is to be used to research the link between age and income bracket.
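A minimal sketch of data swapping, using invented records and assumed field names, shows why the technique must be tailored to the recipient’s needs: permuting the age attribute preserves the overall age distribution, but severs the link between each record’s age and income band.

```python
import random

# Hypothetical records; data swapping in the spirit of the code's
# Appendix 1 worked example, with age as the swapped attribute.
records = [
    {"age": 27, "income_band": "20-30k"},
    {"age": 41, "income_band": "30-40k"},
    {"age": 58, "income_band": "40-50k"},
    {"age": 63, "income_band": "50-60k"},
]

def swap_attribute(rows, attribute, rng=random):
    """Randomly permute one attribute across records.

    Aggregate statistics for the attribute (its mean and distribution)
    are preserved, but its association with every other attribute is
    broken.
    """
    values = [row[attribute] for row in rows]
    rng.shuffle(values)
    return [{**row, attribute: value} for row, value in zip(rows, values)]

swapped = swap_attribute(records, "age")
# The age distribution is unchanged...
assert sorted(r["age"] for r in swapped) == sorted(r["age"] for r in records)
# ...but any analysis of the age/income relationship on the swapped
# data is now unreliable, as the code's example warns.
```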

Re-identification testing

Re-identification testing is not to be confused with techniques used to test the security of a computer system against electronic attack.  The code suggests that it is good practice to ‘pen-test’ re-identification vulnerabilities prior to the release of anonymous data, often by engaging a third-party provider that may be more aware than the data controller of relevant information sources, techniques or vulnerabilities.
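One plausible ingredient of such a test, sketched below with assumed quasi-identifiers and invented records, is a simple uniqueness count: any combination of quasi-identifying attributes shared by only one record is a prime target for a motivated intruder.

```python
from collections import Counter

# Hypothetical pre-release check: count how many records share each
# combination of quasi-identifiers. Combinations seen only once are
# candidates for re-identification by linkage.

records = [
    {"postcode_sector": "SO22 4", "age_band": "40-44", "sex": "F"},
    {"postcode_sector": "SO22 4", "age_band": "40-44", "sex": "F"},
    {"postcode_sector": "SO23 7", "age_band": "65-69", "sex": "M"},
]

quasi_identifiers = ("postcode_sector", "age_band", "sex")
counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

unique = [combo for combo, n in counts.items() if n == 1]
print(f"{len(unique)} of {len(counts)} combinations identify a single record")
```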

Of course, a provider will be limited to lawfully obtainable data sources, and so the data controller will also need to consider other (illegal) methods available to an intruder, for instance abuse of the full Electoral Roll.

The consultation will run until 23 August with a final code due to be published in September. 

Marion Oswald is a Solicitor and Senior Lecturer at the Centre for Information Rights, University of Winchester