The first article in this series can be found here.
Part 3: Data security and data minimisation
| Further guidance in relation to security |
| --- |
| The 2014 ICO report on “Protecting personal data in online services: learning from the mistakes of others” may contain useful information and guidance. The National Cyber Security Centre (NCSC) has also produced guidance on maintaining code repositories. The above noted, data security is an area in which further specific ICO guidance is anticipated. |
Data security
The Guidance notes that AI systems can both exacerbate existing security risks and present challenges in relation to data minimisation.
As with non-AI software projects, additional security risks may arise where there is reliance on third-party code and third-party suppliers. Moreover, AI systems frequently operate within a wider set of business processes, software systems and data flows, which can make implementing a holistic approach to security challenging.
The Guidance recommends reviewing risk management practices and outlines what might be described as good practice in relation to ensuring the security of personal data (including, for example, deploying de-identification techniques).
The Guidance also provides a useful, although high-level, summary of the types of privacy attacks that may apply to AI models (including model inversion attacks, membership inference attacks and ‘black box’ and ‘white box’ attacks). It also discusses adversarial examples: inputs deliberately fed into an AI system with modifications, sometimes readily identifiable to a human, designed to cause the system to misclassify them (one example given is images of road signs with stickers placed on them).
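To make the membership inference idea concrete, the minimal Python sketch below deliberately overfits a classifier to a synthetic data set and then compares the model’s confidence on records it was trained on with its confidence on unseen records; a large gap is the signal a membership inference attacker looks for. The data set, model and library choices (scikit-learn and NumPy) are illustrative assumptions, not anything prescribed by the Guidance.

```python
# Illustrative sketch only: the intuition behind a simple membership inference
# test. The toy data set and model are assumptions chosen for brevity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Build a toy data set and deliberately overfit a model to it.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
model.fit(X_train, y_train)

def mean_confidence(model, X, y):
    """Average probability the model assigns to the true class."""
    probs = model.predict_proba(X)
    return probs[np.arange(len(y)), y].mean()

# An attacker observing a large gap between the model's confidence on records
# that were in the training set and records that were not can infer membership,
# i.e. that a particular individual's record was used to train the model.
print("confidence on training members:", mean_confidence(model, X_train, y_train))
print("confidence on non-members:     ", mean_confidence(model, X_test, y_test))
```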
Importantly, readers are reminded that some types of AI models contain training data by design: ‘support vector machines’ (SVMs) and ‘k-nearest neighbours’ (KNN) models both retain some training data within the model itself. The result is that, where the training data is personal data, any party accessing the model will have access to a subset of that personal data, and the storage and use of the model will amount to processing of those data.
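This point can be illustrated with a short, hedged sketch: a fitted support vector machine exposes the training rows it retains through scikit-learn’s public support_vectors_ attribute, and a fitted k-nearest neighbours model necessarily stores its training set in order to make predictions at all. The toy Iris data below stands in for personal data; the library and data choices are assumptions made purely for illustration.

```python
# Illustrative sketch: some model types retain training examples by design.
# The Iris data set stands in for personal data; scikit-learn is assumed.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A fitted support vector machine keeps a subset of the training rows (the
# support vectors) inside the model object itself.
svm = SVC(kernel="rbf").fit(X, y)
print("training rows retained by the SVM:", svm.support_vectors_.shape[0])

# A k-nearest neighbours model must store the full training set in order to
# make predictions, so anyone who obtains the fitted model effectively obtains
# the training data as well.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("training rows needed by the KNN model at prediction time:", len(X))
```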
Data minimisation
| Further guidance in relation to data minimisation |
| --- |
| The ICO’s 2017 report on Big data, artificial intelligence, machine learning and data protection contains further guidance in relation to data minimisation in the context of AI systems. The report also contains useful discussion in relation to other topics and principles of data protection law that are of particular relevance to AI systems including, for example, accuracy. |
Article 5(1)(c) of the GDPR requires that “Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”. One of the features of many AI systems is that they require large amounts of data. Generally, an AI system will be more statistically accurate the more data it is trained on, so there is a tendency among some AI developers to attempt to acquire and process all available relevant data.
The Guidance stresses the need to determine what data is ‘adequate, relevant and limited to what is necessary’, this determination naturally being very context specific. It also stresses the need to consider data minimisation at the design phase, perhaps by mapping out all of a system’s processes that might access personal data.
The Guidance does touch on particular considerations in relation to processing personal data in supervised machine learning models (namely, those that learn or are ‘trained’ on training data sets) at both the training and inference phases, but see the ICO report on Big data, artificial intelligence, machine learning and data protection for more detail.1
Privacy enhancements
| Privacy enhancement methods |
| --- |
| A range of privacy enhancement methods exist, including perturbation (adding ‘noise’ to a data set); using synthetic or artificial data rather than actual personal data; and using aggregated patterns or gradients rather than the underlying personal data directly. The Guidance recommends considering the use of these and other methods for enhancing the privacy of individuals when developing and training AI systems. |
Finally, this section of the Guidance contains a useful review of methods that can be used to enhance the privacy of AI systems.2 These include:
- Perturbation: Random changes are made to data points in a manner that preserves the statistical properties of the data set. This makes the data less accurate at the individual level (a minimal sketch appears after this list).
- Synthetic data: Generation of artificial data, which would not be personal data as it would not relate to a living individual. Where based on personal data, however, there may be a re-identification risk.
- Federated learning: Different parties train models on separate ‘local’ data sets and share the resulting patterns (or ‘gradients’), which are combined into a ‘global’ model. An emerging application area for federated learning is medical research, where lessons from local patient data sets are amalgamated centrally. As with synthetic data, there is a risk of re-identification. Participating entities may also need to consider whether they are joint controllers of each other’s data.
- Local inferences: A model is hosted locally, rather than in the cloud or on a central server.
- Private query approaches: Elements of a query are kept private and are not shared with the entity running the relevant model.3
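The sketch below illustrates perturbation in the spirit of differential privacy (see note 2): Laplace noise is added to each value, so that no individual figure remains accurate while large-sample aggregates stay roughly intact. The salary figures, sensitivity bound and epsilon value are assumptions chosen purely for the example, not parameters taken from the Guidance.

```python
# Illustrative sketch of perturbation via Laplace noise. All figures below are
# assumptions chosen for the example.
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data set standing in for personal data: 10,000 salary figures.
salaries = rng.normal(loc=45000, scale=12000, size=10_000)

sensitivity = 50000.0   # assumed bound on one individual's contribution
epsilon = 1.0           # privacy budget: smaller means more noise, more privacy

# Perturbation: each individual value is randomised so it no longer accurately
# describes that person, while large-sample aggregates remain roughly intact.
noisy = salaries + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=salaries.shape)

print("true mean:    ", round(salaries.mean()))
print("noisy mean:   ", round(noisy.mean()))
print("one true value:", round(salaries[0]), "-> perturbed:", round(noisy[0]))
```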
Part 4: Individual rights
| Further guidance in relation to individual rights |
| --- |
| The ICO’s guidance on individual rights should be read as a companion document to the Guidance in relation to individual rights in the context of AI systems. |
The final section of the Guidance is addressed to those in compliance-focused roles who are responsible for responding to individual rights requests. The section outlines the challenges faced in ensuring individual rights are respected across the AI development and deployment life cycle and describes the role of meaningful human oversight.
The section also gives details in relation to individual rights in the context of AI, including the right to rectification, the right to erasure, the right to data portability and the right to be informed.4
Particular considerations in relation to training data are discussed. In summary, whilst training data may be more difficult to link to a particular individual than other data, this extra degree of difficulty does not necessarily prevent those data from amounting to personal data. Nonetheless, in this context, note that organisations can charge a fee for, or refuse, requests that are manifestly unfounded or excessive.
The Guidance also discusses particular difficulties in relation to the right to rectification or erasure of personal data contained in training data sets. Typically, where a system is trained on the personal data of large numbers of individuals, the erasure of one individual’s personal data is unlikely to affect the ability to train an AI system; however, some AI models may contain personal data or may be able to infer personal data (an example being support vector machines, which can contain examples from training data to help distinguish new examples encountered when the model is deployed). For the latter type of model, it may be necessary to re-train, or even delete, the model if there is a request for rectification or erasure. To help reduce the potential costs, the Guidance suggests implementing models so as to make personal data retrieval more efficient, together with developing a streamlined model re-deployment pipeline.
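By way of illustration only, the following sketch shows one way such a streamlined ‘erase and re-train’ step could look: on receipt of an erasure request, the relevant data subject’s records are dropped from the training set and the model is re-fitted before re-deployment. The DataFrame layout, column names (subject_id, label) and model choice are hypothetical assumptions, not anything set out in the Guidance.

```python
# Illustrative sketch only: one way an "erase and re-train" pipeline could look.
# The column names, data and model choice are hypothetical assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def erase_and_retrain(training_df: pd.DataFrame, subject_id: str) -> LogisticRegression:
    """Remove one data subject's records and re-fit the model on what remains."""
    remaining = training_df[training_df["subject_id"] != subject_id]
    features = remaining.drop(columns=["subject_id", "label"])
    model = LogisticRegression(max_iter=1000).fit(features, remaining["label"])
    return model  # the re-trained model would then be re-deployed

# Hypothetical usage: a toy training set covering three data subjects.
df = pd.DataFrame({
    "subject_id": ["a", "a", "b", "b", "c", "c"],
    "income":     [20.0, 22.0, 35.0, 33.0, 50.0, 52.0],
    "age":        [25, 26, 40, 41, 58, 59],
    "label":      [0, 0, 1, 0, 1, 1],
})

# Data subject "b" requests erasure: drop their rows and re-fit.
new_model = erase_and_retrain(df, subject_id="b")
```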
| Further guidance on individual safeguards |
| --- |
| The ICO has published separate guidance on Rights related to automated decision making including profiling, and there is relevant guidance at the European level titled Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679 (wp251rev.01). Further, the ICO and The Alan Turing Institute guidance on Explaining decisions made with Artificial Intelligence is likely to be relevant. |
Outsourcing AI services can make responding to individual rights requests much more complex, which potentially makes it particularly important to understand and agree controller/processor relationships at the outset of an outsourcing arrangement.
Solely automated decisions with legal or similar effect
Although subject to certain important exceptions, GDPR Article 22 gives individuals the right not to be subject to a solely automated decision producing legal or similarly significant effects. In relation to such decisions, individuals have available to them what the Guidance terms safeguards,5 including the rights to:
- obtain human intervention;
- express their point of view;
- contest the decision made about them; and
- obtain an explanation about the logic of the decision.
Particular considerations and risk factors in relation to human intervention or inputs in AI systems are discussed in the Guidance, including automation bias (including automation-induced complacency) and a lack of interpretability. In these respects, the Guidance stresses the importance of staff awareness and training, as well as ongoing monitoring.
Conclusion
Overall, the Guidance outlines best practices for data protection compliance in AI systems and aims to give its readers the means to assess the risks to individual rights and freedoms, as well as to assess the measures available to mitigate those risks and to process personal data fairly.
Although the Guidance is, naturally, focused on AI systems processing personal data, many of the principles and approaches outlined will be of relevance outside the field of AI to leading-edge software development and deployment projects that process personal data, for example, projects processing personal data using deterministic, algorithm-based technologies.
| Concluding thoughts |
| --- |
| Given the data protection challenges posed by innovative uses of personal data in AI systems, it is essential to consider compliance with core GDPR principles as part of the design phase of any AI project that will process personal data. Moreover, given the rapid ‘evolution’ of AI systems and the further development of such systems after their deployment, personal data controllers would be well advised to review their data processing and their compliance with data protection law regularly. For example, one distinct possibility is that the purpose or purposes of processing change over time, outside of the original purpose specification. Not only will it be advisable to consider the wider corpus of relevant ICO guidance already in existence but, particularly given the dynamic nature of the AI space, stakeholders across AI value chains should also seek to stay up to date with future guidance and case law developments. |
The AI space is dynamic. Not only are AI systems becoming more pervasive in industry, but the underlying AI technologies themselves continue to evolve and develop. This dynamism will doubtless drive further guidance from the ICO (and other regulators), as well as the development of case law, as disputes between stakeholders seem inevitable.
The relative complexity of AI systems, and the fact that many AI systems may be opaque, raises questions about who will be liable in the event that ‘things go wrong’, particularly where elements of a system’s development or deployment have been outsourced. An extra layer of complication may arise in cloud computing implementations of AI systems.
Earlier this year, I interviewed Mr Simon McDougall, the ICO’s Deputy Commissioner, on the topic of AI and data protection.6 I asked Mr McDougall where we might see most ICO enforcement activity in relation to AI based technologies in the future. In his answer, Mr McDougall drew a comparison with the cyber security space, noting that most ICO cyber security enforcement actions involve breaches that would have been avoidable had industry standard approaches been adopted. One possible insight is that much of the risk around AI systems could be avoided or mitigated simply by adopting leading practice in software development, deployment and procurement. Whilst this seems correct to me, as the Guidance makes clear, the way that personal data is processed (and even created or inferred) as AI systems are developed and deployed means that, in addition to what might be described as typical risks, there may be special considerations and particularly challenging judgements to be made as those developing, deploying and procuring AI systems seek to comply with data protection rules and principles.
Quentin Tannock is a barrister at 4 Pump Court, a barristers’ chambers with expertise in areas including information technology, telecommunications and professional negligence. Quentin has a broad commercial practice with particular focus on commercial litigation in the areas of technology and IP.
Notes & Sources
1. Available here: https://ico.org.uk/media/for-organisations/documents/2013559/big-data-ai-ml-and-data-protection.pdf
2. For a discussion of differential privacy see the report at: https://gss.civilservice.gov.uk/wp-content/uploads/2018/12/12-12-18_FINAL_Privitar_Kobbi_Nissim_article.pdf
3. For example, see: TAPAS: Trustworthy privacy-aware participatory sensing; Leyla Kazemi, Cyrus Shahabi (2012) available at https://infolab.usc.edu/DocsDemos/kazemi-TAPAS-KAIS.pdf
4. For the ICO’s general guidance on individual rights, visit: https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/
5. The safeguards differ, depending on whether the processing falls under Part 2 or Part 3 of the Data Protection Act 2018. In general terms, Part 2 concerns general processing and applicable safeguards under Part 2 will depend on whether the lawful basis for that processing is a requirement or authorisation by law. Part 3 concerns law enforcement processing, in which case applicable safeguards depend on regulations provided in the relevant law (although the individual has the right to request reconsideration of decisions or for decisions to be taken that are not based on solely automated processing).
6. The full interview is available as part of the 4 Pump Court podcast series, here: https://www.4pumpcourt.com/podcast-the-ico-perspective-on-ai-and-data-protection-with-simon-mcdougall/.