Ways out of unlawful data processing by AI

In December 2023, The New York Times filed a lawsuit against Microsoft and OpenAI, alleging copyright infringement through the use of millions of its articles to train ChatGPT. The lawsuit raises the question of whether unlawful processing of data in the design, training or operation phases of artificial intelligence (AI) systems requires the models to be retrained, or whether alternative solutions are available. The question matters in practice because retraining such systems is time-consuming and costly.

Initial statements on the use of AI published by data protection authorities

Copyrighted material is not the only source of problems: the processing of personal data to train AI can also be unlawful. Against this background, the discussion paper on the use of AI recently published by the State Commissioner for Data Protection and Freedom of Information of Baden-Württemberg (Landesbeauftragter für Datenschutz und Informationsfreiheit Baden-Württemberg, LfDI BW) is of interest. Other supervisory authorities have also published initial statements on the use of AI: the Hamburg Commissioner for Data Protection and Freedom of Information has commented on the use of LLM-powered chatbots (published on 13 November 2023), and the French Data Protection Authority (CNIL) has issued a self-assessment guide for AI systems.

The LfDI BW points to two methods that developers and operators of AI models could use, either to avoid having to correct their systems after the fact or to carry out such corrections if they become necessary: "differential privacy" and "machine unlearning".

Differential privacy as the gold standard for secure AI models

The aim of differential privacy is to ensure that no individual data record has too great an influence on the fully trained model. To this end, random values are added to the data worthy of protection during the computation, so that the result barely changes whether or not a particular record is included. Hashing methods and the insertion of "mathematical noise" are used for this purpose, and only small sections of the databases are analysed at a time. The method is intended to make it practically impossible to derive personal data from the trained AI model. For this reason, differential privacy is considered the new gold standard for data protection-friendly technologies.
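The idea of adding "mathematical noise" can be illustrated with a minimal sketch. It is not taken from the LfDI BW discussion paper and it does not train an AI model; it applies the Laplace mechanism, a standard building block of differential privacy, to a simple count over a data set. The function names, the sample ages and the privacy parameter epsilon = 0.5 are assumptions chosen purely for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent exponential samples, each with
    # mean `scale`, follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing a single record changes a count by at most 1,
    # so the sensitivity of this query is 1.
    sensitivity = 1.0
    # A smaller epsilon means more noise and stronger protection.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example data: ages of seven individuals.
ages = [23, 35, 41, 52, 67, 29, 48]
rng = random.Random(0)

noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(f"true count: 4, noisy count: {noisy:.2f}")
```

Because the published figure is the true count plus random noise, an observer cannot tell with confidence whether any one person's record was part of the data set. Comparable noise can be injected during the training of a model, although that is considerably more involved than this sketch.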

Machine unlearning as a data protection-friendly approach for the targeted deletion of training data

Machine unlearning takes its cue from the right to erasure, also known as the right to be forgotten (Art. 17 GDPR). In principle, it is difficult to delete individual personal data from a fully trained AI system in a targeted manner, because the information that a model has learnt from a given data record during training cannot be traced, or can only be traced in part. It is therefore virtually impossible to remove individual data records from the learning result without a trace. The aim of "unlearning" is to make it possible to delete outdated or incorrect data from the system in a targeted manner. However, the development of this data protection-friendly approach is still in its infancy. In mid-2023, Google announced an "unlearning challenge", inviting AI engineers to develop ways of realising this approach. In the future, machine unlearning could make it considerably easier for companies that integrate AI models into their business processes to comply with the requirements of the GDPR.
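What targeted deletion could look like can be sketched with a deliberately simple toy model. The sketch is an assumption for illustration only and is not the method of any particular provider: the training data is split into shards, each shard is "trained" separately (here the model is merely the mean of the shard's values), and deleting a record requires retraining only the shard that contained it. The class name ShardedModel and all record identifiers and values are hypothetical.

```python
from statistics import mean


def train_shard(values):
    """Toy 'training': the shard model is the mean of its records."""
    return mean(values) if values else None


class ShardedModel:
    """Keeps one small model per data shard so records can be removed cheaply."""

    def __init__(self, records, num_shards):
        # records maps a record identifier to a numeric value.
        self.shards = [dict() for _ in range(num_shards)]
        for position, (record_id, value) in enumerate(records.items()):
            self.shards[position % num_shards][record_id] = value
        self.shard_models = [train_shard(list(s.values())) for s in self.shards]

    def predict(self):
        # The overall output is the average of all trained shard models.
        return mean(m for m in self.shard_models if m is not None)

    def unlearn(self, record_id):
        """Delete one record and retrain only the shard that held it."""
        for index, shard in enumerate(self.shards):
            if record_id in shard:
                del shard[record_id]
                self.shard_models[index] = train_shard(list(shard.values()))
                return index
        raise KeyError(record_id)


# Hypothetical training data: four records identified by a letter.
records = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 40.0}
model = ShardedModel(records, num_shards=2)
before = model.predict()

retrained_shard = model.unlearn("d")
after = model.predict()
print(f"before: {before}, after: {after}, retrained shard: {retrained_shard}")
```

After the deletion, the toy model gives the same output as one that had never seen record "d", and only one of the two shards had to be recomputed. Large AI models cannot be decomposed this neatly, which is why practical unlearning remains an open research problem.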

