The European Data Protection Board has published an opinion addressing data protection in AI models. Aimed at tech companies operating in the bloc, it covers how to assess whether an AI model is anonymous, the legal basis for processing personal data, and measures to mitigate the impact on data subjects.
It was published in response to a request from Ireland’s Data Protection Commission, the lead supervisory authority under the GDPR for many multinationals.
What were the key points of the guidance?
The DPC sought more information about:
- When and how an AI model can be considered “anonymous,” meaning very unlikely to identify the individuals whose data was used in its creation, and therefore exempt from privacy laws.
- When companies can say they have a “legitimate interest” in processing individuals’ data for AI models and, therefore, don’t need to seek their consent.
- The consequences of the unlawful processing of personal data in the development phase of an AI model.
EDPB Chair Anu Talus said in a press release: “AI technologies may bring many opportunities and benefits to different industries and areas of life. We need to ensure these innovations are done ethically, safely, and in a way that benefits everyone.
“The EDPB wants to support responsible AI innovation by ensuring personal data are protected and in full respect of the General Data Protection Regulation.”
When an AI model can be considered ‘anonymous’
An AI model can be considered anonymous if the likelihood of personal data used for training being traced back to any individual, either directly or indirectly (for example, through a prompt), is deemed “insignificant.” Supervisory authorities assess anonymity on a “case-by-case” basis, which requires “a thorough evaluation of the likelihood of identification.”
However, the opinion does provide a list of ways that model developers might demonstrate anonymity, including:
- Taking steps during source selection to avoid or limit the collection of personal data, such as excluding irrelevant or inappropriate sources.
- Implementing strong technical measures to prevent re-identification.
- Ensuring data is sufficiently anonymised.
- Applying data minimisation techniques to avoid unnecessary personal data.
- Regularly assessing the risks of re-identification through testing and audits.
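One simple form of such testing is to probe a model with prompts and check whether its output reproduces personal data known to be in the training set. The Python sketch below measures that as a leak rate. It is illustrative only: the `generate` callable and the toy model are hypothetical stand-ins for a real model API, and plain substring matching is a deliberately crude assumption; a real audit would use far more prompts and fuzzier matching.

```python
from typing import Callable, Iterable


def extraction_rate(
    generate: Callable[[str], str],
    prompts: Iterable[str],
    known_personal_data: Iterable[str],
) -> float:
    """Fraction of prompts whose output contains any known personal data string.

    `generate` is a hypothetical stand-in for querying the model under test.
    """
    # Lower-case once so the comparison is case-insensitive.
    needles = [s.lower() for s in known_personal_data]
    prompts = list(prompts)
    if not prompts:
        return 0.0
    leaks = 0
    for prompt in prompts:
        output = generate(prompt).lower()
        # Count the prompt as a leak if any known string appears verbatim.
        if any(needle in output for needle in needles):
            leaks += 1
    return leaks / len(prompts)


# Toy "model" that has memorised one email address from its training data.
def toy_model(prompt: str) -> str:
    if "contact" in prompt:
        return "You can reach Jane at jane.doe@example.com."
    return "I can't help with that."


rate = extraction_rate(
    toy_model,
    ["What is Jane's contact email?", "Tell me a joke", "Summarise GDPR"],
    ["jane.doe@example.com", "+44 20 7946 0000"],
)
print(f"{rate:.2f}")  # 0.33
```

A rate above zero would suggest the model can regurgitate training data when queried, which is exactly the kind of indirect identification the opinion is concerned with.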
Kathryn Wynn, a data protection lawyer from Pinsent Masons, said that these requirements would make it difficult for AI companies to claim anonymity.
“The potential harm to the privacy of the person whose data is being used to train the AI model could, depending on the circumstances, be relatively minimal and may be further reduced through security and pseudonymisation measures,” she said in a company article.
“However, the way in which the EDPB is interpreting the law would require organisations to meet burdensome, and in some cases impractical, compliance obligations around purpose limitation and transparency, in particular.”
When AI companies can process personal data without the individuals’ consent
The EDPB opinion outlines that AI companies can process personal data without consent under the “legitimate interest” basis if they can demonstrate that their interest, such as improving models or services, outweighs the individual’s rights and freedoms.
This is particularly important to tech firms, as seeking consent for the vast amounts of data used to train models is neither trivial nor economically viable. But to qualify, companies will need to pass these three tests:
- Legitimacy test: A lawful, legitimate reason for processing personal data must be identified.
- Necessity test: The data processing must be necessary for that purpose. There must be no less intrusive way of achieving the company’s goal, and the amount of data processed must be proportionate.
- Balancing test: The legitimate interest in the data processing must outweigh the impact on individuals’ rights and freedoms. This takes into account whether individuals would reasonably expect their data to be processed in this way, for example because they made it publicly available or have a relationship with the company.
Even if a company fails the balancing test, it may still avoid having to obtain the data subjects’ consent if it applies mitigating measures to limit the processing’s impact. Such measures include:
- Technical safeguards: Applying safeguards that reduce security risks, such as encryption.
- Pseudonymisation: Replacing or removing identifiable information to prevent data from being linked to an individual.
- Data masking: Substituting real personal data with fake data when actual content is not essential.
- Mechanisms for data subjects to exercise their rights: Making it easy for individuals to exercise their data rights, such as opting out, requesting erasure, or making claims for data correction.
- Transparency: Publicly disclosing data processing practices through media campaigns and transparency labels.
- Web scraping-specific measures: Implementing restrictions to prevent unauthorised personal data scraping, such as offering an opt-out list to data subjects or excluding sensitive data.
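To make two of these measures more concrete, here is a minimal Python sketch of pseudonymisation using a keyed hash (HMAC-SHA256) and of masking email addresses with a fake placeholder. The key, the token format and the email pattern are illustrative assumptions, not anything the EDPB specifies. Note that pseudonymised data still counts as personal data under the GDPR, because whoever holds the key can recompute tokens for known identifiers and re-link them.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; in practice it would be stored separately from the data.
PSEUDONYM_KEY = b"replace-with-a-secret-key"

# Simple email pattern; an assumption, not an exhaustive definition of an address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymise(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace an identifier with a keyed hash token (HMAC-SHA256).

    The same input always maps to the same token, so records stay linkable,
    but without the key the token cannot be recomputed from a guessed identifier.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:16]


def mask_emails(text: str) -> str:
    """Data masking: substitute every email address with a fake placeholder."""
    return EMAIL_RE.sub("user@example.invalid", text)


record = {"author": "jane.doe@example.com", "text": "Email me at jane.doe@example.com"}
clean = {
    "author": pseudonymise(record["author"]),
    "text": mask_emails(record["text"]),
}
print(clean["text"])  # Email me at user@example.invalid
```

Because the hash is keyed, the same person gets the same token across records, which keeps the data usable for training while keeping the raw identifier out of it.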
Technology lawyer Malcolm Dowden of Pinsent Masons said in the company article that the definition of “legitimate interest” has been contentious recently, particularly in debates over the U.K.’s Data (Use and Access) Bill.
“Advocates of AI suggest that data processing in the AI context drives innovation and brings inherent social good and benefits that constitute a ‘legitimate interest’ for data protection law purposes,” he said. “Opponents believe that view does not account for AI-related risks, such as to privacy, to discrimination or from the potential dissemination of ‘deepfakes’ or disinformation.”
Campaigners at the charity Privacy International have expressed concerns that AI models like OpenAI’s GPT series might not be properly scrutinised under the three tests because they lack a specific purpose for processing personal data.
Consequences of unlawfully processing personal data in AI development
If a model is developed by processing data in a way that violates the GDPR, this may affect how the model is allowed to operate. Supervisory authorities will evaluate “the circumstances of each individual case,” but the opinion gives examples of possible considerations:
- If the same company retains and processes personal data, the lawfulness of both the development and deployment phases must be assessed based on case specifics.
- If another firm processes personal data during deployment, the supervisory authority will consider whether that firm carried out an appropriate assessment of the model’s lawfulness beforehand.
- If the data is anonymised after unlawful processing, subsequent processing of non-personal data is not subject to the GDPR. However, any subsequent processing of personal data would still be subject to the regulation.
Why AI firms should pay attention to the guidance
The EDPB’s guidance is crucial for tech firms. Although it holds no legal power, it influences how privacy laws are enforced in the EU.
Companies can be fined up to €20 million or 4% of their total worldwide annual turnover, whichever is higher, for GDPR infringements. They might even be required to change how their AI models operate or delete them entirely.
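The fine ceiling is simple arithmetic: whichever of the two figures is higher applies. The Python sketch below computes that ceiling for the most serious category of infringements under Article 83(5) of the GDPR, which refers to total worldwide annual turnover of the preceding financial year. The function name is hypothetical, and it returns only the legal maximum; authorities set actual fines case by case.

```python
def max_gdpr_fine_eur(worldwide_annual_turnover_eur: int) -> int:
    """Upper limit of a fine for the most serious GDPR infringements (Article 83(5)).

    The cap is EUR 20 million or 4% of total worldwide annual turnover of the
    preceding financial year, whichever is higher. This is a ceiling only.
    """
    four_percent = worldwide_annual_turnover_eur * 4 // 100  # whole euros
    return max(20_000_000, four_percent)


print(max_gdpr_fine_eur(100_000_000))     # 20000000: 4% is only 4 million
print(max_gdpr_fine_eur(10_000_000_000))  # 400000000: 4% exceeds 20 million
```

For a company with €500 million in turnover, the two figures coincide at €20 million; above that, the percentage-based cap takes over.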
AI companies struggle to comply with GDPR due to the vast amounts of personal data needed to train models, often sourced from public databases. This creates challenges in ensuring lawful data processing and addressing data subject access requests, corrections, or erasures.
These challenges have manifested in numerous legal battles and fines. For instance:
- In January 2024, Italy’s data protection authority told OpenAI that ChatGPT had violated the GDPR, including by processing personal data without a proper legal basis. The investigation began in 2023, when the regulator temporarily suspended the chatbot in Italy.
- The advocacy group noyb also filed a complaint against Sam Altman’s company in April, alleging that ChatGPT provided false information about individuals without offering mechanisms for correction.
- In June, Meta delayed training its large language models on public content shared by adults on Facebook and Instagram in Europe after pushback from Irish regulators. Meta AI, its frontier AI assistant, has still not been released in the bloc because of what Meta calls the EU’s “unpredictable” regulatory environment.
Additionally, in September, the Dutch Data Protection Authority fined Clearview AI €30.5 million for unlawfully collecting facial images from the internet without consent, in violation of the GDPR. That same month, the Irish DPC requested the opinion, shortly after it persuaded Elon Musk’s X to stop using European users’ public posts to train its AI chatbot, Grok, without obtaining their consent.