Canadian companies looking for guidance on the use of personal information in training and deploying artificial intelligence (“AI”) models may want to look to a recent opinion from the European Data Protection Board (“EDPB”): Opinion 28/2024 (the “Opinion”), which discusses four questions:
(1) when and how an AI model can be considered ‘anonymous’;
(2) how controllers can demonstrate the appropriateness of the legal basis for processing in the development (training) phase;
(3) how controllers can demonstrate the appropriateness of the legal basis for processing in the deployment phase; and
(4) what are the consequences of the unlawful processing of personal data in the development phase of an AI model on the subsequent processing or operation of the AI model (e.g., what happens to AI models developed using legally questionable techniques such as web-scraping and what happens to those who use such models).
In the absence of any binding Canadian AI laws, the Opinion – although formulated with respect to compliance with the GDPR and aimed at European data protection authorities to guide their decision-making – nonetheless provides helpful analysis of the issues, includes definitions that may be useful in drafting data protection addenda and other transactional documents, and identifies concrete mitigation steps organizations can take to reduce risk.
1. Anonymity of AI models
From a privacy perspective, the key question with respect to AI models is whether they include personal information, whether at the training stage or the deployment stage. If they do, they fall within the regulatory perimeter and must comply with privacy laws. This section of the Opinion focuses on determining when personal data embedded in an AI model is no longer identifiable. The Opinion cautions that even where an AI model has not been intentionally designed to produce information relating to an identified or identifiable natural person, personal data may still remain “absorbed” in the parameters of the model.
Personal data is broadly defined in the GDPR (and similarly under Canadian privacy laws), and includes circumstances in which information relating to an individual can still be identified indirectly, even if encoded or obscured. This becomes particularly relevant for AI models, as personal data might still be inferred from the model through techniques like membership inference or querying. Such inferences constitute personal information and bring the model under privacy laws. Organizations developing or purchasing such models should be skeptical of claims that the models are “anonymous” or need not comply with privacy laws.
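To make the inference risk concrete, the following is a minimal Python sketch of a loss-threshold membership inference test, one common technique of the kind the Opinion has in mind. It is our illustration rather than anything prescribed by the EDPB; the model_loss query, the reference losses, and the cutoff are hypothetical stand-ins for a real evaluation.

```python
# Minimal loss-threshold membership inference test (illustrative only).
# Intuition: records a model saw in training tend to have unusually low
# loss, so an attacker can guess membership by comparing a candidate's
# loss against losses on data known to be outside the training set.

import statistics

def model_loss(record):
    # Hypothetical stand-in for a real model query, e.g. -log p(record).
    return record["loss"]

def membership_test(candidate, reference_losses, z_cutoff=-2.0):
    """Flag a candidate as a likely training-set member when its loss is
    far below the distribution of losses on known non-training data."""
    mean = statistics.mean(reference_losses)
    stdev = statistics.stdev(reference_losses)
    z_score = (model_loss(candidate) - mean) / stdev
    return z_score < z_cutoff  # unusually low loss -> likely memorized

# Losses observed on records known NOT to be in the training set.
reference = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3]
suspect = {"loss": 0.3}  # the model is suspiciously confident here
print(membership_test(suspect, reference))  # True -> anonymity is doubtful
```

If such a test succeeds against a model at reasonable cost, the model is unlikely to meet the Opinion’s bar for anonymity.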
The EDPB recommends that the anonymity of AI models be evaluated on a case-by-case basis, focusing on confirming:
- Personal data related to the training set cannot be extracted through reasonable means.
- Outputs generated by the model do not relate to the data subjects whose personal data was used to train the model.
In assessing the anonymity of an AI model, the Opinion reminds controllers to be aware of other legal requirements around anonymization techniques (in Canada, the anonymization regulations under the Quebec privacy law would be an example). It recommends that controllers take a contextual approach and consider “all the means reasonably likely to be used” for re-identification. Finally, the Opinion notes that controllers should assess the risk of identification both by the controller itself and by different types of ‘other persons’, including unintended third parties accessing the AI model.
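One common way to operationalize “all the means reasonably likely to be used” is to measure re-identification risk over quasi-identifiers, for example with a k-anonymity check. The sketch below is our illustration, not a metric mandated by the Opinion or the Quebec regulations; the field names are hypothetical.

```python
# Illustrative k-anonymity check over quasi-identifiers. A record is at
# high re-identification risk when few other records share its combination
# of quasi-identifiers (e.g., postal code, birth year, gender).

from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size: the 'k' in k-anonymity."""
    classes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(classes.values())

dataset = [
    {"postal": "M5V", "birth_year": 1980, "gender": "F"},
    {"postal": "M5V", "birth_year": 1980, "gender": "F"},
    {"postal": "H2X", "birth_year": 1975, "gender": "M"},  # unique -> k = 1
]
print(k_anonymity(dataset, ["postal", "birth_year", "gender"]))  # 1: high risk
```

A low k signals that ‘other persons’ with access to auxiliary data could plausibly re-identify individuals, which weighs against a claim of anonymity.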
The EDPB highlights that evaluating the anonymity of AI models is a continuous process that requires a comprehensive review.
2. Legal basis for processing in development and deployment phases
The bulk of the Opinion addresses how to assess legitimate interest as a valid legal basis for controllers processing personal data in AI models, focusing primarily on the development stage. Canadian privacy laws do not provide for such a basis, as they are consent-based. The closest Canadian analogue to “legitimate interest” would likely be “implied consent” coupled with a reasonable purpose, although the two concepts do not perfectly align.
Nonetheless, the test articulated in the Opinion mirrors a similar test in Canada. Controllers must clearly identify the legitimate interest (purpose) pursued by the controller or the third party and perform a legitimate interests assessment. The assessment involves three steps:
- The first step involves determining whether the pursuit of a legitimate interest is lawful, clearly articulated, and real and present.
- The second step includes analysing the necessity of the processing for the purposes of the legitimate interest.
- The third step involves confirming that the processing does not override the data subjects’ fundamental rights. This includes considering the nature of the data processed, the context of the processing, the severity of the risks identified, the consequences of the processing, the likelihood that the identified consequences materialise and the reasonable expectations of the data subjects.
The Opinion sets out mitigation measures that may reduce the risks to data subjects and help satisfy the above test. For instance:
- Technical solutions like pseudonymization and data masking (see the sketch following this list),
- Steps to facilitate data subjects’ rights, such as providing opt-out options or waiting a reasonable period between data collection and use,
- Transparency through public communications,
- Measures restricting web scraping, such as excluding data collection from certain sources, and
- Technical measures in the deployment phase to prevent storage, regurgitation or generation of personal data.
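By way of illustration only, the sketch below shows how two of these measures might look in practice: keyed pseudonymization of direct identifiers in training data, and a deployment-phase filter that redacts email addresses from model output. The key handling, token format, and redaction pattern are simplified assumptions on our part, not requirements drawn from the Opinion.

```python
# Illustrative sketches of two mitigation measures (simplified assumptions).

import hashlib
import hmac
import re

SECRET_KEY = b"rotate-me-and-store-me-in-a-vault"  # hypothetical key handling

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed, non-reversible token.
    With the key held separately, the token alone identifies no one."""
    return hmac.new(SECRET_KEY, identifier.encode(),
                    hashlib.sha256).hexdigest()[:16]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_output(text: str) -> str:
    """Deployment-phase filter: scrub email addresses from model output
    before it reaches the user, reducing regurgitation of personal data."""
    return EMAIL_RE.sub("[REDACTED]", text)

print(pseudonymize("jane.doe@example.com"))           # stable, opaque token
print(redact_output("Contact jane.doe@example.com"))  # "Contact [REDACTED]"
```

A production deployment would cover far more identifier types and manage keys properly; the point is that such measures are concrete and auditable, which matters when documenting the balancing test.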
The reasonable expectations of individuals will also play a significant role here. The criteria to determine whether individuals might reasonably expect processing of their personal data in AI models include: (a) whether the personal data was publicly available (in Canada, “publicly available” is a defined term, and is very narrow); (b) the nature of the relationship between the individual and the controller; (c) the nature of the service; (d) the context in which the personal data was collected; (e) the source from which the data was collected; (f) potential further uses of the AI model; and (g) whether the data subject is actually aware that his or her personal data is online.
3. Unlawful development of an AI model
The Opinion looks at three scenarios in which an AI model is developed using unlawfully processed personal data and that data: (A) is retained in the AI model and subsequently processed by the same controller; (B) is retained in the AI model and processed by another controller in the context of deployment of the model; and (C) is anonymized by a controller before any further processing of personal data takes place in the AI model.
Interestingly, for (C), where the AI model was developed using unlawfully processed personal data but the controller deploying it can satisfy the high bar for anonymization, the deployment will be acceptable.
The EDPB states that the legality of personal data processing must be assessed on a case-by-case basis by the controller, considering both the initial and subsequent processing activities. If personal data is unlawfully processed at any stage, corrective actions, such as data deletion, may be necessary to ensure compliance and prevent further unlawful processing. When assessing subsequent processing, factors like the legal basis for processing, including legitimate interest, and the potential risks to data subjects must be considered, as the unlawfulness of the initial processing could impact the legitimacy of later actions.
If the data is anonymized in accordance with the GDPR, and no personal data is processed during subsequent activities, the GDPR may no longer apply, meaning the unlawfulness of the initial processing would not affect the later processing. However, the onus is on the controllers to demonstrate the lawfulness of their actions and to ensure compliance with data protection obligations throughout the data lifecycle.
Canadian Reminder
The Opinion and Canadian privacy laws both emphasize the importance of transparent and lawful data processing. While Canadian privacy legislation does not specifically regulate AI, it applies whenever personal data is used at any stage of model development or use.
In Québec, the Act respecting the Protection of Personal Information in the Private Sector (“Private Sector Act”) and the Civil Code of Québec parallel many of the data protection considerations raised in the Opinion. Specifically, these laws mandate that organizations collect personal data only for legitimate and serious reasons. Further, the Private Sector Act states that the collection of personal information is limited to what is necessary for the intended purposes.
While the Personal Information Protection and Electronic Documents Act (“PIPEDA”) does not specifically address AI, it still governs the collection, use, and retention of personal information, providing a legal framework that parallels the GDPR’s focus on accountability, consent, purpose limitation, and data retention.
Echoing the EDPB’s analysis of AI model anonymity, personal information that has been anonymized can be used under PIPEDA and the Private Sector Act without triggering the privacy obligations set out in those laws.
Practical implications for businesses
To effectively manage AI systems and ensure compliance with privacy laws, controllers should establish a comprehensive AI governance program. While such governance is not explicitly required by Canadian laws (given the current absence of AI-specific legislation), its absence will make it nearly impossible to demonstrate compliance with privacy laws when an AI model uses personal information. A governance program should include clear procedures for triaging AI use cases and ensuring that each project undergoes an appropriate assessment process. Key to this process are data minimization practices, careful selection of training data sources, sound data preparation techniques, and a focus on ensuring the model’s resilience to inference attacks. Additionally, controllers should be able to demonstrate transparency through documentation supporting claims of anonymity, assessments of any legal basis for processing, risk assessments, privacy impact assessments, and records of security measures.
Organizations planning to use personal data for AI model development or deployment should employ strategies like data minimization, anonymization, pseudonymization, and the use of privacy-preserving technologies. Giving individuals rights-based controls over their data would also enhance transparency and accountability, though this may prove difficult in many circumstances. Restrictions on web scraping practices should also be in place to safeguard data privacy (see the Office of the Privacy Commissioner of Canada’s Concluding joint statement on data scraping and the protection of privacy).
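As a simple illustration of a web-scraping restriction, a collection pipeline can decline to fetch pages that a site’s robots.txt disallows. The sketch below is ours, using Python’s standard-library robotparser; the user agent and URL are hypothetical, and honouring robots.txt is one signal of reasonable collection practices, not by itself compliance with the OPC’s expectations.

```python
# Illustrative robots.txt gate for a training-data collection pipeline.
# urllib.robotparser is part of the Python standard library.

from urllib import robotparser
from urllib.parse import urljoin, urlparse

def allowed_to_fetch(url: str, user_agent: str = "example-trainer-bot") -> bool:
    """Return False when the site's robots.txt disallows this crawler,
    so the page is excluded from training-data collection."""
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    parser = robotparser.RobotFileParser()
    parser.set_url(urljoin(root, "/robots.txt"))
    try:
        parser.read()
    except OSError:
        return False  # if in doubt (e.g., network failure), do not collect
    return parser.can_fetch(user_agent, url)

if allowed_to_fetch("https://example.com/profiles/jane"):
    print("Fetch permitted by robots.txt")
else:
    print("Skipping: disallowed or unreachable")
```

A real pipeline would also apply source allow-lists and the other collection limits described above.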
For more information on this topic, please contact Kirsten Thompson, Charles Giroux or other members of the Dentons Privacy and Cybersecurity group.