A recent data scraping decision from the French data protection authority provides support for the Canadian privacy regulator’s guidance to date. Canadian organizations and organizations doing business in Canada should be aware that:
- Indiscriminate data scraping of publicly accessible personal information of Canadians will likely be unlawful; and
- Canadian organizations that make personal information available on their websites (e.g., social media sites, directories) have an obligation to protect it, and a failure to take reasonable measures to prevent unlawful scraping of those websites may amount to a reportable data breach.
Given the recent Canadian guidance on this issue, and global attention to it, it would not be a surprise to see an enforcement action by Canadian privacy regulators in this area in the coming months.
Background
In August 2023, the Office of the Privacy Commissioner of Canada (OPC) issued a joint statement with members of the Global Privacy Assembly’s International Enforcement Cooperation Working Group (“IEWG”) that set out expectations on what social media companies (“SMCs”) and website operators should do to prevent unlawful data scraping (“Initial Statement”). We summarized the statement and discussed some of the key takeaways in our previous post Privacy Commissioner Issues Statement on Obligations to Protect Against Data Scraping.
Subsequently, the OPC engaged with several SMCs on how to respond to the growing challenge posed by advances in data scraping. In October 2024, the OPC released a concluding joint statement (“Concluding Statement”). The Concluding Statement highlighted four key messages:
- Personal information that is publicly accessible is subject to data protection and privacy laws in most jurisdictions.
- SMCs and the operators of websites that host publicly accessible personal data have an obligation to protect publicly accessible personal data on their platforms from data scraping that violates data protection and privacy laws (“unlawful scraping”).
- Mass data scraping incidents that harvest personal information can constitute reportable data breaches in many jurisdictions.
- Individuals can also take steps to protect their personal information from data scraping, and social media companies have a role to play in enabling users to engage with their services in a privacy protective manner.
These principles essentially set out the view of the OPC and will likely inform its approach to any enforcement proceedings in this area.
Global data protection authorities tend to coordinate their enforcement activities around various themes, so it is no surprise that on December 5, 2024, the Commission Nationale de l’Informatique et des Libertés (“CNIL”), the French data protection authority, issued a decision against KASPR, fining it €240,000 (approximately CA $350,000) after an inspection found that the company had failed to comply with several obligations under the General Data Protection Regulation (“GDPR”).
Nature of the violations
KASPR markets a Chrome extension that allows customers to collect the professional contact information of people whose profiles they visit on the social media site LinkedIn. Using contact details from LinkedIn and other websites, KASPR built a database of about 160 million contacts, which allows its customers to contact the targeted people for commercial prospecting, recruitment, or identity verification purposes.
Notably, on LinkedIn, users can choose from four options to determine the visibility of their contact information:
- “Only visible to me”;
- “Anyone on LinkedIn”;
- “1st-degree connections”; and
- “1st and 2nd-degree connections.”
However, in addition to the contact details of users who had made them visible to all, KASPR also collected the contact details of those who had chosen to restrict visibility to their 1st and 2nd-degree connections.
After receiving complaints from people who had been targeted by KASPR, the CNIL found that KASPR had breached several GDPR articles:
- Article 6 – Obligation to have a legal basis on which to process the information
KASPR collected contact information from LinkedIn users who had chosen to restrict visibility to their 1st and 2nd-degree connections. Because these users had expressly limited visibility, the CNIL considered KASPR’s collection of their data to have been unlawful: it exceeded what people who register on a professional social network could reasonably expect.
- Article 5-1-e – Obligation to define and respect a data retention period proportionate to the purpose of the processing
KASPR kept the contact details of users for five years from each data update, which generally occurred when a person changed jobs or employers. The CNIL found that, where a person changed jobs or employers before the five years had elapsed, the renewal of the retention period meant that personal data was being kept for a disproportionately long time.
- Articles 12 and 14 – Obligation to provide transparency and information to individuals
KASPR only started to inform individuals subject to data scraping that their information was being collected in 2022, four years after the extension’s implementation. This information was provided in English via email with a link to oppose processing. The CNIL noted this delay in informing individuals and found that information provided only in English was not sufficiently transparent and comprehensible.
- Article 15 – Respecting the right of access of individuals
When affected individuals asked KASPR how their contact information had been collected, the company simply replied that the details had been collected from publicly accessible sources. The CNIL pointed out that KASPR should be able to provide all available information as to the source of the data and that, even if the company was not able to do so for every individual concerned, KASPR needed to be aware of its sources.
In addition to the fine, the CNIL ordered KASPR to:
- Cease collecting personal data from people who chose to limit the visibility of their contact details and delete any data collected in this way. If it is impossible to distinguish this data from the rest, KASPR will have three months to inform the people impacted that their data is being processed and that they may object to it, and to use their data only for this purpose;
- Stop the automatic renewal of storage of targeted people’s personal data;
- Inform the people whose data is collected in a language they understand; and
- Respond to any requests from individuals to access the data collected, providing all available information on the sources of data collection.
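The first two orders above can be illustrated with a short Python sketch. It is a hypothetical illustration only, not KASPR’s system or a method prescribed by the CNIL: the record fields, function names, and the idea of running the retention clock once from first collection are all assumptions, and the five-year default simply mirrors the period KASPR applied. The sketch contrasts a retention period that restarts at every update with one that does not, and shows a collection check that respects a user’s visibility setting.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical label for the "visible to everyone" setting; the other three
# options listed earlier are treated as restricted.
PUBLIC_VISIBILITY = "Anyone on LinkedIn"


@dataclass
class ContactRecord:
    visibility: str        # visibility setting chosen by the individual
    first_collected: date  # when the contact details were first collected
    last_updated: date     # most recent update (e.g., a change of job)


def may_collect(record: ContactRecord) -> bool:
    """Collect only details the individual made visible to everyone."""
    return record.visibility == PUBLIC_VISIBILITY


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def rolling_expiry(record: ContactRecord, years: int = 5) -> date:
    """Retention clock restarts at every update (the approach the CNIL criticized)."""
    return add_years(record.last_updated, years)


def fixed_expiry(record: ContactRecord, years: int = 5) -> date:
    """Retention clock runs once from first collection (an assumed alternative)."""
    return add_years(record.first_collected, years)


record = ContactRecord(
    visibility="1st and 2nd-degree connections",
    first_collected=date(2018, 3, 1),
    last_updated=date(2021, 6, 15),
)
print(may_collect(record))     # False
print(rolling_expiry(record))  # 2026-06-15
print(fixed_expiry(record))    # 2023-03-01
```

In this example, a single job change in 2021 pushes the rolling deletion date more than three years past the fixed one, and each further update would push it out again.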
Importance of the KASPR decision in Canada
Canadian companies should pay attention to the KASPR decision as it highlights the anticipated enforcement approach that will be taken against companies that conduct data scraping, as well as those companies that host people’s personal information on publicly accessible websites. Bear in mind that this latter category could include not just social media sites, but also online company employee directories and personnel listings. While neither the Initial Statement nor the Concluding Statement is binding on Canadian companies, they highlight the OPC’s increasing concern and likely telegraph a future investigation priority.
1. Publicly accessible is NOT “publicly available”
Under the Personal Information Protection and Electronic Documents Act (“PIPEDA”), which governs privacy protection in the private sector in Canada unless provincial privacy laws apply, companies must have a data subject’s consent to collect their information, or they must rely on an enumerated exception to having to obtain consent.
In Canada, there is an exception that allows companies to collect personal information that is “publicly available” without an individual’s consent (see s 7(1)(d)). However, many companies fail to understand the limits of this exception. “Publicly available” does not mean that information is simply available publicly. Instead, “publicly available” is a defined term set out in PIPEDA’s Regulations Specifying Publicly Available Information and covers only certain enumerated categories of information. One of those categories (s. 1(e)) is “personal information that appears in a publication, book or newspaper, in printed or electronic form, that is available to the public, where the individual has provided the information.”
On a plain reading, this would appear to permit data scraping of social media sites, since such sites are arguably publications in electronic form in which an individual has made their personal information available.
This interpretation has been rejected by the OPC, which has said in prior investigations that “indiscriminate scraping of publicly accessible websites” will not be reasonable (see Joint investigation of Clearview AI, Inc. et al, 2021 CanLII 9227).
Consent is always conditioned by reasonableness of purpose and if the collection is unreasonable, it will not be saved by consent. Organizations must ensure that their purposes for collection, use and disclosure of personal information are limited to only those which a reasonable person would consider appropriate in the circumstances. See the OPC’s Guidance on inappropriate data practices: Interpretation and application of subsection 5(3).
The OPC further commented that two factors that influenced its determination that the collection was unreasonable were that the organization did not collect the information directly from the individuals in question, “[n]or did it have any relationship with the third parties whose websites it scraped, who could have, hypothetically, obtained consent for Clearview’s purposes.”
2. Operators of sites must take steps to prevent scraping, or may face a reportable data breach
PIPEDA requires private companies to safeguard personal information under their control and to protect it against “loss or theft, as well as unauthorized access, disclosure, copying, use, or modification” (PIPEDA, Schedule 1, clause 4.7). Data scraping may place the operators of websites hosting personal information at risk of violating this principle if appropriate steps are not taken to prevent it.
Not all data scraping is unlawful; it can be a useful way to facilitate information sharing to enable third parties to provide services either to the individual or the host site. To this end, the OPC suggested that any companies that authorize data scraping can do so through contractual terms, such as through their Terms and Conditions. However, the OPC suggests that there remains an obligation on these companies to ensure any permitted scraping is lawful and that any contractual terms are compliant with the applicable laws. Simply including contractual terms is not enough and companies must take active steps to ensure the scraping and use of personal data is compliant through monitoring and enforcement of contractual terms. Failure to do so may amount to a reportable data breach.
The emphasis on transparency and consent among companies that allow data scraping, companies that conduct it, and the individuals targeted is consistent across both the Concluding Statement and the CNIL’s decision against KASPR.
Takeaways
- The obligation to protect against unlawful scraping applies to small, medium, and large companies.
- Publicly accessible personal data is still subject to data protection and privacy laws in most jurisdictions. SMCs and website operators that host publicly accessible personal data have obligations to protect such information from unlawful scraping.
- To protect against unlawful scraping, companies should implement a combination of safeguarding measures. These measures should be regularly reviewed and updated as scraping techniques and technologies (including artificial intelligence and machine learning) continue to advance.
- While some organizations may contractually authorize data scraping, contractual terms alone cannot render the scraping lawful. There must be a lawful basis for scraping personal data. Organizations should be transparent about the scraping they allow and obtain consent where required by law. Additionally, any companies participating in data scraping should implement adequate measures, such as contractual terms, monitoring, and enforcement, to ensure that the contractually authorized use of scraped data complies with data protection and privacy laws.
- Companies that use scraped data sets to train AI must comply with data protection and privacy laws in addition to any existing AI-specific laws. Organizations using AI models provided by third-party vendors should ensure that such vendors have trained their models using lawfully collected personal data.
The authors would like to thank articling student Emily Zheng for her assistance in preparing this insight.
For more information on this topic, please contact Kirsten Thompson, George Hua or other members of the Dentons Privacy and Cybersecurity group.