On August 24, 2023, the Office of the Privacy Commissioner of Canada ("OPC") issued a joint statement suggesting that social media companies ("SMCs") and other websites have an obligation to actively protect personal information posted by users against unlawful data scraping (the "Joint Statement"). The Joint Statement was issued together with data protection regulators from other member jurisdictions of the Global Privacy Assembly's International Enforcement Cooperation Working Group ("IEWG").[1]
Takeaways
- Personal information that is publicly accessible may still be subject to data protection laws and require protection.
- While the expectations are phrased as recommendations, the OPC stated that "many of them are explicit statutory requirements in particular jurisdictions or may be interpreted as such by courts and data protection authorities", suggesting that the OPC is likely to interpret PIPEDA in this manner going forward.
- Social media companies and other organizations with websites containing personal information should review their practices around preventing data scraping to ensure appropriate diligence beyond notices to users that they should be careful about what personal information they post.
- A larger issue is whether such companies and websites should have an obligation to protect against data scraping. Where an individual is made aware of the risks, and chooses nonetheless to make their personal information publicly available, is it appropriate to hold social media companies and others accountable for the choices of consumers?
- Large-scale data scraping incidents may be considered reportable breaches.
What is Data Scraping?
Data scraping is an automated technique in which a computer program is used to extract (or "scrape") information available on web pages. The company or individual doing the scraping typically collects the scraped information and uses it for another purpose. For instance, a company may scrape the social media pages of individuals who have posted their age, thereby creating a database of a particular demographic that may be sold or made available to data brokers or marketers. The OPC has said there is increased data scraping of individuals' personal information from social media and other websites that host publicly accessible data.
Data scraping is usually a violation of a website's terms of use, which often contain a clause expressly prohibiting it. Depending upon the information scraped and the use to which it is put, it may also infringe the website owner's intellectual property rights.
The OPC and IEWG identified a number of privacy concerns with the use of scraped data, including such information being used for:
- Targeted cyberattacks;
- Identity fraud;
- Monitoring, profiling or surveilling individuals;
- Unauthorized political or intelligence gathering purposes; and
- Unwanted direct marketing or spam.
What are the Recommended Steps?
To address these concerns, the Joint Statement provided “recommendations” on how SMCs and other websites should implement multi-layered technical and procedural controls to mitigate the risks, including:
- Designating a team and/or specific roles within the organisation to identify and implement controls to protect against, monitor for, and respond to scraping activities.
- ‘Rate limiting’ the number of visits per hour or day by one account to other account profiles, and limiting access if unusual activity is detected.
- Monitoring how quickly and aggressively a new account starts looking for other users (as abnormally high activity could be indicative of unacceptable usage).
- Taking steps to detect scrapers by identifying patterns in 'bot' activity. For example, suspicious IP addresses can be identified by monitoring where a platform is being accessed from, such as the same credentials being used from multiple locations within a short period of time.
- Taking steps to detect bots, such as by using CAPTCHAs, and blocking the IP address where data scraping activity is identified.
- Where data scraping is suspected and/or confirmed, taking appropriate legal action such as the sending of ‘cease and desist’ letters, requiring the deletion of scraped information, obtaining confirmation of the deletion, and other legal action to enforce terms and conditions prohibiting data scraping.
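Two of the recommendations above, per-account rate limiting of profile views and flagging the same credentials used from several locations in a short period, are concrete enough to illustrate. The Python sketch below is a minimal, hypothetical illustration only: the class name `ScrapingMonitor`, the thresholds and the in-memory storage are all assumptions for this example, since the Joint Statement does not prescribe any specific values or implementation. Timestamps are passed in as seconds so the logic is easy to follow.

```python
from collections import defaultdict, deque


class ScrapingMonitor:
    """Hypothetical in-memory monitor for two anti-scraping signals.

    Thresholds are illustrative assumptions, not regulatory requirements.
    Timestamps are seconds and are assumed to be non-decreasing per account.
    """

    def __init__(self, max_views=100, view_window=3600, max_ips=3, ip_window=600):
        self.max_views = max_views      # profile views allowed per window
        self.view_window = view_window  # rate-limit window, in seconds
        self.max_ips = max_ips          # distinct IPs tolerated per window
        self.ip_window = ip_window      # login-location window, in seconds
        self._views = defaultdict(deque)   # account -> timestamps of allowed views
        self._logins = defaultdict(deque)  # account -> (timestamp, ip) of logins

    def allow_profile_view(self, account, now):
        """Sliding-window rate limit on one account viewing other profiles.

        Returns False (deny) once the account has already made max_views
        allowed views within the last view_window seconds.
        """
        views = self._views[account]
        # Drop views that have fallen outside the window.
        while views and now - views[0] >= self.view_window:
            views.popleft()
        if len(views) >= self.max_views:
            return False
        views.append(now)
        return True

    def record_login(self, account, ip, now):
        """Record a login and flag credentials used from too many IPs.

        Returns True (suspicious) if more than max_ips distinct IP addresses
        used these credentials within the last ip_window seconds.
        """
        logins = self._logins[account]
        logins.append((now, ip))
        # Drop logins that have fallen outside the window.
        while logins and now - logins[0][0] >= self.ip_window:
            logins.popleft()
        distinct_ips = {addr for _, addr in logins}
        return len(distinct_ips) > self.max_ips


if __name__ == "__main__":
    monitor = ScrapingMonitor(max_views=3, view_window=3600, max_ips=3, ip_window=600)
    # Fourth profile view within the hour is denied.
    print([monitor.allow_profile_view("acct-1", t) for t in (0, 10, 20, 30)])
    # → [True, True, True, False]
    # Same credentials from four different (documentation-range) IPs in three minutes.
    logins = [(0, "203.0.113.1"), (60, "198.51.100.7"), (120, "192.0.2.44"), (180, "203.0.113.99")]
    print([monitor.record_login("acct-1", ip, t) for t, ip in logins])
    # → [False, False, False, True]
```

A sliding window was chosen here because it counts activity over the most recent period rather than resetting at fixed clock boundaries; a production system would more likely keep these counters in a shared store and combine them with the other signals listed above, such as CAPTCHAs and bot-pattern analysis.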
The regulators also suggest that, at least in some jurisdictions, data scraping may constitute a data breach, triggering notification obligations to affected individuals and privacy regulators.
The Joint Statement is not binding on Canadian organizations, but the endorsement of the Joint Statement by the OPC suggests the OPC shares the approach behind the Joint Statement and could launch investigations of social media companies and other website owners that it believes are falling short of the recommendations in the Joint Statement.
The difficulty here is that the responsibilities the Joint Statement would assign to Canadian companies are not clearly grounded in the Personal Information Protection and Electronic Documents Act ("PIPEDA"), the federal statute that governs privacy protection in the private sector.
Under PIPEDA, companies are required to safeguard personal information under their control and to protect it against "loss or theft, as well as unauthorized access, disclosure, copying, use, or modification". The recommendation that SMCs proactively monitor for data scraping, while consistent with this statutory obligation, goes one step beyond the protection of personal information contemplated by PIPEDA.
The OPC has characterized the use by companies of information publicly posted by individuals as an unauthorized use, despite the fact that users generally control their own privacy settings and can choose whether to make their information public. The "unauthorized" aspect likely arises from a prohibition on data scraping in a website's terms of use. Query whether the absence of such a prohibition would then make such use "authorized" and thereby undercut the OPC's position. The OPC would then likely regard this as a failure to have adequate safeguards, so organizations would be in trouble in either case. However, this ignores the fact that many companies use data scraping as part of a legitimate business model. For instance, in the absence of an open banking regime in Canada, many fintechs (and banks) use data scraping to harvest information from a user's online banking website and repurpose it for budgeting or dashboard purposes.
Under PIPEDA, a "breach of security safeguards" is "the loss of, unauthorized access to or unauthorized disclosure of personal information resulting from a breach of an organization's security safeguards". Data scraping would appear on its face to meet this definition, although it is questionable whether the use or misuse of publicly posted information results from a breach of the organization's "safeguards".
Assuming for the moment that it does, the next step would be to conduct a "real risk of significant harm" assessment to determine whether the breach is reportable. It is the breach itself that must create the harm (PIPEDA provides that a breach is reportable "if it is reasonable in the circumstances to believe that the breach creates a real risk of significant harm to an individual"). If the information is already publicly available, is any additional harm created by the "breach" (assuming that such use of publicly available information is, in fact, a breach)?
Factors relevant to determining whether a breach of security safeguards creates a real risk of significant harm include the sensitivity of the personal information involved and the probability that the personal information has been, is being, or will be misused.
Note that under PIPEDA, companies have the right to collect personal information that is "publicly available" without the consent of the individual.[2] However, "publicly available" information is narrowly defined in the Regulations, although the definition does include "personal information that appears in a publication, including a magazine, book or newspaper, in printed or electronic form, that is available to the public, where the individual has provided the information."[3] This would appear to squarely address the situation of at least some social media sites (e.g., "information that appears in a publication….in printed or electronic form….where the individual has provided the information").
However, the OPC has previously condemned the collection of such information, for instance in the case of Clearview AI's scraping of billions of images of people from across the Internet and making them available to third parties. The Joint Statement reinforces the OPC's position that personal information hosted on social media is not "publicly available" information exempted under PIPEDA.
The Joint Statement also advises individuals on steps to help protect themselves against the risks of scraping such as paying attention to platforms’ privacy policies; being careful about what they choose to share online; modifying their privacy settings; and making complaints to the SMC and then to the OPC where they are concerned about having been targeted by data scraping.
The Joint Statement is an attempt to harmonize global data protection principles and practices on data scraping and, in particular, to protect individuals against generative AI tools that have been trained on people's data without their knowledge or consent. However, the OPC's interpretation appears to be an expansive reading of the text of the Regulations.
[1] Australia, United Kingdom, Hong Kong, Switzerland, Norway, New Zealand, Colombia, Jersey, Morocco, Argentina and Mexico.
[2] Personal Information Protection and Electronic Documents Act, SC 2000, c 5, s 7(1)(d).
[3] Regulations Specifying Publicly Available Information, SOR/2001-7, s 1(e).