The Legal Landscape of Web Scraping: What Changed in 2026

March 10, 2026 · 8 min read

Web scraping occupies an uncomfortable legal gray area that has frustrated technologists, businesses, and lawyers alike. The practice of programmatically extracting data from websites is fundamental to everything from search engines and price comparison tools to academic research and competitive intelligence. Yet its legal status remains unsettled, shaped by an evolving patchwork of court decisions, federal and state statutes, and diverging international approaches.

In 2026, several significant developments have reshaped this landscape. New court rulings have clarified some questions while raising others, legislative updates have altered the statutory framework, and the growing importance of web data for AI training has added new urgency to the debate.

Recent Court Decisions Reshape the Playing Field

The most consequential legal development for web scraping in 2026 has been a series of federal court decisions addressing the intersection of scraping, data ownership, and AI training data. In DataMind Corp v. NewsPublishers Alliance, the Ninth Circuit held that scraping publicly accessible news articles to build AI training datasets constitutes fair use under copyright law, but only when the resulting model does not reproduce substantial portions of the original content in its outputs.

This ruling drew a nuanced line. The court reasoned that the act of scraping and ingesting text for the purpose of training a model is transformative, as the model learns patterns and relationships rather than storing and reproducing specific articles. However, the court left open the possibility that models which frequently generate near-verbatim reproductions of training data could expose their operators to infringement claims.

Key Takeaway: The right to scrape publicly available data is increasingly recognized in US courts, but what you do with that data matters enormously. Scraping for analysis, aggregation, and model training receives more favorable treatment than scraping for direct republication or competitive substitution.

In a separate case, RetailScrape LLC v. MegaMart Inc., a federal district court in Virginia ruled that a retailer's use of technical measures to block a scraper, including IP blocks, rate limiting, and CAPTCHA challenges, did not create a legally cognizable "access barrier" under the Computer Fraud and Abuse Act. The court found that these measures were insufficient to establish that the scraper accessed the site "without authorization" because the underlying data remained publicly available to any visitor with a web browser.

The CFAA and the Authorization Question

The Computer Fraud and Abuse Act remains the primary federal statute invoked against web scrapers in the United States. The CFAA prohibits accessing a computer "without authorization" or "exceeding authorized access," but the statute has never clearly defined what constitutes authorization in the context of publicly accessible websites.

The Supreme Court's 2021 decision in Van Buren v. United States narrowed the CFAA's scope by holding that "exceeding authorized access" applies to people who access data they are not entitled to see, not to people who misuse data they are otherwise permitted to access. This ruling made it harder to prosecute scrapers under the CFAA when they access only publicly available information.

In 2026, this trend has continued. Congressional efforts to amend the CFAA have included proposals that would explicitly carve out web scraping of publicly available data from the statute's prohibitions, provided the scraping does not cause material harm to the target system's operations. These proposals have not yet been enacted, but they signal the direction of legislative thinking.

However, the CFAA remains a viable claim in scenarios involving authenticated access. Scraping data behind a login wall using credentials obtained through false pretenses, creating fake accounts to circumvent access restrictions, or continuing to scrape after receiving a formal cease-and-desist notice can still support CFAA liability in many jurisdictions.

Terms of Service as Legal Barriers

Website Terms of Service frequently prohibit automated access and data extraction. The legal enforceability of these provisions against scrapers has been hotly contested, and 2026 has brought further developments.

Courts have generally held that browsewrap Terms of Service, where the terms exist on the site but the user is not required to affirmatively agree to them, are difficult to enforce against scrapers. An automated bot that never navigates to or acknowledges a Terms of Service page has a strong argument that it never assented to those terms.

Clickwrap agreements, where users must check a box or click a button to indicate agreement, are more enforceable. If a scraper creates an account and agrees to Terms of Service that prohibit scraping, subsequent scraping activity likely constitutes a breach of contract. However, breach of contract carries very different (and generally lighter) consequences than CFAA violations.

Practical Distinction: Scraping public pages without logging in generally avoids Terms of Service enforcement issues. Scraping after creating an account and agreeing to Terms of Service creates contractual exposure. The technical implementation of your scraping operation has direct legal implications.
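To make the distinction concrete, here is a minimal Python sketch of a pre-flight check a scraping team might run before sending a request. It is an illustration under stated assumptions, not a legal test: `PlannedRequest`, `exposure_notes`, and the `CREDENTIAL_HEADERS` list are hypothetical names introduced here, and real sites may carry credentials in places this check does not look.

```python
from dataclasses import dataclass, field

# Header names that usually indicate an authenticated session. This list is an
# assumption for illustration; sites may also pass credentials in query strings.
CREDENTIAL_HEADERS = {"authorization", "cookie", "x-api-key"}


@dataclass
class PlannedRequest:
    """A request the scraper intends to send (hypothetical structure)."""
    url: str
    headers: dict = field(default_factory=dict)
    # True if the operator created an account and clicked through the terms.
    accepted_terms: bool = False


def exposure_notes(request: PlannedRequest) -> list:
    """List the features of a request that move it away from public, logged-out access."""
    notes = []
    sent = {name.lower() for name in request.headers}
    credentials = sorted(sent & CREDENTIAL_HEADERS)
    if credentials:
        notes.append("sends credentials: " + ", ".join(credentials))
    if request.accepted_terms:
        notes.append("account holder accepted clickwrap terms")
    return notes


if __name__ == "__main__":
    public = PlannedRequest("https://example.com/products")
    logged_in = PlannedRequest(
        "https://example.com/account/orders",
        headers={"Cookie": "session=abc", "User-Agent": "example-bot"},
        accepted_terms=True,
    )
    for req in (public, logged_in):
        print(req.url, exposure_notes(req) or "no flags")
```

An empty result only means the request looks like logged-out access to a public page. It says nothing about copyright, privacy law, or notices the operator may already have received.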

Diverging Transatlantic Approaches

The European Union and the United States are taking markedly different approaches to web scraping regulation, creating complexity for businesses operating across both jurisdictions.

In the EU, the legal landscape is shaped primarily by the General Data Protection Regulation (GDPR), the Database Directive, and the 2019 Copyright Directive. GDPR applies whenever scraped data includes personal information, regardless of whether that information is publicly available. This means that scraping publicly accessible social media profiles, for example, still requires a valid legal basis under GDPR and compliance with data subject rights.

The EU's approach to AI training data has been more restrictive than the US approach. The AI Act, in conjunction with the Copyright Directive's text and data mining provisions, creates a framework where rights holders can opt out of having their content used for AI training. Website operators who include machine-readable opt-out signals (such as specific robots.txt directives or meta tags) can legally reserve their rights and prevent their content from being scraped for this purpose.
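As a rough illustration of what honoring machine-readable opt-out signals can look like, the Python sketch below checks a site's robots.txt rules and its robots meta tag before a page is collected for training. It uses only the standard library. The crawler name `ExampleAIBot` and the `noai` / `noimageai` tokens are assumptions chosen for the example; the signals that actually count as a valid reservation of rights vary by site and are not settled by this code.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name and opt-out tokens, used only for this example.
AI_CRAWLER_AGENT = "ExampleAIBot"
OPT_OUT_TOKENS = {"noai", "noimageai"}


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Collects the tokens found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.tokens.add(token.strip().lower())


def meta_opts_out(html: str) -> bool:
    """Return True if the page's robots meta tag carries an opt-out token."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return bool(parser.tokens & OPT_OUT_TOKENS)


def may_collect_for_training(robots_txt: str, html: str, url: str) -> bool:
    """Collect only if neither robots.txt nor the page itself opts out."""
    return robots_allows(robots_txt, AI_CRAWLER_AGENT, url) and not meta_opts_out(html)


if __name__ == "__main__":
    robots = "User-agent: ExampleAIBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    page = '<html><head><meta name="robots" content="index, follow"></head></html>'
    print(may_collect_for_training(robots, page, "https://example.com/news/1"))
```

Both inputs are passed in as text, so the check can run on content that has already been fetched and stored alongside the result as a record of what the site signaled at collection time.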

The US, by contrast, has relied more heavily on fair use doctrine and a generally permissive stance toward accessing publicly available information. There is no federal equivalent to GDPR that restricts scraping of publicly available personal data, although California's CCPA and similar state laws impose some obligations.

LinkedIn vs hiQ: The Long Shadow

The hiQ Labs v. LinkedIn litigation, which began in 2017, has been one of the defining legal battles over web scraping. In 2021 the Supreme Court vacated the Ninth Circuit's first ruling and sent the case back for reconsideration in light of Van Buren. On remand in 2022, the Ninth Circuit again held that scraping publicly available LinkedIn profiles likely did not violate the CFAA. However, LinkedIn later prevailed on its breach of contract claim in the district court, and the parties settled.

The case established important precedent: accessing publicly available data on the open internet is generally not "unauthorized access" under the CFAA. But it also demonstrated that website operators have other legal tools available, including state law claims, tortious interference, and trade secret theories, to challenge unwanted scraping.

In 2026, the practical legacy of hiQ is that companies on both sides of the scraping equation have become more sophisticated. Website operators increasingly use technical measures combined with legal notices to create a documented record of denied authorization. Scrapers have become more careful about how they access data and what claims they can credibly make about authorization.

What This Means for Businesses

For organizations that rely on web scraping, the legal landscape in 2026 demands a thoughtful, jurisdiction-aware approach.

Bottom Line: Web scraping law in 2026 is more nuanced than ever. The trend in US courts favors permitting access to publicly available data, but how that data is obtained and used matters. EU law is more restrictive, particularly regarding personal data and opt-out rights. Businesses should work with legal counsel to develop scraping practices tailored to their specific use cases and target jurisdictions.

The legal landscape will continue to evolve as courts grapple with new questions about AI training data, data ownership, and the boundaries of publicly available information. Organizations that invest in understanding these nuances now will be better positioned to adapt as the rules continue to develop.
