What were the main allegations Cloudflare made against Perplexity AI's web crawler?

Cloudflare alleged that Perplexity AI engaged in 'stealth crawling': when its declared crawler was blocked, it allegedly switched to an undeclared user agent impersonating Google Chrome on macOS and rotated through IP addresses and network providers not listed in its official documentation, evading the robots.txt rules and firewall blocks set up by websites.

Cloudflare Exposes Perplexity AI Stealth Crawling Tactics Used to Evade Website Blocks

by Morgan H

August 17, 2025

Technology & Ethics | August 17, 2025

Internet infrastructure giant Cloudflare has publicly accused AI search startup Perplexity of using deceptive tactics to scrape website content, even after being explicitly blocked by site owners. The allegations have resulted in Perplexity being removed from Cloudflare’s verified bot program and sparked a broader debate about AI companies’ data collection practices.

The Accusations

Cloudflare’s investigation revealed what the company describes as “stealth crawling behavior” by Perplexity, the $18 billion AI-powered answer engine. According to Cloudflare’s detailed technical analysis, Perplexity was systematically circumventing website directives that prohibited automated crawling.

The company documented that when Perplexity’s declared crawler was blocked by websites, the AI startup would switch to disguised crawlers that impersonated regular web browsers, specifically mimicking Google Chrome on macOS systems. This deceptive practice allowed Perplexity to continue accessing content despite explicit blocks and robots.txt files forbidding such activity.
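To see why a disguised user agent defeats this kind of block, consider a minimal sketch of a filter that only inspects the User-Agent header. The blocklist tokens and both user-agent strings below are illustrative assumptions, not values taken from Cloudflare's report, and the filter is a deliberately naive stand-in rather than how Cloudflare's systems work.

```python
# Hypothetical blocklist: substrings a site owner might match against
# the User-Agent header to stop a declared crawler.
BLOCKED_TOKENS = ("perplexitybot", "perplexity-user")

def blocked_by_user_agent(user_agent: str) -> bool:
    """Return True if the header contains any blocklisted token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)

# Illustrative strings only: a self-identifying crawler and a generic
# Chrome-on-macOS browser string of the sort any client can send.
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
BROWSER_LIKE_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

print(blocked_by_user_agent(DECLARED_UA))      # True: the declared crawler is stopped
print(blocked_by_user_agent(BROWSER_LIKE_UA))  # False: the browser-like string passes
```

Because the header is entirely self-reported, a rule like this only stops clients that identify themselves honestly.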

Technical Evidence

Cloudflare conducted controlled experiments using newly created test domains that had never been indexed by search engines or made publicly accessible. These domains implemented strict robots.txt files prohibiting all automated access and specific firewall rules blocking Perplexity’s known crawlers.
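For readers unfamiliar with robots.txt, the sketch below shows how a compliant crawler would interpret a blanket prohibition, using Python's standard urllib.robotparser module. The file contents, the crawler names, and the example.test URL are assumptions for illustration; the article does not reproduce the exact rules Cloudflare used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test domain: every crawler is
# disallowed from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A well-behaved crawler runs this check before every request and
# skips the fetch when the answer is False.
print(is_allowed("PerplexityBot", "https://example.test/private-page"))  # False
print(is_allowed("AnyOtherBot", "https://example.test/"))                # False
```

The check is purely voluntary: robots.txt expresses the site owner's wishes, and nothing in the protocol stops a client that never consults it.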

Despite these comprehensive blocks, Perplexity’s AI system was still able to provide detailed information about the restricted content when users queried the platform. The investigation revealed two distinct crawling operations:

Declared Crawler: Making 20-25 million daily requests using Perplexity’s official user agent
Stealth Crawler: Making 3-6 million daily requests while disguised as a Chrome browser

The stealth crawler utilized multiple undisclosed IP addresses not listed in Perplexity’s official documentation and would rotate through different network providers to evade detection and blocks.
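The role of published IP ranges can be sketched in the same spirit. The example below assumes a hypothetical list of ranges (reserved documentation addresses, not Perplexity's real ones) and a hypothetical crawler token, and classifies a request by whether it declares itself and whether its source address falls inside the published list.

```python
import ipaddress

# Hypothetical published crawler ranges, using reserved documentation
# blocks rather than any real operator's addresses.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
DECLARED_TOKEN = "perplexitybot"  # assumed crawler name, for illustration

def in_published_range(ip: str) -> bool:
    """Return True if ip falls inside any published range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in PUBLISHED_RANGES)

def classify(ip: str, user_agent: str) -> str:
    """Label a request by its self-declaration and its source address."""
    declares_bot = DECLARED_TOKEN in user_agent.lower()
    if declares_bot and in_published_range(ip):
        return "declared-and-verified"
    if declares_bot:
        return "declared-but-unlisted-ip"
    return "undeclared"

print(classify("192.0.2.10", "PerplexityBot/1.0"))
print(classify("203.0.113.7", "PerplexityBot/1.0"))
print(classify("203.0.113.7", "Mozilla/5.0 (Macintosh) Chrome/124.0.0.0"))
```

A request that neither names itself nor comes from a listed address lands in the "undeclared" bucket, where it looks the same as ordinary browser traffic to this check. That gap is why detecting such traffic requires behavioral signals beyond headers and address lists.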

Industry Standards Violated

Cloudflare emphasized that Perplexity’s behavior violates established web crawling ethics and Internet standards. Good-faith crawlers are expected to be transparent, identify themselves honestly, respect website directives like robots.txt files, and avoid overwhelming sites with excessive traffic.

The company contrasted Perplexity’s behavior with that of OpenAI, praising the ChatGPT maker as an example of responsible AI crawling. When Cloudflare ran the same tests against OpenAI’s crawlers, it found that ChatGPT respected robots.txt files and stopped crawling when blocked, complying with website owners’ stated preferences.

Cloudflare’s Response

As a result of these findings, Cloudflare has taken decisive action:

De-listed Perplexity from its verified bot program
Implemented blocking heuristics specifically targeting the stealth crawling behavior
Added signature matches to managed rules that block AI crawling activity
Provided protection to all customers, including those on free plans

The company noted that any customer with existing bot management rules was already protected from these stealth crawling attempts, as their systems had correctly identified the disguised traffic as automated.

Broader Industry Implications

This incident highlights the escalating tension between AI companies seeking training data and content creators trying to protect their intellectual property. Cloudflare had announced “Content Independence Day” a month earlier, giving website owners more control over AI access to their content. Over 2.5 million websites have since chosen to completely block AI training through Cloudflare’s tools.

The controversy raises important questions about the ethics of AI data collection and whether current industry practices adequately respect content creators’ rights and preferences.

Industry Reaction

The revelation has sparked debate within the technology community. While some observers support Cloudflare’s decision to expose Perplexity’s practices, others argue that the complex nature of AI crawling and website blocking makes the situation less clear-cut than presented.

Perplexity has not yet issued a comprehensive public response to Cloudflare’s detailed accusations, leaving questions about how the AI startup will address these ethical and technical concerns.

Looking Forward

Cloudflare acknowledged that this public disclosure will likely prompt changes in Perplexity’s crawling behavior and expects continued evolution in bot detection and evasion techniques. The company is working with technical and policy experts worldwide, including Internet Engineering Task Force efforts to standardize extensions to robots.txt, to establish clearer principles for responsible bot operation.

This case may serve as a precedent for how major internet infrastructure companies will handle similar situations in the future, potentially reshaping the relationship between AI companies and content creators in the rapidly evolving digital landscape.

The outcome of this dispute could have significant implications for the broader AI industry, as it tests the boundaries of acceptable data collection practices and may influence future regulatory approaches to AI training data acquisition.