Technology & Ethics | August 17, 2025
Internet infrastructure giant Cloudflare has publicly accused AI search startup Perplexity of using deceptive tactics to scrape website content, even after being explicitly blocked by site owners. The allegations have resulted in Perplexity being removed from Cloudflare’s verified bot program and sparked a broader debate about AI companies’ data collection practices.
The Accusations
Cloudflare’s investigation revealed what the company describes as “stealth crawling behavior” by Perplexity, the AI-powered answer engine valued at $18 billion. According to Cloudflare’s technical analysis, Perplexity systematically circumvented website directives that prohibited automated crawling.
The company documented that when Perplexity’s declared crawler was blocked by websites, the AI startup would switch to disguised crawlers that impersonated regular web browsers, specifically mimicking Google Chrome on macOS systems. This deceptive practice allowed Perplexity to continue accessing content despite explicit blocks and robots.txt files forbidding such activity.
Technical Evidence
Cloudflare ran controlled experiments on newly created test domains that had never been indexed by search engines or made publicly discoverable. Each domain served a strict robots.txt file prohibiting all automated access and was protected by specific firewall rules blocking Perplexity’s known crawlers.
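As a rough illustration of the robots.txt side of the test setup described above, the sketch below parses a blanket-deny robots.txt with Python’s standard `urllib.robotparser` and shows the check a compliant crawler would run before fetching a page. The file contents, the domain `test-domain.example`, and the name `ExampleBot` are placeholders chosen for the example, not values from Cloudflare’s report.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that forbids all automated access. The exact file used on
# the test domains is not given in the article; this content is assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical crawler name used only for illustration.
    for agent in ("ExampleBot", "Mozilla/5.0"):
        allowed = may_fetch(agent, "https://test-domain.example/page.html")
        print(f"{agent}: allowed={allowed}")
```

Nothing in this check is enforced by the server. robots.txt is advisory, which is why the experiment paired it with firewall rules rather than relying on crawler cooperation alone.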
Despite these comprehensive blocks, Perplexity’s AI system was still able to provide detailed information about the restricted content when users queried the platform. The investigation revealed two distinct crawling operations:
- Declared Crawler: Making 20-25 million daily requests using Perplexity’s official user agent
- Stealth Crawler: Making 3-6 million daily requests while disguised as a Chrome browser
The stealth crawler used multiple IP addresses not listed in Perplexity’s official documentation and rotated through different network providers to evade detection and blocks.
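The distinction between the two operations can be sketched as a toy classifier over request metadata. This is not Cloudflare’s detection logic, which the article describes only at a high level. The `perplexitybot` user-agent token, the published IP range (a block reserved for documentation), the sample Chrome user-agent string, and the `looks_automated` flag are all assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# All values below are assumed for illustration only.
DECLARED_UA_TOKEN = "perplexitybot"              # assumed declared-crawler token
PUBLISHED_RANGES = [ip_network("192.0.2.0/24")]  # RFC 5737 documentation range
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


@dataclass
class Request:
    user_agent: str
    source_ip: str
    looks_automated: bool  # hypothetical output of a separate bot-detection signal


def classify(req: Request) -> str:
    """Label a request 'declared', 'unverified', 'stealth', or 'ordinary'."""
    ip = ip_address(req.source_ip)
    from_published_ip = any(ip in net for net in PUBLISHED_RANGES)
    declares_itself = DECLARED_UA_TOKEN in req.user_agent.lower()

    if declares_itself:
        # An honest user agent from an unlisted address cannot be verified.
        return "declared" if from_published_ip else "unverified"
    if req.looks_automated and not from_published_ip:
        # Browser-like user agent, unlisted address, bot-like behavior.
        return "stealth"
    return "ordinary"


if __name__ == "__main__":
    print(classify(Request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "192.0.2.10", True)))
    print(classify(Request(CHROME_UA, "198.51.100.7", True)))
```

The sketch also shows why user-agent checks alone are insufficient: a request that presents a browser user agent is only flagged when some independent signal marks it as automated.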
Industry Standards Violated
Cloudflare emphasized that Perplexity’s behavior violates established web crawling ethics and Internet standards. Good-faith crawlers are expected to be transparent, identify themselves honestly, respect website directives like robots.txt files, and avoid overwhelming sites with excessive traffic.
The company contrasted Perplexity’s behavior with that of OpenAI, praising the ChatGPT maker as an example of responsible AI crawling. When Cloudflare ran the same tests against OpenAI’s crawlers, it found that they respected robots.txt files and stopped crawling when blocked, complying with website owners’ preferences.
Cloudflare’s Response
As a result of these findings, Cloudflare has taken decisive action:
- De-listed Perplexity from its verified bot program
- Implemented blocking heuristics specifically targeting the stealth crawling behavior
- Added signature matches to managed rules that block AI crawling activity
- Provided protection to all customers, including those on free plans
The company noted that any customer with existing bot management rules was already protected from these stealth crawling attempts, because Cloudflare’s systems had correctly identified the disguised traffic as automated.
Broader Industry Implications
This incident highlights the escalating tension between AI companies seeking training data and content creators trying to protect their intellectual property. Cloudflare announced “Content Independence Day” a month prior, giving website owners more control over AI access to their content. Over 2.5 million websites have since chosen to completely block AI training through Cloudflare’s tools.
The controversy raises important questions about the ethics of AI data collection and whether current industry practices adequately respect content creators’ rights and preferences.
Industry Reaction
The revelation has sparked debate within the technology community. While some observers support Cloudflare’s decision to expose Perplexity’s practices, others argue that the complexity of AI crawling and website blocking makes the situation less clear-cut than presented.
Perplexity has not yet issued a comprehensive public response to Cloudflare’s detailed accusations, leaving questions about how the AI startup will address these ethical and technical concerns.
Looking Forward
Cloudflare acknowledged that this public disclosure will likely prompt changes in Perplexity’s crawling behavior, and it expects bot detection and evasion techniques to keep evolving. The company is working with technical and policy experts worldwide, including through Internet Engineering Task Force efforts to standardize extensions to robots.txt, to establish clearer principles for responsible bot operation.
This case may set a precedent for how major internet infrastructure companies handle similar situations, and it could reshape the relationship between AI companies and content creators.
The outcome of this dispute could have significant implications for the broader AI industry, as it tests the boundaries of acceptable data collection practices and may influence future regulatory approaches to AI training data acquisition.