Every AI Crawler Visiting Your Site Right Now — And What Each One Can Actually See
Subtitle: GPTBot, ClaudeBot, PerplexityBot, Google-Extended: same site, very different views.
If you're running a moderately trafficked website, you're being crawled by multiple AI systems simultaneously. You probably have crawlers from OpenAI, Anthropic, Perplexity, and Google sitting in your access logs. Each of them is reading your content right now. And each of them is reading something subtly different.
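You can verify this directly from your own access logs. Below is a minimal sketch that tallies requests by crawler user-agent token, assuming the common nginx/Apache combined log format; the sample lines are fabricated for illustration, though the tokens GPTBot, ClaudeBot, and PerplexityBot are the ones these crawlers really send. (Google-Extended is a robots.txt control token rather than a distinct user-agent, so it won't show up this way.)

```python
from collections import Counter

# Fabricated sample lines in combined log format; real logs will vary,
# but the crawler tokens in the user-agent field are the real ones.
SAMPLE_LOG = """\
203.0.113.10 - - [10/May/2025:10:00:01 +0000] "GET /post HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.0"
203.0.113.11 - - [10/May/2025:10:00:02 +0000] "GET /post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
203.0.113.12 - - [10/May/2025:10:00:03 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
203.0.113.13 - - [10/May/2025:10:00:04 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/124.0"
203.0.113.10 - - [10/May/2025:10:00:05 +0000] "GET /docs HTTP/1.1" 200 4096 "-" "Mozilla/5.0; compatible; GPTBot/1.0"
"""

AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def count_ai_hits(log_text: str) -> Counter:
    """Tally requests whose user-agent field names a known AI crawler."""
    hits = Counter()
    for line in log_text.splitlines():
        for token in AI_CRAWLER_TOKENS:
            if token in line:
                hits[token] += 1
    return hits

print(count_ai_hits(SAMPLE_LOG))
```

Point the same function at your real log file and the counts are usually not small.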
This is not a catastrophe — but it's important to understand. These crawlers have very different capabilities. Some can execute JavaScript. Some can't. Some see your site the way a modern browser does. Others see something closer to what a crawler from 2010 would have seen. The gap between how AI systems actually process web content and how most people assume they do is larger than most realise.
The Big Four and Their Capabilities
Vercel published data on AI crawler traffic volume that provides useful scale context. GPTBot, OpenAI's crawler, generates approximately 569 million requests per month across the web. That's a lot of traffic. Anthropic's ClaudeBot is second at around 370 million requests monthly. Perplexity's bot is smaller in volume but growing. Google-Extended operates on a different scale because it isn't a separate crawler so much as a control layered on Googlebot, the largest existing crawl infrastructure on the web.
But here's what matters: GPTBot does not execute JavaScript. It receives the initial HTML response from your server and processes what it gets. If your page requires JavaScript to render content, GPTBot sees an empty or nearly empty page. The same is true for ClaudeBot. Both of these crawlers are fundamentally HTML-based. They download JavaScript files (so they can parse them and understand your codebase structure), but they don't run those files. They don't open a browser engine and execute code.
PerplexityBot operates under the same constraints. It reads HTML. It doesn't execute JavaScript.
Google-Extended is different. Google-Extended uses the same Web Rendering Service (WRS) that powers Googlebot — it runs a real headless Chromium browser. When Google-Extended fetches your page, it actually executes JavaScript. It waits for dynamic content to load. It processes your React components, your Vue.js scripts, your vanilla JavaScript DOM manipulations. To Google-Extended, your site looks the way it looks in Chrome.
This is a massive asymmetry. A modern SPA built entirely in React with no server-side rendering will appear to Google-Extended as a fully functional, content-rich site. The same site, to GPTBot and ClaudeBot, will appear as a loading spinner and empty divs. Same URL. Completely different ingested content.
The Practical Test
If you want to see what GPTBot, ClaudeBot, and PerplexityBot see when they visit your site, here's a direct method: disable JavaScript in your browser. Use your browser's developer tools to turn off JavaScript execution, then view your site the way you normally would. What you're looking at is roughly what these non-JS crawlers see. No client-side rendering. No dynamic content. No React hydration. Just the raw HTML that came from your server.
On a lot of modern sites, this looks broken. There's a loading spinner. There are empty divs. There might be a message saying "Please enable JavaScript." This is what GPTBot sees. This is what ClaudeBot sees. This is what PerplexityBot sees.
If your site is a blog with static HTML and modest JavaScript enhancements, you're probably fine. If your site is a single-page app (SPA) with JavaScript-driven routing and rendering, you have a problem.
The Limits of Google-Extended
Google-Extended is better, but not unlimited. Google has documented its crawling and rendering pipeline, and there are important constraints built in. Google caches CSS and JavaScript for up to 30 days. If you push a JavaScript update, Google-Extended may not see that update for weeks. Crawl budget is finite: not everything gets the JS rendering treatment, only pages that Google determines are high value or frequently accessed.
Also worth noting: there's a distinction between GPTBot as a crawler (which doesn't render JS) and ChatGPT's browsing feature. When you use ChatGPT to browse the web within a conversation, you're using a different infrastructure path. ChatGPT's browser may process JavaScript differently than GPTBot's crawler does, and we don't have complete transparency into how that path works.
The same is true for other AI platforms. When Claude's research mode browses the web, it might use different rendering than when Anthropic's training team crawls the web. The pipeline matters.
What This Means for Your Site
If AI visibility is important to your business, you need to ask: which crawlers matter most to my use cases? If you're building a research tool and you care whether OpenAI's models know about your content, you need GPTBot to see your full content. If you're worried about Perplexity citations, you need PerplexityBot to see your pages correctly. If you want AI agents to interact with your site reliably, you need every crawler to see correct, complete HTML.
The honest answer is: if any of these matter to you, you can't rely on client-side rendering for your content. Not because it's inherently wrong, but because you're building for multiple different readers with different capabilities. Some of those readers can't execute JavaScript.
Server-side rendering ensures that the HTML coming from your server contains the actual content, not a skeleton. Every crawler sees the same content. Google-Extended can still render JavaScript for interactive enhancements. ClaudeBot and GPTBot can see your actual information. This works for everyone.
Another way to frame it: the gap between what you see in your browser and what a non-JS crawler sees should be as small as possible. Ideally, it should be zero. Your critical content should exist in the HTML. Nice-to-have interactivity should be layered on top via JavaScript, not baked into the rendering itself.
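One way to operationalise that gap, as a minimal sketch: list the content strings that must be visible to every crawler, then check them against the raw HTML your server returns. The response and checklist below are fabricated; in practice you would fetch the page without a JS engine and check the real body.

```python
def content_gap(raw_html: str, critical_content: list[str]) -> list[str]:
    """Return the critical strings missing from the server's raw HTML,
    i.e. content invisible to any crawler that never executes JavaScript."""
    return [s for s in critical_content if s not in raw_html]

# Fabricated server response: the heading is server-rendered,
# but pricing and reviews only appear after client-side JS runs.
RAW_HTML = (
    "<html><body><h1>Pricing</h1>"
    "<div id='reviews'></div>"
    "<script src='/app.js'></script></body></html>"
)
CRITICAL = ["Pricing", "Plans start at $10/month", "Customer reviews"]

missing = content_gap(RAW_HTML, CRITICAL)
print(missing)  # whatever is rendered only client-side shows up here
```

If the returned list is empty, the gap is zero: every crawler, JS-capable or not, gets your critical content.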
The Emerging Baseline
The crawler landscape is consolidating around a simple principle: some crawlers render JavaScript, most don't. The ones that do (Google) have enormous advantages in seeing your full site. The ones that don't (OpenAI, Anthropic, Perplexity) can only see what you put in HTML.
For most content-driven businesses — news sites, research platforms, documentation, product catalogs — this means server-side rendering isn't optional. It's the only way to ensure consistency across the crawlers that matter.
For interactive applications, the rule is different. An email client, a project management tool, or a collaborative editor has legitimate reasons to use client-side rendering. But even in those cases, the crawlable parts — the help pages, the documentation, the API references — should be server-side rendered.
The future may bring improvements. OpenAI might deploy a version of GPTBot that renders JavaScript. Perplexity might add rendering capability. But that's speculative. Today, the safest approach is to assume that most AI crawlers can't execute JavaScript and build accordingly. If you build HTML-first and a crawler turns out to render JavaScript after all, you lose nothing. If you assume crawlers can render when they can't, you're invisible.
Built for this problem
Control exactly what AI reads on your site
MachineContext serves clean, structured content to AI bots — JavaScript rendered, properly formatted, always accurate — while keeping your site unchanged for humans.
Get started →