Blog

How AI reads the web — and what you can do about it.

·5 min read

A Bigger Context Window Doesn't Fix Bad Retrieval

The context window keeps growing. GPT-4 Turbo operates at 128,000 tokens. Gemini reaches 1 million. Claude can handle 200,000. With windows this large, a common thought follows: why retrieve at all? Why not just load the entire document, the entire codebase, the entire knowledge base i

·5 min read

AI Search Is Not Search. Stop Optimizing for It Like It Is.

For twenty years, the web's visibility game has had one clearly defined winner: Google Search. Get into the top ten results for your target queries and traffic flows. The signals were well understood. Backlinks meant authority. Keyword relevance mattered. Page experience factors

·5 min read

Can ChatGPT Request Markdown Instead of HTML?

Cloudflare built the infrastructure. Most AI agents haven't shown up to use it.

·6 min read

Do AI Models Actually Read Your Website's Full HTML?

Most don't — and what they miss affects how accurately they understand your content.

·5 min read

llms.txt Is a Good Idea That Nobody's Actually Reading

The new standard for telling AI what's on your site has a problem: the bots aren't checking it.

·5 min read

Paywalls vs. AI Crawlers: What's Actually Happening

Publishers have invested decades in paywalls. The New York Times, The Wall Street Journal, The Financial Times, CNBC — they built subscription models because they needed recurring revenue to fund quality journalism. Paywalls work by restricting access. You pay, you read. You don'

·6 min read

Does Schema.org Help AI Understand Your Site? Sort Of.

Structured data gives AI systems useful anchors — but most live crawlers don't read it the moment they visit.

·7 min read

Server-Side Rendering Is Back. AI Is a Big Reason Why.

The shift from client-side to server-side rendering was already underway. AI crawlers just made the case harder to ignore.

·7 min read

The Token Cost of a Badly Built RAG Pipeline

Every API call to an LLM costs money. The bill arrives per token. A token is roughly four characters, so a single HTML page can mean thousands of tokens, each one adding to your operational cost. Most AI teams understand this intellectually. But many still haven't thought through

·7 min read

Why AI Agents Keep Getting Stuck on the Web

Multi-step web tasks sound simple. For AI agents, they're a compounding failure problem.

·5 min read

The Race to Build Web Standards for AI: robots.txt, llms.txt, ai.txt

The web has had a protocol for managing automated access since 1994. It's called robots.txt. It's a simple text file in the root directory of a domain that tells crawlers which paths they shouldn't visit. The entire system works because crawlers choose to respect it. There's no t

·6 min read

Every AI Crawler Visiting Your Site Right Now — And What Each One Can Actually See

GPTBot, ClaudeBot, PerplexityBot, Google-Extended: same site, very different views.