Blog

How AI reads the web — and what you can do about it.

2026-03-01·5 min read

A Bigger Context Window Doesn't Fix Bad Retrieval

The context window keeps growing. GPT-4 Turbo operates at 128,000 tokens. Gemini reaches 1 million. Claude can handle 200,000. With windows this large, a common thought follows: why retrieve at all? Why not just load the entire document, the entire codebase, the entire knowledge base i

2026-03-01·5 min read

AI Search Is Not Search. Stop Optimizing for It Like It Is.

For twenty years, the web's visibility game has had one clearly defined winner: Google Search. Get into the top ten results for your target queries and traffic flows. The signals were well understood. Backlinks meant authority. Keyword relevance mattered. Page experience factors

2026-03-01·5 min read

Can ChatGPT Request Markdown Instead of HTML?

Cloudflare built the infrastructure. Most AI agents haven't shown up to use it.

2026-03-01·6 min read

Do AI Models Actually Read Your Website's Full HTML?

Most don't — and what they miss affects how accurately they understand your content.

2026-03-01·5 min read

llms.txt Is a Good Idea That Nobody's Actually Reading

The new standard for telling AI what's on your site has a problem: the bots aren't checking it.

2026-03-01·5 min read

Paywalls vs. AI Crawlers: What's Actually Happening

Publishers have invested decades in paywalls. The New York Times, The Wall Street Journal, The Financial Times, CNBC — they built subscription models because they needed recurring revenue to fund quality journalism. Paywalls work by restricting access. You pay, you read. You don'

2026-03-01·6 min read

Does Schema.org Help AI Understand Your Site? Sort Of.

Structured data gives AI systems useful anchors, but most live crawlers don't read it when they visit.

2026-03-01·7 min read

Server-Side Rendering Is Back. AI Is a Big Reason Why.

The shift from client-side to server-side rendering was already underway. AI crawlers just made the case harder to ignore.

2026-03-01·7 min read

The Token Cost of a Badly Built RAG Pipeline

Every API call to an LLM costs money. The bill arrives per token. A token is roughly four characters, so a single HTML page can mean thousands of tokens, each one adding to your operational cost. Most AI teams understand this intellectually. But many still haven't thought through

2026-03-01·7 min read

Why AI Agents Keep Getting Stuck on the Web

Multi-step web tasks sound simple. For AI agents, they're a compounding failure problem.

2026-03-01·5 min read

The Race to Build Web Standards for AI: robots.txt, llms.txt, ai.txt

The web has had a protocol for managing automated access since 1994. It's called robots.txt. It's a simple text file in the root directory of a domain that tells crawlers which paths they shouldn't visit. The entire system works because crawlers choose to respect it. There's no t

2026-03-01·6 min read

Every AI Crawler Visiting Your Site Right Now — And What Each One Can Actually See

GPTBot, ClaudeBot, PerplexityBot, Google-Extended: same site, very different views.