Blog
How AI reads the web — and what you can do about it.
A Bigger Context Window Doesn't Fix Bad Retrieval
The context window keeps growing. GPT-4 runs at 128,000 tokens. Claude handles 200,000. Gemini reaches 1 million. With windows this large, a common thought follows: why retrieve at all? Why not just load the entire document, the entire codebase, the entire knowledge base into the context window?
AI Search Is Not Search. Stop Optimizing for It Like It Is.
For twenty years, the web's visibility game has had one clearly defined winner: Google Search. Get into the top ten results for your target queries and traffic flows. The signals were well understood. Backlinks meant authority. Keyword relevance mattered. Page experience factors counted, too.
Can ChatGPT Request Markdown Instead of HTML?
Cloudflare built the infrastructure. Most AI agents haven't shown up to use it.
Do AI Models Actually Read Your Website's Full HTML?
Most don't — and what they miss affects how accurately they understand your content.
llms.txt Is a Good Idea That Nobody's Actually Reading
The new standard for telling AI what's on your site has a problem: the bots aren't checking it.
Paywalls vs. AI Crawlers: What's Actually Happening
Publishers have invested decades in paywalls. The New York Times, The Wall Street Journal, The Financial Times, CNBC — they built subscription models because they needed recurring revenue to fund quality journalism. Paywalls work by restricting access. You pay, you read. You don't, you don't.
Does Schema.org Help AI Understand Your Site? Sort Of.
Structured data gives AI systems useful anchors — but most live crawlers don't read it the moment they visit.
Server-Side Rendering Is Back. AI Is a Big Reason Why.
The shift from client-side to server-side rendering was already underway. AI crawlers just made the case harder to ignore.
The Token Cost of a Badly Built RAG Pipeline
Every API call to an LLM costs money. The bill arrives per token. A token is roughly four characters of English text, so a single HTML page can mean thousands of tokens, each one adding to your operational cost. Most AI teams understand this intellectually. But many still haven't thought it through.
Why AI Agents Keep Getting Stuck on the Web
Multi-step web tasks sound simple. For AI agents, they're a compounding failure problem.
The Race to Build Web Standards for AI: robots.txt, llms.txt, ai.txt
The web has had a protocol for managing automated access since 1994. It's called robots.txt. It's a simple text file in the root directory of a domain that tells crawlers which paths they shouldn't visit. The entire system works because crawlers choose to respect it. There's no technical enforcement.
Every AI Crawler Visiting Your Site Right Now — And What Each One Can Actually See
GPTBot, ClaudeBot, PerplexityBot, Google-Extended: same site, very different views.