Dev reports suggest long sessions now burn through usage much faster
Anthropic last month reduced the TTL (time to live) of the Claude Code prompt cache from one hour to five minutes for many requests, but said the change should not increase costs, despite users reporting faster-depleting quotas.
User Sean Swanson posted a bug report showing that Anthropic introduced a one-hour cache for Claude Code context around February 1, then changed it back to a five-minute cache around March 7.
"The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage," said Swanson.
When using AI coding assistants or agents, the context is additional data sent along with the user's prompts, such as existing code or background instructions.
Context improves the accuracy of the AI but also requires more processing.
Claude prompt caching avoids re-processing previously used prompts, including context and background information.
The cache can have either a five-minute or one-hour TTL.
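Callers opt into caching by marking prompt segments with a `cache_control` field in the Messages API; the one-hour TTL is selected with an explicit `ttl` value. The sketch below builds such a request payload. The field names and `"5m"`/`"1h"` values reflect Anthropic's documented API at the time of writing, and the model name is illustrative, so verify both against current documentation before relying on them.

```python
# Sketch of a Messages API payload with a cached system prompt.
# The cache_control shape and "ttl" values are taken from Anthropic's
# public API docs; treat them as illustrative, not authoritative.

def build_cached_request(system_prompt: str, user_msg: str, ttl: str = "5m") -> dict:
    """Build a Messages API payload whose system prompt is cached.

    ttl: "5m" (default ephemeral cache) or "1h" (extended TTL).
    """
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    cache_control = {"type": "ephemeral"}
    if ttl == "1h":
        # The longer TTL costs more to write but survives pauses in a session.
        cache_control["ttl"] = "1h"
    return {
        "model": "claude-sonnet-4-6",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prompt, "cache_control": cache_control}
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

payload = build_cached_request("You are a coding assistant.", "Refactor foo().", ttl="1h")
```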
Writing to the five-minute cache costs 25 percent more in tokens, and writing to the one-hour cache 100 percent more, but reading from cache is around 10 percent of the base price.
Another factor is that the large one-million-token context window available on paid plans with the Claude Opus 4.6 or Sonnet 4.6 models increases costs, especially with cache misses.
Claude Code creator Boris Cherny said that "prompt cache misses when using 1M token context window are expensive...
if you leave your computer for over an hour then continue a stale session, it's often a full cache miss." He said that Anthropic is investigating a 400,000-token context window by default, with an option for one million tokens if preferred.
There is already a configuration setting for this.
Cherny said that larger contexts are now common because users are "pulling in a large number of skills, or running many agents or background automations."
Some developers are convinced that cache rebuilding and cache misses are major factors in Claude Code quota exhaustion, which has reached the point where Pro users ($20 per month) may get as few as two prompts in five hours.
A number of bugs in the caching code have been reported, prompting one user to say: "Before those are fixed likely any 5 minutes vs 1 h discussion is entirely moot since numbers are totally flawed."
The focus on cache optimization may also be evidence that, under the covers, Anthropic's quotas are simply buying less processing time than they once did.
Swanson is not alone in reporting that Claude's performance has dropped.
For example, a user on the enterprise team plan said: "In March I could use Opus all day and it was getting great results.
Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of 'but wait, actually I need to do x' with slight variations." That chimes with similar comments from an AI director at AMD.
Cache optimization may be important, but it seems unlikely to account for all these reported issues.
®
Source: This article was originally published by The Register