Never Waste a Claude Reset Again

If you use Claude Pro, you already know the feeling. You're deep in a task — generating code, reviewing a PR, building out a feature — and then it happens. Usage limit reached. Resets in 4 hours. You either wait, or you switch to a slower model and lose momentum.

I built a small system that fixes this. It makes Claude aware of its own usage limit at the start of every session, so it can proactively offer to hand off work to other agents before you hit the wall.

The Problem With Usage Blindness

Claude has no idea how much of its usage window it has consumed. It starts every session assuming it has infinite runway. That's fine at 20% utilization. At 90%, it's a problem — especially if you're about to kick off a long implementation task.

The moment Claude hits its limit mid-task, you lose context, momentum, and time. And if you're working with agents like Codex CLI or Gemini CLI, those tools don't have this constraint. They can keep working while Claude resets. But Claude never thinks to suggest that, because it doesn't know it needs to.

The Fix: A Three-Part System

The solution has three parts: a macOS menu bar app that tracks usage, a shell hook that injects it into each session, and a rule in Claude's configuration that tells Claude what to do with it.

1. Write usage to disk after every poll

I run a macOS menu bar app called ClaudeUsageBar that already polls claude.ai's internal usage API every two minutes. I modified it to write the result to a cache file after each poll:

~/.claude/usage-cache.json
{
  "utilization": 93,
  "resets_at": "2026-03-16T03:00:00Z",
  "updated_at": "2026-03-16T02:55:35Z"
}
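The actual app is written in Swift, but the write it performs after each poll has a simple shape. Here is a Python sketch of that step; the atomic temp-file-plus-rename is my own assumption (it keeps the hook from ever reading a half-written JSON file), not something the article specifies:

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

CACHE = Path.home() / ".claude" / "usage-cache.json"

def write_usage_cache(utilization: int, resets_at: str, path: Path = CACHE) -> None:
    """Persist the latest poll result in the cache format shown above."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "utilization": utilization,
        "resets_at": resets_at,  # UTC, e.g. "2026-03-16T03:00:00Z"
        "updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Write to a temp file in the same directory, then rename, so readers
    # never see a partially written file (an assumption, not from the app).
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)
```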

One small thing to watch: the app has to run without sandbox mode to write outside its container. Setting com.apple.security.app-sandbox: false in the project entitlements handles this.
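In a standard Xcode .entitlements plist, that setting looks like this (key names as Apple defines them; the surrounding plist boilerplate is omitted):

```xml
<!-- ClaudeUsageBar.entitlements: disable the App Sandbox so the app
     can write to ~/.claude/ outside its container. -->
<key>com.apple.security.app-sandbox</key>
<false/>
```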

2. Inject usage into every Claude session

Claude Code supports a SessionStart hook — a shell script that runs when a session begins and can inject additional context into Claude's system prompt via stdout.

The hook reads the cache file, calculates time remaining (correctly, in UTC), and outputs a one-liner that Claude sees at the top of every session:

"[Claude usage: 93% — resets in 5m. Cache updated: 2026-03-16T02:55:35Z]"

The key detail in parsing: the resets_at timestamp is UTC, but macOS date -j -f interprets the input string in local time by default, which produces wildly wrong results. The fix is date -j -u -f "%Y-%m-%dT%H:%M:%S" — the -u flag forces UTC interpretation.
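The article's hook is a shell script, but Claude Code hooks can be any executable, so here is an equivalent sketch in Python. The path and message format follow the cache file and one-liner shown above; note the timestamp is parsed explicitly as UTC, sidestepping the local-time pitfall entirely:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

CACHE = Path.home() / ".claude" / "usage-cache.json"

def format_usage_line(cache: dict, now: datetime) -> str:
    """Build the one-liner Claude sees at the top of every session."""
    # resets_at is UTC; attach the UTC zone explicitly after parsing.
    resets = datetime.strptime(cache["resets_at"], "%Y-%m-%dT%H:%M:%SZ")
    resets = resets.replace(tzinfo=timezone.utc)
    minutes = max(0, int((resets - now).total_seconds() // 60))
    return (f"[Claude usage: {cache['utilization']}% — resets in "
            f"{minutes}m. Cache updated: {cache['updated_at']}]")

if __name__ == "__main__" and CACHE.exists():
    # SessionStart hooks inject context by printing to stdout.
    print(format_usage_line(json.loads(CACHE.read_text()),
                            datetime.now(timezone.utc)))
```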

3. Tell Claude what to do with the information

The final piece is a rule in ~/.claude/CLAUDE.md — Claude's global config that loads in every session:

### Usage-Aware Delegation
At session start, injected context includes [Claude usage: X%].

If usage is >80%:
- Tell the user: "My usage is at X% and resets in ~Y."
- Offer: "Want me to delegate implementation to Codex or Gemini?"
- If user agrees → Codex/Gemini handle implementation;
  Claude handles planning, review, communication.
- If user declines → proceed normally.

If usage is ≤80% → proceed normally, no mention needed.

Now, at 93% utilization with 5 minutes to reset, Claude opens with: "My usage is at 93% and resets in ~5m. Want me to hand off the implementation to Codex while we wait?" — and you can say yes or no.

Why This Actually Matters

The deeper insight here isn't about usage limits. It's about Claude knowing its own constraints. An agent that's blind to its own state will always behave suboptimally. An agent that knows it's running low on runway will make better decisions — including knowing when to pass the baton.

Codex CLI and Gemini CLI both run outside Claude's usage window. Codex is particularly strong at focused implementation tasks: scaffolding, refactors, test generation. Gemini handles multimodal tasks and large context windows. Neither replaces Claude for planning, architecture, or communication — but they're excellent workers when Claude needs to pause.

The result is a system where hitting 93% doesn't mean stopping. It means a smooth handoff — Claude plans, delegates, waits the five minutes, and resumes review once the workers report back.

The Broader Pattern

This is one instance of a broader pattern worth thinking about: agents should know their own limits and communicate them proactively. Not just token limits. Response quality degradation under ambiguous prompts. Uncertainty about an API they haven't been trained on. Gaps in their own knowledge.

The instinct to build systems that surface this information — and act on it — is what separates useful AI tooling from impressive demos. Usage-aware delegation is a small version of that. But the principle scales.