When you hit $50, it stops. Not an alert. It stops. One line of code to change.
No surprise bills. Ever.
“I set up a $20 budget alarm for AWS Bedrock. On May 1, I received a bill for several hundred dollars. The alarm was never triggered. AWS support was apologetic but would not offer any kind of refund.”
Works with every major provider
Setup in 5 minutes
Point your API client at proxy.llmcap.io. Works with every SDK. No code changes beyond that one line.
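As a rough illustration of that one-line change, the sketch below swaps the host of a provider URL for the proxy host. It assumes the proxy is reached over HTTPS and mirrors the provider's request paths; neither detail is stated above, and `via_proxy` is a hypothetical helper, not part of any LLMCap SDK.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: HTTPS scheme and provider-compatible paths on the proxy host.
PROXY_HOST = "proxy.llmcap.io"


def via_proxy(provider_url: str, proxy_host: str = PROXY_HOST) -> str:
    """Return provider_url with its host replaced by the proxy host."""
    parts = urlsplit(provider_url)
    # Keep path, query, and fragment; only scheme and host change.
    return urlunsplit(("https", proxy_host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(via_proxy("https://api.openai.com/v1/chat/completions"))
```

In SDKs that accept a base URL option, the equivalent change is usually passing the proxy URL to that option instead of rewriting URLs by hand.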
Define daily, monthly, or per-key dollar limits in the dashboard. Per-model granularity supported.
When a cap is hit, LLMCap returns HTTP 429 before any tokens are consumed. No charge. No surprise bill.
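On the client side, a blocked request shows up as an ordinary HTTP 429. A minimal sketch of handling it follows. `SpendCapReached` and `raise_if_capped` are hypothetical names, and the sketch assumes every 429 seen through the proxy is a cap hit; providers also use 429 for rate limiting, and nothing above says how the two are distinguished.

```python
class SpendCapReached(Exception):
    """Raised when a request is rejected with HTTP 429 (assumed to be a cap hit)."""


def raise_if_capped(status_code: int) -> None:
    """Raise SpendCapReached for a 429 response; do nothing otherwise."""
    # Assumption: a 429 here means the spend cap blocked the request.
    if status_code == 429:
        raise SpendCapReached("spend cap reached; request was blocked")


if __name__ == "__main__":
    try:
        raise_if_capped(429)
    except SpendCapReached as exc:
        # Retrying will not help until the cap resets or is raised.
        print(f"stopping: {exc}")
```

Treating the cap as a distinct exception keeps it out of generic retry loops, which would otherwise keep resending a request that cannot succeed.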
Hard caps are just the start. LLMCap tracks cache efficiency, anomalies, and cost per deployment — so you can fix the root cause.
When daily spend exceeds 2× your 7-day rolling average, a Slack alert fires in minutes. Not after the invoice.
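The rule above can be sketched as a single comparison. This is an illustration of the stated threshold, not LLMCap's actual implementation; the function name and the choice to require exactly seven prior days are assumptions.

```python
from statistics import mean
from typing import Sequence


def is_spend_anomaly(today: float, last_7_days: Sequence[float], factor: float = 2.0) -> bool:
    """Return True when today's spend exceeds factor x the 7-day average."""
    if len(last_7_days) != 7:
        raise ValueError("expected exactly 7 daily spend values")
    # Strictly greater than: spend equal to 2x the average does not alert.
    return today > factor * mean(last_7_days)


if __name__ == "__main__":
    history = [10.0] * 7  # average daily spend of $10
    print(is_spend_anomaly(25.0, history))
    print(is_spend_anomaly(20.0, history))
```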
Anthropic prompt cache hit rate is tracked daily. An alert fires when the rate drops by more than 40 percentage points, catching a silent token cost spike.
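To make "percentage points" concrete, here is a small sketch. It assumes hit rate means cached-read input tokens as a share of all input tokens, built from the three counts that Anthropic's usage object reports (`cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens`). How LLMCap actually defines the rate is not stated above, and both function names are hypothetical.

```python
def cache_hit_rate(cache_read: int, cache_creation: int, uncached_input: int) -> float:
    """Share of input tokens served from cache, as a percentage (assumed definition)."""
    total = cache_read + cache_creation + uncached_input
    return 100.0 * cache_read / total if total else 0.0


def hit_rate_dropped(previous_pct: float, current_pct: float, threshold_pp: float = 40.0) -> bool:
    """True when the rate fell by more than threshold_pp percentage points."""
    # 80% -> 30% is a 50-point drop, regardless of the relative change.
    return previous_pct - current_pct > threshold_pp


if __name__ == "__main__":
    yesterday = cache_hit_rate(cache_read=800, cache_creation=100, uncached_input=100)
    today = cache_hit_rate(cache_read=300, cache_creation=100, uncached_input=600)
    print(yesterday, today, hit_rate_dropped(yesterday, today))
```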
Tag requests with x-llmcap-version. Dashboard shows cost broken down by deployment — catch expensive prompt refactors before they ship.
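Tagging amounts to adding one header to each request. The sketch below shows the idea with a plain dictionary. Only the header name comes from the text above; the helper name and the value format (a release string here) are assumptions.

```python
from typing import Dict

VERSION_HEADER = "x-llmcap-version"


def with_version_tag(headers: Dict[str, str], version: str) -> Dict[str, str]:
    """Return a copy of headers with the deployment version header set."""
    tagged = dict(headers)  # copy so the caller's dict is not mutated
    tagged[VERSION_HEADER] = version
    return tagged


if __name__ == "__main__":
    base = {"content-type": "application/json"}
    # Assumption: any stable identifier works, e.g. a release name or commit SHA.
    print(with_version_tag(base, "v1.4.2"))
```

Many SDKs expose a default-headers option at client construction, which is a convenient place to set this once per deployment.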
Available everywhere you code
Live spend in your status bar. Click to see today's usage, burn rate, and blocked count — without leaving the editor.
Check spend, browse logs, and manage keys from the command line. Works on macOS, Linux, and Windows.
System tray icon shows live spend. Right-click for stats and quick actions. Always visible, never intrusive.
Simple pricing
3-day trial, no charge until it ends · Cancel anytime
after 3-day trial
Credit card required for trial. Cancel before the 3-day trial ends and you won't be charged.