6 providers · <35ms latency · Anomaly alerts · Hard enforcement

Hard dollar caps
on every LLM call.

When you hit $50, it stops. Not an alert: it stops. A one-line code change.

No surprise bills. Ever.

# Before
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")

# After: one line change
client = Anthropic(
    api_key="sk-ant-...",
    base_url="https://proxy.llmcap.io/anthropic",
)
# When you hit $50 → 429. Token never consumed.

🔒 Your API key is never stored: it passes through per request and is discarded immediately

✓ Only token counts & costs are logged

“I set up a $20 budget alarm for AWS Bedrock. On May 1, I received a bill for several hundred dollars. The alarm was never triggered. AWS support was apologetic but would not offer any kind of refund.”

— andersco on Hacker News

Works with every major provider

Anthropic

OpenAI

Google Gemini

Mistral

Cohere

AWS Bedrock

Setup in 5 minutes

How LLMCap works

⚡

Change base_url

Point your API client at proxy.llmcap.io. Works with every SDK. No code changes beyond that one line.

Set your cap

Define daily, monthly, or per-key dollar limits in the dashboard. Per-model granularity supported.

🛡

Sleep peacefully

When a cap is hit, LLMCap returns 429 before the token is consumed. No charge. No surprise bill.

new — cost intelligence

Know why costs spike.
Before the bill arrives.

Hard caps are just the start. LLMCap tracks cache efficiency, anomalies, and cost per deployment — so you can fix the root cause.

3.7× spike detected → Slack

Spend Anomaly Alerts

When daily spend exceeds 2× your 7-day rolling average, a Slack alert fires in minutes. Not after the invoice.

◆ Slack webhook · configurable multiplier
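The anomaly rule above can be sketched in a few lines. This is a minimal illustration only, not LLMCap's implementation: the function name `is_spend_anomaly`, its parameters, and the sample spend figures are hypothetical, and the baseline is assumed to be a plain mean of the previous seven days.

```python
from statistics import mean


def is_spend_anomaly(previous_days, today, multiplier=2.0, window=7):
    """Return True when today's spend exceeds multiplier x the rolling average.

    previous_days: daily spend in dollars, oldest first (hypothetical input shape).
    """
    recent = previous_days[-window:]
    if not recent:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(recent)
    return today > multiplier * baseline


# Hypothetical week averaging $10/day
last_week = [10.0, 12.0, 9.0, 11.0, 10.0, 8.0, 10.0]

print(is_spend_anomaly(last_week, 37.0))  # 3.7x the average
print(is_spend_anomaly(last_week, 15.0))  # 1.5x the average
```

With the default multiplier of 2, the first call flags an anomaly and the second does not. Raising `multiplier` makes the alert less sensitive.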

Yesterday 91% · Today 56% (↓35pp) · cache drop alert

Cache Hit Rate Tracking

Anthropic prompt cache hit rate is tracked daily. An alert fires when the rate drops by more than 40 percentage points: a silent token cost spike, caught.

◆ Anthropic only · auto-detected from API response
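As a rough sketch of how a hit rate could be derived from an Anthropic response, the snippet below reads the `usage` fields `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. The definition of "hit rate" used here (cached reads as a share of all input tokens) is an assumption, as are the function names and the sample numbers. LLMCap may compute it differently.

```python
def cache_hit_rate(usage):
    """Percentage of input tokens served from the prompt cache (assumed definition)."""
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)
    total = read + created + uncached
    return 100.0 * read / total if total else 0.0


def cache_drop_alert(yesterday_pct, today_pct, threshold_pp=40.0):
    """True when the hit rate fell by more than threshold_pp percentage points."""
    return (yesterday_pct - today_pct) > threshold_pp


# Hypothetical usage block from one response
usage = {
    "input_tokens": 100,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 900,
}
print(cache_hit_rate(usage))         # share of input tokens read from cache
print(cache_drop_alert(91.0, 45.0))  # 46-point drop
print(cache_drop_alert(91.0, 60.0))  # 31-point drop
```

The threshold is expressed in percentage points, not percent, so a fall from 91% to 45% counts as a 46-point drop.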

v2.1.0 · $14.20
v2.0.3 · $4.80
(untagged) · $1.00

x-llmcap-version: v2.1.0

Deploy Cost Attribution

Tag requests with x-llmcap-version. Dashboard shows cost broken down by deployment — catch expensive prompt refactors before they ship.

◆ All 6 providers · free-form version string
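The breakdown by deployment amounts to grouping logged request costs by the value of the `x-llmcap-version` header. The sketch below assumes a hypothetical log record shape (`headers` plus a `cost_usd` string), which is not LLMCap's actual schema. On the client side, the official Anthropic and OpenAI Python SDKs accept a `default_headers` argument that can carry the tag on every request.

```python
from collections import defaultdict
from decimal import Decimal

VERSION_HEADER = "x-llmcap-version"


def cost_by_version(records):
    """Sum request costs per deployment tag; requests without a tag are grouped together."""
    totals = defaultdict(Decimal)
    for rec in records:
        version = rec["headers"].get(VERSION_HEADER, "(untagged)")
        totals[version] += Decimal(rec["cost_usd"])
    return dict(totals)


# Hypothetical log records
records = [
    {"headers": {VERSION_HEADER: "v2.1.0"}, "cost_usd": "10.00"},
    {"headers": {VERSION_HEADER: "v2.1.0"}, "cost_usd": "4.20"},
    {"headers": {VERSION_HEADER: "v2.0.3"}, "cost_usd": "4.80"},
    {"headers": {}, "cost_usd": "1.00"},
]

for version, total in cost_by_version(records).items():
    print(version, total)
```

`Decimal` is used instead of `float` so that dollar amounts add up exactly.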

6

Providers supported

<35ms

Avg added latency

18,742

Requests blocked today

0.9%

Uptime

Available everywhere you code

Works in your workflow

⬛

Marketplace

VS Code Extension

Live spend in your status bar. Click to see today's usage, burn rate, and blocked count — without leaving the editor.

Install Extension

PyPI

Terminal CLI

Check spend, browse logs, and manage keys from the command line. Works on macOS, Linux, and Windows.

pip install llmcap

View on PyPI

🪟

Desktop

Windows Tray App

System tray icon shows live spend. Right-click for stats and quick actions. Always visible, never intrusive.

pip install "llmcap[tray]"

Get Tray App

LLMCap · Hard caps · No surprise bills · <35ms latency · 6 providers incl. AWS Bedrock · 3-day trial

Simple pricing

Pick your plan

3-day trial, no charge until it ends · Cancel anytime

Starter

$19/mo

after 3-day trial

✓ 2 API keys
✓ All 6 providers incl. Bedrock
✓ Daily & monthly caps
✓ Cache hit rate monitoring
✓ Deploy cost attribution
✓ 30-day audit log
✓ 1 user
✓ Email support

Start 3-Day Trial

Questions

Does LLMCap ever see or store my API keys?

No. Your provider API key (e.g. sk-ant-...) is passed through in a header on each request and discarded immediately. LLMCap stores only your LLMCap proxy key, hashed with bcrypt. We never log your provider keys.

Does it work with streaming responses?

Yes. Streaming has been supported from day one. LLMCap passes SSE chunks through in real time. If the budget is exceeded mid-stream, a final 429 event is sent and the connection is closed. The token that triggered the cap is not charged.
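The mid-stream cutoff can be pictured as a generator that forwards chunks until the next one would exceed the budget. This is a simplified sketch under stated assumptions: the `(text, cost_cents)` chunk shape, the per-chunk cost metering, and the `("error", 429)` marker are all hypothetical and stand in for whatever the proxy does internally.

```python
def stream_with_cap(chunks, spent_cents, cap_cents):
    """Forward chunks until the next one would push spend past the cap.

    chunks: iterable of (text, cost_cents) pairs (hypothetical shape).
    Yields ("chunk", text) for forwarded data and ("error", 429) on cutoff.
    """
    for text, cost_cents in chunks:
        if spent_cents + cost_cents > cap_cents:
            # The chunk that would exceed the cap is neither forwarded nor counted
            yield ("error", 429)
            return
        spent_cents += cost_cents
        yield ("chunk", text)


# $50.00 cap with $49.96 already spent; each chunk costs 2 cents
events = list(stream_with_cap([("Hel", 2), ("lo", 2), ("!", 2)], 4996, 5000))
print(events)
```

Amounts are kept in integer cents to avoid floating-point rounding at the cap boundary.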

What exactly happens when the cap is hit?

The next incoming request is rejected with HTTP 429 before it reaches the provider. The token is never consumed, so you are never billed for it. Your app receives the same 429 response structure providers use for rate limiting, so existing error handling works as-is.
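A minimal sketch of that gate, assuming an in-memory counter in place of the proxy's real storage: the `Budget` class and the `admit` and `record_cost` helpers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    cap_cents: int
    spent_cents: int = 0


def admit(budget: Budget) -> int:
    """Return 429 once the cap has been reached, otherwise 200 (forward to provider)."""
    if budget.spent_cents >= budget.cap_cents:
        return 429
    return 200


def record_cost(budget: Budget, cost_cents: int) -> None:
    """Add the cost of a completed request to the running total."""
    budget.spent_cents += cost_cents


budget = Budget(cap_cents=5000)  # $50.00 cap
first = admit(budget)            # under the cap, so the request is forwarded
record_cost(budget, 5000)        # spend reaches the cap
second = admit(budget)           # rejected before reaching the provider
print(first, second)
```

The check runs before the request is forwarded, which is why a rejected request incurs no provider charge.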

Does it work with AWS Bedrock?

Yes. AWS Bedrock is fully supported, including Claude, Llama, Titan, Mistral, and Cohere models via Bedrock. Pass your AWS credentials in request headers alongside your LLMCap key. LLMCap re-signs requests using SigV4 and forwards them to the Bedrock Runtime. Credentials are never stored. This fills the gap that AWS Budgets leaves open: AWS Budgets has an 8-12 hour reporting lag and sometimes doesn't trigger at all, whereas LLMCap blocks the request before the charge happens.

Can I self-host LLMCap?

Self-hosting is on the roadmap. The proxy is open source (FastAPI + Redis). For now, the managed service at proxy.llmcap.io is the recommended path — it's already deployed with <35ms latency worldwide.
