Overview
Once your LLM Gateway is set up — see Using the LLM Gateway — LLM Controls is where admins govern how the people and services in their organization actually consume it: capping spend, throttling busy callers, and restricting which models a given role can use.
| Tab | What it does |
|---|---|
| Budgets | Cap how many tokens — and optionally how much money — a scope (org, group, role, or user) is allowed to spend in a daily, weekly, or monthly period. Block requests once exhausted, or warn only. |
| Rate Limits | Throttle a scope with requests-per-minute and / or tokens-per-minute caps measured over a rolling 60-second window. |
| Model Access | Allowlist or denylist specific models, providers, or upstream model identifiers for an org, group, role, user, or API key. |
Before You Begin
You’ll need:
- A Barndoor account with admin or superadmin privileges.
- An LLM Gateway that’s already configured with at least one provider and one route — see Using the LLM Gateway.
- For spending budgets (and accurate cost reporting), set per-model pricing under LLM Configuration → Model Pricing. Without matching pricing rules the gateway can still enforce token budgets but cannot compute spending budgets. See Managing Model Pricing for the full guide.
How Limits Compose
When a caller sends a request to the gateway, Barndoor evaluates policies in a fixed order. The first one that denies stops the chain and returns an HTTP error. A few things worth knowing about this chain:
- Counters are debited after the upstream call returns, using actual prompt + completion token counts. The pre-check is what enforces the limit; the post-debit is what keeps the counter honest.
- Warn-only budgets never deny. They still record usage and fire alerts at their configured thresholds, but the request goes through.
- Multiple policies can apply at once. If a user is covered by both a user-scoped budget and an org-scoped budget, both are pre-checked and both are debited; the most restrictive one is what denies.
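The pre-check and post-debit behavior described above can be sketched as follows. This is a minimal illustration, not Barndoor's implementation: the `Budget` class, the `handle_request` function, and the `>=` exhaustion test are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    # Hypothetical in-memory stand-in for a token budget counter.
    name: str
    token_limit: int
    tokens_used: int = 0
    warn_only: bool = False

    def exhausted(self) -> bool:
        return self.tokens_used >= self.token_limit


def handle_request(budgets: list[Budget], actual_tokens: int) -> str:
    """Pre-check every applicable budget, then debit all of them."""
    # Pre-check: any blocking budget that is already exhausted denies.
    for budget in budgets:
        if budget.exhausted() and not budget.warn_only:
            return f"denied by {budget.name}"
    # The upstream call would happen here; actual_tokens stands in for the
    # prompt + completion counts the provider reports afterwards.
    for budget in budgets:
        budget.tokens_used += actual_tokens
    return "allowed"


user_budget = Budget("user cap", token_limit=1_000)
org_budget = Budget("org cap", token_limit=10_000)
budgets = [user_budget, org_budget]

results = [handle_request(budgets, actual_tokens=600) for _ in range(3)]
print(results)
```

The second request is allowed even though it pushes the user budget past its limit, because the counter is only debited after the call; the third request is the one the pre-check denies. Both budgets are debited on every allowed request.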
Token Budgets
Budgets cap how much a scope is allowed to consume in a fixed period (daily, weekly, or monthly) whose counter resets at the start of the next period. They can cap tokens, spending, or both (whichever hits its limit first denies the request).
Creating a budget
Open LLM Controls → Budgets and click Create Budget
Name and scope
- Name — anything short and descriptive. The name surfaces in the denial message the caller sees, so make it identifiable (for example, Engineering monthly or Sales-team daily cap).
- Scope — pick Org to cover everyone, or Group, Role, or User to target a slice. Picking a scope without choosing a specific entity means “every entity of that type in this org”.
Period and action
- Period — Daily, Weekly, or Monthly. Counters reset at the start of each period.
- Action when exhausted — choose Block Requests (default) to return 429 once the limit is hit, or Warn Only to keep accepting requests and just fire alerts at the threshold percentages.
Limits
- Token limit — total prompt + completion tokens allowed in the period.
- Spending limit (optional) — a dollar cap. Requires per-model pricing — see Managing Model Pricing.
- Alert thresholds — percentages at which Barndoor emits an alert (default 80, 90).
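To make the alert thresholds concrete, here is a small sketch of how a debit could be checked against them. The function name `crossed_thresholds` and the rule that an alert fires once, at the moment usage crosses a percentage, are assumptions for illustration only.

```python
def crossed_thresholds(used_before, used_after, limit, thresholds=(80, 90)):
    """Return the threshold percentages that this debit crossed."""
    before_pct = 100 * used_before / limit
    after_pct = 100 * used_after / limit
    # A threshold counts as crossed when usage moves from below it to at or above it.
    return [t for t in thresholds if before_pct < t <= after_pct]


# One large request can cross both default thresholds at once.
both = crossed_thresholds(750_000, 920_000, 1_000_000)
# Later requests that stay above 90% do not cross anything new.
none = crossed_thresholds(920_000, 950_000, 1_000_000)
print(both, none)
```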

How budgets are counted
- Tokens — prompt + completion tokens reported by the upstream provider on every request, summed per (budget, period).
- Spending — for each request, Barndoor multiplies prompt and completion tokens by the input and output prices in LLM Configuration → Model Pricing for that model.
- Reset — at the start of the next period (next day at 00:00 UTC for daily, next Monday for weekly, the 1st for monthly).
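The counting rules above can be worked through in a short sketch. Several details here are assumptions rather than documented behavior: the price table and its unit (dollars per one million tokens), the model name, and the choice of 00:00 UTC as the reset time for weekly and monthly periods (the text only states it for daily).

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Hypothetical pricing, in dollars per one million tokens.
PRICING = {"example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")}}
MILLION = Decimal(1_000_000)


def request_cost(model, prompt_tokens, completion_tokens):
    """Multiply prompt and completion tokens by the input and output prices."""
    price = PRICING[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / MILLION


def next_reset(now, period):
    """Return the start of the next period (assumed to be 00:00 UTC)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight + timedelta(days=1)
    if period == "weekly":
        # weekday() is 0 on Monday, so this lands 1 to 7 days ahead.
        return midnight + timedelta(days=7 - now.weekday())
    if period == "monthly":
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        return midnight.replace(year=year, month=month, day=1)
    raise ValueError(f"unknown period: {period}")


now = datetime(2024, 12, 31, 15, 30, tzinfo=timezone.utc)  # a Tuesday
print(request_cost("example-model", 1_200, 300))
print(next_reset(now, "daily"), next_reset(now, "weekly"), next_reset(now, "monthly"))
```

Using `Decimal` rather than floats avoids rounding drift when many small per-request costs are summed into one counter.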
When a caller hits the limit
A blocking budget returns 429 budget_exhausted. For a spending budget, the message contains "Spending …".
You may see tokens_used slightly above token_limit in the denial. That’s expected: the gateway pre-checks the counter on the way in, forwards the call, and debits the actual prompt + completion tokens once the provider responds. The request that pushed the counter over the limit is allowed to finish; the next one is denied. The percentage is rounded for display, so a counter of 1,001,234 / 1,000,000 still prints as 100%.
Rate Limits
Rate limits throttle a scope on a rolling 60-second window. They’re the right tool when you want to smooth bursts and prevent a single noisy caller from drowning out everyone else. (Budgets are the right tool when you want to control total spend over a day, week, or month.)
Creating a rate limit
Open LLM Controls → Rate Limits and click Create Rate Limit
Name and scope
- Name — appears in the denial message the caller sees.
- Scope — Org, Group, Role, User, API Key, or Model. Selecting Model lets you cap a particular alias across all users.
Limits
- Requests per minute — caps the number of inbound requests, regardless of size.
- Tokens per minute — caps the sum of prompt + completion tokens over the same window.
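A rolling 60-second window with both caps can be sketched like this. The `RollingWindowLimiter` class is hypothetical, and two choices in it are assumptions: denied requests are not recorded, and tokens are recorded after the call, mirroring the post-debit behavior described for budgets.

```python
from collections import deque


class RollingWindowLimiter:
    """Illustrative rolling-window limiter; not Barndoor's implementation."""

    def __init__(self, requests_per_minute=None, tokens_per_minute=None, window=60.0):
        self.rpm = requests_per_minute
        self.tpm = tokens_per_minute
        self.window = window
        self.events = deque()  # (timestamp_seconds, tokens), oldest first

    def _evict(self, now):
        # Drop events that have fallen out of the rolling window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()

    def allow(self, now):
        self._evict(now)
        if self.rpm is not None and len(self.events) >= self.rpm:
            return False
        if self.tpm is not None and sum(t for _, t in self.events) >= self.tpm:
            return False
        return True

    def record(self, now, tokens):
        self.events.append((now, tokens))


limiter = RollingWindowLimiter(requests_per_minute=2)
decisions = []
for now in (0.0, 10.0, 20.0, 61.0):
    ok = limiter.allow(now)
    decisions.append(ok)
    if ok:
        limiter.record(now, tokens=100)
print(decisions)
```

The request at 20 seconds is denied because two requests are still inside the window; the one at 61 seconds succeeds because the first event has aged out, even though the second has not.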

When a caller hits the limit
The Retry-After header and the seconds value in the message always match the window length: 60 seconds, since rate limits use a rolling 60-second window. It’s a conservative “wait one full window and you’re guaranteed to be unblocked” hint; the caller may often succeed sooner as older events fall out of the window. Clients that respect Retry-After (the OpenAI / Anthropic SDKs do) will back off appropriately.
Model Access
Model access policies decide whether a caller is allowed to invoke a particular model at all. They support both allowlists (“only let this scope call these models”) and denylists (“this scope must not call these models”).
Creating a policy
Open LLM Controls → Model Access and click Create Policy
Add at least one target
| Target kind | Matches when… | Example |
|---|---|---|
| Model alias | The caller’s model field matches an alias you’ve defined in Model Routes. Supports trailing * wildcard. | claude-* |
| Upstream model | The resolved upstream model name (the string the gateway sends to the provider). | gpt-4o |
| Provider | Any model served by the selected provider. | openai-prod |
| Provider + model | A specific provider/upstream-model pair. | azure-east + gpt-4o-mini |

How allowlists and denylists combine
- If any allowlist applies to a caller, at least one of its targets must match — otherwise the call is denied.
- If any denylist applies, none of its targets may match — otherwise the call is denied.
- A caller with both an allowlist and a denylist must satisfy both.
- A caller with neither has unrestricted access (subject to budgets and rate limits).
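The combination rules above can be expressed as a short sketch. It covers only model-alias targets with the trailing `*` wildcard, and it assumes the targets of every applicable allowlist (or denylist) have already been gathered into one list; how several lists of the same kind combine is not stated here. The function names are hypothetical.

```python
from typing import Optional, Sequence


def target_matches(pattern: str, model: str) -> bool:
    """Exact match, or prefix match when the pattern ends in '*'."""
    if pattern.endswith("*"):
        return model.startswith(pattern[:-1])
    return model == pattern


def is_allowed(
    model: str,
    allow_targets: Optional[Sequence[str]] = None,  # None: no allowlist applies
    deny_targets: Optional[Sequence[str]] = None,  # None: no denylist applies
) -> bool:
    # An allowlist requires at least one matching target.
    if allow_targets is not None and not any(target_matches(t, model) for t in allow_targets):
        return False
    # A denylist requires that no target matches.
    if deny_targets is not None and any(target_matches(t, model) for t in deny_targets):
        return False
    return True


print(is_allowed("gpt-4o-mini", allow_targets=["gpt-4o*"]))
print(is_allowed("gpt-4o-mini", allow_targets=["gpt-4o*"], deny_targets=["gpt-4o-mini"]))
```

A leading wildcard such as `*-mini` is treated as a literal string here, so it never matches a real model name.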
When a caller is denied
What callers see when something is denied
| Response | Triggered by | Example message |
|---|---|---|
| 401 authentication_error | Missing, invalid, or revoked API key | invalid API key |
| 403 permission_error | Model access policy denial | Model 'foo' is not allowed by access policy: Production-only |
| 404 not_found_error | The model field doesn’t resolve to a configured route | model 'foo' not found or not available |
| 429 rate_limit_error | Rate-limit policy or concurrency limit | Rate limit exceeded (policy: Sales-team rate cap). Try again in 60 seconds. |
| 429 budget_exhausted | Token or spending budget exhausted (and the budget’s action is Block Requests) | Token monthly budget exhausted (budget: Engineering monthly) (100% used: 1001234 / 1000000 tokens). |
Every denial uses the same error envelope ({ "error": { "message", "type", "code" } }), so callers using the OpenAI / Anthropic SDKs surface these as normal SDK errors without any special handling.
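For callers that talk to the gateway without an SDK, a sketch of handling these responses might look like the following. It assumes the identifier shown in the Response column (for example rate_limit_error) arrives in the envelope's `type` field, which is not confirmed above, and that `headers` is a plain dict keyed exactly as `Retry-After`. The function name is hypothetical.

```python
def classify_denial(status, body, headers):
    """Summarize a denial and say whether a timed retry makes sense."""
    error = body.get("error", {})
    error_type = error.get("type", "")
    retry_after = None
    if status == 429 and error_type == "rate_limit_error":
        # Fall back to the 60-second window length if the header is absent.
        retry_after = float(headers.get("Retry-After", 60))
    return {
        "type": error_type,
        "message": error.get("message", ""),
        "retryable": retry_after is not None,
        "retry_after": retry_after,
    }


rate_limited = classify_denial(
    429,
    {"error": {"message": "Rate limit exceeded (policy: Sales-team rate cap). "
                          "Try again in 60 seconds.", "type": "rate_limit_error"}},
    {"Retry-After": "60"},
)
over_budget = classify_denial(
    429,
    {"error": {"message": "Token monthly budget exhausted (budget: Engineering monthly) "
                          "(100% used: 1001234 / 1000000 tokens).", "type": "budget_exhausted"}},
    {},
)
print(rate_limited["retry_after"], over_budget["retryable"])
```

Treating an exhausted budget as not retryable is a design choice in this sketch: the counter only resets at the start of the next period, so a short back-off is unlikely to help.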
Troubleshooting
A user keeps getting 429s — which policy is responsible?
Check error.message in the response. Barndoor names the policy that denied the request ("… (policy: Sales-team rate cap)" or "… (budget: Engineering monthly)"). Open that policy in the relevant tab to inspect the configuration and the live usage bars.
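When triaging many denials from logs, the policy name can be pulled out of the message with a small helper. The pattern below is inferred from the two example messages on this page, not from a documented format, so treat it as a sketch; it will also cut a name short if the name itself contains a closing parenthesis.

```python
import re

# Matches "(policy: <name>)" or "(budget: <name>)" as seen in the example messages.
DENIAL_SOURCE = re.compile(r"\((policy|budget): ([^)]+)\)")


def denying_policy(message):
    """Return (kind, name) for the policy named in a denial message, or None."""
    match = DENIAL_SOURCE.search(message)
    return (match.group(1), match.group(2)) if match else None


rate = denying_policy(
    "Rate limit exceeded (policy: Sales-team rate cap). Try again in 60 seconds."
)
budget = denying_policy(
    "Token monthly budget exhausted (budget: Engineering monthly) "
    "(100% used: 1001234 / 1000000 tokens)."
)
print(rate, budget)
```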
A budget says it's at 100% but I expected resets
My model-access policy isn't firing
Check the target pattern: the * wildcard must be trailing (claude-* is valid; *-mini is not).
Spending budget is always $0 used
I want to know who's at risk of hitting a budget
Set alert thresholds on the budget (default 80, 90). Barndoor emits alerts when usage crosses each threshold so you can intervene before the budget actually blocks anyone. Future product updates will surface these alerts in a Dashboard view directly in LLM Controls.
Frequently Asked Questions
When should I use a budget vs a rate limit?
Can the same scope have more than one budget or rate limit?
Do policies apply to embeddings and completions too?
Yes. Every request to /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/responses, and /v1/messages goes through the same enforcement chain. Embedding requests consume tokens against budgets just like chat requests do.
Are warn-only budgets useful?
How can a developer test their own policy without disrupting other users?
Need Help?
Reach out to [email protected] with:
- The name of the policy you’re configuring or troubleshooting.
- The exact error.message returned to the caller (if any).
- The scope (org / group / role / user / API key) involved.
