FastAPI Rate Limit Headers: The Slowapi Response Parameter Gotcha

A FastAPI rate limit headers bug took down rate-limited POST endpoints with 500s. Here's what slowapi's _inject_headers actually requires — and the one-line fix per endpoint.

Tobias Lüscher · Co-Founder, TecMinds · 2026-05-20 · 7 min read

A rate-limited POST endpoint started returning 500s in production last week. Not just one: all of them. Every endpoint wrapped in our @default_mutation_limit or @expensive_mutation_limit decorator was failing the moment a real client hit it. The 429 path worked. The handlers themselves ran and returned their data. But anywhere slowapi tried to attach FastAPI rate limit headers to the response, the request died with the exception "parameter response must be an instance of starlette.responses.Response".

The fix took ten minutes. The understanding took longer. This is the writeup we wish we'd read before shipping the original decorator.

The bug surfaced inside Wield, our recruiting-pipeline product (the cvflow codebase internally), where rate limiting protects roughly two dozen mutation endpoints — uploads, dossier generation, evaluation runs — from a single client burning the whole tenant quota in a tight loop. The same pattern shows up in any FastAPI service that adds slowapi to defend a paid LLM-backed API. If you don't declare a Response parameter on the endpoint, the library will eventually bite.

What Slowapi Actually Does With Your Response

The advertised contract of slowapi is simple: decorate an endpoint with @limiter.limit("10/minute") and get throttling, plus the X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After headers once headers_enabled=True is set on the Limiter. In practice there are two code paths and they have very different requirements.

The throttling path runs early. Inside _check_request_limit, slowapi inspects the incoming request, applies your key_func (we use a per-user key so admins behind a shared corporate NAT don't collectively burn one quota), checks the bucket, and raises a RateLimitExceeded if you're over. That path needs nothing from your endpoint signature. It will return a 429 with a JSON body whether or not your handler declared anything special.

The header-injection path is the one that breaks. After your handler returns, slowapi's async_wrapper calls _inject_headers to attach the informational X-RateLimit-* and Retry-After headers. And _inject_headers does not synthesise a Response object. It requires one to already exist, either as the handler's return value or as a parameter to your endpoint, so it can set response.headers["X-RateLimit-Remaining"] = ... directly. In slowapi 0.1.9, when the handler returns plain data rather than a Response, the async wrapper resolves that parameter via kwargs.get("response") after FastAPI has finished dependency injection. If your endpoint signature doesn't include response: Response, FastAPI never injected one, kwargs.get("response") returns None, and _inject_headers raises parameter response must be an instance of starlette.responses.Response.
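The failure mode is easier to see in isolation. What follows is a simplified, synchronous stand-in written for this article, not slowapi's source: the Response class, inject_headers, limited, and the hard-coded remaining count of 9 are all illustrative assumptions. It only mirrors the lookup-then-type-check sequence described above.

```python
class Response:
    """Stand-in for starlette.responses.Response: just a headers mapping."""

    def __init__(self):
        self.headers = {}


def inject_headers(response, remaining, headers_enabled=True):
    """Mimic the guard described above: refuse anything that is not a Response."""
    if not headers_enabled:
        return  # header injection switched off, nothing is inspected
    if not isinstance(response, Response):
        raise Exception(
            "parameter response must be an instance of starlette.responses.Response"
        )
    response.headers["X-RateLimit-Remaining"] = str(remaining)


def limited(handler, headers_enabled=True):
    """Wrap a handler and attach headers after it returns."""

    def wrapper(**kwargs):
        result = handler(**kwargs)
        if isinstance(result, Response):
            inject_headers(result, 9, headers_enabled)
        else:
            # Plain data came back, so fall back to whatever was passed in
            # under the name "response" (None when the handler never asked).
            inject_headers(kwargs.get("response"), 9, headers_enabled)
        return result

    return wrapper


def without_response(payload):
    return {"ok": payload}


def with_response(payload, response):
    return {"ok": payload}
```

Calling limited(without_response)(payload=1) raises the exception, while limited(with_response)(payload=1, response=Response()) returns the data and leaves the header on the injected object. Passing headers_enabled=False makes both variants succeed, because the guard is never reached.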

The frustrating part is that this only manifests when slowapi would have attached headers, i.e. on a successful, rate-limited response. The 429 path runs earlier and handles its own headers. So unit tests that only assert on the throttled 429 path will look green. The 500s start in production, when your real users sit comfortably under the limit and the success path runs, every time.

If you're new to FastAPI's Response parameter, the official advanced response headers docs describe the mechanism: declaring response: Response in an endpoint signature asks FastAPI to inject an empty Response object that downstream code (yours or a library's) can attach headers to before the framework serialises the actual return value into the wire response. It's a deliberately lightweight contract — the function still returns a dict or a Pydantic model the normal way — but it's the contract slowapi is silently relying on.

The Hotfix vs The Proper Fix

The first response to "rate limiting is throwing 500s in production" is to stop the bleeding. The fast hotfix is to flip slowapi's header injection off entirely:

limiter = Limiter(
    key_func=per_user_key,
    headers_enabled=False,
)

headers_enabled=False short-circuits _inject_headers before it ever inspects the response object. Throttling stays on — the 429 path runs earlier in _check_request_limit and doesn't care about this flag. You lose the informational X-RateLimit-Limit / -Remaining / -Reset and Retry-After headers on successful responses, but if no client in your codebase reads them, that's a survivable degradation for a few hours.

The same hotfix lands in your test suite. Any assertion that the 429 response carries a Retry-After header has to be loosened, because Retry-After is also set inside _inject_headers. We tagged that test change with a follow-up ticket so the looser assertion couldn't quietly outlive the workaround.

The proper fix is the one-liner you should have shipped the first time. On every endpoint wrapped by a rate limit decorator, declare a Response parameter:

from fastapi import APIRouter, Depends, Request, Response

@router.post("/documents/upload-batch")
@default_mutation_limit
async def upload_batch(
    request: Request,  # slowapi's decorator looks for a parameter named "request"
    payload: UploadBatchIn,
    response: Response,  # <-- the missing piece
    user: User = Depends(current_user),
):
    ...
    return UploadBatchOut(...)

That's it. FastAPI sees the response: Response parameter, injects an empty Response during dependency resolution, slowapi's _inject_headers finds it in kwargs, and the X-RateLimit-* and Retry-After headers ride out on every rate-limited success. We re-enabled headers_enabled=True, restored the Retry-After assertion in the rate-limit test suite, and confirmed the headers were present on a live response via TestClient.

The proper fix touched twenty-seven endpoints. Tedious, but mechanical — a git grep for the two decorators gives you the whole list, and the change at each site is the same line in the signature. We did it in a single PR so the two-line config flip back to headers_enabled=True couldn't accidentally outrun the per-endpoint work.

One more thing worth wiring up while you're in there: CORS. The X-RateLimit-Remaining header isn't useful if the browser's fetch() can't read it. We added X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After to the CORS expose_headers list so the SPA can surface live quota state to the user instead of waiting for an opaque 429 to land. If you're building anything credit- or quota-metered, exposing the headers is the difference between "you ran out of credits" appearing before the user clicks the button versus after.
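For reference, a minimal sketch of that CORS wiring, assuming FastAPI's bundled CORSMiddleware. The add_cors helper, the allowed_origins argument, and the wildcard method and header settings are illustrative assumptions, not our production configuration.

```python
# Headers a browser may read from a cross-origin response only when exposed.
RATE_LIMIT_HEADERS = [
    "X-RateLimit-Limit",
    "X-RateLimit-Remaining",
    "X-RateLimit-Reset",
    "Retry-After",
]


def add_cors(app, allowed_origins):
    """Register CORS so the SPA's fetch() can read the quota headers."""
    # Imported lazily so the header list is importable without FastAPI installed.
    from fastapi.middleware.cors import CORSMiddleware

    app.add_middleware(
        CORSMiddleware,
        allow_origins=allowed_origins,  # hypothetical, e.g. ["https://app.example.com"]
        allow_methods=["*"],
        allow_headers=["*"],
        expose_headers=RATE_LIMIT_HEADERS,
    )
```

Keeping the list in one constant means the same names can be reused in tests that assert the headers are present.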

What This Means for Anyone Building FastAPI APIs

Three takeaways from the incident, in the order they'll save you time.

Inject the Response parameter on every rate-limited endpoint. Don't treat it as optional just because slowapi's quickstart examples don't always show it. The async wrapper requires it as soon as headers_enabled=True is set (the Limiter constructor defaults it to False, which is likely why the quickstart gets away without it), and you don't want to discover that asymmetrically in production. A lint rule that pairs your rate-limit decorators with a response: Response parameter is cheaper than the postmortem.

Treat informational response headers as part of the contract, not decoration. RFC 6585 defines 429 Too Many Requests and explicitly allows it to carry a Retry-After header (the header itself is defined in the core HTTP semantics, RFC 9110) for a reason: clients are supposed to back off based on the header, not on a hardcoded sleep. If your API is consumed by a frontend you also own, which is the situation most SaaS teams are in, exposing the X-RateLimit-* family via CORS is the cheap upgrade that turns "the app feels broken near the limit" into "the UI shows remaining credits in real time." This is the same instinct that powered the advisor-pattern architecture we wrote about in April: cheap signal to the right layer beats expensive guesswork at the wrong one.
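On the client side, header-driven backoff can be as small as the sketch below. The retry_after_seconds helper and its one-second fallback are assumptions for illustration. It handles both forms HTTP allows for Retry-After: a number of seconds or an HTTP-date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=1.0):
    """Turn a Retry-After header value into a non-negative delay in seconds."""
    if value is None:
        return default  # header absent: fall back to a small fixed delay
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable value
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A client would sleep for retry_after_seconds(response.headers.get("Retry-After")) before retrying a 429, rather than guessing.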

Per-user key_func matters more than the limit number. The key function used throughout slowapi's examples, get_remote_address, keys on the request's IP address. Behind a corporate NAT, a Cloudflare proxy, or a shared mobile network, that means twenty unrelated users sharing one bucket. We bind the key to the authenticated user ID, fall back to IP only for unauthenticated routes (login, password reset), and document the choice in the limiter setup. The same logic applies to the GDPR-sensitive routes: our POST /data-subject/export is rate-limited 3/hour and DELETE /data-subject/account 5/hour, both per-user, so a single tenant admin under load can't deny the export endpoint to the rest of the company. If you're standing up similar policies in a regulated context, the broader playbook is in our ChatGPT governance checklist.
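A minimal sketch of a key function along those lines. It is not our production implementation: it assumes an auth layer has already stored the user ID on request.state.user_id (a hypothetical attribute name), and the "user:" and "ip:" prefixes are illustrative.

```python
def per_user_key(request):
    """Rate-limit key: authenticated user ID if present, otherwise client IP."""
    # Assumption: an auth dependency or middleware has stored the user ID on
    # request.state.user_id; that attribute name is hypothetical.
    user_id = getattr(request.state, "user_id", None)
    if user_id is not None:
        return f"user:{user_id}"
    # request.client can be None (for example in some test setups).
    host = request.client.host if request.client else "unknown"
    return f"ip:{host}"
```

Prefixing the key keeps user IDs and IP addresses in separate namespaces, so a numeric user ID can never collide with an address-based bucket.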

The headline lesson is small enough to fit on a sticky note: if you decorate a FastAPI endpoint with a slowapi rate limit, declare a Response parameter on the same endpoint. The reason is worth the longer read — middleware that silently swaps behavior based on the type of values it finds in kwargs is a category of bug that will keep finding fresh victims, and the same shape will reappear anywhere a library reaches into FastAPI's dependency-injection plumbing.

If you're standing up rate limiting for an LLM-backed API and want a second pair of eyes on the decorator surface, the key function, and the CORS expose list before you ship, book a free AI Potenzial-Check — or read how we think about AI agents for Swiss SMEs for the broader architecture context.
