Symptom-based guide for inference issues.
401 Unauthorized
| Check | Action |
|---|---|
| Header shape | OpenAI: Authorization: Bearer <key>; Anthropic: x-api-key or Bearer |
| Key state | Not disabled/expired in console |
| Env mix-up | Match key to environment; non-production keys fail against the production Base URL |
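
The header shapes in the table can be sketched as a small helper. This is a minimal illustration, not the service's client library: the function name `build_auth_headers` and the `style` argument are hypothetical, and the whitespace check reflects a common cause of 401s (a key copied with a trailing newline) rather than documented behavior.

```python
def build_auth_headers(api_key: str, style: str = "openai") -> dict:
    """Return auth headers for an OpenAI-style or Anthropic-style request.

    `style` is a hypothetical switch used only for this example.
    """
    # A key with stray whitespace (for example a trailing newline from an
    # env file) is sent as a different string and is typically rejected.
    if not api_key or api_key != api_key.strip():
        raise ValueError("API key is empty or has surrounding whitespace")
    if style == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "anthropic":
        # The table above also allows Bearer here; x-api-key is shown.
        return {"x-api-key": api_key}
    raise ValueError(f"unknown style: {style!r}")
```

Calling `build_auth_headers("sk-test\n")` raises `ValueError`, which surfaces the problem before the request is sent.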
403 Forbidden
| Cause | Action |
|---|---|
| insufficient_balance / quota_exceeded | Console balance/quota |
| Model not allowed | Tenant entitlement for model id |
| ip_not_allowed | Update IP allow list (Network and access) |
| Upstream keys exhausted | Contact support |
400 Bad Request
| Cause | Action |
|---|---|
| param under messages | Fix tool_call_id, content part types |
| Unsupported part | Only text, image_url, input_audio, input_file |
| Body too large | ~32MB cap on chat/embeddings |
| Missing max_tokens (Anthropic) | Required on Messages |
Read error.message and param in the error response; param names the field that failed validation.
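
As a rough sketch of reading those two fields, the helper below assumes an OpenAI-style error body of the form `{"error": {"message": ..., "param": ...}}`. That shape, the function name `summarize_error`, and the sample `param` value are assumptions for illustration; adjust them to the body the service actually returns.

```python
import json


def summarize_error(body: str) -> str:
    """Turn a raw error response body into a one-line summary for logs."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Proxies and gateways sometimes return HTML or plain text.
        return f"non-JSON error body: {body[:200]!r}"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return f"unexpected error shape: {body[:200]!r}"
    message = error.get("message", "<no message>")
    param = error.get("param")
    # param, when present, points at the offending field.
    return f"{message} (param: {param})" if param else str(message)
```

Logging this summary alongside the status code usually makes it clear which row of the table applies.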
429
Back off and retry; see Rate limits.
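
One common way to back off is exponential delay with jitter. The sketch below is an illustration under assumptions, not the documented retry policy: `send` is a hypothetical callable returning `(status, retry_after_seconds, body)`, and whether the service sends a Retry-After value at all is an assumption.

```python
import random
import time
from typing import Callable, Optional, Tuple

# (status, retry_after_seconds or None, body); a hypothetical shape.
Response = Tuple[int, Optional[float], str]


def retry_on_429(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it stops returning 429 or attempts run out."""
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status != 429:
            break
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last 429 to the caller
        # Exponential cap with full jitter so clients do not retry in sync.
        cap = min(max_delay, base_delay * 2 ** attempt)
        delay = random.uniform(0, cap)
        # If the server suggested a wait (assumed), never sleep less than it.
        if retry_after is not None:
            delay = max(delay, retry_after)
        sleep(delay)
    return status, body
```

The `sleep` parameter exists so the delay can be replaced in tests; in production the default `time.sleep` applies.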
Streaming issues
| Symptom | Action |
|---|---|
| Burst after silence | Disable proxy buffering |
| Mid-stream drop | Check timeouts and client-side cancellation |
| In-stream error | Treat as a failure, not success; see Errors |
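
Because an in-stream error arrives after the HTTP status has already been sent as success, the client has to inspect each event. The sketch below assumes OpenAI-style server-sent events (`data:` lines, a `[DONE]` sentinel, and errors delivered as a `{"error": {...}}` payload); those details and the names `StreamError` and `iter_stream_chunks` are assumptions for illustration.

```python
import json
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Raised when the stream carries an error event (hypothetical name)."""


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunks from SSE lines; raise on an in-stream error."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        if isinstance(chunk, dict) and "error" in chunk:
            err = chunk["error"]
            message = err.get("message") if isinstance(err, dict) else err
            # Do not treat the partial output as a completed response.
            raise StreamError(str(message))
        yield chunk
```

A caller that collects chunks into a final answer should discard or flag the partial result when `StreamError` is raised.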
5xx / timeouts
- Reproduce with smaller max_tokens / shorter input.
- Retry or switch model.
- Escalate with x-trace-id, time, model; see Support information.
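
To make escalation easier, the three items can be captured at the moment of failure. This is a minimal sketch that assumes `x-trace-id` is returned as a response header; the function name `build_escalation_report` and the report keys are hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_escalation_report(
    headers: Mapping[str, str],
    model: str,
    status: int,
    when: Optional[datetime] = None,
) -> dict:
    """Collect the details support asks for: trace id, time, and model."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    when = when or datetime.now(timezone.utc)
    return {
        "trace_id": lowered.get("x-trace-id", "<missing>"),
        "time_utc": when.astimezone(timezone.utc).isoformat(),
        "model": model,
        "status": status,
    }
```

Recording the time in UTC avoids ambiguity when support correlates the report with server logs.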
Empty model list
- GET /models works without a key?
- Filters too strict?
- Model not enabled for you in the console?
Related
Errors · Production checklist