Gateway troubleshooting and observability
Organized as symptom → checks → where to look next. Tenant keys, model allowlists, upstream routes, and similar settings are deployment-specific; follow your deployment README and the service source.
HTTP status quick reference
| Status | Common causes |
|---|---|
| 400 | Request body fails Zod validation; invalid `input_file`; unknown MIME type, etc. |
| 401 | Missing or wrong Bearer token; disabled or deleted key. |
| 403 | Model not on the key's allowlist (`TenantModelNotAllowedError`); no usable upstream key after binding (`TenantUpstreamKeysExhaustedError`); client IP not on the allowlist (`ip_not_allowed`). |
| 501 | Unsupported capability (`ChatCompletionsNotSupportedError`). |
| 502 | Upstream `APICallError` (the response body often includes `upstream_error`). |
A streaming request can return 200 and still fail mid-stream; see Tool calls and streaming.
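On the client side, you can turn the quick-reference table into a lookup and attach the hint to error logs. This is a minimal sketch. `gatewayStatusHint` and `STATUS_HINTS` are hypothetical names and are not part of the gateway. The hint strings paraphrase the table only.

```typescript
// Hypothetical client-side helper: maps a gateway HTTP status to the likely
// causes from the quick-reference table. Not part of the gateway itself.
const STATUS_HINTS: Record<number, string> = {
  400: "Request body failed Zod validation, invalid input_file, or unknown MIME type",
  401: "Missing or wrong Bearer token, or the key is disabled or deleted",
  403: "Model not on key allowlist, upstream keys exhausted, or IP not allowed",
  501: "Capability not supported (e.g. ChatCompletionsNotSupportedError)",
  502: "Upstream APICallError; inspect upstream_error in the response body",
};

function gatewayStatusHint(status: number): string {
  // Fall back to a generic message for statuses the table does not cover.
  return STATUS_HINTS[status] ?? `No quick-reference entry for HTTP ${status}`;
}
```

For example, logging `gatewayStatusHint(res.status)` next to the response body tells you which section of this page to check first. A 200 that fails mid-stream will not be caught here.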
Models and routing
- Does the `model` spelling exactly match `custom_models.code`?
- Does `GET /v1/models` list the model?
- Does the tenant API key allowlist omit the target model?
- Is the upstream key missing, or is the binding wrong? Cross-check the README and routing code.
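The first two checks can be scripted. This sketch assumes `GET /v1/models` returns an OpenAI-style body `{ data: [{ id }] }`; verify that against your deployment. `checkModelListed` and `fetchModelList` are hypothetical helper names. A case-insensitive hit is reported separately because it usually points to a spelling mismatch with `custom_models.code`.

```typescript
// ASSUMPTION: OpenAI-style listing { data: [{ id: string }] }.
interface ModelList {
  data: { id: string }[];
}

type ModelCheck =
  | { status: "listed" }
  | { status: "case_mismatch"; listedAs: string }
  | { status: "missing" };

// An exact match is required; a case-insensitive hit suggests a spelling problem.
function checkModelListed(list: ModelList, model: string): ModelCheck {
  const ids = list.data.map((m) => m.id);
  if (ids.includes(model)) return { status: "listed" };
  const near = ids.find((id) => id.toLowerCase() === model.toLowerCase());
  return near !== undefined ? { status: "case_mismatch", listedAs: near } : { status: "missing" };
}

// Hypothetical helper: query the listing with the same tenant key as the failing
// request, in case your deployment filters the list per key (an assumption).
async function fetchModelList(baseUrl: string, apiKey: string): Promise<ModelList> {
  const res = await fetch(`${baseUrl}/v1/models`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`GET /v1/models failed with HTTP ${res.status}`);
  return (await res.json()) as ModelList;
}
```

If the model is listed but requests still return 403, move on to the allowlist and upstream-binding checks.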
Multimodal and PDF
- Do file parts use `input_file`, and do they follow the three-way rule? See Chat Completions.
- PDF preprocessing failures: check `error.code` (`pdf_password_required`, `pdf_expand_too_large`, …) and `GATEWAY_PDFIUM_WASM_PATH`.
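A client can pull `error.code` out of a failed response and separate PDF preprocessing failures from other 400s. This is a sketch under two assumptions: the error body has the shape `{ error: { code: string } }`, and all PDF codes share the `pdf_` prefix, as the two codes named above do. `extractErrorCode` and `isPdfPreprocessError` are hypothetical names. `GATEWAY_PDFIUM_WASM_PATH` is server-side configuration, so this code cannot check it.

```typescript
// ASSUMPTION: error body shape { error: { code: string, ... } }.
// Narrow an unknown JSON value step by step instead of trusting its shape.
function extractErrorCode(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return undefined;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" ? code : undefined;
}

// Heuristic (assumption): codes starting with "pdf_" come from PDF preprocessing.
function isPdfPreprocessError(body: unknown): boolean {
  return extractErrorCode(body)?.startsWith("pdf_") ?? false;
}
```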
Proxies and SSE
- Response buffering in Nginx or Envoy can delay the first bytes or break streams.
- Proxy timeouts can cut off long answers or multi-step tool chains.
- Some clients handle SSE over HTTP/2 poorly; forcing HTTP/1.1 for this path sometimes helps.
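A stream that a proxy or timeout cut short usually ends without its terminating event. This sketch checks a captured raw stream body, for example one saved with `curl -N`. It assumes OpenAI-style SSE framing, which is not confirmed here for this gateway: events are separated by a blank line, payload lines start with `data:`, and the stream ends with `data: [DONE]`. `summarizeSse` is a hypothetical name.

```typescript
// ASSUMPTION: OpenAI-style SSE framing with a final "data: [DONE]" sentinel.
interface SseSummary {
  dataEvents: number; // events carrying at least one data: line (excluding [DONE])
  sawDone: boolean;   // true if the [DONE] sentinel arrived
}

function summarizeSse(raw: string): SseSummary {
  let dataEvents = 0;
  let sawDone = false;
  // Normalize CRLF so splitting on a blank line works for either line ending.
  for (const block of raw.replace(/\r\n/g, "\n").split("\n\n")) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trim());
    if (dataLines.length === 0) continue; // comment/keep-alive blocks, trailing ""
    if (dataLines.includes("[DONE]")) {
      sawDone = true;
    } else {
      dataEvents += 1;
    }
  }
  return { dataEvents, sawDone };
}
```

If `sawDone` is false and the last event looks truncated, suspect proxy buffering or timeouts first. Otherwise, compare the time of the last event against your proxy's timeout settings.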
Logs and tracing
- Pass `x-trace-id` through this service to upstream logs so requests can be correlated with tenant usage and tracing systems (SDK: see SDK quickstart).
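If you call the gateway without the SDK, set `x-trace-id` yourself and reuse the same value in your own logs. This is a minimal sketch. The header name comes from this page. `gatewayHeaders` is a hypothetical helper, and generating a UUID when no id is supplied is an assumption, not a documented gateway requirement.

```typescript
import { randomUUID } from "node:crypto";

// Builds request headers for a gateway call. Pass an existing trace id to join
// an upstream trace; otherwise a fresh UUID is generated (assumption).
function gatewayHeaders(apiKey: string, traceId: string = randomUUID()): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
    "x-trace-id": traceId,
  };
}
```

Log the returned `x-trace-id` value before sending the request. Then a failed call can be looked up by that id in the gateway's and upstream's logs.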