Custom models, routing, and primary/standby fallback
The `model` field in `POST /v1/chat/completions` is the model id visible to the tenant (configured on the platform side; it matches the `code` field returned by `GET /v1/models`). The service uses it to pick upstream routes and credentials, and supports primary/standby fallback during execution.
This is a caller-oriented summary; exact status codes and edge cases follow your deployment's README and the actual service behavior.
From model to upstream (concept)
- Resolve model: align the request `model` with the tenant-visible configuration; unknown, disabled, or unroutable models usually yield 4xx (exact status and message depend on the implementation).
- Model allowlist: if the tenant API key restricts models, `model` must be in the allowed set or you get 403.
- Upstream credentials: pick an upstream access key from tenant–provider binding rules; no usable credential → often 403.
- Primary/standby: internally score and constrain up to two attempts (primary and standby); ordering and whether standby is used depend on implementation.
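The resolution steps above can be sketched as a small pipeline. This is a minimal illustration, not the service's actual code: `TenantConfig`, `RoutingError`, and the specific status codes are assumptions (the text only says "4xx" and "often 403").

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TenantConfig:
    # Hypothetical shapes; the real configuration model is deployment-specific.
    visible_models: dict        # model id -> {"enabled": bool, "routes": [provider ids]}
    key_allowlist: Optional[set]  # None means this API key has no model restriction
    credentials: dict           # provider id -> upstream access key

class RoutingError(Exception):
    def __init__(self, status, message):
        super().__init__(message)
        self.status = status

def resolve(cfg: TenantConfig, model: str):
    """Return up to two (provider, credential) attempts: primary and standby."""
    entry = cfg.visible_models.get(model)
    if entry is None or not entry["enabled"] or not entry["routes"]:
        # Exact 4xx status and message depend on the implementation.
        raise RoutingError(404, f"unknown, disabled, or unroutable model: {model}")
    if cfg.key_allowlist is not None and model not in cfg.key_allowlist:
        raise RoutingError(403, f"model not allowed for this API key: {model}")
    routed = [(p, cfg.credentials[p]) for p in entry["routes"] if p in cfg.credentials]
    if not routed:
        raise RoutingError(403, "no usable upstream credential for this tenant")
    # Internally scored ordering is implementation-defined; cap at two attempts.
    return routed[:2]
```

How routes are scored and whether the standby is ever used remain implementation details; the sketch only encodes the two-attempt cap stated above.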
Primary/standby execution and streaming semantics
- Non-streaming: try the primary route first; on failure, retry once on the standby.
- Streaming: prefer primary; if it fails before any chunk is sent to the client, standby may be tried; once output has started, mid-stream failure does not switch routes (avoids protocol confusion).
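The streaming rule above can be sketched as a generator that switches routes only while nothing has reached the client. `call_upstream` is a placeholder for the real upstream request, not an actual API of the service:

```python
def stream_with_fallback(routes, call_upstream):
    """routes: ordered attempts, e.g. [primary, standby].
    call_upstream(route) yields response chunks or raises on failure.
    Yields chunks to the client; switches routes only before the first chunk."""
    last_err = None
    for route in routes:
        emitted = False
        try:
            for chunk in call_upstream(route):
                emitted = True
                yield chunk
            return  # stream completed successfully
        except Exception as err:
            if emitted:
                # Output already started: do not switch routes mid-stream,
                # which would confuse the client-side protocol.
                raise
            last_err = err  # failed before any chunk: try the next route
    raise last_err
```

The key invariant is the `emitted` flag: a pre-first-chunk failure is retriable, a mid-stream failure is not.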
How callers pick the right model
- Finish model and routing configuration on the tenant side (the entry point varies by deployment).
- Use `GET /v1/models` and match `code` (or your wrapped field) against the request `model` string exactly (case-sensitive).
- If the tenant uses multiple API keys, confirm each key's model allowlist includes the target model.
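A caller-side check against the models listing can be sketched as follows. The `"data"` envelope and the `"code"`/`"id"` field names are assumptions based on the text; your deployment may wrap the field differently:

```python
def pick_model(models_payload, wanted):
    """models_payload: decoded JSON from GET /v1/models, assumed to carry
    a list under "data" with a "code" (or plain "id") per entry.
    Returns wanted only on an exact, case-sensitive match."""
    codes = {m.get("code") or m.get("id") for m in models_payload.get("data", [])}
    if wanted in codes:
        return wanted
    raise ValueError(
        f"model {wanted!r} not visible to this tenant; "
        f"available: {sorted(c for c in codes if c)}"
    )
```

Doing this check at startup (rather than on the first chat request) surfaces configuration mistakes before any traffic is sent.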