Custom models, routing, and primary/standby fallback

The model field in POST /v1/chat/completions is the tenant-visible model id (configured on the platform side; it matches the code field returned by GET /v1/models). The service selects upstream routes and credentials based on this id, and supports primary/standby fallback during execution.
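A minimal caller-side sketch of the request shape; the base URL, API key, and the model id "my-custom-model" are placeholders for your deployment, and the model value must match a tenant-visible id exactly:

```python
import json

# Placeholders for your deployment; not real endpoints or keys.
BASE_URL = "https://gateway.example.com"
API_KEY = "sk-tenant-key"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build the POST /v1/chat/completions payload; model drives routing."""
    return {
        "model": model,  # tenant-visible model id, matched case-sensitively
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("my-custom-model", "Hello")
body = json.dumps(payload)  # send as the JSON request body
```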

This is a caller-oriented summary; exact status codes and edge cases follow your deployment README and the actual service behavior.

From model to upstream (concept)

  1. Resolve model: the request model is matched against the tenant-visible configuration; unknown, disabled, or unroutable models usually yield a 4xx (exact status and message depend on the implementation).
  2. Model allowlist: if the tenant API key restricts models, the requested model must be in the allowed set, or the request is rejected with 403.
  3. Upstream credentials: an upstream access key is picked according to the tenant–provider binding rules; if no usable credential exists, the request typically fails with 403.
  4. Primary/standby: internally score and constrain up to two attempts (primary and standby); ordering and whether standby is used depend on implementation.
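The four steps above can be sketched as a single resolution function. ModelConfig, resolve_route, and the specific status codes here are assumptions for illustration; real statuses, messages, and scoring depend on the deployment:

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    code: str                                   # tenant-visible model id
    enabled: bool = True
    routes: list = field(default_factory=list)  # candidate upstreams, ranked

def resolve_route(model, configs, allowlist, credentials):
    """Resolve a request model to at most two upstream routes (primary + standby)."""
    cfg = configs.get(model)
    if cfg is None or not cfg.enabled or not cfg.routes:
        return 404, None        # unknown/disabled/unroutable -> 4xx
    if allowlist is not None and model not in allowlist:
        return 403, None        # key's model allowlist rejects the model
    usable = [r for r in cfg.routes if r in credentials]
    if not usable:
        return 403, None        # no usable upstream credential
    return 200, usable[:2]      # constrain to two attempts: primary, standby
```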

Primary/standby execution and streaming semantics

  • Non-streaming: try primary, then standby on failure.
  • Streaming: the primary is preferred; if it fails before any chunk has been sent to the client, the standby may be tried. Once output has started, a mid-stream failure does not switch routes, which avoids protocol confusion on the client side.
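The streaming rule can be sketched as follows; the function and route shapes are illustrative, not the service's actual internals. Each route is modeled as a generator of chunks, and the standby is only tried if the primary fails before the first chunk reaches the client:

```python
def run_stream(primary, standby, send_chunk):
    """primary/standby are callables returning chunk iterators; may raise."""
    started = False
    try:
        for chunk in primary():
            started = True
            send_chunk(chunk)
        return "primary"
    except Exception:
        if started:
            raise               # output already started: never switch mid-stream
    for chunk in standby():     # primary failed with nothing sent: one retry
        send_chunk(chunk)
    return "standby"
```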

How callers pick the right model

  1. Finish the model and routing configuration on the tenant side (the configuration entry point varies by deployment).
  2. Use GET /v1/models and match its code field (or your wrapped field) to the request model string exactly (case-sensitive).
  3. If the tenant uses multiple API keys, confirm each key’s model allowlist allows the target model.
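A small caller-side helper for step 2; the response shape here (a data list whose entries carry a code field) follows the description above and may differ in your wrapper:

```python
def find_model(models_response: dict, wanted: str):
    """Return the GET /v1/models entry whose code matches wanted, or None."""
    for entry in models_response.get("data", []):
        if entry.get("code") == wanted:  # exact, case-sensitive match
            return entry
    return None
```

If this returns None for the model string you plan to send, the request would fail model resolution, so fix the configuration or the string before calling the API.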

See also

Back to docs home