Custom models, routing, and primary/standby fallback

The model field in POST /v1/chat/completions is the tenant-visible model id (configured on the platform side; it matches the code field returned by GET /v1/models). The service selects upstream routes and credentials based on this id, and supports primary/standby fallback during execution.
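A minimal caller-side sketch of the request shape; the base URL, API key, and the model id "my-custom-model" are placeholders for your deployment, and the model value must match a tenant-visible id exactly:

```python
import json

# Placeholders for your deployment; not real endpoints or keys.
BASE_URL = "https://gateway.example.com"
API_KEY = "sk-tenant-key"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build the POST /v1/chat/completions payload; model drives routing."""
    return {
        "model": model,  # tenant-visible model id, matched case-sensitively
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("my-custom-model", "Hello")
body = json.dumps(payload)  # send as the JSON request body
```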

This is a caller-oriented summary; exact status codes and edge cases follow your deployment README and the actual service behavior.

From model to upstream (concept)

  1. Resolve model: the request model is matched against the tenant-visible configuration; unknown, disabled, or unroutable models usually yield a 4xx (exact status and message depend on the implementation).
  2. Model allowlist: if the tenant API key restricts models, the requested model must be in the allowed set, or the request is rejected with 403.
  3. Upstream credentials: an upstream access key is picked according to the tenant–provider binding rules; if no usable credential exists, the request typically fails with 403.
  4. Primary/standby: internally score and constrain up to two attempts (primary and standby); ordering and whether standby is used depend on implementation.
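The four steps above can be sketched as a single resolution function. ModelConfig, resolve_route, and the specific status codes here are assumptions for illustration; real statuses, messages, and scoring depend on the deployment:

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    code: str                                   # tenant-visible model id
    enabled: bool = True
    routes: list = field(default_factory=list)  # candidate upstreams, ranked

def resolve_route(model, configs, allowlist, credentials):
    """Resolve a request model to at most two upstream routes (primary + standby)."""
    cfg = configs.get(model)
    if cfg is None or not cfg.enabled or not cfg.routes:
        return 404, None        # unknown/disabled/unroutable -> 4xx
    if allowlist is not None and model not in allowlist:
        return 403, None        # key's model allowlist rejects the model
    usable = [r for r in cfg.routes if r in credentials]
    if not usable:
        return 403, None        # no usable upstream credential
    return 200, usable[:2]      # constrain to two attempts: primary, standby
```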

Primary/standby execution and streaming semantics

  • Non-streaming: try primary, then standby on failure.
  • Streaming: the primary is preferred; if it fails before any chunk has been sent to the client, the standby may be tried. Once output has started, a mid-stream failure does not switch routes, which avoids protocol confusion on the client side.
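The streaming rule can be sketched as follows; the function and route shapes are illustrative, not the service's actual internals. Each route is modeled as a generator of chunks, and the standby is only tried if the primary fails before the first chunk reaches the client:

```python
def run_stream(primary, standby, send_chunk):
    """primary/standby are callables returning chunk iterators; may raise."""
    started = False
    try:
        for chunk in primary():
            started = True
            send_chunk(chunk)
        return "primary"
    except Exception:
        if started:
            raise               # output already started: never switch mid-stream
    for chunk in standby():     # primary failed with nothing sent: one retry
        send_chunk(chunk)
    return "standby"
```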

How callers pick the right model

  1. Finish the model and routing configuration on the tenant side (the configuration entry point varies by deployment).
  2. Use GET /v1/models and match its code field (or your wrapped field) to the request model string exactly (case-sensitive).
  3. If the tenant uses multiple API keys, confirm each key’s model allowlist allows the target model.
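A small caller-side helper for step 2; the response shape here (a data list whose entries carry a code field) follows the description above and may differ in your wrapper:

```python
def find_model(models_response: dict, wanted: str):
    """Return the GET /v1/models entry whose code matches wanted, or None."""
    for entry in models_response.get("data", []):
        if entry.get("code") == wanted:  # exact, case-sensitive match
            return entry
    return None
```

If this returns None for the model string you plan to send, the request would fail model resolution, so fix the configuration or the string before calling the API.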

See also

Back to docs home