Streaming reduces time-to-first-token and enables incremental UI updates. The OpenAI-compatible and Anthropic-compatible endpoints use different event shapes.

OpenAI-compatible chat

Request

Item           Value
Body stream    true
Accept         text/event-stream

curl -sSN "https://51kik.com/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"model":"YOUR_MODEL_ID","stream":true,"messages":[{"role":"user","content":"Hello"}]}'

SSE format

Each data: line carries a JSON object whose object field is "chat.completion.chunk". The stream ends with the literal sentinel data: [DONE], which is not JSON:

data: {"choices":[{"delta":{"content":"Hi"}}],...}
data: [DONE]
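
The format above can be parsed line by line. The following is a minimal sketch, not the gateway SDK's implementation: the function names iter_chunks and collect_text are hypothetical, and it assumes each event fits on a single data: line (the SSE specification also allows multi-line data fields, which this sketch does not handle).

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank separators, comments (": ...") and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # the sentinel is not JSON, so stop before json.loads
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content across all chunks."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Illustrative input only; real chunks carry more fields.
sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hi"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":" there"}}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))
```

In a real client, lines would come from the HTTP response's line iterator rather than a list, so text can be rendered as each chunk arrives.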

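delta may include content, tool_calls, etc. For tools, merge partial tool_calls per OpenAI rules: group fragments by their index and concatenate the function.arguments strings before parsing them as JSON.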
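
Merging partial tool_calls might look like the sketch below. It assumes the OpenAI convention that each fragment carries an index, that id and function.name arrive once, and that function.arguments arrives as string fragments. The function name merge_tool_calls and the sample values are hypothetical.

```python
import json


def merge_tool_calls(deltas):
    """Merge a sequence of delta dicts into complete tool calls, keyed by index."""
    calls = {}
    for delta in deltas:
        for part in delta.get("tool_calls") or []:
            call = calls.setdefault(
                part["index"],
                {"id": None, "type": "function",
                 "function": {"name": "", "arguments": ""}},
            )
            if part.get("id"):
                call["id"] = part["id"]
            fn = part.get("function") or {}
            if fn.get("name"):
                call["function"]["name"] = fn["name"]
            # Arguments are string fragments: concatenate now, parse at the end.
            call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Illustrative deltas; the tool name and id are made up.
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1", "type": "function",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city":'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
    {"content": "unrelated text delta"},
]
merged = merge_tool_calls(deltas)
print(merged[0]["function"]["name"], json.loads(merged[0]["function"]["arguments"]))
```

Parsing the arguments only after the stream ends avoids calling json.loads on an incomplete fragment.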

Usage chunk

The gateway sets stream_options.include_usage: true. When the upstream supports it, a chunk late in the stream carries a usage object for billing and metrics; clients should treat it as optional.
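
A client can pick up the usage chunk while it accumulates text. The sketch below assumes the OpenAI convention that the usage chunk has an empty choices array and fields such as prompt_tokens and completion_tokens; the gateway's actual field names may differ, and split_usage is a hypothetical helper.

```python
def split_usage(chunks):
    """Return (text, usage) from parsed chunks; usage is None if never sent."""
    parts, usage = [], None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]  # keep the latest usage object seen
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


# Illustrative chunks; token counts are made up.
chunks = [
    {"choices": [{"delta": {"content": "Hi"}}]},
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 1,
                              "total_tokens": 6}},
]
text, usage = split_usage(chunks)
print(text, usage)
```

Because the chunk only appears when the upstream supports it, billing code should handle usage being None.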

Mid-stream errors

If the JSON object on a data: line contains an error field, stop reading and treat the request as failed (SDK: GatewaySseError). See Errors.
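
Without the SDK, the same check can be done by hand. In this sketch, StreamError is a hypothetical local exception standing in for the SDK's GatewaySseError, and the error payload shape (an object with a message field) is an assumption; see Errors for the real format.

```python
import json


class StreamError(Exception):
    """Hypothetical local exception, not the SDK's GatewaySseError."""

    def __init__(self, error):
        # Assumed shape: {"message": "..."}; fall back to str() otherwise.
        if isinstance(error, dict):
            message = error.get("message", "stream error")
        else:
            message = str(error)
        super().__init__(message)
        self.error = error


def check_chunk(payload: str) -> dict:
    """Parse one data: payload and raise if it reports an error."""
    chunk = json.loads(payload)
    if isinstance(chunk, dict) and "error" in chunk:
        raise StreamError(chunk["error"])
    return chunk


try:
    check_chunk('{"choices":[{"delta":{"content":"Hi"}}]}')  # passes through
    check_chunk('{"error":{"message":"upstream timeout"}}')  # raises
except StreamError as exc:
    print("stream failed:", exc)
```

Text already rendered before the error arrived is partial output, so the caller decides whether to keep or discard it.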

Anthropic-compatible Messages

Send POST https://51kik.com/anthropic/v1/messages with stream: true in the body and Accept: text/event-stream. Events follow the Anthropic streaming format (message_start, content_block_delta, …). See Create message.
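
Unlike the OpenAI-compatible stream, this format has no [DONE] sentinel, and each data: payload carries its own type. The sketch below assumes the gateway passes Anthropic's event shapes through unchanged: text arrives in content_block_delta events with a text_delta, and message_stop ends the stream. The function name collect_anthropic_text is hypothetical.

```python
import json


def collect_anthropic_text(lines):
    """Concatenate text_delta fragments until message_stop."""
    parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # "event:" lines repeat the type already present in the JSON payload.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        etype = event.get("type")
        if etype == "content_block_delta":
            delta = event.get("delta") or {}
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif etype == "message_stop":
            break
    return "".join(parts)


# Illustrative, abbreviated events.
anthropic_sample = [
    "event: message_start",
    'data: {"type":"message_start","message":{"role":"assistant"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"lo"}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
print(collect_anthropic_text(anthropic_sample))
```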

Proxies / CDN

Symptom                            Fix
Burst output after long silence    Disable buffering (proxy_buffering off)
Mid-stream disconnect              Raise read timeout; define retry policy
HTTP/2 + SSE                       Verify middle boxes support SSE

Client checklist

  • Parse the stream incrementally; do not buffer the full body before parsing
  • Handle both [DONE] and connection close
  • Distinguish user cancel from upstream error
  • Disable SSE buffering in production proxies
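
The client-side items in the checklist can be combined in one read loop for the OpenAI-compatible stream. This is a sketch under stated assumptions: the Outcome enum, the consume function, and the use of a threading.Event for user cancellation are all hypothetical choices, not part of the gateway SDK.

```python
import json
import threading
from enum import Enum


class Outcome(Enum):
    DONE = "done"                      # saw the [DONE] sentinel
    CANCELLED = "cancelled"            # user asked to stop
    UPSTREAM_ERROR = "upstream_error"  # a chunk contained an error field
    CLOSED_EARLY = "closed_early"      # connection ended without [DONE]


def consume(lines, on_text, cancel: threading.Event) -> Outcome:
    """Process SSE lines one at a time and report how the stream ended."""
    for raw in lines:
        if cancel.is_set():
            return Outcome.CANCELLED
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return Outcome.DONE
        chunk = json.loads(payload)
        if "error" in chunk:
            return Outcome.UPSTREAM_ERROR
        for choice in chunk.get("choices") or []:
            text = (choice.get("delta") or {}).get("content")
            if text:
                on_text(text)  # render incrementally, no full-body buffering
    return Outcome.CLOSED_EARLY


received = []
result = consume(
    ['data: {"choices":[{"delta":{"content":"Hi"}}]}', "data: [DONE]"],
    received.append,
    threading.Event(),
)
print(result, received)
```

Returning a distinct outcome for each ending lets the caller retry only on CLOSED_EARLY or UPSTREAM_ERROR and stay silent on CANCELLED.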
