Streaming reduces time-to-first-token and enables incremental UI. OpenAI and Anthropic surfaces use different event shapes.
OpenAI-compatible chat
Request
| Item | Value |
|---|---|
| Body stream | true |
| Accept | text/event-stream |
curl -sSN "https://51kik.com/v1/chat/completions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"model":"YOUR_MODEL_ID","stream":true,"messages":[{"role":"user","content":"Hello"}]}'
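For clients that do not shell out to curl, the same request can be sketched in Python using only the standard library. The function names `build_request` and `stream_lines`, the placeholder key, and the 60-second timeout are illustrative assumptions; the URL, headers, and body mirror the curl command above.

```python
import json
import urllib.request
from typing import Iterator

# Endpoint taken from the curl example above.
CHAT_URL = "https://51kik.com/v1/chat/completions"


def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the streaming chat request without sending it."""
    body = {
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )


def stream_lines(req: urllib.request.Request, timeout: float = 60.0) -> Iterator[str]:
    """Send the request and yield decoded lines as they arrive."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Iterating the response reads line by line instead of buffering the body.
        for raw in resp:
            yield raw.decode("utf-8")


# Placeholder credentials; nothing is sent until stream_lines(req) is iterated.
req = build_request("YOUR_API_KEY", "YOUR_MODEL_ID", "Hello")
```

Iterating `stream_lines(req)` opens the connection; each yielded string is one raw SSE line.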
SSE format
Each data: line is JSON with object: "chat.completion.chunk":
data: {"choices":[{"delta":{"content":"Hi"}}],...}
data: [DONE]
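The wire format above can be consumed with a small line parser. The following is a minimal sketch, assuming each event carries its JSON on a single `data:` line as shown; `iter_chunks` and `iter_text` are hypothetical helper names, not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank separators, SSE comments, and any non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def iter_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the text fragments found in choices[0].delta.content."""
    for chunk in iter_chunks(lines):
        choices = chunk.get("choices") or []
        if not choices:
            continue  # a chunk without choices carries no text
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hi"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":" there"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_text(sample)))  # prints "Hi there"
```

Because both functions are generators, text can be rendered as soon as each line arrives.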
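delta may include content, tool_calls, and other fields. For tool calls, merge the partial tool_calls entries per OpenAI's streaming rules: group fragments by index and concatenate their function.arguments strings in arrival order.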
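To illustrate the tool-call merge, here is a hedged sketch that folds a sequence of `delta` objects into complete calls. It assumes the OpenAI streaming convention in which each fragment carries an `index`, the `id` and function name arrive in the first fragment, and `function.arguments` arrives as string pieces; `merge_tool_calls` and the sample data are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def merge_tool_calls(deltas: Iterable[dict]) -> List[dict]:
    """Fold streamed delta objects into complete tool calls, keyed by index."""
    calls: Dict[int, dict] = {}
    for delta in deltas:
        for part in delta.get("tool_calls") or []:
            call = calls.setdefault(
                part["index"],
                {"id": None, "type": "function",
                 "function": {"name": "", "arguments": ""}},
            )
            if part.get("id"):
                call["id"] = part["id"]
            fn = part.get("function") or {}
            # Assumption: the name arrives whole, so it is set rather than appended.
            if fn.get("name"):
                call["function"]["name"] = fn["name"]
            # Argument fragments are concatenated in arrival order.
            call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1", "type": "function",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city":'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
]
merged = merge_tool_calls(deltas)
# Parse the arguments only once the stream has finished.
args = json.loads(merged[0]["function"]["arguments"])
```

Intermediate `arguments` strings are usually not valid JSON, which is why parsing waits until the stream ends.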
Usage chunk
The gateway sets stream_options.include_usage: true. When the upstream supports it, a late chunk includes usage for billing/metrics.
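A client that records usage can keep the last non-empty `usage` object it sees. This is a minimal sketch, assuming the gateway mirrors the OpenAI behavior in which the usage chunk arrives near the end with an empty `choices` array; `final_usage` and the token counts in the sample are illustrative.

```python
from typing import Iterable, Optional


def final_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the last non-empty usage object seen, or None if none arrived."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


chunks = [
    {"choices": [{"delta": {"content": "Hi"}}], "usage": None},
    # Usage-only chunk: note the empty choices list.
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}},
]
usage = final_usage(chunks)
```

Returning `None` when no usage chunk arrives lets callers handle upstreams that do not support it.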
Mid-stream errors
If a data: JSON object contains error, stop and handle as failure (SDK: GatewaySseError). See Errors.
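For clients that do not use the SDK, the error check can be sketched as a guard applied to every decoded chunk. `StreamError` is a hypothetical stand-in for the SDK's GatewaySseError, and the error payload is treated as opaque because its exact shape is not specified here.

```python
class StreamError(Exception):
    """Raised when a streamed chunk carries an error payload (hypothetical name)."""

    def __init__(self, error: object) -> None:
        message = error.get("message") if isinstance(error, dict) else None
        super().__init__(message or str(error))
        self.error = error  # keep the raw payload for logging


def check_chunk(chunk: dict) -> dict:
    """Return the chunk unchanged, or raise StreamError if it reports an error."""
    if chunk.get("error"):
        raise StreamError(chunk["error"])
    return chunk


ok = check_chunk({"choices": [{"delta": {"content": "Hi"}}]})
try:
    check_chunk({"error": {"message": "upstream timeout"}})
    failure = None
except StreamError as exc:
    failure = exc
```

Raising immediately prevents a partial response from being treated as complete.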
Anthropic-compatible Messages
POST https://51kik.com/anthropic/v1/messages with stream: true and Accept: text/event-stream. Events follow Anthropic (message_start, content_block_delta, …). See Create message.
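The Anthropic event stream pairs an `event:` line with a `data:` line whose JSON repeats the event name in its `type` field. The sketch below extracts text from `content_block_delta` events and stops at `message_stop`; it assumes the gateway passes Anthropic's event payloads through unchanged, and `iter_anthropic_text` and the sample payloads are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_anthropic_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from Anthropic-style SSE lines until message_stop."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # The event: line duplicates the type field inside the JSON body.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")
        elif kind == "error":
            raise RuntimeError(f"stream error: {event.get('error')}")
        elif kind == "message_stop":
            return


sample = [
    "event: message_start",
    'data: {"type":"message_start","message":{"id":"msg_1","role":"assistant","content":[]}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
print("".join(iter_anthropic_text(sample)))  # prints "Hello!"
```

Unlike the OpenAI-compatible stream, there is no `[DONE]` sentinel here; the `message_stop` event marks the end.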
Proxies / CDN
| Symptom | Fix |
|---|---|
| Burst output after long silence | Disable buffering (proxy_buffering off) |
| Mid-stream disconnect | Raise read timeout; define retry policy |
| HTTP/2 + SSE | Verify that middleboxes support SSE |
Client checklist
- Parse the stream incrementally; do not buffer the full body before parsing
- Handle [DONE] and connection close
- Distinguish user cancel vs upstream error
- SSE buffering disabled in production proxies
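One way to cover the client-side items is to have the consumer report how the stream ended. This is a sketch under stated assumptions: the `Outcome` names, the `consume` function, and the use of `threading.Event` as the cancel signal are illustrative choices, and the input follows the OpenAI-compatible format.

```python
import json
import threading
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    COMPLETED = "completed"            # [DONE] was received
    CANCELLED = "cancelled"            # the user asked to stop
    UPSTREAM_ERROR = "upstream_error"  # a chunk carried an error payload
    TRUNCATED = "truncated"            # connection closed before [DONE]


def consume(lines: Iterable[str], on_text: Callable[[str], None],
            cancel: threading.Event) -> Outcome:
    """Feed text to on_text as it arrives and report how the stream ended."""
    for raw in lines:
        if cancel.is_set():
            return Outcome.CANCELLED
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return Outcome.COMPLETED
        chunk = json.loads(payload)
        if chunk.get("error"):
            return Outcome.UPSTREAM_ERROR
        for choice in (chunk.get("choices") or [])[:1]:
            text = choice.get("delta", {}).get("content")
            if text:
                on_text(text)
    # The iterator ended without the sentinel: treat it as an early close.
    return Outcome.TRUNCATED


received = []
lines = ['data: {"choices":[{"delta":{"content":"Hi"}}]}', "", "data: [DONE]"]
result = consume(lines, received.append, threading.Event())
```

Returning a distinct outcome for each ending lets the UI show "stopped" for a user cancel and offer a retry only for errors or truncation.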