Streaming Output Guide

Streaming output is available for both the Chat and Responses interfaces. The server returns content chunk by chunk as it is generated, so the client can display text immediately instead of waiting for the entire response.

Enabling It

Set stream: true in the request body:

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Write a poem"}],
  "stream": true
}

Response Format (SSE)

Streaming responses use the Server-Sent Events (SSE) protocol, with Content-Type: text/event-stream.

Each data chunk is formatted as:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713833628,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"You"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713833628,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" are"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713833628,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Key points:

Each data item starts with data: , followed by JSON

The last item is data: [DONE], indicating the stream has ended

The delta.content of each chunk is the new text fragment for that chunk; concatenating the fragments in order produces the complete reply

A finish_reason of stop indicates a normal end
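
The key points above can be sketched in a few lines of Python. This is a minimal illustration, not a full SSE client: the function name join_deltas is hypothetical, the sample lines are shortened versions of the chunks shown earlier, and the code assumes a single choice per chunk.

```python
import json


def join_deltas(sse_lines):
    """Concatenate delta.content fragments from an iterable of SSE text lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and anything that is not a data item
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The final chunk has an empty delta, so content may be missing.
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"You"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" are"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(join_deltas(sample))  # You are
```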

cURL Streaming Call

Python SDK Streaming Call
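
A minimal sketch of a streaming call, assuming the official openai Python package (version 1 or later). The helper name stream_chat, the API key, and the base URL are placeholders chosen for illustration; substitute the values for your own account and endpoint.

```python
def stream_chat(client, prompt, model="gpt-4o"):
    """Yield text fragments as they arrive from a streaming chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks (e.g. a usage-only chunk) carry no choices
        text = chunk.choices[0].delta.content
        if text:
            yield text


def main():
    from openai import OpenAI  # pip install openai

    # Placeholder credentials and endpoint (assumptions): replace with your own.
    client = OpenAI(api_key="YOUR_API_KEY", base_url="https://example.com/v1")
    for fragment in stream_chat(client, "Write a poem"):
        print(fragment, end="", flush=True)
    print()
```

Calling main() prints the reply as it is generated. Keeping the loop in a generator lets the same function feed a terminal, a web socket, or a test double.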

Node.js SDK Streaming Call

Usage Information in Streaming Output

By default, streaming responses do not include usage (token usage). To receive it, set include_usage to true in stream_options:

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {"include_usage": true}
}

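Once set, the final chunk before data: [DONE] carries the complete usage field. Depending on the backend, this chunk may have an empty choices array (the OpenAI API sends usage this way), so do not assume choices[0] exists when reading it: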

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}
}
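
As a rough sketch of how a client might request and read usage: build_request and last_usage are hypothetical helper names, and the chunks list stands in for chunks already parsed from the stream.

```python
def build_request(prompt, model="gpt-4o"):
    """Build a streaming request body that also asks for token usage."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "stream_options": {"include_usage": True},
    }


def last_usage(chunks):
    """Return the last non-empty usage dict seen in a sequence of parsed chunks."""
    usage = None
    for chunk in chunks:
        # Chunks without usage may omit the field or set it to null (None).
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}]},
    {
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
    },
]
usage = last_usage(chunks)
print(usage["total_tokens"])  # 30
```

Scanning every chunk for usage, rather than reading it from a fixed position, keeps the code working whether usage arrives alongside the final delta or in a separate chunk.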

Streaming Output for the Responses Interface

The Responses API also supports stream: true. The format is similar to Chat: it also uses the SSE protocol and ends with data: [DONE].

Notes for Parsing SSE Manually

If you don't use the SDK and parse the SSE stream yourself, pay attention to the following:

Read line by line: Each data item occupies one line, starting with data:

Skip empty lines: Empty lines are event delimiters in the SSE protocol and do not affect the data

Detect end: Stop reading when you encounter data: [DONE]

Handle disconnections: You can retry after a network interruption, but a stream cannot be resumed from the point where it stopped. Discard the partial output and send the request again from the start

Timeout settings: We recommend setting the HTTP client timeout to 60 seconds or more, as long text generation may take a while
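
The notes above could be combined roughly as follows. This is a sketch under stated assumptions: iter_sse_data and collect_with_retry are hypothetical names, and open_stream is any caller-supplied function that sends the request and returns an iterable of lines (for example, a function wrapping urllib.request.urlopen(request, timeout=60), whose response object iterates over lines as bytes).

```python
import time


def iter_sse_data(raw_lines):
    """Yield the payload of each data item until [DONE]; accepts bytes or str lines."""
    for raw in raw_lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line:
            continue  # empty line: SSE event delimiter
        if not line.startswith("data:"):
            continue  # other SSE fields and comments are ignored in this sketch
        payload = line[len("data:"):].lstrip()
        if payload == "[DONE]":
            return  # normal end of stream
        yield payload
    # The input ran out without the end marker: treat it as a dropped connection.
    raise ConnectionError("stream ended before [DONE]")


def collect_with_retry(open_stream, max_attempts=3, backoff_seconds=1.0):
    """Collect all payloads; on a connection error, start the request over."""
    for attempt in range(1, max_attempts + 1):
        payloads = []  # no resume is possible, so partial output is discarded
        try:
            for payload in iter_sse_data(open_stream()):
                payloads.append(payload)
            return payloads
        except OSError:  # covers connection resets and socket timeouts
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)
```

Each payload returned is still a JSON string; pass it to json.loads to get the chunk. Retrying re-sends the whole request, so a retried generation may differ from the interrupted one.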

Streaming vs. Non-Streaming Comparison

Dimension	Non-Streaming (`stream: false`)	Streaming (`stream: true`)
Response	Returns the complete result at once	Returns text fragments chunk by chunk
User perception	Longer wait before any text appears	Text appears incrementally as it is generated, feels faster
Response format	`chat.completion`	`chat.completion.chunk`
Usage	Included by default	Requires `stream_options` to be set
Use cases	Backend batch processing, API chaining	Frontend conversation, real-time interaction
Parsing difficulty	Simple, read JSON directly	Requires SSE parsing
