Streaming Output Guide

Streaming output is supported by the Chat and Responses interfaces. Content is returned chunk by chunk as it is generated, which gives a better user experience: text can be shown right away instead of only after the entire reply is complete.

Enabling Streaming

Set stream: true in the request body:
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Write a poem"}],
  "stream": true
}
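As a minimal sketch, the request below enables streaming from Python using only the standard library. The BASE_URL value, the /chat/completions path, and the Bearer Authorization header are assumptions for illustration; take the actual values from the Request URL and Authentication pages.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values from the Request URL
# and Authentication pages.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"


def build_stream_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (without sending) a Chat request whose body sets "stream": true."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # json.dumps writes this as "stream": true
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "text/event-stream",
        },
        method="POST",
    )


def print_raw_events(req: urllib.request.Request) -> None:
    """Send the request and echo each raw SSE line as it arrives."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        for raw in resp:  # the HTTP response iterates line by line as bytes
            print(raw.decode("utf-8").rstrip("\r\n"))
```

For example, print_raw_events(build_stream_request(BASE_URL, API_KEY, "Write a poem")) would print data: lines in the format described in the next section.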

Response Format (SSE)

Streaming responses use the Server-Sent Events (SSE) protocol, with Content-Type: text/event-stream.
Each data chunk is formatted as:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713833628,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"You"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713833628,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" are"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1713833628,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
Key points:
• Each event line starts with "data: " followed by a JSON object.
• The last line is data: [DONE], which marks the end of the stream.
• The delta.content of each chunk is the new text fragment for that chunk; concatenating the fragments in order produces the complete reply.
• A finish_reason of "stop" indicates that generation ended normally. The chunk that carries it has an empty delta.
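The key points above translate into a small parser. This sketch takes already-split lines of text and assumes a single choice (index 0); collect_chat_stream is a hypothetical helper name, not part of any SDK.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_chat_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate delta.content from SSE data lines; return (text, finish_reason)."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank lines only separate events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            if choice.get("index", 0) != 0:
                continue  # this sketch tracks only the first choice
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```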

cURL Streaming Call

Python SDK Streaming Call
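As a stand-in sketch, this assumes the official openai Python package (v1 or later) pointed at this service through its base_url argument; the model name comes from the examples above, and iter_text and stream_chat are hypothetical helper names. Text extraction is kept in iter_text so it can be checked without a network call.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the delta.content fragments from SDK chunk objects, skipping empty ones."""
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk with no choices
        content = chunk.choices[0].delta.content
        if content:  # None on role-only and finishing chunks
            yield content


def stream_chat(prompt: str, api_key: str, base_url: str) -> str:
    """Print a streamed reply as it arrives and return the full text (needs `pip install openai`)."""
    from openai import OpenAI  # lazy import: iter_text itself has no SDK dependency

    client = OpenAI(api_key=api_key, base_url=base_url)
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)  # show each fragment immediately
        parts.append(piece)
    print()
    return "".join(parts)
```

A call would look like stream_chat("Write a poem", api_key="YOUR_API_KEY", base_url=...), with base_url taken from the Request URL page.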

Node.js SDK Streaming Call

Usage Information in Streaming Output

By default, streaming responses do not include usage (token usage) information. To receive it, set stream_options:
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {"include_usage": true}
}
Once set, the last chunk will include the complete usage field:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}
}
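To pick up the usage object when include_usage is set, a client can keep the last non-empty usage field it sees before data: [DONE]. A minimal sketch over already-split SSE lines (extract_usage is a hypothetical name); it does not look at the choices array, so it works whether or not the usage chunk also carries choices.

```python
import json
from typing import Iterable, Optional


def extract_usage(lines: Iterable[str]) -> Optional[dict]:
    """Return the usage object from an SSE stream, or None if none was sent."""
    usage: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):  # absent or null on ordinary chunks
            usage = chunk["usage"]
    return usage
```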

Streaming Output for the Responses Interface

The Responses API also supports stream: true. Its format is similar to Chat: it also uses the SSE protocol and ends with data: [DONE].

Notes for Parsing SSE Manually

If you parse the SSE stream yourself instead of using an SDK, keep the following in mind:
1. Read line by line: each event occupies one line that starts with data:.
2. Skip empty lines: in the SSE protocol, empty lines delimit events and carry no data.
3. Detect the end: stop reading when you encounter data: [DONE].
4. Handle disconnections: you can retry after a network interruption, but a stream cannot be resumed from where it stopped; you must send the request again from the beginning.
5. Set timeouts: we recommend an HTTP client timeout of 60 seconds or more, because generating long text can take a while.
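Notes 1 through 4 can be combined into a reader that consumes a binary response line by line and, when the connection drops before data: [DONE], discards the partial text and resends the request. A minimal sketch; open_stream is a hypothetical callable you supply that sends the request and returns a binary file-like response.

```python
import json
from typing import BinaryIO, Callable, Optional


def read_stream(resp: BinaryIO) -> str:
    """Read an SSE response line by line (notes 1-3) and return the concatenated text."""
    parts = []
    for raw in resp:  # binary file-like objects iterate line by line
        line = raw.decode("utf-8").strip()
        if not line:
            continue  # note 2: blank lines only delimit events
        if not line.startswith("data:"):
            continue  # ignore other SSE fields, such as comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts)  # note 3: normal end of stream
        choices = json.loads(payload).get("choices") or []
        if choices:
            parts.append((choices[0].get("delta") or {}).get("content") or "")
    raise ConnectionError("stream closed before data: [DONE]")


def stream_with_retry(open_stream: Callable[[], BinaryIO], attempts: int = 3) -> str:
    """Note 4: a broken stream cannot be resumed, so resend the whole request."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for _ in range(attempts):
        try:
            with open_stream() as resp:
                return read_stream(resp)  # text from a failed attempt is discarded
        except OSError as exc:  # includes ConnectionError and socket timeouts
            last_error = exc
    raise last_error
```

With the standard library, open_stream could be lambda: urllib.request.urlopen(req, timeout=60), which also covers note 5. Only OSError (which includes urllib's URLError and socket timeouts) triggers a retry in this sketch; other failures propagate.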

Streaming vs. Non-Streaming Comparison

Dimension | Non-Streaming (stream: false) | Streaming (stream: true)
--- | --- | ---
Response | Returns the complete result at once | Returns text fragments chunk by chunk
User perception | Longer wait before any text appears | Text appears progressively, so it feels faster
Response object type | chat.completion | chat.completion.chunk
Usage | Included by default | Requires stream_options with include_usage: true
Use cases | Backend batch processing, API chaining | Frontend conversations, real-time interaction
Parsing difficulty | Simple: read the JSON directly | Requires SSE parsing
Modified at 2026-06-30 09:30:34