Website
  1. Platform-related
  • Start
    • Product Intrduction
    • Quick to use
    • Using Nexhina in AI Coding Tools
  • API integration explanation
    • HTTP Status Codes
    • Getting an API Key
    • Authentication
    • Quick Start
    • Streaming Output Guide
    • Request URL
  • API Endpoints
    • Chat
      • Chat Completion
    • Models
      • List Available Models
    • Responses
      • Responses API
    • Embeddings
      • Text Embedding
    • Images
      • Generate Image
    • Audio
      • Text-to-Speech (TTS)
      • Speech-to-Text (STT)
      • Audio Translation
    • Video
      • Generate Video
    • Moderation
      • Content Moderation
    • Rerank
      • Rerank
  • Platform-related
    • Platform agreement
    • Privacy Policy
    • General Questions
  • Schemas
    • ChatRequest
    • Log
    • ChatMessage
    • ToolCall
    • User
    • FunctionDefinition
    • Channel
    • ToolDefinition
    • Token
    • ChatCompletionRequest
    • Redemption
    • ChatCompletionChoice
    • ChatCompletionResponse
    • ChatCompletionChunk
    • ResponseInputText
    • ResponseRequest
    • ResponseOutputText
    • ResponseOutputMessage
    • ResponseObject
    • EmbeddingRequest
    • EmbeddingData
    • EmbeddingResponse
    • ImageGenerationRequest
    • ImageData
    • ImageUsageInputTokensDetails
    • ImageUsage
    • ImageGenerationResponse
    • SpeechRequest
    • TranscriptionRequest
    • TranslationRequest
    • TranscriptionResponse
    • TranslationResponse
    • VideoGenerationRequest
    • VideoData
    • VideoGenerationResponse
    • ModerationRequest
    • ModerationCategory
    • ModerationResult
    • ModerationResponse
    • RerankRequest
    • RerankResult
    • RerankResponse
  1. Platform-related

General Questions

Q: What's the difference between Nexhina and OpenAI's official service?
A: Nexhina is an OpenAI-compatible gateway, with access to top-tier international models (GPT-4o, Claude Sonnet 4, Gemini, etc.) and more flexible pricing. The interface format is fully compatible, so the OpenAI SDK can be used directly.
Q: Which programming languages are supported?
A: Any language that supports HTTP can call the API. Python and Node.js have official SDKs for the most convenient experience. For other languages (Go/Java/PHP/Rust), you can use an HTTP client to make requests directly.
Q: Is there a free trial?
A: Contact the administrator to get a test Key, which usually comes with an initial quota.

Call Issues#

Q: What should I do if I get a context_length_exceeded error?
A: The input is too long. Streamline the contents of the messages, or switch to a model with a longer context (e.g., gpt-4.1 supports 1M).
Q: What should I do if I get a model_not_found error?
A: The model parameter is incorrect. Call GET /v1/models to see the list of available models, and pay attention to case sensitivity.
Q: What should I do if the streaming output is interrupted?
A: A network issue caused the SSE connection to drop. There is no way to resume, so you'll need to initiate the request again. We recommend implementing concatenation logic on the client and re-requesting after a stream break.
Q: Why is the reply content truncated?
A: It may be that max_tokens is set too small, or the model output has reached its limit. Check finish_reason; if it is length, it means the output was truncated. Increase max_tokens.
Q: The quality of Chinese replies is poor. What should I do?
A: Try explicitly asking for a "reply in Chinese" in the system message, or use a model with stronger multilingual capabilities (Claude Sonnet 4 / GPT-4o).

Billing Questions#

Q: How many Tokens does a request consume?
A: Look at the usage field in the response. The total number of input + output Tokens is the consumption.
Q: How are Tokens counted for streaming requests?
A: Set stream_options: {"include_usage": true}. The last chunk will include usage. Non-streaming requests return usage by default.
Q: Is billing the same as OpenAI's official service?
A: The billing logic is the same (per Token), but the multipliers are different. Nexhina's pricing is more competitive than going direct. See the backend configuration for specific multipliers.

Feature Questions#

Q: Is Function Calling supported?
A: Yes. Models such as GPT-4o / Claude Sonnet 4 / Gemini all support it, and the usage is exactly the same as OpenAI.
Q: Is image input (Vision) supported?
A: Yes. Use multimodal models such as gpt-4o / claude-sonnet-4, and pass an image URL or Base64 in content. See the Multimodal Calling Guide for details.
Q: Is JSON output supported?
A: Yes. Set response_format: {"type": "json_object"}. See the JSON Mode Guide for details.
Q: Can models be fine-tuned?
A: Not currently supported. You can use the pre-trained models provided by the platform directly, and achieve customization through prompt engineering and few-shot learning.
Q: How long does video generation take?
A: Usually 30 seconds to several minutes, depending on the model and video length. See the Async Tasks Guide for details.

Deployment Questions#

Q: How do I configure CORS?
A: If the frontend calls the API directly, you'll encounter CORS issues. We recommend going through a backend proxy, or contacting the administrator to configure a CORS whitelist.
Q: Can the API be called from an internal network?
A: Nexhina is deployed on the public network, and the internal network needs to be able to access the external network. If it is completely isolated, a private deployment is required.
Q: Is private deployment supported?
A: Contact our sales team. We support private deployment in customer data centers.
Q: How do I view API call logs?
A: In the admin console → Logs page, you can filter by Key / model / time.
Modified at 2026-06-30 09:28:03
Previous
Privacy Policy
Next
ChatRequest
Built with