Mixed Understanding

Authorizations

Authorization

string

header

required

All endpoints require Bearer Token authentication. Add to the request header:

Authorization: Bearer YOUR_API_KEY

YOUR_API_KEY is the API Token (sk-... format).
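As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name `auth_headers` is hypothetical, and the `Content-Type` value is taken from the `application/json` body type listed for this endpoint.

```python
def auth_headers(api_key: str) -> dict:
    """Build request headers for Bearer Token authentication."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        # Header format from the docs: "Authorization: Bearer YOUR_API_KEY"
        "Authorization": f"Bearer {api_key}",
        # The request body for this endpoint is application/json
        "Content-Type": "application/json",
    }


headers = auth_headers("sk-example")
```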

Body

application/json

The request body uses the messages[] form (OpenAI Chat compatible). Apart from the fields listed below, other OpenAI-compatible parameters (temperature, top_p, stop, frequency_penalty, etc.) follow the OpenAI Chat spec.

model

string

required

Model name. See Get Model List for the available models.

Example:

"claude-opus-4-7"

messages

object[]

required

Messages array in OpenAI Chat format. messages[*].content may be a string or an array; each array element's type must be one of text, image_url, video_url, audio_url, file_url. If the model does not support a given type, the request returns 422 model_not_support_capability.

Typical content shapes:

Plain text (string content, simplest form):

[{"role": "user", "content": "count 1 to 3"}]

Plain text (array content, the unified format when mixing with multimodal):

[{"role": "user", "content": [{"type": "text", "text": "count 1 to 3"}]}]

text + image:

[{"role": "user", "content": [
  {"type": "text", "text": "describe this image"},
  {"type": "image_url", "image_url": {"url": "https://example.com/x.png"}}
]}]

text + video:

[{"role": "user", "content": [
  {"type": "text", "text": "summarize this video"},
  {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
]}]

text + file:

[{"role": "user", "content": [
  {"type": "text", "text": "extract key points"},
  {"type": "file_url", "file_url": {"url": "https://example.com/doc.pdf"}}
]}]

Multi-turn conversation (system + multiple user/assistant turns):

[
  {"role": "system", "content": "You are a terse assistant."},
  {"role": "user", "content": "1+1?"},
  {"role": "assistant", "content": "2"},
  {"role": "user", "content": "3+3?"}
]

Mixed attachments of all types (a single user message containing 2 each of image_url / video_url / audio_url / file_url):

[{"role": "user", "content": [
  {"type": "text", "text": "Summarize the key information from the following images, videos, audio, and documents"},
  {"type": "image_url", "image_url": {"url": "https://example.com/image-1.png"}},
  {"type": "image_url", "image_url": {"url": "https://example.com/image-2.jpg"}},
  {"type": "video_url", "video_url": {"url": "https://example.com/clip-1.mp4"}},
  {"type": "video_url", "video_url": {"url": "https://example.com/clip-2.mp4"}},
  {"type": "audio_url", "audio_url": {"url": "https://example.com/audio-1.mp3"}},
  {"type": "audio_url", "audio_url": {"url": "https://example.com/audio-2.wav"}},
  {"type": "file_url", "file_url": {"url": "https://example.com/doc-1.pdf"}},
  {"type": "file_url", "file_url": {"url": "https://example.com/doc-2.docx"}}
]}]
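The content shapes above all follow one pattern: each attachment part carries its URL under a key named after its own type. A small sketch of helpers that build such messages might look like this; the helper names (`text_part`, `url_part`, `user_message`) are hypothetical and not part of the API.

```python
ATTACHMENT_TYPES = {"image_url", "video_url", "audio_url", "file_url"}


def text_part(text):
    """Build a text element for an array-form content."""
    return {"type": "text", "text": text}


def url_part(part_type, url):
    """Build an attachment element; the URL sits under a key equal to the type."""
    if part_type not in ATTACHMENT_TYPES:
        raise ValueError(f"unsupported attachment type: {part_type}")
    return {"type": part_type, part_type: {"url": url}}


def user_message(text, attachments=()):
    """Build one user message with a text part followed by (type, url) attachments."""
    content = [text_part(text)]
    content.extend(url_part(t, u) for t, u in attachments)
    return {"role": "user", "content": content}


messages = [
    user_message(
        "describe this image",
        [("image_url", "https://example.com/x.png")],
    )
]
```

Validating the type on the client side only catches typos; whether a given model accepts a type is still decided by the server, which returns 422 model_not_support_capability otherwise.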

Example:

[
  { "role": "user", "content": "count 1 to 3" }
]

stream

boolean

default:false

Whether to stream the response.

Behavior differences:

Value	Submit response `stream` field	SSE endpoint
`false`	`null`	Not available
`true`	`{"url": "/v1/llm/generations/{task_id}/stream"}`	Available; task.data also accumulates the full response

Example:

false

max_tokens

integer | null

Maximum number of tokens to generate. Constraints vary by model family: required for claude-*; usually must be ≥ 16 for gpt-*; optional for gemini-*.

Example:

64
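The family-level constraints on max_tokens can be checked before submitting. The sketch below is a hypothetical client-side helper (`max_tokens_issues` is not part of the API); it treats the gpt-* rule as a warning only, because the documentation describes it as "usually" applying.

```python
from typing import List, Optional


def max_tokens_issues(model: str, max_tokens: Optional[int]) -> List[str]:
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    if model.startswith("claude-") and max_tokens is None:
        issues.append("max_tokens is required for claude-* models")
    if model.startswith("gpt-") and max_tokens is not None and max_tokens < 16:
        issues.append("gpt-* models usually need max_tokens >= 16")
    # gemini-* models: max_tokens is optional, so there is nothing to check.
    return issues
```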

temperature

number | null

Sampling temperature.

top_p

number | null

Nucleus sampling.

stop

Stop sequences.

Response

Task created

Submit response, conforming to the unified task shape. results and error are always null at submit time; they are returned by GET /v1/tasks/{task_id} after the task completes or fails.

string

required

Task ID, in the format task-llm-{timestamp}-{8random}. Use it to query GET /v1/tasks/{task_id} or to subscribe to the SSE stream at GET /v1/llm/generations/{task_id}/stream.
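For illustration, a task ID in this format can be split into its timestamp and random suffix with a regular expression. The pattern below is an assumption: it takes the timestamp to be decimal digits and the suffix to be 8 lowercase letters or digits, which matches the documented example but is not a stated guarantee.

```python
import re

# Assumed pattern: task-llm-{digits}-{8 lowercase alphanumerics}
TASK_ID_RE = re.compile(r"^task-llm-(\d+)-([0-9a-z]{8})$")


def parse_task_id(task_id):
    """Return (timestamp, suffix) or raise ValueError if the ID does not match."""
    match = TASK_ID_RE.match(task_id)
    if match is None:
        raise ValueError(f"unexpected task ID format: {task_id!r}")
    return int(match.group(1)), match.group(2)


timestamp, suffix = parse_task_id("task-llm-1776874565-yq3szvcu")
```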

Example:

"task-llm-1776874565-yq3szvcu"

object

enum<string>

required

Object type, fixed at llm.generation.task

Available options:

llm.generation.task

Example:

"llm.generation.task"

type

enum<string>

required

Media type, fixed at llm

Available options:

llm

Example:

"llm"

model

string

required

The model name submitted by the client (echoed verbatim)

Example:

"claude-opus-4-7"

status

enum<string>

required

Task status, fixed at pending during submit

Available options:

pending

Example:

"pending"

progress

integer

required

Progress 0-100, fixed at 0 during submit

Example:

0

created

integer

required

Creation time (Unix seconds)

Example:

1776874565

stream

object

{"url": ...} when stream=true; null when stream=false. The client uses this field to decide whether to connect to the SSE endpoint.
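A client might turn this field into an absolute SSE URL as sketched below. The helper name `sse_url` and the base URL are hypothetical; the sketch also assumes the returned url is a path relative to the API host with the task ID already filled in.

```python
from typing import Optional
from urllib.parse import urljoin


def sse_url(submit_response: dict, base_url: str) -> Optional[str]:
    """Return the absolute SSE URL, or None when the task was submitted with stream=false."""
    stream = submit_response.get("stream")
    if not stream:
        # stream is null: no SSE endpoint is available, poll the task instead.
        return None
    return urljoin(base_url, stream["url"])


# Hypothetical submit responses, trimmed to the relevant field.
streaming = {"stream": {"url": "/v1/llm/generations/task-llm-1776874565-yq3szvcu/stream"}}
non_streaming = {"stream": None}
```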

results

object[] | null

Fixed at null during submit. After the task completes, GET /v1/tasks/{task_id} returns results, where results[0] is the full OpenAI ChatCompletion response.

Known limitation: for thinking models, the reasoning content (reasoning_content) appears only in the delta of the SSE stream and is not accumulated into results[0].message.content.

Example:

null
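Because reasoning_content is only available in the stream, a client that needs it has to accumulate it from the deltas itself. The sketch below assumes each SSE event has already been parsed into a dict shaped like an OpenAI chat completion chunk (`choices[].delta`); that chunk shape and the function name `accumulate_stream` are assumptions, not documented here.

```python
def accumulate_stream(chunks):
    """Concatenate content and reasoning_content across parsed stream chunks."""
    content_parts = []
    reasoning_parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                content_parts.append(delta["content"])
            if delta.get("reasoning_content"):
                reasoning_parts.append(delta["reasoning_content"])
    return "".join(content_parts), "".join(reasoning_parts)


# Hypothetical chunks, for illustration only.
example_chunks = [
    {"choices": [{"delta": {"reasoning_content": "Add the numbers. "}}]},
    {"choices": [{"delta": {"content": "2"}}]},
    {"choices": [{"delta": {}}]},
]
content, reasoning = accumulate_stream(example_chunks)
```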

error

object

Fixed at null during submit; returned via GET /v1/tasks/{task_id} when the task fails

Example:

null
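Since results and error are both null at submit time, a non-streaming client has to poll GET /v1/tasks/{task_id} until one of them is populated. The loop below is a sketch of that idea; `fetch_task` is a hypothetical callable that performs the GET and returns the parsed JSON, and the interval and timeout values are arbitrary.

```python
import time


def wait_for_task(fetch_task, task_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll until the task has results or an error, then return the task dict."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        # Completed tasks carry results; failed tasks carry error.
        if task.get("results") is not None or task.get("error") is not None:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Passing `fetch_task` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.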