POST /v1/llm/generations
curl --request POST \
  --url https://api.foxapi.cc/v1/llm/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "claude-opus-4-7",
  "prompt": "Summarize the theory of relativity in two sentences.",
  "max_tokens": 64,
  "temperature": 0.3
}
'
{
  "id": "task-llmrouter-1776874481-rj6bs3yb",
  "object": "llm.generation.task",
  "type": "llm",
  "model": "claude-opus-4-7",
  "status": "pending",
  "progress": 0,
  "created": 1776874481,
  "stream": null,
  "results": null,
  "error": null
}

Authorizations

Authorization
string
header
required

All endpoints require Bearer token authentication. Add the following header to every request:

Authorization: Bearer YOUR_API_KEY

YOUR_API_KEY is the API Token (sk-... format).
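
As a rough sketch of the header scheme above, the following Python builds (but does not send) the same request as the curl example, using only the standard library. The helper name build_generation_request is hypothetical and the key shown is a placeholder; only the URL, the two headers, and the body fields come from this page.

```python
import json
import urllib.request

API_URL = "https://api.foxapi.cc/v1/llm/generations"


def build_generation_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token and a JSON body.

    Hypothetical helper: nothing is sent until the caller passes the
    returned object to urllib.request.urlopen.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (requires a real token, so it is left commented out):
# req = build_generation_request("sk-...", {"model": "claude-opus-4-7", "prompt": "Hi"})
# with urllib.request.urlopen(req) as resp:
#     task = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the headers before any network call is made.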

Body

application/json
model
string
required
Example:

"claude-opus-4-7"

prompt
string
required

User prompt, up to 100,000 characters.

Maximum string length: 100000
Example:

"Summarize the theory of relativity in two sentences."

sync
boolean
default: false

Synchronous mode. When true, the endpoint blocks until the upstream completes and returns the full response (or an SSE stream if stream=true is also set). When false, the endpoint returns the task ID immediately; fetch the results via GET /v1/tasks/{task_id} or the SSE endpoint.

Example:

false
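
The interaction between sync and stream can be summarized as a small lookup. The sketch below only restates the behavior described on this page; the function name and the returned labels are hypothetical, chosen for illustration.

```python
def expected_response_kind(sync: bool = False, stream: bool = False) -> str:
    """Return a label for what the submit call is documented to return.

    The labels are illustrative names, not values defined by the API.
    """
    if sync and stream:
        return "sse_stream"            # blocking call that streams SSE chunks
    if sync:
        return "chat_completion"       # full OpenAI ChatCompletion JSON
    if stream:
        return "task_with_stream_url"  # task object whose stream.url is set
    return "task"                      # task object; fetch results later
```

A client could branch on this label before deciding whether to parse a task object, a ChatCompletion, or an event stream.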

stream
boolean
default: false

Whether to stream. When true, the submit response includes stream.url, which points to the SSE subscription path. Streaming chunks are normalized to the OpenAI chat.completion.chunk format.

Example:

false
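
Because chunks follow the OpenAI chat.completion.chunk format, the generated text can be reassembled by concatenating choices[].delta.content across events. The sketch below works on already-received SSE lines and makes two assumptions not stated on this page: that each event arrives as a single "data: {...}" line, and that the stream ends with the OpenAI-style "data: [DONE]" marker. The function name is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from chat.completion.chunk SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumption: OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

The function accepts any iterable of strings, so it can be fed from a file, a test fixture, or a line iterator over an HTTP response.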

max_tokens
integer | null

Generation token limit. Optional.

Required range: x >= 1
Example:

64

temperature
number | null

Sampling temperature, range [0, 2]. Optional.

Required range: 0 <= x <= 2
Example:

0.3

system_prompt
string | null

System instruction, prepended to the conversation context. Optional, up to 10,000 characters.

Maximum string length: 10000
Example:

"You are a terse assistant."

reasoning
boolean | null

Whether to include reasoning tokens. The value is passed through to the upstream, so its exact semantics depend on the upstream model (thinking models such as gemini-2.5-pro may require true).
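
The documented constraints on the body fields can be checked on the client before submitting. This is a minimal sketch, assuming the limits listed above are the only ones that matter; validate_body is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from typing import List


def validate_body(body: dict) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems: List[str] = []
    for field in ("model", "prompt"):
        if not isinstance(body.get(field), str):
            problems.append(f"{field}: required string")
    prompt = body.get("prompt")
    if isinstance(prompt, str) and len(prompt) > 100_000:
        problems.append("prompt: longer than 100,000 characters")
    system_prompt = body.get("system_prompt")
    if isinstance(system_prompt, str) and len(system_prompt) > 10_000:
        problems.append("system_prompt: longer than 10,000 characters")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
            problems.append("max_tokens: must be an integer >= 1")
    temperature = body.get("temperature")
    if temperature is not None:
        if (isinstance(temperature, bool)
                or not isinstance(temperature, (int, float))
                or not 0 <= temperature <= 2):
            problems.append("temperature: must be a number in [0, 2]")
    return problems
```

For the example request at the top of this page, the function returns an empty list.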

Response

Task created (async mode) / full response (sync mode)

Submit response, conforming to the unified task shape. results and error are always null at submit time; they are returned via GET /v1/tasks/{task_id} after the task completes or fails. With sync=true and stream=false, the endpoint instead returns the full OpenAI ChatCompletion JSON directly, which does not follow this shape.
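
Since the same endpoint can return two different JSON shapes, a client may want to tell them apart before reading fields. The sketch below keys on the object field: "llm.generation.task" is documented on this page, while "chat.completion" is the value used by OpenAI ChatCompletion objects and is assumed here to be preserved. The function name is hypothetical.

```python
def classify_submit_response(payload: dict) -> str:
    """Label a decoded JSON response by its 'object' field."""
    obj = payload.get("object")
    if obj == "llm.generation.task":
        return "task"             # unified task shape (async mode)
    if obj == "chat.completion":  # assumption: OpenAI object type is kept
        return "chat_completion"  # sync=true, stream=false
    return "unknown"
```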

id
string
required

Task ID, formatted as task-llmrouter-{timestamp}-{8random}, where {8random} is 8 random characters.

Example:

"task-llmrouter-1776874565-yq3szvcu"

object
enum<string>
required
Available options:
llm.generation.task
Example:

"llm.generation.task"

type
enum<string>
required
Available options:
llm
Example:

"llm"

model
string
required

The model name submitted by the client, echoed verbatim.

Example:

"claude-opus-4-7"

status
enum<string>
required
Available options:
pending
Example:

"pending"

progress
integer
required
Example:

0

created
integer
required
Example:

1776874565

stream
object

Returns {url: ...} when stream=true; null when stream=false.

results
object[] | null

Always null at submit time; returned via GET /v1/tasks/{task_id} after the task completes. results[0] is the full OpenAI ChatCompletion response.

Example:

null
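
Once a task has completed, the generated text sits inside results[0], which follows the OpenAI ChatCompletion layout (choices[0].message.content). A minimal sketch of extracting it, with a hypothetical helper name:

```python
from typing import Optional


def first_completion_text(task: dict) -> Optional[str]:
    """Return the assistant text from results[0], or None if no results yet."""
    results = task.get("results")
    if not results:
        return None  # still null (pending) or an empty list
    completion = results[0]
    return completion["choices"][0]["message"]["content"]
```

Returning None for a pending task lets the caller distinguish "not ready" from an empty completion string.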

error
object

Always null at submit time; returned via GET /v1/tasks/{task_id} when the task fails.

Example:

null
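
In async mode, a client typically polls GET /v1/tasks/{task_id} until the task finishes. This page documents only the pending status, so the sketch below does not rely on status values; it treats a non-null results or error field as the signal that the task is done, which follows from the field descriptions above. The wait_for_task helper, its parameters, and the fetch_task callable (which should perform the GET and return the decoded JSON) are all hypothetical.

```python
import time
from typing import Callable


def wait_for_task(fetch_task: Callable[[], dict],
                  interval: float = 1.0,
                  max_attempts: int = 60) -> dict:
    """Poll until the task carries results or an error, then return it."""
    for _ in range(max_attempts):
        task = fetch_task()
        # results / error stay null until the task completes or fails
        if task.get("results") is not None or task.get("error") is not None:
            return task
        time.sleep(interval)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")
```

Passing the fetch step in as a callable keeps the loop independent of any particular HTTP client and makes it straightforward to test with canned responses.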