Video Understanding

Authorizations

Authorization

string

header

required

All endpoints require Bearer Token authentication. Add to the request header:

Authorization: Bearer YOUR_API_KEY

YOUR_API_KEY is the API Token (sk-... format).
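As a minimal sketch of the header described above, the helper below builds the request headers for a JSON call. The function name build_headers is hypothetical, and the Content-Type value is assumed from the application/json body type of this endpoint.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return headers for a JSON request using Bearer Token authentication.

    build_headers is an illustrative helper, not part of any official SDK.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the body is sent as JSON, per the Body section.
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed to whichever HTTP client you use.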

Body

application/json

model

string

required

Model name. See Get Model List for the available models.

Example:

"gemini-2.5-pro"

prompt

string

required

User prompt, up to 100,000 characters.

Maximum string length: 100000

Example:

"What is happening in this video?"

video_urls

string[]

required

Array of video sources (1–10). Each element accepts one of the following two forms:

Publicly reachable HTTP/HTTPS URL
data:video/<type>;base64,<payload> data URI (base64 inline; note that video payloads are large)
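The data URI form can be produced with the standard library. This is a sketch only: the helper name video_bytes_to_data_uri is hypothetical, and the default subtype "mp4" is an assumption that you should replace with the actual type of your file.

```python
import base64


def video_bytes_to_data_uri(data: bytes, subtype: str = "mp4") -> str:
    """Encode raw video bytes as a data:video/<type>;base64,<payload> URI.

    Illustrative helper; base64 inflates the payload by roughly one third,
    so inline video is best kept to small clips.
    """
    payload = base64.b64encode(data).decode("ascii")
    return f"data:video/{subtype};base64,{payload}"
```

For a file on disk, read it in binary mode and pass the bytes, for example video_bytes_to_data_uri(open("clip.mp4", "rb").read()).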

URL format constraints (based on fal openrouter testing, 2026-05-13):

Direct video files: the extension must be .mp4 / .mpeg / .mpg / .mov / .webm
YouTube videos: https://www.youtube.com/watch?v=<id> and https://youtu.be/<id> are supported (Gemini family only)
YouTube Shorts URLs (https://www.youtube.com/shorts/<id>) are not supported; the upstream returns 422. Rewrite the URL to the watch?v=<id> form on the client before calling
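The URL constraints above can be checked on the client before submitting. The sketch below rewrites Shorts URLs to the watch?v=<id> form and tests the file extension of direct video links. The function names are hypothetical, and only the www.youtube.com host listed above is handled.

```python
from urllib.parse import urlparse

# Extensions listed above for direct video files.
ALLOWED_EXTENSIONS = (".mp4", ".mpeg", ".mpg", ".mov", ".webm")


def normalize_video_url(url: str) -> str:
    """Rewrite a YouTube Shorts URL to the supported watch?v=<id> form.

    Any other URL is returned unchanged.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "www.youtube.com" and parsed.path.startswith("/shorts/"):
        video_id = parsed.path[len("/shorts/"):].split("/")[0]
        if not video_id:
            raise ValueError(f"no video id in Shorts URL: {url}")
        return f"https://www.youtube.com/watch?v={video_id}"
    return url


def has_supported_extension(url: str) -> bool:
    """Return True if the URL path ends with one of the allowed extensions."""
    return urlparse(url).path.lower().endswith(ALLOWED_EXTENSIONS)
```

Applying normalize_video_url to every element of video_urls before the request avoids the 422 that Shorts URLs would otherwise trigger.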

Model constraints: whether a model accepts multiple videos, and how many, is determined by the upstream behind the selected model. If a request passes multiple videos to a model that supports only one, the routing layer returns 422 model_rule_violation (the specific rules are maintained in app/relays/llm_router/model_rules.py). The Gemini family generally supports multiple videos.

Cost note: token usage for video scales with frame count and duration; a 30s clip may consume 20K+ tokens. Prefer short clips or low-frame-rate sources.

Required array length: 1 - 10 elements

Example:

[
  "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
]

sync

boolean

default:false

Synchronous mode (see llm-text schema).

Example:

false

stream

boolean

default:false

Whether to stream (see llm-text schema).

Example:

false

max_tokens

integer | null

Generation token limit. Optional.

Required range: x >= 1

Example:

128

temperature

number | null

Sampling temperature, range [0, 2]. Optional.

Required range: 0 <= x <= 2

system_prompt

string | null

System instruction. Optional.

Maximum string length: 10000

reasoning

boolean | null

Whether to include reasoning tokens. Thinking models like gemini-2.5-pro may require this to be set to true.
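To tie the body parameters together, here is a sketch of a client-side builder that enforces the documented limits before sending. The function name build_request_body is hypothetical, and omitting unset optional fields (rather than sending null) is an assumption.

```python
from typing import Any, Optional


def build_request_body(
    model: str,
    prompt: str,
    video_urls: list[str],
    *,
    sync: bool = False,
    stream: bool = False,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    system_prompt: Optional[str] = None,
    reasoning: Optional[bool] = None,
) -> dict[str, Any]:
    """Validate the documented limits and return a JSON-serializable body."""
    if len(prompt) > 100_000:
        raise ValueError("prompt exceeds 100,000 characters")
    if not 1 <= len(video_urls) <= 10:
        raise ValueError("video_urls must contain 1 to 10 elements")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be >= 1")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if system_prompt is not None and len(system_prompt) > 10_000:
        raise ValueError("system_prompt exceeds 10,000 characters")

    body: dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "video_urls": list(video_urls),
        "sync": sync,
        "stream": stream,
    }
    # Assumption: optional fields left unset are omitted instead of sent as null.
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "system_prompt": system_prompt,
        "reasoning": reasoning,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Validating locally is only a convenience; the server remains the authority on what it accepts.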

Response

Task created (async mode) / full response (sync mode)

The submit response conforms to the unified task standard shape. results and error are always null at submit time; they are populated in the response to GET /v1/tasks/{task_id} after the task completes or fails. With sync=true and stream=false, the endpoint instead returns the full OpenAI ChatCompletion JSON directly.
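Because the same endpoint can return two different shapes, a client may want to tell them apart before reading fields. The sketch below checks the object field for the task value documented here and reads the text from a ChatCompletion using the standard OpenAI layout (choices[0].message.content). Both function names are hypothetical.

```python
from typing import Any

TASK_OBJECT = "llm.generation.task"


def is_task_envelope(body: dict[str, Any]) -> bool:
    """Return True if the response is the async task shape."""
    return body.get("object") == TASK_OBJECT


def completion_text(completion: dict[str, Any]) -> str:
    """Read the assistant text from an OpenAI ChatCompletion object."""
    return completion["choices"][0]["message"]["content"]
```

In async mode the caller would keep the task and poll for it; in sync mode completion_text can be applied to the response body directly.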

string

required

Task ID, formatted as task-llmrouter-{timestamp}-{8random}, where {8random} is an 8-character random suffix.

Example:

"task-llmrouter-1776874565-yq3szvcu"

object

enum<string>

required

Available options:

llm.generation.task

Example:

"llm.generation.task"

type

enum<string>

required

Available options:

llm

Example:

"llm"

model

string

required

The model name submitted by the client, echoed verbatim.

Example:

"gemini-2.5-pro"

status

enum<string>

required

Available options:

pending

Example:

"pending"

progress

integer

required

Example:

0

created

integer

required

Example:

1776874565

stream

object

Returns {url: ...} when stream=true; null when stream=false.

results

object[] | null

Always null at submit time. After the task completes, GET /v1/tasks/{task_id} returns results, where results[0] is the full OpenAI ChatCompletion response.

Example:

null

error

object

Fixed at null during submit; returned via GET /v1/tasks/{task_id} when the task fails.

Example:

null
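Since results and error stay null until the task finishes, an async client can poll until one of them is populated. The sketch below takes the HTTP call as an injected function so that it stays independent of any particular client library. The name wait_for_task, the polling interval, and the timeout are all assumptions, not documented values.

```python
import time
from typing import Any, Callable


def wait_for_task(
    fetch_task: Callable[[], dict[str, Any]],
    *,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the task has results or an error, then return results[0].

    fetch_task should perform GET /v1/tasks/{task_id} and return the parsed
    JSON body. interval and timeout are illustrative defaults.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if task.get("error") is not None:
            raise RuntimeError(f"task failed: {task['error']}")
        results = task.get("results")
        if results:
            # results[0] is the full OpenAI ChatCompletion response.
            return results[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

Keying on results and error rather than on status avoids depending on status values that this page does not list.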