Video Understanding
Use the returned task ID to query the task for the final result.
Authorizations
All endpoints require Bearer Token authentication. Add to the request header:
Authorization: Bearer YOUR_API_KEY
YOUR_API_KEY is the API Token (sk-... format).
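A minimal sketch of building this header in Python. The helper name auth_headers is hypothetical; only the header name, the Bearer scheme, and the sk- prefix come from the text above.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header described above."""
    # The docs describe tokens as "sk-..."; fail early on anything else.
    if not api_key.startswith("sk-"):
        raise ValueError("expected an API token in sk-... format")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dict can be passed as the headers argument of most HTTP clients.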
Body
"gemini-2.5-pro"
User prompt, up to 100,000 characters.
Maximum length: 100000
"What is happening in this video?"
Array of video sources (1–10). Each element accepts one of the following two forms:
- Publicly reachable HTTP/HTTPS URL
- data:video/<type>;base64,<payload> data URI (base64 inline; note that video payloads are large)
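For the inline form, a data URI can be assembled from raw bytes roughly as follows. This is a sketch: the function name video_data_uri and the default subtype "mp4" are assumptions, not part of the API.

```python
import base64


def video_data_uri(payload: bytes, subtype: str = "mp4") -> str:
    """Encode raw video bytes as data:video/<type>;base64,<payload>."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:video/{subtype};base64,{encoded}"
```

Base64 inflates the payload by about one third, so a public URL is usually the lighter option for anything beyond a short clip.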
URL format constraints (based on fal openrouter testing, 2026-05-13):
- Direct video files: the extension must be .mp4/.mpeg/.mpg/.mov/.webm
- YouTube videos: https://www.youtube.com/watch?v=<id> and https://youtu.be/<id> are supported (Gemini family only)
- YouTube Shorts URLs (https://www.youtube.com/shorts/<id>) are not supported; the upstream returns 422. The client can rewrite <id> into the watch?v=<id> form before calling
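The Shorts rewrite mentioned above could be done client-side along these lines. This is a sketch: normalize_youtube_url is a hypothetical helper, and accepting the bare youtube.com host is an assumption beyond the documented www.youtube.com form.

```python
from urllib.parse import urlparse


def normalize_youtube_url(url: str) -> str:
    """Rewrite a Shorts URL to the watch?v=<id> form; return other URLs unchanged."""
    parsed = urlparse(url)
    # Split the path into non-empty segments, e.g. ["shorts", "<id>"].
    parts = [p for p in parsed.path.split("/") if p]
    # Assumption: the bare "youtube.com" host is treated like "www.youtube.com".
    is_youtube = parsed.netloc.lower() in {"www.youtube.com", "youtube.com"}
    if is_youtube and len(parts) >= 2 and parts[0] == "shorts":
        return f"https://www.youtube.com/watch?v={parts[1]}"
    return url
```

Any query string on the Shorts URL is dropped, since only the video ID is carried over.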
Model constraints: whether multiple videos are supported and the maximum count are determined by the upstream behind the selected model; when a model supports only a single video but the request passes multiple, the routing layer returns 422 model_rule_violation (the specific rules are maintained in app/relays/llm_router/model_rules.py). The Gemini family generally supports multiple videos.
Cost note: video is encoded by frame + time; a 30s clip may consume 20K+ tokens. Prefer short clips or low-frame-rate sources.
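A client can catch several of the constraints above before sending a request. The sketch below is an assumption-laden pre-check, not the router's own validation: check_video_sources is a hypothetical helper, and it cannot know the per-model rules (single versus multiple videos, or the Gemini-only YouTube support).

```python
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = (".mp4", ".mpeg", ".mpg", ".mov", ".webm")
YOUTUBE_HOSTS = ("www.youtube.com", "youtu.be")


def check_video_sources(sources: list[str]) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if not 1 <= len(sources) <= 10:
        problems.append(f"expected 1-10 sources, got {len(sources)}")
    for src in sources:
        if src.startswith("data:video/"):
            continue  # inline base64 payload, nothing more to check here
        parsed = urlparse(src)
        if parsed.scheme not in ("http", "https"):
            problems.append(f"not an HTTP/HTTPS URL or data URI: {src}")
        elif parsed.netloc.lower() in YOUTUBE_HOSTS:
            # Shorts URLs are documented as rejected upstream with 422.
            if parsed.path.startswith("/shorts/"):
                problems.append(f"YouTube Shorts URL not supported: {src}")
        elif not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
            problems.append(f"unsupported file extension: {src}")
    return problems
```

Passing this check does not guarantee acceptance; the routing layer can still return 422 model_rule_violation for the selected model.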
1 - 10 elements
[
"https://storage.googleapis.com/cloud-samples-data/video/animals.mp4"
]
Synchronous mode (see llm-text schema).
false
Whether to stream (see llm-text schema).
false
Generation token limit. Optional.
x >= 1
128
Sampling temperature, range [0, 2]. Optional.
0 <= x <= 2
System instruction. Optional.
Maximum length: 10000
Whether to include reasoning tokens. Thinking models like gemini-2.5-pro may require this to be set to true.
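The two limits stated in prose above (a prompt of up to 100,000 characters and a temperature in [0, 2]) can be checked before submitting. This is only a sketch: check_generation_params and its argument names are hypothetical and are not the API's field names.

```python
from typing import Optional

MAX_PROMPT_CHARS = 100_000


def check_generation_params(prompt: str, temperature: Optional[float] = None) -> list[str]:
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt has {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}")
    # Temperature is optional; only check it when the caller supplies one.
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append(f"temperature {temperature} is outside [0, 2]")
    return problems
```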
Response
Task created (async mode) / full response (sync mode)
Submit response, conforming to the unified task standard shape. results / error are fixed at null during submit; they are returned via GET /v1/tasks/{task_id} after the task completes or fails. In sync=true, stream=false mode, the endpoint directly returns the full OpenAI ChatCompletion JSON.
Task ID, formatted as task-llmrouter-{timestamp}-{8random}.
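The ID format can be checked and split with a regular expression. This sketch assumes the timestamp is a run of digits and the random suffix is eight lowercase letters or digits, which matches the documented example but is not stated explicitly; parse_task_id is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: digits for the timestamp, 8 lowercase alphanumerics for the suffix.
TASK_ID_RE = re.compile(r"^task-llmrouter-(\d+)-([a-z0-9]{8})$")


def parse_task_id(task_id: str) -> Optional[tuple[int, str]]:
    """Return (timestamp, suffix) for a well-formed task ID, else None."""
    match = TASK_ID_RE.match(task_id)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```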
"task-llmrouter-1776874565-yq3szvcu"
llm.generation.task "llm.generation.task"
llm "llm"
The model name submitted by the client (echoed verbatim)
"gemini-2.5-pro"
pending "pending"
0
1776874565
Returns {url: ...} when stream=true; null when stream=false.
Fixed at null during submit; returned via GET /v1/tasks/{task_id} after the task completes — results[0] is the full OpenAI ChatCompletion response.
null
Fixed at null during submit; returned via GET /v1/tasks/{task_id} when the task fails.
null
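Since results and error stay null until the task finishes, a client typically polls GET /v1/tasks/{task_id} until one of them is set. The following is a sketch under stated assumptions: BASE_URL is a placeholder host, results and error are assumed to be top-level keys of the task JSON, and the polling interval and limit are arbitrary.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one


def fetch_task(task_id: str, api_key: str) -> dict:
    """GET /v1/tasks/{task_id} with Bearer authentication and decode the JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_id, api_key, fetch=fetch_task, interval=2.0, max_polls=60):
    """Poll until results or error is non-null; return results[0] on success."""
    for _ in range(max_polls):
        task = fetch(task_id, api_key)
        if task.get("error") is not None:
            raise RuntimeError(f"task failed: {task['error']}")
        if task.get("results") is not None:
            # Documented as the full OpenAI ChatCompletion response.
            return task["results"][0]
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

The fetch parameter exists so the loop can be exercised with a stub instead of a live endpoint.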