Audio Understanding
Use the returned task ID to query the task for the final result.
Authorizations
All endpoints require Bearer Token authentication. Add the following request header:
Authorization: Bearer YOUR_API_KEY
YOUR_API_KEY is the API Token (sk-... format).
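The header above can be built with a small helper. This is a minimal sketch: the function name `build_auth_headers` is hypothetical, and the key value is a placeholder.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return the Authorization header in the Bearer form described above."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {"Authorization": f"Bearer {api_key}"}


# Placeholder key for illustration only; substitute a real API Token.
headers = build_auth_headers("sk-example")
```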
Body
"gemini-2.5-pro"
Audio source. Accepts one of the following two forms:
- Publicly reachable HTTP/HTTPS URL
- data:audio/<type>;base64,<payload> data URI (base64 inline)
Audio format support per family (the specific available models are driven by channel configuration):
- Gemini family (e.g. gemini-*): wav/mp3/aiff/aac/ogg/flac/m4a; total request body (prompt + system + inline files) ≤ 20 MB
Base64 data is not size-validated; oversized payloads may trigger 422.
Minimum length: 1
"https://storage.googleapis.com/cloud-samples-tests/speech/brooklyn.flac"
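For the inline form, a local file's bytes can be wrapped into a data URI and checked against the Gemini-family body limit before submission. The sketch below makes two assumptions: "20 MB" is read as 20 × 1024 × 1024 bytes, and the limit is measured on the encoded text. The helper names are hypothetical.

```python
import base64

# Assumption: "20 MB" is interpreted as 20 MiB, measured on the encoded request text.
MAX_GEMINI_BODY_BYTES = 20 * 1024 * 1024


def audio_to_data_uri(audio_bytes: bytes, subtype: str) -> str:
    """Encode raw audio bytes as data:audio/<type>;base64,<payload>."""
    payload = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:audio/{subtype};base64,{payload}"


def fits_gemini_limit(data_uri: str, other_bytes: int = 0) -> bool:
    """Check the data URI plus prompt/system bytes against the body limit."""
    return len(data_uri.encode("utf-8")) + other_bytes <= MAX_GEMINI_BODY_BYTES


uri = audio_to_data_uri(b"abc", "wav")
```

Since Base64 data is not size-validated by the endpoint, a client-side check like this can catch oversized payloads before they trigger a 422. Base64 inflates the payload by roughly one third, so the raw file must be well under the limit.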
User prompt. When omitted, defaults to 'Please transcribe this audio file', aligning with the transcription scenario.
Maximum length: 100000
"Identify the speakers and emotion in this audio."
Synchronous mode. When true, the endpoint blocks until the upstream request completes and returns the full response (or an SSE stream if stream=true is also set). When false, the endpoint returns the task ID immediately, and results are fetched via GET /v1/tasks/{task_id} or the SSE endpoint.
false
Whether to stream. When true, the Submit response includes stream.url pointing to the SSE subscription path; streaming chunks are unified as the OpenAI chat.completion.chunk format.
false
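The synchronous and streaming flags combine into four delivery modes. The sketch below restates them as a lookup. The parameter name `sync` is hypothetical (the field name for synchronous mode is not shown here), and the returned strings are descriptive labels rather than API values.

```python
def delivery_mode(sync: bool, stream: bool) -> str:
    """Summarize how results are delivered for a given flag combination."""
    if sync and stream:
        # Blocking call that answers with an SSE stream.
        return "sse stream in the response"
    if sync:
        # Blocking call that answers with the full response.
        return "full response in the response"
    if stream:
        # Task ID returned immediately; stream.url points to the SSE path.
        return "task id plus stream.url"
    # Task ID returned immediately; fetch via GET /v1/tasks/{task_id}.
    return "task id, then GET /v1/tasks/{task_id}"


# Both flags default to false in the examples above.
default_mode = delivery_mode(sync=False, stream=False)
```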
Generation token limit. Optional.
x >= 1
256
Sampling temperature, range [0, 2]. Optional.
0 <= x <= 2
System instruction. Optional.
Maximum length: 10000
Whether to include reasoning tokens. Some thinking models require this to be set to true.
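The numeric ranges for the generation token limit (x >= 1) and the sampling temperature (0 <= x <= 2) can be checked client-side before submitting. This is an illustrative sketch: the function and argument names are hypothetical and do not reflect the request's actual field names.

```python
from typing import List, Optional


def validate_generation_options(
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> List[str]:
    """Return a list of range violations; an empty list means both are valid."""
    problems = []
    # Both parameters are optional, so None is always accepted.
    if max_tokens is not None and max_tokens < 1:
        problems.append("generation token limit must be >= 1")
    if temperature is not None and not (0 <= temperature <= 2):
        problems.append("temperature must be within [0, 2]")
    return problems


issues = validate_generation_options(max_tokens=256, temperature=0.7)
```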
Response
Task created
Submit response, conforming to the unified task standard shape. results / error are fixed at null during submit; they are returned via GET /v1/tasks/{task_id} after the task completes or fails.
Task ID, formatted as task-llm-{timestamp}-{8random}.
"task-llm-1776874565-yq3szvcu"
"llm.generation.task"
"llm"
The model name submitted by the client (echoed verbatim)
"gemini-2.5-pro"
"pending"
0
1776874565
Returns {url: ...} when stream=true; null when stream=false.
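When streaming is enabled, the chunks arrive in the OpenAI chat.completion.chunk format. The sketch below concatenates the text deltas from already-received SSE lines. It assumes the usual OpenAI streaming conventions (each event on a `data:` line, a `[DONE]` sentinel at the end), which this page does not confirm. The function name is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(sse_lines: Iterable[str]) -> str:
    """Join delta.content from chat.completion.chunk events into one string."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
text = collect_stream_text(sample)
```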
Fixed at null during submit; returned via GET /v1/tasks/{task_id} after the task completes. results[0] is the full OpenAI ChatCompletion response, with the audio transcription / understanding output in message.content.
null
Fixed at null during submit; returned via GET /v1/tasks/{task_id} when the task fails.
null
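Because results and error stay null until the task completes or fails, a client in asynchronous mode can poll GET /v1/tasks/{task_id} until one of them is populated. The sketch below takes the HTTP call as an injected function so that it makes no assumption about the base URL or HTTP library. `wait_for_task`, `fetch_task`, the polling interval, and the attempt limit are all hypothetical choices.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_task: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Poll until results or error is set, then return message.content."""
    for _ in range(max_attempts):
        # fetch_task is expected to perform GET /v1/tasks/{task_id} and return the parsed JSON.
        task = fetch_task(task_id)
        if task.get("error") is not None:
            raise RuntimeError(f"task {task_id} failed: {task['error']}")
        results = task.get("results")
        if results:
            # results[0] is the OpenAI ChatCompletion response.
            return results[0]["choices"][0]["message"]["content"]
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

In practice `fetch_task` would wrap an authenticated HTTP GET. Injecting it keeps the loop testable with a stub.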