Scribe V2 Speech Recognition
- Scribe V2 audio file recognition model
- Supports language specification, speaker diarization, audio event tagging, and keyterm biasing
- Processing is asynchronous: use the returned task ID to query the task status
- Recognition results are returned in the "results" field of the task details
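Because the model runs asynchronously, a client typically polls the task until it reaches a terminal state. The following is a minimal sketch of such a polling loop. It makes two assumptions. First, this section does not document the status-query endpoint, so fetch_task is a hypothetical caller-supplied function that performs that query and returns the decoded task detail. Second, the "status" key name is assumed. The status values themselves (pending, processing, completed, failed) come from the Response section.

```python
import time
from typing import Any, Callable, Dict

# Terminal states, taken from the documented status values.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    fetch_task: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> Dict[str, Any]:
    """Poll fetch_task(task_id) until the task is completed or failed.

    fetch_task is a hypothetical callable supplied by the caller; it should
    query the task by ID and return the decoded task detail as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        status = task.get("status")  # key name "status" is an assumption
        if status in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

Once the returned task has status "completed", its "results" field holds the recognition output. The caller should also handle "failed" explicitly rather than assume success.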
Authorizations
All API requests require Bearer token authentication.
Add the following to the request header:
Authorization: Bearer YOUR_API_KEY
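Attaching the header with the Python standard library might look like the sketch below. Several parts are hypothetical. This section does not give the endpoint URL, so SUBMIT_URL is a placeholder. The JSON Content-Type is assumed, and the body keys used in the usage check are assumed as well. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical placeholder: the real endpoint URL is not given in this section.
SUBMIT_URL = "https://api.example.com/v1/audio/transcriptions"


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build request headers carrying the Bearer token."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumed JSON request body
    }


def build_submit_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request to the placeholder URL."""
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=auth_headers(api_key),
        method="POST",
    )
```

Sending the request would be a call such as urllib.request.urlopen(req). That step is left out here because the real URL is unknown.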
Body
scribe-v2: speech recognition model supporting speaker diarization, audio event tagging, and keyterm biasing
"scribe-v2"
Audio file URL to transcribe
Notes:
- Must be an HTTP/HTTPS accessible URL
- The audio file must be directly accessible and readable by the system
"https://samplelib.com/lib/preview/mp3/sample-3s.mp3"
Audio language code
Notes:
- Supports ISO-639-1 or ISO-639-3 codes
- Examples: zh, zho, en, eng
- Auto-detected if not provided
"zh"
Whether to tag audio events such as laughter and applause. Enabled by default.
true
Whether to perform speaker diarization. Enabled by default.
true
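The sketch below assembles a request body from the parameters described so far. It also applies two documented checks locally: the URL must use HTTP/HTTPS, and the language code must look like a 2-letter ISO-639-1 or 3-letter ISO-639-3 code. This section does not list the JSON key names, so "model", "audio_url", "language_code", "tag_audio_events" and "diarize" are assumptions. The language check covers only the shape of the code, not whether the code is actually registered.

```python
import re
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# ISO-639-1 codes have 2 letters, ISO-639-3 codes have 3; shape check only.
_LANG_RE = re.compile(r"[a-z]{2,3}")


def build_body(
    audio_url: str,
    language: Optional[str] = None,
    tag_audio_events: bool = True,  # documented default: enabled
    diarize: bool = True,           # documented default: enabled
) -> Dict[str, Any]:
    """Build a transcription request body; all key names are hypothetical."""
    if urlparse(audio_url).scheme not in ("http", "https"):
        raise ValueError("audio URL must use http or https")
    body: Dict[str, Any] = {
        "model": "scribe-v2",
        "audio_url": audio_url,
        "tag_audio_events": tag_audio_events,
        "diarize": diarize,
    }
    if language is not None:
        if not _LANG_RE.fullmatch(language):
            raise ValueError(f"not an ISO-639-1/3 style code: {language!r}")
        body["language_code"] = language
    # When no language is given, the key is omitted so the service auto-detects it.
    return body
```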
Keyterms: list of bias terms or phrases
Notes:
- Up to 100 entries
- Each entry up to 50 characters
- Used to boost recognition of specific terms or proper nouns
Do not pass this parameter unless necessary.
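If keyterms are passed, a client can enforce the documented limits (at most 100 entries, each at most 50 characters) before sending the request. This sketch assumes the service counts characters the way Python's len does, by Unicode code points. That assumption has not been confirmed.

```python
from typing import List, Sequence

MAX_KEYTERMS = 100       # documented maximum number of entries
MAX_KEYTERM_CHARS = 50   # documented maximum length of each entry


def validate_keyterms(terms: Sequence[str]) -> List[str]:
    """Raise ValueError if the keyterm list exceeds the documented limits."""
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(f"{len(terms)} keyterms given; at most {MAX_KEYTERMS} allowed")
    for term in terms:
        if len(term) > MAX_KEYTERM_CHARS:
            raise ValueError(f"keyterm longer than {MAX_KEYTERM_CHARS} characters: {term!r}")
    return list(terms)
```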
[
"project kickoff",
"quarterly results",
"speech to text"
]
Response
Task created successfully
Task creation timestamp
1757165031
Task ID
"task-unified-1757165031-uyujaw3d"
Name of the model actually used
Specific task type
"audio.generation.task"
Task progress percentage (0-100)
Required range: 0 <= x <= 100
0
Task status
Available options: pending, processing, completed, failed
"pending"
Asynchronous task information
Task output type
Available options: audio
"audio"
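A client may want to check the task-creation response against the documented constraints: the status must be one of the four listed values, and progress must fall between 0 and 100. The sketch below does this. The JSON key names "id", "created", "status" and "progress" are hypothetical, since this section gives only field descriptions. Treating the creation timestamp as Unix seconds is an inference from the example value 1757165031.

```python
from dataclasses import dataclass
from typing import Any, Dict

VALID_STATUSES = {"pending", "processing", "completed", "failed"}


@dataclass
class TaskInfo:
    task_id: str
    created: int   # assumed Unix timestamp in seconds
    status: str
    progress: int  # documented range 0-100


def parse_task(payload: Dict[str, Any]) -> TaskInfo:
    """Validate a task-creation response; key names are hypothetical."""
    status = payload["status"]
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    progress = int(payload.get("progress", 0))
    if not 0 <= progress <= 100:
        raise ValueError(f"progress out of range: {progress}")
    return TaskInfo(
        task_id=payload["id"],
        created=int(payload["created"]),
        status=status,
        progress=progress,
    )
```

The returned task_id is the value to pass when querying the task status later.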