Scribe V2 Speech Recognition

Authorizations

Authorization

string

header

required

All API requests require Bearer token authentication

Add to request header:

Authorization: Bearer YOUR_API_KEY
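
As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name build_headers is hypothetical and not part of the API; the Content-Type value follows the application/json body type listed under Body.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying the Bearer token (hypothetical helper)."""
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Format taken from the reference: "Authorization: Bearer YOUR_API_KEY"
        "Authorization": f"Bearer {api_key.strip()}",
        # The request body is documented as application/json
        "Content-Type": "application/json",
    }
```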

Body

application/json

model

string

default:scribe-v2

required

scribe-v2: Speech recognition model that supports speaker diarization, audio event tagging, and keyterms

Example:

"scribe-v2"

audio_url

string

required

Audio file URL to transcribe

Notes:

Must be an HTTP/HTTPS accessible URL
The audio file must be directly accessible and readable by the system

Example:

"https://samplelib.com/lib/preview/mp3/sample-3s.mp3"
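
A client can catch obviously unusable values before sending the request. The sketch below only checks that the string parses as an HTTP or HTTPS URL with a host; it cannot tell whether the file is actually reachable by the service. The name is_http_url is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Return True if url uses http/https and names a host (shape check only)."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For instance, is_http_url("https://samplelib.com/lib/preview/mp3/sample-3s.mp3") is true, while a bare filename or an ftp:// URL is rejected.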

language_code

string | null

Audio language code

Notes:

Supports ISO-639-1 or ISO-639-3 codes
Examples: zh / zho / en / eng
Auto-detected if not provided

Example:

"zh"
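
The following sketch shows one way to sanity-check a language code on the client. It assumes that a two- or three-letter lowercase string is an acceptable shape for ISO-639-1 and ISO-639-3 codes; it does not verify that the code exists in either registry, and lowercasing the input is a choice made here, not documented API behavior. The name normalize_language_code is hypothetical.

```python
import re
from typing import Optional

# Two letters (ISO-639-1 shape) or three letters (ISO-639-3 shape)
_CODE_SHAPE = re.compile(r"[a-z]{2,3}")


def normalize_language_code(code: Optional[str]) -> Optional[str]:
    """Return a cleaned code, or None to let the service auto-detect."""
    if code is None:
        return None
    cleaned = code.strip().lower()
    if not _CODE_SHAPE.fullmatch(cleaned):
        raise ValueError(f"not a 2- or 3-letter language code: {code!r}")
    return cleaned
```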

tag_audio_events

boolean

default:true

Whether to tag audio events such as laughter and applause. Enabled by default.

Example:

true

diarize

boolean

default:true

Whether to perform speaker diarization. Enabled by default.

Example:

true

keyterms

string[] | null

List of terms or phrases to bias recognition toward

Notes:

Up to 100 entries
Each entry up to 50 characters
Used to boost recognition of specific terms or proper nouns

Do not pass this parameter unless necessary.

Example:

[
  "project kickoff",
  "quarterly results",
  "speech to text"
]
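
Putting the body parameters together, the sketch below assembles a JSON-serializable request body and enforces the documented keyterms limits (up to 100 entries, each up to 50 characters) before anything is sent. The function name build_body and the constant names are hypothetical; the endpoint URL is not given in this reference, so the example stops at building the payload.

```python
import json
from typing import Optional, Sequence

MAX_KEYTERMS = 100        # documented maximum number of entries
MAX_KEYTERM_LENGTH = 50   # documented maximum characters per entry


def build_body(
    audio_url: str,
    language_code: Optional[str] = None,
    tag_audio_events: bool = True,
    diarize: bool = True,
    keyterms: Optional[Sequence[str]] = None,
    model: str = "scribe-v2",
) -> dict:
    """Assemble the request body, omitting optional fields left unset."""
    body = {
        "model": model,
        "audio_url": audio_url,
        "tag_audio_events": tag_audio_events,
        "diarize": diarize,
    }
    if language_code is not None:
        body["language_code"] = language_code
    # keyterms is only included when supplied, per "do not pass unless necessary"
    if keyterms:
        if len(keyterms) > MAX_KEYTERMS:
            raise ValueError(f"keyterms allows at most {MAX_KEYTERMS} entries")
        for term in keyterms:
            if len(term) > MAX_KEYTERM_LENGTH:
                raise ValueError(f"keyterm longer than {MAX_KEYTERM_LENGTH} characters: {term!r}")
        body["keyterms"] = list(keyterms)
    return body


if __name__ == "__main__":
    example = build_body(
        "https://samplelib.com/lib/preview/mp3/sample-3s.mp3",
        language_code="zh",
        keyterms=["project kickoff", "quarterly results", "speech to text"],
    )
    print(json.dumps(example, indent=2))
```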

Response

Task created successfully

created

integer

Task creation timestamp

Example:

1757165031
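
The example value looks like a Unix timestamp in seconds; that interpretation is an assumption, since the reference does not state the unit. Under that assumption, a sketch of converting it to a timezone-aware datetime (created_to_datetime is a hypothetical helper):

```python
from datetime import datetime, timezone


def created_to_datetime(created: int) -> datetime:
    """Interpret `created` as Unix seconds (assumed) and return a UTC datetime."""
    return datetime.fromtimestamp(created, tz=timezone.utc)
```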

id

string

Task ID

Example:

"task-unified-1757165031-uyujaw3d"

model

string

Actual model name used

object

enum<string>

Specific task type

Available options:

audio.generation.task

progress

integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100

Example:

0

status

enum<string>

Task status

Available options:

pending, processing, completed, failed

Example:

"pending"
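
Because the response describes an asynchronous task, a client typically needs to decide whether a task is still running. The sketch below assumes that completed and failed are the final states and that pending and processing mean work is still in progress; the reference lists the four values but does not spell out these transitions. The helper name is_finished is hypothetical.

```python
KNOWN_STATUSES = {"pending", "processing", "completed", "failed"}
# Assumption: these two statuses will not change once reached
TERMINAL_STATUSES = {"completed", "failed"}


def is_finished(task: dict) -> bool:
    """Return True when the task's status is a final state."""
    status = task.get("status")
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected task status: {status!r}")
    return status in TERMINAL_STATUSES
```

A polling loop could call is_finished on each fetched task object and stop once it returns True, then branch on whether the status is completed or failed.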

task_info

object

Asynchronous task info

type

enum<string>

Task output type

Available options:

audio

Example:

"audio"