Vidu Q3 Video Generation

Authorizations

Authorization

string

header

required

All endpoints require Bearer Token authentication

Add the following to your request headers:

Authorization: Bearer YOUR_API_KEY
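
A minimal Python sketch of the header above. The helper name build_headers is illustrative, and the Content-Type entry is inferred from the application/json body type listed below rather than from a stated requirement.

```python
def build_headers(api_key: str) -> dict:
    """Return request headers carrying the Bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the JSON body type implies this header.
        "Content-Type": "application/json",
    }
```

Reading the key from an environment variable instead of hard-coding it keeps it out of source control.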

Body

application/json

model

string

default:vidu-q3-pro

required

vidu-q3-pro: Standard version. Supports text-to-video, image-to-video, first-last-frame transition, and reference-to-video.
vidu-q3-turbo: Accelerated version. Supports text-to-video, image-to-video, and first-last-frame transition. Does not support reference-to-video.

Examples:

"vidu-q3-pro"

"vidu-q3-turbo"

generation_type

enum<string> | null

Generation mode

Options:

text-to-video — Generate video from text only, do not pass image_urls
image-to-video — Single image to video, image_urls must contain exactly 1 image
first-last-frame — First-last frame transition video, image_urls must contain exactly 2 images
reference-to-video — Reference image to video, image_urls can contain 1-4 images

Notes:

If omitted, the mode is inferred from the number of images in image_urls: 0 images → text-to-video, 1 image → image-to-video, 2 images → first-last-frame, 3-4 images → reference-to-video. To use reference-to-video with 1 or 2 images, set generation_type explicitly.
vidu-q3-turbo does not support reference-to-video
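
The inference rule in the notes above can be mirrored on the client, for example to log which mode a request will run in. This is a sketch only: infer_generation_type is a hypothetical helper, and the server's own inference remains authoritative.

```python
from typing import Optional


def infer_generation_type(image_urls: Optional[list] = None) -> str:
    """Mirror the documented mapping from image count to generation mode."""
    count = len(image_urls or [])
    if count == 0:
        return "text-to-video"
    if count == 1:
        return "image-to-video"
    if count == 2:
        return "first-last-frame"
    if count in (3, 4):
        return "reference-to-video"
    # The documented mapping stops at 4 images.
    raise ValueError(f"expected 0-4 images, got {count}")
```

Because 3-4 images resolve to reference-to-video, a vidu-q3-turbo request with that many images would land on a mode the model does not support.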

Available options: text-to-video, image-to-video, first-last-frame, reference-to-video

Example:

"text-to-video"

prompt

string | null

Text prompt describing the desired video content

Notes:

Required for text-to-video and reference-to-video
Optional for image-to-video and first-last-frame; if omitted, generation is driven primarily by the input images
Supports both Chinese and English

Maximum string length: 2000

Example:

"A cinematic tracking shot through a rainy cyberpunk alley"

image_urls

string[] | null

List of image URLs. The number of images depends on generation_type.

Notes:

text-to-video: Do not pass
image-to-video: Exactly 1 image
first-last-frame: Exactly 2 images, ordered first frame, then last frame
reference-to-video: 1-4 images
Image URLs must be publicly accessible

Example:

["https://picsum.photos/id/237/1280/720.jpg"]
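
The prompt and image_urls rules above depend on generation_type, so checking them before sending a request can save a round trip. The sketch below is an assumption-laden client-side check: validate_inputs and its constants are hypothetical names, and the server may apply further checks, such as URL reachability, that are not modeled here.

```python
from typing import Optional

# (min, max) number of images allowed per generation_type, as documented.
IMAGE_COUNT_RULES = {
    "text-to-video": (0, 0),
    "image-to-video": (1, 1),
    "first-last-frame": (2, 2),
    "reference-to-video": (1, 4),
}
PROMPT_REQUIRED = {"text-to-video", "reference-to-video"}
MAX_PROMPT_LENGTH = 2000


def validate_inputs(
    generation_type: str,
    prompt: Optional[str] = None,
    image_urls: Optional[list] = None,
    model: str = "vidu-q3-pro",
) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    if generation_type not in IMAGE_COUNT_RULES:
        return [f"unknown generation_type: {generation_type!r}"]
    errors = []
    if model == "vidu-q3-turbo" and generation_type == "reference-to-video":
        errors.append("vidu-q3-turbo does not support reference-to-video")
    low, high = IMAGE_COUNT_RULES[generation_type]
    count = len(image_urls or [])
    if not low <= count <= high:
        errors.append(f"{generation_type} needs {low}-{high} images, got {count}")
    if generation_type in PROMPT_REQUIRED and not prompt:
        errors.append(f"prompt is required for {generation_type}")
    # Assumption: the 2000 limit is counted in characters, as len() does.
    if prompt is not None and len(prompt) > MAX_PROMPT_LENGTH:
        errors.append(f"prompt exceeds {MAX_PROMPT_LENGTH} characters")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues in a single pass.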

duration

integer

default:5

Output video duration (seconds)

Notes:

Defaults to 5 seconds
Must be a positive integer

Required range: x >= 1

Example:

5

aspect_ratio

enum<string>

default:16:9

Output video aspect ratio

Options:

16:9 — Landscape (default)
9:16 — Portrait
4:3 — Traditional landscape
3:4 — Traditional portrait
1:1 — Square

Available options: 16:9, 9:16, 4:3, 3:4, 1:1

Example:

"16:9"

resolution

enum<string>

default:720p

Output video resolution

Options:

360p — Low resolution
540p — Medium-low resolution
720p — HD (default)
1080p — Full HD

Notes:

first-last-frame does not support 360p
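
A sketch of how the 360p restriction might be checked on the client. check_resolution is a hypothetical helper. Pass the effective generation mode, including the case where generation_type is omitted and two images cause first-last-frame to be inferred.

```python
from typing import Optional

RESOLUTIONS = ("360p", "540p", "720p", "1080p")


def check_resolution(resolution: str, generation_type: Optional[str] = None) -> str:
    """Return the resolution unchanged if it is allowed for the given mode."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if generation_type == "first-last-frame" and resolution == "360p":
        raise ValueError("first-last-frame does not support 360p")
    return resolution
```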

Available options: 360p, 540p, 720p, 1080p

Example:

"540p"

generate_audio

boolean

default:true

Whether to generate accompanying audio

Notes:

Enabled by default
Set to false to generate video only without sound

Example:

true

seed

integer | null

Random seed for improving result reproducibility

Not recommended unless specifically needed.

Required range: 0 <= x <= 2147483647

Example:

42
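
Putting the body fields together, the following sketch assembles a JSON request body with the documented defaults and range checks. build_body is a hypothetical helper. Leaving out nullable fields that are unset, rather than sending null, is an assumption about how the API treats missing keys. No endpoint URL is used because this section does not state one.

```python
import json
from typing import Any, Optional

ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1"}
RESOLUTIONS = {"360p", "540p", "720p", "1080p"}
MAX_SEED = 2147483647


def build_body(
    prompt: Optional[str] = None,
    image_urls: Optional[list] = None,
    *,
    model: str = "vidu-q3-pro",
    generation_type: Optional[str] = None,
    duration: int = 5,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    generate_audio: bool = True,
    seed: Optional[int] = None,
) -> dict:
    """Assemble the request body, applying documented defaults and ranges."""
    if not isinstance(duration, int) or duration < 1:
        raise ValueError("duration must be an integer >= 1")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")

    body: dict = {
        "model": model,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "generate_audio": generate_audio,
    }
    # Assumption: nullable fields are simply left out when unset.
    optional: dict = {
        "generation_type": generation_type,
        "prompt": prompt,
        "image_urls": image_urls,
        "seed": seed,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


if __name__ == "__main__":
    body = build_body("A cinematic tracking shot through a rainy cyberpunk alley")
    print(json.dumps(body, indent=2))
```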

Response

Task created successfully

created

integer

Task creation timestamp (Unix time, in seconds)

Example:

1757165031

id

string

Task ID

Example:

"task-unified-1757165031-uyujaw3d"

model

string

Actual model name used

object

enum<string>

Specific type of the task

Available options:

video.generation.task

progress

integer

Task progress percentage (0-100)

Required range: 0 <= x <= 100

Example:

0

status

enum<string>

Task status

Available options: pending, processing, completed, failed
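
Because the task is asynchronous, a client typically reads status and progress from the response to decide whether to keep waiting. The sketch below parses a response body and flags whether the task has finished. summarize_task is a hypothetical helper, the "id" key for the task ID is an assumption, and treating completed and failed as the only final states is likewise an assumption.

```python
import json

KNOWN_STATUSES = {"pending", "processing", "completed", "failed"}
# Assumption: these two statuses are final and will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def summarize_task(response_text: str) -> dict:
    """Extract the fields a caller usually needs from a task response."""
    task = json.loads(response_text)
    status = task["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return {
        "task_id": task.get("id"),  # assumed key name for the task ID
        "status": status,
        "progress": task.get("progress", 0),
        "done": status in TERMINAL_STATUSES,
    }


if __name__ == "__main__":
    # Sample built from the example values in this reference.
    sample = json.dumps({
        "created": 1757165031,
        "id": "task-unified-1757165031-uyujaw3d",
        "model": "vidu-q3-pro",
        "object": "video.generation.task",
        "progress": 0,
        "status": "pending",
        "type": "video",
    })
    print(summarize_task(sample))
```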

Example:

"pending"

task_info

object

Async task information

type

enum<string>

Output type of the task

Available options:

video

Example:

"video"