Gemini Format - Music Generation
- Uses the Gemini native format generateContent endpoint to generate music via the Lyria 3 model
- Enable audio output by including
AUDIOingenerationConfig.responseModalities; ifTEXTis also included, the response will additionally return text descriptions (lyrics/structure) - Supports text prompts and image input (up to 10 images); images are used to inspire visually-driven music creation
- Duration, structure (verse/chorus/bridge), style, etc. are primarily controlled via text prompts
lyria-3-clip-preview: generates fixed 30-second clips, returns MP3 by default (audio/mpeg)lyria-3-pro-preview: generates full songs; you can requestaudio/mpegoraudio/wavviaresponseMimeType, but the actual output format should be determined by theinlineData.mimeTypein the response- For SSE streaming output, use
/v1beta/models/{model}:streamGenerateContent?alt=sse - Music generation is a single-turn process; multi-turn iterative editing is not supported
Authorizations
All endpoints require Bearer Token authentication
Add the following to your request headers:
Authorization: Bearer YOUR_API_KEY
Path Parameters
Model name. lyria-3-clip-preview generates 30-second clips (default MP3 / audio/mpeg). lyria-3-pro-preview generates full songs; you can request audio/mpeg or audio/wav, but the actual output format should be determined by the returned inlineData.mimeType
lyria-3-clip-preview, lyria-3-pro-preview "lyria-3-clip-preview"
Body
Content list. Music generation is a single-turn process; multi-turn iterative editing is not supported
Generation config; responseModalities must include AUDIO for music generation requests
System instruction. Lyria 3 model support for this field is not confirmed by official documentation; it may not take effect
Content safety filter settings