Veo capabilities

Five mapped capabilities, each graded and dated. The map shows what Veo can do, the audit shows whether it is worth consolidating, and a guide shows how to move.

API pricing and plan gating

canonical · verified today

Veo is available to developers as a pay-as-you-go model through the Gemini API (Google AI Studio) and Vertex AI, priced per second of generated video, with three model tiers trading quality for cost.

Provisional · 0.70 · docs
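
Because billing is per second of generated video, a rough budget is rate × clip length × clip count. The sketch below illustrates that arithmetic only. The tier names and per-second rates are hypothetical placeholders, not Google's published prices, and `estimate_cost` is an illustrative helper rather than part of any SDK; substitute the current rates from the official pricing page.

```python
from decimal import Decimal

# Hypothetical tier names and USD-per-second rates (placeholders, NOT real prices).
HYPOTHETICAL_RATES_USD_PER_SECOND = {
    "tier_a": Decimal("0.10"),  # assumed cheapest tier
    "tier_b": Decimal("0.20"),  # assumed middle tier
    "tier_c": Decimal("0.40"),  # assumed highest-quality tier
}


def estimate_cost(tier: str, seconds_per_clip: int, clips: int = 1) -> Decimal:
    """Estimate spend as rate * seconds per clip * number of clips."""
    if tier not in HYPOTHETICAL_RATES_USD_PER_SECOND:
        raise ValueError(f"unknown tier: {tier!r}")
    if seconds_per_clip <= 0 or clips <= 0:
        raise ValueError("seconds_per_clip and clips must be positive")
    rate = HYPOTHETICAL_RATES_USD_PER_SECOND[tier]
    return rate * seconds_per_clip * clips


if __name__ == "__main__":
    # Compare the same workload (ten 8-second clips) across the assumed tiers.
    for tier in HYPOTHETICAL_RATES_USD_PER_SECOND:
        print(tier, estimate_cost(tier, seconds_per_clip=8, clips=10))
```

Using `Decimal` rather than floats keeps currency totals exact when many clips are summed.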

Flow filmmaking app and consumer access

canonical · verified today

For non-developers, Veo is reached through Google Flow (an AI filmmaking studio in Google Labs) and the Gemini consumer app, both gated by Google AI subscription plans and a monthly credit allowance rather than per-second API billing.

Provisional · 0.70 · docs

Image, reference, and frame-guided video

canonical · verified today

Beyond plain text-to-video, Veo 3.1 can be steered with images: animate a starting frame (image-to-video), guide content and style with reference images (Ingredients to Video), interpolate between a first and last frame, and extend an existing Veo clip to continue the story.

Provisional · 0.70 · docs
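
One way to read the modes above is as a function of which inputs accompany the prompt. The following is a minimal sketch of that mapping, not the Gemini API's request format: the `VeoInputs` class, its field names, and the precedence order when several inputs are supplied are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VeoInputs:
    """Hypothetical bundle of inputs; field names are illustrative only."""
    prompt: str
    first_frame: Optional[str] = None   # path to a starting image
    last_frame: Optional[str] = None    # path to an ending image
    reference_images: List[str] = field(default_factory=list)
    source_clip: Optional[str] = None   # an existing Veo clip to continue


def generation_mode(inputs: VeoInputs) -> str:
    """Name the steering mode implied by the inputs (assumed precedence)."""
    if inputs.last_frame and not inputs.first_frame:
        raise ValueError("a last frame needs a first frame to interpolate from")
    if inputs.source_clip:
        return "extension"
    if inputs.first_frame and inputs.last_frame:
        return "first-and-last-frame interpolation"
    if inputs.reference_images:
        return "ingredients-to-video"
    if inputs.first_frame:
        return "image-to-video"
    return "text-to-video"


if __name__ == "__main__":
    print(generation_mode(VeoInputs(prompt="a fox at dawn")))
    print(generation_mode(VeoInputs(prompt="a fox at dawn", first_frame="fox.png")))
```

A real integration would pass these inputs through the official SDK, whose parameter names and limits may differ from this sketch.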

Native audio generation

canonical · verified today

Veo 3.1 generates synchronized audio natively alongside the video in a single pass, including dialogue, sound effects, and ambient soundscapes. This sets it apart from video models that output silent clips.

Provisional · 0.70 · docs

Text-to-video generation

canonical · verified today

Veo 3.1 is Google's video generation model that turns a text prompt into a short, high-fidelity cinematic clip with natively generated audio. It is Veo's core capability and the basis for all other generation modes.

Provisional · 0.70 · docs
