Veo capabilities
Five mapped capabilities, each graded and dated. The map shows what Veo can do, the audit shows whether it is worth consolidating, and a guide shows how to move.
Capabilities
API pricing and plan gating
canonical · verified today
Veo is available to developers on a pay-as-you-go basis through the Gemini API (Google AI Studio) and Vertex AI. It is priced per second of generated video, across three model tiers that trade quality for cost.
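Because billing is per second of output, a rough budget is a single multiplication: rate times clip length times clip count. The sketch below illustrates that arithmetic only. The tier names and per-second rates are hypothetical placeholders, not Google's published prices, so substitute the current figures from the official pricing page.

```python
from decimal import Decimal

# HYPOTHETICAL tier names and per-second rates (USD), for illustration only.
# These are NOT published prices; replace them with current official figures.
ASSUMED_RATE_PER_SECOND = {
    "high_quality": Decimal("0.40"),
    "fast": Decimal("0.15"),
    "lite": Decimal("0.05"),
}


def estimate_cost(tier: str, seconds_per_clip: int, clips: int = 1) -> Decimal:
    """Estimate the charge for `clips` videos of `seconds_per_clip` seconds each."""
    if tier not in ASSUMED_RATE_PER_SECOND:
        raise ValueError(f"unknown tier: {tier!r}")
    if seconds_per_clip <= 0 or clips <= 0:
        raise ValueError("seconds_per_clip and clips must be positive")
    # Per-second billing: rate x seconds x number of clips.
    return ASSUMED_RATE_PER_SECOND[tier] * seconds_per_clip * clips
```

With these placeholder rates, ten 8-second clips on the "fast" tier would come to `estimate_cost("fast", 8, 10)`, that is 0.15 x 8 x 10. `Decimal` is used instead of `float` so currency amounts do not pick up binary rounding error.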
Flow filmmaking app and consumer access
canonical · verified today
For non-developers, Veo is reached through Google Flow (an AI filmmaking studio in Google Labs) and the Gemini consumer app, both gated by Google AI subscription plans and a monthly credit allowance rather than per-second API billing.
Image, reference, and frame-guided video
canonical · verified today
Beyond plain text-to-video, Veo 3.1 can be steered with images: animate a starting frame (image-to-video), guide content and style with reference images (Ingredients to Video), interpolate between a first and last frame, and extend an existing Veo clip to continue the story.
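One way to keep these modes straight in client code is to derive the mode from which inputs a request carries. The following is a minimal sketch under stated assumptions: `VideoRequest`, its field names, and the precedence order among modes are all hypothetical and are not part of any Google SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VideoRequest:
    """Hypothetical request shape; field names are illustrative, not an SDK type."""
    prompt: str
    first_frame: Optional[str] = None       # path to a starting image
    last_frame: Optional[str] = None        # path to an ending image
    reference_images: List[str] = field(default_factory=list)
    source_clip: Optional[str] = None       # an existing Veo clip to continue


def generation_mode(req: VideoRequest) -> str:
    """Name the generation mode implied by the inputs (assumed precedence)."""
    if req.source_clip is not None:
        return "extend"
    if req.last_frame is not None:
        if req.first_frame is None:
            raise ValueError("interpolation needs a first frame as well as a last frame")
        return "interpolate"
    if req.reference_images:
        return "reference_images"
    if req.first_frame is not None:
        return "image_to_video"
    return "text_to_video"
```

For example, a request with only a prompt resolves to `"text_to_video"`, while one carrying both `first_frame` and `last_frame` resolves to `"interpolate"`. Which combinations the real service accepts is not covered here and should be checked against the official documentation.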
Native audio generation
canonical · verified today
Veo 3.1 generates synchronized audio natively alongside the video in a single pass, including dialogue, sound effects, and ambient soundscapes. This is a signature differentiator versus video models that output silent clips.
Text-to-video generation
canonical · verified today
Veo 3.1 is Google's video generation model that turns a text prompt into a short, high-fidelity cinematic clip with natively generated audio. It is Veo's core capability and the basis for all other generation modes.
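For developers, a text-to-video request through the Gemini API is a long-running operation: submit the prompt, poll until the job finishes, then download the file. The sketch below assumes the `google-genai` Python SDK is installed and an API key is set in the environment. The model ID is an assumption that may have changed, and `wait_until_done` and `generate_clip` are hypothetical helper names introduced here.

```python
import time
from typing import Any, Callable

# Assumption: model ID as it appeared in the Gemini API docs; verify before use.
ASSUMED_MODEL_ID = "veo-3.1-generate-preview"


def wait_until_done(operation: Any, refresh: Callable[[Any], Any],
                    poll_seconds: float = 10.0, max_polls: int = 60) -> Any:
    """Poll a long-running operation until its `done` flag is set."""
    for _ in range(max_polls):
        if operation.done:
            return operation
        time.sleep(poll_seconds)
        operation = refresh(operation)
    if operation.done:
        return operation
    raise TimeoutError("video generation did not finish within the polling budget")


def generate_clip(prompt: str, out_path: str = "veo_clip.mp4") -> str:
    """Generate one clip from a text prompt and save it to `out_path`."""
    # Imported lazily so wait_until_done can be used without the SDK installed.
    from google import genai

    client = genai.Client()  # picks up the API key from the environment
    operation = client.models.generate_videos(model=ASSUMED_MODEL_ID, prompt=prompt)
    operation = wait_until_done(operation, client.operations.get)

    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)  # fetch the video bytes
    video.video.save(out_path)               # write them to disk
    return out_path
```

A call such as `generate_clip("A paper boat drifting down a rain-soaked street")` would incur per-second charges, so it is left out of the block itself. The polling helper is kept separate so it can be exercised with a stand-in operation object.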