veo-3-nano-banana-gemini-api-quickstart by google-gemini

Next.js quickstart for multimodal content creation with the Gemini API

Created 6 months ago

299 stars

Top 89.2% on SourcePulse


1 Expert Loves This Project

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

This project is a Next.js quickstart that shows developers how to integrate Google's AI media models (Veo 3, Imagen 4, and Gemini 2.5 Flash) into their applications. It offers a lightweight, unified UI for creating and editing images and videos, and serves both as a learning tool and as a foundation for building custom AI-powered media studios. Its main benefit is that it simplifies working with these generative models through the Gemini API.

How It Works

The application uses a standard Next.js architecture with dedicated API routes that call the Gemini API. These routes handle image generation (Imagen 4, Gemini 2.5 Flash), image editing and composition (Gemini 2.5 Flash), and video generation (Veo 3). A unified composer UI lets users switch between these modes and hides the underlying API calls, so the different media tasks share one interface.
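
The mode-to-route idea described above can be sketched in TypeScript. This is a minimal illustration, not the repository's code: the mode names, the route paths (such as /api/video/generate), the ROUTE_TABLE and buildRequest names, and the minimum image counts are all assumptions made for the example.

```typescript
// Illustrative sketch only. Mode names, route paths, and the image-count
// rules are assumptions, not taken from the repository.
type ComposerMode = "create-image" | "edit-image" | "compose-image" | "create-video";

interface RouteTarget {
  route: string;       // hypothetical Next.js API route path
  modelFamily: string; // descriptive label, not an actual model ID
}

// One entry per composer mode, mirroring the model split in the summary.
const ROUTE_TABLE: Record<ComposerMode, RouteTarget> = {
  "create-image": { route: "/api/image/generate", modelFamily: "Imagen 4 or Gemini 2.5 Flash" },
  "edit-image": { route: "/api/image/edit", modelFamily: "Gemini 2.5 Flash" },
  "compose-image": { route: "/api/image/compose", modelFamily: "Gemini 2.5 Flash" },
  "create-video": { route: "/api/video/generate", modelFamily: "Veo 3" },
};

// Assumed minimum number of input images per mode.
const MIN_IMAGES: Record<ComposerMode, number> = {
  "create-image": 0,
  "edit-image": 1,
  "compose-image": 2,
  "create-video": 0,
};

interface ComposerRequest extends RouteTarget {
  mode: ComposerMode;
  prompt: string;
  imageCount: number;
}

// Validates the inputs for a mode and resolves the route that would serve it.
function buildRequest(mode: ComposerMode, prompt: string, imageCount = 0): ComposerRequest {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error("prompt must not be empty");
  }
  if (imageCount < MIN_IMAGES[mode]) {
    throw new Error(`${mode} needs at least ${MIN_IMAGES[mode]} input image(s)`);
  }
  return { mode, prompt: trimmed, imageCount, ...ROUTE_TABLE[mode] };
}
```

Keeping the mapping in a single table means the UI only has to track the current mode; which model family serves a request is decided in one place.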

Quick Start & Requirements

Primary install/run command: npm install followed by npm run dev.
Non-default prerequisites: Node.js and npm (or yarn/pnpm). A GEMINI_API_KEY is mandatory, configured via a .env file or system environment variable.
Adoption blockers: Using the Veo 3, Imagen 4, and Gemini 2.5 Flash models requires the Gemini API paid tier.
Links:
- Gemini API docs: https://ai.google.dev/gemini-api/docs
- Veo 3 Guide: https://ai.google.dev/gemini-api/docs/video?example=dialogue
- Imagen 4 Guide: https://ai.google.dev/gemini-api/docs/imagen
Local Access: The development server is accessible at http://localhost:3000.
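
Because the GEMINI_API_KEY is mandatory, it can help to fail early with a clear message when it is missing. The following is a minimal sketch of such a check; the readGeminiApiKey function and the Env type are hypothetical names for this example and are not part of the repository or of any Gemini library.

```typescript
// Sketch: read GEMINI_API_KEY from an environment-like object.
// The function name and error text are illustrative assumptions.
type Env = Record<string, string | undefined>;

function readGeminiApiKey(env: Env): string {
  // Treat a missing, empty, or whitespace-only value as "not configured".
  const key = (env["GEMINI_API_KEY"] ?? "").trim();
  if (key === "") {
    throw new Error(
      "GEMINI_API_KEY is not set; add it to a .env file or export it before running npm run dev"
    );
  }
  return key;
}
```

In server-side code this would typically be called with process.env, for example inside an API route before any request is sent to the Gemini API.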

Highlighted Details

Unified composer UI supporting image/video creation, editing, and composition.
Direct integration with Veo 3 for video generation and Imagen 4/Gemini 2.5 Flash for image generation.
Gemini 2.5 Flash capabilities include image editing and combining multiple images.
Features in-browser video trimming and direct asset downloads.

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, or community channels (e.g., Discord, Slack). It directs users to open GitHub issues for feature requests.

Licensing & Compatibility

License type: Apache License 2.0.
Compatibility notes: The Apache 2.0 license generally permits commercial use and integration into closed-source projects. Use of the underlying Google Gemini API remains subject to its own terms of service.

Limitations & Caveats

This repository is a quickstart example and states explicitly that it is not an official Google product. Access to the core AI models (Veo 3, Imagen 4, Gemini 2.5 Flash) requires a paid Gemini API tier. The project is positioned as a lightweight alternative to professional environments such as Google's Flow.

Health Check

Last Commit: 6 days ago

Responsiveness: Inactive

Star History: 20 stars in the last 30 days