Web tool for multimodal interaction with GPT-4 and Gemini
Sagittarius is a web-based tool for exploring the capabilities of GPT-4 and Gemini through voice and video interactions. Users supply their own API key from OpenAI or Google and interact with the models in real time; the project positions itself as a faster and more versatile alternative to the official demos.
How It Works
The tool uses web technologies to provide a user-friendly interface for voice and video input, processing these through either the OpenAI GPT-4 API (specifically gpt-4-vision-preview) or the Google Gemini API. It supports multiple voice outputs and aims for a seamless, high-speed interaction experience.
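As a rough illustration of the OpenAI path described above, the sketch below pairs a transcribed question with a single captured video frame and sends both to the Chat Completions endpoint using gpt-4-vision-preview. The function names (`buildVisionRequest`, `askGpt4Vision`), the JPEG frame encoding, and the `max_tokens` default are assumptions made for this example; the Sagittarius source may structure its requests differently.

```typescript
// Hypothetical sketch, not taken from the Sagittarius codebase.
// Builds and sends one text + image request to the OpenAI Chat Completions API.

interface TextPart { type: "text"; text: string }
interface ImagePart { type: "image_url"; image_url: { url: string } }

interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: (TextPart | ImagePart)[] }[];
}

// Pure helper: turns a transcript and a base64-encoded JPEG frame into a request body.
// The frame is embedded as a data URL, which the image_url field accepts.
function buildVisionRequest(
  transcript: string,
  jpegBase64: string,
  maxTokens = 300, // assumed default, chosen only for illustration
): VisionRequest {
  return {
    model: "gpt-4-vision-preview",
    max_tokens: maxTokens,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: transcript },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${jpegBase64}` } },
        ],
      },
    ],
  };
}

// Sends the request with the user's own API key and returns the model's reply text.
async function askGpt4Vision(
  apiKey: string,
  transcript: string,
  jpegBase64: string,
): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildVisionRequest(transcript, jpegBase64)),
  });
  if (!response.ok) {
    throw new Error(`OpenAI request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Keeping the body construction in a pure function separate from the network call makes the payload easy to inspect or unit-test without spending API credits.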
Quick Start & Requirements
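Install dependencies with npm install and start the development server with npm run dev.
An OpenAI API key (for gpt-4-vision-preview) or a Gemini API key is necessary.
Once the development server is running, open http://localhost:5173.
Highlighted Details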
Maintenance & Community
The project is maintained by gregsadetsky. Future plans include adding allcontributors and dependabot for community engagement and dependency management.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source integration.
Limitations & Caveats
The project is still under active development, with several features marked as TODO, including deployment to a public site and streaming output implementation. In-browser speech recognition is noted to work best in Google Chrome.
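The Chrome caveat is typical of in-browser speech recognition, which browsers expose under different names: Chrome uses the prefixed `webkitSpeechRecognition` constructor, and other browsers may expose `SpeechRecognition` or nothing at all. The sketch below shows one way to feature-detect the recognizer and capture a single utterance. It assumes the Web Speech API is the mechanism in use, which the summary above does not confirm, and the helper names (`findSpeechRecognition`, `listenOnce`) and the `en-US` language setting are hypothetical.

```typescript
// Hypothetical sketch, not taken from the Sagittarius codebase.
// Feature-detects the Web Speech API recognizer and listens for one utterance.

// Minimal shape of the recognizer members used here, declared locally so the
// sketch does not depend on DOM typings for the (partly prefixed) API.
interface RecognizerLike {
  lang: string;
  interimResults: boolean;
  onresult: ((event: any) => void) | null;
  start(): void;
}
type RecognizerCtor = new () => RecognizerLike;

// Returns the recognizer constructor from the given global scope, preferring the
// unprefixed name and falling back to Chrome's prefixed one; null if unsupported.
function findSpeechRecognition(scope: Record<string, unknown>): RecognizerCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as unknown as RecognizerCtor) : null;
}

// Starts recognition and passes the first final transcript to the callback.
// Returns false when the browser offers no speech recognition at all.
function listenOnce(
  scope: Record<string, unknown>,
  onTranscript: (text: string) => void,
): boolean {
  const Ctor = findSpeechRecognition(scope);
  if (Ctor === null) {
    return false;
  }
  const recognizer = new Ctor();
  recognizer.lang = "en-US"; // assumed language for illustration
  recognizer.interimResults = false; // only deliver final results
  recognizer.onresult = (event) => onTranscript(event.results[0][0].transcript);
  recognizer.start();
  return true;
}
```

In a browser this would be called as `listenOnce(window as unknown as Record<string, unknown>, (text) => console.log(text))`; a `false` return value is the cue to show a "try Google Chrome" hint.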