sagittarius by gregsadetsky

Web tool for multimodal interaction with GPT-4 and Gemini

Created 2 years ago

685 stars

Top 49.6% on SourcePulse

Project Summary

Sagittarius is a web-based tool for exploring the capabilities of GPT-4 and Gemini through voice and video interaction. Users supply their own OpenAI or Google API key and talk to the models in real time. The project aims to be faster and more versatile than the official demos.

How It Works

The tool runs in the browser: it takes voice and video input and sends it to either the OpenAI GPT-4 API (specifically gpt-4-vision-preview) or the Google Gemini API. It offers multiple voices for spoken output and aims for fast, seamless interaction.
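To make the request flow concrete, here is a minimal sketch of how a text prompt and captured video frames could be packed into a request body for gpt-4-vision-preview. The project's actual code is not shown in this summary, so the function name `buildVisionRequest`, the `maxTokens` default, and the choice to pass frames as data URLs are assumptions for illustration. The message layout follows the OpenAI chat completions format for image input.

```typescript
// Hypothetical helper, not taken from the Sagittarius source.
interface TextPart {
  type: "text";
  text: string;
}

interface ImagePart {
  type: "image_url";
  image_url: { url: string };
}

interface VisionRequestBody {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: (TextPart | ImagePart)[] }[];
}

// Builds one user message containing the prompt followed by each frame.
// frameDataUrls are assumed to be "data:image/jpeg;base64,..." strings,
// for example produced by canvas.toDataURL("image/jpeg") in the browser.
function buildVisionRequest(
  prompt: string,
  frameDataUrls: string[],
  maxTokens = 300, // assumed default; tune as needed
): VisionRequestBody {
  const images = frameDataUrls.map(
    (url): ImagePart => ({ type: "image_url", image_url: { url } }),
  );
  const text: TextPart = { type: "text", text: prompt };
  return {
    model: "gpt-4-vision-preview",
    max_tokens: maxTokens,
    messages: [{ role: "user", content: [text, ...images] }],
  };
}
```

The resulting object would be serialized with JSON.stringify and sent as a POST to https://api.openai.com/v1/chat/completions with an "Authorization: Bearer <key>" header. The Gemini API uses a different body shape, which this sketch does not cover.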

Quick Start & Requirements

Install via npm install and run with npm run dev.
Requires Node.js and npm.
An OpenAI API key (with access to gpt-4-vision-preview) or a Gemini API key is necessary.
Google Chrome is recommended for optimal in-browser speech recognition.
The demo runs at http://localhost:5173.
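The Chrome recommendation above is consistent with how browsers expose speech recognition: Chrome ships the Web Speech API constructor under the prefixed name `webkitSpeechRecognition`, and some other browsers do not provide it at all. The sketch below shows one way to detect support before enabling voice input. It assumes the project relies on the Web Speech API, which this summary does not confirm, and the helper name `findSpeechRecognition` is hypothetical.

```typescript
// Hypothetical helper, not taken from the Sagittarius source.
type RecognitionCtor = new () => unknown;

// Looks up the speech recognition constructor on a global-like object,
// preferring the standard name and falling back to the prefixed one.
// Returns undefined when neither is available as a constructor.
function findSpeechRecognition(
  scope: Record<string, unknown>,
): RecognitionCtor | undefined {
  const candidate =
    scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function"
    ? (candidate as RecognitionCtor)
    : undefined;
}

// In a browser, the scope would be the window object:
//   const Ctor = findSpeechRecognition(window as unknown as Record<string, unknown>);
//   if (!Ctor) { /* show a "please use Chrome" notice */ }
```

Passing the scope in as a parameter, rather than reading `window` directly, keeps the check easy to exercise outside a browser.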

Highlighted Details

Enables head-to-head comparisons between Gemini Pro and GPT-4.
Supports multiple voice options for output.
Responds faster than some existing demos.
Streaming output and UI improvements are planned but not yet implemented.
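A head-to-head comparison like the one listed above can be sketched as sending the same prompt to several backends concurrently and recording how long each takes. This is an illustration only: the `Backend` type, the `compareBackends` function, and the result shape are assumptions, not the project's implementation. Each backend is any async function that wraps a real API call.

```typescript
// Hypothetical types and function, not taken from the Sagittarius source.
type Backend = (prompt: string) => Promise<string>;

interface NamedBackend {
  name: string;
  call: Backend;
}

interface ComparisonResult {
  name: string;
  ok: boolean;
  text: string; // reply text, or the error message when ok is false
  elapsedMs: number;
}

// Runs every backend on the same prompt at the same time. A failure in one
// backend is captured in its own result, so it does not hide the others.
async function compareBackends(
  prompt: string,
  backends: NamedBackend[],
): Promise<ComparisonResult[]> {
  const runs = backends.map(
    async ({ name, call }): Promise<ComparisonResult> => {
      const start = Date.now();
      try {
        const text = await call(prompt);
        return { name, ok: true, text, elapsedMs: Date.now() - start };
      } catch (err) {
        const text = err instanceof Error ? err.message : String(err);
        return { name, ok: false, text, elapsedMs: Date.now() - start };
      }
    },
  );
  // Results come back in the same order as the input array.
  return Promise.all(runs);
}
```

In use, one entry would wrap a GPT-4 request and another a Gemini request. The per-backend try/catch means a missing or invalid key for one provider still leaves the other provider's answer and timing available.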

Maintenance & Community

The project is maintained by gregsadetsky. Planned additions include allcontributors for recognizing community contributions and dependabot for automated dependency updates.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source integration.

Limitations & Caveats

The project is still under development: several features are marked as TODO, including deployment to a public site and streaming output. In-browser speech recognition works best in Google Chrome.

Explore Similar Projects

whispering-ui by Sharrnah

whisper_dictation by themanyone

gemini-cursor by 13point5

ai-devices by developersdigest

Stream-Omni by ictnlp

gpt-voice-conversation-chatbot by Adri6336

ChatGPT-Virtual-Live by smallnew666

gemini-multimodal-playground by saharmor

ada by Nlouis38

Scriberr by rishikanthc

self-operating-computer by OthersideAI

MiniCPM-o by OpenBMB