sagittarius by gregsadetsky

Web tool for multimodal interaction with GPT-4 and Gemini

Created 1 year ago
687 stars

Top 49.5% on SourcePulse

Project Summary

Sagittarius is a web-based tool for exploring GPT-4 and Gemini through voice and video interaction. Users supply their own OpenAI or Google API key and talk to the models in real time. The project positions itself as a faster, more flexible alternative to the official demos.

How It Works

The tool captures voice and video input in the browser and sends it to either the OpenAI GPT-4 API (specifically gpt-4-vision-preview) or the Google Gemini API. Responses can be spoken back in one of several voices, and the design emphasizes low-latency interaction.
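
To make the request flow concrete, here is a minimal sketch of how a transcribed prompt and a single captured video frame could be packaged for the OpenAI Chat Completions API. It is not taken from the Sagittarius source: the names VisionRequest and buildVisionRequest, the JPEG encoding, and the default max_tokens value are assumptions made for illustration. Only the model name comes from the summary above.

```typescript
// Sketch only: builds a request body, it does not call the network.
// VisionRequest and buildVisionRequest are hypothetical names.

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: ContentPart[] }[];
}

function buildVisionRequest(
  prompt: string,
  frameBase64Jpeg: string, // assumed: one video frame, JPEG, base64-encoded
  maxTokens = 300, // assumed default, not from the project
): VisionRequest {
  return {
    model: "gpt-4-vision-preview",
    max_tokens: maxTokens,
    messages: [
      {
        role: "user",
        content: [
          // The spoken prompt, already transcribed to text.
          { type: "text", text: prompt },
          // The frame is passed inline as a data URL.
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${frameBase64Jpeg}` },
          },
        ],
      },
    ],
  };
}

const body = buildVisionRequest("What am I holding?", "AAAA");
console.log(JSON.stringify(body, null, 2));
```

A body like this would be sent as JSON in a POST to https://api.openai.com/v1/chat/completions with an Authorization: Bearer header carrying the user's API key. The Gemini path would need its own request shape, which this sketch does not cover.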

Quick Start & Requirements

  • Install dependencies with npm install, then start the dev server with npm run dev.
  • Requires Node.js and npm.
  • An OpenAI API key (with access to gpt-4-vision-preview) or a Gemini API key is necessary.
  • Google Chrome is recommended for optimal in-browser speech recognition.
  • The demo runs at http://localhost:5173.
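
The Chrome recommendation comes down to browser support for the Web Speech API. As a rough sketch (not the project's actual code), a page could check for the standard SpeechRecognition constructor and fall back to the webkit-prefixed one that Chrome exposes. The helper name findSpeechRecognition and the RecognitionLike type are hypothetical and exist only for this illustration.

```typescript
// Structural type for the small part of the Web Speech API used here.
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  start(): void;
}

type RecognitionCtor = new () => RecognitionLike;

// Hypothetical helper: prefer the standard constructor, then the
// webkit-prefixed one. Returns null when neither is a constructor.
function findSpeechRecognition(
  scope: Record<string, unknown>,
): RecognitionCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognitionCtor) : null;
}

const Ctor = findSpeechRecognition(
  globalThis as unknown as Record<string, unknown>,
);
if (Ctor === null) {
  console.log("Speech recognition unavailable; try Google Chrome.");
} else {
  const recognition = new Ctor();
  recognition.lang = "en-US"; // assumed language for the sketch
  recognition.interimResults = true; // request partial transcripts as the user speaks
  console.log("Speech recognition available.");
}
```

Passing the scope in as a parameter, rather than reading window directly, keeps the lookup testable outside a browser.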

Highlighted Details

  • Enables head-to-head comparisons between Gemini Pro and GPT-4.
  • Supports multiple voice options for output.
  • Aims to respond faster than some existing demos.
  • Streaming output and UI improvements are planned but not yet implemented.

Maintenance & Community

The project is maintained by gregsadetsky. Planned additions include allcontributors (to recognize contributors) and dependabot (to automate dependency updates).

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source integration.

Limitations & Caveats

The project is unfinished: several features remain marked as TODO, including deployment to a public site and streaming output. In-browser speech recognition works best in Google Chrome.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
