gazelle  by tincans-ai

Joint speech-language model for direct audio response

created 1 year ago
371 stars

Top 77.4% on sourcepulse

Project Summary

Gazelle is a joint speech-language model designed to process audio input directly, enabling conversational AI that responds to spoken language. It targets researchers and developers interested in multimodal AI and speech-enabled applications. The primary benefit is a unified model that handles both speech recognition and language understanding, simplifying the pipeline for audio-based interactions.

How It Works

Gazelle integrates speech and language processing into a single model, eliminating the need for separate Automatic Speech Recognition (ASR) and Large Language Model (LLM) components. Because audio input is not first transcribed to text, the model can understand and respond to spoken commands or queries directly, which may also make processing more efficient. The inference code is based on Hugging Face's LLaVA implementation.
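As a rough illustration of what "no intermediate text conversion" can mean in a LLaVA-style design, the sketch below projects audio-encoder frames into the text-embedding space and splices them into the input sequence at a placeholder position. This is the general pattern such models use, not Gazelle's confirmed code: the `AUDIO_PLACEHOLDER` sentinel, the function names, and the toy dimensions are all assumptions made for the example.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical sentinel id marking where the audio belongs in the prompt.
AUDIO_PLACEHOLDER = -1


def project(frame: Vector, weights: List[Vector]) -> Vector:
    """Map one audio-encoder frame into the text-embedding space (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def splice_audio(
    token_ids: List[int],
    embed_table: Dict[int, Vector],
    audio_frames: List[Vector],
    weights: List[Vector],
) -> List[Vector]:
    """Build the input embedding sequence, swapping the placeholder for projected audio frames."""
    sequence: List[Vector] = []
    for tok in token_ids:
        if tok == AUDIO_PLACEHOLDER:
            # One placeholder expands to one embedding per audio frame.
            sequence.extend(project(f, weights) for f in audio_frames)
        else:
            sequence.append(embed_table[tok])
    return sequence


# Toy sizes (assumed): 3-dim audio frames, 2-dim text embeddings.
embed_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # 2 x 3 projection matrix
audio_frames = [[0.5, 0.25, 0.25], [2.0, 1.0, 0.0]]

inputs = splice_audio([0, AUDIO_PLACEHOLDER, 1], embed_table, audio_frames, weights)
```

The language model then consumes `inputs` like any other embedding sequence, so no transcript is ever produced along the way.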

Quick Start & Requirements

  • Install: Code is available via Hugging Face.
  • Prerequisites: Python and the Hugging Face libraries the inference code depends on. Hardware requirements (e.g., GPU, VRAM) are not specified, though a GPU is implied, as for other LLM-based models.

Highlighted Details

  • Joint speech-language modeling for direct audio-to-response capabilities.
  • Inference code derived from Hugging Face's LLaVA implementation.
  • Available checkpoints include v0.1, v0.2, and v0.2-dpo.

Licensing & Compatibility

  • Modeling Code: Apache 2.0.
  • v0.2 Checkpoint: Apache 2.0 (derived from Mistral 7B).
  • v0.1 Checkpoint: Derived from Llama 2, governed by the Llama 2 license. Users must agree to Llama 2 license terms.
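For tooling that needs to check license terms before loading a checkpoint, the list above can be captured in a small lookup. This is only a sketch: the `CHECKPOINT_LICENSES` table and `license_for` helper are hypothetical names, and v0.2-dpo is deliberately left out because its license is not stated here.

```python
from typing import Optional

# License per checkpoint, as listed above. v0.2-dpo is omitted on purpose:
# its license is not stated in this summary.
CHECKPOINT_LICENSES = {
    "v0.1": "Llama 2 license",  # derived from Llama 2; users must accept its terms
    "v0.2": "Apache 2.0",       # derived from Mistral 7B
}


def license_for(checkpoint: str) -> Optional[str]:
    """Return the known license for a checkpoint, or None if it is not documented."""
    return CHECKPOINT_LICENSES.get(checkpoint)
```

A `None` result should be treated as "check the upstream model card", not as permission to use the checkpoint freely.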

Limitations & Caveats

The v0.2 model is not robust to adversarial attacks or jailbreaks and is not recommended for production use. The initial checkpoints are described as "backproppin' on a budget" and may not hold up under many real-world conditions.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
