JARVIS-AGI by SreejanPersonal

AI voice assistant enabling human-like interaction and task automation

Created 2 years ago

265 stars

Top 96.3% on SourcePulse

Project Summary

JARVIS-AGI is an AI-powered voice assistant that combines speech recognition, text processing, and image analysis in a single system. It lets users issue voice commands for tasks such as checking the weather, managing calendars, and searching the web, and is aimed at users who want a more convenient, intuitive way to interact with their devices.

How It Works

The project uses a modular architecture that separates core functionality into distinct directories. ENGINE handles speech-to-text (STT) and text-to-speech (TTS) using models and services such as Vosk and ElevenLabs. BRAIN houses the AI models for text processing (integrating APIs such as Blackbox_ai, Hugging Face, and deepInfra, as well as local LLMs via llama_CPP), image analysis, and vision tasks. Audio tools for hotword detection and playback management are also included, which allows hands-free interaction.
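
The flow described above (hotword, then STT, then a text model, then TTS) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Assistant class, the stage signatures, the stub functions, and the "jarvis" wake word are all hypothetical and are not taken from the repository. Real hotword detection works on audio, whereas this sketch checks the transcript text to stay self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures; the real ENGINE and BRAIN modules may differ.
SpeechToText = Callable[[bytes], str]   # audio in, transcript out
TextModel = Callable[[str], str]        # prompt in, reply out
TextToSpeech = Callable[[str], bytes]   # reply in, audio out


@dataclass
class Assistant:
    stt: SpeechToText
    brain: TextModel
    tts: TextToSpeech
    hotword: str = "jarvis"  # assumed wake word, for illustration only

    def handle(self, audio: bytes) -> Optional[bytes]:
        """Run one utterance through STT -> BRAIN -> TTS.

        Returns None when the transcript does not start with the hotword.
        """
        transcript = self.stt(audio).strip()
        if not transcript.lower().startswith(self.hotword):
            return None
        # Drop the hotword and any separator that follows it.
        command = transcript[len(self.hotword):].strip(" ,")
        return self.tts(self.brain(command))


# Stub stages so the sketch runs without any model or API key.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_brain(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


assistant = Assistant(stt=fake_stt, brain=fake_brain, tts=fake_tts)
```

Because each stage is just a callable, a Vosk-backed recognizer or an API-backed text model could be swapped in without changing handle, which is one plausible reason for the modular layout.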

Quick Start & Requirements

Clone the repository: git clone https://github.com/SreejanPersonal/JARVIS-AGI.git
Navigate to the directory: cd JARVIS-AGI
Install dependencies: pip install -r requirements.txt
(Optional) Install Vosk speech recognition models: download language models from the Vosk GitHub releases, place them in ASSETS/Vosk/, and update model_path in the relevant scripts.
Configure API keys and settings in the .env file.
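
Before the first run, it can help to confirm that the .env file and the optional Vosk model folder are in place. The following is a minimal, standard-library-only sketch of such a check. The function names and the EXAMPLE_API_KEY setting are hypothetical; the summary does not list the actual key names, and the project itself may load .env through a dedicated library.

```python
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; skip blanks, # comments, and malformed lines."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_settings(values: dict, required: list) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


def vosk_model_ready(model_path: str) -> bool:
    """True when the model directory exists and is not empty."""
    folder = Path(model_path)
    return folder.is_dir() and any(folder.iterdir())


if __name__ == "__main__":
    settings = load_env(".env")
    # EXAMPLE_API_KEY is a placeholder name, not a key the project defines.
    print("Missing settings:", missing_settings(settings, ["EXAMPLE_API_KEY"]))
    print("Vosk model found:", vosk_model_ready("ASSETS/Vosk"))
```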

For best results, use the library versions specified in requirements.txt.
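
One way to see whether an environment has drifted from the specified versions is to compare exact pins against what is installed. The sketch below assumes requirements.txt uses name==version lines, which the summary does not confirm. The helper names parse_pins and find_mismatches are hypothetical, and the parser deliberately ignores extras, ranges, and other specifier forms.

```python
import re
from importlib import metadata


def parse_pins(requirements_text: str) -> dict:
    """Collect exact pins (name==version) from requirements-style text."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*([^\s;]+).*", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def find_mismatches(pins: dict) -> dict:
    """Map package -> (pinned, installed or None) wherever the two differ."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    try:
        with open("requirements.txt", encoding="utf-8") as handle:
            pinned = parse_pins(handle.read())
    except FileNotFoundError:
        pinned = {}
    for package, (wanted, installed) in find_mismatches(pinned).items():
        print(f"{package}: pinned {wanted}, installed {installed}")
```

An empty result means every exact pin matches the installed version. Lines that are not exact pins are skipped rather than reported.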

Highlighted Details

Extensive integration of multiple third-party AI APIs and local models for text generation, image analysis, and speech processing.
Supports multi-modal interactions including voice commands, text processing, and image recognition.
Includes specialized tools for web searching, calendar management, and Android device interaction via ADB.
Features hotword detection for seamless activation and control.

Maintenance & Community

The project is primarily developed by Sree (Devs Do Code). Community engagement and support are available via Telegram (Devs Do Code), YouTube (Devs Do Code), Discord (Devs Do Code), and Instagram.

Licensing & Compatibility

This project is licensed under the MIT License. No explicit restrictions on commercial use are mentioned. The recommendation to use the default library versions suggests that mismatched dependency versions may cause compatibility problems.

Limitations & Caveats

The README does not explicitly detail limitations. However, its strong emphasis on using the default library versions implies that the project may be fragile or hit compatibility issues when the installed dependencies deviate from those specified. Its reliance on numerous external APIs also makes it dependent on their availability and may incur usage costs.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days