friday-tony-stark-demo by SAGAR-TAMANG

Voice AI assistant with dynamic tool access

Created 6 days ago


423 stars

Top 69.6% on SourcePulse

Summary

This project offers a F.R.I.D.A.Y.-inspired AI assistant, demonstrating real-time voice interaction, LLM reasoning, and dynamic tool integration. It targets developers and enthusiasts seeking to build sophisticated conversational agents with capabilities akin to a high-tech personal assistant. The core benefit is a functional, modular framework for integrating voice, AI, and external tools.

How It Works

The system comprises two cooperating components: an MCP Server and a Voice Agent. The MCP Server, built with FastMCP, exposes various tools (news, web search, system info) via Server-Sent Events (SSE). The Voice Agent, utilizing LiveKit Agents, handles the voice pipeline: it captures audio, transcribes it using STT (defaulting to Sarvam Saaras v3), reasons with an LLM (defaulting to Gemini 2.5 Flash), and synthesizes speech via TTS (defaulting to OpenAI nova). Crucially, the Voice Agent dynamically pulls tools from the MCP server in real-time during its reasoning process.
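The dynamic tool flow described above can be sketched in plain Python. This is a minimal illustration of the registry-and-dispatch pattern, not the project's actual FastMCP or LiveKit code; the registry, tool names, and dispatch helper are all illustrative assumptions:

```python
# Sketch of the dynamic tool-dispatch pattern: a server-side registry of
# tools, discovered and invoked by the agent at reasoning time.
# NOT the project's actual FastMCP/LiveKit code; names are hypothetical.
from typing import Callable, Dict

# Stand-in for the tools the MCP server exposes (news, web search, system info).
TOOL_REGISTRY: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Register a function under a tool name, as an MCP server would."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@tool("system_info")
def system_info(_: str = "") -> str:
    import platform
    return f"Running on {platform.system()}"

def list_tools() -> list[str]:
    """What the voice agent pulls from the server during reasoning."""
    return sorted(TOOL_REGISTRY)

def invoke(name: str, arg: str = "") -> str:
    """The LLM picks a tool name; the agent dispatches the call."""
    return TOOL_REGISTRY[name](arg)

if __name__ == "__main__":
    print(list_tools())  # tool list is discovered dynamically, not hard-coded
    print(invoke("system_info"))
```

The key property, mirrored from the summary above, is that the agent never hard-codes its tool list: it queries the server for available tools and dispatches by name at call time.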

Quick Start & Requirements

  • Prerequisites: Python ≥ 3.11, uv package manager, a LiveKit Cloud project.
  • Installation: Clone the repository, navigate into the directory, and run uv sync to create a virtual environment and install dependencies.
  • Configuration: Copy .env.example to .env and populate it with required API keys for LiveKit, Sarvam, OpenAI, and Google.
  • Execution: Run the MCP server (uv run friday) and the Voice Agent (uv run friday_voice) concurrently in separate terminals.
  • Resources: API keys are required from LiveKit Cloud, Sarvam AI, OpenAI, and Google AI Studio.
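The configuration step implies a .env file along these lines. The variable names below are assumptions inferred from the listed providers; consult the repository's .env.example for the exact keys:

```shell
# Hypothetical .env layout — check the repo's .env.example for real names.
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
SARVAM_API_KEY=...
OPENAI_API_KEY=...
GOOGLE_API_KEY=...
```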

Highlighted Details

  • Modular design separating backend tool serving (MCP Server) from the voice interaction pipeline (Voice Agent).
  • Flexible provider selection for Speech-to-Text (Sarvam, Whisper), Large Language Model (Gemini, OpenAI), and Text-to-Speech (OpenAI, Sarvam).
  • Real-time tool invocation mechanism where the LLM dynamically requests and uses tools exposed by the MCP server.
  • Project structure includes dedicated modules for tools, prompts, and resources, facilitating extensibility.
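Since provider choice is a code-level switch (see Limitations below), the selection logic presumably resembles the following sketch. The tables and factory function are illustrative assumptions, not the project's actual LiveKit plugin wiring:

```python
# Illustrative provider-selection sketch; the real project instantiates
# LiveKit plugin classes, which are not reproduced here.
STT_PROVIDERS = {"sarvam": "Sarvam Saaras v3", "whisper": "OpenAI Whisper"}
LLM_PROVIDERS = {"gemini": "Gemini 2.5 Flash", "openai": "GPT (OpenAI)"}
TTS_PROVIDERS = {"openai": "OpenAI nova", "sarvam": "Sarvam TTS"}

def build_pipeline(stt: str = "sarvam", llm: str = "gemini", tts: str = "openai") -> dict:
    """Assemble an STT -> LLM -> TTS pipeline description from provider keys.

    Raises KeyError for an unknown provider, so a typo fails at startup
    rather than mid-conversation.
    """
    return {
        "stt": STT_PROVIDERS[stt],
        "llm": LLM_PROVIDERS[llm],
        "tts": TTS_PROVIDERS[tts],
    }

# Defaults match the summary: Sarvam STT, Gemini LLM, OpenAI TTS.
print(build_pipeline())
```

Keeping provider tables in one place like this is what makes the swap a one-line change, even though in the current project that change lives in agent_friday.py rather than in configuration.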

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), project roadmap, or notable sponsorships.

Licensing & Compatibility

The project is released under the MIT license, which is highly permissive and allows for commercial use, modification, and distribution, including integration into closed-source applications.

Limitations & Caveats

Running the assistant requires obtaining and configuring multiple third-party API keys, and both the MCP server and the voice agent must be running simultaneously. The project is presented as a "demo" and runs in "dev mode," so further hardening and optimization would be needed before production deployment. Switching between STT, LLM, and TTS providers requires editing agent_friday.py directly.

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 430 stars in the last 6 days
