joinly  by joinly-ai

AI agents for video calls

Created 3 months ago
335 stars

Top 82.0% on SourcePulse

Project Summary

Joinly is open-source, self-hosted middleware that lets AI agents actively participate in video calls on platforms such as Google Meet, Zoom, and Microsoft Teams. Agents interact in real time through voice and chat, so they can hold a natural conversation and carry out tasks during a meeting.

How It Works

Joinly operates as connector middleware: an MCP (Model Context Protocol) server exposes meeting tools and resources to AI agents. Speech-to-Text (STT) and Text-to-Speech (TTS) are modular, so users can choose providers such as Whisper, Deepgram, Kokoro, and ElevenLabs. The system is built to handle interruptions and multi-speaker interactions so that conversation stays fluid.

Quick Start & Requirements

  • Installation: Run via Docker.
  • Prerequisites: Docker, plus a .env file with LLM provider credentials (e.g., API keys for OpenAI or Anthropic, or settings for a local Ollama instance).
  • Setup: Pull Docker image (~2.3GB).
  • Running: docker pull ghcr.io/joinly-ai/joinly:latest followed by docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client <MeetingURL>.
  • GPU Support: Requires NVIDIA Container Toolkit and CUDA >= 12.6. Use ghcr.io/joinly-ai/joinly:latest-cuda and --gpus all.
  • Links: Quickstart, Website, Demos, Discord
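
The run commands in the list above can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name joinly_run_cmd and its yes/no GPU argument are hypothetical conveniences of this script, not joinly options, and the function prints the command instead of executing it so it can be inspected first. The image tags and the --env-file, --gpus all, and --client flags are the ones listed above.

```shell
#!/usr/bin/env sh
# Sketch: assemble the docker run command described in the Quick Start list.
# joinly_run_cmd and its second argument are hypothetical helpers for this
# script only; they are not part of joinly itself.

joinly_run_cmd() {
  meeting_url="$1"      # the meeting link passed to --client
  use_gpu="${2:-no}"    # "yes" selects the CUDA image and adds --gpus all

  if [ "$use_gpu" = "yes" ]; then
    printf '%s\n' "docker run --gpus all --env-file .env ghcr.io/joinly-ai/joinly:latest-cuda --client $meeting_url"
  else
    printf '%s\n' "docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client $meeting_url"
  fi
}
```

Calling joinly_run_cmd "<MeetingURL>" prints the CPU variant, and adding yes as a second argument prints the GPU variant. Pull the matching image with docker pull before running the printed command.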

Highlighted Details

  • Supports live interaction via voice and chat within meetings.
  • Cross-platform compatibility with major video conferencing tools.
  • Bring-your-own-LLM and modular TTS/STT provider support.
  • Offers GPU acceleration for transcription and TTS models.

Maintenance & Community

The project is actively maintained with a roadmap outlining future features like camera integration, screen sharing, and improved client memory. Community support is available via Discord and GitHub Discussions.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The Docker image is substantial (~2.3GB) due to bundled browser and models. GPU support requires specific CUDA versions and NVIDIA drivers. Some roadmap features, such as camera integration and improved client memory, are still under development.
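
Because the GPU image depends on CUDA >= 12.6, a preflight check can catch a mismatch before the large image is pulled. The following is a minimal sketch: cuda_meets_minimum is a hypothetical helper name, and it only compares a version string that you supply (for example, the CUDA version shown in the nvidia-smi header); it does not query the driver itself.

```shell
#!/usr/bin/env sh
# Sketch: succeed when a CUDA version string such as "12.8" is at least 12.6.
# cuda_meets_minimum is a hypothetical helper, not part of joinly or CUDA.

cuda_meets_minimum() {
  version="$1"
  major="${version%%.*}"            # text before the first dot
  case "$version" in
    *.*) rest="${version#*.}"       # text after the first dot
         minor="${rest%%.*}" ;;     # keep only the minor component
    *)   minor=0 ;;                 # "12" is treated as 12.0
  esac
  # Compare numerically so that 12.10 counts as newer than 12.6.
  [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 6 ]; }
}
```

For example, cuda_meets_minimum 12.4 || echo "CUDA too old for the latest-cuda image" prints the warning, while version 12.6 or newer passes silently.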

Health Check

  • Last Commit: 17 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 73
  • Issues (30d): 1
  • Star History: 86 stars in the last 30 days
