joinly by joinly-ai

AI agents for video calls

Created 9 months ago

469 stars

Top 65.0% on SourcePulse

1 Expert Loves This Project

Magnus Müller

Cofounder of Browser Use

Project Summary

Joinly is an open-source, self-hosted middleware designed to enable AI agents to actively participate in video calls across platforms like Google Meet, Zoom, and Microsoft Teams. It provides AI agents with real-time interaction capabilities through voice and chat, facilitating natural conversational flows and task execution within meetings.

How It Works

Joinly operates as a connector middleware: an MCP (Model Context Protocol) server exposes meeting tools and resources to AI agents. Speech-to-Text (STT) and Text-to-Speech (TTS) are modular, so users can choose providers such as Whisper, Deepgram, Kokoro, and ElevenLabs. The system is built to handle interruptions and multi-speaker interactions so that conversation stays natural.
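The modular STT/TTS design described above can be illustrated with a minimal sketch. This is not joinly's actual code: the names SpeechToText, TextToSpeech, EchoSTT, EchoTTS, VoicePipeline, and build_pipeline are hypothetical, and the "echo" providers are stand-ins for real services such as Whisper or ElevenLabs. It only shows how providers behind a shared interface can be swapped by name.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface that any STT provider would satisfy."""
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical interface that any TTS provider would satisfy."""
    def synthesize(self, text: str) -> bytes: ...


class EchoSTT:
    """Stand-in STT provider: treats the audio bytes as UTF-8 text."""
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoTTS:
    """Stand-in TTS provider: returns the text as UTF-8 bytes."""
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


# Registries map a configured provider name to its implementation.
STT_PROVIDERS = {"echo": EchoSTT}
TTS_PROVIDERS = {"echo": EchoTTS}


@dataclass
class VoicePipeline:
    stt: SpeechToText
    tts: TextToSpeech

    def respond(self, audio: bytes, agent: Callable[[str], str]) -> bytes:
        # Audio in -> transcript -> agent reply -> audio out.
        transcript = self.stt.transcribe(audio)
        reply = agent(transcript)
        return self.tts.synthesize(reply)


def build_pipeline(stt_name: str, tts_name: str) -> VoicePipeline:
    """Look up both providers by name and wire them into one pipeline."""
    return VoicePipeline(stt=STT_PROVIDERS[stt_name](), tts=TTS_PROVIDERS[tts_name]())
```

Because the pipeline depends only on the two interfaces, changing a provider in this sketch means adding a registry entry and changing a name, with no changes to the agent logic.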

Quick Start & Requirements

Installation: Run via Docker.
Prerequisites: Docker, and a .env file configuring your LLM provider (e.g., API keys for OpenAI or Anthropic, or settings for Ollama).
Setup: Pull Docker image (~2.3GB).
Running: docker pull ghcr.io/joinly-ai/joinly:latest followed by docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client <MeetingURL>.
GPU Support: Requires NVIDIA Container Toolkit and CUDA >= 12.6. Use the ghcr.io/joinly-ai/joinly:latest-cuda image and pass --gpus all to docker run.
Links: Quickstart, Website, Demos, Discord
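The CPU and GPU invocations above differ only in the image tag and the --gpus flag. As a small sketch, the helper below assembles the docker run argument list from the flags and image names listed in this section. The function name build_joinly_command and the example meeting URL are assumptions made for illustration, not part of joinly.

```python
import shlex

IMAGE = "ghcr.io/joinly-ai/joinly"


def build_joinly_command(meeting_url: str, env_file: str = ".env", gpu: bool = False) -> list[str]:
    """Return the `docker run` argument list for joining a meeting.

    Hypothetical helper: it only assembles the flags documented above.
    """
    tag = "latest-cuda" if gpu else "latest"
    cmd = ["docker", "run"]
    if gpu:
        # GPU mode needs the NVIDIA Container Toolkit on the host.
        cmd += ["--gpus", "all"]
    cmd += ["--env-file", env_file, f"{IMAGE}:{tag}", "--client", meeting_url]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_joinly_command("https://meet.example.com/abc", gpu=True)))
```

Passing the returned list to subprocess.run would launch the container. Printing it first keeps the sketch free of side effects.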

Highlighted Details

Supports live interaction via voice and chat within meetings.
Cross-platform compatibility with major video conferencing tools.
Bring-your-own-LLM and modular TTS/STT provider support.
Offers GPU acceleration for transcription and TTS models.

Maintenance & Community

The project is actively maintained with a roadmap outlining future features like camera integration, screen sharing, and improved client memory. Community support is available via Discord and GitHub Discussions.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The Docker image is large (~2.3GB) because it bundles a browser and models. GPU support requires the NVIDIA Container Toolkit and CUDA >= 12.6. Some roadmap features, such as camera integration and improved client memory, are still under development.

Health Check

Last Commit: 2 days ago

Responsiveness: Inactive

Star History

39 stars in the last 30 days