rtc-aigc-embedded-demo by volcengine

IoT demo for RTC AIGC integration on ESP32

Created 8 months ago

303 stars

Top 88.2% on SourcePulse

Project Summary

This project provides a demo for integrating Real-Time Communication (RTC) with AI-generated content (AIGC) on embedded devices, targeting developers working with IoT and AI applications. It showcases a real-time conversational AI experience powered by Volcengine's cloud services and Espressif hardware.

How It Works

The demo orchestrates a pipeline involving Volcengine's RTC, Speech Recognition (ASR), Text-to-Speech (TTS), and Ark large language models. An embedded device (ESP32-S3-Korvo-2) captures audio and sends it for ASR; the transcribed text is passed to a large language model, the model's response is synthesized into speech via TTS, and the device plays it back. The server component manages API interactions and configurations.
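The stage order described above (ASR, then the language model, then TTS) can be sketched in a few lines of Python. This is only an illustration of the data flow, not the demo's code: `run_turn`, `ConversationTurn`, and the `fake_*` stages are hypothetical names, and the stand-in stages replace the Volcengine cloud calls that the real demo relies on.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConversationTurn:
    """One round trip: what the user said and what the assistant answered."""
    transcript: str
    reply_text: str
    reply_audio: bytes


def run_turn(
    audio_in: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> ConversationTurn:
    """Pass captured audio through ASR -> LLM -> TTS and keep each stage's output."""
    transcript = asr(audio_in)      # speech to text
    reply_text = llm(transcript)    # text to response text
    reply_audio = tts(reply_text)   # response text to speech
    return ConversationTurn(transcript, reply_text, reply_audio)


# Stand-in stages (hypothetical). They only shuffle bytes and strings so the
# sketch runs offline; the real demo calls Volcengine services instead.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = run_turn(b"hello", fake_asr, fake_llm, fake_tts)
    print(turn.transcript, "->", turn.reply_text)
```

Passing the stages in as callables keeps the ordering logic separate from any particular service, which makes it easy to swap a stub for a real client when experimenting.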

Quick Start & Requirements

Server:
- Install dependencies: pip install requests
- Configure RtcAigcConfig.py with Volcengine API keys (AK/SK), RTC AppID/AppKey, Ark EndpointId, TTS Voice Type, and ASR/TTS AppIDs/Access Tokens.
- Run server: python3 RtcAigcService.py
Device (ESP32-S3):
- Prerequisites: Linux server (Ubuntu 18.04+ recommended), Python 3.8+, Espressif ESP32-S3-Korvo-2 or AtomS3R board, CMake, Ninja, dfu-util.
- Espressif ADF framework setup: Clone esp-adf, reset to specific commit 0d76650198ca96546c40d10a7ce8963bacdf820b, update submodules, run ./install.sh esp32s3, and . ./export.sh.
- Clone demo into $ADF_PATH/examples.
- Configure Config.h with server address and Volcengine parameters.
- Apply patches for disabling Volcengine components and adding AtomS3R board support.
- Compile: idf.py set-target esp32s3, idf.py menuconfig (set WiFi, board), idf.py build.
- Flash & Monitor: idf.py flash, idf.py monitor.
Volcengine Services: Requires activation of RTC, Speech Recognition, Speech Synthesis, and Ark services.
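Because the server step depends on many credentials being filled in, a small pre-flight check can catch blanks before the server starts. The sketch below is an assumption-heavy illustration: the field names in `REQUIRED_FIELDS` and the `missing_fields` helper are hypothetical and simply mirror the settings listed above; the actual RtcAigcConfig.py may name or group them differently.

```python
from typing import Dict, List, Optional

# Hypothetical field names mirroring the settings the Quick Start lists.
# The real RtcAigcConfig.py may use different names or structure.
REQUIRED_FIELDS = [
    "AK",
    "SK",
    "RTC_APP_ID",
    "RTC_APP_KEY",
    "ARK_ENDPOINT_ID",
    "TTS_VOICE_TYPE",
    "ASR_APP_ID",
    "ASR_ACCESS_TOKEN",
    "TTS_APP_ID",
    "TTS_ACCESS_TOKEN",
]


def missing_fields(config: Dict[str, Optional[str]]) -> List[str]:
    """Return required fields that are absent, None, or blank, in declaration order."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = config.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example with two settings left unfilled.
    example = {name: "placeholder" for name in REQUIRED_FIELDS}
    example["SK"] = ""
    example["ARK_ENDPOINT_ID"] = None
    print("Missing:", missing_fields(example))
```

A check like this only confirms that values are present; it cannot tell whether a key is valid or whether the corresponding Volcengine service has been activated.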

Highlighted Details

Demonstrates end-to-end AIGC integration on resource-constrained embedded hardware.
Utilizes Volcengine's suite of AI and RTC services for a conversational experience.
Supports ESP32-S3-Korvo-2 and AtomS3R development boards.

Maintenance & Community

The project welcomes technical discussion through GitHub issues and community groups.

Licensing & Compatibility

The repository itself appears to be under a permissive license, but the underlying Espressif ADF framework has its own licensing. Specific Volcengine service usage is governed by Volcengine's terms.

Limitations & Caveats

The provided server example is for demonstration and quick testing only; production environments require a custom server implementation.
Requires specific Volcengine service configurations and API keys.
Strict adherence to Espressif ADF and IDF versions is necessary for device-side compilation.
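Since the device build depends on a specific esp-adf commit, one way to guard against version drift is to compare the checked-out commit with the pinned hash before building. The following is a minimal sketch under stated assumptions: it expects `git` on the PATH and the `ADF_PATH` environment variable pointing at the esp-adf checkout, and the helper names `commit_matches` and `current_adf_commit` are hypothetical, not part of the demo.

```python
import os
import subprocess
from typing import Optional

# Commit hash named in the Quick Start steps.
PINNED_ADF_COMMIT = "0d76650198ca96546c40d10a7ce8963bacdf820b"


def commit_matches(head: str, pinned: str = PINNED_ADF_COMMIT) -> bool:
    """True when the checked-out hash equals the pinned one (case-insensitive)."""
    return head.strip().lower() == pinned.lower()


def current_adf_commit(adf_path: str) -> Optional[str]:
    """Ask git for HEAD in adf_path; return None if git fails or is unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", adf_path, "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    adf_path = os.environ.get("ADF_PATH", "")
    head = current_adf_commit(adf_path) if adf_path else None
    if head is None:
        print("Could not read the esp-adf commit; is ADF_PATH set?")
    elif commit_matches(head):
        print("esp-adf is at the pinned commit.")
    else:
        print(f"esp-adf is at {head}, expected {PINNED_ADF_COMMIT}.")
```

The check covers only the top-level esp-adf commit; submodules, including the bundled IDF, would need their own verification.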

Health Check

Last Commit: 1 week ago
Responsiveness: 1+ week
Pull Requests (30d): 0
Issues (30d): 1

Star History

4 stars in the last 30 days
