WavCraft by JinhuaLiang

AI agent for audio creation and editing

Created 1 year ago
525 stars

Top 60.2% on SourcePulse

Project Summary

WavCraft is an AI agent for audio creation and editing, aimed at researchers and content creators. It simplifies complex audio tasks such as text-guided generation, editing, and scriptwriting by using large language models (LLMs) to orchestrate audio expert models and digital signal processing (DSP) functions.

How It Works

WavCraft is an LLM-driven agent that connects diverse audio expert models with DSP functions. Users work with audio through natural-language prompts, for example to edit an existing clip according to a text description or to generate new audio from scratch. Because the agent integrates multiple specialized audio models behind a single LLM, it offers one unified interface for a wide range of audio manipulation tasks.
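The orchestration pattern can be illustrated with a minimal, hypothetical sketch: a stubbed "LLM" returns a JSON plan, and a dispatcher executes each step by calling a registered tool and passing intermediate results by name. The tool names (`generate_tone`, `apply_gain`, `mix`), the plan format, the `$name` reference convention, and the sine tone standing in for a text-to-audio model are all assumptions for illustration. They are not WavCraft's actual modules, prompts, or API.

```python
import json
import math
from typing import Callable, Dict, List

Audio = List[float]   # mono samples in [-1.0, 1.0] (assumed representation)
SAMPLE_RATE = 16_000  # hypothetical sample rate for this sketch


def generate_tone(freq_hz: float, seconds: float) -> Audio:
    """Placeholder for a text-to-audio expert model: returns a sine tone."""
    n = int(SAMPLE_RATE * seconds)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def apply_gain(audio: Audio, db: float) -> Audio:
    """DSP function: scale the amplitude by a gain given in decibels."""
    factor = 10 ** (db / 20)
    return [s * factor for s in audio]


def mix(a: Audio, b: Audio) -> Audio:
    """DSP function: sum two clips, padding the shorter with silence and clipping to [-1, 1]."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]


# Registry of callable tools; these names are hypothetical, not WavCraft's API.
TOOLS: Dict[str, Callable[..., Audio]] = {
    "generate_tone": generate_tone,
    "apply_gain": apply_gain,
    "mix": mix,
}


def fake_llm_plan(instruction: str) -> str:
    """Stand-in for the LLM call: ignores the instruction and returns a fixed JSON plan."""
    return json.dumps([
        {"tool": "generate_tone", "args": {"freq_hz": 440.0, "seconds": 0.5}, "out": "fx"},
        {"tool": "apply_gain", "args": {"audio": "$fx", "db": -6.0}, "out": "fx"},
        {"tool": "mix", "args": {"a": "$input", "b": "$fx"}, "out": "result"},
    ])


def run_agent(instruction: str, input_audio: Audio) -> Audio:
    """Execute the plan step by step, storing intermediate clips by name."""
    env: Dict[str, Audio] = {"input": input_audio}
    for step in json.loads(fake_llm_plan(instruction)):
        tool = TOOLS.get(step["tool"])
        if tool is None:  # reject tools the LLM invented
            raise ValueError(f"unknown tool: {step['tool']}")
        # Arguments written as "$name" refer to earlier results in env.
        args = {
            k: env[v[1:]] if isinstance(v, str) and v.startswith("$") else v
            for k, v in step["args"].items()
        }
        env[step["out"]] = tool(**args)
    return env["result"]


out = run_agent("Add a short beep.", [0.0] * 4_000)
print(len(out))  # 8000: the 0.5 s tone is longer than the 4000-sample input
```

Keeping the LLM's output to a declarative plan that is validated against a fixed tool registry is one common way to stop a model from calling functions that do not exist; a real agent would replace `fake_llm_plan` with an actual model call.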

Quick Start & Requirements

  • Install via bash scripts/setup_envs.sh.
  • Requires the OPENAI_KEY and HF_KEY environment variables (OpenAI and Hugging Face API keys).
  • Launch services with bash scripts/start_services.sh.
  • Basic usage: python3 WavCraft.py basic -f --input-wav assets/duck_quacking_in_water.wav --input-text "Add dog barking."
  • Interactive chat: python3 WavCraft-chat.py basic -f -c
  • Watermark check: python3 check_watermark.py --wav-path /path/to/audio/file
  • Open LLMs (e.g., the Mistral AI family) can be used by passing the --model argument.
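The basic-mode invocation above can be wrapped in a small helper that first checks for the required API keys and then assembles the command line. This is a sketch built only from the flags listed above; the helper names are hypothetical, and the exact value format accepted by --model (shown here as a Hugging Face model ID) is an assumption to verify against the repository README.

```python
import os
import shlex
from typing import List, Mapping, Optional

REQUIRED_KEYS = ("OPENAI_KEY", "HF_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required API-key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]


def build_basic_cmd(input_wav: str, input_text: str, model: Optional[str] = None) -> List[str]:
    """Assemble the documented basic-mode invocation of WavCraft.py."""
    cmd = ["python3", "WavCraft.py", "basic", "-f",
           "--input-wav", input_wav, "--input-text", input_text]
    if model is not None:
        # Optional open-LLM backend; the accepted value format is an assumption.
        cmd += ["--model", model]
    return cmd


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Set these environment variables first:", ", ".join(absent))
    cmd = build_basic_cmd("assets/duck_quacking_in_water.wav", "Add dog barking.")
    print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

To actually launch the agent, the list can be passed to subprocess.run(cmd, check=True) from the repository root once the setup and service scripts have been run. Building an argument list rather than a single string avoids shell-quoting problems with prompts that contain spaces or punctuation.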

Highlighted Details

  • Text-guided audio editing and generation.
  • AI-powered audio scriptwriting.
  • Watermarking for generated/modified audio detection.
  • Support for open LLMs such as Mistral-7B-Instruct-v0.2.
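To screen a batch of generated or edited clips, the documented check_watermark.py --wav-path command can be run once per file. The sketch below is a hypothetical helper: it assumes the script is run from the repository root and that the clips sit as .wav files in a single directory. Because the summary does not describe how check_watermark.py reports its verdict, the sketch only builds and runs the commands and leaves interpretation to the script's own output.

```python
import subprocess
from pathlib import Path
from typing import List


def watermark_check_cmd(wav_path: str) -> List[str]:
    """Build the documented check_watermark.py invocation for one file."""
    return ["python3", "check_watermark.py", "--wav-path", wav_path]


def check_directory(audio_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one watermark check per .wav file in audio_dir."""
    cmds = [watermark_check_cmd(str(p)) for p in sorted(Path(audio_dir).glob("*.wav"))]
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError if an invocation exits with an error.
            subprocess.run(cmd, check=True)
    return cmds
```

Defaulting to dry_run=True lets the command list be reviewed before anything is executed.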

Maintenance & Community

The project acknowledges contributions from WavJourney, AudioCraft, AudioSep, AudioSR, AudioLDM, and WavMark. The primary author is Jinhua Liang.

Licensing & Compatibility

The repository is intended for research purposes only, and users must not disable its watermarking mechanisms.

Limitations & Caveats

This repository is intended for research purposes only, and the developers are not responsible for the semantics of generated or edited audio. Users are explicitly prohibited from disabling the watermarking features.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Omar Sanseviero (DevRel at Google DeepMind).

AudioLDM by haoheliu

0.1%
3k
Audio generation research paper using latent diffusion
Created 2 years ago
Updated 2 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

AudioGPT by AIGC-Audio

0.0%
10k
Audio processing and generation research project
Created 2 years ago
Updated 1 year ago