dla by markovka17

Course materials for deep learning in audio processing

Created 5 years ago

731 stars

Top 47.2% on SourcePulse

Project Summary

This repository provides lecture and seminar materials for a deep learning course on audio processing. Topics range from digital signal processing fundamentals to speech recognition, source separation, text-to-speech, voice biometry, and AI for music. The target audience is students and researchers in machine learning and audio engineering who want a structured curriculum with practical examples.

How It Works

The course material is organized by week, and each week includes lecture notes, seminar exercises, and self-study materials. The materials use PyTorch as the deep learning framework, Hydra for configuration, and Git for version control. The curriculum progresses from foundational concepts to state-of-the-art models, with hands-on seminar work on practical audio tasks.

Quick Start & Requirements

Installation: The repository itself has no installation instructions. Individual seminar exercises likely require Python and deep learning libraries.
Prerequisites: The materials mention Python, PyTorch, Hydra, Git, and VS Code. Seminar exercises may also need specific audio processing libraries and, for GPU acceleration, CUDA.
Resources: Lecture recordings are available on YouTube (some in Russian, some in English). Links to past versions of the course are also provided.
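Because the repository ships no installation instructions, a quick environment check can help before opening a seminar. The following is a minimal sketch, not part of the course materials: the module list (torch, torchaudio, hydra) is an assumption based on the prerequisites listed above, and the helper names check_environment and cuda_available are hypothetical.

```python
import importlib.util

# Assumed module names; the repository does not publish a requirements list.
# "hydra" is the import name of the hydra-core package; "torchaudio" is a guess
# at one of the unspecified audio processing libraries.
ASSUMED_MODULES = ("torch", "torchaudio", "hydra")


def check_environment(modules=ASSUMED_MODULES):
    """Return a dict mapping each module name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def cuda_available():
    """Report CUDA availability, or None when PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script also runs without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
    print(f"CUDA available: {cuda_available()}")
```

Running the script prints one line per module and a final CUDA line. Individual seminars may need additional packages that this sketch does not know about.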

Highlighted Details

Covers both classic and state-of-the-art models in speech recognition (CTC, RNN-T, LAS) and source separation (Demucs, TasNet, Conv-TasNet).
Includes specialized topics like audio-visual deep learning, diffusion-based TTS, and AI for music.
Features guest lectures and discussions on practical aspects like experiment tracking and deployment (Slurm).

Maintenance & Community

The course materials have been developed and delivered by a team of contributors over several years, and past versions are available for 2020-2023. Technical issues and contributions go through GitHub Issues.

Licensing & Compatibility

The repository's license is not explicitly stated in the README.

Limitations & Caveats

Some lecture recordings are in Russian, which may be a barrier for non-Russian speakers. The repository contains course materials rather than a runnable library, so setting up and running specific models requires following the individual seminar instructions.

