ai-research-code by sony

Research code for ML/AI papers

created 5 years ago

356 stars

Top 79.5% on sourcepulse

Project Summary

This repository provides code for various machine learning and AI research papers published by Sony, aiming to offer transparent and reproducible research. It targets researchers and engineers interested in cutting-edge AI techniques, particularly in areas like efficient DNN inference, music separation, large-scale model training, data cleansing, dense prediction tasks, and voice conversion. The benefit is direct access to implementations of novel methods, potentially accelerating research and development.

How It Works

The repository covers several distinct methods. For mixed-precision DNNs, it provides differentiable quantization that learns step sizes and dynamic ranges. For music separation, it provides CrossNet-Open-Unmix (X-UMX), an architecture that adds averaging operations and custom losses. For large-scale neural networks, it provides an out-of-core training algorithm that adaptively schedules memory transfers and uses virtual addressing to reduce fragmentation. It also includes a storage-efficient approximation of influence functions for data cleansing, the D3Net architecture with multidilated convolutions for dense prediction tasks, and NVC-Net, an end-to-end adversarial voice conversion network that operates on raw audio waveforms. Finally, it includes a PyTorch implementation of FastSpeech 2 with TVC-GMM for modeling residual multimodality in expressive speech synthesis.
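To make the "step sizes and dynamic ranges" idea concrete, here is a minimal pure-Python sketch of the forward pass of a symmetric uniform quantizer. It is not taken from the repository: the function names `quantize` and `implied_bitwidth` are hypothetical, and the bitwidth formula is an assumption about how a signed bitwidth can be derived from a step size and a dynamic range. The repository's method additionally learns both parameters by gradient descent, which this sketch does not attempt.

```python
import math


def quantize(x, step, qmax):
    """Clip x to [-qmax, qmax], then round to the nearest multiple of step.

    Illustrative forward pass only; no gradients are computed here.
    """
    clipped = max(-qmax, min(qmax, x))
    sign = 1.0 if clipped >= 0 else -1.0
    return sign * step * math.floor(abs(clipped) / step + 0.5)


def implied_bitwidth(step, qmax):
    """Bits needed for signed levels in [-qmax, qmax] spaced by step.

    Assumed relation: ceil(log2(qmax / step + 1)) magnitude bits + 1 sign bit.
    """
    return math.ceil(math.log2(qmax / step + 1)) + 1


# Example parameters (hypothetical values chosen for illustration).
step, qmax = 0.25, 1.75
values = [0.3, -0.9, 5.0]
quantized = [quantize(v, step, qmax) for v in values]
bits = implied_bitwidth(step, qmax)
print(quantized, bits)
```

With these example parameters there are seven positive levels, seven negative levels, and zero, so the implied bitwidth is 4. Making the step size or range learnable therefore lets training trade accuracy against bitwidth per layer.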

Quick Start & Requirements

Installation: Primarily relies on the NNabla framework for several projects, with one project using PyTorch. Specific installation instructions and dependencies are detailed within each project's subdirectory.
Prerequisites: May require specific Python versions, CUDA for GPU acceleration, and potentially large datasets for training and evaluation.
Resources: Training large-scale models or complex tasks will likely require significant GPU memory and compute resources.
Links:
- Mixed Precision DNNs: arXiv:1905.11452
- X-UMX: x-umx, open-unmix-nnabla, open-unmix-pytorch
- Out-of-core Training: NNabla documentation
- D3Net: CVPR 2021
- NVC-Net: arXiv:2106.00992
- TVC-GMM: INTERSPEECH 2023
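Because the projects split between NNabla and PyTorch, it can help to check which framework is present before entering a project subdirectory. The sketch below is only a convenience check and is not part of the repository; the helper name `available_frameworks` is hypothetical, and it assumes the frameworks are importable as `nnabla` and `torch`.

```python
import importlib.util


def available_frameworks(names=("nnabla", "torch")):
    """Return a dict mapping each module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_frameworks().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Each project's own subdirectory remains the authoritative source for exact versions and dependencies.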

Highlighted Details

Mixed Precision DNNs achieve state-of-the-art performance by learning optimal bitwidths.
X-UMX improves music separation without additional learnable parameters.
Out-of-core training enables training of networks larger than GPU memory with improved speed.
Data cleansing method reduces cache size by 1,563x compared to previous approaches.
D3Net uses multidilated convolutions for simultaneous modeling of multiresolution patterns.
NVC-Net achieves fast inference for end-to-end voice conversion, generating audio samples at a rate above 3600 kHz on an NVIDIA V100 GPU.
TVC-GMM enhances expressive speech synthesis by modeling residual multimodality.
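The multidilated convolution mentioned for D3Net can be illustrated with a small 1-D sketch in pure Python. This is not the repository's implementation: the function names are hypothetical, and the rule that channel i uses dilation 2**i is an assumption made for illustration. The point is only that a single layer applies different dilation factors to different input channels, so one output mixes several receptive-field sizes.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D cross-correlation with a dilation factor."""
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multidilated_conv1d(channels, kernels):
    """Sum per-channel convolutions; channel i uses dilation 2**i (assumed)."""
    out = [0.0] * len(channels[0])
    for i, (signal, kernel) in enumerate(zip(channels, kernels)):
        y = dilated_conv1d(signal, kernel, dilation=2 ** i)
        out = [a + b for a, b in zip(out, y)]
    return out


# Two channels, each a unit impulse at index 4, with all-ones kernels.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
result = multidilated_conv1d([impulse, impulse], [[1, 1, 1], [1, 1, 1]])
print(result)
```

In this example the first channel spreads the impulse to its immediate neighbours (dilation 1), while the second reaches two samples away (dilation 2), and the layer output is their sum.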

Maintenance & Community

The repository is maintained by Sony AI Research. Specific community channels or active development forums are not explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a unified license for the entire repository. Individual projects may have their own licenses. Compatibility for commercial use or closed-source linking would depend on the specific license of each included code project.

Limitations & Caveats

The repository contains code for multiple research papers, each with its own dependencies and potential setup complexities. Some projects are implemented in NNabla, which might be less common than PyTorch or TensorFlow. The README does not provide a consolidated overview of all dependencies or a single point of entry for setup.
