ai-research-code by sony

Research code for ML/AI papers

created 5 years ago
356 stars

Top 79.5% on sourcepulse

Project Summary

This repository provides code for various machine learning and AI research papers published by Sony, aiming to offer transparent and reproducible research. It targets researchers and engineers interested in cutting-edge AI techniques, particularly in areas like efficient DNN inference, music separation, large-scale model training, data cleansing, dense prediction tasks, and voice conversion. The benefit is direct access to implementations of novel methods, potentially accelerating research and development.

How It Works

The repository showcases diverse approaches:

  • Differentiable quantization with learned step sizes and dynamic ranges for mixed-precision DNNs.
  • A CrossNet-Open-Unmix (X-UMX) architecture for music separation that leverages additional average operations and custom losses.
  • An out-of-core training algorithm for large-scale neural networks that adaptively schedules memory transfers and uses virtual addressing to reduce fragmentation.
  • A storage-efficient approximation of influence functions for data cleansing.
  • D3Net, an architecture with multidilated convolutions for dense prediction tasks.
  • NVC-Net, an end-to-end adversarial voice conversion network that operates on raw audio waveforms.
  • A PyTorch implementation of FastSpeech 2 with TVC-GMM for modeling residual multimodality in expressive speech synthesis.
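To make the idea of "learned step sizes and dynamic ranges" more concrete, here is a minimal pure-Python sketch of a symmetric uniform quantizer parametrized by a step size `d` and a maximum value `q_max`. It is not the repository's NNabla code: the function names are hypothetical, rounding is treated with a straight-through estimator (STE) so that `d` and `q_max` both receive gradients, and the bitwidth formula (magnitude bits plus one sign bit) is an assumption of this sketch. The point it illustrates is that once `d` and `q_max` are trained, the bitwidth follows from them rather than being chosen by hand.

```python
import math


def quantize_with_grads(x: float, d: float, q_max: float) -> tuple[float, float, float]:
    """Hypothetical symmetric uniform quantizer with step size d and maximum q_max.

    Returns (q, dq_dd, dq_dqmax). Gradients treat rounding with a
    straight-through estimator (d round(u) / du := 1).
    """
    if abs(x) > q_max:
        # Clipped region: output saturates at +/- q_max, so only q_max gets a gradient.
        return math.copysign(q_max, x), 0.0, math.copysign(1.0, x)
    # Round half away from zero onto the grid {..., -d, 0, d, ...}.
    k = math.copysign(math.floor(abs(x) / d + 0.5), x)
    q = d * k
    # With the STE, dQ/dd = round(x/d) - x/d; q_max has no influence inside the range.
    return q, k - x / d, 0.0


def implied_bitwidth(d: float, q_max: float) -> int:
    """Bits for a signed grid with step d and maximum q_max (assumed formula).

    ceil(log2(q_max / d + 1)) magnitude bits plus one sign bit; the small
    tolerance absorbs floating-point error when q_max / d is 2**n - 1.
    """
    return math.ceil(math.log2(q_max / d + 1) - 1e-9) + 1


if __name__ == "__main__":
    for value in (0.6, 3.0, -3.0):
        print(value, quantize_with_grads(value, d=0.25, q_max=1.75))
    print(implied_bitwidth(0.25, 1.75))  # 4-bit signed grid
```

In a real training loop these gradients would come from the framework's autodiff; the hand-written derivatives here only show which parameter is affected in each region of the input.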

Quick Start & Requirements

  • Installation: Most projects are built on the NNabla framework; one (FastSpeech 2 with TVC-GMM) uses PyTorch. Installation instructions and dependencies are given in each project's subdirectory.
  • Prerequisites: May require specific Python versions, CUDA for GPU acceleration, and potentially large datasets for training and evaluation.
  • Resources: Training large-scale models or complex tasks will likely require significant GPU memory and compute resources.
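Since the projects are split across two frameworks, it can help to check which one an environment supports before following a subdirectory's setup instructions. The helper below is hypothetical and not part of the repository; it assumes the import names `nnabla` and `torch`, only reports whether the packages can be found, and uses PyTorch's `torch.cuda.is_available()` as a quick GPU check when PyTorch is present.

```python
import importlib
import importlib.util
from typing import Optional


def framework_status(names=("nnabla", "torch")) -> dict:
    """Map each top-level package name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def torch_cuda_available() -> Optional[bool]:
    """Return torch.cuda.is_available(), or None if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    torch = importlib.import_module("torch")
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, found in framework_status().items():
        print(f"{name}: {'found' if found else 'missing'}")
    print("PyTorch CUDA:", torch_cuda_available())
```

Checking for NNabla's CUDA extension is left out on purpose, since the exact package layout depends on the installed version; the per-project instructions remain the authoritative source.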

Highlighted Details

  • Mixed-precision DNNs learn their bitwidths through trained quantizer step sizes and dynamic ranges, reporting state-of-the-art results.
  • X-UMX improves music separation without additional learnable parameters.
  • Out-of-core training enables training of networks larger than GPU memory with improved speed.
  • Data cleansing method reduces cache size by 1,563x compared to previous approaches.
  • D3Net uses multidilated convolutions for simultaneous modeling of multiresolution patterns.
  • NVC-Net achieves fast inference for end-to-end voice conversion, generating audio at more than 3,600 kHz on an NVIDIA V100 GPU.
  • TVC-GMM enhances expressive speech synthesis by modeling residual multimodality.
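As we understand the D3Net design, a multidilated convolution inside a densely connected block gives each concatenated input (the feature maps coming from an earlier layer) its own dilation factor, so a single layer mixes short- and long-range context. The pure-Python sketch below illustrates that idea in 1-D with one channel per input group; the dilation schedule `2**i` for the i-th input, the 1-D setting, and the function names are assumptions for illustration, not the repository's NNabla implementation.

```python
def dilated_conv1d(x: list[float], kernel: list[float], dilation: int) -> list[float]:
    """1-D dilated cross-correlation with zero 'same' padding (odd kernel length)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for 'same' padding")
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps spaced `dilation` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def multidilated_conv1d(inputs: list[list[float]], kernels: list[list[float]]) -> list[float]:
    """Sum of per-input dilated convolutions; input group i uses dilation 2**i."""
    if len(inputs) != len(kernels):
        raise ValueError("need one kernel per input group")
    out = [0.0] * len(inputs[0])
    for i, (x, kernel) in enumerate(zip(inputs, kernels)):
        y = dilated_conv1d(x, kernel, dilation=2 ** i)
        out = [a + b for a, b in zip(out, y)]
    return out


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    taps = [1.0, 0.0, 1.0]
    # Group 0 responds 1 sample from the impulse, group 1 responds 2 samples away.
    print(multidilated_conv1d([impulse, impulse], [taps, taps]))
    # prints [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
```

Feeding the same impulse to both groups makes the effect visible: the output combines a narrow and a wide receptive field in one layer, which is the kind of simultaneous multiresolution modeling the highlight refers to.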

Maintenance & Community

The repository is maintained by Sony AI Research. Specific community channels or active development forums are not explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a unified license for the entire repository. Individual projects may have their own licenses. Compatibility for commercial use or closed-source linking would depend on the specific license of each included code project.

Limitations & Caveats

The repository contains code for multiple research papers, each with its own dependencies and setup complexity. Most projects are implemented in NNabla, which is less widely used than PyTorch or TensorFlow. The README does not provide a consolidated list of dependencies or a single entry point for setup.

Health Check
  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
