s3prl by s3prl

Speech representation learning toolkit

Created 6 years ago

2,531 stars

Top 18.0% on SourcePulse

7 Experts Love This Project

Benjamin Bolte

Cofounder of K-Scale Labs

Jeff Hammerbacher

Cofounder of Cloudera

Anastasios Angelopoulos

Cofounder of LMArena

Junyang Lin

Core Maintainer at Alibaba Qwen

and 3 more!

Project Summary

S3PRL is a toolkit for self-supervised speech pre-training and representation learning that offers a unified interface to numerous upstream models and downstream tasks. It targets researchers and developers in speech processing who want to use or develop self-supervised learning (SSL) models, and it provides a flexible, modular framework that integrates with other toolkits such as ESPnet.

How It Works

S3PRL (Self-Supervised Speech Pre-training and Representation Learning) organizes self-supervised speech pre-trained models as "upstream" components. These upstream models are registered via torch.hub, allowing one-line plug-and-play usage in external projects without requiring the entire S3PRL codebase. The toolkit also supports using the resulting representations in downstream tasks and evaluating them on the SUPERB benchmark.
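The torch.hub route described above can be sketched as follows. This is a minimal illustration, not the project's documented recipe: the helper `load_upstream` and its injectable `loader` argument are hypothetical, and the repository string `s3prl/s3prl` and the entry name `tera` are assumptions to check against the project's hub listing.

```python
from typing import Any, Callable, Optional

# Assumption: the GitHub "owner/repo" string under which the hub entries live.
S3PRL_REPO = "s3prl/s3prl"


def load_upstream(name: str, loader: Optional[Callable[..., Any]] = None) -> Any:
    """Load an upstream model by its registered hub entry name.

    `loader` defaults to torch.hub.load. It is injectable (a hypothetical
    convenience) so the helper can be exercised without PyTorch or network access.
    """
    if loader is None:
        import torch  # imported lazily, only when the real loader is needed

        loader = torch.hub.load
    return loader(S3PRL_REPO, name)


# Typical use (fetches the repository's hub entry points on first call):
#   model = load_upstream("tera")   # "tera" is an assumed entry name
#   model.eval()
```

Passing the repository and entry name straight to torch.hub.load is what keeps the call independent of a local S3PRL checkout.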

Quick Start & Requirements

Install: pip install s3prl, or, from a cloned copy of the repository, pip install -e ".[all]" for an editable install with all extras.
Prerequisites: Python >= 3.9, PyTorch (versions 1.13.1 to 2.4.0), sox installed on the OS. Some upstream models may have additional specific dependencies detailed in their respective README.md files.
Documentation: Tutorials and documentation are available.
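The prerequisites listed above can be checked before installing. The sketch below is only an illustration: the helper names `parse_version` and `check_environment` are hypothetical, and the version bounds are copied from the prerequisites line rather than read from the package metadata.

```python
import re
import shutil
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 9)
# Inclusive PyTorch bounds as quoted in the prerequisites above.
TORCH_MIN, TORCH_MAX = (1, 13, 1), (2, 4, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string like '2.1.0+cu118' into (2, 1, 0), dropping the build tag."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def check_environment(
    python: Tuple[int, ...],
    torch_version: Optional[str],
    sox_path: Optional[str],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python[:2]) < MIN_PYTHON:
        problems.append("Python >= 3.9 is required")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif not (TORCH_MIN <= parse_version(torch_version) <= TORCH_MAX):
        problems.append("PyTorch %s is outside 1.13.1 to 2.4.0" % torch_version)
    if sox_path is None:
        problems.append("sox executable not found on PATH")
    return problems


# Typical use against the running interpreter:
#   try:
#       import torch
#       torch_version = torch.__version__
#   except ImportError:
#       torch_version = None
#   print(check_environment(sys.version_info[:2], torch_version, shutil.which("sox")))
```

The check takes its inputs as arguments rather than probing the system itself, so it can be run against a target environment's reported versions as well as the local one.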

Highlighted Details

Supports a wide range of upstream models, including Mockingjay, TERA, and Audio ALBERT, and integrates with ESPnet for broader applications.
Facilitates benchmarking of upstream models using the SUPERB benchmark.
Modularized SSL models are available as a standalone PyPI package for easy integration.
Built with contributions from various institutions; since 2024 the focus has been on maintaining existing functions.

Maintenance & Community

The project is in pure maintenance mode, focusing on long-term support for existing functions. Contributions in the form of bug reports or fixes are welcome. Discussions are preferred on the GitHub issue page for transparency. Key contributors include Shu-wen Yang, Andy T. Liu, Heng-Jui Chang, Haibin Wu, and Xuankai Chang.

Licensing & Compatibility

The majority of the S3PRL Toolkit is licensed under the Apache License 2.0. However, files authored by Facebook, Inc. are licensed under CC-BY-NC, which may impose non-commercial use restrictions.

Limitations & Caveats

Since the transition to maintenance mode in 2024, the focus has been on maintaining existing functions, and new techniques will not be integrated into S3PRL itself. Some upstream models have additional dependencies beyond the base install, described in their respective README.md files; overlooking these can lead to installation or runtime errors.

Health Check

Last Commit

8 months ago

Responsiveness

Inactive

Star History

16 stars in the last 30 days