SpeechTransProgress by kahne

End-to-end speech translation research and dataset tracker

Created 6 years ago

260 stars

Top 97.5% on SourcePulse

Project Summary

The kahne/SpeechTransProgress repository is a curated knowledge base for researchers and practitioners working on end-to-end speech translation (ST). It tracks progress in the field by cataloging key datasets, influential research papers, and relevant tutorials, giving readers a single place to survey the state of the art and ongoing developments in spoken language translation.

How It Works

This project functions as a meta-repository, aggregating pointers to significant datasets, research publications, and academic events within the speech translation domain. It highlights diverse corpora such as CoVoST 2, CVSS, and MuST-C, detailing their language pairs, data types, durations, and licensing. The repository also lists numerous research papers covering ST sub-fields, including simultaneous translation, low-resource scenarios, and novel model architectures, offering a structured overview of the research landscape.

Quick Start & Requirements

This repository does not provide installation instructions or a runnable toolkit. It serves as a curated list of resources and research pointers for the speech translation community.

Highlighted Details

Comprehensive catalog of over 100 research papers and tutorials from major conferences (ACL, EACL, INTERSPEECH, ICML, etc.) spanning 2013-2023.
Detailed table of speech translation datasets, including CoVoST 2, CVSS, mTEDx, MuST-C, How2, Europarl-ST, and others, specifying language directions, data types (text/speech), duration, and licensing.
Covers a wide spectrum of ST research, including direct speech-to-speech translation, simultaneous translation, multilingual ST, low-resource ST, and the impact of pre-training and self-supervised learning.

Maintenance & Community

No information regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps is available in the provided README.

Licensing & Compatibility

The repository itself does not appear to have a specific license, but the listed datasets carry various licenses. These include permissive licenses like CC0 and CC BY 4.0, as well as more restrictive non-commercial licenses such as CC BY-NC-ND 4.0 and CC BY-NC 4.0. Some datasets are sourced from LDC or Bible.is, which may have their own specific terms of use. Users must consult the individual dataset licenses for compatibility with commercial or closed-source applications.
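As a rough illustration of the license check described above, the following Python sketch sorts a license name into "allowed", "not allowed", or "check terms" for commercial use. The function name `commercial_use`, the `CommercialUse` enum, and the two license groupings are hypothetical and are not part of the repository; only the license names themselves come from the summary above. Anything outside those four names (for example LDC or Bible.is terms) is deliberately routed to "check terms" rather than guessed.

```python
from enum import Enum


class CommercialUse(Enum):
    ALLOWED = "allowed"
    NOT_ALLOWED = "not allowed"
    CHECK_TERMS = "check terms"


# License names taken from the summary above. Grouping them into these two
# sets is an assumption made for this sketch, based on the "NC" clause.
PERMISSIVE = {"CC0", "CC BY 4.0"}
NON_COMMERCIAL = {"CC BY-NC 4.0", "CC BY-NC-ND 4.0"}


def _normalize(name: str) -> str:
    """Collapse whitespace and ignore case so 'cc  by 4.0' matches 'CC BY 4.0'."""
    return " ".join(name.split()).upper()


def commercial_use(license_name: str) -> CommercialUse:
    """Return a first-pass answer for one license name (hypothetical helper)."""
    name = _normalize(license_name)
    if name in {_normalize(n) for n in PERMISSIVE}:
        return CommercialUse.ALLOWED
    if name in {_normalize(n) for n in NON_COMMERCIAL}:
        return CommercialUse.NOT_ALLOWED
    # Unknown or custom terms (e.g. LDC, Bible.is) need to be read directly.
    return CommercialUse.CHECK_TERMS


if __name__ == "__main__":
    for lic in ["CC0", "CC BY 4.0", "CC BY-NC 4.0", "CC BY-NC-ND 4.0", "LDC"]:
        print(f"{lic}: {commercial_use(lic).value}")
```

The "check terms" default is the important design choice here: a license the sketch does not recognize is never reported as safe for commercial use.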

Limitations & Caveats

As a curated list of research and datasets rather than a software toolkit, this repository does not offer direct implementation or tooling for speech translation. Users seeking to build or deploy ST systems will need to refer to the cited papers and associated toolkits (e.g., ESPnet-ST, fairseq S2T) for practical implementation details. Because the ST field evolves rapidly, the listed resources may not cover work published after the most recent cited papers.
