SpeechTransProgress  by kahne

End-to-end speech translation research and dataset tracker

Created 5 years ago
261 stars

Top 97.4% on SourcePulse

Project Summary

The kahne/SpeechTransProgress repository is a curated knowledge base for researchers and practitioners working on end-to-end speech translation (ST). It tracks progress in the field by cataloging key datasets, influential research papers, and relevant tutorials, giving readers a single place to survey the state of the art and ongoing developments in spoken language translation.

How It Works

This project functions as a meta-repository: it aggregates pointers to significant datasets, research publications, and academic events in the speech translation domain. It highlights corpora such as CoVoST 2, CVSS, and MuST-C, detailing their language pairs, data types, durations, and licensing. The repository also lists research papers across ST sub-fields, including simultaneous translation, low-resource scenarios, and novel model architectures, offering a structured overview of the research landscape.

Quick Start & Requirements

This repository does not provide installation instructions or a runnable toolkit. It serves as a curated list of resources and research pointers for the speech translation community.

Highlighted Details

  • Comprehensive catalog of over 100 research papers and tutorials from major conferences (ACL, EACL, INTERSPEECH, ICML, etc.) spanning 2013-2023.
  • Detailed table of speech translation datasets, including CoVoST 2, CVSS, mTEDx, MuST-C, How2, Europarl-ST, and others, specifying language directions, data types (text/speech), duration, and licensing.
  • Covers a wide spectrum of ST research, including direct speech-to-speech translation, simultaneous translation, multilingual ST, low-resource ST, and the impact of pre-training and self-supervised learning.

Maintenance & Community

No information regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps is available in the provided README.

Licensing & Compatibility

The repository itself does not appear to have a specific license, but the listed datasets carry various licenses. These include permissive licenses like CC0 and CC BY 4.0, as well as more restrictive non-commercial licenses such as CC BY-NC-ND 4.0 and CC BY-NC 4.0. Some datasets are sourced from LDC or Bible.is, which may have their own specific terms of use. Users must consult the individual dataset licenses for compatibility with commercial or closed-source applications.
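As a rough illustration of the license check described above, the following minimal Python sketch flags Creative Commons license names that carry the non-commercial (NC) clause. The function name `allows_commercial_use` and the `LICENSES` list are hypothetical and not part of the repository. The check only inspects the license name, so it does not cover LDC or Bible.is terms, and it is not a substitute for reading the actual license text.

```python
def allows_commercial_use(license_name: str) -> bool:
    """Return False when a Creative Commons license name includes the NC clause.

    Hypothetical helper: it looks only at the license name, not the legal text.
    """
    # "CC BY-NC-ND 4.0" becomes ["CC", "BY", "NC", "ND", "4.0"]
    tokens = license_name.upper().replace("-", " ").split()
    return "NC" not in tokens


# License names mentioned in the section above.
LICENSES = ["CC0", "CC BY 4.0", "CC BY-NC-ND 4.0", "CC BY-NC 4.0"]

commercial_ok = [name for name in LICENSES if allows_commercial_use(name)]
print(commercial_ok)  # ['CC0', 'CC BY 4.0']
```

Note that a license passing this check can still impose other conditions, such as the attribution requirement in CC BY 4.0.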

Limitations & Caveats

As a curated list of research and datasets rather than a software toolkit, this repository offers no implementation or tooling for speech translation. Users who want to build or deploy ST systems will need to refer to the cited papers and associated toolkits (e.g., ESPnet-ST, fairseq S2T) for practical implementation details. Because the ST field evolves quickly, the listed resources may not cover work published after the most recent cited papers.

Health Check
Last Commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days
