caffe-speech-recognition by pannous

Speech recognition with Caffe

Created 11 years ago

325 stars

Top 84.0% on SourcePulse

1 Expert Loves This Project

Anastasis Germanidis

Cofounder of Runway

Project Summary

This project provides a speech recognition system built on the Caffe deep learning framework. It is best suited to training a small, fixed set of commands or options, and reports high accuracy on spoken numbers. According to the README, the project is migrating to TensorFlow because of difficulties getting contributions merged into the Caffe project.

How It Works

The system trains a neural network on spectrograms of spoken audio. The README mentions plans to integrate Caffe's LSTM layers, though the migration to TensorFlow suggests a shift in core technology. The current implementation targets narrow tasks such as recognizing spoken digits.
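To make the spectrogram input concrete, here is a minimal sketch of how a recording can be turned into a magnitude spectrogram with a short-time Fourier transform. It is not taken from the project: the function names `spectrogram` and `to_image`, the frame length, the hop size, and the 8-bit scaling are all assumptions chosen for illustration, and the project's own preprocessing may differ.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    frame_len and hop are illustrative defaults, not the project's settings.
    Returns an array of shape (frame_len // 2 + 1, n_frames).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequency bins of each frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_image(spec):
    """Log-compress a spectrogram and scale it to 8-bit grayscale."""
    log_spec = np.log1p(spec)
    peak = log_spec.max()
    if peak == 0:
        # Silent input: return an all-black image instead of dividing by zero.
        return np.zeros(spec.shape, dtype=np.uint8)
    return (255 * log_spec / peak).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic one-second 1 kHz tone at 8 kHz stands in for a real recording.
    sample_rate = 8000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000 * t)
    image = to_image(spectrogram(tone))
    print(image.shape, image.dtype)
```

Each column of the result is one time slice and each row is one frequency bin, so a spoken digit becomes a small grayscale image that an image-style classifier can consume.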

Quick Start & Requirements

Training Spoken Numbers:
- Download the training data: http://pannous.net/spoken_numbers.tar (470 MB)
- Start training: ./train.sh
- Test with the IPython notebook test-speech-recognition.ipynb or Caffe's classify.py
Prerequisites: Caffe deep learning framework, Python.

Highlighted Details

- Achieves 99% accuracy for spoken numbers.
- Includes scripts for online recognition and learning (recognition-server.py, record.py).
- Future plans include training for general words and broader speech, incorporating silence and noise categories.

Maintenance & Community

The README states that the project is migrating to TensorFlow because of issues with the Caffe project's merging policy. It does not list community links or active development forums.

Licensing & Compatibility

The provided README excerpt does not state a license.

Limitations & Caveats

The README describes the project as "fresh", with only the first of three milestones accomplished. The migration to TensorFlow adds potential instability. Training for general speech is a future goal and is not yet implemented. Large datasets such as TIMIT may carry significant costs or licensing restrictions.

Health Check

Last Commit

7 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days