caffe-speech-recognition  by pannous

Speech recognition with Caffe

Created 10 years ago
325 stars

Top 83.6% on SourcePulse

Project Summary

This project provides a speech recognition system built with the Caffe deep learning framework. It is best suited to training a recognizer for a small, fixed set of commands or options, and it reports high accuracy on spoken numbers. According to the README, the project is migrating to TensorFlow because of perceived difficulties with getting contributions merged into the Caffe project.

How It Works

The system trains on spectrograms of spoken audio, which are fed to a neural network. The README mentions plans to integrate Caffe's LSTM layers, although the migration to TensorFlow suggests that future work may move away from Caffe. The current implementation focuses on narrow tasks such as recognizing spoken digits.
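To make the spectrogram input concrete, here is a minimal sketch of how a log-magnitude spectrogram can be computed from raw audio samples. It is an illustration only and not the project's own preprocessing: the function name `spectrogram`, the frame length, the hop size, the Hann window, and the synthetic 1 kHz test tone are all assumptions chosen for the example, and NumPy is assumed to be installed.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration; the project's real feature
    extraction may use different parameters or a different method.
    """
    window = np.hanning(frame_len)
    # Number of full frames that fit into the signal with the given hop.
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)


# Synthetic stand-in for a recording: one second of a 1 kHz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
# Frequency bin with the most energy, averaged over all frames.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_hz)
```

Each row of the result is one time frame and each column one frequency bin, so the array can be treated like a grayscale image, which is one common way to feed audio to an image-style network.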

Quick Start & Requirements

  • Training Spoken Numbers:
    • Download training data: http://pannous.net/spoken_numbers.tar (470 MB).
    • Start training: ./train.sh.
    • Test with ipython notebook test-speech-recognition.ipynb or Caffe's classify.py.
  • Prerequisites: Caffe deep learning framework, Python.

Highlighted Details

  • Achieves 99% accuracy for spoken numbers.
  • Includes scripts for online recognition and learning (recognition-server.py, record.py).
  • Future plans include training for general words and broader speech, incorporating silence and noise categories.

Maintenance & Community

According to the README, the project is moving to TensorFlow because of issues with the Caffe project's merging policy. The README does not list any community links or development forums.

Licensing & Compatibility

The README excerpt does not state a license.

Limitations & Caveats

The README describes the project as "fresh", with only the first of three milestones accomplished. The migration to TensorFlow points to unfinished development and potential instability. Training for general speech is a future goal and is not yet implemented. Access to large datasets such as TIMIT may involve significant costs or licensing restrictions.

Health Check

  • Last Commit: 7 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
1 star in the last 30 days
