Speech recognition with Caffe
Top 83.6% on SourcePulse
This project provides a speech recognition system built on the Caffe deep learning framework. It is best suited to training on a limited set of commands or options and achieves high accuracy on spoken numbers. According to the README, development is moving to TensorFlow because contributions were difficult to get merged into the Caffe project.
How It Works
The system trains a neural network on spectrograms of spoken audio. The README mentions plans to use Caffe's LSTM layers, although the planned move to TensorFlow suggests the core technology will change. The current implementation targets narrow tasks such as recognizing spoken digits, where it reaches high accuracy.
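The README excerpt does not say how the spectrograms are computed (window, frame size, scaling). As a rough illustration only, here is a minimal NumPy sketch of a log-magnitude short-time Fourier transform spectrogram under assumed parameters (Hann window, 256-sample frames, 128-sample hop). The function `spectrogram` and its defaults are hypothetical and are not taken from the project's code.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram with shape (frequency bins, frames).

    Assumed parameters: Hann window, frame_len samples per frame,
    hop samples between frame starts. Not the project's actual settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short clips so at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice overlapping frames and taper each one with the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real FFT per frame gives frame_len // 2 + 1 frequency bins.
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range; transpose so time runs along columns.
    return np.log1p(mags).T
```

Each column is one time frame and each row one frequency bin, so the result can be treated as a single-channel image. Whether the project stores spectrograms as image files for Caffe is not stated in the excerpt.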
Quick Start & Requirements
Dataset: http://pannous.net/spoken_numbers.tar (470 MB).
Training: ./train.sh
Testing: ipython notebook test-speech-recognition.ipynb, or Caffe's classify.py.
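The first Quick Start step is fetching a single archive. A minimal Python sketch of that step follows, assuming the archive is unpacked into a local `data/` directory. The README excerpt does not say where `./train.sh` expects the data, so `fetch_dataset` and the directory layout are hypothetical, and the URL from the README may no longer be reachable.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the project README; it may no longer be reachable.
DATA_URL = "http://pannous.net/spoken_numbers.tar"


def fetch_dataset(dest="data", url=DATA_URL):
    """Download the spoken-numbers archive (about 470 MB) once and extract it.

    Hypothetical helper: the target directory is an assumption, not the
    location the project's train.sh necessarily reads from.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "spoken_numbers.tar"
    if not archive.exists():
        # Skip the large download when the archive is already present.
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive) as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return dest
```

Checking for an existing archive first avoids repeating the 470 MB download when the script is rerun.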
Highlighted Details
Helper scripts are included (recognition-server.py, record.py).
Maintenance & Community
According to the README, development is moving to TensorFlow because of difficulties getting contributions merged into the Caffe project. The README lists no community channels or development forums.
Licensing & Compatibility
The README excerpt does not state a license.
Limitations & Caveats
The README describes the project as "fresh", with only the first of three milestones reached. The planned move to TensorFlow implies the Caffe code base was still in flux and potentially unstable. Training on general speech is a future goal and is not yet implemented. Large datasets such as TIMIT may require paid access or carry licensing restrictions.
7 years ago
Inactive