classifier by cardmagic

AI text classifier with streaming and native performance

Created 17 years ago
685 stars

Top 49.8% on SourcePulse

Project Summary

cardmagic/classifier is a general-purpose text classification library for Ruby. It provides five algorithms (Bayesian, Logistic Regression, LSI, KNN, TF-IDF) and targets Ruby developers who need fast, scalable text analysis. Native extensions supply the speed, and streaming training handles large datasets. The stated goal is to simplify complex text processing while keeping performance and flexibility.
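
To make the Bayesian option concrete, here is a minimal pure-Ruby sketch of multinomial naive Bayes with add-one smoothing. It illustrates the general technique only: the class name TinyBayes, its methods, and the sample texts are hypothetical and are not taken from the gem's API or implementation.

```ruby
# Minimal multinomial naive Bayes with add-one (Laplace) smoothing.
# TinyBayes is a hypothetical name for illustration; it is NOT the gem's API.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] } # per-category term counts
    @doc_counts  = categories.to_h { |c| [c, 0] }           # per-category document counts
    @vocabulary  = {}                                       # every term seen in training
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the category with the highest log-probability score.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    scores = @word_counts.keys.to_h do |category|
      counts = @word_counts[category]
      total_words = counts.values.sum
      # Smoothed log prior, so an untrained category never produces log(0).
      score = Math.log((@doc_counts[category] + 1) / (total_docs + @doc_counts.size))
      words.each do |word|
        # The extra +1 in the denominator reserves probability mass for unseen terms.
        score += Math.log((counts[word] + 1.0) / (total_words + @vocabulary.size + 1))
      end
      [category, score]
    end
    scores.max_by { |_, score| score }.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

bayes = TinyBayes.new(:spam, :ham)
bayes.train(:spam, 'Buy cheap pills now')
bayes.train(:spam, 'Cheap offer, buy today')
bayes.train(:ham,  'Meeting notes for the project')
bayes.train(:ham,  'Lunch with the project team today')

puts bayes.classify('cheap pills offer')
puts bayes.classify('project meeting at lunch')
```

Scores are summed in log space so that long documents do not underflow to zero. A production classifier would also need stemming, stop-word handling, and persistence, none of which this sketch attempts.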

How It Works

Core algorithms such as LSI are implemented as native C extensions, which the project reports as 5-50x faster than pure Ruby implementations. LSI is also incremental: based on Brand's algorithm, it can train on multi-gigabyte datasets without a full index rebuild, which suits real-time or streaming workloads where memory is constrained. Persistence is pluggable, so models can be saved and loaded through backends such as File, Redis, S3, or SQL.
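
The pluggable-persistence idea can be sketched as a model that delegates saving and loading to any object responding to write and read. Everything named here (MemoryStore, FileStore, WordCountModel, the JSON format, the temp-file path) is an assumption made for illustration; the gem's real storage interface may look quite different.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical backends: any object that responds to write(data) and read.
class MemoryStore
  def write(data)
    @data = data
  end

  def read
    @data
  end
end

class FileStore
  def initialize(path)
    @path = path
  end

  def write(data)
    File.write(@path, data)
  end

  def read
    File.exist?(@path) ? File.read(@path) : nil
  end
end

# Hypothetical model that knows nothing about where its state is kept.
class WordCountModel
  attr_reader :counts

  def initialize(store, counts = Hash.new(0))
    @store = store
    @counts = counts
  end

  def train(text)
    text.downcase.scan(/[a-z0-9']+/).each { |word| @counts[word] += 1 }
  end

  # Serialize the state and hand it to the backend.
  def save
    @store.write(JSON.generate(@counts))
  end

  # Rebuild a model from whatever the backend returns (nil means "nothing saved").
  def self.load(store)
    raw = store.read
    counts = Hash.new(0)
    counts.merge!(JSON.parse(raw)) if raw
    new(store, counts)
  end
end

store = FileStore.new(File.join(Dir.tmpdir, 'word_count_model.json'))
model = WordCountModel.new(store)
model.train('Streaming text, streaming models')
model.save

restored = WordCountModel.load(store)
puts restored.counts['streaming']
```

Because the model only calls write and read, a Redis, S3, or SQL backend would slot in by implementing those two methods, with no change to the model class.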

Quick Start & Requirements

Add the library to a Gemfile with gem 'classifier', or install it directly with gem install classifier. For CLI-only usage, install via Homebrew: brew install cardmagic/tap/classifier. Development requires Ruby and, to build the native extensions (rake compile), a C toolchain. The README mentions algorithm guides and CLI documentation, but no direct URLs are given here.

Highlighted Details

  • Performance: Native C extension provides 5-50x speedup for LSI operations, making it suitable for performance-critical applications.
  • Incremental LSI: Supports streaming training on multi-GB datasets without full index rebuilds, enabling efficient processing of large corpora.
  • Pluggable Persistence: Models can be stored via File, Redis, S3, SQL, or custom solutions, enhancing deployment flexibility.
  • CLI Tool: Enables instant classification and custom model training directly from the command line, reducing the need for coding for basic tasks.
  • Claude Plugin: Offers integration with Claude AI for automated classification skills and slash commands, extending its utility within AI workflows.
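
The streaming-training pattern from the list above can be sketched as follows. This is not Brand's incremental SVD and not the gem's code: a running document-frequency counter stands in for the LSI index, and the names RunningTermStats and train_from_stream, the batch size, and the sample corpus are all hypothetical. The point is only that the model is updated batch by batch and never holds the whole corpus in memory.

```ruby
require 'stringio'

# Hypothetical incremental model: it keeps running counts, never the documents.
class RunningTermStats
  attr_reader :documents_seen, :document_frequency

  def initialize
    @documents_seen = 0
    @document_frequency = Hash.new(0)
  end

  # Fold one batch of documents (one document per line) into the running totals.
  def add_batch(lines)
    lines.each do |line|
      @documents_seen += 1
      line.downcase.scan(/[a-z0-9']+/).uniq.each { |term| @document_frequency[term] += 1 }
    end
  end

  # Smoothed inverse document frequency computed from the totals seen so far.
  def idf(term)
    Math.log((1.0 + @documents_seen) / (1.0 + @document_frequency[term]))
  end
end

# Reads the IO lazily, batch_size lines at a time, so memory use stays bounded.
def train_from_stream(io, model, batch_size: 2)
  io.each_line(chomp: true).each_slice(batch_size) { |batch| model.add_batch(batch) }
  model
end

corpus = StringIO.new("ruby is fast\nruby is fun\nstreams are lazy\n")
stats = train_from_stream(corpus, RunningTermStats.new)

puts stats.documents_seen
puts stats.document_frequency['ruby']
puts stats.idf('lazy') > stats.idf('ruby')
```

Swapping the StringIO for File.open on a large corpus would leave the training loop unchanged, since each_line and each_slice pull only one batch at a time.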

Maintenance & Community

The project lists four primary authors: Lucas Carlson, David Fayram II, Cameron McBride, and Ivan Acosta-Rubio. The README provides no community channels (such as Discord or Slack) and no roadmap links, which may point to a smaller or less publicly active community.

Licensing & Compatibility

The library is licensed under LGPL 2.1. The license permits commercial use and linking from closed-source applications, provided that modifications to the LGPL-licensed code itself are released under the LGPL when they are distributed.

Limitations & Caveats

The README primarily details Ruby integration and CLI usage; cross-platform or multi-language support is not discussed. The project is not labeled alpha or beta, but the absence of community links may indicate a smaller user base or less active community development than more widely adopted libraries have.

Health Check
Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 26
Issues (30d): 24
Star History: 22 stars in the last 30 days
