tracer by adrida

LLM classification routing for cost efficiency

Created 2 months ago
473 stars

Top 63.8% on SourcePulse

Summary

TRACER reduces the cost of LLM-based classification by routing predictable inputs to lightweight, traditional ML models instead of the LLM. It targets teams running LLM classification pipelines and claims over 90% cost reduction, formal parity guarantees against the teacher LLM, and a routing policy that improves as new traces arrive.

How It Works

TRACER learns a decision boundary from LLM classification traces. It fits a fast, non-LLM surrogate model (e.g., logistic regression, LightGBM) on "easy" inputs. A calibrated "acceptor gate" estimates surrogate agreement with the LLM, deferring uncertain inputs. New traces from deferred calls feed back into subsequent refits, automatically increasing surrogate coverage and reducing LLM reliance. This enables sub-millisecond, CPU-bound inference for handled cases, drastically cutting costs while maintaining formal parity guarantees.
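The routing loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not TRACER's implementation or API: the names `calibrate_threshold`, `Router`, `toy_surrogate`, and `toy_teacher` are hypothetical, and the gate here is a simple empirical confidence threshold chosen on held-out traces, which does not by itself provide a formal guarantee. The summary does not say how TRACER actually calibrates its gate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types for illustration; none of these names come from TRACER.
Surrogate = Callable[[str], Tuple[str, float]]  # text -> (label, confidence)
Teacher = Callable[[str], str]                  # text -> label (the LLM call)


def calibrate_threshold(
    held_out: List[Tuple[float, bool]], target_agreement: float
) -> float:
    """Return the lowest confidence threshold whose accepted set agrees with
    the teacher at least `target_agreement` of the time on held-out traces.

    Each item is (surrogate_confidence, surrogate_agreed_with_teacher).
    Returns float("inf") when no threshold meets the target (defer everything).
    """
    ranked = sorted(held_out, key=lambda item: item[0], reverse=True)
    best = float("inf")
    agreed = 0
    for i, (conf, ok) in enumerate(ranked, start=1):
        agreed += ok
        # Only evaluate where the next trace has a different confidence,
        # so the accepted set is exactly "confidence >= conf".
        is_boundary = i == len(ranked) or ranked[i][0] != conf
        if is_boundary and agreed / i >= target_agreement:
            best = conf
    return best


@dataclass
class Router:
    surrogate: Surrogate
    teacher: Teacher
    threshold: float
    new_traces: List[Tuple[str, str]] = field(default_factory=list)

    def classify(self, text: str) -> Tuple[str, str]:
        """Return (label, source), where source is 'surrogate' or 'teacher'."""
        label, conf = self.surrogate(text)
        if conf >= self.threshold:
            return label, "surrogate"
        # Uncertain input: defer to the LLM and keep the trace for the next refit.
        teacher_label = self.teacher(text)
        self.new_traces.append((text, teacher_label))
        return teacher_label, "teacher"


# Toy stand-ins so the sketch runs without a model or an LLM.
def toy_surrogate(text: str) -> Tuple[str, float]:
    return ("billing", 0.95) if "refund" in text else ("other", 0.55)


def toy_teacher(text: str) -> str:
    return "billing" if ("refund" in text or "charge" in text) else "other"


held_out = [(0.95, True), (0.9, True), (0.8, True),
            (0.6, False), (0.55, True), (0.5, False)]
router = Router(toy_surrogate, toy_teacher,
                calibrate_threshold(held_out, target_agreement=0.95))
easy = router.classify("please refund my order")      # handled locally
hard = router.classify("unexpected charge on card")   # deferred to the teacher
```

In this toy run the deferred input ends up in `router.new_traces`, which is the data a later refit would consume to widen surrogate coverage.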

Quick Start & Requirements

  • Install: pip install tracer-llm. The optional [embeddings] extra adds sentence-transformers.
  • Requires Python and text embeddings (computed locally or via API).
  • Run the bundled demo with tracer demo.
  • Links: Concepts guide, API reference, JS integration guide.

Highlighted Details

  • Routes 90%+ of classification calls to traditional ML surrogates.
  • Provides formal parity guarantees against the teacher LLM.
  • Features a self-improving routing policy via continual learning from new traces.
  • Projects significant cost reduction (e.g., >$300k/yr in savings at 10k queries/day).
  • Offers sub-millisecond, CPU-bound inference for surrogate models.
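
The savings projection can be sanity-checked with simple arithmetic. The summary does not state the per-call LLM cost behind the >$300k/yr figure, so the sketch below back-solves for it, assuming 90% of calls are offloaded and the surrogate cost is negligible. The function names are hypothetical and the result is an implied value, not a number from the project.

```python
DAYS_PER_YEAR = 365


def annual_savings(queries_per_day: float, surrogate_coverage: float,
                   llm_cost_per_call: float,
                   surrogate_cost_per_call: float = 0.0) -> float:
    """Yearly savings from answering a fraction of calls with the surrogate."""
    offloaded_calls = queries_per_day * DAYS_PER_YEAR * surrogate_coverage
    return offloaded_calls * (llm_cost_per_call - surrogate_cost_per_call)


def implied_llm_cost_per_call(target_savings: float, queries_per_day: float,
                              surrogate_coverage: float) -> float:
    """Per-call LLM cost needed to reach `target_savings` (surrogate cost ~ 0)."""
    return target_savings / (queries_per_day * DAYS_PER_YEAR * surrogate_coverage)


# Assumption: 90% coverage, matching the "90%+" figure in the list above.
cost = implied_llm_cost_per_call(300_000, 10_000, 0.9)
print(f"implied LLM cost per call: ${cost:.4f}")
```

Under these assumptions the projection implies roughly $0.09 per LLM call; cheaper calls or lower coverage would shrink the savings proportionally.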

Maintenance & Community

No specific details on contributors, sponsorships, or community channels were found in the provided README.

Licensing & Compatibility

MIT License. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

Effectiveness depends on initial trace quality and ongoing data generation for continual learning. Potential troubleshooting areas include selected_method=null and coverage drift. Parity gate calibration is critical for maintaining accuracy guarantees.
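
One way to watch for coverage drift is to track the fraction of recent calls the surrogate handled and compare it with a baseline. The sketch below assumes "coverage drift" means that fraction falling over time, which the summary does not define; `CoverageMonitor` and its parameters are hypothetical and not part of TRACER.

```python
from collections import deque
from typing import Optional


class CoverageMonitor:
    """Rolling-window check on the share of calls handled by the surrogate."""

    def __init__(self, window: int, baseline: float, tolerance: float) -> None:
        self.decisions = deque(maxlen=window)  # True = surrogate handled the call
        self.baseline = baseline
        self.tolerance = tolerance

    def record(self, handled_by_surrogate: bool) -> None:
        self.decisions.append(handled_by_surrogate)

    @property
    def coverage(self) -> Optional[float]:
        if not self.decisions:
            return None
        return sum(self.decisions) / len(self.decisions)

    def drifted(self) -> bool:
        # Wait for a full window so a few early deferrals do not trigger an alert.
        if len(self.decisions) < self.decisions.maxlen:
            return False
        return self.coverage < self.baseline - self.tolerance


monitor = CoverageMonitor(window=10, baseline=0.9, tolerance=0.05)
for handled in [True] * 9 + [False]:
    monitor.record(handled)
print(monitor.coverage, monitor.drifted())
```

A sustained alert from a check like this would be a cue to inspect recent deferred traces and consider a refit.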

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 23
  • Issues (30d): 23
  • Star history: 294 stars in the last 30 days
