llm-classifier by lamini-ai

LLM classifier for instant data classification using Llama 2

Created 1 year ago · 276 stars · Top 94.7% on sourcepulse

Project Summary

This project provides an LLM-based classifier that allows users to categorize data using natural language prompts, eliminating the need for labeled datasets. It's designed for users who want to quickly build custom classifiers without extensive data preparation or hyperparameter tuning, leveraging prompt engineering as the primary method for defining classes.

How It Works

The classifier leverages the Llama 2 LLM to generate synthetic training data from user-provided prompts, effectively creating "piles" of examples for each class. It then fine-tunes specialized LLMs derived from Llama 2 to distinguish between these generated data piles. This approach bypasses manual data labeling and allows for classifier customization through prompt refinement.
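
A minimal sketch of that workflow is shown below. The `LaminiClassifier` class and the `prompt_train` and `save` methods are assumptions about the library's Python interface based on this summary, not a confirmed API; check the repository and docs for the exact names.

```python
# Hypothetical sketch -- class and method names are assumptions, not a
# confirmed API; verify against https://lamini-ai.github.io/ and the repo.
from lamini import LaminiClassifier  # assumed import path

# Each class is defined purely by a natural-language prompt; no labeled data.
prompts = {
    "cat": "Cats are independent pets that often keep to themselves.",
    "dog": "Dogs are loyal companions that love walks and playing fetch.",
}

classifier = LaminiClassifier()

# The library generates synthetic example "piles" from the prompts and
# fine-tunes Llama 2 derived models to tell the piles apart.
classifier.prompt_train(prompts)

# Persist the trained classifier for later reuse (path is illustrative).
classifier.save("models/pet_classifier.lamini")
```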

Quick Start & Requirements

  • Install: pip install lamini or clone the repository and use provided shell scripts.
  • Prerequisites: Docker is required for the shell scripts. Lamini API keys are needed (free signup available).
  • Setup: Clone the repository and obtain a Lamini API key; a minimal configuration sketch follows this list.
  • Docs: https://lamini-ai.github.io/
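
A minimal setup sketch, assuming the `lamini` Python package exposes an `api_key` configuration attribute (an assumption; the docs linked above are authoritative):

```python
# Hypothetical setup sketch -- the configuration attribute below is an
# assumption; follow https://lamini-ai.github.io/ for authoritative setup.
# Install first from a shell:  pip install lamini

import lamini

# Authenticate with the API key obtained from the free Lamini signup.
lamini.api_key = "<YOUR_LAMINI_API_KEY>"  # assumed configuration attribute
```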

Highlighted Details

  • Classifiers are trained solely on prompts, with optional data augmentation.
  • Outputs include class predictions and per-class probabilities for gauging uncertainty (see the prediction sketch after this list).
  • Offers a Python library for direct integration and command-line scripts for ease of use.
  • Prompt engineering replaces traditional hyperparameter tuning.
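
As a rough illustration of those outputs, the sketch below assumes `predict` returns hard labels, `classify` returns per-class probabilities, and a saved classifier can be reloaded with `load`; all three names are assumptions rather than a confirmed API.

```python
# Hypothetical prediction sketch -- method names (load, predict, classify)
# are assumptions based on the project summary, not a confirmed API.
from lamini import LaminiClassifier  # assumed import path

# Reload the classifier saved in the training sketch above (path illustrative).
classifier = LaminiClassifier.load("models/pet_classifier.lamini")

texts = [
    "It purrs on the windowsill all afternoon.",
    "It fetches the ball every time I throw it.",
]

labels = classifier.predict(texts)          # hard class predictions
probabilities = classifier.classify(texts)  # per-class probabilities

for text, label, probs in zip(texts, labels, probabilities):
    print(f"{text!r} -> {label} ({probs})")
```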

Maintenance & Community

This appears to be a hackathon project that is still being refined; the authors encourage feedback and suggestions for improvement.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

The project is described as a "weeknight hackathon project" with known limitations, including inefficient batching when training on many classes and LLM example generators that are still being refined. The accuracy of generated examples depends heavily on prompt quality.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 90 days
