tiger by tigerlab-ai

Open-source LLM toolkit for building trustworthy applications

Created 2 years ago

399 stars

Top 72.4% on SourcePulse

1 Expert Loves This Project

Benjamin Bolte

Cofounder of K-Scale Labs

Project Summary

This open-source toolkit is a framework for building trustworthy LLM applications, aimed at developers who need to combine retrieval, fine-tuning, and AI safety measures. It aims to bridge the gap between general-purpose LLMs and specific data stores, so that teams can build customized AI systems aligned with their own intellectual property and safety requirements.

How It Works

The toolkit comprises four main components: TigerRAG for retrieval-augmented generation using embeddings, TigerTune for fine-tuning and evaluating text generation and classification models, TigerDA for data augmentation, and TigerArmor for AI safety evaluation. TigerRAG employs embeddings-based retrieval (EBR), RAG, and generation-augmented retrieval (GAR), utilizing BERT for embeddings and FAISS for indexing. TigerTune supports fine-tuning models like Llama2 and DistilBERT.
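
To make the embeddings-based retrieval (EBR) step concrete, here is a minimal sketch of the idea, not TigerRAG's actual code. It assumes a toy bag-of-words vector in place of BERT embeddings and a brute-force cosine scan in place of a FAISS index; the function names (`embed`, `build_index`, `search`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumed stand-in for a BERT encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Embed every document once (a real system would hand this to FAISS)."""
    return [(doc, embed(doc)) for doc in documents]


def search(index, query, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Hypothetical sample data, loosely in the spirit of the movie_recs demo.
documents = [
    "a heist thriller about a team of thieves",
    "a romantic comedy set in paris",
    "a documentary about deep sea creatures",
]
index = build_index(documents)
top = search(index, "thriller about thieves", k=1)
print(top)
```

Swapping `embed` for a learned encoder and `search` for an approximate nearest-neighbor index changes the quality and speed of retrieval, but not the overall shape of the pipeline.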

Quick Start & Requirements

Installation: Clone the repository, then install requirements for each component (e.g., pip install . in tiger/TigerRAG).
Prerequisites: An OpenAI API token is required. A CUDA GPU is needed to run generation_example.py in TigerTune; notebooks are available as an alternative.
Demos: Examples are provided for TigerRAG (movie_recs) and TigerTune (classification_example.py, generation_example.py).
Resources: Setup involves cloning and installing Python packages. GPU acceleration is recommended for fine-tuning.
Links: TigerLab.ai Demo, Setup Tutorial (Note: Actual YouTube link not provided in README).
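
Before running the demos, a quick preflight check of the two prerequisites listed above can save a failed run. The sketch below is an assumption-laden illustration: it assumes the token is supplied through the `OPENAI_API_KEY` environment variable (the convention of the official OpenAI Python client; the README summary does not say how this toolkit reads it), and it treats the presence of the `nvidia-smi` executable as a rough proxy for a CUDA GPU. The `preflight` helper is hypothetical.

```python
import os
import shutil


def preflight(env=None):
    """Report whether the assumed prerequisites appear to be satisfied."""
    env = os.environ if env is None else env
    return {
        # Assumption: the token is read from OPENAI_API_KEY.
        "openai_key_set": bool(env.get("OPENAI_API_KEY")),
        # Heuristic only: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


status = preflight()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

If `nvidia_smi_found` is reported as missing, the notebooks mentioned above are the documented alternative to generation_example.py.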

Highlighted Details

Supports Embeddings-based Retrieval (EBR), Retrieval-Augmented Generation (RAG), and Generation-Augmented Retrieval (GAR).
Offers fine-tuning capabilities for models like Llama2 and DistilBERT.
Includes AI safety metrics, datasets, and evaluation tools for LLMs.
Provides a data augmentation toolkit with generation-based augmentation using a fine-tuned GPT2.
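
The difference between RAG and GAR is mainly the order of the two steps: RAG retrieves first and then generates from the retrieved context, while GAR generates first (for example, a query expansion) and then retrieves with the enriched query. The sketch below illustrates that ordering only. It is not the toolkit's implementation: `stub_llm` is a hypothetical canned stand-in for a real LLM call, and `retrieve` is a toy word-overlap ranker rather than an embedding search.

```python
def retrieve(corpus, query, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def stub_llm(prompt):
    """Hypothetical LLM stand-in with canned, deterministic behavior."""
    if prompt.startswith("Expand:"):
        return "heist thieves robbery"
    return "ANSWER using [" + prompt.split("\n")[0] + "]"


def rag(corpus, question, llm):
    """Retrieval-augmented generation: retrieve context, then generate."""
    context = retrieve(corpus, question, k=1)
    prompt = "Context: " + " | ".join(context) + "\nQuestion: " + question
    return llm(prompt)


def gar(corpus, query, llm):
    """Generation-augmented retrieval: generate an expansion, then retrieve."""
    expansion = llm("Expand: " + query)
    return retrieve(corpus, query + " " + expansion, k=1)


corpus = [
    "a romantic comedy set in paris",
    "a documentary about deep sea creatures",
    "a heist thriller about a team of thieves",
]
query = "a film about crime"

plain_hits = retrieve(corpus, query)      # overlap-only retrieval
gar_hits = gar(corpus, query, stub_llm)   # retrieval after query expansion
rag_answer = rag(corpus, query, stub_llm)
print(plain_hits, gar_hits, rag_answer, sep="\n")
```

In this toy setup the plain query matches the documentary on common words, while the generated expansion steers retrieval toward the heist film, which is the effect GAR is meant to provide.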

Maintenance & Community

The README describes active development and a roadmap that includes additional model support, perturbation-based augmenters, and a VectorDB for TigerRAG; the last commit, however, was 2 years ago.
Community engagement encouraged via GitHub issues and a Discord server.
Links: Discord (Note: Actual Discord link not provided in README).

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project depends on the OpenAI API and therefore requires an API key. Certain fine-tuning examples need a CUDA GPU, though notebooks are provided as an alternative. The README does not specify a license, which may hinder commercial adoption.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days