tiger by tigerlab-ai

Open-source LLM toolkit for building trustworthy applications

created 1 year ago
398 stars

Top 73.7% on sourcepulse

Project Summary

This toolkit provides an open-source framework for building trustworthy LLM applications, targeting developers who need to integrate retrieval, fine-tuning, and AI safety measures. It aims to connect general-purpose LLMs to an organization's own data stores, so that teams can build customized AI systems that respect their intellectual property and safety requirements.

How It Works

The toolkit comprises four main components: TigerRAG for retrieval-augmented generation using embeddings, TigerTune for fine-tuning and evaluating text generation and classification models, TigerDA for data augmentation, and TigerArmor for AI safety evaluation. TigerRAG employs embeddings-based retrieval (EBR), RAG, and generation-augmented retrieval (GAR), utilizing BERT for embeddings and FAISS for indexing. TigerTune supports fine-tuning models like Llama2 and DistilBERT.
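To make the embeddings-based retrieval step concrete, here is a minimal sketch of the idea and not TigerRAG's actual code. It deliberately swaps in a toy bag-of-words vector for the BERT embeddings and a brute-force cosine ranking for the FAISS index, so that it runs with the standard library alone. The names `embed`, `cosine`, and `retrieve` are hypothetical and do not come from the project.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (stand-in for a BERT encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Brute-force top-k search (stand-in for a FAISS index lookup)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical mini-corpus of movie descriptions.
docs = [
    "a heist thriller set inside shared dreams",
    "a romantic comedy about two rival chefs",
    "a documentary about deep sea creatures",
]
top = retrieve("thriller about dreams", docs, k=1)
print(top)
```

In a real pipeline the embedding function would be a learned encoder and the ranking would be delegated to a vector index; the query-embed-rank shape stays the same.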

Quick Start & Requirements

  • Installation: Clone the repository, then install requirements for each component (e.g., pip install . in tiger/TigerRAG).
  • Prerequisites: An OpenAI API token is required. A CUDA GPU is needed for generation_example.py in TigerTune; notebooks are available as an alternative.
  • Demos: Examples are provided for TigerRAG (movie_recs) and TigerTune (classification_example.py, generation_example.py).
  • Resources: Setup involves cloning and installing Python packages. GPU acceleration is recommended for fine-tuning.
  • Links: TigerLab.ai Demo, Setup Tutorial (Note: Actual YouTube link not provided in README).
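The prerequisites above can be verified before running any example. The following is a sketch under stated assumptions: that the token is supplied through the conventional `OPENAI_API_KEY` environment variable, and that GPU availability is checked through PyTorch. The summary confirms neither detail, and the helper name `missing_prerequisites` is hypothetical.

```python
import importlib.util
import os


def missing_prerequisites(env=None, need_gpu=False):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: the token is read from OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if need_gpu:
        # Assumption: PyTorch is the framework used to reach the GPU.
        if importlib.util.find_spec("torch") is None:
            problems.append("PyTorch is not installed, so CUDA cannot be checked")
        else:
            import torch
            if not torch.cuda.is_available():
                problems.append("no CUDA GPU visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(need_gpu=True):
        print("missing:", problem)
```

Passing `need_gpu=False` covers the notebook route, where only the API token matters.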

Highlighted Details

  • Supports Embeddings-based Retrieval (EBR), Retrieval-Augmented Generation (RAG), and Generation-Augmented Retrieval (GAR).
  • Offers fine-tuning capabilities for models like Llama2 and DistilBERT.
  • Includes AI safety metrics, datasets, and evaluation tools for LLMs.
  • Provides a data augmentation toolkit, including generation-based augmentation with a fine-tuned GPT2.
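RAG and GAR differ mainly in the order of the two steps: RAG retrieves passages and then generates from them, while GAR generates an expanded query and then retrieves with it. The sketch below illustrates that ordering with stub components. The functions `rag`, `gar`, `fake_retrieve`, and `fake_generate` are hypothetical illustrations, not the toolkit's API, and the prompt wording is an assumption.

```python
def rag(query, retrieve, generate, k=3):
    """Retrieve first, then generate an answer grounded in the passages."""
    passages = retrieve(query, k)
    prompt = "Context:\n" + "\n".join(passages) + "\n\nQuestion: " + query
    return generate(prompt)


def gar(query, retrieve, generate, k=3):
    """Generate a query expansion first, then retrieve with the expanded query."""
    expansion = generate("Rewrite as a detailed search query: " + query)
    return retrieve(query + " " + expansion, k)


# Stub components so the sketch runs without a model or an index.
corpus = ["dream heist thriller", "rival chefs comedy"]
prompts = []


def fake_retrieve(query, k):
    """Return up to k documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in corpus if words & set(d.split())][:k]


def fake_generate(prompt):
    """Record the prompt and return a fixed string in place of an LLM call."""
    prompts.append(prompt)
    return "heist thriller"


answer = rag("dream", fake_retrieve, fake_generate)
hits = gar("movie", fake_retrieve, fake_generate)
```

With these stubs, the bare query "movie" matches nothing in the corpus, while the generated expansion lets GAR reach the relevant document.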

Maintenance & Community

  • The README describes active development and a roadmap including additional model support, perturbation-based augmenters, and a VectorDB for TigerRAG; note that the Health Check section reports the last commit as 1 year ago.
  • Community engagement encouraged via GitHub issues and a Discord server.
  • Links: Discord (Note: Actual Discord link not provided in README).

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project relies on the OpenAI API and therefore requires an API key. A CUDA GPU is necessary for certain fine-tuning examples, with notebooks provided as an alternative. The README does not specify a license, which may impact commercial adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days
