textgen by shibing624

Text generation models, including LLaMA, BLOOM, GPT2, BART, T5, and more

created 4 years ago
965 stars

Top 39.0% on sourcepulse

Project Summary

This repository provides a comprehensive toolkit for text generation tasks, supporting a wide array of models including LLaMA, ChatGLM, BLOOM, GPT2, T5, and more. It's designed for researchers and developers working with natural language processing, offering capabilities for model training, fine-tuning (including LoRA and QLoRA), and prediction, with a particular focus on Chinese language applications.

How It Works

The library is built on PyTorch and offers modular implementations for various architectures. It supports advanced fine-tuning techniques like LoRA, QLoRA, AdaLoRA, P_Tuning, and Prefix_Tuning, enabling efficient adaptation of large language models to specific domains or tasks. The project also integrates text augmentation methods like UDA and EDA, and provides implementations for specialized models like SongNet for structured text generation.
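The fine-tuning methods listed above all share the low-rank adaptation idea: keep the pretrained weight frozen and learn a small additive update. A minimal sketch of that computation, in plain Python with no dependencies (this is illustrative of the technique only, not textgen's API; the matrices and function name here are hypothetical):

```python
# Sketch of the LoRA update: the frozen weight W is adapted as
# W' = W + (alpha / r) * B @ A, where A (r x in) and B (out x r)
# are the small trainable matrices. Illustrative only.

def lora_forward(x, W, A, B, alpha, r):
    """Return W @ x plus the scaled low-rank update B @ (A @ x)."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    base = matvec(W, x)               # frozen pretrained path
    delta = matvec(B, matvec(A, x))   # trainable low-rank path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Example: 2x2 frozen identity weight with rank-1 adapters.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # r=1, input dim 2
B = [[0.5], [0.5]]  # output dim 2, r=1
y = lora_forward([2.0, 3.0], W, A, B, alpha=1.0, r=1)  # -> [4.5, 5.5]
```

Because only A and B are trained, the number of trainable parameters scales with the rank r rather than the full weight size, which is what makes adapting large models tractable on modest hardware.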

Quick Start & Requirements

  • Install: pip install -U textgen, or from source: install PyTorch, then git clone https://github.com/shibing624/textgen.git and python setup.py install.
  • Prerequisites: PyTorch. Specific models may have additional requirements.
  • Usage: Inference and training examples are provided for various models, including ChatGLM-6B and LLaMA; see the repository's Usage section for details.

Highlighted Details

  • Supports fine-tuning and prediction for numerous LLMs including LLaMA, ChatGLM, BLOOM, Mistral, and QWen.
  • Includes implementations for text augmentation (UDA/EDA) and specialized models like SongNet for structured text.
  • Offers extensive support for various fine-tuning methods (LoRA, QLoRA, etc.) and multi-GPU training/inference.
  • Provides pre-trained models on HuggingFace and detailed evaluation benchmarks for Chinese LLMs.
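The EDA-style augmentation mentioned above boils down to a few simple word-level perturbations. A toy sketch of two of them, random swap and random deletion, in stdlib Python (function names are illustrative, not textgen's actual API):

```python
import random

# Two classic EDA (Easy Data Augmentation) operations: randomly swap
# word positions, and randomly delete words. Illustrative sketch only.

def random_swap(words, n, rng):
    """Swap two randomly chosen positions n times."""
    words = list(words)
    for _ in range(n):
        i = rng.randrange(len(words))
        j = rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    """Drop each word with probability p, but keep at least one word."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(list(words))]

rng = random.Random(42)
sent = "text generation models support fine tuning".split()
swapped = random_swap(sent, n=2, rng=rng)      # same words, new order
shorter = random_deletion(sent, p=0.3, rng=rng)  # subset of the words
```

Generating several such variants per training sentence is a cheap way to enlarge small fine-tuning datasets before the heavier UDA-style augmentation.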

Maintenance & Community

The project is actively maintained, with recent updates in late 2023 adding features like NEFTune support and multi-GPU inference. The primary contact is via email (xuming624@qq.com) or WeChat.

Licensing & Compatibility

The repository is licensed under The Apache License 2.0. Usage of specific models like LLaMA requires adherence to their respective model cards, and BLOOM/BLOOMZ models follow the RAIL License.

Limitations & Caveats

While the project offers extensive features, some advanced training methods (Reward Modeling, RL finetuning) are directed to a separate repository (shibing624/MedicalGPT). The README notes that evaluation scores are for reference and may not be absolutely rigorous due to potential variations in decoding parameters and random seeds.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days
