Text generation library with pre-trained language models
TextBox 2.0 is a comprehensive Python library designed for text generation tasks, offering a unified and standardized framework for applying pre-trained language models (PLMs). It caters to researchers and developers working with various text generation applications, providing a reproducible and flexible environment for experimentation.
How It Works
TextBox 2.0 standardizes the pipeline for PLM-based text generation, supporting 13 common tasks and 83 datasets. It integrates 47 diverse PLMs, categorized by function (e.g., translation, dialogue, controllable generation), and offers 4 pre-training objectives and efficient training strategies like distributed data parallel. This unified approach simplifies the application of state-of-the-art models and facilitates result reproduction.
Quick Start & Requirements
Install: run install.sh after cloning the repository. A conda environment with Python 3.8 is recommended.
Datasets: place downloaded datasets in the dataset folder.
Run: python run_textbox.py --model=BART --dataset=samsum --model_path=facebook/bart-base
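When running several experiments, it can help to assemble the run command programmatically rather than retyping it. The sketch below is only an illustration: build_command is a hypothetical helper, not part of TextBox, and it uses only the three flags shown in the quick-start command (--model, --dataset, --model_path). It assumes the command is launched from the repository root, where run_textbox.py lives.

```python
import shlex
import sys


def build_command(model, dataset, model_path, python=sys.executable):
    """Return the argv list for one run_textbox.py invocation.

    Hypothetical helper: it only mirrors the flags from the quick-start
    command and does not validate them against TextBox itself.
    """
    return [
        python,
        "run_textbox.py",
        f"--model={model}",
        f"--dataset={dataset}",
        f"--model_path={model_path}",
    ]


if __name__ == "__main__":
    cmd = build_command("BART", "samsum", "facebook/bart-base", python="python")
    # Print the shell-quoted command so it can be inspected before running.
    print(shlex.join(cmd))
    # To launch it from the repository root (assumption), one could use:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single string avoids shell-quoting problems if the command is later passed to subprocess.run.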
Highlighted Details
Maintenance & Community
Developed and maintained by AI Box. Contributions are welcomed via issues and pull requests.
Licensing & Compatibility
MIT License. Permissive for commercial use and closed-source linking.
Limitations & Caveats
The installation process may require specific handling for dependencies like files2rouge. Users need to manually download datasets from Hugging Face.