LLM for zero-shot information extraction using annotation guidelines
GoLLIE is a Large Language Model designed for zero-shot Information Extraction (IE) by strictly following user-defined annotation guidelines. It enables dynamic schema definition and inference, outperforming prior methods by leveraging detailed instructions rather than relying solely on pre-existing LLM knowledge. This is beneficial for researchers and practitioners needing flexible and precise IE capabilities.
How It Works
GoLLIE utilizes a guideline-following approach where annotation schemas are defined as Python classes and instructions are embedded in docstrings. The model is trained to interpret these guidelines and extract information accordingly, allowing for on-the-fly schema adaptation. This method enhances zero-shot performance by explicitly conditioning the LLM on task-specific rules.
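The schema-as-code idea can be sketched as follows. This is an illustrative example, not GoLLIE's exact prompt template: the class name `Launcher`, its fields, and the `schema_to_prompt` helper are hypothetical, but they mirror the described pattern of Python classes whose docstrings carry the annotation guidelines.

```python
from dataclasses import dataclass

# Hypothetical schema in the GoLLIE style: the class name is the label and
# the docstring carries the annotation guideline the model is conditioned on.
@dataclass
class Launcher:
    """Refers to a vehicle designed to carry a payload into space.
    Annotate only the name of the vehicle, not the mission."""
    mention: str  # the text span to extract

def schema_to_prompt(cls) -> str:
    """Render a schema class as guideline text to place in the prompt."""
    fields = "\n".join(f"    {name}: str" for name in cls.__dataclass_fields__)
    return f'class {cls.__name__}:\n    """{cls.__doc__}"""\n{fields}'

prompt = schema_to_prompt(Launcher)
```

Because the guideline lives in the docstring, changing the schema at inference time is just a matter of editing or swapping the class definitions, which is what enables on-the-fly adaptation.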
Quick Start & Requirements
Install the core dependencies:

pip install --upgrade transformers peft bitsandbytes
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary

Additional dependencies include numpy, black, Jinja2, tqdm, rich, psutil, datasets, ruff, wandb, and fschat.
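Since the schemas are Python classes, a natural output format is a Python expression instantiating them. The sketch below shows one way to recover typed annotations from such a generation; the `result = [...]` output shape and the `Launcher` class are assumptions for illustration, and executing model output like this is only safe in a sandboxed or trusted context.

```python
from dataclasses import dataclass

@dataclass
class Launcher:
    """Vehicle designed to carry a payload into space."""
    mention: str

# Assumed output format: a Python assignment instantiating schema classes.
# Executing model output is unsafe outside a sandbox; sketch only.
model_output = 'result = [Launcher(mention="Saturn V")]'

namespace = {"Launcher": Launcher}
exec(model_output, {"__builtins__": {}}, namespace)  # restricted globals
annotations = namespace["result"]
```

Restricting `__builtins__` limits what the generated expression can reach, but a production system would want a real sandbox or a proper parser rather than `exec`.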
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project does not redistribute training datasets, requiring users to acquire and potentially license certain datasets manually. Compatibility with commercial or closed-source applications may be impacted by the licensing of these external datasets.