UnifiedSKG by xlang-ai

Unified framework for structured knowledge grounding research

created 3 years ago
562 stars

Top 58.1% on sourcepulse

Project Summary

UnifiedSKG provides a unified framework for structured knowledge grounding (SKG) tasks, enabling multi-task learning and systematic research. It targets researchers and practitioners in NLP who work with knowledge bases, databases, and semantic parsing, offering a standardized approach to diverse SKG problems.

How It Works

The framework casts 21 SKG tasks into a common text-to-text format and trains pretrained sequence-to-sequence language models such as T5 on them. Because every task shares the same string-in, string-out interface, a single model can handle heterogeneous SKG tasks, which moves research beyond single-task, domain-specific systems. The unified format also supports multi-task learning, notably with prefix-tuning, and serves as a challenging benchmark for few-shot and zero-shot learning.
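As a minimal sketch of the text-to-text idea, the snippet below flattens a small table and a question into one input string paired with a target string, which is the kind of example a sequence-to-sequence model such as T5 consumes. The serialization (`col : ... row i : ...`) and the helper names `linearize_table` and `to_text_to_text` are hypothetical choices for illustration; UnifiedSKG defines its own per-task linearizations in the repository.

```python
from typing import Dict, List


def linearize_table(header: List[str], rows: List[List[str]]) -> str:
    """Flatten a table into one string (hypothetical serialization format)."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        # Each row becomes "row i : cell1 | cell2 | ..."
        parts.append(f"row {i} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


def to_text_to_text(
    question: str, header: List[str], rows: List[List[str]], answer: str
) -> Dict[str, str]:
    """Pair a question plus linearized table (input) with the expected answer (target)."""
    source = f"{question} ; {linearize_table(header, rows)}"
    return {"input": source, "target": answer}


if __name__ == "__main__":
    example = to_text_to_text(
        "Which city hosted the games in 2008?",
        ["year", "city"],
        [["2004", "Athens"], ["2008", "Beijing"]],
        "Beijing",
    )
    print(example["input"])
    # Which city hosted the games in 2008? ; col : year | city row 1 : 2004 | Athens row 2 : 2008 | Beijing
    print(example["target"])  # Beijing
```

The same pattern applies to other structured inputs such as knowledge-base triples or database schemas: only the linearization function changes, while the model interface stays string-in, string-out.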

Quick Start & Requirements

  • Install: Clone the repository recursively (git clone --recurse-submodules) and create a Conda environment from py3.7pytorch1.8.yaml. Install PyTorch with CUDA 11.1 support (pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html) and the datasets library (pip install datasets==1.14.0).
  • Prerequisites: Python 3.7, PyTorch 1.8.0 with CUDA 11.1, Conda.
  • Setup: Conda environment, plus a WandB API key for experiment logging.
  • Resources: The example training commands use 4-8 GPUs and an effective batch size of 128.
  • Links: Project page, Hugging Face Model Hub.
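The effective batch size is the number of examples behind each optimizer update: per-device batch size × number of GPUs × gradient-accumulation steps. The helper below is a hypothetical sketch (not part of UnifiedSKG) for working out how many accumulation steps keep the effective batch at 128 when moving between 4 and 8 GPUs; the per-device batch size of 4 is an assumed value, to be tuned to fit GPU memory.

```python
def grad_accum_steps(target_effective: int, per_device: int, num_gpus: int) -> int:
    """Return the gradient-accumulation steps needed to reach a target effective batch size.

    effective batch = per-device batch size x number of GPUs x accumulation steps
    """
    if per_device <= 0 or num_gpus <= 0:
        raise ValueError("per_device and num_gpus must be positive")
    per_step = per_device * num_gpus  # examples per forward/backward pass across all GPUs
    if target_effective % per_step != 0:
        raise ValueError(
            f"{target_effective} is not divisible by {per_device} x {num_gpus} = {per_step}"
        )
    return target_effective // per_step


if __name__ == "__main__":
    # Assumed per-device batch size of 4; adjust it to fit GPU memory.
    for gpus in (4, 8):
        steps = grad_accum_steps(128, per_device=4, num_gpus=gpus)
        print(f"{gpus} GPUs -> {steps} accumulation steps")
    # 4 GPUs -> 8 accumulation steps
    # 8 GPUs -> 4 accumulation steps
```

Holding the effective batch size fixed while the GPU count changes keeps runs comparable. If the training script follows Hugging Face `TrainingArguments` conventions, the two knobs are `per_device_train_batch_size` and `gradient_accumulation_steps`.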

Highlighted Details

  • Unifies 21 SKG tasks into a text-to-text format.
  • Achieves state-of-the-art performance on most tasks with T5.
  • Demonstrates benefits of multi-task prefix-tuning.
  • Evaluates performance of LLMs like T0, GPT-3, and Codex on SKG tasks.
  • Facilitates controlled experiments on structured knowledge encoding.
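Multi-task training builds on the shared text-to-text format: once every task's examples are (input, target) strings, they can be pooled into one training stream. The sketch below shows only that data-side step, tagging each example with its task name and shuffling reproducibly. The task names and examples are illustrative, `mix_tasks` is a hypothetical helper, and how prefix parameters are shared or kept separate across tasks during multi-task prefix-tuning is defined in the repository and not modeled here.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def mix_tasks(
    task_data: Dict[str, List[Tuple[str, str]]], seed: int = 0
) -> List[Dict[str, str]]:
    """Pool per-task (input, target) pairs into one shuffled multi-task list."""
    mixed = [
        {"task": task, "input": source, "target": target}
        for task, pairs in task_data.items()
        for source, target in pairs
    ]
    # A seeded RNG makes the shuffle reproducible across runs.
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Illustrative toy data keyed by task name.
    data = {
        "wikitq": [
            ("which year? ; col : year | city row 1 : 2008 | Beijing", "2008"),
            ("which city? ; col : year | city row 1 : 2004 | Athens", "Athens"),
        ],
        "spider": [("how many singers? ; table : singer", "SELECT count(*) FROM singer")],
    }
    mixed = mix_tasks(data, seed=42)
    print(Counter(ex["task"] for ex in mixed))  # per-task example counts
```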

Maintenance & Community

The accompanying paper was presented at EMNLP 2022 (oral). Contributions via pull requests are welcome.

Licensing & Compatibility

The repository includes third-party code. The README does not explicitly state a license for the core framework, although the project is presented as open-source research code.

Limitations & Caveats

The provided environment pins PyTorch 1.8.0 with CUDA 11.1, which is now dated and may need adjustment on newer GPUs or drivers. The README also describes experimental code for combined prefix-tuning that did not outperform simpler methods; it is open-sourced for future exploration.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2%
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago