UnifiedSKG by xlang-ai

Unified framework for structured knowledge grounding research

Created 4 years ago

566 stars

Top 56.8% on SourcePulse


4 Experts Love This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Jeff Hammerbacher

Cofounder of Cloudera

Shyamal Anadkat

Research Scientist at OpenAI

Binyuan Hui

Research Scientist at Alibaba Qwen

Project Summary

UnifiedSKG provides a unified framework for structured knowledge grounding (SKG) tasks, enabling multi-task learning and systematic research. It targets researchers and practitioners in NLP who work with knowledge bases, databases, and semantic parsing, offering a standardized approach to diverse SKG problems.

How It Works

The framework casts 21 SKG tasks into a common text-to-text format and trains pretrained language models such as T5 on them. A single model can therefore handle heterogeneous SKG tasks, which supports research beyond single-task or domain-specific settings. The unified format also enables multi-task learning, particularly with prefix-tuning, and serves as a challenging benchmark for few-shot and zero-shot learning.
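
To make the text-to-text idea concrete, here is a minimal sketch of how a table question-answering example might be flattened into a source string and a target string. The function names (linearize_table, build_input) and the exact separator strings are assumptions chosen for illustration; the repository's own linearization may differ in detail.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Flatten a table into one string.

    Illustrative format only (hypothetical helper, not the repository's API).
    """
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


def build_input(request: str, structured: str) -> str:
    """Join the user request and the linearized knowledge into one source text."""
    return f"{request} ; structured knowledge: {structured}"


header = ["city", "country"]
rows = [["Paris", "France"], ["Tokyo", "Japan"]]

# The (source, target) pair is what a sequence-to-sequence model would train on.
source = build_input("Which country is Tokyo in?", linearize_table(header, rows))
target = "Japan"

print(source)
print(target)
```

Because every task reduces to a pair of strings like this, the same sequence-to-sequence model and training loop can in principle be reused across tasks with different kinds of structured input.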

Quick Start & Requirements

Install: Clone the repository with its submodules (git clone --recurse-submodules), then create a Conda environment from py3.7pytorch1.8.yaml. Install PyTorch built for CUDA 11.1 (pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html) and the datasets library (pip install datasets==1.14.0).
Prerequisites: Python 3.7, PyTorch 1.8.0 with CUDA 11.1, Conda.
Setup: Environment setup via Conda, WandB API key for logging.
Resources: The training examples suggest multi-GPU setups (4-8 GPUs) and a large effective batch size (128).
Links: Project page, HuggingFace Model Hub.
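
The effective batch size mentioned under Resources is normally the product of the GPU count, the per-device batch size, and the number of gradient-accumulation steps. The sketch below works out the accumulation steps needed to reach 128 on 4 or 8 GPUs. The per-device batch size of 4 is an assumption for illustration, not a value taken from the README, and grad_accum_steps is a hypothetical helper.

```python
def grad_accum_steps(effective_batch: int, num_gpus: int, per_device_batch: int) -> int:
    """Return the accumulation steps needed to reach the effective batch size.

    effective_batch = num_gpus * per_device_batch * accumulation_steps
    """
    per_step = num_gpus * per_device_batch
    if effective_batch % per_step != 0:
        raise ValueError(
            f"effective batch {effective_batch} is not divisible by {per_step}"
        )
    return effective_batch // per_step


# Assumed per-device batch size of 4 (hypothetical; tune to available GPU memory).
for gpus in (4, 8):
    steps = grad_accum_steps(128, gpus, per_device_batch=4)
    print(f"{gpus} GPUs -> {steps} accumulation steps")
```

With fewer GPUs or less memory, the same effective batch size can be kept by lowering the per-device batch size and raising the accumulation steps accordingly, at the cost of longer training time.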

Highlighted Details

Unifies 21 SKG tasks into a text-to-text format.
Reports state-of-the-art performance on most of the 21 tasks with T5.
Demonstrates benefits of multi-task prefix-tuning.
Evaluates performance of LLMs like T0, GPT-3, and Codex on SKG tasks.
Facilitates controlled experiments on structured knowledge encoding.

Maintenance & Community

The project accompanies a paper presented at EMNLP 2022 (oral). Contributions via pull requests are welcome.

Licensing & Compatibility

The repository includes third-party code. The README does not explicitly state a license for the core framework, although the project is presented as open-source research code.

Limitations & Caveats

The provided environment pins PyTorch 1.8.0 with CUDA 11.1, versions that are now dated and may need adjusting for newer hardware and drivers. The README also notes that the experimental code for combined prefix-tuning did not outperform simpler methods; it is open-sourced for future exploration.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days