UnifiedSKG by xlang-ai

Unified framework for structured knowledge grounding research

Created 3 years ago
564 stars

Top 57.0% on SourcePulse

Project Summary

UnifiedSKG provides a unified framework for structured knowledge grounding (SKG) tasks, enabling multi-task learning and systematic research. It targets researchers and practitioners in NLP who work with knowledge bases, databases, and semantic parsing, offering a standardized approach to diverse SKG problems.

How It Works

The framework unifies 21 SKG tasks into a text-to-text format, leveraging large language models like T5. This approach allows for a single model to handle heterogeneous SKG tasks, promoting research beyond single-task or domain-specific limitations. It facilitates multi-task learning, particularly with prefix-tuning, and serves as a challenging benchmark for few-shot and zero-shot learning scenarios.
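The text-to-text unification described above can be sketched roughly as follows. This is an illustrative assumption, not UnifiedSKG's actual serialization code: the concrete per-task linearization format (the `col :` / `row :` convention, the separator strings, and the helper names `linearize_table` and `build_input`) is made up here to show the idea of flattening structured knowledge plus a question into a single seq2seq input string.

```python
# Illustrative sketch (assumed format): flatten a table and a question into
# one text-to-text input, as a unified SKG framework might.

def linearize_table(table):
    """Flatten a table (header + rows) into a single string."""
    header = " | ".join(table["header"])
    rows = " ".join("row : " + " | ".join(r) for r in table["rows"])
    return f"col : {header} {rows}"

def build_input(question, table):
    """Concatenate the question with the linearized structured knowledge."""
    return f"{question} ; structured knowledge: {linearize_table(table)}"

table = {"header": ["city", "population"],
         "rows": [["Berlin", "3.7M"], ["Paris", "2.1M"]]}
src = build_input("Which city has the larger population?", table)
tgt = "Berlin"  # the seq2seq target is also plain text
```

Because every task reduces to (input string, target string) pairs, one T5-style model can be trained on all 21 tasks jointly.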

Quick Start & Requirements

  • Install: Clone recursively (git clone --recurse-submodules) and create a Conda environment using py3.7pytorch1.8.yaml. Install PyTorch with CUDA 11.1 support (pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html) and the datasets library (pip install datasets==1.14.0).
  • Prerequisites: Python 3.7, PyTorch 1.8.0 with CUDA 11.1, Conda.
  • Setup: Environment setup via Conda, WandB API key for logging.
  • Resources: Training examples suggest multi-GPU setups (4-8 GPUs) with an effective batch size of 128.
  • Links: Project page, HuggingFace Model Hub.

Highlighted Details

  • Unifies 21 SKG tasks into a text-to-text format.
  • Achieves state-of-the-art performance on most tasks with T5.
  • Demonstrates benefits of multi-task prefix-tuning.
  • Evaluates performance of LLMs like T0, GPT-3, and Codex on SKG tasks.
  • Facilitates controlled experiments on structured knowledge encoding.
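The prefix-tuning idea highlighted above can be sketched conceptually. This is a simplified assumption for illustration, not the repository's implementation: the base model's weights stay frozen, and only a small block of learned "virtual token" embeddings (one prefix per task) is trained and prepended to the input.

```python
import numpy as np

# Conceptual sketch of prefix-tuning (simplified assumption, not the repo's
# actual code). Base-model parameters are frozen; only the per-task prefix
# embeddings are trainable, so multi-task tuning stays parameter-efficient.

rng = np.random.default_rng(0)
d_model, prefix_len, seq_len = 8, 4, 10

frozen_input_embeds = rng.normal(size=(seq_len, d_model))  # from frozen LM
prefix_embeds = np.zeros((prefix_len, d_model))            # trainable params

# The prefix is prepended to the token embeddings before the (frozen) encoder.
extended = np.concatenate([prefix_embeds, frozen_input_embeds], axis=0)
```

Only `prefix_len * d_model` parameters per task receive gradients, which is what makes multi-task prefix-tuning cheap to combine across the 21 tasks.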

Maintenance & Community

The project accompanies an EMNLP 2022 paper (oral presentation). Contributions via pull requests are welcome.

Licensing & Compatibility

The repository includes third-party code. The README does not explicitly state a license for the core framework, though the project is presented as open-source research code.

Limitations & Caveats

The provided environment setup specifies PyTorch 1.8.0 with CUDA 11.1, which may be outdated. The README mentions experimental code for combined prefix-tuning that did not outperform simpler methods but is open-sourced for future exploration.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Starred by Aravind Srinivas (Cofounder of Perplexity), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 12 more.
