OneForAll by LechengKong

One graph model for all classification tasks

Created 2 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

OneForAll is a foundational graph learning framework designed to address cross-domain and cross-task classification problems using a single, unified model. It targets researchers and practitioners seeking to simplify the development and deployment of graph models across diverse datasets like citation networks and molecular graphs, and various task types including few-shot, zero-shot, and node-level classification. The primary benefit is achieving this versatility without modifying model parameters or architecture, significantly reducing complexity.

How It Works

OneForAll employs a novel approach by representing all graphs, nodes, and edges using natural language descriptions. These descriptions are then embedded into a shared semantic space using a Large Language Model (LLM). A key innovation is its "prompting paradigm," where task-specific information is converted into prompt graphs. The single model processes these prompts, enabling it to understand and adapt to different tasks and domains dynamically, facilitating cross-task generalization.
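The idea above can be illustrated with a minimal, self-contained sketch. The toy hash-based embedder stands in for the LLM encoder, and all function and field names here are illustrative assumptions, not OneForAll's actual API:

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for an LLM encoder: a deterministic pseudo-embedding
    derived from a hash, normalized to unit length."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_prompt_graph(task_description: str, node_texts: list[str]):
    """Build a tiny 'prompt graph': one task node linked to every data node.

    Every node carries an embedding of its natural-language description,
    so tasks and data live in one shared semantic space -- the core idea
    behind OneForAll's prompting paradigm.
    """
    nodes = [{"text": task_description,
              "emb": toy_embed(task_description),
              "role": "prompt"}]
    edges = []
    for text in node_texts:
        nodes.append({"text": text, "emb": toy_embed(text), "role": "data"})
        edges.append((0, len(nodes) - 1))  # prompt node -> data node
    return nodes, edges

nodes, edges = build_prompt_graph(
    "Classify each paper's research area.",
    ["Paper: attention-based graph networks", "Molecule: aspirin, C9H8O4"],
)
print(len(nodes), len(edges))  # 3 nodes (1 prompt + 2 data), 2 edges
```

Because the task itself is just another embedded text node, swapping in a different task description changes the prompt graph without touching any model parameters.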

Quick Start & Requirements

  • Installation: Use Conda: conda env create -f environment.yml.
  • Execution:
    • End-to-end experiments: python run_cdm.py --override e2e_all_config.yaml
    • Low-resource (few-shot/zero-shot) experiments: python run_cdm.py --override lr_all_config.yaml
  • Configuration: Tasks and datasets are defined via YAML configuration files (e.g., ./configs/task_config.yaml, e2e_all_config.yaml).
  • Custom Datasets: Requires implementing gen_data.py, registering the dataset, and defining a splitter. A template CustomizedOFADataset class is available.
  • Prerequisites: A Conda environment managed by environment.yml. Multi-GPU training is supported.
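The custom-dataset workflow (generate data, register the dataset, define a splitter) can be sketched as follows. This is a hypothetical illustration only: the registry decorator, method names, and class body are assumptions, not the project's real interface; consult the CustomizedOFADataset template for that.

```python
# Illustrative sketch of the custom-dataset steps described above.
# All names here (DATASET_REGISTRY, register_dataset, gen_data, split)
# are assumptions for demonstration purposes.

DATASET_REGISTRY = {}

def register_dataset(name):
    """Register a dataset class under a name, mirroring the
    'register the dataset' step from the summary."""
    def wrap(cls):
        DATASET_REGISTRY[name] = cls
        return cls
    return wrap

@register_dataset("my_citation_net")
class MyCitationDataset:
    def gen_data(self):
        """Produce graphs plus natural-language descriptions;
        toy placeholders stand in for real data generation."""
        graphs = [{"nodes": [0, 1, 2], "edges": [(0, 1), (1, 2)]}]
        texts = ["Paper on GNNs", "Paper on LLMs", "Paper on prompts"]
        return graphs, texts

    def split(self, n: int, train_frac: float = 0.8):
        """Deterministic train/test splitter over n examples."""
        cut = int(n * train_frac)
        return list(range(cut)), list(range(cut, n))

ds = DATASET_REGISTRY["my_citation_net"]()
graphs, texts = ds.gen_data()
train_idx, test_idx = ds.split(len(texts))
print(train_idx, test_idx)  # [0, 1] [2]
```

Once registered, a dataset would typically be referenced by name from the YAML task configuration rather than imported directly.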

Highlighted Details

  • Unified Model: Solves diverse graph classification tasks across multiple domains (e.g., citation, molecular) and task types (few-shot, zero-shot, node/graph-level) with one model and parameter set.
  • LLM-Powered Embeddings: Leverages LLMs to create a unified embedding space from natural language graph descriptions.
  • Prompting Paradigm: Utilizes prompt graphs derived from task descriptions to guide the model's predictions.
  • Multi-GPU Training: Supports training across multiple GPUs for potentially faster experimentation.

Maintenance & Community

Recent updates include the implementation of multi-GPU training, bug fixes, and adjustments to dataset splits and prompt logic. The project has undergone a major revision involving code cleanup and bug fixes, requiring users of older versions to update. No specific community links (e.g., Discord, Slack) or notable contributors are mentioned in the provided text.

Licensing & Compatibility

The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README content.

Limitations & Caveats

Adding custom datasets requires significant implementation effort, including data generation, splitting, and configuration. The recent major code revision requires users to update their local clones and potentially regenerate data and features. Understanding and modifying the YAML configuration files is essential for effective use. The absence of explicit licensing information is a potential adoption blocker.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 3 stars in the last 30 days
