GLiNER2 by fastino-ai

Unified information extraction for diverse NLP tasks

Created 7 months ago

880 stars

Top 40.7% on SourcePulse

2 Experts Love This Project

Jesse Clark

Cofounder of Marqo

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

Unified Schema-Based Information Extraction

GLiNER2 is a unified information extraction framework that consolidates Named Entity Recognition (NER), text classification, and structured data extraction into a single model. Designed for CPU-first inference, it runs locally without requiring GPUs or external dependencies, making advanced NLP accessible on standard hardware. This suits users who need versatile extraction while keeping their data on their own machines and keeping deployment simple.

How It Works

The core of GLiNER2 is a unified schema-based architecture that integrates multiple NLP tasks into a single model of 205M to 340M parameters, depending on the variant. All tasks in a schema are handled in one forward pass, which keeps CPU inference fast. The schema definition lets users specify entity types with optional descriptions for better accuracy, configure text classification with confidence thresholds, and define structured data extraction with field-level types and constraints.
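To make the three schema ingredients described above concrete, here is a minimal sketch in plain Python. It is not GLiNER2's own API: the class names (EntityType, Classification, StructField, Schema) and their fields are hypothetical and only illustrate what a declarative schema with entity descriptions, classification thresholds, and field-level types and choices might carry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# NOTE: every class below is hypothetical and for illustration only;
# none of them is part of the gliner2 package.
@dataclass
class EntityType:
    name: str
    description: Optional[str] = None  # optional hint to improve accuracy


@dataclass
class Classification:
    task: str
    labels: List[str]
    threshold: float = 0.5  # minimum confidence for a label to be accepted


@dataclass
class StructField:
    name: str
    dtype: str = "str"                   # field-level type: "str" or "list"
    choices: Optional[List[str]] = None  # optional closed set of allowed values


@dataclass
class Schema:
    entities: List[EntityType] = field(default_factory=list)
    classifications: List[Classification] = field(default_factory=list)
    structures: Dict[str, List[StructField]] = field(default_factory=dict)


# One schema object describes all three task families at once.
schema = Schema(
    entities=[
        EntityType("company", "Names of businesses or organizations"),
        EntityType("person"),
    ],
    classifications=[
        Classification("sentiment", ["positive", "negative", "neutral"], threshold=0.6),
    ],
    structures={
        "product": [
            StructField("name"),
            StructField("storage", choices=["128GB", "256GB"]),
            StructField("features", dtype="list"),
        ],
    },
)
```

Keeping everything in one object mirrors the idea that a single request, and therefore a single forward pass, can cover entities, labels, and structured fields together.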

Quick Start & Requirements

Installation: pip install gliner2
Models: Pre-trained models (fastino/gliner2-base-v1, fastino/gliner2-large-v1) are available on Hugging Face.
Prerequisites: Standard Python environment; no GPU is required.
Documentation: Available via Hugging Face model pages and the project's citation link.
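
A hedged sketch of a first run after installation follows. The model identifiers come from the list above; the class name GLiNER2 and the calls from_pretrained and extract_entities are assumptions based on the upstream README rather than this summary, so check them against the Hugging Face model pages. The helper names load_extractor and run_demo are hypothetical.

```python
# Model identifiers as listed in the Quick Start section.
MODEL_IDS = ("fastino/gliner2-base-v1", "fastino/gliner2-large-v1")


def load_extractor(model_id: str = MODEL_IDS[0]):
    """Load a pre-trained model by its Hugging Face identifier."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"unknown model id: {model_id!r}")
    # ASSUMPTION: `GLiNER2.from_pretrained` follows the upstream README.
    # Imported lazily so this module loads even before `pip install gliner2`.
    from gliner2 import GLiNER2
    return GLiNER2.from_pretrained(model_id)


def run_demo() -> None:
    """Extract a few entity types from one sentence and print the result."""
    extractor = load_extractor()
    text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
    # ASSUMPTION: `extract_entities(text, labels)` follows the upstream README.
    print(extractor.extract_entities(text, ["company", "person", "product", "location"]))
```

Calling run_demo() downloads the base model weights on first use and then runs entirely on the local CPU.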

Highlighted Details

Unified Task Execution: Performs NER, classification, and structured data extraction concurrently within a single model and forward pass.
CPU-Optimized Performance: Achieves fast inference speeds on standard CPUs, eliminating the need for specialized hardware.
Schema-Driven Precision: Enables fine-grained control over extraction through a declarative schema, supporting entity descriptions, classification thresholds, and structured field constraints (types, choices).
Multi-Task Composition: Facilitates complex analysis by combining entity, classification, and structured data extraction schemas.
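
The effect of classification thresholds and field choices can be illustrated with a small post-processing sketch. This shows the semantics only and does not describe how GLiNER2 applies them internally; the function names and the example scores are hypothetical.

```python
from typing import Dict, List, Optional


def apply_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep labels whose confidence meets the threshold, highest score first."""
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


def enforce_choices(value: str, choices: List[str]) -> Optional[str]:
    """Return the value only if it belongs to the allowed set, else None."""
    return value if value in choices else None


# Hypothetical confidence scores for a sentiment classification task.
scores = {"neutral": 0.55, "positive": 0.82, "negative": 0.05}

accepted = apply_threshold(scores, threshold=0.5)      # ["positive", "neutral"]
strict = apply_threshold(scores, threshold=0.9)        # []
storage = enforce_choices("256GB", ["128GB", "256GB"])  # "256GB"
rejected = enforce_choices("2TB", ["128GB", "256GB"])   # None
```

A higher threshold trades recall for precision, while a choices list turns a free-text field into a closed vocabulary.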

Maintenance & Community

GLiNER2, developed by Fastino AI, builds upon the original GLiNER architecture. The README does not detail community channels or maintenance signals beyond the provided citation.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting broad commercial use and integration into closed-source applications.

Limitations & Caveats

The README does not explicitly detail limitations. While CPU-optimized, the model size (205M-340M parameters) may still represent a significant resource footprint for highly constrained environments. Performance on highly specialized or out-of-domain data may require further fine-tuning.
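
As a rough guide to the footprint mentioned above, the memory needed for the weights alone is the parameter count times the bytes per parameter. The sketch below works this out for both ends of the 205M to 340M range. The precision options are assumptions for illustration, since this summary does not say which precision the released checkpoints use, and runtime memory will be higher once activations and the tokenizer are included.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_mb(num_params: int, dtype: str = "float32") -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


small = weight_footprint_mb(205_000_000)          # 820.0 MB at float32
large = weight_footprint_mb(340_000_000)          # 1360.0 MB at float32
large_int8 = weight_footprint_mb(340_000_000, "int8")  # 340.0 MB at int8
```

At 32-bit precision the weights therefore occupy roughly 0.8 GB to 1.4 GB, which is comfortable on a laptop but can be tight on small edge devices.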

Health Check

Last Commit: 1 day ago

Responsiveness: Inactive

Star History

322 stars in the last 30 days