GLiNER2  by fastino-ai

Unified information extraction for diverse NLP tasks

Created 6 months ago
480 stars

Top 63.8% on SourcePulse

Project Summary

Unified Schema-Based Information Extraction

GLiNER2 is a unified information extraction framework that consolidates Named Entity Recognition (NER), text classification, and structured data extraction into a single model. Designed for CPU-first inference, it runs quickly and entirely locally, without requiring GPUs or calls to external services, so it works on standard hardware. This suits users who need versatile data extraction while prioritizing privacy and ease of deployment.

How It Works

The core of GLiNER2 is a unified schema-based architecture that integrates multiple NLP tasks into a single model, with checkpoints ranging from 205M to 340M parameters. It handles all tasks in one forward pass, which keeps CPU inference fast. A flexible schema definition lets users specify entity types with optional descriptions for better accuracy, configure text classification with confidence thresholds, and define structured data extraction with field-level types and constraints.
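To make the idea of a declarative schema concrete, the following is a library-independent sketch in plain Python. It is not GLiNER2's API: `FieldSpec`, `Schema`, `violations`, and `pick_label` are hypothetical names invented here to illustrate the three schema parts described above (entity descriptions, a classification threshold, and field-level types and choices).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldSpec:
    """One field of a structured-extraction schema (hypothetical, not GLiNER2's API)."""
    name: str
    dtype: str = "str"              # "str" or "list" in this sketch
    choices: Optional[list] = None  # allowed values, if constrained


@dataclass
class Schema:
    # entity label -> optional description that guides extraction
    entities: dict = field(default_factory=dict)
    # task name -> (candidate labels, confidence threshold)
    classifications: dict = field(default_factory=dict)
    # structure name -> list of FieldSpec
    structures: dict = field(default_factory=dict)


def violations(schema, structure, record):
    """Return human-readable problems found in one extracted record."""
    problems = []
    for spec in schema.structures[structure]:
        value = record.get(spec.name)
        if value is None:
            continue  # missing fields are tolerated in this sketch
        expected = list if spec.dtype == "list" else str
        if not isinstance(value, expected):
            problems.append(f"{spec.name}: expected {spec.dtype}")
            continue
        if spec.choices is not None:
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v not in spec.choices:
                    problems.append(f"{spec.name}: {v!r} not in choices")
    return problems


def pick_label(schema, task, scores):
    """Return the best-scoring label, or None if it falls below the threshold."""
    labels, threshold = schema.classifications[task]
    best = max(labels, key=lambda label: scores.get(label, 0.0))
    return best if scores.get(best, 0.0) >= threshold else None


schema = Schema(
    entities={"person": "Names of people", "company": ""},
    classifications={"sentiment": (["positive", "negative", "neutral"], 0.5)},
    structures={"product": [
        FieldSpec("name"),
        FieldSpec("features", dtype="list"),
        FieldSpec("availability", choices=["in_stock", "sold_out"]),
    ]},
)
```

Keeping the schema as plain data means the same object can drive both the extraction request and a post-hoc check of the result, which is the main appeal of a schema-driven design.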

Quick Start & Requirements

  • Installation: pip install gliner2
  • Models: Pre-trained models (fastino/gliner2-base-v1, fastino/gliner2-large-v1) are available on Hugging Face.
  • Prerequisites: Standard Python environment; no GPU is required.
  • Documentation: Available via Hugging Face model pages and the project's citation link.
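A minimal quick-start sketch follows. The `GLiNER2.from_pretrained` and `extract_entities` calls, and the `{'entities': {label: [text, ...]}}` result shape, are written from memory of the project README and should be treated as assumptions to verify against the installed version. `flatten_entities` and `run_quick_start` are helper names introduced here for illustration.

```python
def flatten_entities(result):
    """Turn {'entities': {label: [text, ...]}} into sorted (label, text) pairs.

    The input shape is an assumption about the library's output format.
    """
    pairs = []
    for label, spans in result.get("entities", {}).items():
        for span in spans:
            pairs.append((label, span))
    return sorted(pairs)


def run_quick_start(text, labels, model_name="fastino/gliner2-base-v1"):
    """Load a pre-trained model and extract entities for the given labels."""
    # Imported lazily so flatten_entities works without the package installed.
    from gliner2 import GLiNER2

    extractor = GLiNER2.from_pretrained(model_name)  # fetches weights on first use
    return extractor.extract_entities(text, labels)


# Example call (requires `pip install gliner2` and a model download):
# result = run_quick_start(
#     "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday.",
#     ["company", "person", "product", "location"],
# )
# print(flatten_entities(result))
```

Loading the model once and reusing the extractor across calls avoids paying the load cost on every request.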

Highlighted Details

  • Unified Task Execution: Performs NER, classification, and structured data extraction concurrently within a single model and forward pass.
  • CPU-Optimized Performance: Achieves fast inference speeds on standard CPUs, eliminating the need for specialized hardware.
  • Schema-Driven Precision: Enables fine-grained control over extraction through a declarative schema, supporting entity descriptions, classification thresholds, and structured field constraints (types, choices).
  • Multi-Task Composition: Facilitates complex analysis by combining entity, classification, and structured data extraction schemas.
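The multi-task composition described above might look like the sketch below. The fluent builder methods (`create_schema`, `entities`, `classification`, `structure`, `field`) and their parameters are assumptions recalled from the project README, not a verified signature list. `RecordingStub` is a hypothetical stand-in that lets the builder chain be dry-run without downloading a model.

```python
def build_review_schema(extractor):
    """Compose entity, classification, and structured tasks into one schema.

    Assumes the extractor exposes a chainable schema builder.
    """
    return (
        extractor.create_schema()
        .entities({"product": "Product names and models", "company": "Company names"})
        .classification("sentiment", ["positive", "negative", "neutral"])
        .structure("product_info")
        .field("name", dtype="str")
        .field("availability", dtype="str", choices=["in_stock", "pre_order", "sold_out"])
    )


class RecordingStub:
    """Stand-in for the extractor and schema that records method names (dry runs only)."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the builder methods.
        def method(*args, **kwargs):
            self.calls.append(name)
            return self
        return method


stub = RecordingStub()
build_review_schema(stub)
print(stub.calls)
```

With a real model, the composed schema would presumably be passed to a single call such as `extractor.extract(text, build_review_schema(extractor))`, again an assumed method name, so that all three tasks share one forward pass.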

Maintenance & Community

The project, developed by Fastino AI, builds upon the original GLiNER architecture. The README does not describe community channels or maintenance signals beyond the provided citation.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting broad commercial use and integration into closed-source applications.

Limitations & Caveats

The README does not explicitly detail limitations. While CPU-optimized, the model size (205M-340M parameters) may still represent a significant resource footprint for highly constrained environments. Performance on highly specialized or out-of-domain data may require further fine-tuning.
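As a rough guide to that footprint, the weights alone can be sized from the parameter count. The sketch below assumes dense weights stored at 4 bytes (fp32) or 2 bytes (fp16) per parameter. It ignores activations, the tokenizer, and runtime overhead, so real memory use will be higher. `weight_footprint_mb` is a helper name introduced here.

```python
def weight_footprint_mb(num_params, bytes_per_param=4):
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


for params in (205_000_000, 340_000_000):
    print(
        f"{params / 1e6:.0f}M params: "
        f"{weight_footprint_mb(params):.0f} MB fp32, "
        f"{weight_footprint_mb(params, 2):.0f} MB fp16"
    )
# 205M params: 820 MB fp32, 410 MB fp16
# 340M params: 1360 MB fp32, 680 MB fp16
```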

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 7
  • Issues (30d): 9
  • Star History: 123 stars in the last 30 days
