Annif by NatLibFi

Automated subject indexing for libraries and archives

Created 8 years ago
252 stars

Project Summary

Annif is a toolkit for automated subject indexing, designed for libraries, archives, and museums. It combines multiple algorithms to assign subjects automatically, which streamlines metadata creation at these institutions. It is aimed at technical users and institutions that want to improve their cataloging workflows.

How It Works

Annif combines several indexing approaches and models; the models were originally trained on metadata from Finna.fi. The system offers three interfaces: a command-line interface (CLI) for administration, a REST API for programmatic access, and a web UI for end users. This makes it flexible to integrate and operate.
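As an illustration of programmatic access, the following is a minimal sketch of a client for the REST API. It assumes a locally running Annif server at http://localhost:5000, a `/v1/projects/<project_id>/suggest` endpoint that accepts form-encoded `text` and `limit` fields, and a JSON response containing a `results` list whose items carry `label` and `score`. None of these details are stated in this summary, so check them against the API Docs before relying on them. The function names and the project ID `yso-en` used below are hypothetical.

```python
import json
from urllib import parse, request

# Assumptions (not stated in the summary): base URL, endpoint path,
# form field names, and response shape. Verify against the API docs.
BASE_URL = "http://localhost:5000/v1"


def build_suggest_request(project_id: str, text: str, limit: int = 5,
                          base_url: str = BASE_URL) -> request.Request:
    """Build (but do not send) a POST request asking for subject suggestions."""
    url = f"{base_url}/projects/{parse.quote(project_id, safe='')}/suggest"
    body = parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return request.Request(url, data=body, method="POST")


def parse_suggestions(payload: str) -> list[tuple[str, float]]:
    """Extract (label, score) pairs from a JSON response, highest score first."""
    results = json.loads(payload).get("results", [])
    pairs = [(r["label"], float(r["score"])) for r in results]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def suggest(project_id: str, text: str, limit: int = 5) -> list[tuple[str, float]]:
    """Send the request to a running server and return the parsed suggestions."""
    req = build_suggest_request(project_id, text, limit)
    with request.urlopen(req, timeout=30) as resp:
        return parse_suggestions(resp.read().decode("utf-8"))
```

With a server running, `suggest("yso-en", "Libraries and archives")` would return the suggested subjects for that text. Building the request separately from sending it keeps the URL and form encoding easy to inspect without a live server.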

Quick Start & Requirements

  • Primary install: Requires Python 3.10-3.13. Installing with pip into a virtual environment is recommended: run python3 -m venv annif-venv, then source annif-venv/bin/activate, then pip install annif.
  • Running: Use the annif command.
  • Prerequisites: Linux is the primary development OS; Docker or a Linux VM is advised on Windows and macOS. Detailed setup instructions are available for the optional backends (fastText, Omikuji) and analyzers (Voikko, spaCy).
  • Links: Getting Started, Optional Features, Annif Tutorial, User Forum, API Docs, Project Website.

Highlighted Details

  • Supports multiple indexing algorithms and backends.
  • Provides a REST API and a web UI for end-user interaction.
  • Includes CLI commands for administration and configuration.
  • Finto AI is a service built upon Annif, with models available on Hugging Face Hub.
  • Offers shell tab-completion for commands and parameters.

Maintenance & Community

Annif is actively used and referenced in academic publications, indicating ongoing relevance and development. Community support and questions are primarily handled through the annif-users discussion forum.

Licensing & Compatibility

The core code is licensed under the Apache License 2.0. However, some optional dependencies, notably the YAKE library, are licensed under GPLv3. Depending on legal interpretation, this mix of licenses may impose copyleft obligations (requiring source code publication) on the entire Annif application if the GPLv3-licensed YAKE dependency is installed and the combined work is distributed.

Limitations & Caveats

Windows and macOS users are advised to run Annif in Docker or a Linux virtual machine for best compatibility. The GPLv3 license of the optional YAKE dependency may complicate licensing for anyone distributing modified or extended versions of Annif.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 5
  • Issues (30d): 7
  • Star history: 3 stars in the last 30 days
