lunr-languages  by MihaiValentin

Fast, multilingual search for AI and edge applications

Created 12 years ago
450 stars

Top 66.3% on SourcePulse

Project Summary

Lunr Languages provides a collection of language stemmers and stopword lists for the Lunr.js JavaScript search library, enabling fast, multilingual full-text search. It targets developers adding search to AI, RAG, and local-first applications and to static sites, and offers a lightweight, zero-infrastructure retrieval layer that can supply context to LLMs.

How It Works

This project extends Lunr.js by integrating language-specific tokenization, stemming, and stopword filtering for over 30 languages. Its core advantage lies in delivering efficient, consistent lexical retrieval without requiring external databases or complex infrastructure, making it ideal for client-side or Node.js environments. Advanced Chinese tokenization leverages Intl.Segmenter for browser compatibility and offers optional integration with @node-rs/jieba in Node.js for improved segmentation quality.
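
As a rough illustration of the Intl.Segmenter approach mentioned above, the sketch below splits Chinese text into word-like tokens with the standard Intl.Segmenter API. The helper name segmentChinese and the decision to throw when the API is missing are assumptions made for this example; the library's own tokenizer may be structured differently.

```javascript
// Hypothetical helper (not part of lunr-languages): split Chinese text
// into word-like tokens using the built-in Intl.Segmenter API.
function segmentChinese(text) {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    // Assumption for this sketch: fail loudly rather than fall back.
    throw new Error('Intl.Segmenter is not available in this environment');
  }
  const segmenter = new Intl.Segmenter('zh', { granularity: 'word' });
  const tokens = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Skip punctuation and whitespace segments.
    if (isWordLike) tokens.push(segment);
  }
  return tokens;
}

const tokens = segmentChinese('我喜欢全文搜索。');
console.log(tokens);
```

The exact word boundaries depend on the segmentation data shipped with the runtime, so the token list can differ between browsers and Node.js versions.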

Quick Start & Requirements

  • Installation: npm install lunr-languages
  • Prerequisites: Node.js or modern browser environment. Chinese tokenization in browsers requires Intl.Segmenter support. For enhanced Chinese segmentation in Node.js, install @node-rs/jieba.
  • Documentation: The usage examples in the README serve as the quick start guide.

Highlighted Details

  • Supports 30+ languages with dedicated stemmers and stopwords.
  • Functions as a lightweight retrieval layer for AI systems, including RAG and hybrid search.
  • Operates entirely client-side or in Node.js, requiring zero infrastructure.
  • Improves search recall and precision for non-English, inflected, or mixed-language datasets.
  • Offers robust Chinese tokenization options for different environments.
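
To show why stemming and stopword filtering help with inflected languages, here is a deliberately simplified sketch of a tokenize, filter, and stem pipeline. The stopword list, the suffix rules, and the names toyStem and normalize are illustrative assumptions; the stemmers shipped with the library are far more thorough than this toy version.

```javascript
// Toy German-like normalization pipeline (illustrative only, NOT the
// library's stemmer): lowercase, tokenize, drop stopwords, strip suffixes.
const STOPWORDS = new Set(['der', 'die', 'das', 'und']);
const SUFFIXES = ['en', 'e', 's']; // checked in this order

function toyStem(token) {
  for (const suffix of SUFFIXES) {
    // Keep at least three characters of stem to avoid over-stripping.
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

function normalize(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a letter or digit
    .filter((token) => token && !STOPWORDS.has(token))
    .map(toyStem);
}

// "Hunde" (plural) and "Hund" (singular) reduce to the same index term.
console.log(normalize('Die Hunde und der Hund'));
```

Because both inflected forms map to one term, a query for either form can match documents containing the other, which is the recall benefit the list above refers to.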

Maintenance & Community

The project has been maintained as open source for over a decade and seeks sponsorship or contributions to sustain its stability and development. No community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The license type is not explicitly stated in the provided README content, which may impact commercial adoption or integration. The library is designed for browser and Node.js environments.

Limitations & Caveats

Chinese tokenization in browsers is dependent on Intl.Segmenter availability, with no bundled fallback. In Node.js, the fallback to Intl.Segmenter (when @node-rs/jieba is absent) may yield less precise results for Chinese text. The absence of a clearly stated license is a notable caveat for adoption.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 17
  • Issues (30d): 3
  • Star History: 3 stars in the last 30 days
