MihaiValentin
Fast, multilingual search for AI and edge applications
Top 66.3% on SourcePulse
Summary
Lunr Languages provides a collection of language stemmers and stopwords for the Lunr.js JavaScript search library, enabling fast, multilingual full-text search. It serves developers building search capabilities into AI, RAG, local-first applications, and static sites, offering a lightweight, zero-infrastructure retrieval layer that enhances context retrieval for LLMs.
How It Works
This project extends Lunr.js by integrating language-specific tokenization, stemming, and stopword filtering for over 30 languages. Its core advantage lies in delivering efficient, consistent lexical retrieval without requiring external databases or complex infrastructure, making it ideal for client-side or Node.js environments. Advanced Chinese tokenization leverages Intl.Segmenter for browser compatibility and offers optional integration with @node-rs/jieba in Node.js for improved segmentation quality.
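The three stages named above (tokenization, stopword filtering, stemming) can be sketched in plain JavaScript. This is a toy illustration only: the stopword list, the `toyStem` suffix stripper, and the helper names `analyze`, `buildIndex`, and `search` are assumptions made for this example, not the library's data, stemmers, or API.

```javascript
// Toy stopword list (assumption for illustration; real lists are per-language and much longer).
const STOPWORDS = new Set(["the", "a", "an", "and", "of", "for", "is", "in"]);

// Lowercase, then split on anything that is not a Unicode letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Hypothetical toy stemmer: strips one common English suffix.
// Real stemmers (e.g. Snowball-style) apply far more careful rules.
function toyStem(token) {
  return token.replace(/(ing|ed|es|s)$/u, "");
}

// Full analysis pipeline: tokenize -> drop stopwords -> stem.
function analyze(text) {
  return tokenize(text)
    .filter((token) => !STOPWORDS.has(token))
    .map(toyStem);
}

// Build a tiny in-memory inverted index: term -> Set of document ids.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of analyze(doc.body)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.id);
    }
  }
  return index;
}

// Return ids of documents matching any analyzed query term.
function search(index, query) {
  const hits = new Set();
  for (const term of analyze(query)) {
    for (const id of index.get(term) ?? []) hits.add(id);
  }
  return [...hits];
}

const docs = [
  { id: "1", body: "Searching the indexed documents" },
  { id: "2", body: "A guide for static sites" },
];
const index = buildIndex(docs);
console.log(search(index, "search index")); // [ '1' ]
```

Because the query passes through the same pipeline as the documents, "search" matches "Searching" and "index" matches "indexed". The whole index lives in memory, which is what makes this style of retrieval workable without an external database.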
Quick Start & Requirements
Install: npm install lunr-languages
Requires Intl.Segmenter support for Chinese tokenization. For enhanced Chinese segmentation in Node.js, install @node-rs/jieba.
Highlighted Details
Maintenance & Community
Maintained as an open-source project for over a decade, the project seeks sponsorship or contributions to ensure continued stability and development. No specific community channels (e.g., Discord, Slack) are listed.
Licensing & Compatibility
The license type is not explicitly stated in the provided README content, which may impact commercial adoption or integration. The library is designed for browser and Node.js environments.
Limitations & Caveats
Chinese tokenization in browsers is dependent on Intl.Segmenter availability, with no bundled fallback. In Node.js, the fallback to Intl.Segmenter (when @node-rs/jieba is absent) may yield less precise results for Chinese text. The absence of a clearly stated license is a notable caveat for adoption.
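Since no fallback is bundled, an application may want to detect Intl.Segmenter before relying on it. The following is a minimal sketch under that assumption; `hasSegmenter` and `segmentWords` are hypothetical helpers written for this example and are not part of lunr-languages, and the per-character fallback is a crude stand-in chosen here, not something the library provides.

```javascript
// True when the runtime exposes the standard Intl.Segmenter API.
function hasSegmenter() {
  return typeof Intl !== "undefined" && typeof Intl.Segmenter === "function";
}

// Split text into word-like tokens.
// Hypothetical helper for illustration; not the library's tokenizer.
function segmentWords(text, locale = "zh") {
  if (!hasSegmenter()) {
    // Crude fallback (assumption of this sketch): one token per letter or digit.
    // Adequate only as a last resort for unspaced scripts such as Chinese.
    return Array.from(text).filter((ch) => /[\p{L}\p{N}]/u.test(ch));
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((part) => part.isWordLike) // drop punctuation and whitespace segments
    .map((part) => part.segment);
}

console.log(hasSegmenter() ? "Intl.Segmenter available" : "using per-character fallback");
console.log(segmentWords("我喜欢全文搜索"));
```

The exact word boundaries returned for Chinese text depend on the runtime's segmentation data, so results can differ between browsers and Node.js versions. That variability is one reason a dictionary-based segmenter may be preferred where it is available.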
5 days ago
Inactive