Keyword extraction tool using BERT embeddings
KeyBERT provides a minimal and easy-to-use library for extracting keywords and keyphrases from documents using BERT embeddings. It is designed for beginners and researchers looking for a straightforward, powerful method that requires minimal setup.
How It Works
KeyBERT leverages BERT embeddings to find the sub-phrases in a document that are most similar to the document's overall meaning. It first generates an embedding for the whole document, then embeds candidate N-grams extracted from the text. Cosine similarity between each candidate embedding and the document embedding identifies the N-grams that best represent the document; these become the extracted keywords. Diversification techniques such as Max Sum Distance and Maximal Marginal Relevance (MMR) are available to reduce redundancy among the results.
Quick Start & Requirements
Install the base package:

pip install keybert

Optional embedding backends are available as extras: keybert[flair], keybert[gensim], keybert[spacy], keybert[use].

For a lightweight install that skips sentence-transformers:

pip install keybert --no-deps scikit-learn model2vec
Suggested embedding models: all-MiniLM-L6-v2 (English), paraphrase-multilingual-MiniLM-L12-v2 (multilingual).

Highlighted Details
Getting started takes only pip install keybert and 3 lines of code.

Maintenance & Community
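Those three lines look roughly like the following, assuming the default sentence-transformers backend (the model is downloaded on first use, so this needs network access the first time):

```python
from keybert import KeyBERT

doc = ("Supervised learning is the machine learning task of learning a "
       "function that maps an input to an output based on example pairs.")
kw_model = KeyBERT()                        # loads the default embedding model
keywords = kw_model.extract_keywords(doc)   # list of (keyword, score) pairs
```

Parameters such as keyphrase_ngram_range, use_mmr, and diversity can be passed to extract_keywords to control phrase length and result diversity.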
Licensing & Compatibility
Limitations & Caveats
The library relies on pre-trained models, and performance is dependent on the chosen embedding model's quality and suitability for the input text. LLM integration requires an OpenAI API key and incurs associated costs.