gyj155: Semantic search for academic papers
Top 85.7% on SourcePulse
This repository offers a tool for semantic search of academic papers, enabling users to find similar research using embedding models. It caters to researchers and engineers needing to efficiently discover relevant literature, providing flexibility with both free local models and higher-quality OpenAI API integration.
How It Works
The system converts paper titles and abstracts into vector embeddings, which are then cached for performance. User queries (either example papers or text descriptions) are embedded using the same model. Cosine similarity is employed to calculate the relevance between the query embedding and the cached paper embeddings, ranking results by similarity score.
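The embed, cache, and rank pipeline described above can be sketched in plain Python. This is a minimal sketch, not the repository's code. A toy bag-of-words embedding (build_vocab, make_bow_embed) stands in for all-MiniLM-L6-v2 or OpenAI embeddings. CachedEmbedder, paper_text, and search are hypothetical names. Averaging the embeddings of several example papers into one query vector is an assumption about how example-paper queries are combined.

```python
import hashlib
import math
import re
from typing import Callable, Dict, List, Sequence, Tuple, Union

Vector = List[float]
Paper = Dict[str, str]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts: Sequence[str]) -> Dict[str, int]:
    """Hypothetical toy setup: assign each distinct word a vector index."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab


def make_bow_embed(vocab: Dict[str, int]) -> Callable[[str], Vector]:
    """Toy stand-in for a real embedding model: word counts over the vocab.
    Words outside the vocabulary are ignored."""
    def embed(text: str) -> Vector:
        vec = [0.0] * len(vocab)
        for word in tokenize(text):
            idx = vocab.get(word)
            if idx is not None:
                vec[idx] += 1.0
        return vec
    return embed


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CachedEmbedder:
    """Memoizes embeddings by a SHA-256 hash of the text, so each distinct
    text is embedded only once; `misses` counts real model calls."""
    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.cache: Dict[str, Vector] = {}
        self.misses = 0

    def __call__(self, text: str) -> Vector:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.embed(text)
        return self.cache[key]


def paper_text(paper: Paper) -> str:
    # Title and abstract are joined into one string before embedding.
    return paper["title"] + ". " + paper["abstract"]


def search(papers: Sequence[Paper], query: Union[str, Sequence[Paper]],
           embedder: CachedEmbedder, top_k: int = 3) -> List[Tuple[float, str]]:
    if isinstance(query, str):
        q = embedder(query)  # text-description query
    else:
        # Assumption: example papers are combined by averaging their vectors.
        vecs = [embedder(paper_text(p)) for p in query]
        q = [sum(col) / len(vecs) for col in zip(*vecs)]
    scored = [(cosine(q, embedder(paper_text(p))), p["title"]) for p in papers]
    scored.sort(key=lambda s: s[0], reverse=True)  # highest similarity first
    return scored[:top_k]


if __name__ == "__main__":
    papers = [  # hypothetical toy data
        {"title": "Transformers for machine translation",
         "abstract": "We study attention based transformers for translation between languages."},
        {"title": "Graph neural networks for molecules",
         "abstract": "We predict molecular properties with message passing on graphs."},
    ]
    embedder = CachedEmbedder(make_bow_embed(build_vocab([paper_text(p) for p in papers])))
    print(search(papers, "attention transformers for translation", embedder, top_k=1))
```

Swapping toy_embed-style functions for a real model only changes the callable passed to CachedEmbedder. Caching by content hash means repeated searches over the same corpus re-embed only the query.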
Quick Start & Requirements
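1. Install dependencies: pip install -r requirements.txt
2. Prepare paper data using crawl_papers.
3. Initialize PaperSearcher with the paper data and select model_type='local' or model_type='openai'.
4. Run queries with searcher.search().
Highlighted Details
Supports both free local embeddings (all-MiniLM-L6-v2) and paid, higher-fidelity OpenAI embeddings.
Maintenance & Community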
No information regarding maintainers, community channels (like Discord/Slack), or project roadmap is present in the README.
Licensing & Compatibility
The README does not specify a software license. This lack of clarity presents a significant adoption blocker, particularly for commercial or closed-source integration.
Limitations & Caveats
The project's effectiveness is tied to the quality of the chosen embedding model. Preparation of paper data is a necessary prerequisite. The absence of explicit licensing information is a critical caveat for evaluating adoption.
ZachNagengast
freedmand
Future-House