context-labs
Visual explorer for scientific research papers
Top 48.3% on SourcePulse
Summary
This project provides an interactive web application for exploring the Aella open science dataset, which comprises approximately 100 million scientific articles. It targets researchers and power users, letting them explore the literature semantically through embeddings, dimensionality reduction, and clustering.
How It Works
The application pairs a React/TypeScript frontend with a Python FastAPI backend. Data is stored in SQLite for local development and in Cloudflare D1/R2 in production. The core of the project is its data pipeline: each paper is encoded as a 768-dimensional semantic embedding using SPECTER2, the embeddings are reduced to 2D using UMAP with cosine distance, and this is followed by K-Means clustering optimized via silhouette scores. Clusters carry LLM-curated, domain-specific labels, which are intended to be more interpretable than the output of basic TF-IDF analysis.
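The silhouette-based tuning step can be illustrated with a small, dependency-free sketch. It is not the project's pipeline code, which is not published. It assumes that clustering runs on 2D coordinates and that "optimized via silhouette scores" means picking the number of clusters K with the highest mean silhouette. The function names (kmeans, silhouette_score, choose_k, make_blobs), the farthest-point seeding, the Euclidean distance, and the synthetic blob data are all illustrative assumptions.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0) -> list[int]:
    """Plain Lloyd's K-Means with farthest-point seeding (an assumption, not the project's init)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: list[int] = []
    for _step in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = (sum(x for x, _y in members) / len(members),
                                sum(y for _x, y in members) / len(members))
    return labels


def silhouette_score(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette over all points, using Euclidean distance."""
    clusters: dict[int, list[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points: list[Point], candidates) -> tuple[int, dict[int, float]]:
    """Cluster once per candidate K and return the K with the highest silhouette."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


def make_blobs(seed: int = 42) -> list[Point]:
    """Synthetic stand-in for 2D UMAP output: three Gaussian blobs of 30 points each."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _i in range(30)]


if __name__ == "__main__":
    best_k, scores = choose_k(make_blobs(), range(2, 7))
    for k, score in scores.items():
        print(f"k={k}: silhouette={score:.3f}")
    print("selected k:", best_k)
```

The silhouette computation here is quadratic in the number of points, so at the scale described above a real pipeline would presumably score a sample or use an optimized library implementation instead.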
Quick Start & Requirements
Prerequisites include Python 3.11+, bun, and the Task runner. Install dependencies with task setup. Download the SQLite database using task db:setup. Run the backend with task backend:dev and the frontend with task frontend:dev in separate terminals. The live explorer is available at https://aella.inference.net.
Maintenance & Community
This project is a collaboration between Inference.net and LAION, intentionally scoped as a one-time preview. Significant feature additions are not planned; users are encouraged to fork the repository for further development. Contributions for bug fixes and minor improvements are welcome via pull requests.
Licensing & Compatibility
The project is released under the MIT License, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The code for the data pipeline used to construct the dataset is not open-source. The project's scope is limited to a preview, and it is not intended for substantial feature expansion.