stc by nexus-stc

Distributed search engine and AI tools for accessing knowledge

Created 3 years ago

482 stars

Top 62.9% on SourcePulse

Project Summary

STC is a free, distributed search engine and AI tooling suite that provides open access to scholarly texts and fiction. It targets researchers, students, and AI developers who want to work with large text corpora without relying on centralized servers, and it offers a decentralized, censorship-resistant approach to information access.

How It Works

STC combines a search engine, inspired by Summa, with IPFS-hosted databanks. This architecture allows for efficient searching without downloading entire datasets. The system is versatile, functioning as a standalone server, an embeddable Python library, or a WASM module for browser-based applications, enabling deployment on static sites hosted via IPFS.
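The idea of searching without downloading the whole dataset can be illustrated with a toy inverted index read through byte-range requests. This is a minimal sketch under stated assumptions, not STC's or Summa's actual index format: `RangeReader`, `build_index`, and `search` are hypothetical names, and the in-memory `blob` stands in for a file hosted on IPFS.

```python
from dataclasses import dataclass


@dataclass
class RangeReader:
    """Hypothetical stand-in for a remote file that supports byte-range reads."""
    blob: bytes
    bytes_fetched: int = 0

    def read(self, offset: int, length: int) -> bytes:
        chunk = self.blob[offset:offset + length]
        self.bytes_fetched += len(chunk)  # track how much data was transferred
        return chunk


def build_index(docs: dict[int, str]) -> tuple[dict[str, tuple[int, int]], bytes]:
    """Build a small term directory plus one blob holding all posting lists."""
    postings: dict[str, list[int]] = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings.setdefault(term, []).append(doc_id)

    directory: dict[str, tuple[int, int]] = {}
    blob = bytearray()
    for term in sorted(postings):
        encoded = ",".join(str(d) for d in sorted(postings[term])).encode()
        directory[term] = (len(blob), len(encoded))  # (offset, length) in blob
        blob += encoded
    return directory, bytes(blob)


def search(term: str, directory: dict[str, tuple[int, int]], reader: RangeReader) -> list[int]:
    """Fetch only the byte range that holds the posting list for `term`."""
    entry = directory.get(term.lower())
    if entry is None:
        return []
    offset, length = entry
    raw = reader.read(offset, length)
    return [int(x) for x in raw.decode().split(",")]


docs = {
    1: "open access to knowledge",
    2: "distributed search engine",
    3: "search over open corpora",
}
directory, blob = build_index(docs)
reader = RangeReader(blob)
hits = search("search", directory, reader)
print(hits, reader.bytes_fetched, len(blob))
```

Only the small directory has to be held locally; each query transfers just the slice of the blob it needs, which is what makes serving an index from static, content-addressed storage practical.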

Quick Start & Requirements

Install/Run: Primarily through the Python library (pip install nexus-stc) or Docker. A WASM module is available for browser use.
Prerequisites: Python, IPFS (Kubo), and optionally an AI model backend (OpenAI or a local LLM). Hardware requirements for local LLMs are noted as a challenge.
Resources: Setup time and resource footprint vary based on corpus size and AI model usage.
Links: Web STC, Telegram bots, Roadmap
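Before installing, it can help to confirm that the listed prerequisites are present. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper, the minimum Python version is an assumption (the summary does not state one), and `ipfs` is the command name that Kubo installs.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), ipfs_binary="ipfs"):
    """Report whether Python is recent enough and an IPFS binary is on PATH.

    min_python is an assumed threshold, not a documented requirement.
    """
    return {
        "python": sys.version_info >= min_python,
        "ipfs": shutil.which(ipfs_binary) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```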

Highlighted Details

IPFS-based databanks for decentralized, censorship-resistant access.
Multiple deployment options: server, Python library, browser (WASM).
Integration with AI tools (e.g., OpenAI, local LLMs) for Q&A and summarization.
Focus on accessing and processing scholarly texts, including corpus assimilation (LibGen, SciMag).
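The Q&A integration can be pictured as retrieval followed by prompt assembly. The sketch below shows only the assembly step and makes no model or API call; `build_qa_prompt`, the prompt wording, and the character budget are all assumptions for illustration, not the project's actual prompt format.

```python
def build_qa_prompt(question: str, passages: list[str], max_chars: int = 2000) -> str:
    """Pack retrieved passages into a prompt, stopping at a character budget."""
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the budget for context text
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the numbered passages below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_qa_prompt(
    "What is IPFS?",
    ["IPFS is a peer-to-peer file system.", "x" * 5000],
    max_chars=100,
)
print(prompt)
```

The resulting string could be sent to a hosted or local model; numbering the passages makes it possible to ask the model to cite which passage supports its answer.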

Maintenance & Community

Active development with ongoing corpus assimilation and feature implementation.
Community engagement via Telegram channels.
Roadmap includes ambitious goals like global replication and space-based outposts.

Licensing & Compatibility

License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in active development, with several roadmap items marked as "in progress" (e.g., SciMag corpus assimilation, local LLM support).
User experience for loading large data chunks needs improvement.
First-class support for local LLMs is still under extensive testing, with current models failing on CPU.
Copyright issues are an explicit area of focus for community activities.

Health Check

Last Commit

2 years ago

Responsiveness

1 week

Star History

1 star in the last 30 days