stc by nexus-stc

Distributed search engine and AI tools for accessing knowledge

created 2 years ago
446 stars

Top 68.4% on sourcepulse

Project Summary

STC is a free, distributed search engine and AI tooling suite that provides open access to academic knowledge and fiction. It targets researchers, students, and AI developers who want to work with large corpora of scholarly texts without relying on centralized servers, and it takes a decentralized, censorship-resistant approach to information access.

How It Works

STC combines a search engine powered by Summa with IPFS-hosted databanks. With this architecture, queries can run without downloading the entire dataset first. The system can run as a standalone server, as an embeddable Python library, or as a WASM module in the browser, which makes it deployable on static sites hosted via IPFS.
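Searching without downloading the whole dataset usually depends on reading only the byte ranges a query touches. The sketch below illustrates that general idea with fixed-size chunks and a small cache. It is not STC's or Summa's actual implementation: `chunks_for_range`, `range_header`, `ChunkedReader`, and `fake_fetch` are hypothetical names, and the in-memory `remote` buffer stands in for an index file that a real fetcher would request from an IPFS gateway with HTTP Range headers.

```python
from typing import Callable, Dict, List


def chunks_for_range(start: int, length: int, chunk_size: int) -> List[int]:
    """Return the indices of the fixed-size chunks covering bytes [start, start + length)."""
    if length <= 0:
        return []
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    return list(range(first, last + 1))


def range_header(chunk_index: int, chunk_size: int) -> Dict[str, str]:
    """Build the HTTP Range header for one chunk (byte ranges are inclusive)."""
    begin = chunk_index * chunk_size
    end = begin + chunk_size - 1
    return {"Range": f"bytes={begin}-{end}"}


class ChunkedReader:
    """Read byte ranges of a remote file, fetching and caching only the chunks touched."""

    def __init__(self, fetch_chunk: Callable[[int], bytes], chunk_size: int) -> None:
        self.fetch_chunk = fetch_chunk  # hypothetical callable: chunk index -> bytes
        self.chunk_size = chunk_size
        self.cache: Dict[int, bytes] = {}

    def read(self, start: int, length: int) -> bytes:
        parts = []
        for index in chunks_for_range(start, length, self.chunk_size):
            if index not in self.cache:
                self.cache[index] = self.fetch_chunk(index)
            parts.append(self.cache[index])
        data = b"".join(parts)
        # Offset of `start` inside the first fetched chunk.
        offset = start % self.chunk_size
        return data[offset:offset + length]


# Stand-in for a remote index file (10,240 bytes); no network access is involved.
CHUNK_SIZE = 1024
remote = bytes(range(256)) * 40
fetched: List[int] = []


def fake_fetch(index: int) -> bytes:
    """Record which chunk was requested and return its bytes from the stand-in buffer."""
    fetched.append(index)
    return remote[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]


reader = ChunkedReader(fake_fetch, CHUNK_SIZE)
result = reader.read(2000, 100)   # touches chunks 1 and 2 only
again = reader.read(2048, 10)     # served from the cache, no new fetch
```

In this toy run, a 100-byte read pulls 2 of the 10 chunks, and the second read adds no further fetches. A search index read this way transfers an amount of data proportional to the query rather than to the corpus.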

Quick Start & Requirements

  • Install/Run: Primarily through the Python library (pip install nexus-stc) or Docker. A WASM module is available for browser use.
  • Prerequisites: Python and IPFS (Kubo); optionally an AI backend (OpenAI or a local LLM). Hardware requirements for local LLMs are noted as a challenge.
  • Resources: Setup time and resource footprint vary based on corpus size and AI model usage.
  • Links: Web STC, Telegram bots, Roadmap
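Since a running IPFS (Kubo) node is listed as a prerequisite, it can help to confirm the daemon is reachable before going further. The following is a minimal sketch that assumes Kubo's default RPC address of 127.0.0.1:5001 and uses the Kubo RPC endpoint /api/v0/version, which expects a POST request. The helper names `kubo_version_url` and `kubo_version` are hypothetical and are not part of STC.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional


def kubo_version_url(host: str = "127.0.0.1", port: int = 5001) -> str:
    """Build the URL of the Kubo RPC `version` endpoint (default address is an assumption)."""
    return f"http://{host}:{port}/api/v0/version"


def kubo_version(host: str = "127.0.0.1", port: int = 5001, timeout: float = 2.0) -> Optional[str]:
    """Return the daemon's version string, or None if no daemon answers."""
    # The Kubo RPC API accepts POST only, so send an empty body.
    request = urllib.request.Request(kubo_version_url(host, port), data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("Version")
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None


if __name__ == "__main__":
    version = kubo_version()
    if version:
        print(f"Kubo {version} is running")
    else:
        print("No Kubo daemon reachable on 127.0.0.1:5001")
```

If the node listens on a different address, pass `host` and `port` explicitly.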

Highlighted Details

  • IPFS-based databanks for decentralized, censorship-resistant access.
  • Multiple deployment options: server, Python library, browser (WASM).
  • Integration with AI tools (e.g., OpenAI, local LLMs) for Q&A and summarization.
  • Focus on accessing and processing scholarly texts, including corpus assimilation (LibGen, SciMag).

Maintenance & Community

  • The README describes active development, with ongoing corpus assimilation and feature implementation, although the last commit was about a year ago.
  • Community engagement via Telegram channels.
  • Roadmap includes ambitious goals like global replication and space-based outposts.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Several roadmap items are still marked as "in progress" (e.g., SciMag corpus assimilation, local LLM support).
  • User experience for loading large data chunks needs improvement.
  • First-class support for local LLMs is still under extensive testing, with current models failing on CPU.
  • Copyright issues are an explicit area of focus for community activities.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 90 days
