kernel-memory by microsoft

RAG architecture for indexing and querying data using LLMs

created 2 years ago
2,036 stars

Top 22.3% on sourcepulse

Project Summary

Kernel Memory (KM) provides a comprehensive Retrieval Augmented Generation (RAG) architecture for indexing and querying diverse data sources using LLMs. It targets developers building AI applications who need to integrate natural language search, source tracking, and citations. KM offers a flexible, multi-modal service that can be deployed as a web service, Docker container, or embedded .NET library, simplifying the creation of intelligent search and Q&A systems.

How It Works

KM employs a hybrid data pipeline for efficient indexing, supporting RAG, synthetic memory, and custom semantic processing. It extracts text from various file formats, partitions it into manageable chunks, generates embeddings using configurable LLM providers (e.g., OpenAI, Azure OpenAI), and stores these embeddings in a choice of vector databases (e.g., Azure AI Search, Qdrant). This approach allows for natural language querying with precise source citations and facilitates fine-grained access control via document ownership and tags.

Quick Start & Requirements

Highlighted Details

  • Supports a wide range of data formats including PDF, Word, PowerPoint, Excel, Images, and web pages.
  • Offers extensive extensibility for file storage, queues, vector stores, and LLMs.
  • Integrates seamlessly as a plugin for Semantic Kernel, Microsoft Copilot, and ChatGPT.
  • Provides detailed token usage reports for LLM interactions.

Maintenance & Community

The project has a large number of contributors, indicating active development and community engagement. Links to community resources are not explicitly provided in the README.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The code is presented as a demonstration and is not an officially supported Microsoft offering. While flexible, custom pipeline development requires .NET expertise.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 137 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Elie Bursztein (Cybersecurity Lead at Google DeepMind).

LightRAG by HKUDS

1.0%
19k
RAG framework for fast, simple retrieval-augmented generation
created 10 months ago
updated 18 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Anton Troynikov (Cofounder of Chroma), and 20 more.

llama_index by run-llama

0.3%
43k
Data framework for building LLM-powered agents
created 2 years ago
updated 19 hours ago