entropix by xjdr-alt

Research project for entropy-based context-aware sampling & parallel CoT decoding

Created 11 months ago
3,422 stars

Top 14.1% on SourcePulse

Project Summary

This project explores entropy-based sampling for large language models, aiming to improve inference quality by making sampling context-aware. It targets researchers and developers seeking to enhance LLM reasoning and output through novel sampling techniques, potentially simulating advanced CoT capabilities.

How It Works

Entropix uses entropy and "varentropy" (the variance of per-token surprisal under the predicted distribution, i.e., how unevenly uncertainty is spread across candidate tokens) as signals to guide the sampling process. Low entropy indicates a confident, predictable next token, while high entropy suggests uncertainty and potential for exploration. The sampler switches strategies based on these states, aiming for more nuanced and contextually relevant text generation, akin to advanced chain-of-thought prompting.
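The idea above can be sketched in a few lines. This is an illustrative reconstruction, not the project's actual sampler: the threshold values and branch names (`argmax`, `sample`, `explore`) are hypothetical placeholders for the kind of dispatch the entropy/varentropy signals enable.

```python
import numpy as np

def entropy_varentropy(logits):
    """Shannon entropy of the next-token distribution, plus its
    "varentropy": the variance of per-token surprisal -log p under p."""
    logits = logits - logits.max()               # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    surprisal = -np.log(probs + 1e-12)           # per-token -log p
    ent = float((probs * surprisal).sum())
    varent = float((probs * (surprisal - ent) ** 2).sum())
    return ent, varent

def choose_strategy(logits, ent_thresh=2.0, varent_thresh=2.0):
    """Hypothetical dispatch on the two signals; thresholds are made up
    for illustration, not taken from the entropix codebase."""
    ent, varent = entropy_varentropy(logits)
    if ent < ent_thresh and varent < varent_thresh:
        return "argmax"   # confident and evenly so: take the greedy token
    if ent > ent_thresh and varent > varent_thresh:
        return "explore"  # uncertain and unevenly so: branch / raise temperature
    return "sample"       # otherwise: ordinary temperature sampling
```

For a uniform distribution the entropy is maximal and the varentropy is zero (every token carries the same surprisal); for a sharply peaked distribution both are near zero, so the greedy branch fires.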

Quick Start & Requirements

  • Install: poetry install
  • Prerequisites: Python 3.x, Poetry, Rust (for tiktoken), Hugging Face CLI (for model weights), CUDA (implied for GPU usage).
  • Setup: Requires downloading model weights and tokenizer files.
  • Run: PYTHONPATH=. poetry run python entropix/main.py (JAX) or PYTHONPATH=. poetry run python entropix/torch_main.py (PyTorch).
  • Docs: no dedicated documentation is linked; the README is the primary reference.

Highlighted Details

  • Supports Llama 3.1+ models, with plans for DeepSeek V2 and Mistral Large.
  • Offers both JAX (for TPU) and PyTorch (for GPU) implementations.
  • Includes notes on disabling JAX JIT for faster iteration and managing VRAM.
  • Future plans include splitting into entropix-local (single GPU, Metal) and entropix (multi-GPU, TPU) repos, plus a training component.

Maintenance & Community

  • The project is described as a research work-in-progress with active development and plans for significant restructuring.
  • Author is active on X (@_xjdr).
  • Acknowledges contributions from several individuals for compute and development support.

Licensing & Compatibility

  • No license is explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The project is explicitly labeled "HERE BE DRAGONS!!!! THIS IS NOT A FINISHED PRODUCT AND WILL BE UNSTABLE AS HELL RIGHT NOW." Significant restructuring is planned, and PRs are temporarily discouraged. The current state may be partially broken with an unmerged backlog.

Health Check
Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
22 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0.2%
462
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Nikola Borisov (Founder and CEO of DeepInfra), and 3 more.

tensorrtllm_backend by triton-inference-server

0.2%
889
Triton backend for serving TensorRT-LLM models
Created 2 years ago
Updated 1 day ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

10.6%
2k
Speculative decoding research paper for faster LLM inference
Created 1 year ago
Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

gemma.cpp by google

0.1%
7k
C++ inference engine for Google's Gemma models
Created 1 year ago
Updated 1 day ago