g1  by bklieger-groq

Reasoning chains prototype using Llama-3.1-70b on Groq

created 10 months ago
4,229 stars

Top 11.8% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This project demonstrates a prompting strategy to enhance LLM reasoning capabilities, specifically for logical problems that often challenge standard models. It targets users and researchers interested in improving LLM performance through structured, step-by-step "thinking" processes without fine-tuning, showcasing a method to achieve "o1-like" reasoning chains.

How It Works

The project leverages Llama-3.1-70b on the Groq platform to implement dynamic Chain-of-Thought reasoning. It prompts the LLM to break down problems into titled steps, explore alternative solutions, question its own reasoning, and utilize at least three distinct methods to arrive at an answer. This approach, which visualizes the entire reasoning process, aims to improve accuracy on logic puzzles by forcing the model to engage in deeper, more self-critical analysis.
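A minimal sketch of this kind of dynamic reasoning loop, with the model backend injected so the loop stays testable. The system-prompt wording and the JSON field names (`title`, `content`, `next_action`) are assumptions in the spirit of the README's description, not g1's exact code:

```python
import json

# Hypothetical system prompt in the spirit of g1's strategy: the model
# emits one JSON reasoning step per turn and decides when to stop.
SYSTEM_PROMPT = (
    "You are an expert reasoner. Each turn, respond with exactly one "
    'JSON object: {"title": str, "content": str, '
    '"next_action": "continue" | "final_answer"}. '
    "Explore alternatives, question your own reasoning, and use at "
    "least three distinct methods before giving a final answer."
)

def reasoning_chain(ask_model, question, max_steps=10):
    """Run a dynamic chain-of-thought loop.

    `ask_model(messages) -> str` is any chat-completion backend (for
    g1, a call to Llama-3.1-70b on Groq); it is injected here so the
    loop itself can run without network access.
    """
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    steps = []
    for _ in range(max_steps):
        raw = ask_model(messages)
        step = json.loads(raw)
        steps.append(step)
        # Feed each step back so the model can critique its own reasoning.
        messages.append({"role": "assistant", "content": raw})
        if step["next_action"] == "final_answer":
            break
        messages.append({"role": "user", "content": "Continue reasoning."})
    return steps
```

With a real backend, `ask_model` would wrap the provider's chat-completion call and return the message content; the visible chain of titled steps is just the accumulated `steps` list.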

Quick Start & Requirements

  • Streamlit UI:
    • python3 -m venv venv
    • source venv/bin/activate
    • pip3 install -r requirements.txt
    • export GROQ_API_KEY=gsk...
    • streamlit run app.py
  • Gradio UI:
    • cd gradio
    • pip3 install -r requirements.txt
    • python3 app.py
  • Prerequisites: Groq API Key.
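Both UIs fail at request time if the key is missing, so a quick environment check before launch can save a round trip. This helper is an illustration, not part of the repository:

```python
import os

def check_groq_key(env=os.environ):
    """Return True if a Groq API key appears to be configured.

    Groq keys conventionally start with "gsk"; this only checks that
    the variable is set and plausibly shaped, not that it is valid.
    """
    return env.get("GROQ_API_KEY", "").startswith("gsk")
```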

Highlighted Details

  • Achieves ~70% accuracy on the "Strawberry problem" (counting 'R's) using prompting alone, compared to 0% for base Llama-3.1-70b and 30% for ChatGPT-4o.
  • Demonstrates a prompting strategy that encourages LLMs to use multiple reasoning methods and self-correction.
  • Outputs reasoning steps in a structured JSON format for clarity.
  • Aims to inspire open-source community development of similar reasoning strategies.
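To illustrate the structured output on the Strawberry problem, a single reasoning step might be parsed and rendered into the titled display the UIs show. The field names are an assumption based on the README's description, not a verified schema:

```python
import json

# Hypothetical example of one structured reasoning step (assumed fields).
raw_step = json.dumps({
    "title": "Method 1: Spell and count",
    "content": "s-t-r-a-w-b-e-r-r-y has 'r' at positions 3, 8, and 9.",
    "next_action": "continue",
})

def render_step(raw):
    """Turn one JSON reasoning step into a titled text block."""
    step = json.loads(raw)
    return f"## {step['title']}\n{step['content']}"

print(render_step(raw_step))
```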

Maintenance & Community

Developed by Benjamin Klieger. The README links related projects, including a Hugging Face Spaces demo and a local-LLM implementation (thinkR).

Licensing & Compatibility

The repository's license is not explicitly stated in the README.

Limitations & Caveats

The project is described as an "early prototype" and "experimental." While initial testing shows significant improvement on simple logic problems (60-80% accuracy), formal accuracy evaluation is pending. The effectiveness is tied to the Groq platform and the specific Llama-3.1-70b model.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 34 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), John Yang (author of SWE-bench, SWE-agent), and 7 more.

tree-of-thought-llm by princeton-nlp

  • Top 0.3% on sourcepulse
  • 5k stars
  • Research paper implementation for Tree of Thoughts (ToT) prompting
  • created 2 years ago, updated 6 months ago