g1 by build-with-groq

Reasoning chains prototype using Llama-3.1-70b on Groq

Created 1 year ago
4,219 stars

Top 11.6% on SourcePulse

Project Summary

This project demonstrates a prompting strategy to enhance LLM reasoning capabilities, specifically for logical problems that often challenge standard models. It targets users and researchers interested in improving LLM performance through structured, step-by-step "thinking" processes without fine-tuning, showcasing a method to achieve "o1-like" reasoning chains.

How It Works

The project leverages Llama-3.1-70b on the Groq platform to implement dynamic Chain-of-Thought reasoning. It prompts the LLM to break down problems into titled steps, explore alternative solutions, question its own reasoning, and utilize at least three distinct methods to arrive at an answer. This approach, which visualizes the entire reasoning process, aims to improve accuracy on logic puzzles by forcing the model to engage in deeper, more self-critical analysis.
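
The core of this loop can be approximated as follows. This is a minimal sketch assuming the Groq Python SDK's OpenAI-style chat interface; the system prompt wording, the JSON keys (title, content, next_action), and the model identifier are illustrative assumptions, not the repository's exact code.

    # Minimal sketch of a g1-style reasoning loop (not the repo's exact code).
    import json
    import os

    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    SYSTEM_PROMPT = (
        "You are an expert reasoner. Reply with one JSON object per message, "
        "using the keys 'title', 'content', and 'next_action' ('continue' or "
        "'final_answer'). Break the problem into titled steps, explore "
        "alternatives, question your own reasoning, and use at least three "
        "distinct methods before giving a final answer."
    )

    def reasoning_chain(question, max_steps=10):
        # Accumulate the conversation so each new step can build on, and
        # criticize, the earlier ones.
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ]
        steps = []
        for _ in range(max_steps):
            response = client.chat.completions.create(
                model="llama-3.1-70b-versatile",  # assumed identifier; check Groq's current model list
                messages=messages,
                temperature=0.2,
            )
            raw = response.choices[0].message.content
            try:
                step = json.loads(raw)
            except json.JSONDecodeError:
                # Fall back gracefully if the model drifts out of strict JSON.
                step = {"title": "Unparsed step", "content": raw, "next_action": "continue"}
            steps.append(step)
            messages.append({"role": "assistant", "content": raw})
            if step.get("next_action") == "final_answer":
                break
        return steps

Rendering each step as it arrives is what produces the visualized reasoning process shown in the UIs.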

Quick Start & Requirements

  • Streamlit UI:
    • python3 -m venv venv
    • source venv/bin/activate
    • pip3 install -r requirements.txt
    • export GROQ_API_KEY=gsk...
    • streamlit run app.py
  • Gradio UI:
    • cd gradio
    • pip3 install -r requirements.txt
    • python3 app.py
  • Prerequisites: Groq API Key (an optional pre-flight check is sketched below).
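
Since both UIs need the key at runtime, an optional pre-flight check (a sketch, not part of the repository) can catch a missing or malformed GROQ_API_KEY before launch:

    # Optional pre-flight check (not part of the repo): confirm GROQ_API_KEY
    # is visible to the current shell before starting either UI.
    import os
    import sys

    key = os.environ.get("GROQ_API_KEY", "")
    if not key.startswith("gsk"):
        sys.exit("GROQ_API_KEY is missing or malformed; run `export GROQ_API_KEY=gsk...` first.")
    print("GROQ_API_KEY found; launch with `streamlit run app.py` or `python3 app.py`.")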

Highlighted Details

  • Achieves ~70% accuracy on the "Strawberry problem" (counting 'R's) using prompting alone, compared to 0% for base Llama-3.1-70b and 30% for ChatGPT-4o.
  • Demonstrates a prompting strategy that encourages LLMs to use multiple reasoning methods and self-correction.
  • Outputs reasoning steps in a structured JSON format for clarity (a sample step is sketched after this list).
  • Aims to inspire open-source community development of similar reasoning strategies.
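
For reference, a single reasoning step might look like the dictionary below; the field names mirror the sketch under How It Works and are assumptions rather than the repository's documented schema.

    # Illustrative shape of one reasoning step (field names are assumptions
    # carried over from the sketch above, not the repo's documented schema).
    example_step = {
        "title": "Step 1: Spell the word out",
        "content": "Spelling it out: s, t, r, a, w, b, e, r, r, y. The letter 'r' appears at positions 3, 8, and 9.",
        "next_action": "continue",  # "final_answer" on the concluding step
    }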

Maintenance & Community

Developed by Benjamin Klieger. Links to a Hugging Face Space and an R implementation for local LLMs (thinkR) are provided as related projects.

Licensing & Compatibility

The repository's license is not explicitly stated in the README.

Limitations & Caveats

The project is described as an "early prototype" and "experimental." While initial testing shows significant improvement on simple logic problems (60-80% accuracy), formal accuracy evaluation is still pending. Its effectiveness is tied to the Groq platform and to the specific Llama-3.1-70b model.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 5 stars in the last 30 days
