g1 by build-with-groq

Reasoning chains prototype using Llama-3.1-70b on Groq

Created 1 year ago
4,219 stars

Top 11.6% on SourcePulse

Project Summary

This project demonstrates a prompting strategy to enhance LLM reasoning capabilities, specifically for logical problems that often challenge standard models. It targets users and researchers interested in improving LLM performance through structured, step-by-step "thinking" processes without fine-tuning, showcasing a method to achieve "o1-like" reasoning chains.

How It Works

The project leverages Llama-3.1-70b on the Groq platform to implement dynamic Chain-of-Thought reasoning. It prompts the LLM to break down problems into titled steps, explore alternative solutions, question its own reasoning, and utilize at least three distinct methods to arrive at an answer. This approach, which visualizes the entire reasoning process, aims to improve accuracy on logic puzzles by forcing the model to engage in deeper, more self-critical analysis.
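
The core of this loop can be approximated as follows. This is a minimal sketch assuming the Groq Python SDK's OpenAI-style chat interface; the system prompt wording, the JSON keys (title, content, next_action), and the model identifier are illustrative assumptions, not the repository's exact code.

    # Minimal sketch of a g1-style reasoning loop (not the repo's exact code).
    import json
    import os

    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    SYSTEM_PROMPT = (
        "You are an expert reasoner. Reply with one JSON object per message, "
        "using the keys 'title', 'content', and 'next_action' ('continue' or "
        "'final_answer'). Break the problem into titled steps, explore "
        "alternatives, question your own reasoning, and use at least three "
        "distinct methods before giving a final answer."
    )

    def reasoning_chain(question, max_steps=10):
        # Accumulate the conversation so each new step can build on, and
        # criticize, the earlier ones.
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ]
        steps = []
        for _ in range(max_steps):
            response = client.chat.completions.create(
                model="llama-3.1-70b-versatile",  # assumed identifier; check Groq's current model list
                messages=messages,
                temperature=0.2,
            )
            raw = response.choices[0].message.content
            try:
                step = json.loads(raw)
            except json.JSONDecodeError:
                # Fall back gracefully if the model drifts out of strict JSON.
                step = {"title": "Unparsed step", "content": raw, "next_action": "continue"}
            steps.append(step)
            messages.append({"role": "assistant", "content": raw})
            if step.get("next_action") == "final_answer":
                break
        return steps

Rendering each step as it arrives is what produces the visualized reasoning process shown in the UIs.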

Quick Start & Requirements

  • Streamlit UI:
    • python3 -m venv venv
    • source venv/bin/activate
    • pip3 install -r requirements.txt
    • export GROQ_API_KEY=gsk...
    • streamlit run app.py
  • Gradio UI:
    • cd gradio
    • pip3 install -r requirements.txt
    • python3 app.py
  • Prerequisites: Groq API Key (an optional pre-flight check is sketched below).
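
Since both UIs need the key at runtime, an optional pre-flight check (a sketch, not part of the repository) can catch a missing or malformed GROQ_API_KEY before launch:

    # Optional pre-flight check (not part of the repo): confirm GROQ_API_KEY
    # is visible to the current shell before starting either UI.
    import os
    import sys

    key = os.environ.get("GROQ_API_KEY", "")
    if not key.startswith("gsk"):
        sys.exit("GROQ_API_KEY is missing or malformed; run `export GROQ_API_KEY=gsk...` first.")
    print("GROQ_API_KEY found; launch with `streamlit run app.py` or `python3 app.py`.")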

Highlighted Details

  • Achieves ~70% accuracy on the "Strawberry problem" (counting 'R's) using prompting alone, compared to 0% for base Llama-3.1-70b and 30% for ChatGPT-4o.
  • Demonstrates a prompting strategy that encourages LLMs to use multiple reasoning methods and self-correction.
  • Outputs reasoning steps in a structured JSON format for clarity (a sample step is sketched after this list).
  • Aims to inspire open-source community development of similar reasoning strategies.
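
For reference, a single reasoning step might look like the dictionary below; the field names mirror the sketch under How It Works and are assumptions rather than the repository's documented schema.

    # Illustrative shape of one reasoning step (field names are assumptions
    # carried over from the sketch above, not the repo's documented schema).
    example_step = {
        "title": "Step 1: Spell the word out",
        "content": "Spelling it out: s, t, r, a, w, b, e, r, r, y. The letter 'r' appears at positions 3, 8, and 9.",
        "next_action": "continue",  # "final_answer" on the concluding step
    }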

Maintenance & Community

Developed by Benjamin Klieger. Links to a Hugging Face Space and an R implementation for local LLMs (thinkR) are provided as related projects.

Licensing & Compatibility

The repository's license is not explicitly stated in the README.

Limitations & Caveats

The project is described as an "early prototype" and "experimental." While initial testing shows significant improvement on simple logic problems (60-80% accuracy), formal accuracy evaluation is still pending. Its effectiveness is tied to the Groq platform and to the specific Llama-3.1-70b model.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 5 stars in the last 30 days
