RTL-Coder by hkust-zhiyao

Advanced LLM for RTL code generation

Created 2 years ago
253 stars

Top 99.3% on SourcePulse

View on GitHub
Project Summary

RTLCoder addresses the critical data scarcity challenge in RTL code generation with an open-source, LLM-assisted solution. It delivers state-of-the-art performance among non-commercial models, outperforming GPT-3.5 on Verilog generation tasks, and targets engineers and researchers in IC design. Its key benefits are efficient, lightweight models and a novel approach to dataset creation and model training.

How It Works

The core innovation is an automated dataset generation flow that leverages commercial LLMs to create over 27,000 Verilog instruction-code pairs, overcoming data availability hurdles. RTLCoder employs a novel training scheme that incorporates code quality feedback to significantly boost model performance. Additionally, the training process has been algorithmically revised to reduce GPU memory consumption, enabling training on more accessible hardware.
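The dataset flow described above can be sketched roughly as follows. This is an illustrative assumption, not the project's actual pipeline: the stub lambdas stand in for commercial LLM calls, and `looks_like_verilog` is a toy heuristic where the real flow would run a proper syntax check.

```python
import re
from typing import Callable, Dict, List

def looks_like_verilog(code: str) -> bool:
    """Toy sanity filter: every standalone `module` keyword has a matching `endmodule`."""
    opens = len(re.findall(r"\bmodule\b", code))   # \b excludes the 'module' inside 'endmodule'
    closes = len(re.findall(r"\bendmodule\b", code))
    return opens > 0 and opens == closes

def build_pairs(topics: List[str],
                gen_instruction: Callable[[str], str],
                gen_code: Callable[[str], str]) -> List[Dict[str, str]]:
    """Ask the LLM for an instruction and its code, keep only plausible pairs."""
    pairs = []
    for topic in topics:
        instruction = gen_instruction(topic)
        code = gen_code(instruction)
        if looks_like_verilog(code):  # real flow: a full syntax check, not this heuristic
            pairs.append({"instruction": instruction, "code": code})
    return pairs

# Stub generators stand in for the commercial LLM queries.
demo = build_pairs(
    ["2-to-1 mux", "broken example"],
    gen_instruction=lambda t: f"Write a Verilog module implementing a {t}.",
    gen_code=lambda ins: (
        "module mux2(input a, input b, input sel, output y);\n"
        "  assign y = sel ? b : a;\nendmodule"
        if "mux" in ins else "module broken("
    ),
)
print(len(demo))  # only the well-formed pair survives the filter
```

The key design point is the filter step: only pairs whose code passes a correctness check enter the training set, which is how the flow keeps quality up despite the LLM-generated source.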

Quick Start & Requirements

  • Models: Four models are available on HuggingFace: RTLCoder-Deepseek-v1.1, RTLCoder-v1.1 (Mistral-based), RTLCoder-v1.1-gptq-4bit, and RTLCoder-v1.1-gguf-4bit (CPU-compatible).
  • Inference: The provided scripts use the transformers and ctransformers libraries; example inference code is included in the README.
  • Prerequisites: A Python environment with torch and transformers, plus ctransformers (for the GGUF/CPU version) or auto_gptq (for the GPTQ version). A GPU with more than 4 GB of memory is recommended for faster inference.
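A minimal GPU inference sketch using transformers, following the pattern the README describes. The prompt template here is an assumption (check the README's example for the exact format), and the model id must be filled in with the HuggingFace checkpoint you chose from the list above.

```python
def build_prompt(instruction: str) -> str:
    # Assumed instruction template; the README's example code shows the exact format.
    return f"Please act as a professional Verilog designer.\n\n{instruction}\n"

def generate_rtl(instruction: str, model_id: str, max_new_tokens: int = 512) -> str:
    """One greedy generation pass; runs on GPU via device_map (or on CPU, slowly)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")
    inputs = tok(build_prompt(instruction), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)
```

Usage would be along the lines of `generate_rtl("Write a module for a 4-bit counter.", "<RTLCoder model id on HuggingFace>")`.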

Highlighted Details

  • Achieves state-of-the-art performance for non-commercial RTL code generation.
  • Outperforms GPT-3.5 on RTL design generation tasks.
  • Features an automated dataset generation flow producing over 27,000 Verilog instruction-code samples.
  • Novel training scheme incorporates code quality feedback for performance enhancement.
  • Quantized versions (GPTQ, GGUF) enable lower resource usage and CPU inference.
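For the CPU path, the GGUF checkpoint can be driven through ctransformers. A minimal sketch, assuming a locally downloaded 4-bit GGUF file and the Mistral base noted above; the path and generation settings are placeholders:

```python
def generate_rtl_cpu(instruction: str, gguf_path: str, max_new_tokens: int = 512) -> str:
    """Load a 4-bit GGUF checkpoint with ctransformers and generate on CPU."""
    from ctransformers import AutoModelForCausalLM

    # model_type="mistral" matches the RTLCoder-v1.1 base model.
    llm = AutoModelForCausalLM.from_pretrained(gguf_path, model_type="mistral")
    return llm(instruction, max_new_tokens=max_new_tokens, temperature=0.0)
```

This trades speed for accessibility: no GPU is required, at the cost of slower token generation and the small quality loss inherent in 4-bit quantization.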

Maintenance & Community

The project is associated with multiple IEEE publications, indicating academic backing. No specific community channels (e.g., Discord, Slack) or explicit roadmap details are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. It mentions "non-commercial solutions" and adherence to OpenAI's terms for dataset generation, suggesting potential restrictions on commercial use. Compatibility with closed-source linking is not specified.

Limitations & Caveats

The RTLCoder-Deepseek-v1.1 model may require post-processing to ensure correct output termination. The generated dataset, while extensive, may contain inaccuracies in problem descriptions and code, as it was created using GPT-3.5-turbo. The primary focus is on Verilog RTL code generation.
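One plausible shape for the termination post-processing mentioned above (an assumption for illustration, not the project's actual script) is to cut the generated text at the final `endmodule`:

```python
def truncate_at_endmodule(text: str) -> str:
    """Drop anything after the last `endmodule` (e.g. repeated or stray trailing tokens).
    Hypothetical post-processing; the project's own handling may differ."""
    idx = text.rfind("endmodule")
    return text[: idx + len("endmodule")] if idx != -1 else text

fixed = truncate_at_endmodule("module t;\nendmodule\n// stray continuation")
print(fixed)  # the stray trailing comment is removed
```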

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
