salmanmohammadi
Claude Code model training library
Top 98.0% on SourcePulse
Summary
nanocode is a JAX library for end-to-end training of custom Claude Code-style large language models using Constitutional AI. It targets researchers and developers who want to build capable, cost-effective code generation models. It covers the full pipeline, from tokenizer training to agentic supervised fine-tuning (SFT) and DPO alignment, and is optimized for Google TPUs.
How It Works
The library is built in pure JAX, leveraging TPU acceleration for efficient training. Its core approach integrates Constitutional AI principles throughout the model development lifecycle. This includes custom tokenizer training, large-scale pretraining on diverse datasets, synthetic data generation pipelines for specialized tasks, agentic supervised fine-tuning with tool use capabilities, and Direct Preference Optimization (DPO) for constitutional alignment, enabling fine-grained control over model behavior.
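The DPO step mentioned above can be illustrated with a minimal, framework-free sketch of the standard DPO objective. This is not nanocode's code or API: the function names `log_sigmoid` and `dpo_loss`, the argument layout, and the default `beta=0.1` are all assumptions made for illustration, and the inputs are assumed to be summed sequence-level log-probabilities from the trained policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Mean DPO loss over a batch of preference pairs.

    Each argument is a list of sequence-level log-probabilities, one per pair.
    `beta` (hypothetical default) scales how strongly the policy is pushed
    away from the reference model.
    """
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin: how much more the policy favors the chosen
        # response over the rejected one, relative to the reference model.
        margin = (pc - rc) - (pr - rr)
        losses.append(-log_sigmoid(beta * margin))
    return sum(losses) / len(losses)


# A policy identical to the reference has zero margin, so the loss is log(2).
baseline = dpo_loss([-5.0], [-7.0], [-5.0], [-7.0])
# Raising the chosen log-prob and lowering the rejected one reduces the loss.
improved = dpo_loss([-4.0], [-8.0], [-5.0], [-7.0])
```

In a JAX training loop the same expression would typically be written over arrays and differentiated with respect to the policy parameters; the scalar version here only shows the shape of the objective.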
Quick Start & Requirements
[…] gcloud, and running ./install.sh tpu on the TPU pod, followed by executing speedrun_*.sh scripts for training.
[…] gcloud CLI, tmux. For synthetic data generation, an OpenRouter API key or a local vLLM server is needed. NVIDIA GPUs require --attn-impl=eager.
[…] d24) costs approximately $200 and takes ~9 hours on a TPU v6e-8. A 477M parameter model (d20) costs $34 and takes ~1.5 hours.
[…] smohammadi/nanocode-tulu-selfoss-evol).
Highlighted Details
Maintenance & Community
The project is authored by Salman Mohammadi. No specific community channels (e.g., Discord, Slack), sponsorships, or notable contributors are detailed in the provided README.
Licensing & Compatibility
The project is released under the MIT license, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The library is designed for and primarily optimized for Google TPUs. NVIDIA GPUs are supported with the --attn-impl=eager flag, but multi-GPU configurations have not been extensively tested. Synthetic data generation requires an OpenRouter API key or a local vLLM server.