nCPU by robertcprice

GPU-native CPU executes operations via trained neural networks

Created 4 months ago
634 stars

Top 52.1% on SourcePulse

View on GitHub
Project Summary

This project introduces nCPU, a CPU architecture in which all state (registers, memory, and the program counter) is represented as GPU tensors and all ALU operations are executed by trained neural networks. It targets researchers and power users exploring alternative computing paradigms, demonstrating that fundamental arithmetic and logic operations can be carried out entirely by deep learning models.

How It Works

The nCPU architecture runs entirely on the GPU, with all state managed as PyTorch tensors. Instruction fetch, decode, execution, and state updates occur on-device, eliminating host CPU round-trips. Each ALU operation is routed through a specific trained neural network model: addition uses a Kogge-Stone carry-lookahead network, multiplication employs a learned byte-pair lookup table, bitwise operations utilize neural truth tables, and shifts are handled by attention-based bit routing. This model-native approach aims for high accuracy and explores the transferability of classical hardware design principles to neural architectures.
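The Kogge-Stone scheme mentioned above can be illustrated with a plain-Python sketch (a toy at an assumed 8-bit width, not the project's actual code): generate/propagate pairs are combined in O(log n) parallel-prefix rounds, a pattern that maps naturally onto batched tensor operations.

```python
def kogge_stone_add(a, b, width=8):
    # Little-endian bit vectors stand in for the batched GPU tensors
    # the project uses; this is an illustrative sketch of the algorithm.
    A = [(a >> i) & 1 for i in range(width)]
    B = [(b >> i) & 1 for i in range(width)]
    p = [x ^ y for x, y in zip(A, B)]        # propagate bits
    g = [x & y for x, y in zip(A, B)]        # generate bits
    gp, pp = g[:], p[:]
    d = 1
    while d < width:                         # O(log n) prefix rounds
        # Each position looks d places back and combines (g, p) pairs.
        gp = [(gp[i] | (pp[i] & gp[i - d])) if i >= d else gp[i]
              for i in range(width)]
        pp = [(pp[i] & pp[i - d]) if i >= d else pp[i]
              for i in range(width)]
        d *= 2
    carry = [0] + gp[:-1]                    # carry into bit i
    s = [p[i] ^ carry[i] for i in range(width)]
    return sum(bit << i for i, bit in enumerate(s))
```

Every lane in a round is independent, which is why the trained network can evaluate all bit positions in parallel rather than rippling carries sequentially.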

Quick Start & Requirements

  • Installation: pip install -e ".[dev]"
  • Execution: python main.py --program programs/sum_1_to_10.asm
  • Prerequisites: PyTorch 2.10.0; a GPU with Metal support (published benchmarks target the Apple Silicon MPS backend).
  • Resources: ~135 MB for 23 trained models.
  • Links: Official Docs, Research Paper, Benchmarks, DOOM Demo

Highlighted Details

  • Achieves 100% accuracy on integer arithmetic, validated by 347 automated tests.
  • Multiplication is 12x faster than addition (21 µs vs. 248 µs) because its parallel LUT lookups avoid the O(log n) carry propagation that addition requires.
  • Kogge-Stone carry-lookahead implemented via a trained network yields a 3.3x speedup for ADD/SUB/CMP operations.
  • Vectorized shift operations achieve a 6.5x speedup through attention-based routing.
  • Offers two modes: Neural Mode (default, model inference) and Fast Mode (native tensor ops, targeting 1.35M IPS on Apple Silicon).
  • Includes native Metal GPU implementations (MLX and Rust) for zero CPU-GPU synchronization.
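The multiplication speedup above comes from replacing carry propagation with independent table lookups. A minimal sketch, assuming a precomputed 256x256 table in place of the project's learned byte-pair model (the function name and 16-bit width are hypothetical):

```python
# Precomputed byte-pair product table; stands in for the learned
# lookup model described in the README (65,536 entries).
LUT = [[x * y for y in range(256)] for x in range(256)]

def lut_mul16(a, b):
    # Split 16-bit operands into bytes. All four byte-pair partial
    # products are plain table lookups, then shifted into place and
    # summed; the result wraps at 16 bits.
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF
    return (LUT[a_lo][b_lo]
            + (LUT[a_lo][b_hi] << 8)
            + (LUT[a_hi][b_lo] << 8)
            + (LUT[a_hi][b_hi] << 16)) & 0xFFFF
```

Each of the four partial products is an independent lookup, so on a GPU they can be fetched in a single batched gather rather than computed round by round.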

Maintenance & Community

The README names no maintainers, community channels (e.g., Discord, Slack), or project roadmap.

Licensing & Compatibility

The project is released under the MIT License, permitting commercial use and integration with closed-source projects, with attribution as the only significant requirement.

Limitations & Caveats

As a research runtime, nCPU may not be production-ready. Performance benchmarks are primarily demonstrated on Apple Silicon, and broader hardware compatibility for optimal performance is not detailed. The project explores a highly experimental architecture, and long-term maintenance or community support is not explicitly indicated.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 60 stars in the last 30 days

Starred by David Cournapeau (author of scikit-learn), Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 5 more.

Explore Similar Projects

  • lectures by gpu-mode: Lecture series for GPU-accelerated computing (Top 0.3% on SourcePulse; 6k stars; created 2 years ago, updated 2 months ago).