t1 by chipsalliance

RISC-V Vector implementation inspired by the Cray X1 vector machine

Created 3 years ago

306 stars

Top 87.7% on SourcePulse

Project Summary

T1 (Torrent-1) is a RISC-V vector processor implementation inspired by the Cray X1, targeting researchers and hardware designers. It offers a lane-based microarchitecture with extensive chaining and configurable SRAM-based Vector Register Files (VRFs), supporting standard RISC-V vector extensions and large VLEN/DLEN configurations of up to 64K bits.
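To make the VLEN/DLEN figures concrete, the sketch below applies the vector-length rule from the RISC-V "V" specification, VLMAX = LMUL × VLEN / SEW, under the Zve32x/Zve32f limit of 32-bit elements. The `datapath_beats` helper is a hypothetical simplification that assumes DLEN is the per-cycle datapath width and DLEN ≤ VLEN; it ignores pipelining and is not a measured T1 figure.

```python
# Sketch of RVV vector-length arithmetic. VLMAX = LMUL * VLEN / SEW comes
# from the RISC-V "V" specification; the DLEN beat estimate is a simplifying
# assumption (no pipeline effects), not a measured T1 number.

ELEN = 32         # Zve32x / Zve32f cap the element width at 32 bits
MAX_VLEN = 65536  # the RVV specification limits VLEN to 2**16 bits


def is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """Maximum number of elements in one vector register group (integer LMUL only)."""
    if not (is_pow2(vlen) and ELEN <= vlen <= MAX_VLEN):
        raise ValueError("VLEN must be a power of two between ELEN and 65536")
    if sew not in (8, 16, 32):
        raise ValueError("Zve32* supports SEW of 8, 16 or 32 bits")
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles integer LMUL")
    return lmul * vlen // sew


def datapath_beats(vlen: int, dlen: int, lmul: int) -> int:
    """Assumed number of DLEN-wide beats needed to stream one register group."""
    if not (is_pow2(dlen) and dlen <= vlen):
        raise ValueError("this sketch assumes DLEN is a power of two no larger than VLEN")
    return lmul * vlen // dlen


if __name__ == "__main__":
    print(vlmax(65536, 32, 8))             # 16384 elements
    print(datapath_beats(65536, 1024, 8))  # 512 beats
```

With the maximum VLEN of 65,536 bits and LMUL = 8, a single register group holds 16,384 32-bit elements, which is why a wide datapath and deep chaining matter at this scale.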

How It Works

T1 implements a lane-based microarchitecture with a focus on intensive chaining between Vector Function Units (VFUs) and Load Store Units (LSUs). It features configurable banked SRAM VRFs with various port configurations and pipelined/asynchronous VFUs. The LSU supports instruction-level out-of-order execution and configurable outstanding memory instructions to mitigate latency. The design prioritizes balancing throughput, area, and frequency, allowing users to tune performance by adjusting VRF memory types, pipeline stages, and LSU configurations.
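The benefit of chaining can be illustrated with an idealized timing model. This is a hypothetical sketch, not T1's scheduler: it assumes every instruction in a dependent chain has the same pipeline latency and processes one DLEN-wide element group per cycle. Without chaining, each instruction waits for its predecessor to finish; with chaining, it starts as soon as the first result group is available.

```python
import math

# Idealized, hypothetical timing model showing why chaining helps. It is not
# T1's scheduler: every instruction is assumed to have the same pipeline
# latency and to accept one DLEN-wide element group per cycle.


def element_groups(vl: int, sew: int, dlen: int) -> int:
    """DLEN-wide element groups needed to cover vl elements of sew bits."""
    return math.ceil(vl * sew / dlen)


def unchained_cycles(n_insts: int, latency: int, groups: int) -> int:
    # Each dependent instruction waits until its predecessor has written
    # back every element group.
    return n_insts * (latency + groups)


def chained_cycles(n_insts: int, latency: int, groups: int) -> int:
    # Each dependent instruction starts as soon as the first element group
    # of its predecessor is written back, so only the latencies add up.
    return n_insts * latency + groups


if __name__ == "__main__":
    g = element_groups(vl=1024, sew=32, dlen=256)    # 128 groups
    print(unchained_cycles(3, latency=4, groups=g))  # 396
    print(chained_cycles(3, latency=4, groups=g))    # 140
```

In this model the element-group term is paid once instead of once per instruction, so the gain grows with vector length; real hardware adds structural hazards and VRF port conflicts that the sketch ignores.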

Quick Start & Requirements

Installation: Nix is the primary build system. Docker images are available via docker pull ghcr.io/chipsalliance/t1-<config>:latest.
Prerequisites: Nix package manager; QEMU/KVM may also be needed to build the Docker images.
Resources: Building and emulation can be resource-intensive.
Documentation: Configuration options and build commands are detailed in the README.

Highlighted Details

Supports standard RISC-V vector extensions (Zve32f, Zve32x) and configurable VLEN/DLEN of up to 64K bits.
Features lane-based execution with support for masked element skipping and direct-connected lane interconnections.
LSU supports instruction-level out-of-order execution and configurable outstanding memory instructions.
Design Space Exploration (DSE) principles allow tuning for efficiency or performance by adjusting VRF memory, VFU pipeline stages, and LSU configurations.
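Why the number of outstanding memory operations matters can be estimated with Little's law: the data in flight must equal bandwidth × latency. The sketch below is a generic model that counts fixed-size bus requests; how it maps onto T1's own "outstanding memory instructions" parameter is an assumption, and the port width, latency, and request size in the example are hypothetical.

```python
import math

# Little's law (in-flight data = bandwidth x latency) applied to a load/store
# unit. Counting bus requests of a fixed size is a generic assumption; how
# this maps onto T1's "outstanding memory instructions" parameter is not
# modelled here, and the numbers below are hypothetical.


def outstanding_needed(bytes_per_cycle: float, latency_cycles: int, bytes_per_request: int) -> int:
    """Requests that must be in flight to sustain the target bandwidth."""
    if bytes_per_cycle <= 0 or latency_cycles <= 0 or bytes_per_request <= 0:
        raise ValueError("all parameters must be positive")
    return math.ceil(bytes_per_cycle * latency_cycles / bytes_per_request)


def sustained_bandwidth(outstanding: int, latency_cycles: int,
                        bytes_per_request: int, peak_bytes_per_cycle: float) -> float:
    """Bytes/cycle achievable with a given cap on requests in flight."""
    return min(peak_bytes_per_cycle, outstanding * bytes_per_request / latency_cycles)


if __name__ == "__main__":
    # e.g. a 256-bit (32-byte) port, 100-cycle latency, 64-byte requests
    print(outstanding_needed(32, 100, 64))       # 50
    print(sustained_bandwidth(8, 100, 64, 32))   # 5.12
    print(sustained_bandwidth(64, 100, 64, 32))  # 32
```

Under these assumed numbers, capping the LSU at 8 requests in flight would leave the port about 84% idle, which is the kind of trade-off the DSE knobs for LSU configuration expose.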

Maintenance & Community

The project is maintained by the CHIPS Alliance.
Builds and development environments are managed with Nix Flakes. Test cases cover various categories, including assembly, MLIR, and PyTorch.

Licensing & Compatibility

License: Apache-2.0 License.
Compatibility: Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The forked Rocket Core is not officially supported and can be replaced.
The LSU has specific bus-ordering requirements, and its high-bandwidth ports do not support an MMU; these constraints may not be compatible with all RISC-V scalar cores.
No coherence support is provided for high-performance caches.

Health Check

Last Commit: 4 days ago

Responsiveness: 1 week

Pull Requests (30d): 3

Issues (30d): 0

Star History: 2 stars in the last 30 days

Explore Similar Projects

tiny-dream by symisc

Header-only C++ library for Stable Diffusion inference

Created 2 years ago

Updated 2 years ago

dusky by dusklinux

Optimized Arch Linux desktop experience

Created 1 month ago

Updated 16 hours ago

Starred by

Woosuk Kwon (Coauthor of vLLM),

Ying Sheng (Coauthor of SGLang), and

2 more.

Nanoflow by efeslab

LLM serving framework for high throughput

Created 1 year ago

Updated 2 months ago

sarathi-serve by microsoft

LLM serving engine for low-latency & high-throughput inference (OSDI'24 paper)

Created 2 years ago

Updated 3 days ago

amd-strix-halo-toolboxes by kyuz0

LLM inference toolboxes for AMD Ryzen AI Max

Created 5 months ago

Updated 17 hours ago

ventus-gpgpu by THU-DSP-LAB

RISC-V GPGPU processor design and toolchain

Created 3 years ago

Updated 17 hours ago

FlagScale by flagos-ai

Large model toolkit for end-to-end management and scaling

Created 2 years ago

Updated 2 days ago

Starred by

Chaoyu Yang (Founder of Bento),

Wing Lian (Founder of Axolotl AI), and

2 more.

luminal by luminal-ai

Deep learning library using composable compilers for high performance

Created 2 years ago

Updated 18 hours ago

Starred by

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and

Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

High-performance C++ LLM inference library

Created 2 years ago

Updated 1 month ago

Starred by

Taranjeet Singh (Cofounder of Mem0),

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and

4 more.

LMCache by LMCache

LLM serving engine extension for reduced TTFT and increased throughput

Created 1 year ago

Updated 17 hours ago

Starred by

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"),

Jeff Hammerbacher (Cofounder of Cloudera), and

9 more.

FlashMLA by deepseek-ai

Efficient CUDA kernels for MLA decoding

Created 10 months ago

Updated 3 weeks ago

Starred by

Tobi Lutke (Cofounder of Shopify),

Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and

41 more.

unsloth by unslothai

Finetuning tool for LLMs, targeting speed and memory efficiency

Created 2 years ago

Updated 1 day ago