tiiuae: Dense autoregressive Transformer for multimodal vision-language understanding
Falcon-Perception is a minimal, performant PyTorch inference engine for natively multimodal, dense, autoregressive Transformer models. It supports object detection, instance segmentation, and OCR driven by natural language queries, targeting researchers and engineers who need efficient multimodal AI deployment. The engine relies on modern inference techniques, including FlexAttention, continuous batching, and paged KV caching, for its throughput gains.
How It Works
The core architecture is a dense, autoregressive Transformer with native multimodality. Inference uses FlexAttention (PyTorch's flex_attention), which compiles composable attention masks into fused Triton kernels and enables seamless continuous batching with paged attention. A paged KV cache with virtual page tables optimizes memory and throughput by eliminating padding waste.
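To make the paging idea concrete, here is a minimal pure-Python sketch (not Falcon-Perception's actual code; class and method names are illustrative) of a paged KV cache with a per-sequence virtual page table. Each sequence maps logical block indices to physical pages allocated on demand, so short and long sequences share one pool with no padding waste.

```python
PAGE_SIZE = 4  # tokens per page (real engines use larger pages, e.g. 16 or 256)

class PagedKVCache:
    """Toy paged KV cache: a pool of physical pages plus per-sequence page tables."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))   # pool of physical page ids
        self.page_tables = {}                      # seq_id -> [physical page ids]
        self.seq_lens = {}                         # seq_id -> tokens written so far

    def append(self, seq_id, kv_entry):
        """Record one token's KV entry, allocating a new page only when needed."""
        table = self.page_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % PAGE_SIZE == 0:                # current page is full (or none yet)
            table.append(self.free_pages.pop(0))   # grab the next free physical page
        self.seq_lens[seq_id] = length + 1
        page, offset = table[length // PAGE_SIZE], length % PAGE_SIZE
        # A real engine would write kv_entry into cache[page, offset]; here we
        # return the physical slot to show the virtual-to-physical translation.
        return page, offset

cache = PagedKVCache(num_pages=8)
slots = [cache.append("seq_a", kv) for kv in range(5)]  # 5 tokens span 2 pages
print(slots)
print(cache.page_tables["seq_a"])
```

Because pages are allocated lazily and freed per sequence, a batch of sequences with very different lengths wastes at most one partially filled page each, rather than padding every sequence to the longest length.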
Quick Start & Requirements
Install with pip install -e .; extras include .[torch] for the PyTorch/CUDA backend and .[mlx] for Apple Silicon. The PyTorch backend requires a CUDA GPU with compatible drivers. The MLX backend runs natively on Apple Silicon without PyTorch or transformers dependencies. The first PyTorch run incurs a 10-30 second compilation and CUDA graph capture; subsequent inference is much faster.
Maintenance & Community
No specific details regarding maintenance, community channels, or notable contributors were found in the provided README excerpt.
Licensing & Compatibility
The license type and compatibility notes for commercial use or closed-source linking are not explicitly stated in the provided README excerpt.
Limitations & Caveats
Initial PyTorch backend setup incurs a compilation delay. The MLX backend is limited to Apple Silicon hardware. Layout-aware OCR requires an additional installation and a third-party layout detection model. The vLLM Docker server is exclusively for FalconOCR.