kepengxu
Vision-language models reasoning from camera sensor measurements
Top 69.0% on SourcePulse
PRISM-VL addresses the loss of critical sensor evidence that occurs when image signal processing (ISP) pipelines render standard RGB images. By using RAW-derived Meas.-XYZ inputs and camera metadata, it enables vision-language models (VLMs) to reason more robustly. It is aimed at researchers and engineers who want to improve VLM performance in challenging visual scenarios.
How It Works
This project shifts VLM visual input from post-ISP RGB to RAW-derived Meas.-XYZ channels augmented with camera metadata (e.g., ISO, exposure time, aperture). This approach preserves sensor evidence that is often discarded during RGB rendering, leading to improved performance on measurement-sensitive tasks like low-illumination recovery and scene text recognition.
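As a rough illustration of the idea above, the sketch below linearizes one demosaiced camera-RGB pixel, maps it to XYZ with a 3x3 matrix, and formats the camera metadata as text. It is not taken from the PRISM-VL code: the names CameraMeta, raw_pixel_to_xyz, and metadata_prompt, the black/white-level normalization, and the matrix values are all assumptions made for this example, and the project's actual Meas.-XYZ construction may differ.

```python
from dataclasses import dataclass


@dataclass
class CameraMeta:
    """Hypothetical container for the capture settings named in the text."""
    iso: int
    exposure_time_s: float
    f_number: float


def raw_pixel_to_xyz(rgb, black_level, white_level, cam_to_xyz):
    """Linearize one demosaiced camera-RGB pixel and map it to XYZ.

    Assumed steps (not confirmed by the source): subtract the black level,
    normalize by the sensor range, then apply a 3x3 camera-to-XYZ matrix.
    """
    scale = float(white_level - black_level)
    linear = [max(0.0, (c - black_level) / scale) for c in rgb]
    return [sum(m * c for m, c in zip(row, linear)) for row in cam_to_xyz]


def metadata_prompt(meta):
    """Render camera metadata as a short text string a VLM could consume."""
    return (
        f"ISO {meta.iso}, exposure {meta.exposure_time_s:g} s, "
        f"aperture f/{meta.f_number:g}"
    )


# Placeholder identity matrix; a real camera-to-XYZ matrix is camera-specific.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

xyz = raw_pixel_to_xyz([576, 64, 1088], black_level=64, white_level=1088,
                       cam_to_xyz=IDENTITY)
prompt = metadata_prompt(CameraMeta(iso=800, exposure_time_s=0.02, f_number=2.8))
```

The point of the sketch is that the values stay linear in scene radiance and carry their capture settings with them, whereas a rendered RGB image has already been tone-mapped and compressed.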
Quick Start & Requirements
Install: bash install_editable.sh
Highlighted Details
Maintenance & Community
The README lists no community channels (e.g., Discord, Slack) and gives no maintenance information beyond author contributions.
Licensing & Compatibility
The MeasL-Bench-V1 and MeasL-150K-V1 datasets are licensed under CC BY-NC 4.0, permitting non-commercial research and education use only, with mandatory citation. The code license is not explicitly stated.
Limitations & Caveats
This is a research release focused on specific measurement-grounded VLM capabilities. The CC BY-NC 4.0 dataset license restricts commercial applications.