multimodal-art-projection: Open-source LLM with pretraining data, pipeline, scripts, and alignment code
Top 37.9% on SourcePulse
MAP-NEO is a fully open-source Large Language Model series trained from scratch on 4.5T tokens, offering transparent training and performance approaching that of proprietary models in reasoning, math, and coding. It targets researchers and developers seeking high-capability bilingual models with full access to training data, pipelines, and code.
How It Works
MAP-NEO is trained from scratch on a 4.5T-token bilingual corpus, using a data processing pipeline called "Matrix." The project emphasizes full transparency by releasing its pretraining data, intermediate checkpoints, a custom tokenizer, and an optimized pretraining codebase. This approach aims to provide a comprehensive resource for understanding and replicating LLM training.
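The Matrix pipeline itself is not described in this listing. As a rough illustration of the kind of corpus cleaning such pipelines perform, the sketch below applies simple heuristic quality filters and exact-hash deduplication to a list of documents. It is a generic example, not the actual Matrix code, and the thresholds are arbitrary.

```python
# Illustrative only: generic heuristic filtering and exact deduplication,
# sketching what a pretraining data-cleaning pipeline (such as Matrix)
# typically does. This is NOT the actual Matrix implementation.
import hashlib

def clean_corpus(docs, min_chars=200, max_digit_ratio=0.3):
    seen = set()
    kept = []
    for text in docs:
        # Heuristic quality filters: drop very short or digit-heavy documents.
        if len(text) < min_chars:
            continue
        digits = sum(c.isdigit() for c in text)
        if digits / len(text) > max_digit_ratio:
            continue
        # Exact deduplication via a content hash.
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if h in seen:
            continue
        seen.add(h)
        kept.append(text)
    return kept
```

Real pretraining pipelines layer many more stages on top of this (language identification, fuzzy deduplication such as MinHash, toxicity filtering); the sketch only shows the overall filter-then-dedup shape.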
Quick Start & Requirements
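The listing does not include setup instructions. A minimal sketch for loading a released checkpoint with Hugging Face transformers is shown below; the model ID "m-a-p/neo_7b" is an assumption based on the project's Hugging Face organization, so verify the exact name against the repository.

```python
# Minimal sketch: load a MAP-NEO checkpoint and generate text.
# "m-a-p/neo_7b" is an assumed model ID; check the project README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/neo_7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain the quadratic formula.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`device_map="auto"` requires the accelerate package; on CPU-only machines, drop that argument and expect slow generation for a 7B-parameter model.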
Highlighted Details
Maintenance & Community
Last updated 11 months ago; the repository is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
The README does not detail specific hardware requirements for running the models or the exact nature of the "Matrix" data processing pipeline beyond its name.