DGoettlich / Historical LLMs trained on time-locked data
Top 29.8% on SourcePulse
This project trains large language models (LLMs) exclusively on historical, time-stamped data to serve as research tools for humanities and social sciences. By creating "time-locked" models with specific knowledge cutoffs, it enables exploration of historical discourse patterns without modern hindsight contamination, offering a unique window into past thought.
How It Works
The core approach is to train LLMs from scratch on massive historical text corpora (80B–600B+ tokens) with defined knowledge cutoffs (e.g., 1913, 1929). The resulting "time-locked" models embody the textual culture of their era, avoiding the hindsight contamination inherent in general-purpose LLMs. A key design goal is "uncontaminated bootstrapping": for scientific applications, interference with the normative judgments the models acquire during pretraining is kept to a minimum.
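The knowledge-cutoff idea can be illustrated with a minimal corpus filter. This is a hypothetical sketch, not the project's pipeline (which has not been released): the `time_locked_corpus` function and the document schema are illustrative assumptions.

```python
# Hypothetical sketch of a "time-locked" corpus filter: keep only documents
# published on or before the model's knowledge cutoff year, so a model
# trained on the result cannot "know" anything later than the cutoff.
def time_locked_corpus(documents, cutoff_year):
    """Return only documents publishable before the knowledge cutoff."""
    return [doc for doc in documents if doc["year"] <= cutoff_year]

corpus = [
    {"text": "On the Origin of Species", "year": 1859},
    {"text": "A Treatise on Probability", "year": 1921},
    {"text": "The General Theory of Employment", "year": 1936},
]

# A model "locked" at 1913 would see only the 1859 text;
# one locked at 1929 would also see the 1921 text.
locked_1913 = time_locked_corpus(corpus, 1913)
locked_1929 = time_locked_corpus(corpus, 1929)
```

The actual project presumably applies this principle at the scale of billions of tokens with careful dating of sources; the sketch only shows the cutoff logic itself.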
Quick Start & Requirements
Artifacts, data, and repositories are slated for future public release. Installation, specific hardware requirements (e.g., GPU, CUDA), and setup procedures are not yet detailed. Links to official quick-start guides or demos are not provided in the README.
Maintenance & Community
Developed by researchers at the University of Zurich and the University of Cologne, with research credits acknowledged from Lambda AI. Community input on research directions and access frameworks is invited via history-llms@econ.uzh.ch.
Licensing & Compatibility
The README does not specify a software license. Compatibility for commercial use or closed-source linking is undetermined.
Limitations & Caveats
The models will reproduce historical biases (racism, misogyny, etc.) present in the training data; this is by design, enabling study of how such views were articulated historically. Access will be restricted to researchers via a scholarly framework to prevent misuse. The models represent published text, not public opinion, and are not substitutes for human interpretation. No artifacts are publicly available yet.