FILM by microsoft

LLM for enhanced context utilization

created 1 year ago
253 stars

Top 99.5% on sourcepulse

Project Summary

This repository provides the official implementation for FILM-7B, a 32K-context Large Language Model designed to overcome the "lost-in-the-middle" problem. It targets researchers and developers working with long-context LLMs, offering improved performance on both probing and real-world tasks without sacrificing short-context capabilities.

How It Works

FILM-7B is a fine-tuned version of Mistral-7B-Instruct-v0.2, utilizing Information-Intensive (In2) Training. This approach enhances the model's ability to effectively process and recall information from extended contexts, achieving near-perfect scores on probing tasks and state-of-the-art performance for its size class on real-world long-context benchmarks.
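To make the "lost-in-the-middle" setting concrete, the sketch below builds a toy probing example by placing one key sentence at a chosen relative depth inside filler text. It only illustrates the general idea of position probing; it is not the In2 data pipeline or this repository's evaluation code. The function name `build_probe`, the filler sentences, and the passkey are all hypothetical.

```python
def build_probe(filler, needle, depth):
    """Insert `needle` into a list of filler sentences at relative `depth`.

    depth=0.0 puts the needle first, 1.0 puts it last, 0.5 in the middle.
    Returns (context_string, zero_based_index_of_needle).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    index = round(depth * len(filler))
    sentences = filler[:index] + [needle] + filler[index:]
    return " ".join(sentences), index


# Hypothetical toy data: 10 filler sentences and one key fact.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The passkey is 4821."

# Build one probe per depth: start, middle, and end of the context.
probes = {d: build_probe(filler, needle, d) for d in (0.0, 0.5, 1.0)}
for depth, (context, index) in probes.items():
    print(f"depth={depth}: needle at index {index} of {len(filler) + 1} sentences")
```

Sweeping `depth` over a finer grid and scoring a model's answers at each position is one common way to check whether accuracy dips when the key information sits in the middle of the context.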

Quick Start & Requirements

  • Install: Clone the repo, create and activate a conda environment, and install dependencies:
    git clone https://github.com/microsoft/FILM.git
    cd FILM
    conda create -n FILM python=3.10.11
    conda activate FILM
    pip install torch==2.0.1 # cuda11.7 and cudnn8
    pip install -r requirements.txt
    
  • Prerequisites: Python 3.10.11, PyTorch 2.0.1 with CUDA 11.7 and cuDNN 8.
  • Resources: Requires a GPU with CUDA 11.7.
  • Docs: VaLProbing-32K, real_world_long, short_tasks.
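
After installation, a quick check that the environment matches the pinned versions can save debugging time. The script below is a minimal sketch, not part of the repository: the names `EXPECTED`, `find_mismatches`, and `collect_versions` are hypothetical, and prefix matching is an assumption made so that local build tags such as `2.0.1+cu117` are accepted.

```python
import sys

# Versions pinned in the install steps above.
EXPECTED = {"python": "3.10.11", "torch": "2.0.1", "cuda": "11.7"}


def find_mismatches(found, expected=EXPECTED):
    """Return one human-readable line for every version that differs."""
    problems = []
    for name, want in expected.items():
        got = found.get(name)
        # Prefix matching lets build-tagged versions like "2.0.1+cu117" pass.
        if got is None or not str(got).startswith(want):
            problems.append(f"{name}: expected {want}, found {got}")
    return problems


def collect_versions():
    """Gather installed versions; torch is imported lazily so this still runs without it."""
    found = {"python": ".".join(str(p) for p in sys.version_info[:3])}
    try:
        import torch
    except ImportError:
        return found
    found["torch"] = torch.__version__
    found["cuda"] = torch.version.cuda  # None on CPU-only builds
    return found


if __name__ == "__main__":
    for line in find_mismatches(collect_versions()) or ["environment matches"]:
        print(line)
```

Run it inside the activated `FILM` environment; any line it prints other than `environment matches` names a version that differs from the pinned one.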

Highlighted Details

  • Achieves near-perfect performance on probing tasks.
  • State-of-the-art results on real-world long-context tasks for ~7B LLMs.
  • Maintains short-context performance.
  • Trained using Information-Intensive (In2) Training.

Maintenance & Community

This project is from Microsoft and welcomes contributions via pull requests; contributors must agree to a Contributor License Agreement (CLA). It adheres to the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This repository is strictly for research purposes and is not an official Microsoft product or service.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

2 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle

1.0%
2k
Context window extension method for LLMs (research paper, models)
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

LongLoRA by dvlab-research

0.1%
3k
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 11 months ago