attachments  by MaximeRivest

Funnel any file into LLM context

Created 3 months ago
277 stars

Top 93.6% on SourcePulse

GitHubView on GitHub
Project Summary

This library provides a unified interface for processing various file types into a format suitable for Large Language Models (LLMs), extracting both text and images. It aims to simplify the integration of diverse data sources into LLM workflows for developers and researchers.

How It Works

The core of the library is the Attachments class, which acts as a central "funnel." It accepts file paths or URLs and employs a pipeline of loaders, modifiers, presenters, refiners, and adapters. This modular design allows for extensibility, enabling users to contribute custom processing steps for new file formats or LLM APIs. The library supports a Domain Specific Language (DSL) for advanced customization of the processing pipeline.

Quick Start & Requirements

  • Primary install: pip install attachments
  • Additional support: pip install attachments[office] for Microsoft Office formats, pip install attachments[browser] for advanced web scraping with visual highlighting.
  • Prerequisites: Python 3.7+. Playwright and browser binaries are required for visual highlighting features.
  • Links: Demo, Quick-start, API Reference

Highlighted Details

  • Seamless integration with OpenAI and Anthropic APIs for multimodal LLM interactions.
  • DSPy integration for class-based and string-based signatures with automatic type registration.
  • Advanced DSL for complex pipelines, including web scraping with CSS selectors and visual highlighting.
  • Support for a wide range of file formats including PDF, DOCX, PPTX, CSV, JSON, and various image types.

Maintenance & Community

The project is actively developed, with an alpha version available for testing new features. Users are encouraged to contribute via GitHub Issues or Pull Requests.

Licensing & Compatibility

The library is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is currently in alpha, indicating potential for bugs and breaking changes. While extensive, the list of supported formats and adapters is still growing, with plans for audio, video, and more cloud service integrations.

Health Check
Last Commit

3 days ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
2
Star History
48 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Simon Willison Simon Willison(Author of Django), and
1 more.

Lumos by andrewnguonly

0.1%
2k
Chrome extension for local LLM web RAG co-piloting
Created 1 year ago
Updated 7 months ago
Feedback? Help us improve.