attachments by MaximeRivest

Funnel any file into LLM context

Created 6 months ago

322 stars

Top 84.2% on SourcePulse

View on GitHub

1 Expert Loves This Project

Omar Khattab

Coauthor of DSPy, ColBERT; Professor at MIT

Project Summary

This library provides a unified interface for processing various file types into a format suitable for Large Language Models (LLMs), extracting both text and images. It aims to simplify the integration of diverse data sources into LLM workflows for developers and researchers.

How It Works

The core of the library is the Attachments class, which acts as a central "funnel." It accepts file paths or URLs and employs a pipeline of loaders, modifiers, presenters, refiners, and adapters. This modular design allows for extensibility, enabling users to contribute custom processing steps for new file formats or LLM APIs. The library supports a Domain Specific Language (DSL) for advanced customization of the processing pipeline.

Quick Start & Requirements

Primary install: pip install attachments
Additional support: pip install attachments[office] for Microsoft Office formats, pip install attachments[browser] for advanced web scraping with visual highlighting.
Prerequisites: Python 3.7+. Playwright and browser binaries are required for visual highlighting features.
Links: Demo, Quick-start, API Reference

Highlighted Details

Seamless integration with OpenAI and Anthropic APIs for multimodal LLM interactions.
DSPy integration for class-based and string-based signatures with automatic type registration.
Advanced DSL for complex pipelines, including web scraping with CSS selectors and visual highlighting.
Support for a wide range of file formats including PDF, DOCX, PPTX, CSV, JSON, and various image types.

Maintenance & Community

The project is actively developed, with an alpha version available for testing new features. Users are encouraged to contribute via GitHub Issues or Pull Requests.

Licensing & Compatibility

The library is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is currently in alpha, indicating potential for bugs and breaking changes. While extensive, the list of supported formats and adapters is still growing, with plans for audio, video, and more cloud service integrations.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

6 stars in the last 30 days