fltr  by moritztng

CLI tool for natural language question answering over text files

Created 1 year ago
385 stars

Top 74.2% on SourcePulse

Project Summary

fltr is a command-line tool for querying text files in natural language, acting as a "grep for questions." It uses a large language model (LLM), either Mistral 7B or Mixtral 8x7B, to answer a question about the file's contents and filters the file based on the model's responses. This helps users who need to extract specific information from large text datasets without writing complex regular expressions or keyword searches.

How It Works

fltr utilizes LLMs to process text files. Users provide a text file and a natural language prompt. The tool then feeds chunks of the text file along with the prompt to the LLM. The LLM evaluates each chunk against the prompt, and fltr outputs lines where the LLM's inferred answer is affirmative. This approach allows for semantic understanding and context-aware filtering, going beyond simple pattern matching.
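The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not fltr's actual implementation: it assumes each line of the file is one unit to classify, that the prompt is simply prepended to the line, and that an answer counts as affirmative when it starts with "yes". The names `batched`, `filter_lines`, and `keyword_stub` are hypothetical, and `keyword_stub` is a trivial stand-in for the model so the sketch runs without one.

```python
from typing import Callable, Iterable, Iterator, List


def batched(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size lines."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def filter_lines(
    lines: Iterable[str],
    prompt: str,
    ask: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Keep the lines for which the answer to `prompt + line` is affirmative.

    `ask` is a hypothetical callable that takes a batch of queries and
    returns one answer string per query, in the same order.
    """
    kept: List[str] = []
    for batch in batched(lines, batch_size):
        queries = [f"{prompt} {line}" for line in batch]
        answers = ask(queries)
        for line, answer in zip(batch, answers):
            # Assumption: any answer beginning with "yes" is affirmative.
            if answer.strip().lower().startswith("yes"):
                kept.append(line)
    return kept


def keyword_stub(queries: List[str]) -> List[str]:
    """Hypothetical stand-in for the LLM: says yes if 'free money' appears."""
    return ["yes" if "free money" in q.lower() else "no" for q in queries]


if __name__ == "__main__":
    emails = ["Claim your FREE MONEY now", "Meeting moved to 3pm"]
    for line in filter_lines(emails, "Is the following email spam? Email:", keyword_stub):
        print(line)
```

Replacing `keyword_stub` with a function that calls a real model is the only change needed to turn the sketch into a working filter; the batching and the yes/no test stay the same.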

Quick Start & Requirements

  • Install: curl https://raw.githubusercontent.com/moritztng/fltr/main/install.sh -o install.sh && bash install.sh small
  • Prerequisites: Linux (x86_64) & macOS (x86_64 & arm64). Requires CUDA 12.1 compatible NVIDIA driver for GPU acceleration; otherwise, it falls back to CPU.
  • Model Size: Use small for Mistral 7B (~7GB) or replace with large for Mixtral 8x7B (~48GB).
  • Usage: fltr --file emails.txt --prompt "Is the following email spam? Email:" --batch-size 32
  • Environment: export PATH=$PATH:~/Fltr

Highlighted Details

  • Performance: RTX 3070 (8GB): Mistral 7B ~52 tok/s, Mixtral 8x7B ~28 tok/s. Intel i5-6500 (8GB): Mistral 7B ~5 tok/s, Mixtral 8x7B ~2 tok/s.
  • Batching: The --batch-size option (32 in the usage example above) processes inputs in batches, which may speed up processing.
  • Output: Filters lines where the LLM's answer to the prompt is "yes."
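To put the throughput figures above in perspective, the following sketch converts tokens per second into a rough wall-clock estimate. It assumes processing time scales linearly with token count and ignores batching and model load time; the 100,000-token workload is a hypothetical example, not a figure from the README, and `estimate_seconds` is an illustrative helper name.

```python
# Throughput figures (tokens per second) as listed in the summary above.
THROUGHPUT = {
    ("RTX 3070", "Mistral 7B"): 52.0,
    ("RTX 3070", "Mixtral 8x7B"): 28.0,
    ("Intel i5-6500", "Mistral 7B"): 5.0,
    ("Intel i5-6500", "Mixtral 8x7B"): 2.0,
}


def estimate_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Rough time estimate, assuming time grows linearly with token count."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


if __name__ == "__main__":
    workload = 100_000  # hypothetical number of tokens to process
    for (device, model), rate in THROUGHPUT.items():
        minutes = estimate_seconds(workload, rate) / 60
        print(f"{device:14s} {model:13s} ~{minutes:.0f} min")
```

Under these assumptions, Mistral 7B would need roughly half an hour on the GPU and several hours on the CPU for the same workload.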

Maintenance & Community

  • Contributors: Maintained by Moritz T.
  • Community: No explicit links to community channels (Discord, Slack) or roadmaps are provided in the README.

Licensing & Compatibility

  • License: Not specified in the README.
  • Compatibility: Runs on Linux (x86_64) and macOS (x86_64 and arm64). GPU acceleration requires a CUDA 12.1 compatible NVIDIA driver.

Limitations & Caveats

The README does not specify a license, which matters for commercial use or integration into closed-source projects. CPU performance is limited: by the figures listed above, Mistral 7B runs at about 5 tok/s on an Intel i5-6500 versus about 52 tok/s on an RTX 3070, so GPU acceleration is a practical necessity for reasonable throughput. The tool's effectiveness also depends heavily on how accurately the LLM interprets the prompt and the text content.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
