potato by davidjurgens

AI-powered multi-modal annotation tool for NLP research

Created 5 years ago
366 stars

Top 77.4% on SourcePulse

Project Summary

Potato is a lightweight, configuration-driven annotation tool designed for rapid deployment in NLP research, requiring no coding for setup. It addresses the need for efficient, self-hosted annotation across multiple data modalities, offering AI assistance and robust quality control features. The tool benefits researchers and teams by providing full data control and significantly reducing setup time compared to custom-coded solutions.

How It Works

Potato employs a YAML configuration system to define annotation tasks, abstracting away complex coding requirements. It supports multi-modal data including text, audio, video, images, and dialogue, with specialized annotation schemes for each. Core to its design is integrated AI assistance, leveraging LLMs for label suggestions and active learning to prioritize uncertain instances, thereby accelerating the annotation process.
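
The idea of prioritizing uncertain instances can be illustrated with entropy-based uncertainty sampling. This is a minimal sketch and not Potato's own implementation: the function names, the instance ids, and the probability values are all hypothetical, and the summary above does not say which uncertainty measure Potato actually uses.

```python
import math
from typing import Dict, List, Tuple


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(predictions: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (instance id, entropy) pairs, most uncertain first."""
    scored = [(instance_id, entropy(probs)) for instance_id, probs in predictions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical model outputs: instance id -> probability per label.
predictions = {
    "doc-1": [0.98, 0.01, 0.01],
    "doc-2": [0.40, 0.35, 0.25],
    "doc-3": [0.70, 0.20, 0.10],
}

# Annotators would see the instances in this order.
queue = rank_by_uncertainty(predictions)
for instance_id, score in queue:
    print(f"{instance_id}: entropy={score:.3f}")
```

Instances whose predicted distribution is close to uniform come first in the queue, so human effort goes to the items the model is least sure about.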

Quick Start & Requirements

  • Primary install: pip install potato-annotation
  • Prerequisites: Python. Dependencies are managed via requirements.txt when running from source.
  • Quick Start Commands: potato list all, potato get <template>, potato start <template>, or python potato/flask_server.py start <config_path> -p <port>.
  • Links:
    • Documentation: potato-annotation.readthedocs.io
    • Example Projects: Available in the project-hub/ directory and the Potato Showcase.

Highlighted Details

  • Multi-Modal Annotation: Supports text (classification, span labeling, pairwise comparison), audio (waveform visualization, segmentation), video (frame-by-frame, temporal segments), images (region labeling, classification), and dialogue (turn-level, threading).
  • AI-Powered Assistance: Features LLM integration for label suggestions and active learning, supporting backends like OpenAI, Anthropic, Ollama, and vLLM.
  • Quality Control: Includes attention checks, gold standards, inter-annotator agreement calculation (Krippendorff's alpha), and time tracking.
  • Deployment Options: Offers local development, team annotation with authentication, crowdsourcing integration (Prolific, MTurk), and an enterprise-grade MySQL backend.
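
The agreement statistic named under Quality Control, Krippendorff's alpha, can be sketched for the nominal (unordered label) case as follows. This is an illustrative implementation under stated assumptions, not Potato's code: the function name and the sample data are hypothetical, and the summary does not say which distance metric Potato applies.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal labels.

    Each element of `units` holds the labels one item received;
    annotators who skipped the item are simply left out.
    """
    # Coincidence matrix: every ordered pair of labels from different
    # annotators on the same item contributes 1 / (m - 1).
    coincidences: Counter = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label value.
    totals: Counter = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable labels")

    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - observed / expected


# Hypothetical data: four items, two annotators each.
sample = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
alpha = krippendorff_alpha_nominal(sample)
print(f"alpha = {alpha:.3f}")
```

Alpha equals 1 for perfect agreement and is close to 0 when agreement is no better than chance, which makes it usable as a threshold for deciding whether a label set needs clearer guidelines.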

Maintenance & Community

  • Support: Primarily through GitHub Issues.
  • Contact: For questions, reach out to pedropei@umich.edu or jurgens@umich.edu.
  • Documentation: Available at potato-annotation.readthedocs.io.

Licensing & Compatibility

Potato is dual-licensed: the Polyform Shield license covers non-commercial use, and a commercial license is available by contacting the developers. Academic research, internal company annotation, forking for personal development, and integration into open-source pipelines are permitted.

Limitations & Caveats

The Polyform Shield license imposes restrictions on commercial use, requiring a separate license for commercial annotation services or integration into competing proprietary platforms.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days
