datascience-fails by xLaszlo

Curated list of articles on why data science projects fail

Created 5 years ago

463 stars

Top 64.7% on SourcePulse

View on GitHub

2 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Andrey Vasnetsov

Cofounder of Qdrant

Project Summary

This repository is a curated collection of articles and resources detailing common reasons for data science and machine learning project failures. It serves as a valuable reference for practitioners, managers, and researchers aiming to avoid pitfalls in AI/ML initiatives. The project categorizes failure points across organizational, technical, and product-related aspects, offering practical insights to improve project success rates.

How It Works

The project compiles links to articles, blog posts, and research papers that analyze data science project failures. It categorizes these failures into broad themes such as organizational issues (leadership, employees, infrastructure), intermediate concerns (legal, privacy, bias, security), product planning (business value, specification), project execution (data, modeling), and ongoing product management (operations). This structured approach helps users quickly identify common failure modes and understand their root causes.

Quick Start & Requirements

This is a static collection of links and does not require installation or execution. Users can browse the README for categorized links to external resources.

Highlighted Details

Comprehensive categorization of failure reasons, including organizational, intermediate, product planning, execution, and ongoing aspects.
Links to numerous external resources, including blog posts, research papers, and conference talks, providing diverse perspectives on data science failures.
Specific examples of failures, such as IBM's Watson for Oncology, Microsoft's Tay chatbot, and Amazon's recruitment AI, illustrating real-world consequences.
Discussion of common pitfalls like data quality issues, lack of domain expertise, organizational silos, and misaligned business/technical goals.

Maintenance & Community

The project is maintained by xLaszlo. Users are encouraged to suggest additional articles via the Issues tab. The author also shares updates on Twitter (@xLaszlo) and links to their company blog (hypergolic.co.uk).

Licensing & Compatibility

The repository content is presented as a collection of links to external resources. The specific licensing of the linked external content varies by source.

Limitations & Caveats

The README notes a "notable absence of any concern about domain experts and any collaboration with them" in the collected failures, suggesting this is a potential blind spot in the current categorization. The project is a curated list and does not provide tools or methodologies for active failure prevention.

Health Check

Last Commit

5 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

0 stars in the last 30 days