nomic by nomic-ai

Python client for massive unstructured data interaction

created 3 years ago
1,761 stars

Top 24.9% on sourcepulse

Project Summary

nomic is the Python client for Nomic Atlas, a browser-based platform for exploring, labeling, searching, and sharing massive unstructured datasets. It targets researchers and developers working with text, image, audio, and video data who need to find patterns in that data and keep it organized.

How It Works

Atlas leverages embeddings to represent unstructured data, allowing for semantic search and clustering into topics. It generates and stores these embeddings, providing access to both high-dimensional latent representations and 2D projections for visualization. The platform facilitates programmatic access to data structures, individual data points, and automatically generated topic models, enabling both coding-based and no-code interaction.
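To make the embedding idea concrete, here is a minimal sketch of semantic search by cosine similarity. It is not the nomic client API: the function names (`cosine_similarity`, `semantic_search`) and the three-dimensional toy vectors are assumptions made up for illustration, and real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, embeddings, top_k=2):
    """Return the top_k (label, score) pairs most similar to query_vec."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings; a real system would compute these with a model.
toy_embeddings = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "tax form": [0.0, 0.2, 0.9],
}

query = [1.0, 0.2, 0.0]  # stands in for the embedding of a query like "pets"
results = semantic_search(query, toy_embeddings)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

Items whose vectors point in a similar direction rank first, which is why the two animal photos outrank the tax form even though no keywords are compared.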

Quick Start & Requirements

Highlighted Details

  • Supports datasets from hundreds to tens of millions of points.
  • Handles multiple data modalities: text, image, audio, video.
  • Features semantic search, topic clustering, data tagging, and deduplication.
  • Offers shareable, interactive maps with or without coding.

Maintenance & Community

Licensing & Compatibility

  • License details are not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial adoption. There are no explicit mentions of supported operating systems or hardware requirements beyond general Python compatibility.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 113 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

argilla by argilla-io

0.4%
5k
Collaboration tool for building high-quality AI datasets
created 4 years ago
updated 5 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Elie Bursztein (Cybersecurity Lead at Google DeepMind).

LightRAG by HKUDS

1.0%
19k
RAG framework for fast, simple retrieval-augmented generation
created 10 months ago
updated 21 hours ago