vector-search-class-notes  by edoliberty

Class notes on vector search and databases

created 2 years ago
320 stars

Top 84.6% on SourcePulse

Project Summary

This repository contains class notes for Princeton's COS 597A "Long Term Memory in AI - Vector Search and Databases" course. It covers the theoretical foundations and practical implementation of vector search, targeting students and researchers interested in AI, database management, and large-scale information retrieval systems. The notes provide a comprehensive overview of embeddings, algorithms, and systems crucial for modern AI applications.

How It Works

The course treats vector search as a core component of AI systems and explains how embeddings serve as an intermediate representation that supports scalability and explainability. It covers embedding techniques for text and images, dimensionality reduction methods such as SVD and PCA, and approximate nearest neighbor search (ANNS) algorithms such as Locality Sensitive Hashing (LSH). The curriculum also covers clustering with k-means, inverted file (IVF) indexes built on those clusters, and quantization methods for vector compression, and it ends with graph-based indexes such as HNSW.
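
To make the LSH idea mentioned above concrete, here is a minimal sketch of random-hyperplane hashing for cosine similarity. It is not taken from the class notes; the function names (make_hyperplanes, signature, build_index, lsh_query) and all parameters are hypothetical, and the code uses only the Python standard library.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0


def make_hyperplanes(dim, n_bits, seed=0):
    # Hypothetical helper: one random Gaussian normal vector per hash bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    # Each bit records which side of a hyperplane the vector falls on.
    return tuple(1 if dot(vec, h) >= 0 else 0 for h in hyperplanes)


def build_index(vectors, hyperplanes):
    # Map each signature to the ids of the vectors that share it.
    buckets = {}
    for i, v in enumerate(vectors):
        buckets.setdefault(signature(v, hyperplanes), []).append(i)
    return buckets


def lsh_query(q, vectors, buckets, hyperplanes, top_k=1):
    # Only vectors in the query's bucket are scored, so results are approximate.
    candidates = buckets.get(signature(q, hyperplanes), [])
    ranked = sorted(candidates, key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:top_k]
```

In this sketch, more hash bits give smaller buckets and faster queries but make it more likely that a true neighbor lands in a different bucket. Practical schemes typically query several independent hash tables to recover recall.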

Quick Start & Requirements

Building the notes requires a Unix-like system with bibtex and pdflatex installed.

git clone git@github.com:edoliberty/vector-search-class-notes.git
cd vector-search-class-notes
./build

The repository also links to a Python notebook for Class 8.

Highlighted Details

  • Covers foundational AI concepts like embeddings, vector spaces, and information retrieval.
  • Details algorithms for dimensionality reduction (SVD, PCA) and Approximate Nearest Neighbor Search (LSH, HNSW).
  • Explores practical implementation aspects including clustering (k-means) and vector quantization.
  • Includes a project component with options for theoretical research, data science applications, or engineering contributions to libraries like FAISS.
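
As an illustration of how the clustering item in the list above connects to search, the following is a rough sketch of an IVF-style index: k-means partitions the vectors, and a query scans only the lists of its nearest centroids. It is an assumption-laden toy rather than the course's or FAISS's implementation; kmeans, build_ivf, ivf_search, and the nprobe parameter name are hypothetical choices for this example.

```python
import random


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_centroid(p, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, n_iters=10, seed=0):
    # Plain Lloyd iterations, initialized from k randomly sampled points.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(points, centroids):
    # Inverted lists: centroid id -> ids of the points assigned to it.
    lists = [[] for _ in centroids]
    for i, p in enumerate(points):
        lists[nearest_centroid(p, centroids)].append(i)
    return lists


def ivf_search(q, points, centroids, lists, nprobe=1, top_k=1):
    # Scan only the nprobe lists whose centroids are closest to the query.
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(q, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(q, points[i]))[:top_k]
```

Setting nprobe to the number of centroids scans every list and reduces to exact search. Smaller values trade recall for speed, which is the main tuning knob in this kind of index.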

Maintenance & Community

The course is taught by industry leaders from Pinecone and Meta (FAISS), with guest lectures from Microsoft Research (DiskANN). Contributions are welcomed via pull requests or issues.

Licensing & Compatibility

All class materials are intended for free use by academics, students, and professors.

Limitations & Caveats

The primary build process requires specific LaTeX and BibTeX dependencies. Some class sessions are marked as "No Class" due to university holidays or midterm exams.

Health Check

Last commit: 7 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 0 stars in the last 30 days
