Class notes on vector search and databases
This repository contains class notes for Princeton's COS 597A "Long Term Memory in AI - Vector Search and Databases" course. It covers the theoretical foundations and practical implementation of vector search, targeting students and researchers interested in AI, database management, and large-scale information retrieval systems. The notes provide a comprehensive overview of embeddings, algorithms, and systems crucial for modern AI applications.
How It Works
The course material treats vector search as a core component of AI systems and explains how embeddings serve as an intermediate representation that supports scalability and explainability. It covers embedding techniques for text and images, dimensionality reduction methods such as SVD and PCA, and approximate nearest neighbor search (ANNS) algorithms such as Locality Sensitive Hashing (LSH). The curriculum also covers clustering-based indexing (k-means and inverted file indexes, IVF) and quantization methods for vector compression, and ends with graph-based indexing approaches such as HNSW.
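To make the LSH idea above concrete, here is a minimal sketch of random-hyperplane hashing for cosine similarity, written in pure Python. It is not taken from the class notes: the function names (`make_hyperplanes`, `signature`, `build_index`, `query`) and the parameter choices are hypothetical and chosen only for illustration. The sketch assumes small in-memory lists of floats and a single hash table, whereas practical systems typically use several tables and vectorized math.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group vector ids into buckets keyed by their bit signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, hyperplanes), []).append(idx)
    return buckets


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def query(q, vectors, buckets, hyperplanes, k=1):
    """Rank only the vectors that share q's bucket (approximate search)."""
    candidates = buckets.get(signature(q, hyperplanes), [])
    ranked = sorted(candidates, key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:k]


# Toy usage: three 3-dimensional vectors hashed with 8 hyperplanes.
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
planes = make_hyperplanes(num_bits=8, dim=3, seed=42)
index = build_index(vectors, planes)
nearest = query([2.0, 0.0, 0.0], vectors, index, planes, k=1)
```

Vectors separated by a small angle agree on most bits, so they tend to share a bucket, and the query ranks only that bucket's candidates instead of scanning the whole collection. The trade-off is recall: a true neighbor that differs in even one bit is missed, which is why more bits make buckets smaller and faster to scan but less forgiving.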
Quick Start & Requirements
To build the notes, you need a Unix-like system with bibtex and pdflatex installed.
git clone git@github.com:edoliberty/vector-search-class-notes.git
cd vector-search-class-notes
./build
The repository also links to a Python notebook for Class 8.
Maintenance & Community
The course is taught by industry practitioners from Pinecone and Meta (FAISS), with guest lectures from Microsoft Research (DiskANN). Contributions are welcome via pull requests or issues.
Licensing & Compatibility
All class materials are intended for free use by academics, students, and professors.
Limitations & Caveats
Building the notes requires a working LaTeX installation that provides pdflatex and bibtex. Some sessions in the schedule are marked "No Class" because of university holidays or midterm exams.