ai-powered-search  by treygrainger

Code examples for "AI-Powered Search" book

created 5 years ago
301 stars

Top 89.6% on sourcepulse

Project Summary

This repository provides Python code examples for the book "AI-Powered Search", published by Manning Publications. The book teaches developers and data scientists how to build intelligent, continuously learning search engines using modern machine learning techniques, including semantic search, LLM integration, and personalized search.

How It Works

The project leverages Jupyter Notebooks for interactive code execution, packaged within Docker containers for simplified setup. It demonstrates advanced search concepts such as semantic search via dense vector embeddings, Retrieval Augmented Generation (RAG), question answering with LLMs, and machine-learned ranking models. The approach emphasizes data-science-driven techniques to create search engines that understand natural language nuances and user context.
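To make the dense-vector and RAG ideas above concrete, here is a minimal sketch in plain Python. It is not taken from the repository: the three-dimensional "embeddings", the document ids, and the helper names (cosine_similarity, rank_by_similarity, build_rag_prompt) are all hypothetical, hand-written stand-ins. A real system would obtain vectors from an embedding model and send the assembled prompt to an LLM.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, doc_vectors, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_rag_prompt(question, passages):
    """Assemble a prompt that asks an LLM to answer from the retrieved passages only."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical toy corpus; the vectors are made up by hand, not model output.
DOCS = {
    "doc_solr": "Apache Solr is an open source search engine.",
    "doc_docker": "Docker packages applications into containers.",
    "doc_rag": "Retrieval Augmented Generation feeds retrieved text to an LLM.",
}
DOC_VECTORS = {
    "doc_solr": [0.9, 0.1, 0.2],
    "doc_docker": [0.1, 0.9, 0.1],
    "doc_rag": [0.7, 0.1, 0.7],
}
QUERY_VECTOR = [0.8, 0.0, 0.5]  # stand-in for an embedded question

if __name__ == "__main__":
    hits = rank_by_similarity(QUERY_VECTOR, DOC_VECTORS, top_k=2)
    passages = [DOCS[doc_id] for doc_id, _ in hits]
    print(build_rag_prompt("What is RAG?", passages))
```

The retrieval step ranks documents by vector similarity rather than by keyword overlap, and the generation step is reduced here to prompt assembly so the sketch stays runnable without any model or API key.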

Quick Start & Requirements

  • Install Docker.
  • Clone the repository: git clone https://github.com/treygrainger/ai-powered-search.git
  • Navigate to the directory: cd ai-powered-search
  • Build and run: docker compose up
  • Access notebooks at http://localhost:8888.
  • Note: Docker is the only prerequisite, but the initial image build may take a while.
  • Full instructions are in Appendix A of the book.
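Once the containers are running, a small script can confirm that something is answering at the notebook address before you open a browser. This is a sketch, not part of the repository: the function name notebook_server_is_up is hypothetical, and it assumes the default URL from the steps above. It only checks that an HTTP server responds, not that it is actually Jupyter.

```python
import urllib.error
import urllib.request


def notebook_server_is_up(url="http://localhost:8888", timeout=3.0):
    """Return True if any HTTP server answers at url, False if the connection fails."""
    # Bypass any system proxy settings, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (for example, it wants a
        # login token), which still means it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if notebook_server_is_up():
        print("Notebook server is responding.")
    else:
        print("No response yet; the build may still be running.")
```

HTTPError is caught before URLError because it is a subclass of it; swapping the order would report a running but token-protected server as down.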

Highlighted Details

  • Demonstrates semantic search using foundation model embeddings.
  • Covers Retrieval Augmented Generation (RAG) and LLM integration for Q&A.
  • Includes techniques for personalized search and machine-learned ranking.
  • Utilizes Python and PySpark for data processing, with Apache Solr as the default search engine.
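Since Apache Solr is the default engine, a keyword query ultimately becomes an HTTP request to a collection's select handler. The sketch below only builds such a URL and does not send it. The helper name build_solr_select_url, the collection name "products", and the host and port are assumptions for illustration (8983 is Solr's usual default port; the repository's containers may expose something different).

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, collection, query, rows=10, fields=("id", "score")):
    """Build a URL for Solr's /select handler using standard query parameters."""
    params = {
        "q": query,               # the query string, e.g. "title:laptop"
        "rows": rows,             # maximum number of documents to return
        "fl": ",".join(fields),   # field list to include in each result
        "wt": "json",             # response format
    }
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and collection; adjust to match your own setup.
    print(build_solr_select_url("http://localhost:8983", "products", "title:laptop", rows=5))
```

Using urlencode keeps special characters in the query (spaces, colons, commas) correctly escaped, which is easy to get wrong when concatenating strings by hand.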

Maintenance & Community

The project accompanies the book "AI-Powered Search" by Trey Grainger, Doug Turnbull, and Max Irwin. Questions can be asked on Manning's LiveBook forum or filed as GitHub issues, and changes can be proposed through pull requests on the repository.

Licensing & Compatibility

The code is released under the Apache License, Version 2.0 (ASL 2.0). External dependencies and datasets may carry different licenses, so check each one before using it in a commercial or closed-source project.

Limitations & Caveats

The provided datasets are for demonstration purposes only and may be subject to change. While Apache Solr is the default engine, support for other search engines and vector databases is noted as forthcoming.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 32 stars in the last 90 days
