A production-ready Retrieval-Augmented Generation (RAG) system with advanced hybrid search capabilities, combining semantic understanding with keyword precision for superior document retrieval. Designed for on-premise deployment and seamless n8n integration.
- Hybrid Search: Combines semantic (dense) and keyword (sparse) search for optimal results
- Advanced Ranking: Multi-stage ranking with Reciprocal Rank Fusion and LLM-based reranking
- Multi-Format Support: Processes PDF, TXT, CSV, and JSON files with format-specific strategies
- Intelligent Chunking: Document-type specific chunking with configurable overlap
- Azure OpenAI Integration: Uses text-embedding-3-large (3072 dimensions) and gpt-4o-mini
- MinIO Storage: S3-compatible object storage with automatic deduplication
- Qdrant Vector Database: High-performance vector search with named vectors
- RESTful API: FastAPI-based endpoints with comprehensive error handling
- Docker Deployment: Fully containerized with health checks and monitoring
- Enhanced Name Search: Special handling for person/entity name queries
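The automatic deduplication in MinIO storage is typically content-hash based; a minimal sketch of the idea (the function names and storage layout here are illustrative assumptions, not the project's actual code):

```python
import hashlib

def content_key(data: bytes) -> str:
    """Derive a deterministic object key from file content so that
    re-uploading identical bytes maps to the same MinIO object."""
    return hashlib.sha256(data).hexdigest()

def is_duplicate(data: bytes, existing_keys: set[str]) -> bool:
    """True if an object with the same content hash is already stored."""
    return content_key(data) in existing_keys

# Identical bytes always produce the same key, so re-uploads are detected
seen = {content_key(b"hello world")}
print(is_duplicate(b"hello world", seen))  # True
print(is_duplicate(b"hello there", seen))  # False
```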
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ FastAPI │────▶│ MinIO │ │ Qdrant │
│ API │ │ Storage │ │ Vector │
└─────────────┘ └─────────────┘ │ Database │
│ └─────────────┘
│ ▲
▼ │
┌─────────────┐ ┌─────────────┐ │
│ Unstructured│────▶│ Azure │────────────┘
│ Processor │ │ OpenAI │
└─────────────┘ └─────────────┘
This system implements a sophisticated hybrid search approach that combines the strengths of both semantic and keyword-based search:
- Dense (semantic) search:
  - Uses Azure OpenAI's text-embedding-3-large model (3072 dimensions)
  - Captures semantic meaning and context
  - Excellent for conceptual queries and paraphrasing
  - Handles synonyms and related concepts naturally
- Sparse (keyword) search:
  - Uses Qdrant's built-in sparse vector implementation
  - Preserves exact keyword matching capabilities
  - Critical for technical terms, names, and specific identifiers
  - Ensures important keywords aren't lost in semantic abstraction
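A sparse vector is essentially a weighted term-frequency map over a vocabulary. The sketch below shows the idea in simplified form; real implementations use a proper tokenizer and BM25-style weighting, so treat this as illustrative only:

```python
import re
from collections import Counter

def to_sparse_vector(text: str, vocab: dict[str, int]) -> dict[int, float]:
    """Map each known token to its vocabulary index with a term-frequency weight."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}

vocab = {"ssl": 0, "certificate": 1, "error": 2}
print(to_sparse_vector("SSL certificate error: certificate expired", vocab))
# {0: 1.0, 1: 2.0, 2: 1.0}
```

Because only matching vocabulary entries get a weight, exact terms like "SSL" survive even when a dense embedding would blur them into general "security" semantics.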
┌─────────────────┐
│ User Query │
└────────┬────────┘
│
┌────┴────┐
│ Process │
└────┬────┘
│
┌────┴────┐ ┌────┴────┐
│ Dense │ │ Sparse │
│ Search │ │ Search │
│(Semantic)│ │(Keyword)│
└────┬────┘ └────┬────┘
│ │
└──────────┬─────────────┘
│
┌──────┴──────┐
│ RRF │
│ Fusion │
└──────┬──────┘
│
┌──────┴──────┐
│ Optional │
│ LLM Rerank │
└──────┬──────┘
│
┌──────┴──────┐
│ Results │
└─────────────┘
- Query Processing: the user query is simultaneously:
  - Embedded into a dense vector for semantic search
  - Tokenized into a sparse vector for keyword search
- Parallel Search: both search methods run concurrently in Qdrant:
  - Dense search finds semantically similar documents
  - Sparse search finds keyword matches
- Reciprocal Rank Fusion (RRF):
  - RRF_score = Σ(1 / (k + rank_i))
  - Combines results from both searches
  - k=60 (constant) prevents bias toward top results
  - Creates a unified ranking preserving both semantic and keyword relevance
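The RRF step can be sketched directly from the formula above. This is a minimal illustration of the fusion logic, not the production code path (Qdrant can also fuse server-side):

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked ID lists with Reciprocal Rank Fusion:
    score(doc) = sum over lists of 1 / (k + rank), with rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks highly in both lists, so it wins the fused ranking
print(rrf_fuse(["a", "b", "c"], ["b", "d", "a"]))  # ['b', 'a', 'd', 'c']
```

Note how documents appearing in both lists ("a", "b") accumulate score from each, which is exactly why RRF preserves both semantic and keyword relevance.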
- Context Enrichment: Top-K results are sent to GPT-4o-mini
- Relevance Assessment: LLM evaluates each result against the query
- Smart Reordering: Results are reranked based on:
- Contextual understanding
- Query intent matching
- Information completeness
- Best of Both Worlds: Captures both meaning and precision
- Robust to Query Variations: Works with natural language and specific terms
- Context-Aware: Understands document relationships and intent
- Special handling for person/entity queries
- Exact match prioritization for names
- Prevents semantic drift for proper nouns
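Name detection is conceptually a check for runs of capitalized tokens. A rough heuristic sketch follows; the project's actual detector may be more sophisticated, and the function name is illustrative:

```python
import re

def extract_candidate_names(query: str) -> list[str]:
    """Heuristic: runs of two or more capitalized words are treated as
    probable person/entity names that should be matched exactly."""
    return re.findall(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b", query)

print(extract_candidate_names("John Smith project updates"))          # ['John Smith']
print(extract_candidate_names("how to improve team communication"))  # []
```

Detected names can then be boosted in the sparse search or required as exact matches, preventing the semantic drift described above.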
- Configurable Alpha Weight (HYBRID_ALPHA=0.7): Tune semantic vs keyword importance
- Search Type Selection: Choose hybrid, dense-only, or sparse-only per query
- Optional Reranking: Balance speed vs accuracy based on use case
- Parallel Processing: Dense and sparse searches run concurrently
- Batch Embeddings: Efficient processing of multiple documents
- Fallback Strategies: Graceful degradation if one method fails
- Format-Specific Processing: Optimal handling for PDFs, CSVs, JSON, TXT
- Smart Chunking: Preserves context with configurable overlap
- Metadata Preservation: Maintains source, position, and type information
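The overlapping chunker can be sketched as a sliding character window, assuming the CHUNK_SIZE/CHUNK_OVERLAP semantics from the configuration section. Real document-type-aware chunkers also respect sentence and section boundaries, so this is only the core idea:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the last
    `overlap` characters of the previous one, so boundary context survives."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Small sizes chosen to make the overlap visible
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```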
- Docker and Docker Compose
- Python 3.11+ (for local testing)
- 8GB+ RAM recommended
git clone <repository>
cd python-rag
# Ensure .env file has your Azure credentials
# (Already configured in the provided .env)

# Start all services
docker-compose up -d

# Check service health
docker-compose ps

# View logs
docker-compose logs -f

# Install test dependencies
pip install aiohttp

# Run the test suite
python test_api.py

GET /health

POST /upload
Content-Type: multipart/form-data
# Example with curl:
curl -X POST -F "[email protected]" http://localhost:8000/upload

POST /search
Content-Type: application/json
{
"query": "your search query",
"top_k": 10,
"search_type": "hybrid" # Options: "hybrid", "dense", "sparse"
}

POST /ask
Content-Type: application/json
{
"query": "your search query"
}

POST /index/refresh

GET /stats

This API is designed to work as a tool in n8n workflows for RAG patterns.
- Search Endpoint:
  - Method: POST
  - URL: http://your-host:8000/search
  - Body Type: JSON
  - Body: { "query": "{{ $json.query }}", "top_k": 10, "search_type": "hybrid" }
- Upload Endpoint:
  - Method: POST
  - URL: http://your-host:8000/upload
  - Body Type: Form-Data
  - Send Binary Data: Yes
{
"nodes": [
{
"name": "RAG Search",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"method": "POST",
"url": "http://localhost:8000/search",
"jsonParameters": true,
"options": {},
"bodyParametersJson": {
"query": "{{ $json.userQuery }}",
"top_k": 5
}
}
}
]
}

Key settings in .env that control ranking behavior:
# Chunking
CHUNK_SIZE=512 # Characters per chunk
CHUNK_OVERLAP=50 # Overlap between chunks
# Search & Ranking
HYBRID_ALPHA=0.7 # Dense vs sparse weight (0.7 = 70% semantic, 30% keyword)
TOP_K_RESULTS=10 # Final results to return
RERANK_TOP_K=20 # Candidates for LLM reranking
ENABLE_RERANKING=true # Toggle LLM-based reranking
RRF_K=60 # Reciprocal Rank Fusion constant
# Azure OpenAI
AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large
AZURE_LLM_DEPLOYMENT=gpt-4o-mini

Search with metadata filters:
{
"query": "workshop",
"filters": {
"file_type": "pdf",
"filename": "Workshop_Manual.pdf"
}
}

Process multiple files from MinIO:
# Upload files to MinIO bucket
# Then refresh index
curl -X POST http://localhost:8000/index/refresh

- Hybrid Alpha (HYBRID_ALPHA):
  - 0.0: Pure keyword search (best for exact matches)
  - 0.5: Balanced semantic and keyword
  - 0.7: Default, emphasizes semantic understanding
  - 1.0: Pure semantic search (best for concepts)
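One common way an alpha weight like this is applied is as a convex blend of normalized dense and sparse scores. The sketch below assumes that interpretation (the service may instead weight the RRF contributions, so check the code before relying on it):

```python
def blend_scores(dense: dict[str, float], sparse: dict[str, float],
                 alpha: float = 0.7) -> dict[str, float]:
    """score = alpha * dense + (1 - alpha) * sparse; missing scores count as 0.
    Assumes both score sets are already normalized to a comparable range."""
    docs = set(dense) | set(sparse)
    return {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
            for d in docs}

# With alpha=0.7, a strong semantic match ("a") outranks a pure keyword hit ("b")
scores = blend_scores({"a": 0.9, "b": 0.2}, {"b": 1.0, "c": 0.8}, alpha=0.7)
print(max(scores, key=scores.get))  # 'a'
```

Rerunning the same example with alpha=0.0 would rank "b" first, which is exactly the exact-match behavior described above.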
- Chunk Configuration:
  - Size: Larger chunks (1024) preserve context, smaller (256) increase precision
  - Overlap: Higher overlap (100) prevents boundary loss, lower (0) maximizes coverage
- Reranking Strategy:
  - Enable for critical queries requiring the highest accuracy
  - Disable for real-time applications needing sub-second responses
  - Adjust RERANK_TOP_K to balance quality vs API costs
- Embedding Batch Size: Adjust EMBEDDING_BATCH_SIZE for API rate limits
- Concurrent Searches: Hybrid search runs dense and sparse in parallel
- Caching: Results are cached for repeated queries
- Index Optimization: Regular index refresh maintains search quality
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f app

- Username: minioadmin
- Password: minioadmin
# Check Docker resources
docker system df
# Restart services
docker-compose down
docker-compose up -d

- Check Azure OpenAI rate limits
- Reduce EMBEDDING_BATCH_SIZE
- Enable request caching
- Poor Semantic Results:
  - Increase HYBRID_ALPHA toward 1.0
  - Check the embedding model deployment
  - Verify the chunk size isn't too small
- Missing Exact Matches:
  - Decrease HYBRID_ALPHA toward 0.0
  - Ensure sparse vectors are being generated
  - Check that tokenization isn't removing important terms
- Irrelevant Results:
  - Enable reranking with ENABLE_RERANKING=true
  - Increase RERANK_TOP_K for more candidates
  - Adjust chunk overlap for better context
The API provides REST endpoints that can be easily integrated with n8n workflows:
- Use HTTP Request nodes to interact with the API
- Available endpoints:
  - /search: Search your knowledge base
  - /ask: Get AI-powered answers
  - /stats: Monitor your RAG system
- Technical Documentation: Balances technical terms with conceptual search
- Knowledge Management: Handles diverse query types from different users
- Customer Support: Finds answers using both keywords and intent
- Research Libraries: Combines citation search with topic exploration
- Enterprise Search: Handles acronyms, names, and concepts equally well
- Technical Query: "SSL certificate error"
  - Sparse search ensures "SSL" and "certificate" are found
  - Dense search includes related concepts like "TLS" or "security"
- Conceptual Query: "How to improve team communication"
  - Dense search dominates, finding semantically related content
  - Sparse search still catches exact phrase matches
- Name Search: "John Smith project updates"
  - Enhanced name detection prioritizes exact "John Smith" matches
  - Semantic search finds related project content
# Install dependencies
pip install -r requirements.txt
# Run locally (requires services running)
cd app
uvicorn main:app --reload

- Extend DocumentProcessor in services/document_processor.py
- Add parsing logic for the new type
- Update chunking strategy if needed
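Extending the processor for a new format might look like the sketch below. The class and method names are illustrative stand-ins; match them to the actual DocumentProcessor in services/document_processor.py:

```python
class DocumentProcessor:
    """Simplified stand-in for the project's processor: dispatches on extension."""

    def process(self, filename: str, data: bytes) -> list[str]:
        if filename.endswith(".md"):  # newly added format
            return self._process_markdown(data)
        raise ValueError(f"unsupported file type: {filename}")

    def _process_markdown(self, data: bytes) -> list[str]:
        # Split on blank lines so each paragraph becomes a chunkable unit
        text = data.decode("utf-8")
        return [p.strip() for p in text.split("\n\n") if p.strip()]

proc = DocumentProcessor()
print(proc.process("notes.md", b"# Title\n\nFirst paragraph.\n\nSecond."))
# ['# Title', 'First paragraph.', 'Second.']
```

The per-format method returns pre-split units that then flow into the normal chunking and embedding pipeline.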
- Accuracy: Traditional semantic-only RAG systems can miss critical exact matches (product codes, names, technical terms). Our hybrid approach ensures nothing is lost.
- Flexibility: Single embedding models can't handle all query types equally well. By combining approaches, we excel at both natural language questions and specific keyword searches.
- Performance: Parallel processing and intelligent caching provide fast responses even with large document collections.
- Control: On-premise deployment with configurable ranking weights gives you full control over search behavior and data security.
- Integration: REST API design makes it easy to integrate with existing workflows, especially n8n automation.
This project is provided as-is for on-premise deployment.
For issues or questions:
- Check the logs first
- Ensure all services are healthy
- Verify Azure credentials are correct
- Check example data format matches your use case