Skip to content

valantic-CEC-Deutschland-GmbH/workoflow-rag

Repository files navigation

Hybrid Search RAG API

A production-ready Retrieval-Augmented Generation (RAG) system with advanced hybrid search capabilities, combining semantic understanding with keyword precision for superior document retrieval. Designed for on-premise deployment and seamless n8n integration.

Features

  • Hybrid Search: Combines semantic (dense) and keyword (sparse) search for optimal results
  • Advanced Ranking: Multi-stage ranking with Reciprocal Rank Fusion and LLM-based reranking
  • Multi-Format Support: Processes PDF, TXT, CSV, and JSON files with format-specific strategies
  • Intelligent Chunking: Document-type specific chunking with configurable overlap
  • Azure OpenAI Integration: Uses text-embedding-3-large (3072 dimensions) and gpt-4o-mini
  • MinIO Storage: S3-compatible object storage with automatic deduplication
  • Qdrant Vector Database: High-performance vector search with named vectors
  • RESTful API: FastAPI-based endpoints with comprehensive error handling
  • Docker Deployment: Fully containerized with health checks and monitoring
  • Enhanced Name Search: Special handling for person/entity name queries

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   FastAPI   │────▶│   MinIO     │     │   Qdrant    │
│     API     │     │  Storage    │     │   Vector    │
└─────────────┘     └─────────────┘     │  Database   │
       │                                 └─────────────┘
       │                                        ▲
       ▼                                        │
┌─────────────┐     ┌─────────────┐            │
│ Unstructured│────▶│    Azure    │────────────┘
│  Processor  │     │   OpenAI    │
└─────────────┘     └─────────────┘

Core Concepts

Hybrid Search Architecture

This system implements a sophisticated hybrid search approach that combines the strengths of both semantic and keyword-based search:

1. Dense Vectors (Semantic Search)

  • Uses Azure OpenAI's text-embedding-3-large model (3072 dimensions)
  • Captures semantic meaning and context
  • Excellent for conceptual queries and paraphrasing
  • Handles synonyms and related concepts naturally

2. Sparse Vectors (Keyword Search)

  • Uses Qdrant's built-in sparse vector implementation
  • Preserves exact keyword matching capabilities
  • Critical for technical terms, names, and specific identifiers
  • Ensures important keywords aren't lost in semantic abstraction

Document Ranking & Reranking Process

┌─────────────────┐
│   User Query    │
└────────┬────────┘
         │
    ┌────┴────┐
    │ Process │
    └────┬────┘
         │
    ┌────┴────┐              ┌────┴────┐
    │  Dense  │              │ Sparse  │
    │ Search  │              │ Search  │
    │(Semantic)│             │(Keyword)│
    └────┬────┘              └────┬────┘
         │                        │
         └──────────┬─────────────┘
                     │
              ┌──────┴──────┐
              │     RRF     │
              │   Fusion    │
              └──────┬──────┘
                     │
              ┌──────┴──────┐
              │  Optional   │
              │ LLM Rerank  │
              └──────┬──────┘
                     │
              ┌──────┴──────┐
              │   Results   │
              └─────────────┘

Stage 1: Initial Retrieval

  1. Query Processing: User query is simultaneously:

    • Embedded into a dense vector for semantic search
    • Tokenized into sparse vectors for keyword search
  2. Parallel Search: Both search methods run concurrently in Qdrant:

    • Dense search finds semantically similar documents
    • Sparse search finds keyword matches
  3. Reciprocal Rank Fusion (RRF):

    RRF_score = Σ(1 / (k + rank_i))
    
    • Combines results from both searches
    • k=60 (constant) prevents bias toward top results
    • Creates unified ranking preserving both semantic and keyword relevance

Stage 2: LLM-Based Reranking (Optional)

  1. Context Enrichment: Top-K results are sent to GPT-4o-mini
  2. Relevance Assessment: LLM evaluates each result against the query
  3. Smart Reordering: Results are reranked based on:
    • Contextual understanding
    • Query intent matching
    • Information completeness

Key Advantages

1. Superior Retrieval Quality

  • Best of Both Worlds: Captures both meaning and precision
  • Robust to Query Variations: Works with natural language and specific terms
  • Context-Aware: Understands document relationships and intent

2. Enhanced Name Search

  • Special handling for person/entity queries
  • Exact match prioritization for names
  • Prevents semantic drift for proper nouns

3. Flexibility

  • Configurable Alpha Weight (HYBRID_ALPHA=0.7): Tune semantic vs keyword importance
  • Search Type Selection: Choose hybrid, dense-only, or sparse-only per query
  • Optional Reranking: Balance speed vs accuracy based on use case

4. Performance Optimization

  • Parallel Processing: Dense and sparse searches run concurrently
  • Batch Embeddings: Efficient processing of multiple documents
  • Fallback Strategies: Graceful degradation if one method fails

5. Document Intelligence

  • Format-Specific Processing: Optimal handling for PDFs, CSVs, JSON, TXT
  • Smart Chunking: Preserves context with configurable overlap
  • Metadata Preservation: Maintains source, position, and type information

Quick Start

1. Prerequisites

  • Docker and Docker Compose
  • Python 3.11+ (for local testing)
  • 8GB+ RAM recommended

2. Clone and Setup

git clone <repository>
cd python-rag

# Ensure .env file has your Azure credentials
# (Already configured in the provided .env)

3. Start Services

# Start all services
#docker-compose up -d

# Check service health
docker-compose ps

# View logs
docker-compose logs -f

4. Test the API

# Install test dependencies
pip install aiohttp

# Run test suite
python test_api.py

API Endpoints

Health Check

GET /health

Upload File

POST /upload
Content-Type: multipart/form-data

# Example with curl:
curl -X POST -F "[email protected]" http://localhost:8000/upload

Search

POST /search
Content-Type: application/json

{
  "query": "your search query",
  "top_k": 10,
  "search_type": "hybrid"  # Options: "hybrid", "dense", "sparse"
}

Ask

POST /ask
Content-Type: application/json

{
  "query": "your search query"
}

Refresh Index

POST /index/refresh

Get Statistics

GET /stats

n8n Integration

This API is designed to work as a tool in n8n workflows for RAG patterns.

n8n HTTP Request Node Configuration

  1. Search Endpoint:

    • Method: POST
    • URL: http://your-host:8000/search
    • Body Type: JSON
    • Body:
      {
        "query": "{{ $json.query }}",
        "top_k": 10,
        "search_type": "hybrid"
      }
  2. Upload Endpoint:

    • Method: POST
    • URL: http://your-host:8000/upload
    • Body Type: Form-Data
    • Send Binary Data: Yes

Example n8n Workflow

{
  "nodes": [
    {
      "name": "RAG Search",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "method": "POST",
        "url": "http://localhost:8000/search",
        "jsonParameters": true,
        "options": {},
        "bodyParametersJson": {
          "query": "{{ $json.userQuery }}",
          "top_k": 5
        }
      }
    }
  ]
}

Configuration

Key settings in .env that control ranking behavior:

# Chunking
CHUNK_SIZE=512          # Characters per chunk
CHUNK_OVERLAP=50        # Overlap between chunks

# Search & Ranking
HYBRID_ALPHA=0.7        # Dense vs sparse weight (0.7 = 70% semantic, 30% keyword)
TOP_K_RESULTS=10        # Final results to return
RERANK_TOP_K=20         # Candidates for LLM reranking
ENABLE_RERANKING=true   # Toggle LLM-based reranking
RRF_K=60               # Reciprocal Rank Fusion constant

# Azure OpenAI
AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large
AZURE_LLM_DEPLOYMENT=gpt-4o-mini

Advanced Usage

Custom Filters

Search with metadata filters:

{
  "query": "workshop",
  "filters": {
    "file_type": "pdf",
    "filename": "Workshop_Manual.pdf"
  }
}

Batch Processing

Process multiple files from MinIO:

# Upload files to MinIO bucket
# Then refresh index
curl -X POST http://localhost:8000/index/refresh

Performance Optimization

Search Quality Tuning

  1. Hybrid Alpha (HYBRID_ALPHA):

    • 0.0: Pure keyword search (best for exact matches)
    • 0.5: Balanced semantic and keyword
    • 0.7: Default - emphasizes semantic understanding
    • 1.0: Pure semantic search (best for concepts)
  2. Chunk Configuration:

    • Size: Larger chunks (1024) preserve context, smaller (256) increase precision
    • Overlap: Higher overlap (100) prevents boundary loss, lower (0) maximizes coverage
  3. Reranking Strategy:

    • Enable for critical queries requiring highest accuracy
    • Disable for real-time applications needing sub-second response
    • Adjust RERANK_TOP_K to balance quality vs API costs

Performance Tips

  1. Embedding Batch Size: Adjust EMBEDDING_BATCH_SIZE for API rate limits
  2. Concurrent Searches: Hybrid search runs dense and sparse in parallel
  3. Caching: Results are cached for repeated queries
  4. Index Optimization: Regular index refresh maintains search quality

Monitoring

Check Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f app

MinIO Console

Access at: http://localhost:9001

  • Username: minioadmin
  • Password: minioadmin

Qdrant Dashboard

Access at: http://localhost:6333/dashboard

Troubleshooting

Services Not Starting

# Check Docker resources
docker system df

# Restart services
docker-compose down
docker-compose up -d

Slow Embeddings

  • Check Azure OpenAI rate limits
  • Reduce EMBEDDING_BATCH_SIZE
  • Enable request caching

Search Quality Issues

  1. Poor Semantic Results:

    • Increase HYBRID_ALPHA toward 1.0
    • Check embedding model deployment
    • Verify chunk size isn't too small
  2. Missing Exact Matches:

    • Decrease HYBRID_ALPHA toward 0.0
    • Ensure sparse vectors are being generated
    • Check tokenization isn't removing important terms
  3. Irrelevant Results:

    • Enable reranking with ENABLE_RERANKING=true
    • Increase RERANK_TOP_K for more candidates
    • Adjust chunk overlap for better context

n8n Integration

The API provides REST endpoints that can be easily integrated with n8n workflows:

  1. Use HTTP Request nodes to interact with the API
  2. Available endpoints:
    • /search: Search your knowledge base
    • /ask: Get AI-powered answers
    • /stats: Monitor your RAG system

Use Cases

Ideal For

  • Technical Documentation: Balances technical terms with conceptual search
  • Knowledge Management: Handles diverse query types from different users
  • Customer Support: Finds answers using both keywords and intent
  • Research Libraries: Combines citation search with topic exploration
  • Enterprise Search: Handles acronyms, names, and concepts equally well

Example Scenarios

  1. Technical Query: "SSL certificate error"

    • Sparse search ensures "SSL" and "certificate" are found
    • Dense search includes related concepts like "TLS" or "security"
  2. Conceptual Query: "How to improve team communication"

    • Dense search dominates, finding semantically related content
    • Sparse search still catches exact phrase matches
  3. Name Search: "John Smith project updates"

    • Enhanced name detection prioritizes exact "John Smith" matches
    • Semantic search finds related project content

Development

Local Development

# Install dependencies
pip install -r requirements.txt

# Run locally (requires services running)
cd app
uvicorn main:app --reload

Adding New File Types

  1. Extend DocumentProcessor in services/document_processor.py
  2. Add parsing logic for the new type
  3. Update chunking strategy if needed

Enterprise Advantages

Why Hybrid Search RAG?

  1. Accuracy: Traditional semantic-only RAG systems can miss critical exact matches (product codes, names, technical terms). Our hybrid approach ensures nothing is lost.

  2. Flexibility: Single embedding models can't handle all query types equally well. By combining approaches, we excel at both natural language questions and specific keyword searches.

  3. Performance: Parallel processing and intelligent caching provide fast responses even with large document collections.

  4. Control: On-premise deployment with configurable ranking weights gives you full control over search behavior and data security.

  5. Integration: REST API design makes it easy to integrate with existing workflows, especially n8n automation.

License

This project is provided as-is for on-premise deployment.

Support

For issues or questions:

  1. Check the logs first
  2. Ensure all services are healthy
  3. Verify Azure credentials are correct
  4. Check example data format matches your use case

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors