A production-ready Retrieval-Augmented Generation (RAG) system with advanced hybrid search capabilities, combining semantic understanding with keyword precision for superior document retrieval. Designed for on-premise deployment and seamless n8n integration.
- Hybrid Search: Combines semantic (dense) and keyword (sparse) search for optimal results
- Advanced Ranking: Multi-stage ranking with Reciprocal Rank Fusion and LLM-based reranking
- Multi-Format Support: Processes PDF, TXT, CSV, and JSON files with format-specific strategies
- Intelligent Chunking: Document-type specific chunking with configurable overlap
- Azure OpenAI Integration: Uses text-embedding-3-large (3072 dimensions) and gpt-4o-mini
- MinIO Storage: S3-compatible object storage with automatic deduplication
- Qdrant Vector Database: High-performance vector search with named vectors
- RESTful API: FastAPI-based endpoints with comprehensive error handling
- Docker Deployment: Fully containerized with health checks and monitoring
- Enhanced Name Search: Special handling for person/entity name queries
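The automatic deduplication in MinIO storage is typically content-hash based; a minimal sketch of the idea (the function names and storage layout here are illustrative assumptions, not the project's actual code):

```python
import hashlib

def content_key(data: bytes) -> str:
    """Derive a deterministic object key from file content so that
    re-uploading identical bytes maps to the same MinIO object."""
    return hashlib.sha256(data).hexdigest()

def is_duplicate(data: bytes, existing_keys: set[str]) -> bool:
    """True if an object with the same content hash is already stored."""
    return content_key(data) in existing_keys

# Identical bytes always produce the same key, so re-uploads are detected
seen = {content_key(b"hello world")}
print(is_duplicate(b"hello world", seen))  # True
print(is_duplicate(b"hello there", seen))  # False
```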
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ FastAPI │────▶│ MinIO │ │ Qdrant │
│ API │ │ Storage │ │ Vector │
└─────────────┘ └─────────────┘ │ Database │
│ └─────────────┘
│ ▲
▼ │
┌─────────────┐ ┌─────────────┐ │
│ Unstructured│────▶│ Azure │────────────┘
│ Processor │ │ OpenAI │
└─────────────┘ └─────────────┘
This system implements a sophisticated hybrid search approach that combines the strengths of both semantic and keyword-based search:
- Dense (semantic) search:
  - Uses Azure OpenAI's text-embedding-3-large model (3072 dimensions)
  - Captures semantic meaning and context
  - Excellent for conceptual queries and paraphrasing
  - Handles synonyms and related concepts naturally
- Sparse (keyword) search:
  - Uses Qdrant's built-in sparse vector implementation
  - Preserves exact keyword matching capabilities
  - Critical for technical terms, names, and specific identifiers
  - Ensures important keywords aren't lost in semantic abstraction
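A sparse vector is essentially a weighted term-frequency map over a vocabulary. The sketch below shows the idea in simplified form; real implementations use a proper tokenizer and BM25-style weighting, so treat this as illustrative only:

```python
import re
from collections import Counter

def to_sparse_vector(text: str, vocab: dict[str, int]) -> dict[int, float]:
    """Map each known token to its vocabulary index with a term-frequency weight."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}

vocab = {"ssl": 0, "certificate": 1, "error": 2}
print(to_sparse_vector("SSL certificate error: certificate expired", vocab))
# {0: 1.0, 1: 2.0, 2: 1.0}
```

Because only matching vocabulary entries get a weight, exact terms like "SSL" survive even when a dense embedding would blur them into general "security" semantics.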
┌─────────────────┐
│ User Query │
└────────┬────────┘
│
┌────┴────┐
│ Process │
└────┬────┘
│
┌────┴────┐ ┌────┴────┐
│ Dense │ │ Sparse │
│ Search │ │ Search │
│(Semantic)│ │(Keyword)│
└────┬────┘ └────┬────┘
│ │
└──────────┬─────────────┘
│
┌──────┴──────┐
│ RRF │
│ Fusion │
└──────┬──────┘
│
┌──────┴──────┐
│ Optional │
│ LLM Rerank │
└──────┬──────┘
│
┌──────┴──────┐
│ Results │
└─────────────┘
- Query Processing: the user query is simultaneously:
  - Embedded into a dense vector for semantic search
  - Tokenized into a sparse vector for keyword search
- Parallel Search: both search methods run concurrently in Qdrant:
  - Dense search finds semantically similar documents
  - Sparse search finds keyword matches
- Reciprocal Rank Fusion (RRF):
  - RRF_score = Σ(1 / (k + rank_i))
  - Combines results from both searches
  - k=60 (constant) prevents bias toward top results
  - Creates a unified ranking preserving both semantic and keyword relevance
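The RRF step can be sketched directly from the formula above. This is a minimal illustration of the fusion logic, not the production code path (Qdrant can also fuse server-side):

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked ID lists with Reciprocal Rank Fusion:
    score(doc) = sum over lists of 1 / (k + rank), with rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks highly in both lists, so it wins the fused ranking
print(rrf_fuse(["a", "b", "c"], ["b", "d", "a"]))  # ['b', 'a', 'd', 'c']
```

Note how documents appearing in both lists ("a", "b") accumulate score from each, which is exactly why RRF preserves both semantic and keyword relevance.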
- Context Enrichment: Top-K results are sent to GPT-4o-mini
- Relevance Assessment: LLM evaluates each result against the query
- Smart Reordering: Results are reranked based on:
- Contextual understanding
- Query intent matching
- Information completeness
- Best of Both Worlds: Captures both meaning and precision
- Robust to Query Variations: Works with natural language and specific terms
- Context-Aware: Understands document relationships and intent
- Special handling for person/entity queries
- Exact match prioritization for names
- Prevents semantic drift for proper nouns
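Name detection is conceptually a check for runs of capitalized tokens. A rough heuristic sketch follows; the project's actual detector may be more sophisticated, and the function name is illustrative:

```python
import re

def extract_candidate_names(query: str) -> list[str]:
    """Heuristic: runs of two or more capitalized words are treated as
    probable person/entity names that should be matched exactly."""
    return re.findall(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b", query)

print(extract_candidate_names("John Smith project updates"))          # ['John Smith']
print(extract_candidate_names("how to improve team communication"))  # []
```

Detected names can then be boosted in the sparse search or required as exact matches, preventing the semantic drift described above.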
- Configurable Alpha Weight (HYBRID_ALPHA=0.7): Tune semantic vs keyword importance
- Search Type Selection: Choose hybrid, dense-only, or sparse-only per query
- Optional Reranking: Balance speed vs accuracy based on use case
- Parallel Processing: Dense and sparse searches run concurrently
- Batch Embeddings: Efficient processing of multiple documents
- Fallback Strategies: Graceful degradation if one method fails
- Format-Specific Processing: Optimal handling for PDFs, CSVs, JSON, TXT
- Smart Chunking: Preserves context with configurable overlap
- Metadata Preservation: Maintains source, position, and type information
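The overlapping chunker can be sketched as a sliding character window, assuming the CHUNK_SIZE/CHUNK_OVERLAP semantics from the configuration section. Real document-type-aware chunkers also respect sentence and section boundaries, so this is only the core idea:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the last
    `overlap` characters of the previous one, so boundary context survives."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Small sizes chosen to make the overlap visible
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```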
- Docker and Docker Compose
- Python 3.11+ (for local testing)
- 8GB+ RAM recommended
git clone <repository>
cd python-rag
# Ensure .env file has your Azure credentials
# (Already configured in the provided .env)

# Start all services
docker-compose up -d

# Check service health
docker-compose ps

# View logs
docker-compose logs -f

# Install test dependencies
pip install aiohttp

# Run the test suite
python test_api.py

GET /health

POST /upload
Content-Type: multipart/form-data
# Example with curl:
curl -X POST -F "[email protected]" http://localhost:8000/upload

POST /search
Content-Type: application/json
{
"query": "your search query",
"top_k": 10,
"search_type": "hybrid" # Options: "hybrid", "dense", "sparse"
}

POST /ask
Content-Type: application/json
{
"query": "your search query"
}

POST /index/refresh

GET /stats

This API is designed to work as a tool in n8n workflows for RAG patterns.
- Search Endpoint:
  - Method: POST
  - URL: http://your-host:8000/search
  - Body Type: JSON
  - Body: { "query": "{{ $json.query }}", "top_k": 10, "search_type": "hybrid" }
- Upload Endpoint:
  - Method: POST
  - URL: http://your-host:8000/upload
  - Body Type: Form-Data
  - Send Binary Data: Yes
{
"nodes": [
{
"name": "RAG Search",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"method": "POST",
"url": "http://localhost:8000/search",
"jsonParameters": true,
"options": {},
"bodyParametersJson": {
"query": "{{ $json.userQuery }}",
"top_k": 5
}
}
}
]
}

Key settings in .env that control ranking behavior:
# Chunking
CHUNK_SIZE=512 # Characters per chunk
CHUNK_OVERLAP=50 # Overlap between chunks
# Search & Ranking
HYBRID_ALPHA=0.7 # Dense vs sparse weight (0.7 = 70% semantic, 30% keyword)
TOP_K_RESULTS=10 # Final results to return
RERANK_TOP_K=20 # Candidates for LLM reranking
ENABLE_RERANKING=true # Toggle LLM-based reranking
RRF_K=60 # Reciprocal Rank Fusion constant
# Azure OpenAI
AZURE_EMBEDDING_DEPLOYMENT=text-embedding-3-large
AZURE_LLM_DEPLOYMENT=gpt-4o-mini

Search with metadata filters:
{
"query": "workshop",
"filters": {
"file_type": "pdf",
"filename": "Workshop_Manual.pdf"
}
}

Process multiple files from MinIO:
# Upload files to MinIO bucket
# Then refresh index
curl -X POST http://localhost:8000/index/refresh

- Hybrid Alpha (HYBRID_ALPHA):
  - 0.0: Pure keyword search (best for exact matches)
  - 0.5: Balanced semantic and keyword
  - 0.7: Default, emphasizes semantic understanding
  - 1.0: Pure semantic search (best for concepts)
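One common way an alpha weight like this is applied is as a convex blend of normalized dense and sparse scores. The sketch below assumes that interpretation (the service may instead weight the RRF contributions, so check the code before relying on it):

```python
def blend_scores(dense: dict[str, float], sparse: dict[str, float],
                 alpha: float = 0.7) -> dict[str, float]:
    """score = alpha * dense + (1 - alpha) * sparse; missing scores count as 0.
    Assumes both score sets are already normalized to a comparable range."""
    docs = set(dense) | set(sparse)
    return {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
            for d in docs}

# With alpha=0.7, a strong semantic match ("a") outranks a pure keyword hit ("b")
scores = blend_scores({"a": 0.9, "b": 0.2}, {"b": 1.0, "c": 0.8}, alpha=0.7)
print(max(scores, key=scores.get))  # 'a'
```

Rerunning the same example with alpha=0.0 would rank "b" first, which is exactly the exact-match behavior described above.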
- Chunk Configuration:
  - Size: Larger chunks (1024) preserve context, smaller (256) increase precision
  - Overlap: Higher overlap (100) prevents boundary loss, lower (0) maximizes coverage
- Reranking Strategy:
  - Enable for critical queries requiring the highest accuracy
  - Disable for real-time applications needing sub-second responses
  - Adjust RERANK_TOP_K to balance quality vs API costs
- Embedding Batch Size: Adjust EMBEDDING_BATCH_SIZE for API rate limits
- Concurrent Searches: Hybrid search runs dense and sparse in parallel
- Caching: Results are cached for repeated queries
- Index Optimization: Regular index refresh maintains search quality
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f app

- Username: minioadmin
- Password: minioadmin
# Check Docker resources
docker system df
# Restart services
docker-compose down
docker-compose up -d

- Check Azure OpenAI rate limits
- Reduce EMBEDDING_BATCH_SIZE
- Enable request caching
- Poor Semantic Results:
  - Increase HYBRID_ALPHA toward 1.0
  - Check the embedding model deployment
  - Verify the chunk size isn't too small
- Missing Exact Matches:
  - Decrease HYBRID_ALPHA toward 0.0
  - Ensure sparse vectors are being generated
  - Check that tokenization isn't removing important terms
- Irrelevant Results:
  - Enable reranking with ENABLE_RERANKING=true
  - Increase RERANK_TOP_K for more candidates
  - Adjust chunk overlap for better context
The API provides REST endpoints that can be easily integrated with n8n workflows:
- Use HTTP Request nodes to interact with the API
- Available endpoints:
  - /search: Search your knowledge base
  - /ask: Get AI-powered answers
  - /stats: Monitor your RAG system
- Technical Documentation: Balances technical terms with conceptual search
- Knowledge Management: Handles diverse query types from different users
- Customer Support: Finds answers using both keywords and intent
- Research Libraries: Combines citation search with topic exploration
- Enterprise Search: Handles acronyms, names, and concepts equally well
- Technical Query: "SSL certificate error"
  - Sparse search ensures "SSL" and "certificate" are found
  - Dense search includes related concepts like "TLS" or "security"
- Conceptual Query: "How to improve team communication"
  - Dense search dominates, finding semantically related content
  - Sparse search still catches exact phrase matches
- Name Search: "John Smith project updates"
  - Enhanced name detection prioritizes exact "John Smith" matches
  - Semantic search finds related project content
# Install dependencies
pip install -r requirements.txt
# Run locally (requires services running)
cd app
uvicorn main:app --reload

- Extend DocumentProcessor in services/document_processor.py
- Add parsing logic for the new type
- Update chunking strategy if needed
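Extending the processor for a new format might look like the sketch below. The class and method names are illustrative stand-ins; match them to the actual DocumentProcessor in services/document_processor.py:

```python
class DocumentProcessor:
    """Simplified stand-in for the project's processor: dispatches on extension."""

    def process(self, filename: str, data: bytes) -> list[str]:
        if filename.endswith(".md"):  # newly added format
            return self._process_markdown(data)
        raise ValueError(f"unsupported file type: {filename}")

    def _process_markdown(self, data: bytes) -> list[str]:
        # Split on blank lines so each paragraph becomes a chunkable unit
        text = data.decode("utf-8")
        return [p.strip() for p in text.split("\n\n") if p.strip()]

proc = DocumentProcessor()
print(proc.process("notes.md", b"# Title\n\nFirst paragraph.\n\nSecond."))
# ['# Title', 'First paragraph.', 'Second.']
```

The per-format method returns pre-split units that then flow into the normal chunking and embedding pipeline.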
- Accuracy: Traditional semantic-only RAG systems can miss critical exact matches (product codes, names, technical terms). Our hybrid approach ensures nothing is lost.
- Flexibility: Single embedding models can't handle all query types equally well. By combining approaches, we excel at both natural language questions and specific keyword searches.
- Performance: Parallel processing and intelligent caching provide fast responses even with large document collections.
- Control: On-premise deployment with configurable ranking weights gives you full control over search behavior and data security.
- Integration: REST API design makes it easy to integrate with existing workflows, especially n8n automation.
This project is provided as-is for on-premise deployment.
For issues or questions:
- Check the logs first
- Ensure all services are healthy
- Verify Azure credentials are correct
- Check example data format matches your use case