encryptedtouhid/zoho-mail-rag

Zoho Mail RAG

A Retrieval-Augmented Generation (RAG) system for searching and querying your Zoho Mail inbox using natural language.

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  React Frontend │────▶│ FastAPI Backend │────▶│   Zoho Mail     │
│  (Chat UI)      │◀────│ (RAG Engine)    │◀────│   (IMAP)        │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    ▼            ▼            ▼
              ┌──────────┐ ┌──────────┐ ┌──────────┐
              │ ChromaDB │ │  SQLite  │ │ OpenAI   │
              │ (Vectors)│ │  (State) │ │ (LLM)    │
              └──────────┘ └──────────┘ └──────────┘

How it works:

  1. Sync: Emails are fetched via IMAP and stored in SQLite
  2. Index: Email content is embedded using OpenAI and stored in ChromaDB
  3. Search: Queries use hybrid search (semantic + keyword) to find relevant emails
  4. Answer: LLM generates responses based on retrieved email context

Features

  • Natural language email search
  • Hybrid search (vector + FTS5 keyword)
  • Conversation memory for follow-ups
  • Query caching for faster responses
  • Parallel folder syncing
  • Date-aware filtering for temporal queries
  • Real-time sync progress UI

How Sync Works

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Connect to  │───▶│  List Mail   │───▶│ Fetch Emails │───▶│   Generate   │
│  IMAP Server │    │   Folders    │    │  in Batches  │    │  Embeddings  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────┬───────┘
                                                                   │
                    ┌──────────────┐    ┌──────────────┐           │
                    │   Complete   │◀───│ Store Vectors│◀──────────┘
                    │              │    │  in ChromaDB │
                    └──────────────┘    └──────────────┘
  1. IMAP Connection: Connects to Zoho Mail using SSL/TLS
  2. Folder Discovery: Lists all mail folders (Inbox, Sent, etc.)
  3. Parallel Sync: Multiple folders sync concurrently (default: 3 workers, configurable via SYNC_PARALLEL_WORKERS)
  4. Batch Processing: Emails fetched in batches (default: 50) to manage memory
  5. Data Extraction: Subject, sender, recipients, date, body, and attachments
  6. Embedding Generation: Email content converted to vectors via OpenAI
  7. Dual Storage:
    • SQLite: Email metadata and sync state
    • ChromaDB: Vector embeddings for semantic search
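
The batching and parallel-folder steps above can be sketched roughly as follows. This is a minimal illustration, not the project's sync_service.py: the helper names (`batched`, `sync_folder`, `sync_all`) and the in-memory `FAKE_MAILBOX` are hypothetical stand-ins so the example runs without an IMAP server. Only the batch size (50) and worker count (3) come from the defaults listed in this README.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Sequence

BATCH_SIZE = 50        # mirrors the SYNC_BATCH_SIZE default
PARALLEL_WORKERS = 3   # mirrors the SYNC_PARALLEL_WORKERS default


def batched(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` message IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def sync_folder(folder: str,
                list_ids: Callable[[str], Sequence[int]],
                fetch_batch: Callable[[str, Sequence[int]], int]) -> int:
    """Sync one folder batch by batch; return the number of emails processed."""
    total = 0
    for batch in batched(list_ids(folder), BATCH_SIZE):
        total += fetch_batch(folder, batch)
    return total


def sync_all(folders: Iterable[str],
             list_ids: Callable[[str], Sequence[int]],
             fetch_batch: Callable[[str, Sequence[int]], int],
             workers: int = PARALLEL_WORKERS) -> dict[str, int]:
    """Sync several folders concurrently, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {f: pool.submit(sync_folder, f, list_ids, fetch_batch)
                   for f in folders}
        return {f: fut.result() for f, fut in futures.items()}


# Hypothetical stand-in for a mailbox: folder name -> message IDs.
FAKE_MAILBOX = {"Inbox": list(range(120)), "Sent": list(range(30))}

counts = sync_all(
    FAKE_MAILBOX,
    list_ids=lambda folder: FAKE_MAILBOX[folder],
    fetch_batch=lambda folder, batch: len(batch),  # pretend every fetch succeeds
)
```

Keeping each folder's work in its own task means a slow folder does not block the others, while the fixed batch size bounds how many messages are held in memory at once.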

How Search Works

┌─────────────┐     ┌────────────────────────────────┐     ┌─────────────┐
│   User      │     │        Hybrid Search           │     │   LLM       │
│   Query     │────▶│  ┌─────────┐    ┌─────────┐    │────▶│  Response   │
│             │     │  │ Vector  │    │ Keyword │    │     │  Generation │
└─────────────┘     │  │ Search  │    │ Search  │    │     └─────────────┘
                    │  └────┬────┘    └────┬────┘    │
                    │       │              │         │
                    │       └──────┬───────┘         │
                    │              ▼                 │
                    │    ┌─────────────────┐         │
                    │    │  RRF Fusion     │         │
                    │    │  (Rank Merge)   │         │
                    │    └─────────────────┘         │
                    └────────────────────────────────┘
  1. Query Processing: User question is analyzed for intent and keywords
  2. Vector Search: Query embedded and compared against email vectors (semantic similarity)
  3. Keyword Search: FTS5 full-text search for exact term matches
  4. Reciprocal Rank Fusion (RRF): Combines both result sets with weighted scoring
  5. Context Building: Top emails assembled into prompt context
  6. LLM Generation: The chat model (gpt-4o-mini by default) generates a natural-language response with citations
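
To make the keyword-search step more concrete, here is a small sketch using SQLite's FTS5 extension through Python's standard `sqlite3` module. The table name `emails_fts`, its columns, and the sample rows are assumptions made for illustration; the project's actual schema may differ. The sketch also assumes the local SQLite build has FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical full-text index over subject and body.
conn.execute("CREATE VIRTUAL TABLE emails_fts USING fts5(subject, body)")
conn.executemany(
    "INSERT INTO emails_fts (rowid, subject, body) VALUES (?, ?, ?)",
    [
        (1, "Interview schedule", "Your panel interview is on Tuesday."),
        (2, "Invoice INV-2041", "Payment for invoice INV-2041 is due."),
        (3, "Lunch", "Are you free for lunch tomorrow?"),
    ],
)


def to_match_expr(query: str) -> str:
    """Quote each term so characters like '-' are not parsed as FTS5 operators."""
    terms = query.split()
    return " ".join('"' + t.replace('"', '""') + '"' for t in terms)


def keyword_search(query: str, limit: int = 10) -> list[int]:
    """Return rowids of emails matching every term, best match first."""
    expr = to_match_expr(query)
    if not expr:
        return []
    rows = conn.execute(
        "SELECT rowid FROM emails_fts WHERE emails_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (expr, limit),
    ).fetchall()
    return [row[0] for row in rows]


interview_hits = keyword_search("interview")
invoice_hits = keyword_search("invoice INV-2041")
```

Exact identifiers such as invoice numbers are where keyword search tends to complement vector search, since embeddings do not reliably preserve literal codes.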

Search Improvements

Hybrid Search Strategy

Instead of relying solely on vector similarity, we combine two approaches:

| Method         | Strength                        | Weight |
|----------------|---------------------------------|--------|
| Vector Search  | Understands meaning and context | 60%    |
| Keyword Search | Exact matches, names, codes     | 40%    |
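
One way to combine the two rankings with these weights is a weighted form of Reciprocal Rank Fusion. The sketch below is an assumption about how the weighting could be applied, not a copy of hybrid_search.py: the function name `weighted_rrf` is hypothetical, and the smoothing constant `k = 60` is the value conventionally used with RRF rather than something this README states.

```python
def weighted_rrf(rankings: dict[str, list[str]],
                 weights: dict[str, float],
                 k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes weight / (k + rank) per item."""
    scores: dict[str, float] = {}
    for source, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weights[source] / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = weighted_rrf(
    rankings={
        "vector": ["e1", "e2", "e3"],   # semantic search order
        "keyword": ["e3", "e1", "e4"],  # FTS5 search order
    },
    weights={"vector": 0.6, "keyword": 0.4},
)
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF works on ranks rather than raw scores, the vector distances and FTS5 relevance values never need to be normalized against each other. In this example, emails that appear in both lists (`e1`, `e3`) end up ahead of those found by only one method.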

Query Expansion

When searching for specific topics, related terms are automatically added:

"upcoming interviews" → also searches for:
  meeting, schedule, invite, calendar, zoom, teams,
  recruiter, hiring, assessment, panel
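
A dictionary-based expansion along these lines might look like the sketch below. The `EXPANSIONS` mapping and the `expand_query` function are hypothetical; only the related terms for interview queries are taken from the example above, and the prefix-matching rule is an assumption.

```python
# Hypothetical topic -> related-terms table; the interview terms come from the README.
EXPANSIONS = {
    "interview": [
        "meeting", "schedule", "invite", "calendar", "zoom", "teams",
        "recruiter", "hiring", "assessment", "panel",
    ],
}


def expand_query(query: str) -> list[str]:
    """Return the query terms plus related terms for any topic they trigger."""
    terms = query.lower().split()
    expanded = list(terms)
    for trigger, related in EXPANSIONS.items():
        # Prefix match so "interviews" also triggers the "interview" topic.
        if any(term.startswith(trigger) for term in terms):
            expanded.extend(r for r in related if r not in expanded)
    return expanded


expanded_terms = expand_query("upcoming interviews")
```

The original terms stay at the front of the list, so a downstream search can still give them priority over the added ones.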

Intelligent Date Filtering

For temporal queries ("upcoming", "next month"), the system:

  • Extracts dates from email body using multiple format parsers
  • Filters out past events automatically
  • Supports formats such as "Jan 5, 2026", "5/1/2026", "2026-01-05", and "Tue 3 Feb 2026"
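
The multi-format extraction and past-event filter could be approximated as shown below. This is a sketch under stated assumptions: the pattern list and the names `extract_dates` and `is_upcoming` are hypothetical, slash dates are read day-first (so 5/1/2026 is taken as 5 January 2026), and month abbreviations are parsed with `strptime`'s `%b`, which assumes an English locale.

```python
import re
from datetime import date, datetime

# (regex, strptime format) pairs, one per supported date style.
PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),              # 2026-01-05
    (re.compile(r"\b[A-Z][a-z]{2} \d{1,2}, \d{4}\b"), "%b %d, %Y"),  # Jan 5, 2026
    (re.compile(r"\b\d{1,2} [A-Z][a-z]{2} \d{4}\b"), "%d %b %Y"),    # 3 Feb 2026
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%d/%m/%Y"),          # 5/1/2026, day-first (assumed)
]


def extract_dates(text: str) -> list[date]:
    """Return the distinct dates found in `text`, earliest first."""
    found = set()
    for pattern, fmt in PATTERNS:
        for match in pattern.findall(text):
            try:
                found.add(datetime.strptime(match, fmt).date())
            except ValueError:
                continue  # matched the shape of a date but is not a valid one
    return sorted(found)


def is_upcoming(text: str, today: date) -> bool:
    """True if the text mentions at least one date on or after `today`."""
    return any(d >= today for d in extract_dates(text))


sample = "Call on Jan 5, 2026, then Tue 3 Feb 2026 (same as 2026-01-05 or 5/1/2026)."
dates = extract_dates(sample)
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the filter easy to test.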

Caching & Memory

  • Query Cache: Repeated questions are answered from an LRU cache (5-minute TTL)
  • Conversation Memory: Follow-up questions maintain context
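
An LRU cache with a time-to-live can be built on `collections.OrderedDict`, as in the sketch below. The `QueryCache` class and its `max_size` of 128 are assumptions for illustration; only the 5-minute TTL is taken from this README. The clock is injectable so expiry can be exercised without waiting.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class QueryCache:
    """Least-recently-used cache whose entries also expire after a fixed TTL."""

    def __init__(self, max_size: int = 128, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._items: "OrderedDict[str, tuple[float, str]]" = OrderedDict()
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]      # expired: drop it and report a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict the least recently used entry


# Usage with a fake clock so the example is deterministic.
now = [0.0]
cache = QueryCache(max_size=2, clock=lambda: now[0])
cache.put("any interviews next week?", "answer A")
cache.put("latest invoice?", "answer B")
hit = cache.get("any interviews next week?")   # refreshes recency of the first key
cache.put("who emailed me today?", "answer C")  # evicts "latest invoice?"
```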

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Zoho Mail account with IMAP enabled
  • OpenRouter API key (for OpenAI models)

1. Configure Environment

cp .env.example .env

Edit .env with your credentials:

IMAP_USERNAME=your-zoho-email-address
IMAP_PASSWORD=your-app-password
OPENROUTER_API_KEY=your-openrouter-key

2. Run with Docker (Recommended)

docker-compose up -d

Access at: http://localhost

3. Run Locally (Development)

# Terminal 1 - Backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

# Terminal 2 - Frontend
cd frontend
npm install
npm run dev

Access at: http://localhost:5173

API Endpoints

| Endpoint         | Method | Description       |
|------------------|--------|-------------------|
| /api/sync/start  | POST   | Start email sync  |
| /api/sync/stop   | POST   | Stop sync         |
| /api/sync/status | GET    | Get sync progress |
| /api/query       | POST   | Ask a question    |
| /api/search      | POST   | Search emails     |
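
As a rough illustration of calling the query endpoint from Python, the sketch below builds a JSON POST request with the standard library. The request body shape (`{"question": ...}`) is an assumption, since this README does not document the schema; check backend/app/api/routes.py for the real field names. The base URL assumes the local development backend on port 8000.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # local dev backend; adjust for Docker deployments


def build_query_request(question: str) -> Request:
    """Build a POST request for /api/query (payload field name is assumed)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> dict:
    """Send the question to a running backend and return the decoded JSON reply."""
    with urlopen(build_query_request(question), timeout=30) as response:
        return json.load(response)


request = build_query_request("Do I have any upcoming interviews?")
```

Calling `ask(...)` requires the backend to be running; `build_query_request` alone can be inspected offline.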

Configuration

| Variable              | Description            | Default                |
|-----------------------|------------------------|------------------------|
| IMAP_SERVER           | Zoho IMAP server       | imappro.zoho.com       |
| SYNC_BATCH_SIZE       | Emails per batch       | 50                     |
| SYNC_PARALLEL_WORKERS | Concurrent folders     | 3                      |
| EMBEDDING_MODEL       | OpenAI embedding model | text-embedding-3-small |
| CHAT_MODEL            | LLM for responses      | gpt-4o-mini            |
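
For reference, these variables and defaults could be read in Python along the following lines. The `Settings` dataclass and `load_settings` function are hypothetical and only mirror the table above; the backend may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    imap_server: str
    sync_batch_size: int
    sync_parallel_workers: int
    embedding_model: str
    chat_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (os.environ by default), using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        imap_server=env.get("IMAP_SERVER", "imappro.zoho.com"),
        sync_batch_size=int(env.get("SYNC_BATCH_SIZE", "50")),
        sync_parallel_workers=int(env.get("SYNC_PARALLEL_WORKERS", "3")),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        chat_model=env.get("CHAT_MODEL", "gpt-4o-mini"),
    )


defaults = load_settings({})
custom = load_settings({"SYNC_BATCH_SIZE": "100", "CHAT_MODEL": "gpt-4o"})
```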

Project Structure

├── backend/
│   ├── app/
│   │   ├── api/routes.py       # API endpoints
│   │   ├── services/
│   │   │   ├── sync_service.py    # Email sync
│   │   │   ├── query_service.py   # RAG query engine
│   │   │   ├── hybrid_search.py   # Vector + keyword search
│   │   │   └── vector_store.py    # ChromaDB operations
│   │   └── models/email.py     # Data models
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.jsx             # Main app
│   │   └── components/
│   │       └── ChatInterface.jsx  # Chat UI
│   └── package.json
├── docker-compose.yml
└── .env.example
