Zoho Mail RAG

A Retrieval-Augmented Generation (RAG) system for searching and querying your Zoho Mail inbox using natural language.

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  React Frontend │────▶│ FastAPI Backend │────▶│   Zoho Mail     │
│  (Chat UI)      │◀────│ (RAG Engine)    │◀────│   (IMAP)        │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    ▼            ▼            ▼
              ┌──────────┐ ┌──────────┐ ┌──────────┐
              │ ChromaDB │ │  SQLite  │ │ OpenAI   │
              │ (Vectors)│ │  (State) │ │ (LLM)    │
              └──────────┘ └──────────┘ └──────────┘

How it works:

Sync: Emails are fetched via IMAP and stored in SQLite
Index: Email content is embedded using OpenAI and stored in ChromaDB
Search: Queries use hybrid search (semantic + keyword) to find relevant emails
Answer: LLM generates responses based on retrieved email context

Features

Natural language email search
Hybrid search (vector + FTS5 keyword)
Conversation memory for follow-ups
Query caching for faster responses
Parallel folder syncing
Date-aware filtering for temporal queries
Real-time sync progress UI

How Sync Works

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Connect to  │───▶│  List Mail   │───▶│ Fetch Emails │───▶│   Generate   │
│  IMAP Server │    │   Folders    │    │  in Batches  │    │  Embeddings  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────┬───────┘
                                                                   │
                    ┌──────────────┐    ┌──────────────┐           │
                    │   Complete   │◀───│ Store Vectors│◀──────────┘
                    │              │    │  in ChromaDB │
                    └──────────────┘    └──────────────┘

IMAP Connection: Connects to Zoho Mail using SSL/TLS
Folder Discovery: Lists all mail folders (Inbox, Sent, etc.)
Parallel Sync: Multiple folders sync simultaneously (configurable workers)
Batch Processing: Emails fetched in batches (default: 50) to manage memory
Data Extraction: Subject, sender, recipients, date, body, and attachments are extracted from each email
Embedding Generation: Email content converted to vectors via OpenAI
Dual Storage:
- SQLite: Email metadata and sync state
- ChromaDB: Vector embeddings for semantic search
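
The batch step above can be sketched with the standard library's `imaplib`. This is a minimal illustration, not the project's `sync_service.py`: the helper names `batched` and `fetch_folder` are hypothetical, and the read-only folder selection is an assumption.

```python
import imaplib
from typing import Iterator, List


def batched(items: List[bytes], size: int = 50) -> Iterator[List[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_folder(host: str, user: str, password: str,
                 folder: str = "INBOX",
                 batch_size: int = 50) -> Iterator[List[bytes]]:
    """Yield raw RFC 822 messages from one folder, one batch at a time."""
    # IMAP4_SSL opens a TLS connection (port 993 by default).
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True is an assumption: it avoids marking mail as read.
        conn.select(folder, readonly=True)
        _, data = conn.search(None, "ALL")
        message_ids = data[0].split()
        for chunk in batched(message_ids, batch_size):
            # One FETCH command per batch keeps memory use bounded.
            _, parts = conn.fetch(b",".join(chunk).decode(), "(RFC822)")
            # Message bodies arrive as (envelope, payload) tuples.
            yield [part[1] for part in parts if isinstance(part, tuple)]
```

Because `fetch_folder` opens its own connection, several folders could plausibly be synced in parallel by running one call per folder in a `concurrent.futures.ThreadPoolExecutor`, with the worker count taken from `SYNC_PARALLEL_WORKERS`.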

How Search Works

┌─────────────┐     ┌────────────────────────────────┐     ┌─────────────┐
│   User      │     │        Hybrid Search           │     │   LLM       │
│   Query     │────▶│  ┌─────────┐    ┌─────────┐    │────▶│  Response   │
│             │     │  │ Vector  │    │ Keyword │    │     │  Generation │
└─────────────┘     │  │ Search  │    │ Search  │    │     └─────────────┘
                    │  └────┬────┘    └────┬────┘    │
                    │       │              │         │
                    │       └──────┬───────┘         │
                    │              ▼                 │
                    │    ┌─────────────────┐         │
                    │    │  RRF Fusion     │         │
                    │    │  (Rank Merge)   │         │
                    │    └─────────────────┘         │
                    └────────────────────────────────┘

Query Processing: User question is analyzed for intent and keywords
Vector Search: Query embedded and compared against email vectors (semantic similarity)
Keyword Search: FTS5 full-text search for exact term matches
Reciprocal Rank Fusion (RRF): Combines both result sets with weighted scoring
Context Building: Top emails assembled into prompt context
LLM Generation: GPT generates natural language response with citations

Search Improvements

Hybrid Search Strategy

Instead of relying solely on vector similarity, we combine two approaches:

Method	Strength	Weight
Vector Search	Understands meaning and context	60%
Keyword Search	Exact matches, names, codes	40%
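
A weighted form of Reciprocal Rank Fusion using the 60/40 split from the table might look like the sketch below. The function name `weighted_rrf` is hypothetical, and the smoothing constant `k = 60` is the value commonly used for RRF, not one confirmed by this project.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rrf(rankings: Dict[str, List[str]],
                 weights: Dict[str, float],
                 k: int = 60) -> List[str]:
    """Merge ranked ID lists; each list adds weight / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for source, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weights[source] / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


merged = weighted_rrf(
    rankings={"vector": ["a", "b", "c"], "keyword": ["b", "d"]},
    weights={"vector": 0.6, "keyword": 0.4},
)
print(merged)  # ['b', 'a', 'c', 'd']
```

An email that appears in both result lists ("b" here) outranks one that tops only a single list, which is the main reason to fuse ranks rather than compare raw scores from two different search backends.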

Query Expansion

When searching for specific topics, related terms are automatically added:

"upcoming interviews" → also searches for:
  meeting, schedule, invite, calendar, zoom, teams,
  recruiter, hiring, assessment, panel
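
One simple way to implement this kind of expansion is a lookup table keyed by trigger words. The sketch below assumes prefix matching on lowercased terms; the `EXPANSIONS` table and `expand_query` function are illustrative names, and only the "interview" entry comes from the example above.

```python
from typing import Dict, List

# Hypothetical expansion table; the related terms mirror the example above.
EXPANSIONS: Dict[str, List[str]] = {
    "interview": [
        "meeting", "schedule", "invite", "calendar", "zoom", "teams",
        "recruiter", "hiring", "assessment", "panel",
    ],
}


def expand_query(query: str) -> List[str]:
    """Return the query terms followed by related terms for any trigger hit."""
    terms = query.lower().split()
    extra: List[str] = []
    for trigger, related in EXPANSIONS.items():
        # Prefix match so "interviews" also hits the "interview" trigger.
        if any(term.startswith(trigger) for term in terms):
            extra.extend(r for r in related
                         if r not in terms and r not in extra)
    return terms + extra
```

Calling `expand_query("upcoming interviews")` returns the two original terms followed by the ten related terms, while a query with no trigger word is returned unchanged.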

Intelligent Date Filtering

For temporal queries ("upcoming", "next month"), the system:

Extracts dates from email body using multiple format parsers
Filters out past events automatically
Supports formats such as Jan 5, 2026; 5/1/2026; 2026-01-05; and Tue 3 Feb 2026
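
The date extraction and past-event filter could be approximated with regular expressions and `datetime.strptime`, as in this sketch. The pattern list and function names are hypothetical, slash-separated dates are assumed to be day-first (so 5/1/2026 is read as 5 January 2026), and month and weekday names are assumed to be English.

```python
import re
from datetime import date, datetime
from typing import List

# (regex to find a candidate, strptime format to parse it)
_PATTERNS = [
    (r"\b[A-Z][a-z]{2} \d{1,2}, \d{4}\b", "%b %d, %Y"),                 # Jan 5, 2026
    (r"\b\d{1,2}/\d{1,2}/\d{4}\b", "%d/%m/%Y"),                         # 5/1/2026 (day-first assumed)
    (r"\b\d{4}-\d{2}-\d{2}\b", "%Y-%m-%d"),                             # 2026-01-05
    (r"\b[A-Z][a-z]{2} \d{1,2} [A-Z][a-z]{2} \d{4}\b", "%a %d %b %Y"),  # Tue 3 Feb 2026
]


def extract_dates(text: str) -> List[date]:
    """Return every date found in `text` that parses under a known format."""
    found: List[date] = []
    for pattern, fmt in _PATTERNS:
        for candidate in re.findall(pattern, text):
            try:
                found.append(datetime.strptime(candidate, fmt).date())
            except ValueError:
                continue  # Looked like a date but was not valid, e.g. 31/02/2026.
    return found


def has_upcoming_date(text: str, today: date) -> bool:
    """True if the text mentions at least one date on or after `today`."""
    return any(d >= today for d in extract_dates(text))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the filter deterministic and easy to test.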

Caching & Memory

Query Cache: Repeated questions are served from cache without re-running search or the LLM (LRU eviction, 5-minute TTL)
Conversation Memory: Follow-up questions maintain context
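
An LRU cache with a time-to-live can be built on `collections.OrderedDict`. The sketch below is one possible shape for such a cache, not the project's implementation: the class name `TTLCache`, the `max_size` of 128, and the injectable `clock` are assumptions; only the 5-minute TTL comes from the description above.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """LRU cache whose entries also expire after `ttl_seconds`."""

    def __init__(self, max_size: int = 128, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # Injectable so tests can control time.

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._data[key]  # Expired: drop it and report a miss.
            return None
        self._data.move_to_end(key)  # Mark as most recently used.
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # Evict the least recently used.
```

A query service would typically normalize the question (for example, lowercase and strip whitespace) before using it as the cache key, so trivially different phrasings of the same question still hit the cache.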

Quick Start

Prerequisites

Python 3.11+
Node.js 18+
Zoho Mail account with IMAP enabled
OpenRouter API key (for OpenAI models)

1. Configure Environment

cp .env.example .env

Edit .env with your credentials:

IMAP_USERNAME=your-email@example.com
IMAP_PASSWORD=your-app-password
OPENROUTER_API_KEY=your-openrouter-key

2. Run with Docker (Recommended)

docker-compose up -d

Access at: http://localhost

3. Run Locally (Development)

# Terminal 1 - Backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

# Terminal 2 - Frontend
cd frontend
npm install
npm run dev

Access at: http://localhost:5173

API Endpoints

Endpoint	Method	Description
`/api/sync/start`	POST	Start email sync
`/api/sync/stop`	POST	Stop sync
`/api/sync/status`	GET	Get sync progress
`/api/query`	POST	Ask a question
`/api/search`	POST	Search emails
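
As a rough illustration of calling the query endpoint from Python, the sketch below uses only the standard library. The request body field `question` is an assumption (the actual schema is defined in `backend/app/api/routes.py`), and the base URL assumes the local development backend on port 8000.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # Local dev backend started with uvicorn.


def build_query_request(question: str,
                        base_url: str = BASE_URL) -> request.Request:
    """Build a POST request for /api/query with a JSON body."""
    # The "question" field name is hypothetical; check routes.py for the schema.
    payload = json.dumps({"question": question}).encode("utf-8")
    return request.Request(
        f"{base_url}/api/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> dict:
    """Send the question and return the decoded JSON response."""
    with request.urlopen(build_query_request(question), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `ask("Do I have any upcoming interviews?")` would return the parsed JSON response.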

Configuration

Variable	Description	Default
`IMAP_SERVER`	Zoho IMAP server	imappro.zoho.com
`SYNC_BATCH_SIZE`	Emails per batch	50
`SYNC_PARALLEL_WORKERS`	Concurrent folders	3
`EMBEDDING_MODEL`	OpenAI embedding model	text-embedding-3-small
`CHAT_MODEL`	LLM for responses	gpt-4o-mini
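
The defaults in the table could be loaded from the environment along these lines. This is a sketch using a plain dataclass; the `Settings` class and `load_settings` function are hypothetical, and the project may well use a different settings mechanism.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    imap_server: str
    sync_batch_size: int
    sync_parallel_workers: int
    embedding_model: str
    chat_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), applying table defaults."""
    env = os.environ if env is None else env
    return Settings(
        imap_server=env.get("IMAP_SERVER", "imappro.zoho.com"),
        sync_batch_size=int(env.get("SYNC_BATCH_SIZE", "50")),
        sync_parallel_workers=int(env.get("SYNC_PARALLEL_WORKERS", "3")),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        chat_model=env.get("CHAT_MODEL", "gpt-4o-mini"),
    )
```

Numeric values are converted with `int()` because environment variables are always strings; an invalid value such as `SYNC_BATCH_SIZE=fifty` raises a `ValueError` at startup rather than failing later during sync.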

Project Structure

├── backend/
│   ├── app/
│   │   ├── api/routes.py       # API endpoints
│   │   ├── services/
│   │   │   ├── sync_service.py    # Email sync
│   │   │   ├── query_service.py   # RAG query engine
│   │   │   ├── hybrid_search.py   # Vector + keyword search
│   │   │   └── vector_store.py    # ChromaDB operations
│   │   └── models/email.py     # Data models
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.jsx             # Main app
│   │   └── components/
│   │       └── ChatInterface.jsx  # Chat UI
│   └── package.json
├── docker-compose.yml
└── .env.example
