Ask questions about any PDF — fully local, no paid APIs, no data leaving your machine.
A local RAG (Retrieval-Augmented Generation) pipeline built with BGE embeddings, Qdrant vector search, and TinyLlama via Ollama. Drop in any PDF and start asking questions — everything runs on your machine.
🚀 Quick Start · 🧠 How It Works · 💡 Example Output · 🗺️ Roadmap
## 🧠 How It Works

```
PDF → Text Extraction → Chunking → BGE Embeddings → Qdrant Vector Store
                                  ↓
         Question → Query Embedding → Similarity Search
                                  ↓
                   Top Chunks → TinyLlama → Answer
```
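The "Similarity Search" box is plain nearest-neighbor lookup over embedding vectors: with normalized embeddings, cosine similarity reduces to a dot product. A tiny self-contained illustration (toy 4-dimensional vectors, not real BGE output):

```python
import numpy as np

# Toy 4-dimensional "embeddings" for three stored chunks and one query.
chunks = np.array([[0.9, 0.1, 0.0, 0.1],
                   [0.1, 0.9, 0.1, 0.0],
                   [0.5, 0.5, 0.1, 0.1]])
query = np.array([0.8, 0.2, 0.1, 0.0])

# Normalize rows so cosine similarity becomes a plain dot product.
chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)
query /= np.linalg.norm(query)

scores = chunks @ query               # cosine similarity per chunk
top_k = np.argsort(scores)[::-1][:2]  # indices of the most similar chunks
print(top_k, scores[top_k])
```

Qdrant performs exactly this lookup, just indexed and at scale.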
- PDF Loader — Extracts text page-by-page using PyMuPDF (`fitz`)
- Text Chunker — Splits text into overlapping token windows using `tiktoken` (see the sketch after this list)
- Embedder — Generates dense vector embeddings via `BAAI/bge-small-en-v1.5`
- Vector Store — Stores and searches embeddings in-memory using Qdrant
- Answer Generator — Feeds retrieved chunks as context to TinyLlama via Ollama
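To make the ingestion half concrete, here is a minimal sketch of page extraction and overlapping token chunking. The chunk size and overlap values are illustrative assumptions, not the repository's actual settings:

```python
import fitz  # PyMuPDF
import tiktoken


def load_pdf(path: str) -> list[str]:
    """Extract plain text from each page of a PDF."""
    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]


def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Split text into overlapping token windows (sizes are assumed values)."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks


pages = load_pdf("data/papers/your_document.pdf")
chunks = [c for page in pages for c in chunk_text(page)]
```

The overlap keeps sentences that straddle a window boundary retrievable from either side.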
```
ai-research-assistant/
├── data/
│   └── papers/              # Drop your PDFs here
├── ingestion/
│   ├── pdf_loader.py        # PDF text extraction (PyMuPDF)
│   └── chunking.py          # Token-based overlapping chunker
├── embeddings/
│   └── embedder.py          # BGE embedding model wrapper
├── retrieval/
│   └── vector_store.py      # Qdrant in-memory vector store
├── llm/
│   └── generator.py         # LLM answer generation via Ollama
├── research_assistant.py    # Main pipeline entry point
├── requirements.txt
└── README.md
```
| Component | Tool |
|---|---|
| PDF Parsing | PyMuPDF (`fitz`) |
| Tokenization | `tiktoken` |
| Embeddings | `BAAI/bge-small-en-v1.5` via `sentence-transformers` |
| Vector DB | Qdrant (in-memory) |
| LLM | TinyLlama via Ollama (fully local) |
| Language | Python 3.10+ |
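The embedding and retrieval pieces plug together roughly as below. This is a hedged sketch rather than the repository's exact code; the collection name and top-k value are assumptions, and `bge-small-en-v1.5` produces 384-dimensional vectors:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
client = QdrantClient(":memory:")  # in-process; nothing touches disk or network

# bge-small-en-v1.5 embeds into 384 dims; cosine suits normalized vectors.
client.create_collection(
    collection_name="papers",  # assumed name for illustration
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

chunks = ["The encoder is composed of a stack of N = 6 identical layers..."]
vectors = model.encode(chunks, normalize_embeddings=True)
client.upsert(
    collection_name="papers",
    points=[
        PointStruct(id=i, vector=v.tolist(), payload={"text": t})
        for i, (v, t) in enumerate(zip(vectors, chunks))
    ],
)

# Retrieval: embed the question the same way, take the nearest chunks.
query = model.encode("Explain the transformer architecture",
                     normalize_embeddings=True)
hits = client.search(collection_name="papers",
                     query_vector=query.tolist(), limit=3)
for hit in hits:
    print(f"{hit.score:.3f}", hit.payload["text"][:60])
```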
## 🚀 Quick Start

Prerequisites:

- Python 3.10+
- Ollama installed
Clone the repo:

```bash
git clone https://github.com/overcastbulb/ai-research-assistant.git
cd ai-research-assistant
```

Create and activate a virtual environment, then install dependencies:

```bash
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
# venv\Scripts\activate    # Windows
pip install -r requirements.txt
```

Pull the model:

```bash
ollama pull tinyllama
```

Add a PDF and run the pipeline:

```bash
mkdir -p data/papers
cp your_document.pdf data/papers/
python3 research_assistant.py
```

## 💡 Example Output

```
Loading PDF...
Loaded 15 pages
Chunking text...
Created 36 chunks
Generating embeddings...
Embeddings generated: 36
Storing vectors...
Vector store ready
Question: Explain the transformer architecture
Contexts retrieved: 3
From pages: [3, 2, 2]
Answer:
The Transformer architecture uses stacked self-attention and point-wise,
fully connected layers for both the encoder and decoder...
```
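The last step packs the retrieved chunks into a prompt and asks TinyLlama for an answer grounded in them. Here is a minimal sketch using the `ollama` Python package; the repo may instead call Ollama's local HTTP API, and the prompt wording below is an assumption:

```python
import ollama  # pip install ollama; requires a running Ollama daemon


def generate_answer(question: str, contexts: list[str]) -> str:
    """Ask TinyLlama to answer using only the retrieved chunks as context."""
    context_block = "\n\n".join(contexts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
    response = ollama.chat(
        model="tinyllama",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]


top_chunks = ["The Transformer uses stacked self-attention..."]  # from retrieval
print(generate_answer("Explain the transformer architecture", top_chunks))
```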
Everything runs 100% locally:
- No data is sent to any external API
- No OpenAI, no Anthropic, no cloud dependency
- Your documents stay on your machine
## 🗺️ Roadmap

- Support for multiple PDFs in a single session
- CLI arguments (`--pdf path/to/file.pdf --question "..."`)
- Streamlit web UI for non-terminal users
- Persistent vector store (disk-based Qdrant; see the sketch after this list)
- Swap TinyLlama for Groq API (optional, for faster inference)
- Relevance score display alongside retrieved chunks
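For the persistent vector store item above, Qdrant's local mode already supports an on-disk path, so the change could be as small as this (the directory name is an assumption):

```python
from qdrant_client import QdrantClient

# In-memory today; passing a path persists the collection across runs.
client = QdrantClient(path="./qdrant_data")  # assumed location for illustration
```

Re-ingesting a PDF then becomes a one-time cost instead of a per-session one.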
This project is licensed under the MIT License.
Report a Bug · Request a Feature
