Ask questions about any PDF — fully local, no paid APIs, no data leaving your machine.
A local RAG (Retrieval-Augmented Generation) pipeline built with BGE embeddings, Qdrant vector search, and TinyLlama via Ollama. Drop in any PDF and start asking questions — everything runs on your machine.
🚀 Quick Start · 🧠 How It Works · 💡 Example Output · 🗺️ Roadmap
## 🧠 How It Works

```
PDF → Text Extraction → Chunking → BGE Embeddings → Qdrant Vector Store
                                  ↓
         Question → Query Embedding → Similarity Search
                                  ↓
                   Top Chunks → TinyLlama → Answer
```
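The "Similarity Search" box is plain nearest-neighbor lookup over embedding vectors: with normalized embeddings, cosine similarity reduces to a dot product. A tiny self-contained illustration (toy 4-dimensional vectors, not real BGE output):

```python
import numpy as np

# Toy 4-dimensional "embeddings" for three stored chunks and one query.
chunks = np.array([[0.9, 0.1, 0.0, 0.1],
                   [0.1, 0.9, 0.1, 0.0],
                   [0.5, 0.5, 0.1, 0.1]])
query = np.array([0.8, 0.2, 0.1, 0.0])

# Normalize rows so cosine similarity becomes a plain dot product.
chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)
query /= np.linalg.norm(query)

scores = chunks @ query               # cosine similarity per chunk
top_k = np.argsort(scores)[::-1][:2]  # indices of the most similar chunks
print(top_k, scores[top_k])
```

Qdrant performs exactly this lookup, just indexed and at scale.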
- PDF Loader — Extracts text page-by-page using PyMuPDF (`fitz`)
- Text Chunker — Splits text into overlapping token windows using `tiktoken` (see the sketch after this list)
- Embedder — Generates dense vector embeddings via `BAAI/bge-small-en-v1.5`
- Vector Store — Stores and searches embeddings in-memory using Qdrant
- Answer Generator — Feeds retrieved chunks as context to TinyLlama via Ollama
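To make the ingestion half concrete, here is a minimal sketch of page extraction and overlapping token chunking. The chunk size and overlap values are illustrative assumptions, not the repository's actual settings:

```python
import fitz  # PyMuPDF
import tiktoken


def load_pdf(path: str) -> list[str]:
    """Extract plain text from each page of a PDF."""
    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]


def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Split text into overlapping token windows (sizes are assumed values)."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks


pages = load_pdf("data/papers/your_document.pdf")
chunks = [c for page in pages for c in chunk_text(page)]
```

The overlap keeps sentences that straddle a window boundary retrievable from either side.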
```
ai-research-assistant/
├── data/
│   └── papers/              # Drop your PDFs here
├── ingestion/
│   ├── pdf_loader.py        # PDF text extraction (PyMuPDF)
│   └── chunking.py          # Token-based overlapping chunker
├── embeddings/
│   └── embedder.py          # BGE embedding model wrapper
├── retrieval/
│   └── vector_store.py      # Qdrant in-memory vector store
├── llm/
│   └── generator.py         # LLM answer generation via Ollama
├── research_assistant.py    # Main pipeline entry point
├── requirements.txt
└── README.md
```
| Component | Tool |
|---|---|
| PDF Parsing | PyMuPDF (`fitz`) |
| Tokenization | `tiktoken` |
| Embeddings | `BAAI/bge-small-en-v1.5` via `sentence-transformers` |
| Vector DB | Qdrant (in-memory) |
| LLM | TinyLlama via Ollama (fully local) |
| Language | Python 3.10+ |
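The embedding and retrieval pieces plug together roughly as below. This is a hedged sketch rather than the repository's exact code; the collection name and top-k value are assumptions, and `bge-small-en-v1.5` produces 384-dimensional vectors:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
client = QdrantClient(":memory:")  # in-process; nothing touches disk or network

# bge-small-en-v1.5 embeds into 384 dims; cosine suits normalized vectors.
client.create_collection(
    collection_name="papers",  # assumed name for illustration
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

chunks = ["The encoder is composed of a stack of N = 6 identical layers..."]
vectors = model.encode(chunks, normalize_embeddings=True)
client.upsert(
    collection_name="papers",
    points=[
        PointStruct(id=i, vector=v.tolist(), payload={"text": t})
        for i, (v, t) in enumerate(zip(vectors, chunks))
    ],
)

# Retrieval: embed the question the same way, take the nearest chunks.
query = model.encode("Explain the transformer architecture",
                     normalize_embeddings=True)
hits = client.search(collection_name="papers",
                     query_vector=query.tolist(), limit=3)
for hit in hits:
    print(f"{hit.score:.3f}", hit.payload["text"][:60])
```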
## 🚀 Quick Start

Prerequisites:

- Python 3.10+
- Ollama installed
Clone the repo:

```bash
git clone https://github.com/overcastbulb/ai-research-assistant.git
cd ai-research-assistant
```

Create and activate a virtual environment, then install dependencies:

```bash
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
# venv\Scripts\activate    # Windows
pip install -r requirements.txt
```

Pull the model:

```bash
ollama pull tinyllama
```

Add a PDF and run the pipeline:

```bash
mkdir -p data/papers
cp your_document.pdf data/papers/
python3 research_assistant.py
```

## 💡 Example Output

```
Loading PDF...
Loaded 15 pages
Chunking text...
Created 36 chunks
Generating embeddings...
Embeddings generated: 36
Storing vectors...
Vector store ready
Question: Explain the transformer architecture
Contexts retrieved: 3
From pages: [3, 2, 2]
Answer:
The Transformer architecture uses stacked self-attention and point-wise,
fully connected layers for both the encoder and decoder...
```
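The last step packs the retrieved chunks into a prompt and asks TinyLlama for an answer grounded in them. Here is a minimal sketch using the `ollama` Python package; the repo may instead call Ollama's local HTTP API, and the prompt wording below is an assumption:

```python
import ollama  # pip install ollama; requires a running Ollama daemon


def generate_answer(question: str, contexts: list[str]) -> str:
    """Ask TinyLlama to answer using only the retrieved chunks as context."""
    context_block = "\n\n".join(contexts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
    response = ollama.chat(
        model="tinyllama",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]


top_chunks = ["The Transformer uses stacked self-attention..."]  # from retrieval
print(generate_answer("Explain the transformer architecture", top_chunks))
```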
Everything runs 100% locally:
- No data is sent to any external API
- No OpenAI, no Anthropic, no cloud dependency
- Your documents stay on your machine
## 🗺️ Roadmap

- Support for multiple PDFs in a single session
- CLI arguments (`--pdf path/to/file.pdf --question "..."`)
- Streamlit web UI for non-terminal users
- Persistent vector store (disk-based Qdrant; see the sketch after this list)
- Swap TinyLlama for Groq API (optional, for faster inference)
- Relevance score display alongside retrieved chunks
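For the persistent vector store item above, Qdrant's local mode already supports an on-disk path, so the change could be as small as this (the directory name is an assumption):

```python
from qdrant_client import QdrantClient

# In-memory today; passing a path persists the collection across runs.
client = QdrantClient(path="./qdrant_data")  # assumed location for illustration
```

Re-ingesting a PDF then becomes a one-time cost instead of a per-session one.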
This project is licensed under the MIT License.
Report a Bug · Request a Feature
