Open Source Architecture for a Financial Document Chatbot
Your Name
February 17, 2025
Contents
1 Introduction
2 Document Ingestion, Preprocessing, and Multimodal Handling
2.1 PDF and Image Input
2.2 Data Cleaning and Structuring
3 Content Analysis, Embedding Generation, and Graph RAG
3.1 Embedding Generation
3.2 Graph RAG Setup
3.3 Vector Store and Indexing
4 Multi-Agent Architecture for Query Processing and RAG
4.1 Query Understanding and Pre-Processing Agent
4.2 RAG Agent with Graph Integration
4.3 Multi-Agent Orchestration
5 Generative Response Creation with Pre-Trained Models
5.1 Generative Model and Prompting
5.2 Multimodal Response (Optional)
6 Integration, Deployment, and User Interaction
6.1 Backend Development
6.2 Frontend Interface
6.3 Security and Compliance
7 Testing, Monitoring, and Continuous Improvement
7.1 Testing
7.2 Monitoring & Logging
7.3 Iterative Improvements
8 Summary
9 Conclusion
1 Introduction
This document describes an open source architecture for building a generative AI chatbot
that processes financial PDFs (both text-based and scanned) and answers questions based
on their content. The design integrates:
• Retrieval Augmented Generation (RAG)
• Graph RAG
• Multi-Agent Approach
• Pre-trained Models
• Multimodal Support
2 Document Ingestion, Preprocessing, and Multimodal Handling
2.1 PDF and Image Input
• File Upload Interface: Create a web interface using frameworks such as Flask or
Django to allow users to upload PDFs.
• PDF Parsing:
– Text-Based PDFs: Use open source libraries such as PDFMiner or PyMuPDF to
extract text.
– Scanned PDFs: Render each page to an image (e.g., with PyMuPDF) and run
Tesseract OCR (via its Python wrapper, pytesseract) to extract text from the images.
• Multimodal Extraction: For financial charts or images, use OpenCV for pre-processing
and OpenAI’s CLIP model (via Hugging Face) to generate joint image-text embeddings.
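A minimal sketch of the per-page dispatch between the two PDF paths above, assuming PyMuPDF (imported as `fitz`), Pillow, pytesseract, and a local Tesseract binary are installed. The helper names `needs_ocr` and `extract_pdf_text` and the 25-character threshold are illustrative assumptions, not part of any library: a page whose embedded text layer is empty or nearly empty is treated as scanned, rendered to an image, and passed to OCR.

```python
import io

# Assumed heuristic threshold: a page whose text layer yields fewer than this
# many non-whitespace characters is treated as a scanned page.
MIN_TEXT_CHARS = 25


def needs_ocr(page_text: str, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Return True when a page's embedded text layer looks empty or nearly empty."""
    return sum(not ch.isspace() for ch in page_text) < min_chars


def extract_pdf_text(path: str, dpi: int = 300) -> list[str]:
    """Extract text per page, falling back to Tesseract OCR for scanned pages.

    Imports are deferred so that needs_ocr() works without PyMuPDF,
    Pillow, or pytesseract installed.
    """
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    pages = []
    with fitz.open(path) as doc:
        for page in doc:
            text = page.get_text()
            if needs_ocr(text):
                # Rasterise the page to PNG bytes, then OCR the image.
                pix = page.get_pixmap(dpi=dpi)
                image = Image.open(io.BytesIO(pix.tobytes("png")))
                text = pytesseract.image_to_string(image)
            pages.append(text)
    return pages
```

The threshold is only a rough heuristic; pages that mix a real text layer with scanned content (for example, a typed header above a scanned table) would need a finer test.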
2.2 Data Cleaning and Structuring
• Text Cleaning: Use Python libraries (e.g., regex, NLTK) to remove noise, repeated
headers/footers, page numbers, and extraction artifacts.
• Document Segmentation: Split text into pages, paragraphs, or logical sections.
• Graph Construction: Use NLP libraries such as spaCy for entity extraction (dates,
amounts, financial terms) and NetworkX to build a knowledge graph capturing entity
relationships.
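The cleaning and segmentation steps can be sketched with the standard library alone. This is an illustrative approach under stated assumptions, not a specific library's API: lines that recur on at least 60% of pages (the hypothetical `min_repeat_ratio` parameter) are treated as headers or footers, bare page numbers are dropped, words hyphenated across line breaks are rejoined, and blank lines mark paragraph boundaries.

```python
import re
from collections import Counter


def clean_pages(pages: list[str], min_repeat_ratio: float = 0.6) -> str:
    """Drop repeated header/footer lines and bare page numbers, then rejoin text."""
    # Count on how many pages each non-empty stripped line appears.
    line_counts = Counter()
    for page in pages:
        line_counts.update({line.strip() for line in page.splitlines() if line.strip()})
    threshold = max(2, int(min_repeat_ratio * len(pages)))
    repeated = {line for line, n in line_counts.items() if n >= threshold}

    kept = []
    for page in pages:
        for line in page.splitlines():
            stripped = line.strip()
            if stripped in repeated:
                continue  # header or footer repeated across pages
            if re.fullmatch(r"(page\s+)?\d+(\s+of\s+\d+)?", stripped, re.IGNORECASE):
                continue  # bare page number such as "7" or "Page 3 of 10"
            kept.append(stripped)
        kept.append("")  # paragraph break between pages
    text = "\n".join(kept)
    # Rejoin words hyphenated across line breaks: "finan-\ncial" -> "financial".
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and collapse whitespace inside each paragraph."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]
```

Rejoining on a hyphen plus newline also merges genuine compounds split at a line end (e.g., "year-" followed by "over-year"), so a dictionary check may be worth adding.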
3 Content Analysis, Embedding Generation, and Graph RAG
3.1 Embedding Generation
• Text Embeddings: Use open source models from Hugging Face Transformers (e.g.,
Sentence Transformers models, many of which are built on encoders such as BERT) to
generate embeddings.
• Multimodal Embeddings: Use CLIP (available via Hugging Face) to generate embeddings
for images alongside text.
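Before embedding, long pages usually need to be split into chunks that fit the encoder's input limit. The sketch below assumes the `sentence-transformers` package; the model names `sentence-transformers/all-MiniLM-L6-v2` (text) and `clip-ViT-B-32` (a CLIP checkpoint that `sentence-transformers` can load to embed both images and text into one space) are example choices, and the 200-word window with a 40-word overlap is an arbitrary default. Only `chunk_words` runs without extra dependencies.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each overlapping the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def embed_text(chunks, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Encode text chunks; normalised vectors make inner product equal cosine."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(chunks, normalize_embeddings=True)


def embed_images(paths, model_name="clip-ViT-B-32"):
    """Encode chart images with CLIP so they share a vector space with CLIP text."""
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    images = [Image.open(p).convert("RGB") for p in paths]
    return model.encode(images, normalize_embeddings=True)
```

Normalizing at encode time means a plain inner-product index returns cosine similarity, which keeps the retrieval step simple.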
3.2 Graph RAG Setup
• Entity Extraction & Graph Building:
– Use spaCy to extract entities.
– Build a knowledge graph using NetworkX to represent relationships (e.g., linking
financial metrics to report dates).
• Graph Embedding: Explore open source graph embedding libraries such as PyTorch
Geometric or DGL to represent the graph structure in vector space.
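As a rough illustration of the graph-building step, the sketch below links entities that appear in the same sentence and weights each edge by how often they co-occur. It deliberately swaps spaCy's statistical NER for a hypothetical metric lexicon plus regular expressions for money amounts and reporting periods so that it runs with the standard library; in a spaCy pipeline, `doc.ents` with labels such as `MONEY` and `DATE` would supply the entities instead, and each weighted pair maps onto a NetworkX `G.add_edge(a, b, weight=n)` call.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stand-ins for spaCy NER: a tiny metric lexicon plus regexes
# for money amounts and reporting periods.
METRICS = ("revenue", "net income", "operating margin", "total assets")
MONEY = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:million|billion)\b|[MBK]\b)?", re.IGNORECASE
)
PERIOD = re.compile(r"\b(?:FY\s?\d{4}|Q[1-4]\s\d{4}|(?:19|20)\d{2})\b")


def extract_entities(sentence):
    """Return (label, text) pairs found in one sentence."""
    lower = sentence.lower()
    entities = [("METRIC", m) for m in METRICS if m in lower]
    entities += [("MONEY", m.group()) for m in MONEY.finditer(sentence)]
    entities += [("DATE", m.group()) for m in PERIOD.finditer(sentence)]
    return entities


def build_graph(text):
    """Link entities co-occurring in a sentence; edge weight = co-occurrence count."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        nodes = sorted(set(extract_entities(sentence)))
        for a, b in combinations(nodes, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph
```

Sentence-level co-occurrence is a crude proxy for a relation; typed edges such as "metric reported for period" would need dependency parsing or rules on top of NER.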
3.3 Vector Store and Indexing
• Text & Multimodal Indexing: Use FAISS (a similarity search library) or Milvus (an
open source vector database) to store and query embeddings.
• Graph Indexing: Store the knowledge graph in a graph database like Neo4j Community
Edition or manage it in-memory with NetworkX for smaller-scale projects.
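For small corpora, exact search is easy to reason about: store L2-normalized vectors and rank by inner product, which then equals cosine similarity. The class below is a plain-Python sketch of what a flat inner-product index (such as FAISS's `IndexFlatIP`, used together with `faiss.normalize_L2`) computes; the class and method names are hypothetical, and it is far slower than FAISS or Milvus at scale.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatIPIndex:
    """Exact inner-product search over L2-normalised vectors (cosine similarity)."""

    def __init__(self):
        self.vectors = []
        self.ids = []

    def add(self, doc_id, vec):
        self.ids.append(doc_id)
        self.vectors.append(normalize(vec))

    def search(self, query, k=3):
        """Return up to k (score, doc_id) pairs, highest score first."""
        q = normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for v, doc_id in zip(self.vectors, self.ids)
        )
        return heapq.nlargest(k, scored)
```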
4 Multi-Agent Architecture for Query Processing and RAG
4.1 Query Understanding and Pre-Processing Agent
• Query Parsing: Use Hugging Face models or spaCy to process and understand the
query, extracting key financial terms.
• Query Embedding: Generate a query embedding using the same model as for document
embeddings.
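A lightweight parser can pull structured hints out of a question before it is embedded, which lets the Graph Retriever look up exact entities instead of relying on similarity alone. The term list, regular expressions, and output keys below are illustrative assumptions; the query embedding itself should still come from the same encoder used for the document chunks.

```python
import re

# Hypothetical term list; a production system might use spaCy's EntityRuler
# or the entity labels already stored in the knowledge graph instead.
FINANCIAL_TERMS = ("revenue", "net income", "ebitda", "operating margin", "cash flow", "guidance")
PERIOD = re.compile(r"\b(?:FY\s?\d{4}|Q[1-4]\s?\d{4}|(?:19|20)\d{2})\b", re.IGNORECASE)
COMPARISON = re.compile(r"\b(?:compare|versus|vs|change|growth)\b")


def parse_query(query):
    """Extract financial terms, reporting periods, and a comparison flag."""
    lower = query.lower()
    # Normalise "fy 2023" and "FY2023" to the same form.
    periods = [re.sub(r"^FY\s+", "FY", p.upper()) for p in PERIOD.findall(query)]
    return {
        "terms": [t for t in FINANCIAL_TERMS if t in lower],
        "periods": periods,
        "is_comparison": bool(COMPARISON.search(lower)),
    }
```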
4.2 RAG Agent with Graph Integration
• Retriever Agent:
– Text Retriever: Query the FAISS/Milvus vector store.
– Graph Retriever: Query the knowledge graph with NetworkX graph traversals
(e.g., neighbors of the entities named in the query) or Neo4j Cypher queries.
• Generator Agent: Use an open source generative model (e.g., GPT-2 or a fine-tuned
variant from Hugging Face) to produce the final answer. Alternatives such as Open
Assistant can also be considered.
• Context Fusion: Combine retrieved text segments with graph insights to form a
unified context for the generator.
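The Graph Retriever and Context Fusion steps can be sketched as two small functions over an adjacency map of the form `{entity: {neighbour: weight}}`, an assumed in-memory shape comparable to what NetworkX stores. `graph_neighbours` returns one-hop facts around the entities found in the query, and `fuse_context` deduplicates ranked text chunks and graph facts and packs them into labelled sections under a character budget, a crude stand-in for a token budget. Both names are hypothetical.

```python
def graph_neighbours(graph, seeds, max_facts=5):
    """Return one-hop facts around seed entities, strongest edges first."""
    facts = []
    for seed in seeds:
        neighbours = sorted(graph.get(seed, {}).items(), key=lambda kv: -kv[1])
        for nbr, weight in neighbours:
            facts.append(f"{seed} -- {nbr} (co-occurrences: {weight})")
    return facts[:max_facts]


def fuse_context(text_hits, graph_facts, max_chars=1500):
    """Merge ranked text chunks and graph facts into one bounded context string."""
    parts, seen, used = [], set(), 0
    for title, items in (("Document excerpts", text_hits), ("Entity relationships", graph_facts)):
        kept = []
        for item in items:
            if item in seen or used + len(item) > max_chars:
                continue  # skip duplicates and anything over the budget
            seen.add(item)
            kept.append(f"- {item}")
            used += len(item)
        if kept:
            parts.append(title + ":\n" + "\n".join(kept))
    return "\n\n".join(parts)
```

Putting text excerpts before graph facts is a design choice: the excerpts carry the exact figures the generator should quote, while the graph facts mainly point it at related entities.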
4.3 Multi-Agent Orchestration
• Agent Framework: Use a task queue system like Celery along with a message broker
(RabbitMQ or Redis) to manage communication between agents:
– Document Agent: Handles ingestion, OCR, and embedding creation.
– Graph Agent: Manages entity extraction and graph building.
– Query Agent: Processes and embeds user queries.
– RAG Agent: Retrieves context and orchestrates response generation.
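The message flow between the four agents can be illustrated in-process before a broker is introduced. In the sketch below, `queue.Queue` plays the role of RabbitMQ or Redis, and each agent is a plain function that returns the name of the next agent along with the updated message; the agent bodies are simplified placeholders, not real OCR, NER, or retrieval. With Celery, each function would instead be registered with `@app.task`, and the hand-offs expressed with `celery.chain` or by calling the next task's `.delay()`.

```python
import queue


def document_agent(msg):
    # Placeholder for ingestion, OCR, and embedding: split text into chunks.
    msg["chunks"] = [p.strip() for p in msg["raw_text"].split("\n\n") if p.strip()]
    return "graph", msg


def graph_agent(msg):
    # Placeholder for entity extraction: treat capitalised tokens as entities.
    msg["entities"] = sorted(
        {w.strip(".,") for c in msg["chunks"] for w in c.split() if w[:1].isupper()}
    )
    return None, msg  # end of the ingestion route


def query_agent(msg):
    # Placeholder for query parsing: keep words longer than three characters.
    msg["query_terms"] = [w.strip("?.,") for w in msg["query"].lower().split() if len(w) > 3]
    return "rag", msg


def rag_agent(msg):
    # Placeholder for retrieval: keyword match instead of vector search.
    msg["answer_context"] = [
        c for c in msg["chunks"] if any(t in c.lower() for t in msg["query_terms"])
    ]
    return None, msg


AGENTS = {"document": document_agent, "graph": graph_agent, "query": query_agent, "rag": rag_agent}


def run(first_agent, msg):
    """Route a message through agents via an in-memory queue until no next hop remains."""
    inbox = queue.Queue()
    inbox.put((first_agent, msg))
    while not inbox.empty():
        name, payload = inbox.get()
        next_agent, payload = AGENTS[name](payload)
        if next_agent is None:
            return payload
        inbox.put((next_agent, payload))
```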
5 Generative Response Creation with Pre-Trained Models
5.1 Generative Model and Prompting
• Model Selection: Use open source models from Hugging Face (e.g., GPT-2 or GPT-Neo)
for response generation. Fine-tuning on financial texts may be applied if necessary.
• Prompt Engineering: Craft prompts that include both text and graph context. For
example:
"Using the following financial data and relationships between key
entities, answer the question: [user query]. Context: [aggregated
text and graph insights]."
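A sketch of filling the prompt template above and passing it to a Hugging Face text-generation pipeline. The template wording follows the example; `build_prompt`, `generate_answer`, the 2,000-character context cap, and greedy decoding are assumptions. GPT-2's context window is 1,024 tokens, so the character cap is only a rough guard; counting tokens with the model's tokenizer would be more precise. Only `build_prompt` runs without the `transformers` package.

```python
PROMPT_TEMPLATE = (
    "Using the following financial data and relationships between key entities, "
    "answer the question: {query}\n"
    "Context:\n{context}\n"
    "Answer:"
)


def build_prompt(query, context, max_context_chars=2000):
    """Fill the template, truncating the context at a word boundary if too long."""
    context = context.strip()
    if len(context) > max_context_chars:
        context = context[:max_context_chars].rsplit(" ", 1)[0] + " ..."
    return PROMPT_TEMPLATE.format(query=query.strip(), context=context)


def generate_answer(prompt, model_name="gpt2", max_new_tokens=128):
    """Generate a continuation with greedy decoding and return only the new text."""
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    output = generator(
        prompt, max_new_tokens=max_new_tokens, do_sample=False, return_full_text=False
    )
    return output[0]["generated_text"].strip()
```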
5.2 Multimodal Response (Optional)
• Visual Summaries: If charts or images are relevant, generate captions or summaries
using image captioning models (open source versions available on Hugging Face).
6 Integration, Deployment, and User Interaction
6.1 Backend Development
• API Creation: Build RESTful APIs using Flask or Django to handle:
– File upload and processing.
– Agent orchestration.
– Vector and graph retrieval.
– Response generation.
• Containerization: Use Docker to containerize your application. Docker Compose can
orchestrate the services on a single host, and Kubernetes can handle multi-node
orchestration and scaling.
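For the file upload endpoint, cheap checks before a document is queued for the Document Agent avoid wasting OCR and embedding work on invalid input. The helper below is hypothetical and framework-agnostic: a Flask view could call it with `request.files["file"].filename` and the bytes returned by `.read()`. The 25 MB limit is an assumed value, and the header check only confirms the `%PDF-` signature at the start of the file, not that the rest of the file is well-formed.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed limit; tune per deployment


def validate_pdf_upload(filename: str, data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the upload can be queued."""
    problems = []
    if not filename.lower().endswith(".pdf"):
        problems.append("file extension must be .pdf")
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    if len(data) > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    return problems
```

Returning all problems at once, rather than raising on the first, lets the API report every issue in a single error response.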
6.2 Frontend Interface
• Chat Interface: Develop an interactive web UI using frameworks like React or
Vue.js where users can:
– Upload financial PDFs.
– Pose questions.
– View responses along with context excerpts or visualized graphs.
• Visualization Tools: Use libraries such as D3.js or Plotly.js to visualize the
knowledge graph or extracted data.
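To visualize the knowledge graph with D3.js, the backend can serialize it in the node-link shape that D3 force-layout examples commonly use: a `nodes` array of objects with an `id`, and a `links` array with `source` and `target`. The function below is a sketch assuming an in-memory `{entity: {neighbour: weight}}` adjacency map for an undirected graph; the function name and the `value` field carrying the edge weight are illustrative choices. NetworkX users can get a similar structure from `networkx.readwrite.json_graph.node_link_data`.

```python
import json


def to_node_link_json(adjacency):
    """Serialise {node: {neighbour: weight}} as D3-style node-link JSON."""
    nodes = sorted(set(adjacency) | {n for nbrs in adjacency.values() for n in nbrs})
    links, seen = [], set()
    for source, nbrs in adjacency.items():
        for target, weight in nbrs.items():
            edge = frozenset((source, target))
            if edge in seen:
                continue  # undirected graph: emit each edge only once
            seen.add(edge)
            links.append({"source": source, "target": target, "value": weight})
    return json.dumps({"nodes": [{"id": n} for n in nodes], "links": links})
```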
6.3 Security and Compliance
• Data Security: Serve all traffic over HTTPS, use JWT-based authentication for API
calls, and encrypt uploaded documents and derived data (embeddings, graph, logs) at rest.
• Compliance: Ensure the solution meets applicable data protection standards and
financial regulations.
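To show what JWT-based authentication checks on each request, here is a standard-library sketch of issuing and verifying an HS256-signed token: the signature is an HMAC-SHA256 over the base64url-encoded header and payload, it is compared in constant time, and the `exp` claim is enforced. The helper names, the 15-minute lifetime, and the secret handling are assumptions; a production service should use a maintained library such as PyJWT (`jwt.encode(payload, key, algorithm="HS256")` and `jwt.decode(token, key, algorithms=["HS256"])`) and keep the secret in a secrets manager.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))  # restore padding


def issue_token(claims: dict, secret: bytes, ttl_seconds: int = 900) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, exp=int(time.time()) + ttl_seconds)
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(payload).encode())
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes) -> dict:
    """Return the claims if signature and expiry are valid; otherwise raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never trust the header's alg blindly
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```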
7 Testing, Monitoring, and Continuous Improvement
7.1 Testing
• Unit & Integration Testing: Use frameworks like PyTest to test individual modules
(OCR, embedding, retrieval, generation) and the overall workflow.
• User Acceptance Testing (UAT): Validate the system using sample financial documents
and real user queries.
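A typical unit-test target in this pipeline is the normalization of extracted amounts, since OCR and PDF text produce strings such as `$1.2M` or accounting-style `(3,400)`. The parser and tests below are hypothetical examples written in pytest's style: pytest collects functions named `test_*`, and with pytest installed the last test would normally use `with pytest.raises(ValueError):`. `Decimal` is used instead of `float` to avoid binary rounding in currency values.

```python
import re
from decimal import Decimal

_SCALE = {"": 1, "k": 10**3, "thousand": 10**3, "m": 10**6, "million": 10**6,
          "b": 10**9, "billion": 10**9}
_AMOUNT = re.compile(
    r"^\(?\$?\s*([\d,]+(?:\.\d+)?)\s*(k|m|b|thousand|million|billion)?\)?$", re.IGNORECASE
)


def parse_amount(text: str) -> Decimal:
    """Normalise '$1.2M', '(3,400)' or '2 billion' to a Decimal; parentheses mean negative."""
    raw = text.strip()
    match = _AMOUNT.match(raw)
    if not match:
        raise ValueError(f"unrecognised amount: {text!r}")
    number = Decimal(match.group(1).replace(",", ""))
    value = number * _SCALE[(match.group(2) or "").lower()]
    return -value if raw.startswith("(") and raw.endswith(")") else value


def test_suffixes():
    assert parse_amount("$1.2M") == Decimal("1200000")
    assert parse_amount("2 billion") == Decimal("2000000000")


def test_accounting_negative():
    assert parse_amount("(3,400)") == Decimal("-3400")


def test_rejects_garbage():
    try:
        parse_amount("n/a")
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```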
7.2 Monitoring & Logging
• Monitoring Tools: Use open source monitoring tools like Prometheus and Grafana
for performance and health tracking.
• Logging: Utilize Python’s logging module or frameworks such as the ELK stack
(Elasticsearch, Logstash, Kibana) for logging and debugging.
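For the ELK stack, emitting one JSON object per log line lets Logstash or Filebeat index fields without custom parsing rules. The formatter below is a minimal sketch built on Python's `logging` module; the field names, the logger name, and the convention of passing structured data through `extra={"fields": {...}}` are assumptions. Request counts and latencies for Prometheus would typically be exposed separately, for example with the `prometheus_client` package's `Counter` and `Histogram`.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields supplied via logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_logger(name="finbot.rag"):
    """Return a logger with a single JSON handler attached (idempotent)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_logger().info("retrieved %d chunks", 3, extra={"fields": {"query_id": "q-42"}})` would then produce one JSON line containing `message`, `level`, and `query_id`.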
7.3 Iterative Improvements
• Feedback Loop: Collect user feedback and logs to continuously improve extraction
accuracy, retrieval quality, and generative responses.
• Model Updates: Regularly update and fine-tune models using new data to adapt to
evolving financial document formats and terminology.
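To tell whether feedback-driven changes actually improve retrieval quality, logged queries can be replayed against the retriever and scored. The function below is a sketch assuming each log entry pairs the ranked chunk IDs the retriever returned with the chunk IDs a user or reviewer marked as relevant. It reports hit rate@k (the share of queries with at least one relevant chunk in the top k) and mean reciprocal rank (the average of 1/rank of the first relevant chunk, counting misses as 0).

```python
def retrieval_metrics(results, k=5):
    """Compute hit rate@k and MRR@k over (ranked_ids, relevant_ids) pairs."""
    hits, reciprocal_total = 0, 0.0
    for ranked, relevant in results:
        relevant = set(relevant)
        rank = next(
            (i for i, cid in enumerate(ranked[:k], start=1) if cid in relevant), None
        )
        if rank is not None:
            hits += 1
            reciprocal_total += 1.0 / rank
    n = len(results) or 1  # avoid division by zero on an empty log
    return {"hit_rate@k": hits / n, "mrr@k": reciprocal_total / n}
```

Tracking both numbers before and after a model or chunking change gives a simple regression check: hit rate shows whether relevant material is retrieved at all, while MRR shows whether it is ranked near the top.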
8 Summary
• Document Ingestion & Preprocessing: Utilize open source tools such as PDFMiner,
PyMuPDF, and Tesseract for PDFs and images. Use spaCy and NetworkX for entity
extraction and graph construction.
• Content Analysis & Embedding: Generate text and multimodal embeddings using
Hugging Face Transformers and CLIP. Store embeddings in FAISS or Milvus and index
the knowledge graph using Neo4j or NetworkX.
• Multi-Agent Retrieval & RAG: Leverage a multi-agent architecture with Celery
(using RabbitMQ/Redis) to orchestrate retrieval from text and graph stores and generate
responses with open source generative models such as GPT-2 or GPT-Neo.
• Integration & Deployment: Build RESTful APIs with Flask/Django, containerize
with Docker, and develop a user-friendly UI using modern JavaScript frameworks.
Implement strong security and compliance measures.
• Testing & Monitoring: Use PyTest to verify correctness, Prometheus and Grafana to
track performance and system health, and the ELK stack to centralize logs that feed
continuous improvement.
9 Conclusion
This open source architecture provides a comprehensive solution for building a robust financial
document chatbot that integrates:
• Retrieval Augmented Generation (RAG)
• Graph RAG for relational insights
• A multi-agent approach for modular processing
• Pre-trained models and multimodal capabilities
Built entirely on freely available libraries and frameworks, this design is intended to support
scalability, accuracy, and compliance while enabling detailed financial document analysis and
well-grounded response generation.