Figure 1: Chatbot Pipeline
1. User Input
The user supplies either a query on its own or a query together with a PDF.
2. PDF Attached? (Decision Node)
Yes: • The document is classified as scanned or digitally generated.
• The appropriate text extraction (OCR for scanned pages, direct parsing for digital PDFs) is performed; see the routing sketch below.
• The extracted text is then normalized and passed to the Graph RAG & Structuring Agent.
No (Query Only) → Use Graph RAG: • If no PDF is attached, the query goes directly to the Graph RAG & Structuring Agent.
• Relevant financial data is pulled from its knowledge base.
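A minimal sketch of this routing logic is given below, assuming pypdf for the digital text layer and pdf2image plus pytesseract for the OCR branch; extract_document_text, is_scanned, and normalize are illustrative helper names, not an existing API.

from pypdf import PdfReader

def is_scanned(pdf_path: str, min_chars_per_page: int = 20) -> bool:
    # Heuristic: if the PDF yields almost no embedded text, treat it as scanned.
    reader = PdfReader(pdf_path)
    total_chars = sum(len(page.extract_text() or "") for page in reader.pages)
    return total_chars < min_chars_per_page * max(len(reader.pages), 1)

def extract_document_text(pdf_path: str) -> str:
    if is_scanned(pdf_path):
        # OCR branch: rasterize each page and run Tesseract on the image.
        from pdf2image import convert_from_path
        import pytesseract
        pages = convert_from_path(pdf_path)
        text = "\n".join(pytesseract.image_to_string(img) for img in pages)
    else:
        # Digital branch: read the embedded text layer directly.
        reader = PdfReader(pdf_path)
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return normalize(text)

def normalize(text: str) -> str:
    # Placeholder normalization: collapse whitespace; a real pipeline would do more.
    return " ".join(text.split())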
3. Graph RAG & Structuring Agent
Gathers and structures relevant data (tables, market info, regulatory filings, etc.) based on:
• The user’s query alone, or
• The combination of the query + extracted document text.
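The agent's interface can be sketched as follows; GraphStore and its query() method are hypothetical stand-ins for whatever knowledge-graph or vector backend is ultimately chosen, and the bucket names mirror the data types listed above.

from dataclasses import dataclass, field
from typing import Protocol

class GraphStore(Protocol):
    def query(self, text: str, top_k: int = 5) -> list[dict]: ...

@dataclass
class StructuredContext:
    tables: list[str] = field(default_factory=list)
    filings: list[str] = field(default_factory=list)
    market_data: list[str] = field(default_factory=list)

def graph_rag_structuring_agent(query: str, doc_text: str | None,
                                store: GraphStore) -> StructuredContext:
    # Retrieve against the query alone, or the query enriched with document text.
    retrieval_input = query if doc_text is None else f"{query}\n{doc_text}"
    hits = store.query(retrieval_input, top_k=5)

    # Route each retrieved item into a labeled bucket for the merge step.
    ctx = StructuredContext()
    for hit in hits:
        bucket = getattr(ctx, hit.get("type", "market_data"), ctx.market_data)
        bucket.append(hit.get("text", ""))
    return ctx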
4. Merge Processed Doc & Query
Combines all text and context into a single prompt for the next step.
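A minimal sketch of the merge step, reusing StructuredContext from the Graph RAG sketch above: when no PDF was supplied, doc_text is simply None and the function degrades to a pass-through that combines the query with the retrieved context.

def merge_prompt(query: str, context: StructuredContext,
                 doc_text: str | None = None) -> str:
    sections = [f"User question:\n{query}"]
    if doc_text:
        sections.append(f"Attached document (extracted text):\n{doc_text}")
    if context.tables:
        sections.append("Relevant tables:\n" + "\n".join(context.tables))
    if context.filings:
        sections.append("Relevant filings:\n" + "\n".join(context.filings))
    if context.market_data:
        sections.append("Market data:\n" + "\n".join(context.market_data))
    return "\n\n".join(sections)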
5. Finance LLM Agent
Generates the response using the merged prompt, leveraging domain-specific financial training.
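A hedged sketch of the Finance LLM Agent using a Hugging Face causal language model; "finance-llm-checkpoint" is a placeholder name rather than a real checkpoint, and device_map="auto" assumes the accelerate package is installed.

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "finance-llm-checkpoint"  # placeholder for the selected finance LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def finance_llm_agent(merged_prompt: str, max_new_tokens: int = 512) -> str:
    inputs = tokenizer(merged_prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated answer is returned.
    answer_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(answer_ids, skip_special_tokens=True)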
6. Feedback Loop
Collects user input on the quality or accuracy of the response, used for iterative improvements.
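One simple way to capture this feedback is to append each rating to a JSONL log that later evaluation or fine-tuning jobs can consume; the file name and fields below are illustrative.

import json
import time

def record_feedback(query: str, answer: str, rating: int, comment: str = "",
                    path: str = "feedback_log.jsonl") -> None:
    entry = {
        "timestamp": time.time(),
        "query": query,
        "answer": answer,
        "rating": rating,    # e.g., 1-5 stars, or thumbs up/down mapped to 1/0
        "comment": comment,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")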
7. Enhanced Output
Returns the final answer to the user, incorporating relevant context and data.
Design Considerations
Single, Consistent Flow
• Maintaining one path through the pipeline (Graph RAG → Merge → LLM) avoids branching logic
that can complicate maintenance.
• If there’s no PDF, the “Merge” step simply merges the user’s query with the Graph RAG–retrieved
data, acting as a pass-through.
Modular Extensibility
• Later, you may add other optional data sources (e.g., user profile, previously uploaded documents, or
real-time market data). The “Merge” block is a natural place to combine them.
• Having a single merge node means no separate path is needed for the “no document” case.
Simplified Code and Orchestration
• Splitting the pipeline into separate routes (one bypassing “Merge” and one that doesn’t) introduces
extra branching or code paths.
• By treating “no PDF data” as an empty or null input, the “Merge” step still processes the user query
plus whatever Graph RAG context is available.
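Tying the earlier sketches together, both request types can flow through a single answer() function; the only difference is whether doc_text is populated, so no separate "no document" route is needed.

def answer(query: str, pdf_path: str | None, store: GraphStore) -> str:
    doc_text = extract_document_text(pdf_path) if pdf_path else None
    context = graph_rag_structuring_agent(query, doc_text, store)
    prompt = merge_prompt(query, context, doc_text)
    return finance_llm_agent(prompt)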
Overview of the Model Selection & Optimization Approach
1. Multimodal OCR
• Two Pre-Trained Models: Select two leading OCR solutions (e.g., from research papers or industry
benchmarks).
• Evaluation: Use a financially annotated dataset (tables, financial terms) to measure key metrics such
as Character/Word Error Rate and table-structure accuracy.
• Model Selection: Choose the model with the best overall performance (lowest errors, highest quality
output) based on standardized evaluation methods.
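For the error-rate part of this evaluation, a hedged sketch using the jiwer package is shown below; run_ocr_model is a placeholder for whichever candidate model is under test, and table-structure accuracy would need a separate scorer not shown here.

from jiwer import wer, cer

def evaluate_ocr(run_ocr_model, samples: list[tuple[str, str]]) -> dict:
    # samples: list of (image_path, ground_truth_text) pairs from the annotated set.
    hypotheses, references = [], []
    for image_path, ground_truth in samples:
        hypotheses.append(run_ocr_model(image_path))
        references.append(ground_truth)
    return {
        "wer": wer(references, hypotheses),  # Word Error Rate
        "cer": cer(references, hypotheses),  # Character Error Rate
    }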
2. Finance LLM
• Two Pre-Trained Finance Models: Identify two specialized large language models tuned for financial text.
• Testing Methods: Compare their domain-specific accuracy (factual consistency, clarity) using recognized finance NLP benchmarks.
• Best Model Choice: Select the LLM with superior performance on a set of financial queries or tasks.
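An illustrative comparison harness is sketched below: each candidate is scored by exact-match accuracy on a small labeled set of financial questions; the generate_answer callables and question format are assumptions, not a specific benchmark.

def accuracy(generate_answer, questions: list[dict]) -> float:
    # questions: list of {"prompt": str, "answer": str} items.
    correct = sum(
        generate_answer(q["prompt"]).strip().lower() == q["answer"].strip().lower()
        for q in questions
    )
    return correct / len(questions)

def pick_best_model(candidates: dict, questions: list[dict]) -> str:
    # candidates maps a model name to its generate_answer callable.
    scores = {name: accuracy(fn, questions) for name, fn in candidates.items()}
    return max(scores, key=scores.get)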
3. Pipeline Optimization
• Efficiency Focus: Use architectures and techniques that minimize GPU/CPU consumption (e.g.,
quantization, pruning).
• Goal: Maintain strong performance for both OCR and LLM while reducing inference costs and resource
usage.
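As one concrete example, the selected finance LLM could be loaded in 4-bit precision via transformers and bitsandbytes, cutting VRAM use substantially; the checkpoint name is again a placeholder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # 4-bit weights, fp16 compute
)

model = AutoModelForCausalLM.from_pretrained(
    "finance-llm-checkpoint",          # placeholder model name
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("finance-llm-checkpoint")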
Hardware
– CPU: 8–12 cores for efficient preprocessing and orchestration.
– Memory: 32 GB of system RAM to manage resources effectively.
– GPU: A high-memory GPU with at least 16–24 GB of VRAM (e.g., NVIDIA RTX 3090, RTX A5000, or A6000) is recommended for fast inference and for handling large model parameters.
– Storage: Fast NVMe SSD (500 GB or larger) for quick model loading and data caching.