Document RAG Assignment

The document outlines an assignment for creating a full-stack Document Intelligence Platform that integrates AI/RAG, requiring users to upload documents and ask natural language questions. It specifies the tech stack (Django REST Framework for backend, ReactJS/NextJS for frontend) and details the functionalities, including document processing, embedding generation, and a user-friendly interface. The submission deadline is May 31, 2025, with specific evaluation criteria and bonus points for advanced features.

Uploaded by

shubhxx15

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

92 views4 pages

Document RAG Assignment

Uploaded by

shubhxx15

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 4

Frontend/Backend Assignment - Document

Intelligence Platform
To grab the internship, you are required to complete a small assignment. Early
submissions are given preference.
In this task, you are required to create a full-stack web application with AI/RAG
integration.

Objective:
The goal of this assignment is to build a Document Intelligence Platform where users
can upload documents (PDFs, Word docs, text files) and ask natural language
questions about the content. The system should use RAG (Retrieval Augmented
Generation) to provide accurate, contextual answer. Only the tech-stack mentioned in
the assignment should be used.

Backend (Django REST Framework)

GET APIs:
- Retrieve all uploaded documents from the database

POST APIs:
- Uploading and processing documents (TXT)
- Asking questions about documents (RAG query endpoint)

Document Processing Engine

Create a Python module that handles document processing and RAG implementation:
Extra marks for handling multiple document formats, smart chunking strategies, and
optimized embedding generation.
The system should:
● Text Chunking: Split documents into meaningful chunks (paragraphs, sections)
● Embedding Generation: Create vector embeddings for document chunks
● Vector Storage: Store embeddings in a vector database (ChromaDB or FAISS)
● Similarity Search: Find relevant chunks based on user queries
● Answer Generation: Use OpenAI/Anthropic API/LM Studio to generate contextual
answers
Input Parameters for Questions:
● Document ID
● User question
● Number of relevant chunks to retrieve (default: 3-5)

RAG Pipeline Implementation:

The system should implement a complete RAG pipeline that:
- Generates embeddings for user questions
- Performs similarity search across document chunks
- Constructs relevant context from retrieved chunks
- Generates contextual answers using LLM with source citations

Frontend (ReactJS / NextJS with Tailwind CSS)

User Interface:
The frontend should be developed using ReactJS or NextJS, styled with Tailwind CSS.

Required Pages:
1. Dashboard/Library Page:
o List all uploaded documents with title, pages count
2. Q&A Interface:
o Question input
o Answer display

3. Upload Page:

o File upload interface

Database Schema Hints:

Design your database with the following tables structure:
Documents table: Store document metadata including title, file path, type, size, pages,
processing status, and timestamps.
Document chunks table: Store processed text chunks with references to parent
document, chunk index, page numbers, and embedding identifiers for vector database
integration.

Tech Stack Requirements:

● Backend: Django REST Framework, Python
● Database: MySQL for metadata, ChromaDB/FAISS for vectors
● Frontend: ReactJS/NextJS with Tailwind CSS
● File Storage: Local storage
● AI Integration: OpenAI API, Claude API, or LM Studio (recommended if external
APIs are not available)

AI Integration Options:
Option 1: Use external APIs (OpenAI, Anthropic Claude)
Option 2: Use LM Studio for local LLM hosting (recommended alternative)
LM Studio allows you to run language models locally without requiring external API
keys. Download and set up LM Studio with models like Llama, Mistral, or Code Llama
for document question-answering tasks. This provides a cost-effective solution and
ensures data privacy.

Deadline
31 May, 2025, Friday, 11:55 PM
Submission Format
● Create a GitHub repository and add your code to it.
● In the README, add:
o Screenshots of the UI you have created
o Setup instructions for running the application
o API documentation
o Sample questions and answers from your system
● Include a requirements.txt file with all dependencies
● Add sample documents for testing
● Fill your repo link in this form: https://forms.gle/ZT6VZ1iW3ah91Gxu5.
● Make sure the code is neat and readable with proper comments.

Evaluation Criteria:
● Functionality (40%): Working RAG pipeline, accurate answers, proper citations
● Code Quality (25%): Clean, readable, well-structured code
● UI/UX (20%): User-friendly interface, responsive design
● Innovation (15%): Creative features, optimization techniques, error handling
Bonus Points (Optional):
● Support for multiple document formats (PDF, DOCX, TXT, MD)
● Advanced chunking strategies (semantic chunking, overlapping windows)
● Saving chat history
● Advanced search and filtering capabilities for the documents
For any doubts, feel free to drop a mail at [email protected] We’ll try to respond
as soon as possible!
In case you’re not able to complete it within the deadline, do submit the code even if a
part of it doesn’t work. The effort and skills you demonstrate through your code matter.
Note: An early submission will give you an advantage over others.
Helpful Resources:
● LM Studio: https://lmstudio.ai/ (for local LLM hosting)
● ChromaDB Documentation: https://docs.trychroma.com/
● Sentence Transformers: https://www.sbert.net/
● Django REST Framework: https://www.django-rest-framework.org/
● React File Upload: https://react-dropzone.js.org/

Take-Home Challenge
No ratings yet
Take-Home Challenge
3 pages
Assignment For Applied AI Engineer (RAG Pipeline) Role
No ratings yet
Assignment For Applied AI Engineer (RAG Pipeline) Role
4 pages
RAG System Backend Assignment
No ratings yet
RAG System Backend Assignment
4 pages
Interview Task 1
No ratings yet
Interview Task 1
2 pages
LLM Specialist Assignment - PanScience Innovations
No ratings yet
LLM Specialist Assignment - PanScience Innovations
2 pages
HLD LLD Design
No ratings yet
HLD LLD Design
3 pages
Smart Todo List Assignment
No ratings yet
Smart Todo List Assignment
4 pages
Synopsis
No ratings yet
Synopsis
3 pages
Assignment - SWE Intern at Wundrsight
No ratings yet
Assignment - SWE Intern at Wundrsight
4 pages
AI Document QA System Task
No ratings yet
AI Document QA System Task
3 pages
Mini Project Docubot Power Point
No ratings yet
Mini Project Docubot Power Point
17 pages
Mars Open Projects 2025
No ratings yet
Mars Open Projects 2025
7 pages
Sithafal Project Tasks
No ratings yet
Sithafal Project Tasks
2 pages
Byte Brawl
No ratings yet
Byte Brawl
11 pages
Harshit AI ML Engineer
No ratings yet
Harshit AI ML Engineer
4 pages
AI Solutions for Cybersecurity Challenges
No ratings yet
AI Solutions for Cybersecurity Challenges
5 pages
Major - Project - I - Presentation - 1 2025-26 Template
No ratings yet
Major - Project - I - Presentation - 1 2025-26 Template
13 pages
Python Coding Exercise
No ratings yet
Python Coding Exercise
2 pages
Presentation 2 K
No ratings yet
Presentation 2 K
12 pages
RAG LLM CheatSheet Cleaned
No ratings yet
RAG LLM CheatSheet Cleaned
3 pages
AI - Chatbot - Use - Case 1
No ratings yet
AI - Chatbot - Use - Case 1
2 pages
FSWD Lab Question Bank
No ratings yet
FSWD Lab Question Bank
5 pages
Assignment
No ratings yet
Assignment
5 pages
Katomaran AI Hackathon AI
No ratings yet
Katomaran AI Hackathon AI
3 pages
Backend Developer Assignment
No ratings yet
Backend Developer Assignment
3 pages
Intern Assignment
No ratings yet
Intern Assignment
1 page
BDIA Fall2024 Assignment2 3
No ratings yet
BDIA Fall2024 Assignment2 3
4 pages
Databricks Exam Guide Generative AI Engineer Associate Exam Guide
0% (1)
Databricks Exam Guide Generative AI Engineer Associate Exam Guide
6 pages
Edi 5
No ratings yet
Edi 5
15 pages
Datafy Generative-Ai Learning Path
No ratings yet
Datafy Generative-Ai Learning Path
7 pages
RAI AI Engineer Intern Assignments
No ratings yet
RAI AI Engineer Intern Assignments
3 pages
Fullstack Internship Assignment
No ratings yet
Fullstack Internship Assignment
2 pages
Post-Interview Evaluation Test1
No ratings yet
Post-Interview Evaluation Test1
2 pages
Assignments 4
No ratings yet
Assignments 4
1 page
RAG Project Understanding Document
No ratings yet
RAG Project Understanding Document
4 pages
Innovation Challenge 2025 - AI Hackathon Challenges
No ratings yet
Innovation Challenge 2025 - AI Hackathon Challenges
15 pages
Large Language Models and Prompt Engineering
No ratings yet
Large Language Models and Prompt Engineering
5 pages
? AI-Powered Knowledge Hub - Full-Stack Coding Challenge
No ratings yet
? AI-Powered Knowledge Hub - Full-Stack Coding Challenge
4 pages
AI Enhanced App Presentation
No ratings yet
AI Enhanced App Presentation
6 pages
Problem Statement
No ratings yet
Problem Statement
4 pages
Assignment
No ratings yet
Assignment
5 pages
An Effective Query System Using Llms and Langchain IJERTV12IS060161
No ratings yet
An Effective Query System Using Llms and Langchain IJERTV12IS060161
4 pages
Python - Backend and AI - ML - Job Description
No ratings yet
Python - Backend and AI - ML - Job Description
2 pages
PROJECT
No ratings yet
PROJECT
32 pages
Projects
No ratings yet
Projects
8 pages
Bootcamp GenAI AgenticAI Backend Engineers MacBook
No ratings yet
Bootcamp GenAI AgenticAI Backend Engineers MacBook
3 pages
Examplee
No ratings yet
Examplee
8 pages
Generative Adversarial Networks
No ratings yet
Generative Adversarial Networks
43 pages
AI Stack 2025
No ratings yet
AI Stack 2025
81 pages
Project Ideas
No ratings yet
Project Ideas
1 page
Shyena Consultant Ayush S MLOps 5+ Years
No ratings yet
Shyena Consultant Ayush S MLOps 5+ Years
5 pages
Project Seminar
No ratings yet
Project Seminar
12 pages
GenAI Training
No ratings yet
GenAI Training
3 pages
1 - Build A Complete OpenSource LLM RAG QA Chatbot - An In-Depth Journey (Introduction) - by Marco Bertelli - Level Up Coding
No ratings yet
1 - Build A Complete OpenSource LLM RAG QA Chatbot - An In-Depth Journey (Introduction) - by Marco Bertelli - Level Up Coding
12 pages
Pinnacle - Plus Projects
No ratings yet
Pinnacle - Plus Projects
12 pages
RP Journal-2
No ratings yet
RP Journal-2
54 pages
OpenMic Ai AI Product Engineer (Full Stack Engineer
No ratings yet
OpenMic Ai AI Product Engineer (Full Stack Engineer
4 pages
Best Open-Source AI Models For A Medical Chatbot
No ratings yet
Best Open-Source AI Models For A Medical Chatbot
7 pages
Deep Research Bittensor The Internet of AI
No ratings yet
Deep Research Bittensor The Internet of AI
52 pages
138 Harv. L. Rev. 1609 1
No ratings yet
138 Harv. L. Rev. 1609 1
24 pages
2025 05 15 RISC V Summit Europe P1.4.01 AHMAD Poster
No ratings yet
2025 05 15 RISC V Summit Europe P1.4.01 AHMAD Poster
1 page
Code Summarization Using LLM
No ratings yet
Code Summarization Using LLM
13 pages
Advancing Vehicle Plate Recognition: Multitasking Visual Language Models With Vehiclepaligemma
No ratings yet
Advancing Vehicle Plate Recognition: Multitasking Visual Language Models With Vehiclepaligemma
33 pages
Models - 'Free' - OpenRouter
No ratings yet
Models - 'Free' - OpenRouter
3 pages
The Leaderboard Illusion
No ratings yet
The Leaderboard Illusion
68 pages
Do Any of The LLM Have A Free API
No ratings yet
Do Any of The LLM Have A Free API
3 pages
See A2702050110
No ratings yet
See A2702050110
10 pages
S62797 - LLM Inference Sizing - Benchmarking End-to-End Inference Systems
No ratings yet
S62797 - LLM Inference Sizing - Benchmarking End-to-End Inference Systems
36 pages
1 s2.0 S0098135424003132 Main
No ratings yet
1 s2.0 S0098135424003132 Main
12 pages
Explainability For Large Language Models: A Survey
No ratings yet
Explainability For Large Language Models: A Survey
38 pages
Introducing The Llama Startup Program - Hacker News
No ratings yet
Introducing The Llama Startup Program - Hacker News
2 pages
Language Models Can Exploit Cross-Task In-Context Learning For Data-Scarce Novel Tasks
No ratings yet
Language Models Can Exploit Cross-Task In-Context Learning For Data-Scarce Novel Tasks
20 pages
Lecture 2 Language Model
No ratings yet
Lecture 2 Language Model
127 pages
Meta Vs Gemini
No ratings yet
Meta Vs Gemini
10 pages
Documentation - Llama
No ratings yet
Documentation - Llama
7 pages
Personalized Career Guidance Platform Using Psychometric and Activity Based Assessments-2
No ratings yet
Personalized Career Guidance Platform Using Psychometric and Activity Based Assessments-2
118 pages
WEBLINX - Real-World Website Navigation With Multi-Turn Dialogue
No ratings yet
WEBLINX - Real-World Website Navigation With Multi-Turn Dialogue
45 pages
HalluLens - LLM Hallucination Benchmark-2025
No ratings yet
HalluLens - LLM Hallucination Benchmark-2025
29 pages
Accelerate AI Era With Red Hat AI Solutions
No ratings yet
Accelerate AI Era With Red Hat AI Solutions
25 pages
Llama 3.1 Model Cards & Prompt Formats
No ratings yet
Llama 3.1 Model Cards & Prompt Formats
25 pages
DeepSeek图解10页
No ratings yet
DeepSeek图解10页
11 pages
Automatic ITR Filling SAAS App
No ratings yet
Automatic ITR Filling SAAS App
20 pages
提出了对prompt数据评估的三个指标
No ratings yet
提出了对prompt数据评估的三个指标
22 pages
Ruby Bug Detection & Repair Analysis
No ratings yet
Ruby Bug Detection & Repair Analysis
12 pages
Harshit Nigam 2311AI52
No ratings yet
Harshit Nigam 2311AI52
48 pages
De Bin Vul
No ratings yet
De Bin Vul
17 pages
人机协作的经济管理研究新时代
No ratings yet
人机协作的经济管理研究新时代
62 pages

Document RAG Assignment

Uploaded by

Document RAG Assignment

Uploaded by

Frontend/Backend Assignment - Document

Backend (Django REST Framework)

Document Processing Engine

RAG Pipeline Implementation:

Frontend (ReactJS / NextJS with Tailwind CSS)

3.​ Upload Page:

Database Schema Hints:

Tech Stack Requirements:

You might also like

3. Upload Page: