Project CSEC
A model that retrieves project information (technical drawings and relevant documents) based on
natural-language input, such as structural features or unique design elements.
For example:
1. Structural Criteria: “List buildings with steel roofs and more than 2 stories.”
2. Unique Features: “Which projects have circular staircases?”
Extracted Project Dataset:
Model’s Data Structure:
Prototype Models
1. Model V1 – Text Similarity (Keyword Match)
- Simple string matching using Python and CSV.
- Downside: it only returns results when the query matches the
text exactly (no understanding of meaning or relationships).
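A minimal sketch of the V1 idea, assuming a CSV with hypothetical `project` and `description` columns (the real dataset's columns are not shown here). It also illustrates the downside: a query phrased differently from the stored text returns nothing.

```python
import csv
import io

# Hypothetical sample data; the real project CSV and its column names may differ.
SAMPLE_CSV = """project,description
Tower A,"3 stories, steel roof, pile foundation"
Villa B,"1 story, timber roof, circular staircase"
"""

def keyword_search(csv_text, query):
    """Return names of projects whose description contains the query string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    q = query.lower()
    return [row["project"] for row in reader if q in row["description"].lower()]

# Literal wording present in the data: found.
exact_hits = keyword_search(SAMPLE_CSV, "circular staircase")
# Same idea in different words: not found, because matching is purely textual.
missed_hits = keyword_search(SAMPLE_CSV, "spiral stairs")
print(exact_hits, missed_hits)
```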
2. Model V2 – Vector Embedding Search (Semantic)
- Uses Sentence Transformers + FAISS (Facebook AI Similarity
Search).
- Downside: embeddings capture meaning but ignore structured
relationships.
- Example: it might rank “pile foundation” as similar to “raft
foundation” just because they occur in similar contexts, not
because they are technically related.
- It cannot enforce filters such as “only projects with 2+ stories
AND circular staircase AND pile foundation”.
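The ranking behaviour described for V2 can be sketched without external libraries. This is a toy stand-in, not the prototype itself: the 3-dimensional vectors are hand-made assumptions in place of Sentence Transformers embeddings, and a brute-force cosine ranking stands in for a FAISS index.

```python
import math

# Hand-made vectors standing in for real sentence embeddings (assumption).
DOC_VECTORS = {
    "pile foundation": [0.9, 0.1, 0.0],
    "raft foundation": [0.8, 0.2, 0.1],
    "circular staircase": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, k):
    """Return the k document names closest to the query vector."""
    ranked_names = sorted(
        DOC_VECTORS,
        key=lambda name: cosine(query_vec, DOC_VECTORS[name]),
        reverse=True,
    )
    return ranked_names[:k]

# A query vector equal to the "pile foundation" vector.
ranked = top_k([0.9, 0.1, 0.0], k=2)
print(ranked)
```

With these toy vectors, “raft foundation” comes back as the second result purely because its vector is close. The search returns nearest neighbours only, so nothing in it can express a hard AND condition.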
3. Model V3 – Hybrid (Embedding + GPT Reasoning)
- Uses vector embeddings to narrow the candidates, then GPT-4
to filter and reason over them.
- Downsides: operational cost, and accuracy still needs
improvement.
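The two-stage flow of V3 might look roughly like the sketch below. All names and records are hypothetical. To keep it runnable offline, word overlap stands in for the vector search, and a plain Python predicate stands in for the GPT-4 filtering step.

```python
# Hypothetical records; field names are illustrative, not the project's real schema.
PROJECTS = [
    {"name": "Tower A", "stories": 3, "text": "steel roof pile foundation circular staircase"},
    {"name": "Villa B", "stories": 1, "text": "timber roof raft foundation circular staircase"},
    {"name": "Block C", "stories": 4, "text": "steel roof pile foundation straight staircase"},
    {"name": "Shed D", "stories": 1, "text": "metal cladding slab"},
]

def retrieve_candidates(query, k):
    """Stage 1: cheap narrowing. Word overlap stands in for vector search."""
    q = set(query.lower().split())
    scored = sorted(
        PROJECTS,
        key=lambda p: len(q & set(p["text"].split())),
        reverse=True,
    )
    return scored[:k]

def reasoning_filter(candidates, min_stories, must_have):
    """Stage 2: strict checks. In V3 this step would be a GPT-4 call;
    a plain predicate is used here as a stand-in."""
    return [
        p["name"]
        for p in candidates
        if p["stories"] >= min_stories and must_have in p["text"]
    ]

candidates = retrieve_candidates("circular staircase pile foundation", k=3)
answer = reasoning_filter(candidates, min_stories=2, must_have="circular staircase")
print(answer)
```

Only the `k` shortlisted candidates reach the second stage, which is where the expensive model call would sit. Lowering `k` reduces cost but risks dropping a valid project before it can be checked.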
4. Model V4 – Hybrid Model with Domain-Specific Fine-Tuning
- Expected benefit: highly accurate, domain-aware responses.
Next Steps
1. Fine-Tune the Model (for Higher Accuracy)
- Implement Model V4.
- Goal: teach the model domain-specific knowledge using
examples (such as project queries, their expected outputs, and the
relationships between them).
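One possible way to store such examples is one JSON object per line (JSONL). The sketch below is an assumption about the format, not the project's actual training data: the `query` and `expected` field names and the structured form of the expected output are hypothetical. The two queries are the examples quoted earlier in this document.

```python
import json

# Hypothetical training pairs; field names and output structure are assumptions.
EXAMPLES = [
    {
        "query": "List buildings with steel roofs and more than 2 stories.",
        "expected": {"roof": "steel", "min_stories": 3},
    },
    {
        "query": "Which projects have circular staircases?",
        "expected": {"feature": "circular staircase"},
    },
]

def to_jsonl(examples):
    """Serialize each example as one JSON object per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in examples)

jsonl = to_jsonl(EXAMPLES)
print(jsonl)
```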
2. Improve with New Data + Set Up Relational DB
- Goal: expand and organize the data
(drawings, projects, features) to support scalable search and
retrieval.
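A relational layout for this data could be sketched with Python's built-in `sqlite3` module. The table and column names below are hypothetical, chosen only to mirror the drawings/projects/features grouping, and the rows are made-up samples. The query shows the kind of strict AND filter that a relational store handles directly.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL, stories INTEGER);
CREATE TABLE features (project_id INTEGER REFERENCES projects(id), feature TEXT NOT NULL);
CREATE TABLE drawings (project_id INTEGER REFERENCES projects(id), file_path TEXT NOT NULL);
""")

# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(1, "Tower A", 3), (2, "Villa B", 1)],
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(1, "circular staircase"), (1, "pile foundation"), (2, "circular staircase")],
)

# Projects with 2+ stories AND a circular staircase AND a pile foundation.
rows = conn.execute("""
SELECT p.name FROM projects p
JOIN features f ON f.project_id = p.id
WHERE p.stories >= 2
  AND f.feature IN ('circular staircase', 'pile foundation')
GROUP BY p.id
HAVING COUNT(DISTINCT f.feature) = 2
""").fetchall()
print(rows)
```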
3. Make the Model Accessible to Others (Web App or)
- Design UI
- Connect UI to backend
- Create a cloud/server DB and deploy the system