DATA WAREHOUSING AND MINING
MINI-PROJECT
TE CMPN B
DIAMOND PRICE PREDICTOR
Submitted By
Sr. No Roll No Name
1 22102B0034 Vedika jondhale
2 22102B0016 Siddhi Ghode
3 22102B0020 Harshika patil
4 22102B0003 Yash Chavan
Under the Guidance of
Dr. Kavita Shirsat
Department of Computer
Engineering
Vidyalankar Institute of
Technology Wadala(E),
Mumbai 400 037
University of Mumbai
2022-23
ABSTRACT
Used Diamond Price Prediction using Machine Learning:
The Diamond Price Predictor is a web application that estimates diamond prices using
features like carat, clarity, color, and dimensions (X, Y, Z). Leveraging machine learning
algorithms—Linear Regression, Decision Tree Regressor, and Random Forest
Regressor—the app provides accurate price predictions. With an easy-to-use interface, it
helps users make informed decisions in the diamond market.
Project Focus:
[Link] algorithms such as Linear Regression, Decision Tree Regressor,
and Random Forest Regressor for accurate predictions.
[Link] a user-friendly web application for easy input and instant price estimates.
[Link] users in making informed choices when buying or selling diamonds.
Project Description:
The Diamond Price Predictor is a web application that estimates diamond prices using a
cleaned dataset from Kaggle. Built with Python, it utilizes Linear Regression, Decision
Tree Regressor, and Random Forest Regressor for fast and accurate predictions. This tool
helps users make informed decisions in the diamond market.
Description of Dataset:
The dataset contains 10 attributes (columns).
Attribute Information:
1. carat
2. cut
3. color
4. clarity
5. depth
6. table
7. price
8. x
9. y
10. z
Algorithms Used
Random Forest Regressor
Random Forest is an ensemble technique capable of performing both regression and
classification tasks with the use of multiple decision trees and a technique called
Bootstrap and Aggregation, commonly known as bagging. The basic idea behind this
is to combine multiple decision trees in determining the final output rather than
relying on individual decision trees.
Random Forest has multiple decision trees as base learning models.
Decision Tree Regressor
Decision tree regression observes features of an object and trains a model in the
structure of a tree to predict data in the future to produce meaningful continuous
output. Continuous output means that the output/result is not discrete, i.e., it is not
represented just by a discrete, known set of numbers or values.
Linear Regression
Decision tree regression observes features of an object and trains a model in the
structure of a tree to predict data in the future to produce meaningful continuous
output. Continuous output means that the output/result is not discrete, i.e., it is not
represented just by a discrete, known set of numbers or values.
Data Analysis
GUI Screenshots
ANALYSIS
ALGORITHM USED ACCURACY
Random Forest Regressor 95.29%
Decision Tree 87.32%
Linear Regressor 83.79%
CONCLUSION
Our project builds a machine learning model to predict diamond prices using
attributes like carat, clarity, dimensions, and color. Integrated with a Flask app, it
provides quick, data-driven price estimates, making diamond valuation easier and
more accessible.