DS3122: Data Modeling

Year: 1446/1447, Semester-I


Department of Data Science
‫ﻛﻠﻴﺔ اﻟﺤﺎﺳﺒﺎت‬
COLLEGE OF COMPUTING

Data Modeling Project

General Notes
• This course project contributes 25% of the overall course assessment.
• This project should be carried out by a group of 4-5 students.
• This project must be submitted via Blackboard.

Learning Outcomes:

The aim of this project is to provide students with hands-on experience in building deep
learning models for image classification problems, particularly those involving datasets with
high variability. By the end of the project, students will:

1. Gain practical experience in training and evaluating deep learning models using
high-variability image datasets.
2. Deepen knowledge of deep learning techniques: specifically CNNs, RNNs (for
sequential image-related tasks), and feedforward neural networks. Students will also
compare these methods with traditional machine learning approaches.
3. Tackle complex classification challenges involving diverse, real-world data,
focusing on extracting patterns and improving model accuracy.

Requirements:

1. Dataset Selection:
o Choose a dataset that involves image classification with high variability.
Datasets with diverse image categories or complex real-world scenes are
recommended. Example datasets can be found on platforms such as:
§ Google Dataset Search
§ Kaggle
§ UCI ML Repository
§ COCO, ImageNet, CIFAR-100, etc.
2. Data Preprocessing:
o Ensure appropriate data preprocessing techniques are applied. This includes
handling missing or corrupted data, data augmentation (such as rotation,
flipping, or scaling of images), normalization, and other preprocessing steps
like resizing images to a uniform format.
3. Model Building:
o Apply at least two deep learning architectures (e.g., CNNs, RNNs, or
feedforward neural networks) for image classification.
o Compare the performance of deep learning techniques with classical machine
learning methods (e.g., decision trees, SVM) to demonstrate why deep
learning performs better on high-variability datasets.

4. Evaluation:
o Use appropriate evaluation metrics such as accuracy, precision, recall, F1 score,
or a confusion matrix. Where applicable, track performance through
visualizations (e.g., learning curves, confusion matrices).
o Justify the choice of evaluation metrics based on the project goals.
5. Report Structure: Your report should include the following sections:
o Abstract: A brief paragraph (500 words max) summarizing the project goals,
dataset used, approach, and key findings.
o Introduction: Introduce the image classification problem, dataset, and why
it's a challenging classification task due to variability. Describe the approach
and methodology (half a page to one page).
o Data Exploration and Description:
§ Provide a detailed description of the dataset, including the number of
classes, types of images, variability, and any preprocessing challenges
faced (such as noise, missing data, or imbalance). Include relevant
visualizations (e.g., sample images, class distribution plots).
o Methodology:
§ Describe the data preprocessing techniques, deep learning models used
(e.g., CNNs, RNNs), evaluation metrics, and experimental setup.
Include hyperparameters, training epochs, and data augmentation
techniques, if any.
§ Justify your choice of models and their suitability for high-variability
datasets.
o Patterns Discovery and Results:
§ Present results from the deep learning models. Include performance
metrics, learning curves, and visualizations like accuracy and loss
graphs, confusion matrices, etc.
§ Compare the results of deep learning models against classical machine
learning models (e.g., Random Forest, SVM), explaining the benefits
and limitations of each method.
o Conclusion:
§ Summarize your findings, highlight the challenges of dealing with high
variability, and discuss potential improvements or future work, such as
fine-tuning models or applying transfer learning.
o References:
§ Follow the IEEE referencing style.
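
The evaluation metrics named in the requirements above (accuracy, precision, recall, F1 score, confusion matrix) can be computed from true and predicted labels. The following is a minimal sketch in plain Python, not a prescribed implementation; the function names and the toy labels are illustrative assumptions, and in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list: cell [i][j] counts samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels):
    """Compute precision, recall and F1 for each class from the confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as class i, but wrong
        fn = sum(matrix[i]) - tp                 # class i samples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics):
    """Unweighted mean of per-class F1 scores (treats every class equally)."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Toy example with hypothetical labels, for illustration only.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]

cm = confusion_matrix(y_true, y_pred, labels)
scores = per_class_metrics(cm, labels)
print("confusion matrix:", cm)
print("accuracy:", round(accuracy(cm), 3))
print("macro F1:", round(macro_f1(scores), 3))
```

Macro-averaged F1 weights every class equally, which may be a better fit than plain accuracy when the class distribution is imbalanced; the choice still has to be justified against the project goals.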

Tasks:

1. Task 1: Submit a poster for the exhibition next week, on 8 October 2024.
2. Task 2: Submit the report.
3. Task 3: Deliver the final presentation.

Sample Image Classification Projects


1. ImageNet

• Description: One of the largest and most popular image classification datasets, with
over 14 million labeled images across more than 20,000 categories. The widely used
ILSVRC subset covers 1,000 categories.
• Link: [Link]

2. CIFAR-10

• Description: Contains 60,000 32x32 color images in 10 classes (e.g., airplanes, birds,
cats, and dogs), with 6,000 images per class.
• Link: [Link]
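
For a batch shaped like the CIFAR-10 images described above (32x32 color), the preprocessing steps listed in the requirements (normalization, flipping, resizing to a uniform format) might look like the sketch below. It is an illustration under stated assumptions, not a required pipeline: it uses NumPy only, the batch is random stand-in data rather than real CIFAR-10 images, the function names are hypothetical, and the nearest-neighbor resize is a simple substitute for a proper image library.

```python
import numpy as np


def normalize(batch):
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return batch.astype(np.float32) / 255.0


def standardize(batch, eps=1e-7):
    """Shift and scale each color channel to zero mean and unit variance."""
    mean = batch.mean(axis=(0, 1, 2), keepdims=True)
    std = batch.std(axis=(0, 1, 2), keepdims=True)
    return (batch - mean) / (std + eps)


def random_horizontal_flip(batch, rng, p=0.5):
    """Flip each image left-to-right with probability p (simple augmentation)."""
    flip = rng.random(batch.shape[0]) < p
    out = batch.copy()
    out[flip] = out[flip, :, ::-1, :]  # reverse the width axis of the selected images
    return out


def resize_nearest(image, height, width):
    """Resize one (H, W, C) image by nearest-neighbor sampling."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols]


# Stand-in data: 8 random images in (N, H, W, C) layout, same shape as CIFAR-10 samples.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

x = normalize(batch)
x = random_horizontal_flip(x, rng, p=0.5)
x = standardize(x)
print(x.shape, x.dtype)
```

Augmentation such as flipping is normally applied to the training split only, so that validation and test results reflect unmodified images.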

3. Emotion Recognition from Facial Images

• Objective: Build a CNN-based model to classify emotions (e.g., happy, sad, angry,
surprised) from images of faces.
• Dataset: [Link]

4. Garbage Classification for Recycling

• Objective: Classify images of waste (plastic, paper, metal, etc.) to aid in automated
recycling processes.
• Dataset: [Link]

5. MNIST (Modified National Institute of Standards and Technology)

• Description: A well-known dataset containing 70,000 grayscale images of
handwritten digits (0-9), with each image being 28x28 pixels.
• Link: [Link]

6. Fashion MNIST

• Description: A dataset similar to MNIST but with 70,000 grayscale images of
fashion items (e.g., shoes, shirts, bags) instead of digits, aimed at more complex
classification tasks.
• Link: Fashion MNIST

7. Animal Species Identification

• Objective: Classify different species of animals (e.g., lions, tigers, elephants, birds)
using image datasets.
• Dataset: iNaturalist or Caltech-UCSD Birds 200

8. Flowers Recognition

• Description: A dataset containing images of flowers from 5 different species. It’s
commonly used for fine-grained image classification tasks.
• Link: Flowers Dataset

9. Plant Disease Detection

• Objective: Classify plant diseases based on leaf images to assist farmers in
diagnosing diseases.
• Dataset: [Link]

10. Stanford Cars

• Description: Contains 16,185 images of 196 classes of cars. This dataset is often used
for fine-grained classification of car models.
• Link: Stanford Cars Dataset

11. Food-101

• Description: Contains 101,000 images of food items from 101 categories. Each class
contains 1,000 images, and the dataset is designed for fine-grained food classification
tasks.
• Link: [Link]

12. CelebA (CelebFaces Attributes)

• Description: A large-scale facial attributes dataset with more than 200,000 celebrity
images, each annotated with 40 attributes such as age, gender, and expression.
• Link: [Link]

13. Caltech-256

• Description: Contains 30,607 images spanning 256 object categories, making it ideal
for object recognition tasks.
• Link: [Link]

14. Medical Image Classification

• Objective: Classify medical images into categories like disease/no disease or
different types of medical conditions (e.g., skin cancer, pneumonia).
• Dataset: [Link]

15. Age and Gender Classification

• Objective: Build a CNN model that can predict the age and gender of a person from
an image.
• Dataset: [Link]
