Uploaded by Raj

Q1. Explain data science process along with detailed diagram.

Data Science Process

The data science process is a sequence of steps for solving a problem with data. Following
these steps in order helps a business or organization obtain useful and reliable results.

1. Setting the Research Goal

In this step, we define what the project is about. A project charter is prepared, which includes:

• The problem to be solved,

• The benefits to the business,

• What data and resources are needed,

• Timetable and expected outcomes.

2. Retrieving Data

Once the goal is set, the required data is collected. The data may come from:

• Company databases,

• Excel files,

• External sources like third-party services.

We also check if the data is accessible and of good quality.
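As a minimal sketch of retrieval followed by a first quality check, the Python snippet below reads a small CSV export (the kind of file an Excel sheet can be saved as) using only the standard library. The column names and values are hypothetical, and io.StringIO stands in for a real file on disk.

```python
import csv
import io

# io.StringIO stands in for a CSV export; in practice you would open() a real file path.
CSV_EXPORT = """order_id,customer,amount
1,Asha,250
2,Ravi,
3,Meera,410
"""

# Each row becomes a dict keyed by the header names (all values are strings).
orders = list(csv.DictReader(io.StringIO(CSV_EXPORT)))

# A quick quality check right after retrieval: which orders have a blank amount?
missing_amounts = [row["order_id"] for row in orders if not row["amount"]]
print(missing_amounts)  # ['2']
```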

3. Data Preparation

This step includes:

• Data Cleaning: Removing errors or wrong values,

• Data Integration: Combining data from multiple sources,

• Data Transformation: Converting data into a suitable format for analysis.

4. Data Exploration (EDA – Exploratory Data Analysis)

In this step, we try to understand the data better using:

• Descriptive statistics,

• Charts and graphs,

• Identifying trends and outliers.

These insights guide the choice of modelling techniques in the next step.

5. Data Modelling (Model Building)

Here, we build models using machine learning, statistics, or other techniques. This includes:

• Choosing the right method,

• Training and testing the model,

• Improving it based on performance.
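The train, test, and evaluate cycle can be sketched in plain Python with a one-variable least-squares line. This is an illustrative assumption, not a prescribed method: the data is hypothetical, the helper holdout_split simply keeps the last 25% of rows for testing, and mean absolute error is only one possible performance measure. Libraries such as scikit-learn offer a fuller train_test_split in sklearn.model_selection, which shuffles the data by default.

```python
def holdout_split(xs, ys, test_fraction=0.25):
    """Hold out the last portion of the data for testing (no shuffling, for simplicity)."""
    cut = int(len(xs) * (1 - test_fraction))
    return xs[:cut], ys[:cut], xs[cut:], ys[cut:]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: advertising spend (x) vs. sales (y).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9]

x_train, y_train, x_test, y_test = holdout_split(xs, ys)
slope, intercept = fit_line(x_train, y_train)        # training
predictions = [slope * x + intercept for x in x_test]
mae = mean_absolute_error(y_test, predictions)       # testing
```

Comparing the error on the held-out rows before and after a change (new features or a different method) is one way to carry out the "improving it based on performance" step.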

6. Presentation and Automation

The final step is to present the results to the business through:

• Reports,

• Presentations, or

• Dashboards.

Sometimes, the whole process is automated so that it can be reused in future projects.

Q2) Write a short note on the following sub-phases of Data Preparation

1. Data Cleansing

2. Data Integration

3. Data Transformation

Sub-phases of Data Preparation

1. Data Cleansing

• This step involves removing incorrect, missing, or duplicate values from the dataset.

• It ensures that the data is accurate, consistent, and reliable for analysis.

• Example: Fixing spelling errors, filling missing values, or removing outliers.
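A minimal sketch of these cleansing steps in Python, using only the standard library. The records and the spelling-correction table are hypothetical, and filling missing values with the median is one common choice among several (the mean, the mode, or dropping the row are alternatives). Outlier removal is left out to keep the example short.

```python
import statistics

# Hypothetical raw records: a duplicate row, a missing age, and a misspelled city.
raw_records = [
    {"id": 1, "city": "Mumbai", "age": 34},
    {"id": 2, "city": "Mumbay", "age": None},
    {"id": 2, "city": "Mumbay", "age": None},  # exact duplicate of the row above
    {"id": 3, "city": "Pune", "age": 29},
]

# Hypothetical lookup table of known spelling corrections.
SPELLING_FIXES = {"Mumbay": "Mumbai"}


def clean(records):
    """Drop exact duplicates, fix known misspellings, fill missing ages with the median."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input list is not modified

    known_ages = [r["age"] for r in unique if r["age"] is not None]
    median_age = statistics.median(known_ages)

    for rec in unique:
        rec["city"] = SPELLING_FIXES.get(rec["city"], rec["city"])
        if rec["age"] is None:
            rec["age"] = median_age
    return unique


cleaned = clean(raw_records)
```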

2. Data Integration

• In this step, data from multiple sources is combined into a single dataset.

• It helps in creating a complete and unified view of the data.

• Example: Merging customer details from different departments like sales and support.
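Integration can be sketched as an outer join on a shared key. The two departmental tables and the customer_id key below are hypothetical; with pandas, pd.merge(sales_df, support_df, on="customer_id", how="outer") performs a comparable join on DataFrames.

```python
# Hypothetical departmental tables keyed by a shared customer_id.
sales = [
    {"customer_id": 101, "name": "Asha", "total_spent": 2500},
    {"customer_id": 102, "name": "Ravi", "total_spent": 1200},
]
support = [
    {"customer_id": 101, "open_tickets": 2},
    {"customer_id": 103, "open_tickets": 1},
]


def integrate(sales_rows, support_rows):
    """Outer-join the two sources on customer_id into one record per customer."""
    merged = {}
    for row in sales_rows + support_rows:
        cid = row["customer_id"]
        merged.setdefault(cid, {"customer_id": cid}).update(row)
    # Customers missing from one source simply lack that source's fields.
    return [merged[cid] for cid in sorted(merged)]


customers = integrate(sales, support)
```

If both sources carried a field with the same name, the later source would overwrite the earlier one in this sketch; real pipelines usually rename or reconcile such conflicts first.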

3. Data Transformation

• This process involves converting data into a suitable format for analysis or modeling.

• It includes tasks like normalization, scaling, encoding categorical data, etc.

• Example: Changing date formats, converting text to numbers, or scaling values between
0 and 1.
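The three example transformations (scaling values between 0 and 1, changing date formats, and converting categorical text to numbers) can be sketched with the Python standard library. The day/month/year input format and the color labels are assumptions chosen for illustration.

```python
from datetime import datetime


def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero for constant columns
    return [(v - lo) / (hi - lo) for v in values]


def reformat_date(text, src="%d/%m/%Y", dst="%Y-%m-%d"):
    """Convert a date string from one format to another (formats are assumptions)."""
    return datetime.strptime(text, src).strftime(dst)


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator vectors, one column per category."""
    categories = sorted(set(labels))
    return categories, [[1 if label == c else 0 for c in categories] for label in labels]


scaled = min_max_scale([10, 20, 30, 50])            # [0.0, 0.25, 0.5, 1.0]
iso_date = reformat_date("23/04/2025")              # '2025-04-23'
cats, encoded = one_hot(["red", "green", "red"])    # ['green', 'red'], [[0, 1], [1, 0], [0, 1]]
```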

Q3) Write a note on Data exploration techniques.

Data Exploration, also known as Exploratory Data Analysis (EDA), is the process of
understanding the data before building models. It helps identify patterns, trends, and errors in
the data.

1. Simple Graphs

• These are basic visualizations used to understand the distribution and patterns in data.

• Examples: Bar charts, Histograms, Pie charts, Line graphs.

2. Combined Graphs

• These graphs combine multiple data variables in one visual.

• Useful to study relationships or comparisons.

• Example: Scatter plot with trend lines, or box plots with groupings.

3. Link and Brush

• An interactive technique used in data visualization.

• When you select (brush) a group of points in one graph, the same observations are
highlighted in the other linked graphs.

• Helps to explore relationships across multiple plots.

4. Nongraphical Techniques

• These are non-visual methods of exploring data.

• Includes summary statistics, like:

o Mean, Median, Mode

o Standard Deviation

o Skewness and Kurtosis

• Helps in understanding data behavior numerically.
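The nongraphical summaries listed above can be computed with Python's statistics module plus a few lines for the higher moments. Skewness and kurtosis have several textbook definitions; this sketch uses the simple population-moment versions and reports excess kurtosis (kurtosis minus 3), which is an assumption about the convention wanted. The sample values are arbitrary.

```python
import statistics


def summarize(data):
    """Return common numeric summaries of a sample."""
    n = len(data)
    mean = statistics.mean(data)
    # Population-style central moments used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "skewness": m3 / m2 ** 1.5,       # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


stats = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```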

Q4) List the Benefits and Uses of Data Science and Big Data.

1. Commercial Applications

• Customer Insights: Understands user behavior to improve services (e.g., Google AdSense).

• Advertising: Delivers personalized ads in real-time (e.g., MaxPoint).

• Human Resources: Helps in candidate screening and employee mood analysis.

• Finance: Predicts markets, evaluates risks, and automates trades.

2. Governmental Applications

• Fraud Detection: Detects fraud and criminal activities.

• Public Data Sharing: Platforms like Data.gov offer open access to data.

• Surveillance: Monitors individuals and gathers intelligence (e.g., NSA).

3. Non-profit Applications

• Fundraising: Boosts campaigns using data (e.g., WWF).

• Social Impact: Aids NGOs in using data for social good (e.g., DataKind).

4. Academic Applications

• Research: Supports research and improves student experience.

• Online Learning: MOOCs use data to enhance e-learning (e.g., Coursera).

Q5) List and explain the Facets/Types of Data in Data Science.

1. Structured Data

• Follows a defined data model and fits into rows and columns (e.g., Excel, SQL
databases).

• Easy to store, manage, and query using SQL.

• Some structured forms (like hierarchies) can be tricky to handle in traditional databases.
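Because structured data fits a fixed schema, SQL can answer questions about it directly. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the employees table and its rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a real relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000), ("Ravi", "Support", 42000), ("Meera", "Sales", 61000)],
)

# Because every row follows the same schema, an aggregate question is a single query.
rows = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()
conn.close()
```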

2. Unstructured Data

• Does not follow a fixed format or model.

• Difficult to analyze due to varying content and context (e.g., emails, social media posts).

3. Natural Language

• A type of unstructured data written in human language.

• Hard to process due to ambiguity and context-specific meanings.

• Techniques used: sentiment analysis, entity recognition, summarization, etc.
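Sentiment analysis, in its simplest lexicon-based form, can be sketched as counting positive and negative words. The word lists are a tiny hypothetical lexicon, and the third call shows why natural language is hard to process: the scorer ignores context and misses the negation in "not good".

```python
import re

# Tiny hypothetical sentiment lexicon; real systems use much larger lexicons or trained models.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def sentiment_score(text):
    """Count positive minus negative words; > 0 reads as positive, < 0 as negative."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


print(sentiment_score("Great service, I love it"))   # 2
print(sentiment_score("Terrible app, poor support"))  # -2
print(sentiment_score("The food was not good"))       # 1 (negation is missed)
```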

4. Machine-Generated Data

• Created automatically by machines without human input.

• Examples: server logs, IoT data, call records, network logs.

• Grows rapidly with IoT development.
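Machine-generated data such as server logs usually follows a fixed textual layout that can be parsed into structured records. This sketch assumes lines in the Common Log Format used by many web servers; the log lines themselves are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical web-server log lines in Common Log Format.
LOG_LINES = [
    '192.168.1.5 - - [23/Apr/2025:08:26:12 +0000] "GET /index.html HTTP/1.1" 200 512',
    '192.168.1.9 - - [23/Apr/2025:08:26:15 +0000] "GET /missing HTTP/1.1" 404 128',
    '192.168.1.5 - - [23/Apr/2025:08:26:20 +0000] "POST /login HTTP/1.1" 200 64',
]

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r"(?P<status>\d{3}) (?P<size>\d+)"
)


def parse(lines):
    """Turn raw log lines into dictionaries, skipping lines that do not match."""
    return [m.groupdict() for m in map(LOG_PATTERN.match, lines) if m]


records = parse(LOG_LINES)
status_counts = Counter(r["status"] for r in records)  # how many requests per HTTP status
```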

5. Graph-Based Data

• Represents relationships between entities (nodes and edges).

• Useful in social networks, recommendation systems, and fraud detection.
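Graph-based data is often stored as an adjacency list. The sketch below uses a hypothetical friendship graph to show a simple friends-of-friends recommendation that ranks candidates by the number of mutual friends; production recommendation systems use far richer signals.

```python
from collections import Counter

# Hypothetical social network stored as an adjacency list: node -> set of neighbours.
friends = {
    "Asha": {"Ravi", "Meera"},
    "Ravi": {"Asha", "Kiran"},
    "Meera": {"Asha", "Kiran", "Dev"},
    "Kiran": {"Ravi", "Meera"},
    "Dev": {"Meera"},
}


def recommend(graph, person):
    """Suggest friends-of-friends, ranked by how many mutual friends they share."""
    counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            if candidate != person and candidate not in graph[person]:
                counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for a stable order.
    return sorted(counts, key=lambda c: (-counts[c], c))


print(recommend(friends, "Asha"))  # ['Kiran', 'Dev']
```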

6. Audio, Video, and Images

• Rich media formats that require advanced tools to store and analyze.

• Used in speech recognition, facial recognition, video analytics, etc.

7. Streaming Data

• Real-time, continuous flow of data.

• Comes from sensors, live feeds, and online activities.

• Requires tools for real-time processing (e.g., Apache Kafka, Spark Streaming).
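Streaming computations process each record as it arrives instead of waiting for a complete dataset. As a minimal single-process sketch (not a substitute for Apache Kafka or Spark Streaming), a Python generator can maintain a moving average over the most recent readings; the readings and the window size are hypothetical.

```python
from collections import deque


def moving_average(stream, window=3):
    """Yield the mean of the most recent `window` readings as each new reading arrives."""
    recent = deque(maxlen=window)  # the oldest reading drops off automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)


# Hypothetical sensor readings; in production these might arrive from a Kafka topic.
sensor_readings = iter([20, 22, 24, 30, 28])
averages = list(moving_average(sensor_readings))
```

Because the generator keeps only the last few readings in memory, it can run over an unbounded stream.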

Q6) Explain different types of data in data science.

Same as Q5 above.

Q7) Differentiate between lists and tuples in Python.

Bhavika patil
