DVA Assignment 1

Name: Tanya Maheshwari
Enrollment No.: 02613702022
Submitted to: Ms. Kavita Srivastva
1. Explain Analytics Process Model
Answer:
The Analytics Process Model outlines the structured approach used to derive insights from data and
support decision-making. It typically includes the following steps:
1. Problem Definition: Clearly define the business or research problem.
2. Data Collection: Gather relevant data from primary or secondary sources.
3. Data Preparation: Clean, integrate, and transform data for analysis.
4. Exploratory Data Analysis (EDA): Use statistical and visual methods to understand
patterns, trends, and anomalies.
5. Model Building: Apply statistical or machine learning models to solve the problem.
6. Validation and Testing: Evaluate model performance using test data.
7. Deployment: Implement the solution into the business environment.
8. Monitoring and Feedback: Continuously monitor model accuracy and update as needed.
This model ensures that analytics is goal-driven, repeatable, and actionable, helping organizations
gain insights and make informed decisions.
2. Describe Data Collection Process. Differentiate Between Primary and Secondary Data.
Discuss Ways of Obtaining Them.
Answer:
Data Collection is the process of gathering information to analyze and draw conclusions. It can be
done manually or automatically, and the quality of collected data directly impacts analysis.
Primary Data:
• Collected firsthand for a specific purpose.
• Methods: Surveys, interviews, experiments, focus groups, observations.
Secondary Data:
• Already collected and available for use.
• Sources: Government reports, company records, published research, online databases.
Differences:

Feature  | Primary Data       | Secondary Data
---------|--------------------|-------------------------------
Purpose  | Specific, original | Previously collected
Accuracy | Higher (custom)    | Variable (dependent on source)
Cost     | High               | Low or free

3. Explain Different Ways of Sampling. What Are the Benefits of Sampling?


Answer:
Sampling is the process of selecting a subset from a larger population to analyze and draw
conclusions.
Types of Sampling:
• Probability Sampling: Every element has a known chance of selection.
   o Simple Random Sampling
   o Stratified Sampling
   o Systematic Sampling
   o Cluster Sampling
• Non-Probability Sampling: No known probability of selection.
   o Convenience Sampling
   o Judgmental Sampling
   o Snowball Sampling
   o Quota Sampling

Benefits of Sampling:
• Cost-Effective: Reduces data collection and processing costs.
• Time-Saving: Quicker than analyzing the entire population.
• Efficient: Useful when the population is too large or inaccessible.
• Accurate: Provides reliable results when done correctly.
Sampling enables researchers to make generalized inferences without studying every data point.
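For illustration, here is a minimal sketch of simple random and stratified sampling in Python with pandas (the DataFrame and its 'region' column are hypothetical):

import pandas as pd

# Hypothetical population: 1,000 customers spread across three regions
df = pd.DataFrame({
    'customer_id': range(1000),
    'region': (['North', 'South', 'East'] * 334)[:1000],
})

# Simple random sampling: every row has an equal chance of selection
simple_sample = df.sample(n=100, random_state=42)

# Stratified sampling: draw 10% from each region so every stratum is represented
stratified_sample = df.groupby('region').sample(frac=0.10, random_state=42)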

4. Explain Different Ways of Handling Missing Values


Answer:
Handling missing values is crucial for accurate analysis and modeling. Common techniques include:
1. Deletion:
   o Listwise Deletion: Remove rows with any missing value.
   o Column Deletion: Remove features with high missingness.
   o Best for small datasets or when few values are missing.
2. Imputation:
   o Mean/Median/Mode Imputation: Replace with the average or most frequent value.
   o Forward/Backward Fill: Use the previous or next value in a sequence.
   o KNN Imputation: Use k-nearest neighbors to estimate missing values.
   o Regression Imputation: Predict missing values using other features.
3. Advanced Techniques:
   o Multiple Imputation or machine-learning models for more complex scenarios.

Proper handling of missing data avoids bias, improves model performance, and ensures data integrity.
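As a minimal illustration of these techniques in pandas (the DataFrame and its columns are hypothetical):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [25, np.nan, 32, 40],
    'city': ['Delhi', 'Mumbai', None, 'Delhi'],
})

# Deletion: drop rows that contain any missing value
df_dropped = df.dropna()

# Mean imputation for a numeric column
df['age'] = df['age'].fillna(df['age'].mean())

# Mode imputation for a categorical column
df['city'] = df['city'].fillna(df['city'].mode()[0])

# Forward fill: propagate the last observed value down a sequence
df_filled = df.ffill()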
5. What Are Outliers? How Are Outliers Detected and Handled Using Python?
Answer:
Outliers are data points significantly different from other observations. They can distort statistical
analysis and model accuracy.
Detection Methods in Python:
import numpy as np
import pandas as pd
from scipy import stats

# df is an existing DataFrame with a numeric 'column'
# Using Z-score: flag points more than 3 standard deviations from the mean
z_scores = np.abs(stats.zscore(df['column']))
outliers = df[z_scores > 3]

# Using IQR: flag points beyond 1.5 * IQR outside the quartiles
Q1 = df['column'].quantile(0.25)
Q3 = df['column'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['column'] < (Q1 - 1.5 * IQR)) | (df['column'] > (Q3 + 1.5 * IQR))]
Handling Techniques:
• Removal: Drop rows with extreme outliers.
• Transformation: Apply log or square root to reduce skewness.
• Capping (Winsorization): Limit extreme values to certain percentiles, as sketched below.
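A brief sketch of capping, reusing the same hypothetical df and column from above:

# Winsorization: clip extreme values to the 5th and 95th percentiles
lower = df['column'].quantile(0.05)
upper = df['column'].quantile(0.95)
df['column'] = df['column'].clip(lower=lower, upper=upper)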
Outlier handling improves data quality and model robustness.
6. Differentiate Between Min/Max and Z-score Methods of Standardization
Answer:

Feature               | Min/Max Normalization                          | Z-score Standardization
----------------------|------------------------------------------------|---------------------------------------------
Formula               | x' = (x − min(x)) / (max(x) − min(x))          | x' = (x − μ) / σ
Output Range          | [0, 1] or any defined range                    | Mean = 0, SD = 1
Sensitive to Outliers | Yes                                            | Less sensitive
Usage                 | When data needs to be scaled to a specific range | When data needs to follow a normal distribution
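As a small illustration of both methods with pandas (the DataFrame and column are hypothetical):

import pandas as pd

df = pd.DataFrame({'column': [10, 20, 30, 40, 50]})
col = df['column']

# Min/Max normalization: rescale values into the [0, 1] range
df['minmax'] = (col - col.min()) / (col.max() - col.min())

# Z-score standardization: center to mean 0 with standard deviation 1
df['zscore'] = (col - col.mean()) / col.std()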

7. Explain Categorization and Segmentation


Answer:
Categorization is the process of labeling or grouping data based on predefined categories or
attributes. It simplifies analysis by classifying data into manageable groups.
Example: Classifying customers into categories such as "new," "loyal," or "inactive."
Segmentation goes further by dividing data into meaningful, often homogeneous subgroups, based
on behavior, demographics, or purchasing habits.
Example: Customer segmentation for marketing based on age, buying frequency, or location.
Differences:
• Categorization is rule-based and static.
• Segmentation is dynamic and often data-driven (e.g., using clustering algorithms like K-Means).
Both are essential for targeted marketing, personalized recommendations, and efficient decision-
making in data-driven environments.
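For illustration, a minimal customer-segmentation sketch with K-Means from scikit-learn (the feature names and values are hypothetical):

import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical customer data: age and monthly purchase frequency
customers = pd.DataFrame({
    'age': [22, 25, 31, 45, 52, 60],
    'purchase_freq': [8, 10, 4, 3, 1, 2],
})

# Segment customers into two data-driven groups
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
customers['segment'] = kmeans.fit_predict(customers[['age', 'purchase_freq']])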
