Module 1 - Lesson 1

Uploaded by

ad3095144
Bachelor of Science (Honours) in Data Science and Artificial Intelligence

DA 102 Data Analysis Basics


Introduction
Learning Objectives

01 Know the various types of data

02 Understand the issues associated with data

03 Understand the broad groups of analytics

04 Understand what descriptive analytics is

05 Understand what predictive analytics is

06 Know the tools used to perform analytics

About Data
Data

● Definition: a collection of information, facts, or values.

● Data is used as the basis for computation, analysis, and decision making.

● Data can be in the form of numbers, text, images, audio, or video.

● Data is the raw material that is processed, organized, and interpreted to extract meaning and generate insights.

● Data is a fundamental concept that plays a crucial role in fields such as science, business, and research.

● Data is considered key in the digital age.

Examples of data

About Data
Data 01

● Numbers: Quantity: 10; Price: 347.73
● Qualitative Data: {"Un-married", "Married", "Divorced"}; {"Student", "Faculty", "Staff"}
● Quantitative Data: {1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, 5 = Excellent}; {AA = 10, AB = 9, BB = 8, …, DD = 4, F = 0}
● Boolean Values: {True, False}

Data 02

● Temporal Data: Date: 15-Aug-2023; Time: 11:30 AM; Time Stamp: 11:30 AM, 15-Aug-2023
● Text Data: Sequence of characters such as words, sentences or paragraphs
● Mixed data: Combination of any of these data types

Data 03

● Audio: An encoded file corresponding to a song
● Image Data: Data captured through cameras; data captured through screenshots on computers
● Video data: Combination of several images and associated audio data

Data – stored in a spreadsheet

Data – stored in a notepad

Real valued data
Analysis - example 01
• Task: buy the book titled "The Linux Programming Interface" by Michael Kerrisk from an online website.

• The objective is to buy the book from the website that offers the best price.

• To achieve this objective, collect prices from various e-commerce websites.

When data is real valued


Analysis – example 01

Website            Price           Additional information
www.amazon.in      5594            In stock
www.flipkart.com   5599            In stock
books.rediff.com   5378            Out of stock
bookswagon.com     7738            Out of stock
www.snapdeal.com   Not available   Not available

Analysis – example 01

• Prices are real-valued numbers.

• The price at snapdeal.com is not available because the product is unavailable.

• Computing the lowest price from the second column of the table is challenging because price information is not available for every website.

Analysis – example 01

Website            Price   Additional information
www.amazon.in      5594    In stock
www.flipkart.com   5599    In stock
books.rediff.com   5378    Out of stock
bookswagon.com     7738    Out of stock
www.snapdeal.com   0       Not available

• Suppose the price offered by Snapdeal is set to 0 so that every entry in the price column is a real-valued number.

• However, this change leads to the factually incorrect conclusion that the lowest price is offered by Snapdeal.

• Handling "not available" values is very important in data analysis; otherwise, decisions can swing significantly, resulting in errors.

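The effect of replacing a missing price with 0 can be sketched in a few lines of Python. This is only an illustration using the prices from the table; the helper name `lowest_price` and the use of `None` to mark "Not available" are assumptions made for this sketch.

```python
# Prices from the table; None marks a price that is "Not available".
prices = {
    "www.amazon.in": 5594,
    "www.flipkart.com": 5599,
    "books.rediff.com": 5378,
    "bookswagon.com": 7738,
    "www.snapdeal.com": None,
}


def lowest_price(table, fill_missing_with=None):
    """Return (website, price) for the lowest price in the table.

    If fill_missing_with is None, missing prices are skipped.
    Otherwise each missing price is replaced by that value first.
    """
    candidates = {}
    for site, price in table.items():
        if price is None:
            if fill_missing_with is None:
                continue  # skip the missing value
            price = fill_missing_with
        candidates[site] = price
    site = min(candidates, key=candidates.get)
    return site, candidates[site]


wrong = lowest_price(prices, fill_missing_with=0)  # missing price forced to 0
right = lowest_price(prices)                       # missing price skipped
print(wrong)
print(right)
```

Skipping the missing value is one possible treatment; which treatment is appropriate depends on the analysis.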
Analysis – example 01

• The lowest price is offered by books.rediff.com.

• The additional information tells us that, although this is the best price offered, the book is out of stock.

• We must therefore search for the second lowest price.

• Sort the prices in ascending order and pick the second element of the sorted list.

• We find that Amazon offers the second lowest price.

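A minimal Python sketch of this step, using the rows of the table. The tuple layout and the variable names are assumptions made for illustration. It also shows an alternative that filters on stock status directly instead of relying on the position in the sorted list.

```python
# Rows from the table: (website, price, additional information).
offers = [
    ("www.amazon.in", 5594, "In stock"),
    ("www.flipkart.com", 5599, "In stock"),
    ("books.rediff.com", 5378, "Out of stock"),
    ("bookswagon.com", 7738, "Out of stock"),
    ("www.snapdeal.com", None, "Not available"),
]

# Drop rows without a price, then sort in ascending order of price.
priced = [offer for offer in offers if offer[1] is not None]
by_price = sorted(priced, key=lambda offer: offer[1])

lowest = by_price[0]         # cheapest offer, regardless of stock
second_lowest = by_price[1]  # second element of the ascending list

# Alternative: keep only in-stock offers and take the cheapest of those.
in_stock = [offer for offer in priced if offer[2] == "In stock"]
best_in_stock = min(in_stock, key=lambda offer: offer[1])

print(lowest[0], second_lowest[0], best_in_stock[0])
```

Filtering on stock status is the more robust choice: the second lowest price is only the right answer here because exactly one cheaper offer is out of stock.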
Analysis – example 01

• Sampling: only 5 of the many e-commerce websites are visited.

• Computing the minimum price leads to a decision.

• Is this example too easy? Well, the fact is:

• 9 out of 10 consumers price-check a product on Amazon.

• https://www.bigcommerce.com/blog/amazon-statistics/

Quantitative data
Analysis – example 02
• Visit www.amazon.com
• Search for any product and access the associated product description page.
• Apart from the product description, we get to see the ratings given by users.
• The ratings are quantitative values, such as a number of stars.

When data is Quantitative

https://www.vecteezy.com/
Analysis – example 02
• A total of 1043 users gave ratings.
• 75% of users gave 5 stars.
• 17% of users gave 4 stars, and so on.
• The ordered values are presented as a histogram.
• The average rating is computed and presented as 4.6 out of 5 stars.

Analysis – example 02

User        Rating
User 1      5 stars
User 2      5 stars
User 3      4 stars
User 4      1 star
…           …
User 1043   5 stars

• Analysis: compute the following
• Number of users who gave 5 stars
• Number of users who gave 4 stars
• ...
• Number of users who gave 1 star
• Average rating across all users

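The counts and the average listed above can be computed as in the following Python sketch. The list `ratings` is a small hypothetical sample of ten users, not the 1043 ratings from the slide, and the text histogram is only one possible presentation.

```python
from collections import Counter

# Hypothetical sample of ratings (1 to 5 stars), one entry per user.
ratings = [5, 5, 4, 1, 5, 4, 5, 3, 5, 5]

counts = Counter(ratings)        # number of users per star value
total = len(ratings)
average = sum(ratings) / total   # average rating across all users

# Print a simple text histogram, highest rating first.
for stars in range(5, 0, -1):
    n = counts.get(stars, 0)
    share = 100 * n / total
    print(f"{stars} stars: {n:2d} users ({share:.0f}%) " + "#" * n)

print(f"Average rating: {average:.1f} out of 5")
```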
Analysis – example 02

• Product ratings influence conversions from viewing to buying.
• A correlation between ratings and conversion has been observed: a product with a 3.7 rating has a 15% higher click-through rate (CTR).
• This analysis involves only ratings.

Analysis – example 02

• From the data perspective, the rating data can be pictured as stored in a table with one row per user (User 1: 5 stars, User 2: 5 stars, User 3: 4 stars, User 4: 1 star, …, User 1043: 5 stars).
• When a user visits the product page, the average rating and the histogram are generated from this kind of data.
• Analytics plays a pivotal role in transforming data and influencing decisions.

Qualitative data
Analysis – example 03
• Visit www.youtube.com
• Search for any video.
• Apart from the video, we get to see how many users {liked, disliked} the video.
• More likes on a video correlate strongly with an increase in revenue.

https://www.vecteezy.com/

When data is Qualitative

Text data
Analysis – example 04
• Visit www.amazon.com
• Search for any product and access the associated product description page.
• Apart from the product description, we get to read detailed product reviews given by customers who bought the product.
• This information is in text form.

Analysis – example 04
• Analyze the contents to find out the sentiment of the customer.

• In analytics terms, perform sentiment analysis of customer reviews.

• If the complete review has a positive sentiment, it suggests that the customer is happy with the product.

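As a rough illustration of the idea, the following Python sketch scores a review by counting words from two small word lists. The word lists and the function name `sentiment` are hypothetical; practical sentiment analysis typically relies on much larger lexicons or trained models, and this sketch ignores negation and context.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"good", "great", "excellent", "happy", "love", "useful"}
NEGATIVE = {"bad", "poor", "terrible", "unhappy", "hate", "useless"}


def sentiment(review):
    """Label a review as positive, negative, or neutral by word counts."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Great book, very useful and I love it"))
print(sentiment("Terrible print quality, poor binding"))
```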
A computer program
• A computer program is data

• Collection of such programs

• Collection of executables

Issues in Data Collection
Bias and Sampling Issues
• Sampling Bias: When the sample collected does not represent the entire population.

• Selection Bias: When participants are not selected randomly, leading to skewed or inaccurate results.

• Nonresponse Bias: When a significant portion of the selected participants does not respond, leading to a biased sample.

Measurement Issues
• Measurement Error: Inaccuracies or inconsistencies in measurement instruments, leading to incorrect or imprecise data.

• Subjective Measurements: When data is collected through subjective methods like surveys, individuals might respond based on personal biases or interpretations.

• Social Desirability Bias: Participants may provide responses they believe are socially acceptable rather than their true opinions or behaviors.

Data Quality and Integrity Issues
• Data Integrity Issues

• Missing values

• A data record that is not complete.

• Data entry errors.

• Duplications in data.
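
The issues listed above can often be detected with simple checks. The following Python sketch runs three such checks on a handful of hypothetical records; the field names and the plausible age range of 0 to 120 are assumptions made for this illustration.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "name": "Asha", "age": 21},
    {"id": 2, "name": "Ravi", "age": None},   # missing value
    {"id": 3, "name": "Meena", "age": 210},   # likely data entry error
    {"id": 1, "name": "Asha", "age": 21},     # duplicate of the first record
]

# Incomplete records: any field without a value.
incomplete = [r for r in records if any(v is None for v in r.values())]

# Possible data entry errors: ages outside an assumed plausible range.
suspicious = [
    r for r in records
    if r["age"] is not None and not 0 <= r["age"] <= 120
]

# Duplicates: records whose full contents were already seen.
seen = set()
duplicates = []
for r in records:
    key = tuple(sorted(r.items()))
    if key in seen:
        duplicates.append(r)
    else:
        seen.add(key)

print(len(incomplete), len(suspicious), len(duplicates))
```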

Ethical and Privacy Concerns
• Informed Consent: Obtaining proper informed consent from participants, especially in sensitive or intrusive research.

• Privacy Protection: Ensuring that personally identifiable information is not leaked or misused.

• Anonymity: Striking a balance between collecting meaningful data and protecting participants' identities.

Technological Challenges
• Data Security: Protecting data from breaches, leaks, or unauthorized access.

• Data Storage: Ensuring proper and secure storage of collected data.

• Technical Glitches: Issues with data collection tools, software, or hardware can
affect data quality.

Cultural and Language Related Challenges
• Cultural Bias: Data collection instruments might be biased towards a particular culture or group.

• Translation Issues: Translating surveys or questionnaires into different languages can lead to discrepancies in meaning.

Resource Limitations
• Financial Constraints: Adequate data collection might require funding for tools,
personnel, and infrastructure.

• Time Constraints: Rushed data collection might lead to errors or incomplete data.

Types of analytics
Analytics components
• A business context is needed in order to take up data analytics.

• Two examples support the business context.

Example 1:

• Pregnant women are likely to be price-insensitive.

• Their willingness to spend more is a business opportunity.

• The market for baby-related products is worth 38 billion dollars.

• Given a female customer, predict whether the customer is pregnant.

• In the business context, if the prediction is accurate, offer pregnant women customized products and services.

Analytics components
Example 2:

• In competitive markets such as the telecommunication industry, customers abruptly leave the services of the service provider.

• Predicting beforehand which customers will leave the service helps address the core concerns of those customers.

• In the business context, if the prediction is accurate, offer discounts to retain such customers.

Analytics types

• Descriptive analytics

• Predictive analytics

Descriptive analytics
• In descriptive analytics, the focus is on summarizing and interpreting historical data.

• It gives insights into past events and trends.

• It strives to provide a clear and concise representation of data in order to understand the data and aid decision making.

• Summarize large volumes of data into manageable and interpretable forms.

• Identify patterns and trends present in the historical data.

• Visualize data, since visualization makes it easier to communicate complex data.

Descriptive analytics - example
• An e-commerce company wants to analyze the sales data for a particular product over the past year.

• It has data describing each sale, such as the date of the sale, the product sold, the quantity, and the revenue generated.

Descriptive analytics - example
• Descriptive analytics includes:

• Summarization: Compute the average revenue for each product.

• Summarization: Compute the total revenue of each product.

• Visualization: Plot the sales of product A over time.

• Visualization: Plot the quantity sold every month for product A.

• Top sellers: Determine the top-selling products based on the total revenue generated by each product.

• Customer segmentation: Group customers based on purchase frequency and total spending per customer; identify high-value and low-value customers.

• Peak sales times: Identify the days of the week when sales are highest.

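Several of the summaries listed above can be sketched in plain Python. The sales records below are hypothetical and follow the fields named earlier (date of the sale, product, quantity, revenue); the variable names are assumptions made for this illustration. In the course itself, the same summaries would be produced in a spreadsheet.

```python
from collections import defaultdict
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical sales records: (date of sale, product, quantity, revenue).
sales = [
    (date(2023, 1, 2), "A", 2, 200.0),
    (date(2023, 1, 7), "B", 1, 150.0),
    (date(2023, 1, 7), "A", 3, 300.0),
    (date(2023, 2, 4), "A", 1, 100.0),
    (date(2023, 2, 6), "B", 2, 300.0),
]

total_revenue = defaultdict(float)       # total revenue per product
sale_count = defaultdict(int)            # number of sales per product
monthly_qty_a = defaultdict(int)         # quantity of product A per month
revenue_by_weekday = defaultdict(float)  # revenue per day of the week

for day, product, quantity, revenue in sales:
    total_revenue[product] += revenue
    sale_count[product] += 1
    revenue_by_weekday[DAYS[day.weekday()]] += revenue
    if product == "A":
        monthly_qty_a[(day.year, day.month)] += quantity

# Summarization: average revenue per sale for each product.
average_revenue = {p: total_revenue[p] / sale_count[p] for p in total_revenue}

# Top-selling product by total revenue, and the peak sales day.
top_product = max(total_revenue, key=total_revenue.get)
peak_day = max(revenue_by_weekday, key=revenue_by_weekday.get)

print(dict(total_revenue), average_revenue, top_product, peak_day)
```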
Predictive analytics
• In predictive analytics, the focus is on finding future trends given historical and current data.

• The goal is to build models that can learn from historical data.

• Use what is learned to make predictions about future events (data).

• The models are trained on known outcomes and patterns.

• The trained models are then used to make predictions on new or unseen data.

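One of the simplest models of this kind is a straight line fitted to historical values by least squares. The Python sketch below fits such a line to hypothetical monthly sales figures and uses it to forecast the next month; the data and names are assumptions made for illustration, and a real forecast would need more data and validation.

```python
# Hypothetical historical data: units sold in months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
units = [100, 110, 125, 130, 145, 150]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the trend from the known outcomes.
slope, intercept = fit_line(months, units)

# "Prediction": apply the learned trend to an unseen month.
forecast_month_7 = slope * 7 + intercept
print(round(slope, 2), round(forecast_month_7, 1))
```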
Example - customer churn
• Customer churn is the percentage of customers who stopped using the product or service.

• In a competitive marketplace, identifying potential customers who would discontinue their services is key.

• If the customer's intent to leave is known beforehand, the company may initiate a retention plan, which may include offering a coupon or providing a discount on the previous three months of service.

• The main challenge is that no customer will explicitly state the reason for leaving the service.

• The reasons have to be understood from multiple data sources. Here is an example.

Example - customer churn

[Figure: data from billing department and data from service department]

Example - customer churn
• In this example, a correlation exists between the number of complaints raised, resolved, and unresolved, and the rating.

• Obtain detailed data for all active customers for the past four months; a snapshot of the same is given below.

Customer churn
• When data for a large number of customers is presented, it is hard to establish relationships between churn and the attributes of the data visually.

• When a relationship is established, it has to be validated. In the previous example, customer ID 1235 is identified as likely to churn. This must be validated.

• The outline of predictive analytics is:

• Perform data analysis and modeling: Analyze historical data and discover relationships, correlations, and patterns that can be used to make predictions. In the customer churn example, the number of complaints raised, the number of unresolved complaints, and the rating on resolved complaints carry the informative relationships.

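The learn, validate, and predict outline can be sketched with a deliberately simple model: a single threshold on the number of unresolved complaints. All data below is hypothetical, and the functions `accuracy`, `fit_threshold`, and `predict_churn` are illustrative names, not part of any library. Practical churn models usually combine many attributes.

```python
# Hypothetical history: (unresolved complaints, churned?) per past customer.
train = [(0, False), (0, False), (1, False), (2, False),
         (3, True), (4, True), (5, True)]

# Hypothetical held-out customers used only to validate the learned rule.
validation = [(0, False), (1, False), (4, True), (2, True)]


def accuracy(threshold, rows):
    """Share of rows where 'unresolved >= threshold' matches the outcome."""
    correct = sum((unresolved >= threshold) == churned
                  for unresolved, churned in rows)
    return correct / len(rows)


def fit_threshold(rows):
    """Pick the threshold that best separates churners in the history."""
    candidates = sorted({unresolved for unresolved, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))


threshold = fit_threshold(train)            # learn from known outcomes
val_acc = accuracy(threshold, validation)   # validate the relationship


def predict_churn(unresolved):
    """Predict churn for a new customer from unresolved complaints."""
    return unresolved >= threshold


print(threshold, val_acc, predict_churn(5))
```

The validation accuracy is lower than the training accuracy here, which is the reason the relationship has to be checked on data that was not used to find it.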
Data analytics tools
• Several tools are used for data analytics. Some of them are listed below.

• Microsoft Excel: Widely used for basic data analysis, calculations, and charting.

• Tableau: Known for its powerful data visualization capabilities and interactive dashboards.

• Power BI: Microsoft's business analytics service for interactive visualizations and business intelligence.

• QlikView/Qlik Sense: Tools for data visualization, reporting, and business intelligence.

• R: A programming language and software environment for statistical computing and graphics.

• Python: A versatile programming language used for data analysis and manipulation with libraries like pandas and NumPy.

Data analytics tools
• SAS (Statistical Analysis System): Offers a suite of software for advanced analytics, business intelligence, and data management.

• SPSS (Statistical Package for the Social Sciences): Software for statistical analysis, used in social science research and data mining.

• MATLAB: A programming language and environment for numerical computing, often used in engineering and scientific research.

• Stata: A software package used for data analysis and statistics.

• JMP: Statistical discovery software often used for exploratory data analysis.

DA102 – Microsoft Excel
• This course focuses on the basics of data analysis and business modeling.

• The Microsoft Excel software tool is used.

• This tool was chosen because students of this online programme are in their first semester and have yet to be introduced to programming languages.

• For students who have already been introduced to programming languages, this course should give them an orientation to data manipulation through software systems.

DA102 – Detailed contents
• Basic spreadsheet modeling

• Understanding range names

• LOOKUP functions

• INDEX function

• MATCH function

• Text manipulation functions

• Time – Dates and date functions

• Conditional statements (IF statement)

• Three-dimensional formulas

• Sensitivity analysis

• COUNT family

• SUM family

• OFFSET function

• INDIRECT function

• Data validation

• Filtering and removing duplicates

• Consolidating data
• Pivot tables
