DATA ANALYTICS
SKILLS BUILD FOR COLLEGES
Definitions
What is data analytics?
Why is data analytics important to business?
Data Analytics Tools
Processes in data analytics
Data collection
ETL (Extract, Transform, Load)
The four main types of data analytics
Role of a data analyst
Career opportunities
BASIC DEFINITIONS
Data: A set of values of qualitative or quantitative variables; information in raw or unorganized form. It may be facts, figures, characters, symbols, etc.
Information: Meaningful or organized data.
Analytics: The discovery, interpretation, and communication of meaningful patterns or summaries in data.
Data Analytics (DA): The process of examining data sets in order to draw conclusions about the information they contain.
Analytics is not a tool or technology; rather, it is a way of thinking and acting on data.
WHAT IS DATA ANALYTICS?
Data analytics is the process of analyzing raw data in order to draw out meaningful, actionable insights, which are then used to inform and drive smart business decisions.
WHY IS DATA ANALYTICS IMPORTANT TO BUSINESS?
Gain greater insight into target markets
Enhance decision-making capabilities
Create targeted strategies and marketing campaigns
Reduce operational inefficiencies and minimize risk
Identify new product and service opportunities
DATA ANALYTICS TOOLS
Python – This object-oriented open-source programming language is
used for manipulating, visualizing, and modelling data.
R – An open-source programming language used in numerical and
statistical analysis.
Tableau – This helps in creating several kinds of visualizations for
presenting insights and trends in a better way.
Power BI – This business intelligence tool supports multiple data
sources and helps in asking questions and getting immediate insights.
SAS – This statistical analysis software helps in performing analytics,
visualizing data, writing SQL queries, performing statistical analysis, and
building ML models.
PROCESSES IN DATA ANALYTICS
The data analytics practice encompasses many separate processes, which can comprise a data pipeline:
Collecting and ingesting the data
Categorizing the data into structured/unstructured forms, which might also define next actions
Managing the data, usually in databases, data lakes, and/or data warehouses
Storing the data in hot, warm, or cold storage
Performing ETL (extract, transform, load)
Analyzing the data to extract patterns, trends, and insights
Sharing the data to business users or consumers, often in a dashboard or via specific storage
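As a rough end-to-end sketch of these stages with pandas (the file contents, column names, and figures below are invented for illustration):

```python
import io
import pandas as pd

# Collect / ingest: an in-memory CSV stands in for a real source file
raw = io.StringIO("order_id,region,amount\n1,North,100\n2,South,250\n3,North,175\n")
df = pd.read_csv(raw)

# Manage / clean: enforce types and drop unusable rows
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df = df.dropna(subset=["amount"])

# Analyze: extract a simple pattern (revenue per region)
summary = df.groupby("region")["amount"].sum()

# Share: a dashboard or report would consume this summary
print(summary)
```

In a real pipeline the ingest step would read from a database, data lake, or API rather than an in-memory string.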
PRIMARY DATA AND SECONDARY DATA
Primary data collection involves the collection of original data directly from the source or through direct interaction with the respondents.
1. Primary Data Collection:
Surveys and Questionnaires
Interviews
Observations
Experiments
Focus Groups
Secondary data collection involves using existing data collected by someone else for a purpose different from the original intent.
2. Secondary Data Collection:
Published Sources
Online Databases
Government and Institutional Records
Publicly Available Data
Past Research Studies
ETL (EXTRACT, TRANSFORM, LOAD)
Extract: Retrieve data from various sources, such
as databases, files, or APIs.
Transform: Clean, filter, and manipulate data to
ensure consistency and prepare it for analysis.
Load: Store the transformed data into a target
system or data warehouse for easy access and
analysis.
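A minimal ETL pass might look like the following sketch with pandas; the CSV contents and column names are invented, and an in-memory string stands in for a real source and target:

```python
import io
import pandas as pd

# Extract: pull data from a source (database, file, or API);
# here an in-memory CSV string stands in for the source
source = io.StringIO("name,age\n Sam ,25\nZiva,29\nZiva,29\nKia,\n")
df = pd.read_csv(source)

# Transform: clean and filter for consistency
df["name"] = df["name"].str.strip()   # trim stray whitespace
df = df.drop_duplicates()             # remove repeated rows
df = df.dropna(subset=["age"])        # drop rows with missing age
df["age"] = df["age"].astype(int)

# Load: write the cleaned data to a target (a warehouse table in practice)
target = io.StringIO()
df.to_csv(target, index=False)
```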
THE FOUR MAIN TYPES OF DATA ANALYTICS
DIAGNOSTIC ANALYTICS
Definition: Diagnostic analytics aims to determine the root causes and reasons behind certain events or trends
observed in the data.
Key Characteristics: Involves data exploration, drill-down analysis, and correlation identification. Diagnostic analytics
answers the question of "why did it happen."
Examples: Data mining techniques, regression analysis, cohort analysis.
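As a toy illustration of drill-down analysis (all figures invented): overall sales dipped between two months, and grouping by region shows which segment drove the drop, answering "why did it happen":

```python
import pandas as pd

# Invented monthly sales data: overall revenue dipped in Feb
df = pd.DataFrame({
    "month":  ["Jan", "Jan", "Feb", "Feb"],
    "region": ["North", "South", "North", "South"],
    "sales":  [100, 120, 95, 60],
})

# Drill down: which region explains the drop?
by_region = df.pivot_table(index="month", columns="region", values="sales")
change = by_region.loc["Feb"] - by_region.loc["Jan"]
print(change)   # South fell far more than North
```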
DESCRIPTIVE ANALYTICS
Definition: Descriptive analytics focuses on summarizing historical data to gain insights into past events and
understand the current state.
Key Characteristics: Involves data aggregation, visualization, and reporting. Descriptive analytics answers the
questions of "what happened" and "what is happening."
Examples: Bar charts, line graphs, dashboards displaying key performance indicators (KPIs).
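A minimal descriptive-analytics sketch with pandas (the order data is invented), summarizing "what happened":

```python
import pandas as pd

# Invented daily order data
df = pd.DataFrame({
    "day":    ["Mon", "Mon", "Tue", "Tue", "Wed"],
    "orders": [12, 8, 15, 10, 20],
})

# Aggregate and report: orders per day, plus overall summary statistics
kpis = df.groupby("day")["orders"].sum()
print(kpis)
print(df["orders"].describe())   # count, mean, min, max, quartiles
```

In practice these aggregates would feed the bar charts and dashboards mentioned above.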
PREDICTIVE ANALYTICS
Definition: Predictive analytics leverages historical data to make predictions about future outcomes or events.
Key Characteristics: Involves statistical modeling, machine learning algorithms, and pattern recognition. Predictive
analytics answers the question of "what is likely to happen."
Examples: Forecasting models, time series analysis, classification algorithms.
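A minimal predictive sketch using a straight-line trend fit with NumPy (the sales history is invented), answering "what is likely to happen":

```python
import numpy as np

# Invented monthly sales history showing roughly linear growth
months = np.array([1, 2, 3, 4, 5])
sales  = np.array([100.0, 110.0, 121.0, 128.0, 141.0])

# Fit a least-squares straight line and extrapolate one month ahead
slope, intercept = np.polyfit(months, sales, 1)
forecast = slope * 6 + intercept
print(round(forecast, 1))   # ~150
```

Real forecasting models (time series methods, ML classifiers) add seasonality, uncertainty estimates, and validation on held-out data.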
PRESCRIPTIVE ANALYTICS
Definition: Prescriptive analytics recommends the best course of action based on predictive models, optimization
techniques, and business rules.
Key Characteristics: Involves simulation, optimization algorithms, and decision support systems. Prescriptive
analytics answers the question of "what should be done."
Examples: Optimization models, simulation tools, decision support systems.
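A toy prescriptive sketch: given an assumed demand model (invented for illustration), search candidate prices for the one that maximizes modelled profit, answering "what should be done":

```python
# Toy prescriptive step: choose the price that maximizes modelled profit.
def profit(price):
    demand = max(0, 500 - 4 * price)   # assumed linear demand curve
    cost = 20                          # assumed unit cost
    return (price - cost) * demand

candidates = range(20, 126)
best_price = max(candidates, key=profit)
print(best_price, profit(best_price))
```

Real prescriptive systems replace this brute-force search with optimization solvers, simulation, and business-rule constraints.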
ROLE OF A DATA ANALYST
A data analyst's role is to answer specific questions or address particular challenges that have already been identified and are known to the business.
To do this, they examine large datasets with the goal of identifying trends and patterns, and then visualize their findings in the form of charts, graphs, and dashboards.
CAREER OPPORTUNITIES
1. Data Scientist
2. Business Intelligence Analyst
3. Data Engineer
4. Business Analyst
5. Marketing Analytics Manager
6. Financial Analyst
7. Quantitative Analyst
8. Risk Analyst
9. Data Governance Analyst
10. Data Visualization Engineer
STEPS INVOLVED IN DATA ANALYTICS
Gather the required dataset
Understand the dataset
Clean the dataset
Do the necessary statistical analysis
Plot the necessary visualizations to draw out
meaningful, actionable insights from the data.
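The steps above can be sketched with pandas (the dataset here is invented; the visualization step is indicated as a comment):

```python
import pandas as pd

# 1. Gather: an invented dataset standing in for a downloaded CSV
df = pd.DataFrame({
    "Name":   ["Asha", "Ben", "Chen", "Ben"],
    "Age":    [23, None, 31, None],
    "Salary": [50000, 62000, 58000, 62000],
})

# 2. Understand: shape, dtypes, preview
print(df.shape)
print(df.dtypes)

# 3. Clean: drop duplicate people, fill missing ages with the mean
df = df.drop_duplicates(subset=["Name"])
df["Age"] = df["Age"].fillna(df["Age"].mean())

# 4. Analyze: basic statistics
print(df["Salary"].mean())

# 5. Visualize: e.g. df.plot(kind="bar", x="Name", y="Salary") with matplotlib
```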
ABOUT ANACONDA NAVIGATOR
Platforms that we are going to use:
Google Colab
Jupyter Notebook
Visual Studio Code
INTRODUCTION TO PANDAS
Pandas is an open-source, BSD-licensed Python
library providing high-performance, easy-to-use
data structures and data analysis tools for the
Python programming language.
Pandas is built on top of NumPy library.
Pandas is well suited for many different kinds of data.
Image Source: https://realpython.com/pandas-dataframe/
FEATURES OF PANDAS
Image Source: https://data-flair.training/blogs/python-pandas-features/
MOST USED FUNCTIONS IN PANDAS
read_csv(), head()/head(n), describe(), memory_usage(), astype(), loc[:], to_datetime(), value_counts(), drop_duplicates(), groupby(), merge(), sort_values(), fillna()
CORE COMPONENTS OF PANDAS :
SERIES AND DATA FRAME
https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
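In brief (the values below are illustrative):

```python
import pandas as pd

# A Series is a one-dimensional labelled array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])   # 20

# A DataFrame is a two-dimensional table; each column is itself a Series
df = pd.DataFrame({"name": ["Sam", "Ziva"], "age": [25, 29]})
print(df["age"].max())   # 29
```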
FILE HANDLING WITH PANDAS
https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
SAMPLE READING AND WRITING .CSV FILE
import pandas as pd

# Create a dataframe
raw_data = {'first_name': ['Sam', 'Ziva', 'Kia', 'Robin'],
            'degree': ['PhD', 'MBA', '', 'MS'],
            'age': [25, 29, 19, 21]}
df = pd.DataFrame(raw_data)
df

# Save the dataframe
df.to_csv(r'Example1.csv')

# Read csv file
df = pd.read_csv(r'D:\Python\Tutorial\Example1.csv')
df
EXPLORING A DATASET USING PANDAS
Download the dataset from: https://drive.google.com/file/d/1q7qK03njlzZRQ7PyYoprn12uVnE1gjH6/view
import pandas as pd
data_1 = pd.read_csv(r'<datasetpath>')
data_1.head(6)
data_1.describe()
data_1.memory_usage(deep=True)
data_1['Gender'] = data_1.Gender.astype('category')
data_1.loc[0:4, ['Name', 'Age', 'State']]
data_1['DOB'] = pd.to_datetime(data_1['DOB'])
data_1['State'].value_counts()
data_1.drop_duplicates(inplace=True)
data_1.groupby(by='State').Salary.mean()
data_1.sort_values(by='Name', inplace=True)
data_1['City temp'] = data_1['City temp'].fillna(38.5)
Ref: https://www.analyticsvidhya.com/blog/2021/05/pandas-functions-13-most-important/
CONVERT A LIST INTO A SERIES OF ELEMENTS
# convert a list into a Series; the default index runs from 0 to 4
import pandas as pd
my_data = [10, 20, 30, 40, 50]
pd.Series(data=my_data)
CONVERT A DICTIONARY INTO A SERIES OF ELEMENTS
import pandas as pd
d = {'a': 10, 'b': 20, 'c': 30, 'd': 40}
# dictionary keys act as the index; the value of each key becomes a series value
pd.Series(d)
DATA MANIPULATION: DROP MISSING ELEMENTS
import pandas as pd
import numpy as np
d = {'A': [1, 2, np.nan], 'B': [1, np.nan, np.nan], 'C': [1, 2, 3]}
# np.nan marks a missing element in the DataFrame
df = pd.DataFrame(d)   # the dictionary is converted into a DataFrame
df.dropna()            # drops any row with a missing value
df.dropna(axis=1)      # drops any column with a missing value
DATA MANIPULATION: FILLING SUITABLE VALUE
df.fillna(value='FILL VALUE')   # every NaN is replaced by 'FILL VALUE'
df['A'].fillna(value=df['A'].mean())
# Select column "A" and fill its missing values with the mean of column A
df['A'].fillna(value=df['A'].std())
# Select column "A" and fill its missing values with the standard deviation of column A
REPLACING A VALUE
import pandas as pd
df = pd.DataFrame({'one': [10, 20, 30, 40, 50, 2000], 'two': [1000, 0, 30, 40, 50, 60]})
print(df.replace({1000: 10, 2000: 60}))
GROUPBY() FUNCTION
data = {'Company': ['CompA', 'CompA', 'CompB', 'CompB', 'CompC', 'CompC'],
        'Person': ['Rajesh', 'Pradeep', 'Amit', 'Rakesh', 'Suresh', 'Raj'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
df
# select the Sales column so the non-numeric Person column is excluded
comp = df.groupby("Company")["Sales"].mean()
comp
comp1 = df.groupby("Company")   # grouping done using the label "Company"
comp1["Sales"].std()            # apply standard deviation to the grouped data
FINDING MAXIMUM VALUE IN EACH LABEL
data = {'Company': ['CompA', 'CompA', 'CompB', 'CompB', 'CompC', 'CompC'],
        'Person': ['Rajesh', 'Pradeep', 'Amit', 'Rakesh', 'Suresh', 'Raj'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
df
df.groupby("Company").max()   # maximum value in each column per company
FINDING UNIQUE VALUES & NUMBER OF OCCURRENCES IN A DATAFRAME
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [444, 555, 666, 444], 'col3': ['abc', 'def', 'ghi', 'xyz']})
# col1, col2 & col3 are column labels; each column has its own values
df['col2'].unique()        # fetches the unique values available in the column
df['col2'].value_counts()  # counts the number of occurrences of every value
STATISTICAL FUNCTIONS
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 4])
print(s.pct_change())   # percentage change between consecutive elements

df = pd.DataFrame(np.random.randn(5, 2))
print(df.pct_change())

s1 = pd.Series(np.random.randn(10))
s2 = pd.Series(np.random.randn(10))
print(s1.cov(s2))       # covariance between the two series

frame = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
print(frame['a'].corr(frame['b']))   # correlation between two columns
print(frame.corr())                  # full correlation matrix

s = pd.Series(np.random.randn(5), index=list('abcde'))
s['d'] = s['b']   # create a tie
print(s)
print(s.rank())   # tied values share an average rank