Exploratory Data Analysis Guide

The document discusses importing packages and datasets for exploratory data analysis in Python. It imports the sweetviz package for data visualization and analysis, and loads a banking dataset from Kaggle containing over 32,000 rows and 16 columns. Basic statistics and univariate analysis are performed on the dataset, including checking the number of rows and columns, viewing the data types and counts of unique values for each variable.

Uploaded by

Jardeilson Nascimento

28/04/2022 12:37 Modulo 4 - EDA.ipynb - Colaboratory

Data Science with Python

Exploratory Data Analysis
Prof.: Lucas Roberto Correa

REMINDER: Import the datasets used into the Colab environment before running the
commands.

Package imports

!pip install sweetviz

Collecting sweetviz

Downloading [Link] (15.1 MB)

|████████████████████████████████| 15.1 MB 2.9 MB/s

Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.7/dist-packages

Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.7/dist-package
Requirement already satisfied: matplotlib>=3.1.3 in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: pandas!=1.0.0,!=1.0.1,!=1.0.2,>=0.25.3 in /usr/local/l
Requirement already satisfied: tqdm>=4.43.0 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: jinja2>=2.11.1 in /usr/local/lib/python3.7/dist-packag
Requirement already satisfied: importlib-resources>=1.2.0 in /usr/local/lib/python3.7
Requirement already satisfied: zipp>=3.1.0 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-pack
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (fr
Installing collected packages: sweetviz

Successfully installed sweetviz-2.1.3

import sweetviz as sv

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

from IPython import display

pd.set_option('display.max_rows', 500)

pd.set_option('display.max_columns', 500)

pd.set_option('display.max_colwidth', 10000)


Loading the dataset

Data source: [Link]

select=new_train.csv

metadata = pd.read_excel('[Link]')

metadata

    Feature    Feature_Type
0   age        numeric
1   job        Categorical,nominal  type of job ('admin.','blue-collar','entrepreneur','h
                                    employed','services','stude
2   marital    categorical,nominal  marital status ('divorced','married','single','unknown'; note:
3   education  categorical,nominal  ('basic.4y','basic.6y','basic.9y','[Link]','illiterate','professiona
4   default    categorical,nominal  has
5   housing    categorical,nominal  h
6   loan       categorical,nominal  h
7   contact    categorical,nominal  contact co
8   month      categorical,ordinal  last contact month
9   dayofweek  categorical,ordinal  last contact da
10  duration   numeric              last contact duration, in seconds. Important note: this attribute
11  campaign   numeric              number of contacts performed during this campaign a
12  pdays      numeric              number of days that passed by after the client was last co
                                    mea
13  previous   numeric              number of contacts performed

df = pd.read_csv('new_train.csv', sep=',')

df.head()


   age           job   marital education  default housing loan    contact month
0   49   blue-collar   married  basic.9y  unknown      no   no   cellular   nov
1   37  entrepreneur   married    [Link]       no      no   no  telephone   nov
2   78       retired   married  basic.4y       no      no   no   cellular   jul
3   36        admin.   married    [Link]       no     yes   no  telephone   may
4   59       retired  divorced    [Link]       no      no   no   cellular   jun

# Explore the sweetviz library's output in a separate window, with descriptive analysis and ...
report = sv.analyze(df)

report.show_html('[Link]')

Done! Use 'show' commands to display/save.

Report [Link] was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop

Basic statistics

# The 'info' method returns a range of information about the DataFrame, including the number ...

df.info()

RangeIndex: 32950 entries, 0 to 32949

Data columns (total 16 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 age 32950 non-null int64

1 job 32950 non-null object

2 marital 32950 non-null object

3 education 32950 non-null object

4 default 32950 non-null object

5 housing 32950 non-null object

6 loan 32950 non-null object

7 contact 32950 non-null object

8 month 32950 non-null object

9 day_of_week 32950 non-null object

10 duration 32950 non-null int64

11 campaign 32950 non-null int64

12 pdays 32950 non-null int64

13 previous 32950 non-null int64

14 poutcome 32950 non-null object

15 y 32950 non-null object

dtypes: int64(5), object(11)

memory usage: 4.0+ MB

# Number of rows and columns in the DataFrame

df.shape

(32950, 16)

# The len (length) function applied to a DataFrame returns the number of rows

len(df)


32950

# The nunique method returns the number of unique values for each variable (analogous to "remove duplicates" ...

df.nunique()

age 75

job 12

marital 4

education 8

default 3

housing 3

loan 3

contact 2

month 10

day_of_week 5

duration 1467

campaign 40

pdays 27

previous 8

poutcome 3

y 2

dtype: int64

Univariate analysis

# Return the first 5 rows of the DataFrame (5 is the default; this number can be changed ...

df['age'].head()

0 49

1 37

2 78

3 36

4 59

Name: age, dtype: int64

# Return the last 5 rows of the DataFrame (same default as 'head')

df['age'].tail()

32945 28

32946 52

32947 54

32948 29

32949 35

Name: age, dtype: int64

# Sum of all values in a column (here, the "age" column)

df['age'].sum()


1318465

# Minimum value observed in a given column

df['age'].min()

# Mean value

df['age'].mean()

40.01411229135053

# Maximum value

df['age'].max()

# Boxplot of the data in the "age" column. It shows where the ...

sns.boxplot(x=df["age"])

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a334c6050>

# The histogram also makes it easier to visualize the distribution of the data, which is fundamental in ...

plt.hist(df['age'], 50, facecolor='b')

plt.show()
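
The boxplot of `age` shown earlier can be read against Tukey's fences. As a rough sketch, assuming seaborn's default whisker rule of 1.5 times the interquartile range, the quartiles that `describe` reports for `age` in this notebook (25% = 32, 75% = 47) give the limits beyond which points are drawn as individual outliers. The helper name `tukey_fences` is hypothetical and introduced here only for illustration.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of "age" as reported by describe() in this notebook.
lower, upper = tukey_fences(32, 47)
print(lower, upper)
```

Since the minimum age in the data is 17, no point can fall below the lower limit; under this rule only ages above the upper limit would appear as outlier markers.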


Basic descriptive measures

df.describe(include='int64')

                age      duration      campaign         pdays      previous
count  32950.000000  32950.000000  32950.000000  32950.000000  32950.000000
mean      40.014112    258.127466      2.560607    962.052413      0.174719
std       10.403636    258.975917      2.752326    187.951096      0.499025
min       17.000000      0.000000      1.000000      0.000000      0.000000
25%       32.000000    103.000000      1.000000    999.000000      0.000000
50%       38.000000    180.000000      2.000000    999.000000      0.000000
75%       47.000000    319.000000      3.000000    999.000000      0.000000
max       98.000000   4918.000000     56.000000    999.000000      7.000000

df.describe(include='object')

           job  marital education default housing   loan   contact  month  day_
count    32950    32950     32950   32950   32950  32950     32950  32950
unique      12        4         8       3       3      3         2     10
top     admin.  married    [Link]      no     yes     no  cellular    may
freq      8314    19953      9736   26007   17254  27131     20908  11011

Missing-value analysis

df.isnull().sum()

age 0

job 0

marital 0

education 0

default 0

housing 0

loan 0

contact 0

month 0

day_of_week 0

duration 0

campaign 0

pdays 0

previous 0

poutcome 0

y 0

dtype: int64
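
A count of zero nulls does not necessarily mean the data is complete. The metadata lists 'unknown' as a category, and `pdays` is 999 at every quartile in the `describe` output, which suggests that both act as placeholders that `isnull` cannot see. The sketch below shows one way to count them; the small `sample` frame is hypothetical and only mimics that encoding, and treating 999 as a placeholder is an assumption.

```python
import pandas as pd

# Tiny hypothetical sample that mimics the encoding used in the dataset.
sample = pd.DataFrame({
    "job": ["admin.", "unknown", "retired", "student"],
    "default": ["no", "unknown", "unknown", "no"],
    "pdays": [999, 999, 3, 999],
})

# isnull() finds nothing, because the placeholders are ordinary values.
null_counts = sample.isnull().sum()

# Count the 'unknown' placeholder in each text column.
unknown_counts = (sample[["job", "default"]] == "unknown").sum()

# Share of rows where pdays holds the assumed 999 placeholder.
pdays_placeholder_share = (sample["pdays"] == 999).mean()

print(null_counts, unknown_counts, pdays_placeholder_share, sep="\n")
```

On the real data the same two expressions could be applied to `df`, with the list of text columns adjusted accordingly.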

Frequency table

df['poutcome'].value_counts()

nonexistent 28416

failure 3429

success 1105

Name: poutcome, dtype: int64

df['contact'].value_counts()

cellular 20908

telephone 12042

Name: contact, dtype: int64

df['age'].value_counts().hist()

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a31f76490>

prev_y = pd.crosstab(index=df["previous"], columns=df["y"], margins=True)

prev_y

y            no   yes    All
previous
0         25915  2501  28416
1          2889   784   3673
2           324   282    606
3            74   101    175
4            29    31     60
5             4    10     14
6             2     3      5
7             1     0      1
All       29238  3712  32950

job_y = pd.crosstab(index=df["job"], columns=df["y"], margins=True)

job_y

y                 no   yes    All
job
admin.          7244  1070   8314
blue-collar     6926   515   7441
entrepreneur    1060   100   1160
housemaid        769    86    855
management      2076   269   2345
retired         1018   348   1366
self-employed    980   119   1099
services        2942   254   3196
student          494   217    711
technician      4815   585   5400
unemployed       682   116    798
unknown          232    33    265
All            29238  3712  32950
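
The absolute counts in `job_y` are easier to compare as rates. The sketch below recomputes the share of `yes` per job from a few rows of the table above; the dictionary `counts` is a hand-copied, illustrative subset. With the full DataFrame, `pd.crosstab(df["job"], df["y"], normalize="index")` should give the same proportions directly.

```python
# (no, yes) counts copied from a few rows of the job_y table.
counts = {
    "admin.": (7244, 1070),
    "blue-collar": (6926, 515),
    "retired": (1018, 348),
    "student": (494, 217),
}

# Share of "yes" responses within each job category.
yes_rate = {job: yes / (no + yes) for job, (no, yes) in counts.items()}

# Sort from highest to lowest rate for easier reading.
for job, rate in sorted(yes_rate.items(), key=lambda item: item[1], reverse=True):
    print(f"{job:<12} {rate:.1%}")
```

Read this way, students and retirees answer `yes` far more often (roughly 31% and 25%) than the overall rate of about 11% (3712 of 32950), even though both groups are small.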

Histogram

df.dtypes

age int64

job object

marital object

education object

default object

housing object

loan object

contact object

month object

day_of_week object

duration int64

campaign int64

pdays int64

previous int64

poutcome object

y object

dtype: object

sns.histplot(data=df, x="pdays")

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a31ea48d0>

sns.histplot(data=df, x="duration")

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a31e146d0>

df['duration'].describe()

count 32950.000000

mean 258.127466

std 258.975917

min 0.000000

25% 103.000000

50% 180.000000

75% 319.000000

max 4918.000000

Name: duration, dtype: float64

df['duration'].median()

180.0

df['duration'].mode()

0 90

dtype: int64
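
The three location measures above already hint at the shape of `duration`: the mode (90) is below the median (180), which is below the mean (about 258), the usual pattern of a right-skewed distribution. One quick way to put a number on this is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, sketched here with the values printed by `describe`. This is a rule-of-thumb indicator, not the moment-based skewness that `df['duration'].skew()` would return.

```python
# Summary statistics of "duration" copied from the describe() output.
mean = 258.127466
median = 180.0
std = 258.975917

# Pearson's second (median) skewness coefficient; positive means right skew.
pearson_skew = 3 * (mean - median) / std
print(round(pearson_skew, 3))
```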

sns.histplot(data=df, x="campaign")

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a33900790>

Boxplot

sns.boxplot(x=df["campaign"])

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a338915d0>

df['campaign'].value_counts()

1 14121

2 8469

3 4300

4 2116

5 1255

6 773

7 493

8 329

9 220

10 187

11 142

12 92

13 74

14 52

17 51

15 45

16 42

18 27

20 22

21 20

19 16


22 13

24 12

23 12

27 9

25 8

26 7

31 7

29 7

28 6

30 6

35 4

33 3

43 2

32 2

42 2

34 1

37 1

40 1

56 1

Name: campaign, dtype: int64

[Link]("[Link]")

Scatter plot

df.dtypes

age int64

job object

marital object

education object

default object

housing object

loan object

contact object

month object

day_of_week object

duration int64


campaign int64

pdays int64

previous int64

poutcome object

y object

dtype: object

sns.scatterplot(data=df, x="campaign", y="duration")

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a2ddfa950>

sns.scatterplot(data=df, x="pdays", y="duration")

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a2dd7c110>

Correlations

# Select the numeric columns explicitly; pandas 2.0+ raises an error if text columns are passed to corr()
df.select_dtypes(include='number').corr()

               age  duration  campaign     pdays  previous
age       1.000000 -0.001841  0.003302 -0.032011  0.020670
duration -0.001841  1.000000 -0.075663 -0.047127  0.022538
campaign  0.003302 -0.075663  1.000000  0.053795 -0.079051
pdays    -0.032011 -0.047127  0.053795  1.000000 -0.589601
previous  0.020670  0.022538 -0.079051 -0.589601  1.000000

sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, fmt="f")

<matplotlib.axes._subplots.AxesSubplot at 0x7f9a2dd624d0>

Plotting categorical variables

sns.catplot(x="duration", y="y", data=df)

<seaborn.axisgrid.FacetGrid at 0x7f9a2dd82750>

sns.catplot(x="campaign", y="y", data=df)

<seaborn.axisgrid.FacetGrid at 0x7f9a2dc0b650>

sns.catplot(x="age", y="y", data=df)

<seaborn.axisgrid.FacetGrid at 0x7f9a2db7ec50>

Multivariate analysis

[Link](x="age", y="duration", hue="y", data=df);

Principal Component Analysis (PCA) in the context of multivariate analysis

from sklearn.preprocessing import StandardScaler

from sklearn.decomposition import PCA

metadata

    Feature    Feature_Type
0   age        numeric
1   job        Categorical,nominal  type of job ('admin.','blue-collar','entrepreneur','h
                                    employed','services','stude
2   marital    categorical,nominal  marital status ('divorced','married','single','unknown'; note:
3   education  categorical,nominal  ('basic.4y','basic.6y','basic.9y','[Link]','illiterate','professiona
4   default    categorical,nominal  has
5   housing    categorical,nominal  h
6   loan       categorical,nominal  h
7   contact    categorical,nominal  contact co
8   month      categorical,ordinal  last contact month
9   dayofweek  categorical,ordinal  last contact da
10  duration   numeric              last contact duration, in seconds. Important note: this attribute
11  campaign   numeric              number of contacts performed during this campaign a
12  pdays      numeric              number of days that passed by after the client was last co
                                    mea
13  previous   numeric              number of contacts performed
14  poutcome   categorical,nominal  outcome of the previous marketing ca

df_pca = df[['age', 'duration','campaign','pdays','previous']]

df_pca.head()

   age  duration  campaign  pdays  previous
0   49       227         4    999         0
1   37       202         2    999         1
2   78      1148         1    999         0
3   36       120         2    999         0
4   59       368         2    999         0

pca = PCA(n_components=2, random_state=42)

df_expl_pca = StandardScaler().fit_transform(df_pca)

df_expl_pca

array([[ 0.86373877, -0.12019627,  0.52298128,  0.19658384, -0.35012691],
       [-0.28972159, -0.2167318 , -0.20368791,  0.19658384,  1.65381294],
       [ 3.65126795,  3.43617293, -0.56702251,  0.19658384, -0.35012691],
       ...,
       [ 1.34434725, -0.49089273,  0.52298128,  0.19658384, -0.35012691],
       [-1.05869515, -0.3596044 , -0.56702251,  0.19658384, -0.35012691],
       [-0.48196498,  1.10387435,  0.15964669,  0.19658384, -0.35012691]])
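
`StandardScaler` centers each column on its mean and divides by its standard deviation. As a sanity check, the first standardized `age` value can be reproduced by hand from the `describe` output. One point worth flagging: `describe` reports the sample standard deviation (ddof=1), while `StandardScaler` uses the population form (ddof=0), so the sketch converts between the two.

```python
import math

n = 32950                # number of rows in the dataset
mean_age = 40.014112     # mean of "age" from describe()
sample_std = 10.403636   # std of "age" from describe(), computed with ddof=1

# Convert the sample standard deviation to the population one (ddof=0).
population_std = sample_std * math.sqrt((n - 1) / n)

# Standardize the first age value (49).
z = (49 - mean_age) / population_std
print(round(z, 4))
```

The result agrees with the first entry of the standardized array (0.86373877) up to the rounding of the inputs.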

result_pca = pca.fit_transform(df_expl_pca)

result_pca_df = pd.DataFrame(result_pca,
                             columns=['component1', 'component2'])

result_pca_df

       component1  component2
0       -0.425175   -0.509855
1        1.005371   -0.146158
2        0.265589    2.274575
3       -0.421084   -0.115342
4       -0.197363    0.194940
...           ...         ...
32945   -0.379635    0.451884
32946    1.095991   -0.530097
32947   -0.433674   -0.855301
32948   -0.384307    0.361312
32949   -0.324058    0.829408

32950 rows × 2 columns

How much of the variability in the data am I able to explain?

pca.explained_variance_ratio_

array([0.32246681, 0.2116934 ])
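
Adding up the ratios answers the question above. A minimal sketch using the two values just printed:

```python
from itertools import accumulate

# Explained variance ratios printed by pca.explained_variance_ratio_.
ratios = [0.32246681, 0.2116934]

# Running total of the variance retained as components are added.
cumulative = list(accumulate(ratios))
print([round(value, 4) for value in cumulative])
```

The two components together keep roughly 53% of the variance of the five standardized numeric columns, so the two-dimensional projection discards close to half of it.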

df_resp_pca = pd.concat([df['y'], result_pca_df], axis=1)

df_resp_pca

         y  component1  component2
0       no   -0.425175   -0.509855
1       no    1.005371   -0.146158
2      yes    0.265589    2.274575
3       no   -0.421084   -0.115342
4       no   -0.197363    0.194940
...    ...         ...         ...
32945   no   -0.379635    0.451884
32946   no    1.095991   -0.530097
32947   no   -0.433674   -0.855301
32948   no   -0.384307    0.361312
32949   no   -0.324058    0.829408

32950 rows × 3 columns

# Scatter plot of the two principal components, colored by the target "y"
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Component_1', fontsize=15)
ax.set_ylabel('Component_2', fontsize=15)
ax.set_title('PCA with 2 components', fontsize=20)

targets = ['yes', 'no']
colors = ['r', 'b']

for target, color in zip(targets, colors):
    # Boolean mask selecting the rows that belong to the current class
    indicesToKeep = df_resp_pca['y'] == target
    ax.scatter(df_resp_pca.loc[indicesToKeep, 'component1'],
               df_resp_pca.loc[indicesToKeep, 'component2'],
               c=color,
               s=50)

ax.legend(targets)
ax.grid()

