
Data Analytics Tips From Experience

The document provides ten essential tips for data analytics, including fixing column types before merging, using .info() and .describe() for early data checks, and addressing missing values appropriately. It also covers advanced techniques like method chaining, pivot tables, log transformations for skewed data, time-based analysis, and detecting correlation clusters. Each tip is accompanied by examples and explanations to enhance data analysis practices.

10 MUST-KNOW DATA ANALYTICS TIPS

1/ FIX COLUMN TYPES BEFORE MERGING
Type: int
Type: object

Make sure to convert both IDs to string type before merging.
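A minimal sketch of the conversion in pandas. The tables `orders` and `customers` and their columns are hypothetical and only stand in for two tables whose ID columns have different types.

```python
import pandas as pd

# Hypothetical tables: the same ID is stored as int in one and as text in the other.
orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [250, 120, 90]})
customers = pd.DataFrame({"customer_id": ["1", "2", "4"], "name": ["Ana", "Ben", "Cy"]})

# Cast both key columns to string so the keys are comparable.
orders["customer_id"] = orders["customer_id"].astype(str)
customers["customer_id"] = customers["customer_id"].astype(str)

# Left join: every order is kept, unmatched IDs get a missing name.
merged = orders.merge(customers, on="customer_id", how="left")
print(merged)
```

Checking `merged["name"].isna().sum()` after the merge is a quick way to see how many keys failed to match.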

Anas Riad
2/ USE .INFO() AND .DESCRIBE() EARLY
These simple methods help you catch incorrect data types and missing values, and get a feel for distribution and cardinality.

.info()
Get the missing data and data type information quickly.
All the data types and the memory usage are shown.

.describe()
The summary statistics of the data are available for us to use.
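As a rough illustration, both methods can be run on a small frame. The DataFrame below is made up for the example; it contains one missing artist, one missing price, and a `year` column stored as text.

```python
import numpy as np
import pandas as pd

# Hypothetical data with typical problems: missing values and numbers stored as text.
df = pd.DataFrame({
    "artist": ["A", "B", None, "D"],
    "year": ["1990", "2001", "2010", "1985"],
    "price": [10.0, np.nan, 30.0, 20.0],
})

df.info()  # prints dtypes, non-null counts and memory usage

# include="all" adds count/unique/top for text columns next to the numeric statistics.
summary = df.describe(include="all")
print(summary)

# Explicit count of missing values per column.
missing = df.isna().sum()
print(missing)
```

Here `.info()` reveals that `year` is not numeric, and the `unique` row of `summary` gives the cardinality of the text columns.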

Cornellius Yudha Wijaya


3/ DEAL WITH MISSING VALUES

1269 missing values in the ‘Artist’ column

Make sure to fix missing values. For a numerical column, you can fill with the mean or median. For a categorical column, you can fill with the most frequent category or with ‘unknown’.

Filled missing artists with ‘unknown’
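A small sketch of both fills, assuming a frame with an `Artist` text column (named in the slide) and a numeric `Height` column (hypothetical, added only to show the numeric case).

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the 'Artist' column name comes from the slide.
df = pd.DataFrame({
    "Artist": ["Monet", None, "Klee", None],
    "Height": [50.0, np.nan, 70.0, 90.0],
})

print(df.isna().sum())  # missing values per column

# Categorical column: use a placeholder label
# (df["Artist"].mode()[0] would give the most frequent category instead).
df["Artist"] = df["Artist"].fillna("unknown")

# Numerical column: fill with the median, which is less sensitive to outliers than the mean.
df["Height"] = df["Height"].fillna(df["Height"].median())
```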

Anas Riad
4/ USE .ASSIGN() FOR METHOD CHAINING IN EDA
Make data transformation pipelines more readable and compact with method chaining.

Method chaining of 4 different methods

The output after all the method chaining
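One possible chain of four methods is sketched below. The `sales` frame and the choice of `query`, `sort_values` and `reset_index` after `.assign()` are assumptions for illustration, not necessarily the four methods used in the slide.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east"],
    "units": [10, 3, 7, 5],
    "unit_price": [2.5, 4.0, 2.5, 3.0],
})

result = (
    sales
    .assign(revenue=lambda d: d["units"] * d["unit_price"])  # 1. add a derived column
    .query("revenue > 12")                                    # 2. filter rows
    .sort_values("revenue", ascending=False)                  # 3. order by the new column
    .reset_index(drop=True)                                   # 4. tidy the index
)
print(result)
```

Because `.assign()` returns a new DataFrame, the original `sales` frame is left unchanged and each step can be commented out independently while exploring.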

Cornellius Yudha Wijaya


5/ GROUP BY SUMMARY
Choose the column to group by and aggregate

Aggregation: sum, count, mean...

Show top 10
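The three steps (group, aggregate, keep the top 10) might look like this in pandas. The `artworks` frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per artwork.
artworks = pd.DataFrame({
    "Nationality": ["French", "American", "French", "German", "American", "French"],
    "Price": [100, 80, 120, 60, 90, 110],
})

summary = (
    artworks
    .groupby("Nationality")                 # column to group by
    .agg(                                   # named aggregations
        total_price=("Price", "sum"),
        n_items=("Price", "count"),
        mean_price=("Price", "mean"),
    )
    .sort_values("n_items", ascending=False)
    .head(10)                               # show only the top 10 groups
)
print(summary)
```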

Anas Riad
6/ PIVOT TABLES FOR MULTIDIMENSIONAL SUMMARIES
Multi-index pivot tables help explore data from different perspectives quickly.

Column for the overall data values
Columns for the table index
Statistical function to perform
Add “All” rows

The pivot table
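A hedged sketch of how the captions above could map onto `pd.pivot_table` arguments: `values`, `index`, `aggfunc` and `margins`. The mapping and the `sales` data are assumptions, since the slide's code is not available.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "north"],
    "product": ["A", "B", "A", "B", "A"],
    "revenue": [100, 150, 80, 120, 60],
})

pivot = pd.pivot_table(
    sales,
    values="revenue",              # column for the overall data values
    index=["region", "product"],   # two index columns give a multi-index
    aggfunc="sum",                 # statistical function to perform
    margins=True,                  # adds the "All" row with the grand total
    margins_name="All",
)
print(pivot)
```

Adding a `columns=` argument would spread a further dimension across the table's columns.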

Cornellius Yudha Wijaya


7/ SIMPLE BAR CHART WITH PLOTLY

First, define what you are going to visualise. In this case, the top 10 nationalities by artist count.

Second, define a simple bar chart with plotly using the variable created previously (the top 10 nationalities).
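The two steps could be sketched as follows. The `artists` frame and the helper `make_bar_chart` are hypothetical, and `px.bar` from Plotly Express is assumed to be the charting call; the slide's actual code is not available.

```python
import pandas as pd

# Hypothetical data: one row per artist.
artists = pd.DataFrame({
    "Nationality": ["American", "French", "American", "German", "French", "American"],
})

# Step 1: define what to visualise, the top 10 nationalities by artist count.
top_10_nationalities = (
    artists["Nationality"]
    .value_counts()
    .head(10)
    .rename_axis("Nationality")
    .reset_index(name="Count")
)

# Step 2: define a simple bar chart from that variable.
def make_bar_chart(data):
    # Imported here so the data preparation above runs even without Plotly installed.
    import plotly.express as px
    return px.bar(data, x="Nationality", y="Count",
                  title="Top 10 nationalities by artist count")

# Usage: make_bar_chart(top_10_nationalities).show()
```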

Anas Riad
8/ APPLY LOG TRANSFORM FOR SKEWED DATA

Highly skewed numeric data (e.g. income, spending) can distort analysis. Apply a log or Box-Cox transform for normalization.

The transformation pulls the overall data distribution closer to normal.
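A minimal sketch of the log variant, using made-up income values and `np.log1p` (the log of 1 + x, which also handles zeros). Box-Cox is available separately as `scipy.stats.boxcox` and requires strictly positive data.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed incomes: most values are moderate, one is very large.
income = pd.Series(
    [20_000, 25_000, 30_000, 32_000, 40_000, 55_000, 90_000, 400_000],
    name="income",
)

log_income = np.log1p(income)  # log(1 + x)

# Skewness before and after: the closer to 0, the more symmetric the distribution.
print("skew before:", income.skew())
print("skew after: ", log_income.skew())

# np.expm1 reverses the transform when results need to be reported in original units.
restored = np.expm1(log_income)
```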

Cornellius Yudha Wijaya


9/ TIME-BASED ANALYSIS
Make sure date columns are in ’datetime’ format first.

Get the values you want to visualise as a time series.

PS: Limited to top 10 here to not get a super long list.

Use [Link] in plotly to plot a line chart (time series).

What can you observe? Investigate? Down trend?
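The steps above can be sketched as below. The `acquisitions` frame, the `DateAcquired` column and the helper `make_line_chart` are hypothetical; `px.line` is used as one Plotly Express function that draws a line chart, which is an assumption about what the slide showed.

```python
import pandas as pd

# Hypothetical data with dates stored as text.
acquisitions = pd.DataFrame({
    "DateAcquired": ["2020-01-15", "2020-03-02", "2021-07-19",
                     "2021-11-30", "2021-12-01", "2022-05-05"],
})

# Convert to datetime first so the .dt accessor and date arithmetic work.
acquisitions["DateAcquired"] = pd.to_datetime(acquisitions["DateAcquired"])

# Values to visualise: number of rows per year, in chronological order.
per_year = (
    acquisitions["DateAcquired"].dt.year
    .value_counts()
    .sort_index()
    .rename_axis("Year")
    .reset_index(name="Count")
)
print(per_year.head(10))  # limit the printed rows to avoid a very long list

def make_line_chart(data):
    # Imported here so the data preparation above runs even without Plotly installed.
    import plotly.express as px
    return px.line(data, x="Year", y="Count")

# Usage: make_line_chart(per_year).show()
```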

Anas Riad
10/ DETECT AND VISUALIZE CORRELATION CLUSTERS
Use correlation matrices + clustering to group correlated features and detect redundancy or multicollinearity.

Dendrogram (Top and Left)
Clustering of features based on similarity in correlation. Features close together in the dendrogram are more similar to each other than to the others.

Heatmap (Center)
Shows the Pearson correlation coefficient between each pair of features.
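One way to sketch this is to cluster features with SciPy on the distance 1 - |r| and hand the same tree to seaborn's `clustermap`, which draws dendrograms at the top and left of a heatmap. The synthetic data, the distance definition, the 0.5 cut-off and the helper `draw_clustermap` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Synthetic data: a1/a2 are near-duplicates, and so are b1/b2.
rng = np.random.default_rng(0)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
df = pd.DataFrame({
    "a1": base_a,
    "a2": base_a + rng.normal(scale=0.1, size=200),
    "b1": base_b,
    "b2": base_b + rng.normal(scale=0.1, size=200),
})

corr = df.corr()  # Pearson correlation by default

# Turn correlation into a distance: strongly correlated features are close together.
distance = 1 - corr.abs()
condensed = squareform(distance.values, checks=False)
tree = linkage(condensed, method="average")

# Cut the tree: features closer than 0.5 end up in the same cluster.
clusters = pd.Series(fcluster(tree, t=0.5, criterion="distance"), index=corr.columns)
print(clusters)

def draw_clustermap(corr, tree):
    # Imported here so the clustering above runs even without seaborn installed.
    import seaborn as sns
    return sns.clustermap(corr, row_linkage=tree, col_linkage=tree,
                          cmap="vlag", vmin=-1, vmax=1, annot=True)

# Usage: draw_clustermap(corr, tree)
```

Features that share a cluster label are candidates for redundancy, so keeping one representative per cluster is a common way to reduce multicollinearity.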

Cornellius Yudha Wijaya


WHICH TIP DID YOU FIND THE MOST INTERESTING?
COMMENT BELOW
FOLLOW ANAS AND CORNELLIUS FOR MORE CONTENT ON DATA ANALYTICS AND DATA SCIENCE.
