0% ont trouvé ce document utile (0 vote)
31 vues19 pages

Data Frames-2

Transféré par

Israa Al
Copyright
© © All Rights Reserved
Nous prenons très au sérieux les droits relatifs au contenu. Si vous pensez qu’il s’agit de votre contenu, signalez une atteinte au droit d’auteur ici.
Formats disponibles
Téléchargez aux formats PDF, TXT ou lisez en ligne sur Scribd
0% ont trouvé ce document utile (0 vote)
31 vues19 pages

Data Frames-2

Transféré par

Israa Al
Copyright
© © All Rights Reserved
Nous prenons très au sérieux les droits relatifs au contenu. Si vous pensez qu’il s’agit de votre contenu, signalez une atteinte au droit d’auteur ici.
Formats disponibles
Téléchargez aux formats PDF, TXT ou lisez en ligne sur Scribd

data_frames-2

January 20, 2024

1 Introduction
Dans cette partie du cours nous allons découvrir ensemble les fonctionnalités de pandas qui per-
mettent l’analyse des données.
Les données que nous allons utiliser dans notre cours sont les données concernant les bilionnaires
du monde.
Notre base de donnée contient des informations sur ces bilionnaires tels que leurs industries, leurs
informations personnelles et leur fortune.

[1]: # chargement des librairies


import pandas as pd
import numpy as np

[7]: data = pd.read_csv("C:/Users/Haifa.benmessaoud/OneDrive - National Pen Company/


↪Desktop/Billionaires_Statistics_Dataset.csv")

[8]: data.head()

[8]: rank finalWorth category personName age \


0 1 211000 Fashion & Retail Bernard Arnault & family 74.0
1 2 180000 Automotive Elon Musk 51.0
2 3 114000 Technology Jeff Bezos 59.0
3 4 107000 Technology Larry Ellison 78.0
4 5 106000 Finance & Investments Warren Buffett 92.0

country city source industries \


0 France Paris LVMH Fashion & Retail
1 United States Austin Tesla, SpaceX Automotive
2 United States Medina Amazon Technology
3 United States Lanai Oracle Technology
4 United States Omaha Berkshire Hathaway Finance & Investments

countryOfCitizenship … cpi_change_country gdp_country \


0 France … 1.1 $2,715,518,274,227
1 United States … 7.5 $21,427,700,000,000
2 United States … 7.5 $21,427,700,000,000
3 United States … 7.5 $21,427,700,000,000

1
4 United States … 7.5 $21,427,700,000,000

gross_tertiary_education_enrollment \
0 65.6
1 88.2
2 88.2
3 88.2
4 88.2

gross_primary_education_enrollment_country life_expectancy_country \
0 102.5 82.5
1 101.8 78.5
2 101.8 78.5
3 101.8 78.5
4 101.8 78.5

tax_revenue_country_country total_tax_rate_country population_country \


0 24.2 60.7 67059887.0
1 9.6 36.6 328239523.0
2 9.6 36.6 328239523.0
3 9.6 36.6 328239523.0
4 9.6 36.6 328239523.0

latitude_country longitude_country
0 46.227638 2.213749
1 37.090240 -95.712891
2 37.090240 -95.712891
3 37.090240 -95.712891
4 37.090240 -95.712891

[5 rows x 35 columns]

[10]: data.tail()

[10]: rank finalWorth category personName age \


2635 2540 1000 Healthcare Yu Rong 51.0
2636 2540 1000 Food & Beverage Richard Yuengling, Jr. 80.0
2637 2540 1000 Manufacturing Zhang Gongyun 60.0
2638 2540 1000 Real Estate Zhang Guiping & family 71.0
2639 2540 1000 Diversified Inigo Zobel 66.0

country city source \


2635 China Shanghai Health clinics
2636 United States Pottsville Beer
2637 China Gaomi Tyre manufacturing machinery
2638 China Nanjing Real estate
2639 Philippines Makati Diversified

2
industries countryOfCitizenship … cpi_change_country \
2635 Healthcare China … 2.9
2636 Food & Beverage United States … 7.5
2637 Manufacturing China … 2.9
2638 Real Estate China … 2.9
2639 Diversified Philippines … 2.5

gdp_country gross_tertiary_education_enrollment \
2635 $19,910,000,000,000 50.6
2636 $21,427,700,000,000 88.2
2637 $19,910,000,000,000 50.6
2638 $19,910,000,000,000 50.6
2639 $376,795,508,680 35.5

gross_primary_education_enrollment_country life_expectancy_country \
2635 100.2 77.0
2636 101.8 78.5
2637 100.2 77.0
2638 100.2 77.0
2639 107.5 71.1

tax_revenue_country_country total_tax_rate_country population_country \


2635 9.4 59.2 1.397715e+09
2636 9.6 36.6 3.282395e+08
2637 9.4 59.2 1.397715e+09
2638 9.4 59.2 1.397715e+09
2639 14.0 43.1 1.081166e+08

latitude_country longitude_country
2635 35.861660 104.195397
2636 37.090240 -95.712891
2637 35.861660 104.195397
2638 35.861660 104.195397
2639 12.879721 121.774017

[5 rows x 35 columns]

[11]: data.shape

[11]: (2640, 35)

[12]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2640 entries, 0 to 2639
Data columns (total 35 columns):
# Column Non-Null Count Dtype

3
--- ------ -------------- -----
0 rank 2640 non-null int64
1 finalWorth 2640 non-null int64
2 category 2640 non-null object
3 personName 2640 non-null object
4 age 2575 non-null float64
5 country 2602 non-null object
6 city 2568 non-null object
7 source 2640 non-null object
8 industries 2640 non-null object
9 countryOfCitizenship 2640 non-null object
10 organization 325 non-null object
11 selfMade 2640 non-null bool
12 status 2640 non-null object
13 gender 2640 non-null object
14 birthDate 2564 non-null object
15 lastName 2640 non-null object
16 firstName 2637 non-null object
17 title 339 non-null object
18 date 2640 non-null object
19 state 753 non-null object
20 residenceStateRegion 747 non-null object
21 birthYear 2564 non-null float64
22 birthMonth 2564 non-null float64
23 birthDay 2564 non-null float64
24 cpi_country 2456 non-null float64
25 cpi_change_country 2456 non-null float64
26 gdp_country 2476 non-null object
27 gross_tertiary_education_enrollment 2458 non-null float64
28 gross_primary_education_enrollment_country 2459 non-null float64
29 life_expectancy_country 2458 non-null float64
30 tax_revenue_country_country 2457 non-null float64
31 total_tax_rate_country 2458 non-null float64
32 population_country 2476 non-null float64
33 latitude_country 2476 non-null float64
34 longitude_country 2476 non-null float64
dtypes: bool(1), float64(14), int64(2), object(18)
memory usage: 704.0+ KB

1.1 La fonction describe de pandas


Elle permet d’obtenir des statistiques descriptives sur les données du DataFrame. La méthode ne
peut etre appliquée que sur un objet pandas de type dataframe.

[13]: data.describe()

[13]: rank finalWorth age birthYear birthMonth \


count 2640.000000 2640.000000 2575.000000 2564.000000 2564.000000

4
mean 1289.159091 4623.787879 65.140194 1957.183307 5.740250
std 739.693726 9834.240939 13.258098 13.282516 3.710085
min 1.000000 1000.000000 18.000000 1921.000000 1.000000
25% 659.000000 1500.000000 56.000000 1948.000000 2.000000
50% 1312.000000 2300.000000 65.000000 1957.000000 6.000000
75% 1905.000000 4200.000000 75.000000 1966.000000 9.000000
max 2540.000000 211000.000000 101.000000 2004.000000 12.000000

birthDay cpi_country cpi_change_country \


count 2564.000000 2456.000000 2456.000000
mean 12.099844 127.755204 4.364169
std 9.918876 26.452951 3.623763
min 1.000000 99.550000 -1.900000
25% 1.000000 117.240000 1.700000
50% 11.000000 117.240000 2.900000
75% 21.000000 125.080000 7.500000
max 31.000000 288.570000 53.500000

gross_tertiary_education_enrollment \
count 2458.000000
mean 67.225671
std 21.343426
min 4.000000
25% 50.600000
50% 65.600000
75% 88.200000
max 136.600000

gross_primary_education_enrollment_country life_expectancy_country \
count 2459.000000 2458.000000
mean 102.858520 78.122823
std 4.710977 3.730099
min 84.700000 54.300000
25% 100.200000 77.000000
50% 101.800000 78.500000
75% 102.600000 80.900000
max 142.100000 84.200000

tax_revenue_country_country total_tax_rate_country \
count 2457.000000 2458.000000
mean 12.546235 43.963344
std 5.368625 12.145296
min 0.100000 9.900000
25% 9.600000 36.600000
50% 9.600000 41.200000
75% 12.800000 59.100000
max 37.200000 106.300000

5
population_country latitude_country longitude_country
count 2.476000e+03 2476.000000 2476.000000
mean 5.102053e+08 34.903592 12.583156
std 5.542447e+08 17.003497 86.762989
min 3.801900e+04 -40.900557 -106.346771
25% 6.683440e+07 35.861660 -95.712891
50% 3.282395e+08 37.090240 10.451526
75% 1.366418e+09 40.463667 104.195397
max 1.397715e+09 61.924110 174.885971

Dans ce cas la méthode describe n’affiche que des statistiques sur les données de type numérique.
Les données de type object ont été exclus automatiquement. Pour ajouter tous les champs, il suffit
d’ajouter la commande include = “all” dans la commande describe

[14]: data.describe(include ='all')

[14]: rank finalWorth category personName \


count 2640.000000 2640.000000 2640 2640
unique NaN NaN 18 2638
top NaN NaN Finance & Investments Li Li
freq NaN NaN 372 2
mean 1289.159091 4623.787879 NaN NaN
std 739.693726 9834.240939 NaN NaN
min 1.000000 1000.000000 NaN NaN
25% 659.000000 1500.000000 NaN NaN
50% 1312.000000 2300.000000 NaN NaN
75% 1905.000000 4200.000000 NaN NaN
max 2540.000000 211000.000000 NaN NaN

age country city source \


count 2575.000000 2602 2568 2640
unique NaN 78 741 906
top NaN United States New York Real estate
freq NaN 754 99 151
mean 65.140194 NaN NaN NaN
std 13.258098 NaN NaN NaN
min 18.000000 NaN NaN NaN
25% 56.000000 NaN NaN NaN
50% 65.000000 NaN NaN NaN
75% 75.000000 NaN NaN NaN
max 101.000000 NaN NaN NaN

industries countryOfCitizenship … cpi_change_country \


count 2640 2640 … 2456.000000
unique 18 77 … NaN
top Finance & Investments United States … NaN
freq 372 735 … NaN

6
mean NaN NaN … 4.364169
std NaN NaN … 3.623763
min NaN NaN … -1.900000
25% NaN NaN … 1.700000
50% NaN NaN … 2.900000
75% NaN NaN … 7.500000
max NaN NaN … 53.500000

gdp_country gross_tertiary_education_enrollment \
count 2476 2458.000000
unique 68 NaN
top $21,427,700,000,000 NaN
freq 754 NaN
mean NaN 67.225671
std NaN 21.343426
min NaN 4.000000
25% NaN 50.600000
50% NaN 65.600000
75% NaN 88.200000
max NaN 136.600000

gross_primary_education_enrollment_country life_expectancy_country \
count 2459.000000 2458.000000
unique NaN NaN
top NaN NaN
freq NaN NaN
mean 102.858520 78.122823
std 4.710977 3.730099
min 84.700000 54.300000
25% 100.200000 77.000000
50% 101.800000 78.500000
75% 102.600000 80.900000
max 142.100000 84.200000

tax_revenue_country_country total_tax_rate_country population_country \


count 2457.000000 2458.000000 2.476000e+03
unique NaN NaN NaN
top NaN NaN NaN
freq NaN NaN NaN
mean 12.546235 43.963344 5.102053e+08
std 5.368625 12.145296 5.542447e+08
min 0.100000 9.900000 3.801900e+04
25% 9.600000 36.600000 6.683440e+07
50% 9.600000 41.200000 3.282395e+08
75% 12.800000 59.100000 1.366418e+09
max 37.200000 106.300000 1.397715e+09

7
latitude_country longitude_country
count 2476.000000 2476.000000
unique NaN NaN
top NaN NaN
freq NaN NaN
mean 34.903592 12.583156
std 17.003497 86.762989
min -40.900557 -106.346771
25% 35.861660 -95.712891
50% 37.090240 10.451526
75% 40.463667 104.195397
max 61.924110 174.885971

[11 rows x 35 columns]

Nous pouvons aussi exclure certain type de describe en ajoutant la commande exclude

[15]: data.describe(exclude ='bool')

[15]: rank finalWorth category personName \


count 2640.000000 2640.000000 2640 2640
unique NaN NaN 18 2638
top NaN NaN Finance & Investments Li Li
freq NaN NaN 372 2
mean 1289.159091 4623.787879 NaN NaN
std 739.693726 9834.240939 NaN NaN
min 1.000000 1000.000000 NaN NaN
25% 659.000000 1500.000000 NaN NaN
50% 1312.000000 2300.000000 NaN NaN
75% 1905.000000 4200.000000 NaN NaN
max 2540.000000 211000.000000 NaN NaN

age country city source \


count 2575.000000 2602 2568 2640
unique NaN 78 741 906
top NaN United States New York Real estate
freq NaN 754 99 151
mean 65.140194 NaN NaN NaN
std 13.258098 NaN NaN NaN
min 18.000000 NaN NaN NaN
25% 56.000000 NaN NaN NaN
50% 65.000000 NaN NaN NaN
75% 75.000000 NaN NaN NaN
max 101.000000 NaN NaN NaN

industries countryOfCitizenship … cpi_change_country \


count 2640 2640 … 2456.000000
unique 18 77 … NaN

8
top Finance & Investments United States … NaN
freq 372 735 … NaN
mean NaN NaN … 4.364169
std NaN NaN … 3.623763
min NaN NaN … -1.900000
25% NaN NaN … 1.700000
50% NaN NaN … 2.900000
75% NaN NaN … 7.500000
max NaN NaN … 53.500000

gdp_country gross_tertiary_education_enrollment \
count 2476 2458.000000
unique 68 NaN
top $21,427,700,000,000 NaN
freq 754 NaN
mean NaN 67.225671
std NaN 21.343426
min NaN 4.000000
25% NaN 50.600000
50% NaN 65.600000
75% NaN 88.200000
max NaN 136.600000

gross_primary_education_enrollment_country life_expectancy_country \
count 2459.000000 2458.000000
unique NaN NaN
top NaN NaN
freq NaN NaN
mean 102.858520 78.122823
std 4.710977 3.730099
min 84.700000 54.300000
25% 100.200000 77.000000
50% 101.800000 78.500000
75% 102.600000 80.900000
max 142.100000 84.200000

tax_revenue_country_country total_tax_rate_country population_country \


count 2457.000000 2458.000000 2.476000e+03
unique NaN NaN NaN
top NaN NaN NaN
freq NaN NaN NaN
mean 12.546235 43.963344 5.102053e+08
std 5.368625 12.145296 5.542447e+08
min 0.100000 9.900000 3.801900e+04
25% 9.600000 36.600000 6.683440e+07
50% 9.600000 41.200000 3.282395e+08
75% 12.800000 59.100000 1.366418e+09

9
max 37.200000 106.300000 1.397715e+09

latitude_country longitude_country
count 2476.000000 2476.000000
unique NaN NaN
top NaN NaN
freq NaN NaN
mean 34.903592 12.583156
std 17.003497 86.762989
min -40.900557 -106.346771
25% 35.861660 -95.712891
50% 37.090240 10.451526
75% 40.463667 104.195397
max 61.924110 174.885971

[11 rows x 34 columns]

Nous pouvons aussi personnaliser les percentiles que nous avons déjà dans la commande describe

[16]: data.describe(percentiles=[.01,.10,.90,.99])

[16]: rank finalWorth age birthYear birthMonth \


count 2640.000000 2640.000000 2575.000000 2564.000000 2564.000000
mean 1289.159091 4623.787879 65.140194 1957.183307 5.740250
std 739.693726 9834.240939 13.258098 13.282516 3.710085
min 1.000000 1000.000000 18.000000 1921.000000 1.000000
1% 27.390000 1000.000000 35.000000 1928.000000 1.000000
10% 261.000000 1200.000000 49.000000 1940.000000 1.000000
50% 1312.000000 2300.000000 65.000000 1957.000000 6.000000
90% 2259.000000 8000.000000 82.000000 1974.000000 11.000000
99% 2540.000000 41808.000000 94.000000 1988.000000 12.000000
max 2540.000000 211000.000000 101.000000 2004.000000 12.000000

birthDay cpi_country cpi_change_country \


count 2564.000000 2456.000000 2456.000000
mean 12.099844 127.755204 4.364169
std 9.918876 26.452951 3.623763
min 1.000000 99.550000 -1.900000
1% 1.000000 99.550000 0.255000
10% 1.000000 110.620000 0.600000
50% 11.000000 117.240000 2.900000
90% 27.000000 180.440000 7.500000
99% 30.000000 234.440000 15.200000
max 31.000000 288.570000 53.500000

gross_tertiary_education_enrollment \
count 2458.000000
mean 67.225671

10
std 21.343426
min 4.000000
1% 23.900000
10% 36.300000
50% 65.600000
90% 88.200000
99% 113.100000
max 136.600000

gross_primary_education_enrollment_country life_expectancy_country \
count 2459.000000 2458.000000
mean 102.858520 78.122823
std 4.710977 3.730099
min 84.700000 54.300000
1% 93.200000 69.400000
10% 100.200000 72.700000
50% 101.800000 78.500000
90% 108.400000 82.800000
99% 126.600000 84.200000
max 142.100000 84.200000

tax_revenue_country_country total_tax_rate_country \
count 2457.000000 2458.000000
mean 12.546235 43.963344
std 5.368625 12.145296
min 0.100000 9.900000
1% 9.400000 20.742000
10% 9.400000 29.500000
50% 9.600000 41.200000
90% 23.100000 59.200000
99% 27.900000 65.100000
max 37.200000 106.300000

population_country latitude_country longitude_country


count 2.476000e+03 2476.000000 2476.000000
mean 5.102053e+08 34.903592 12.583156
std 5.542447e+08 17.003497 86.762989
min 3.801900e+04 -40.900557 -106.346771
1% 1.425870e+06 -25.274398 -106.346771
10% 1.028545e+07 20.593684 -95.712891
50% 3.282395e+08 37.090240 10.451526
90% 1.397715e+09 55.378051 104.195397
99% 1.397715e+09 61.524010 138.252924
max 1.397715e+09 61.924110 174.885971

11
1.2 Gérer les valeurs manquantes
Pandas utilise la valeur NaN (Not a Number) pour réprésenter les valeurs manquantes dans un jeu
de données.
Une fonction Pandas que l’on peut utiliser pour savoir si certaines données sont manquantes dans
le Dataframe est la fonction isnull() : cette fonction retourne True lorsque la valeur testée est
manquante (NaN) et False sinon.

[17]: data.isnull()

[17]: rank finalWorth category personName age country city source \


0 False False False False False False False False
1 False False False False False False False False
2 False False False False False False False False
3 False False False False False False False False
4 False False False False False False False False
… … … … … … … … …
2635 False False False False False False False False
2636 False False False False False False False False
2637 False False False False False False False False
2638 False False False False False False False False
2639 False False False False False False False False

industries countryOfCitizenship … cpi_change_country gdp_country \


0 False False … False False
1 False False … False False
2 False False … False False
3 False False … False False
4 False False … False False
… … … … … …
2635 False False … False False
2636 False False … False False
2637 False False … False False
2638 False False … False False
2639 False False … False False

gross_tertiary_education_enrollment \
0 False
1 False
2 False
3 False
4 False
… …
2635 False
2636 False
2637 False
2638 False

12
2639 False

gross_primary_education_enrollment_country life_expectancy_country \
0 False False
1 False False
2 False False
3 False False
4 False False
… … …
2635 False False
2636 False False
2637 False False
2638 False False
2639 False False

tax_revenue_country_country total_tax_rate_country population_country \


0 False False False
1 False False False
2 False False False
3 False False False
4 False False False
… … … …
2635 False False False
2636 False False False
2637 False False False
2638 False False False
2639 False False False

latitude_country longitude_country
0 False False
1 False False
2 False False
3 False False
4 False False
… … …
2635 False False
2636 False False
2637 False False
2638 False False
2639 False False

[2640 rows x 35 columns]

[18]: # comptage des valeurs manquantes en ajoutant la commande sum()


print(data.isnull().sum())

rank 0
finalWorth 0

13
category 0
personName 0
age 65
country 38
city 72
source 0
industries 0
countryOfCitizenship 0
organization 2315
selfMade 0
status 0
gender 0
birthDate 76
lastName 0
firstName 3
title 2301
date 0
state 1887
residenceStateRegion 1893
birthYear 76
birthMonth 76
birthDay 76
cpi_country 184
cpi_change_country 184
gdp_country 164
gross_tertiary_education_enrollment 182
gross_primary_education_enrollment_country 181
life_expectancy_country 182
tax_revenue_country_country 183
total_tax_rate_country 182
population_country 164
latitude_country 164
longitude_country 164
dtype: int64
Dans un premier temps, il peut être souhaitable de filtrer/retirer les valeurs manquantes dans le
dataframe.
Pour cela on utilise la fonction dropna(). Cette fonction possède plusieurs arguments, permettant
de moduler le niveau de filtrage des valeurs manquantes :
axis= (0 ou 1) : filtrage des valeurs manquantes selon les lignes (axis = 0) ou les colonnes (axis=1).
Valeur par défaut : axis=0. how=all : argument permettant de ne retirer que les lignes (pour
axis=0) ou toutes les colonnes (pour axis=1) du dataframe pour lequelles toutes les valeurs sont
manquantes. → Attention : si cet argument n’est pas spécifié, chaque ligne (resp. chaque colonne)
contenant au moins une donnée manquante sera supprimée !
inplace=True : applique directement la modification dans le dataframe (sans créer un autre
dataframe). thresh= (nombre entier n) : permet de ne conserver que les lignes (resp. les colonnes)
contenant au moins n valeurs non manquantes.

14
[22]: data.dropna(inplace=True)
data

[22]: rank finalWorth category personName age \


1 2 180000 Automotive Elon Musk 51.0
2 3 114000 Technology Jeff Bezos 59.0
3 4 107000 Technology Larry Ellison 78.0
4 5 106000 Finance & Investments Warren Buffett 92.0
5 6 104000 Technology Bill Gates 67.0
… … … … … …
2519 2405 1100 Sports Tiger Woods 47.0
2566 2540 1000 Fashion & Retail William Fisher 66.0
2581 2540 1000 Sports LeBron James 38.0
2602 2540 1000 Technology Apoorva Mehta 36.0
2608 2540 1000 Media & Entertainment Tyler Perry 53.0

country city source \


1 United States Austin Tesla, SpaceX
2 United States Medina Amazon
3 United States Lanai Oracle
4 United States Omaha Berkshire Hathaway
5 United States Medina Microsoft
… … … …
2519 United States Jupiter Island Golf
2566 United States San Francisco Gap
2581 United States Los Angeles Basketball
2602 United States San Francisco Grocery delivery service
2608 United States Atlanta Movies, television

industries countryOfCitizenship … cpi_change_country \


1 Automotive United States … 7.5
2 Technology United States … 7.5
3 Technology United States … 7.5
4 Finance & Investments United States … 7.5
5 Technology United States … 7.5
… … … … …
2519 Sports United States … 7.5
2566 Fashion & Retail United States … 7.5
2581 Sports United States … 7.5
2602 Technology Canada … 7.5
2608 Media & Entertainment United States … 7.5

gdp_country gross_tertiary_education_enrollment \
1 $21,427,700,000,000 88.2
2 $21,427,700,000,000 88.2
3 $21,427,700,000,000 88.2
4 $21,427,700,000,000 88.2

15
5 $21,427,700,000,000 88.2
… … …
2519 $21,427,700,000,000 88.2
2566 $21,427,700,000,000 88.2
2581 $21,427,700,000,000 88.2
2602 $21,427,700,000,000 88.2
2608 $21,427,700,000,000 88.2

gross_primary_education_enrollment_country life_expectancy_country \
1 101.8 78.5
2 101.8 78.5
3 101.8 78.5
4 101.8 78.5
5 101.8 78.5
… … …
2519 101.8 78.5
2566 101.8 78.5
2581 101.8 78.5
2602 101.8 78.5
2608 101.8 78.5

tax_revenue_country_country total_tax_rate_country population_country \


1 9.6 36.6 328239523.0
2 9.6 36.6 328239523.0
3 9.6 36.6 328239523.0
4 9.6 36.6 328239523.0
5 9.6 36.6 328239523.0
… … … …
2519 9.6 36.6 328239523.0
2566 9.6 36.6 328239523.0
2581 9.6 36.6 328239523.0
2602 9.6 36.6 328239523.0
2608 9.6 36.6 328239523.0

latitude_country longitude_country
1 37.09024 -95.712891
2 37.09024 -95.712891
3 37.09024 -95.712891
4 37.09024 -95.712891
5 37.09024 -95.712891
… … …
2519 37.09024 -95.712891
2566 37.09024 -95.712891
2581 37.09024 -95.712891
2602 37.09024 -95.712891
2608 37.09024 -95.712891

16
[238 rows x 35 columns]

Pour remplacer les valeurs manquantes au lieu de les supprimer nous allons utiliser la commande
fillna

[23]: data.fillna(0, inplace=True)


data

[23]: rank finalWorth category personName age \


1 2 180000 Automotive Elon Musk 51.0
2 3 114000 Technology Jeff Bezos 59.0
3 4 107000 Technology Larry Ellison 78.0
4 5 106000 Finance & Investments Warren Buffett 92.0
5 6 104000 Technology Bill Gates 67.0
… … … … … …
2519 2405 1100 Sports Tiger Woods 47.0
2566 2540 1000 Fashion & Retail William Fisher 66.0
2581 2540 1000 Sports LeBron James 38.0
2602 2540 1000 Technology Apoorva Mehta 36.0
2608 2540 1000 Media & Entertainment Tyler Perry 53.0

country city source \


1 United States Austin Tesla, SpaceX
2 United States Medina Amazon
3 United States Lanai Oracle
4 United States Omaha Berkshire Hathaway
5 United States Medina Microsoft
… … … …
2519 United States Jupiter Island Golf
2566 United States San Francisco Gap
2581 United States Los Angeles Basketball
2602 United States San Francisco Grocery delivery service
2608 United States Atlanta Movies, television

industries countryOfCitizenship … cpi_change_country \


1 Automotive United States … 7.5
2 Technology United States … 7.5
3 Technology United States … 7.5
4 Finance & Investments United States … 7.5
5 Technology United States … 7.5
… … … … …
2519 Sports United States … 7.5
2566 Fashion & Retail United States … 7.5
2581 Sports United States … 7.5
2602 Technology Canada … 7.5
2608 Media & Entertainment United States … 7.5

gdp_country gross_tertiary_education_enrollment \

17
1 $21,427,700,000,000 88.2
2 $21,427,700,000,000 88.2
3 $21,427,700,000,000 88.2
4 $21,427,700,000,000 88.2
5 $21,427,700,000,000 88.2
… … …
2519 $21,427,700,000,000 88.2
2566 $21,427,700,000,000 88.2
2581 $21,427,700,000,000 88.2
2602 $21,427,700,000,000 88.2
2608 $21,427,700,000,000 88.2

gross_primary_education_enrollment_country life_expectancy_country \
1 101.8 78.5
2 101.8 78.5
3 101.8 78.5
4 101.8 78.5
5 101.8 78.5
… … …
2519 101.8 78.5
2566 101.8 78.5
2581 101.8 78.5
2602 101.8 78.5
2608 101.8 78.5

tax_revenue_country_country total_tax_rate_country population_country \


1 9.6 36.6 328239523.0
2 9.6 36.6 328239523.0
3 9.6 36.6 328239523.0
4 9.6 36.6 328239523.0
5 9.6 36.6 328239523.0
… … … …
2519 9.6 36.6 328239523.0
2566 9.6 36.6 328239523.0
2581 9.6 36.6 328239523.0
2602 9.6 36.6 328239523.0
2608 9.6 36.6 328239523.0

latitude_country longitude_country
1 37.09024 -95.712891
2 37.09024 -95.712891
3 37.09024 -95.712891
4 37.09024 -95.712891
5 37.09024 -95.712891
… … …
2519 37.09024 -95.712891
2566 37.09024 -95.712891

18
2581 37.09024 -95.712891
2602 37.09024 -95.712891
2608 37.09024 -95.712891

[238 rows x 35 columns]

[ ]:

19

Vous aimerez peut-être aussi