Working-With-Csv Cheatsheet

This cheatsheet covers loading CSV files with pandas read_csv. It shows how to import pandas, open local and remote CSV files, specify the separator, and set the index column, with an example of each: a local path, a URL, a tab-separated file, and an index column.

Uploaded by

Yajur Agarwal
Copyright
© All Rights Reserved

6/15/23, 2:09 AM working-with-csv - Jupyter Notebook

1. Importing pandas
In [1]: import pandas as pd

2. Opening a local csv file


In [34]: df = pd.read_csv('aug_train.csv')
df

Out[34]:
 | enrollee_id | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours | target
0 | 8949 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | NaN | NaN | 1 | 36 | 1.0
1 | 29725 | city_40 | 0.776 | Male | No relevent experience | no_enrollment | Graduate | STEM | 15 | 50-99 | Pvt Ltd | >4 | 47 | 0.0
2 | 11561 | city_21 | 0.624 | NaN | No relevent experience | Full time course | Graduate | STEM | 5 | NaN | NaN | never | 83 | 0.0
3 | 33241 | city_115 | 0.789 | NaN | No relevent experience | NaN | Graduate | Business Degree | <1 | NaN | Pvt Ltd | never | 52 | 1.0
4 | 666 | city_162 | 0.767 | Male | Has relevent experience | no_enrollment | Masters | STEM | >20 | 50-99 | Funded Startup | 4 | 8 | 0.0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
19153 | 7386 | city_173 | 0.878 | Male | No relevent experience | no_enrollment | Graduate | Humanities | 14 | NaN | NaN | 1 | 42 | 1.0
19154 | 31398 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | 14 | NaN | NaN | 4 | 52 | 1.0
19155 | 24576 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | 50-99 | Pvt Ltd | 4 | 44 | 0.0
19156 | 5756 | city_65 | 0.802 | Male | Has relevent experience | no_enrollment | High School | NaN | <1 | 500-999 | Pvt Ltd | 2 | 97 | 0.0
19157 | 23834 | city_67 | 0.855 | NaN | No relevent experience | no_enrollment | Primary School | NaN | 2 | NaN | NaN | 1 | 127 | 0.0

19158 rows × 14 columns

3. Opening a csv file from a URL

localhost:8888/notebooks/Downloads/100-days-of-machine-learning-main/100-days-of-machine-learning-main/day15 - working with csv files/working-with-csv.ipynb 1/10


In [35]: import requests
from io import StringIO

# Fetch the file with a browser-like User-Agent header
url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"}
req = requests.get(url, headers=headers)

# Wrap the response text in a file-like buffer so read_csv can parse it
data = StringIO(req.text)

pd.read_csv(data)

Out[35]:
 | Country | Region
0 | Algeria | AFRICA
1 | Angola | AFRICA
2 | Benin | AFRICA
3 | Botswana | AFRICA
4 | Burkina | AFRICA
... | ... | ...
189 | Paraguay | SOUTH AMERICA
190 | Peru | SOUTH AMERICA
191 | Suriname | SOUTH AMERICA
192 | Uruguay | SOUTH AMERICA
193 | Venezuela | SOUTH AMERICA

194 rows × 2 columns
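
The pattern above (download the text, wrap it in StringIO, pass it to read_csv) can be tried without network access. A minimal sketch, assuming a made-up two-row sample that stands in for req.text; it is not the real countries.csv:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for req.text (assumption, not the real file contents)
text = "Country,Region\nAlgeria,AFRICA\nPeru,SOUTH AMERICA\n"

# StringIO turns the string into a file-like object that read_csv can consume
countries = pd.read_csv(StringIO(text))
print(countries.shape)
```

read_csv also accepts a URL string directly; going through requests is mainly useful when the server expects a custom header such as User-Agent.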

4. Sep Parameter
In [41]: pd.read_csv('movie_titles_metadata.tsv',sep='\t',names=['sno','name','release_year','rating','votes','genres'])

Out[41]:
 | sno | name | release_year | rating | votes | genres
0 | m0 | 10 things i hate about you | 1999 | 6.9 | 62847.0 | ['comedy' 'romance']
1 | m1 | 1492: conquest of paradise | 1992 | 6.2 | 10421.0 | ['adventure' 'biography' 'drama' 'history']
2 | m2 | 15 minutes | 2001 | 6.1 | 25854.0 | ['action' 'crime' 'drama' 'thriller']
3 | m3 | 2001: a space odyssey | 1968 | 8.4 | 163227.0 | ['adventure' 'mystery' 'sci-fi']
4 | m4 | 48 hrs. | 1982 | 6.9 | 22289.0 | ['action' 'comedy' 'crime' 'drama' 'thriller']
... | ... | ... | ... | ... | ... | ...
612 | m612 | watchmen | 2009 | 7.8 | 135229.0 | ['action' 'crime' 'fantasy' 'mystery' 'sci-fi'...
613 | m613 | xxx | 2002 | 5.6 | 53505.0 | ['action' 'adventure' 'crime']
614 | m614 | x-men | 2000 | 7.4 | 122149.0 | ['action' 'sci-fi']
615 | m615 | young frankenstein | 1974 | 8.0 | 57618.0 | ['comedy' 'sci-fi']
616 | m616 | zulu dawn | 1979 | 6.4 | 1911.0 | ['action' 'adventure' 'drama' 'history' 'war']

617 rows × 6 columns
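
The same two options can be checked on a tiny in-memory file. This is a sketch under the assumption of a two-line, tab-separated sample with no header row; the values are borrowed from the output above and the column list is shortened for brevity:

```python
from io import StringIO

import pandas as pd

# Hypothetical headerless TSV sample (assumption: only four of the six columns)
tsv = "m0\t10 things i hate about you\t1999\t6.9\nm3\t2001: a space odyssey\t1968\t8.4\n"

# sep="\t" splits on tabs; names= supplies the labels, so the first line stays data
movies = pd.read_csv(StringIO(tsv), sep="\t", names=["sno", "name", "release_year", "rating"])
print(movies)
```

Because names is given and no header is requested, pandas treats every line as a data row instead of consuming the first one as column labels.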

5. Index_col parameter
In [43]: pd.read_csv('aug_train.csv',index_col='enrollee_id')

Out[43]:
enrollee_id | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours | target
8949 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | NaN | NaN | 1 | 36 | 1.0
29725 | city_40 | 0.776 | Male | No relevent experience | no_enrollment | Graduate | STEM | 15 | 50-99 | Pvt Ltd | >4 | 47 | 0.0
11561 | city_21 | 0.624 | NaN | No relevent experience | Full time course | Graduate | STEM | 5 | NaN | NaN | never | 83 | 0.0
33241 | city_115 | 0.789 | NaN | No relevent experience | NaN | Graduate | Business Degree | <1 | NaN | Pvt Ltd | never | 52 | 1.0
666 | city_162 | 0.767 | Male | Has relevent experience | no_enrollment | Masters | STEM | >20 | 50-99 | Funded Startup | 4 | 8 | 0.0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
7386 | city_173 | 0.878 | Male | No relevent experience | no_enrollment | Graduate | Humanities | 14 | NaN | NaN | 1 | 42 | 1.0
31398 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | 14 | NaN | NaN | 4 | 52 | 1.0
24576 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | 50-99 | Pvt Ltd | 4 | 44 | 0.0
5756 | city_65 | 0.802 | Male | Has relevent experience | no_enrollment | High School | NaN | <1 | 500-999 | Pvt Ltd | 2 | 97 | 0.0
23834 | city_67 | 0.855 | NaN | No relevent experience | no_enrollment | Primary School | NaN | 2 | NaN | NaN | 1 | 127 | 0.0

19158 rows × 13 columns
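
With index_col the chosen column becomes the row labels instead of a regular column, which is why the frame above has 13 columns rather than 14. A minimal sketch, assuming a made-up sample that reuses three column names from aug_train.csv:

```python
from io import StringIO

import pandas as pd

# Hypothetical three-column sample (assumption, not the real file)
csv_text = "enrollee_id,city,training_hours\n8949,city_103,36\n29725,city_40,47\n"

by_id = pd.read_csv(StringIO(csv_text), index_col="enrollee_id")

# Rows can now be looked up by enrollee_id through .loc
print(by_id.loc[8949, "city"])
```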

6. Header parameter
In [46]: pd.read_csv('test.csv',header=1)

Out[46]:
 | 0 | enrollee_id | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours | target
0 | 1 | 29725 | city_40 | 0.776 | Male | No relevent experience | no_enrollment | Graduate | STEM | 15 | 50-99 | Pvt Ltd | >4 | 47 | 0
1 | 2 | 11561 | city_21 | 0.624 | NaN | No relevent experience | Full time course | Graduate | STEM | 5 | NaN | NaN | never | 83 | 0
2 | 3 | 33241 | city_115 | 0.789 | NaN | No relevent experience | NaN | Graduate | Business Degree | <1 | NaN | Pvt Ltd | never | 52 | 1
3 | 4 | 666 | city_162 | 0.767 | Male | Has relevent experience | no_enrollment | Masters | STEM | >20 | 50-99 | Funded Startup | 4 | 8 | 0
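
The header argument is a zero-based line index: header=1 takes the second line of the file as the column names and discards what comes before it. A small sketch, assuming a made-up file whose first line is not a real header:

```python
from io import StringIO

import pandas as pd

# Hypothetical file: line 0 is junk, line 1 holds the real column names
raw = "0,1,2\nenrollee_id,city,target\n29725,city_40,0\n11561,city_21,0\n"

# header=1 uses line index 1 as the header and drops the line before it
fixed = pd.read_csv(StringIO(raw), header=1)
print(fixed.columns.tolist())
```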

7. usecols parameter

In [48]: pd.read_csv('aug_train.csv',usecols=['enrollee_id','gender','education_level'])

Out[48]:
 | enrollee_id | gender | education_level
0 | 8949 | Male | Graduate
1 | 29725 | Male | Graduate
2 | 11561 | NaN | Graduate
3 | 33241 | NaN | Graduate
4 | 666 | Male | Masters
... | ... | ... | ...
19153 | 7386 | Male | Graduate
19154 | 31398 | Male | Graduate
19155 | 24576 | Male | Graduate
19156 | 5756 | Male | High School
19157 | 23834 | NaN | Primary School

19158 rows × 3 columns
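
usecols accepts column names, as above, or zero-based column positions. A minimal sketch, assuming a made-up four-column sample, that shows both forms selecting the same data:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample (assumption); the empty gender field is read as NaN
csv_text = (
    "enrollee_id,city,gender,education_level\n"
    "8949,city_103,Male,Graduate\n"
    "11561,city_21,,Graduate\n"
)

# Select by name ...
subset = pd.read_csv(StringIO(csv_text), usecols=["enrollee_id", "gender", "education_level"])

# ... or by position; both calls skip the city column while parsing
by_position = pd.read_csv(StringIO(csv_text), usecols=[0, 2, 3])
print(subset)
```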

8. Squeeze parameter
In [50]: pd.read_csv('aug_train.csv',usecols=['gender'],squeeze=True)

Out[50]: 0 Male
1 Male
2 NaN
3 NaN
4 Male
...
19153 Male
19154 Male
19155 Male
19156 Male
19157 NaN
Name: gender, Length: 19158, dtype: object
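
The squeeze keyword was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above only runs on older versions. A sketch of the equivalent on current pandas, assuming a made-up sample: read a one-column frame, then call squeeze on the result.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample (assumption, not the real aug_train.csv)
csv_text = "enrollee_id,gender\n8949,Male\n29725,Male\n11561,\n"

# A one-column DataFrame squeezed along the column axis becomes a Series
gender = pd.read_csv(StringIO(csv_text), usecols=["gender"]).squeeze("columns")
print(type(gender).__name__)
```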

9. Skiprows/nrows Parameter

In [103]: pd.read_csv('aug_train.csv',nrows=100)

Out[103]:
 | enrollee_id | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours | target
0 | 8949 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | NaN | NaN | 1 | 36 | 1.0
1 | 29725 | city_40 | 0.776 | Male | No relevent experience | no_enrollment | Graduate | STEM | 15 | 50-99 | Pvt Ltd | >4 | 47 | 0.0
2 | 11561 | city_21 | 0.624 | NaN | No relevent experience | Full time course | Graduate | STEM | 5 | NaN | NaN | never | 83 | 0.0
3 | 33241 | city_115 | 0.789 | NaN | No relevent experience | NaN | Graduate | Business Degree | <1 | NaN | Pvt Ltd | never | 52 | 1.0
4 | 666 | city_162 | 0.767 | Male | Has relevent experience | no_enrollment | Masters | STEM | >20 | 50-99 | Funded Startup | 4 | 8 | 0.0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
95 | 12081 | city_65 | 0.802 | Male | Has relevent experience | Full time course | Graduate | STEM | 9 | 50-99 | Pvt Ltd | 1 | 33 | 0.0
96 | 7364 | city_160 | 0.920 | NaN | No relevent experience | Full time course | High School | NaN | 2 | 100-500 | Pvt Ltd | 1 | 142 | 0.0
97 | 11184 | city_74 | 0.579 | NaN | No relevent experience | Full time course | Graduate | STEM | 2 | 100-500 | Pvt Ltd | 1 | 34 | 0.0
98 | 7016 | city_65 | 0.802 | Male | Has relevent experience | no_enrollment | Graduate | STEM | 6 | 50-99 | Pvt Ltd | 2 | 14 | 1.0
99 | 8695 | city_11 | 0.550 | Male | Has relevent experience | no_enrollment | Graduate | STEM | 6 | 10/49 | Pvt Ltd | 2 | 27 | 1.0

100 rows × 14 columns
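
The cell above only exercises nrows. A minimal sketch of both options side by side, assuming a generated ten-row sample with hypothetical column names id and value:

```python
from io import StringIO

import pandas as pd

# Hypothetical ten-row sample: ids 0..9 (assumption for illustration)
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

# nrows stops parsing after the first three data rows
first_three = pd.read_csv(StringIO(csv_text), nrows=3)

# A list passed to skiprows drops exactly those file lines; line 0 (the header) is kept
skipped = pd.read_csv(StringIO(csv_text), skiprows=[1, 2])
print(first_three["id"].tolist(), skipped["id"].tolist())
```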

10. Encoding parameter

In [97]: pd.read_csv('zomato.csv',encoding='latin-1')

Out[97]:
 | Restaurant ID | Restaurant Name | Country Code | City | Address | Locality | Locality Verbose | Longitude | Latitude | Cuisines | ... | Currency | Has Table booking | Has Online delivery | Is delivering now | Switch to order menu | Price range | Aggregate rating | Rating color | Rating text | Vo
0 | 6317637 | Le Petit Souffle | 162 | Makati City | Third Floor, Century City Mall, Kalayaan Avenu... | Century City Mall, Poblacion, Makati City | Century City Mall, Poblacion, Makati City, Mak... | 121.027535 | 14.565443 | French, Japanese, Desserts | ... | Botswana Pula(P) | Yes | No | No | No | 3 | 4.8 | Dark Green | Excellent | 3
1 | 6304287 | Izakaya Kikufuji | 162 | Makati City | Little Tokyo, 2277 Chino Roces Avenue, Legaspi... | Little Tokyo, Legaspi Village, Makati City | Little Tokyo, Legaspi Village, Makati City, Ma... | 121.014101 | 14.553708 | Japanese | ... | Botswana Pula(P) | Yes | No | No | No | 3 | 4.5 | Dark Green | Excellent | 5
2 | 6300002 | Heat - Edsa Shangri-La | 162 | Mandaluyong City | Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal... | Edsa Shangri-La, Ortigas, Mandaluyong City | Edsa Shangri-La, Ortigas, Mandaluyong City, Ma... | 121.056831 | 14.581404 | Seafood, Asian, Filipino, Indian | ... | Botswana Pula(P) | Yes | No | No | No | 4 | 4.4 | Green | Very Good | 2
3 | 6318506 | Ooma | 162 | Mandaluyong City | Third Floor, Mega Fashion Hall, SM Megamall, O... | SM Megamall, Ortigas, Mandaluyong City | SM Megamall, Ortigas, Mandaluyong City, Mandal... | 121.056475 | 14.585318 | Japanese, Sushi | ... | Botswana Pula(P) | No | No | No | No | 4 | 4.9 | Dark Green | Excellent | 3
4 | 6314302 | Sambo Kojin | 162 | Mandaluyong City | Third Floor, Mega Atrium, SM Megamall, Ortigas... | SM Megamall, Ortigas, Mandaluyong City | SM Megamall, Ortigas, Mandaluyong City, Mandal... | 121.057508 | 14.584450 | Japanese, Korean | ... | Botswana Pula(P) | Yes | No | No | No | 4 | 4.8 | Dark Green | Excellent | 2
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
9546 | 5915730 | NamlÛ± Gurme | 208 | ÛÁstanbul | Kemankeô Karamustafa Paôa Mahallesi, RÛ±htÛ±... | Karakí_y | Karakí_y, ÛÁstanbul | 28.977392 | 41.022793 | Turkish | ... | Turkish Lira(TL) | No | No | No | No | 3 | 4.1 | Green | Very Good | 7
9547 | 5908749 | Ceviz AÛôacÛ± | 208 | ÛÁstanbul | Koôuyolu Mahallesi, Muhittin íìstí_ndaÛô Cadd... | Koôuyolu | Koôuyolu, ÛÁstanbul | 29.041297 | 41.009847 | World Cuisine, Patisserie, Cafe | ... | Turkish Lira(TL) | No | No | No | No | 3 | 4.2 | Green | Very Good | 10
9548 | 5915807 | Huqqa | 208 | ÛÁstanbul | Kuruí_eôme Mahallesi, Muallim Naci Caddesi, N... | Kuruí_eôme | Kuruí_eôme, ÛÁstanbul | 29.034640 | 41.055817 | Italian, World Cuisine | ... | Turkish Lira(TL) | No | No | No | No | 4 | 3.7 | Yellow | Good | 6
9549 | 5916112 | Aôôk Kahve | 208 | ÛÁstanbul | Kuruí_eôme Mahallesi, Muallim Naci Caddesi, N... | Kuruí_eôme | Kuruí_eôme, ÛÁstanbul | 29.036019 | 41.057979 | Restaurant Cafe | ... | Turkish Lira(TL) | No | No | No | No | 4 | 4.0 | Green | Very Good | 9
9550 | 5927402 | Walter's Coffee Roastery | 208 | ÛÁstanbul | CafeaÛôa Mahallesi, BademaltÛ± Sokak, No 21/B,... | Moda | Moda, ÛÁstanbul | 29.026016 | 40.984776 | Cafe | ... | Turkish Lira(TL) | No | No | No | No | 2 | 4.0 | Green | Very Good | 5

9551 rows × 21 columns
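
The encoding argument tells read_csv how to turn the file's bytes into text. A minimal sketch, assuming a made-up row (the restaurant name is hypothetical) stored as Latin-1 bytes:

```python
from io import BytesIO

import pandas as pd

# Hypothetical row encoded as Latin-1; the byte for "é" (0xE9) is not valid UTF-8 here
raw_bytes = "name,city\nCafé Uno,Makati City\n".encode("latin-1")

# Passing the matching encoding decodes the accented character correctly
cafes = pd.read_csv(BytesIO(raw_bytes), encoding="latin-1")
print(cafes.loc[0, "name"])
```

Latin-1 maps every possible byte to some character, so decoding with it never raises an error even when the file was written in a different encoding. That may be why the Turkish names in the output above appear garbled rather than causing a failure.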


 

11. Skip bad lines


In [93]: pd.read_csv('BX-Books.csv', sep=';', encoding="latin-1",error_bad_lines=False)

b'Skipping line 6452: expected 8 fields, saw 9\nSkipping line 43667: expected 8 fields, saw 10\nSkipping line 51751: expected 8 fields, saw 9\n'
b'Skipping line 92038: expected 8 fields, saw 9\nSkipping line 104319: expected 8 fields, saw 9\nSkipping line 121768: expected 8 fields, saw 9\n'
b'Skipping line 144058: expected 8 fields, saw 9\nSkipping line 150789: expected 8 fields, saw 9\nSkipping line 157128: expected 8 fields, saw 9\nSkipping line 180189: expected 8 fields, saw 9\nSkipping line 185738: expected 8 fields, saw 9\n'
b'Skipping line 209388: expected 8 fields, saw 9\nSkipping line 220626: expected 8 fields, saw 9\nSkipping line 227933: expected 8 fields, saw 11\nSkipping line 228957: expected 8 fields, saw 10\nSkipping line 245933: expected 8 fields, saw 9\nSkipping line 251296: expected 8 fields, saw 9\nSkipping line 259941: expected 8 fields, saw 9\nSkipping line 261529: expected 8 fields, saw 9\n'

Out[93]:
 | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L
0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0...
1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0...
2 | 0060973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0...
3 | 0374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0...
4 | 0393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton &amp; Company | http://images.amazon.com/images/P/0393045218.0... | http://images.amazon.com/images/P/0393045218.0... | http://images.amazon.com/images/P/0393045218.0...
... | ... | ... | ... | ... | ... | ... | ... | ...
271355 | 0440400988 | There's a Bat in Bunk Five | Paula Danziger | 1988 | Random House Childrens Pub (Mm) | http://images.amazon.com/images/P/0440400988.0... | http://images.amazon.com/images/P/0440400988.0... | http://images.amazon.com/images/P/0440400988.0...
271356 | 0525447644 | From One to One Hundred | Teri Sloat | 1991 | Dutton Books | http://images.amazon.com/images/P/0525447644.0... | http://images.amazon.com/images/P/0525447644.0... | http://images.amazon.com/images/P/0525447644.0...
271357 | 006008667X | Lily Dale : The True Story of the Town that Ta... | Christine Wicker | 2004 | HarperSanFrancisco | http://images.amazon.com/images/P/006008667X.0... | http://images.amazon.com/images/P/006008667X.0... | http://images.amazon.com/images/P/006008667X.0...
271358 | 0192126040 | Republic (World's Classics) | Plato | 1996 | Oxford University Press | http://images.amazon.com/images/P/0192126040.0... | http://images.amazon.com/images/P/0192126040.0... | http://images.amazon.com/images/P/0192126040.0...
271359 | 0767409752 | A Guided Tour of Rene Descartes' Meditations o... | Christopher Biffle | 2000 | McGraw-Hill Humanities/Social Sciences/Languages | http://images.amazon.com/images/P/0767409752.0... | http://images.amazon.com/images/P/0767409752.0... | http://images.amazon.com/images/P/0767409752.0...

271360 rows × 8 columns
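
error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; its replacement is on_bad_lines. A minimal sketch of the newer spelling, assuming pandas 1.3 or later and a made-up semicolon-separated sample in which one line has an extra field:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the second data line has four fields instead of three
text = (
    "isbn;title;year\n"
    "0195153448;Classical Mythology;2002\n"
    "0000000000;Bad;Row;1999\n"
    "0060973129;Decision in Normandy;1991\n"
)

# on_bad_lines="skip" drops lines with too many fields instead of raising an error;
# dtype=str on isbn keeps the leading zeros
books = pd.read_csv(StringIO(text), sep=";", on_bad_lines="skip", dtype={"isbn": str})
print(len(books))
```

Passing on_bad_lines="warn" instead reports each skipped line, which is closer to the messages shown in the notebook output.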


 

12. dtype parameter


In [108]: pd.read_csv('aug_train.csv',dtype={'target':int}).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19158 entries, 0 to 19157
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 enrollee_id 19158 non-null int64
1 city 19158 non-null object
2 city_development_index 19158 non-null float64
3 gender 14650 non-null object
4 relevent_experience 19158 non-null object
5 enrolled_university 18772 non-null object
6 education_level 18698 non-null object
7 major_discipline 16345 non-null object
8 experience 19093 non-null object
9 company_size 13220 non-null object
10 company_type 13018 non-null object
11 last_new_job 18735 non-null object
12 training_hours 19158 non-null int64
13 target 19158 non-null int32
dtypes: float64(1), int32(1), int64(2), object(10)
memory usage: 2.0+ MB
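
The dtype argument takes a mapping from column name to type, so several columns can be set in one call. A minimal sketch, assuming a made-up three-column sample and explicit type names chosen for illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample (assumption, not the real aug_train.csv)
csv_text = "enrollee_id,training_hours,target\n8949,36,1\n29725,47,0\n"

# Without dtype, integer-looking columns are inferred as int64
default = pd.read_csv(StringIO(csv_text))

# Explicit widths avoid platform-dependent results and can reduce memory use
typed = pd.read_csv(StringIO(csv_text), dtype={"target": "int8", "training_hours": "float64"})
print(typed.dtypes)
```

The int32 shown above for target is probably a platform effect: the plain Python int maps to the platform's default integer, which was 32-bit on Windows with older NumPy versions. An integer dtype also requires the column to have no missing values; the nullable "Int64" type is one option when it does.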

13. Handling Dates


In [112]: pd.read_csv('IPL Matches 2008-2020.csv',parse_dates=['date']).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 816 entries, 0 to 815
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 816 non-null int64
1 city 803 non-null object
2 date 816 non-null datetime64[ns]
3 player_of_match 812 non-null object
4 venue 816 non-null object
5 neutral_venue 816 non-null int64
6 team1 816 non-null object
7 team2 816 non-null object
8 toss_winner 816 non-null object
9 toss_decision 816 non-null object
10 winner 812 non-null object
11 result 812 non-null object
12 result_margin 799 non-null float64
13 eliminator 812 non-null object
14 method 19 non-null object
15 umpire1 816 non-null object
16 umpire2 816 non-null object
dtypes: datetime64[ns](1), float64(1), int64(2), object(13)
memory usage: 108.5+ KB
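
Once a column is parsed as a datetime, the .dt accessor becomes available on it. A minimal sketch, assuming a made-up two-row sample with dates in the same ISO format as the IPL file:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample (assumption); ids and cities borrowed from the output shown later
csv_text = "id,date,city\n335982,2008-04-18,Bangalore\n335983,2008-04-19,Chandigarh\n"

# parse_dates converts the listed columns from strings to datetime64 while reading
matches = pd.read_csv(StringIO(csv_text), parse_dates=["date"])
print(matches["date"].dt.year.tolist())
```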

In [142]: def rename(name):
    # Abbreviate one team name; return every other name unchanged
    if name == "Royal Challengers Bangalore":
        return "RCB"
    else:
        return name

In [143]: rename("Royal Challengers Bangalore")

Out[143]: 'RCB'

14. Converters
In [144]: pd.read_csv('IPL Matches 2008-2020.csv',converters={'team1':rename})

Out[144]:
 | id | city | date | player_of_match | venue | neutral_venue | team1 | team2 | toss_winner | toss_decision | winner | result | result_margin | eliminator | method | umpire1 | umpire2
0 | 335982 | Bangalore | 2008-04-18 | BB McCullum | M Chinnaswamy Stadium | 0 | RCB | Kolkata Knight Riders | Royal Challengers Bangalore | field | Kolkata Knight Riders | runs | 140.0 | N | NaN | Asad Rauf | RE Koertzen
1 | 335983 | Chandigarh | 2008-04-19 | MEK Hussey | Punjab Cricket Association Stadium, Mohali | 0 | Kings XI Punjab | Chennai Super Kings | Chennai Super Kings | bat | Chennai Super Kings | runs | 33.0 | N | NaN | MR Benson | SL Shastri
2 | 335984 | Delhi | 2008-04-19 | MF Maharoof | Feroz Shah Kotla | 0 | Delhi Daredevils | Rajasthan Royals | Rajasthan Royals | bat | Delhi Daredevils | wickets | 9.0 | N | NaN | Aleem Dar | GA Pratapkumar
3 | 335985 | Mumbai | 2008-04-20 | MV Boucher | Wankhede Stadium | 0 | Mumbai Indians | Royal Challengers Bangalore | Mumbai Indians | bat | Royal Challengers Bangalore | wickets | 5.0 | N | NaN | SJ Davis | DJ Harper
4 | 335986 | Kolkata | 2008-04-20 | DJ Hussey | Eden Gardens | 0 | Kolkata Knight Riders | Deccan Chargers | Deccan Chargers | bat | Kolkata Knight Riders | wickets | 5.0 | N | NaN | BF Bowden | K Hariharan
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
811 | 1216547 | Dubai | 2020-09-28 | AB de Villiers | Dubai International Cricket Stadium | 0 | RCB | Mumbai Indians | Mumbai Indians | field | Royal Challengers Bangalore | tie | NaN | Y | NaN | Nitin Menon | PR Reiffel
812 | 1237177 | Dubai | 2020-11-05 | JJ Bumrah | Dubai International Cricket Stadium | 0 | Mumbai Indians | Delhi Capitals | Delhi Capitals | field | Mumbai Indians | runs | 57.0 | N | NaN | CB Gaffaney | Nitin Menon
813 | 1237178 | Abu Dhabi | 2020-11-06 | KS Williamson | Sheikh Zayed Stadium | 0 | RCB | Sunrisers Hyderabad | Sunrisers Hyderabad | field | Sunrisers Hyderabad | wickets | 6.0 | N | NaN | PR Reiffel | S Ravi
814 | 1237180 | Abu Dhabi | 2020-11-08 | MP Stoinis | Sheikh Zayed Stadium | 0 | Delhi Capitals | Sunrisers Hyderabad | Delhi Capitals | bat | Delhi Capitals | runs | 17.0 | N | NaN | PR Reiffel | S Ravi
815 | 1237181 | Dubai | 2020-11-10 | TA Boult | Dubai International Cricket Stadium | 0 | Delhi Capitals | Mumbai Indians | Delhi Capitals | bat | Mumbai Indians | wickets | 5.0 | N | NaN | CB Gaffaney | Nitin Menon

816 rows × 17 columns
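
A converter is called on the raw text of every cell in its column, and only in that column, which is why team1 shows RCB above while team2 and winner keep the full name. A self-contained sketch, assuming a made-up two-row sample:

```python
from io import StringIO

import pandas as pd


def rename(name):
    # Same idea as the notebook's helper: shorten one team name, leave the rest alone
    return "RCB" if name == "Royal Challengers Bangalore" else name


# Hypothetical sample (assumption, not the real IPL file)
csv_text = (
    "id,team1,team2\n"
    "1,Royal Challengers Bangalore,Mumbai Indians\n"
    "2,Delhi Capitals,Royal Challengers Bangalore\n"
)

# The converters mapping applies rename to team1 only
ipl = pd.read_csv(StringIO(csv_text), converters={"team1": rename})
print(ipl)
```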

15. na_values parameter

In [147]: pd.read_csv('aug_train.csv',na_values=['Male',])

Out[147]:
 | enrollee_id | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours | target
0 | 8949 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | NaN | NaN | 1 | 36 | 1.0
1 | 29725 | city_40 | 0.776 | Male | No relevent experience | no_enrollment | Graduate | STEM | 15 | 50-99 | Pvt Ltd | >4 | 47 | 0.0
2 | 11561 | city_21 | 0.624 | NaN | No relevent experience | Full time course | Graduate | STEM | 5 | NaN | NaN | never | 83 | 0.0
3 | 33241 | city_115 | 0.789 | NaN | No relevent experience | NaN | Graduate | Business Degree | <1 | NaN | Pvt Ltd | never | 52 | 1.0
4 | 666 | city_162 | 0.767 | Male | Has relevent experience | no_enrollment | Masters | STEM | >20 | 50-99 | Funded Startup | 4 | 8 | 0.0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
19153 | 7386 | city_173 | 0.878 | Male | No relevent experience | no_enrollment | Graduate | Humanities | 14 | NaN | NaN | 1 | 42 | 1.0
19154 | 31398 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | 14 | NaN | NaN | 4 | 52 | 1.0
19155 | 24576 | city_103 | 0.920 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | 50-99 | Pvt Ltd | 4 | 44 | 0.0
19156 | 5756 | city_65 | 0.802 | Male | Has relevent experience | no_enrollment | High School | NaN | <1 | 500-999 | Pvt Ltd | 2 | 97 | 0.0
19157 | 23834 | city_67 | 0.855 | NaN | No relevent experience | no_enrollment | Primary School | NaN | 2 | NaN | NaN | 1 | 127 | 0.0

19158 rows × 14 columns
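
Any cell whose text matches an entry in na_values is read as NaN, in addition to the markers pandas already recognises by default. A minimal sketch, assuming a made-up sample in which a dash is used as a hypothetical placeholder for missing data:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: "-" marks a missing company size (assumption)
csv_text = "enrollee_id,gender,company_size\n8949,Male,-\n29725,Female,50-99\n11561,,-\n"

# A list applies the extra missing-value markers to every column
cleaned = pd.read_csv(StringIO(csv_text), na_values=["-"])

# A dict restricts them to the named columns
per_column = pd.read_csv(StringIO(csv_text), na_values={"company_size": ["-"]})
print(cleaned)
```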

16. Loading a huge dataset in chunks


In [151]: dfs = pd.read_csv('aug_train.csv',chunksize=5000)

In [152]: for chunk in dfs:
    print(chunk.shape)

(5000, 14)
(5000, 14)
(5000, 14)
(4158, 14)
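
With chunksize, read_csv returns an iterator of DataFrames; each chunk holds at most chunksize rows and the last one holds the remainder, so 19158 rows at chunksize=5000 give three chunks of 5000 and one of 4158. A minimal sketch of processing chunks one at a time, assuming pandas 1.2 or later (for the with-statement) and a generated twelve-row sample with hypothetical column names:

```python
from io import StringIO

import pandas as pd

# Hypothetical twelve-row sample (assumption for illustration)
csv_text = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(12))

shapes = []
total = 0

# Only one chunk is held in memory at a time while the running total accumulates
with pd.read_csv(StringIO(csv_text), chunksize=5) as reader:
    for chunk in reader:
        shapes.append(chunk.shape)
        total += chunk["value"].sum()

print(shapes, total)
```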
