Python for Data Science 3150713
Practical-7
Assignment for the pandas library
URL for the test.xml file:
https://drive.google.com/file/d/1FqOWhY2XNYkHwCBYOjhAILCzVUo9QEp6/view?usp=sharing
Read the XML file (test.xml), create a DataFrame from it, and do the following.
Find and print the duplicate records.
Remove the duplicates and save the resulting data in another DataFrame.
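A minimal sketch of the two duplicate-handling steps, assuming each child of the root element is one record whose sub-elements are its fields. The inline XML, its tag names, and its values are hypothetical stand-ins, since the structure of test.xml is not shown here; pandas.read_xml may be a shorter route if the real file has this flat shape.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical stand-in for test.xml; tag names and values are made up.
xml_text = """<records>
  <record><id>1</id><name>Asha</name><city>Surat</city></record>
  <record><id>2</id><name>Ravi</name><city>Vadodara</city></record>
  <record><id>1</id><name>Asha</name><city>Surat</city></record>
</records>"""

# Turn every record element into a dict of {field tag: field text}.
root = ET.fromstring(xml_text)
rows = [{field.tag: field.text for field in record} for record in root]
df = pd.DataFrame(rows)

# keep=False flags every member of a duplicate group, not only the repeats.
duplicates = df[df.duplicated(keep=False)]
print(duplicates)

# drop_duplicates keeps the first occurrence of each record by default.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

For a file on disk, ET.parse("test.xml").getroot() would replace ET.fromstring in this sketch.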
URL for the file for this assignment.
https://drive.google.com/file/d/1CNAdqFZ-Amji8kOMd4GovivK8UKVLQ-p/view?usp=sharing
Read the CSV file (indian_food.csv). Treat the value -1 as a missing or NA value. (Replace -1 with
NaN when reading the CSV file.)
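One way to replace -1 with NaN at read time is the na_values argument of pandas.read_csv. The sketch below uses a small inline CSV as a hypothetical stand-in for indian_food.csv; the rows are made up and the column names simply follow the wording of the tasks.

```python
import io

import pandas as pd

# Hypothetical stand-in for indian_food.csv; rows and values are made up.
csv_text = """name,diet,prep_time,cooking_time,flavor_profile,course,state,region
Dish A,vegetarian,10,25,sweet,dessert,West Bengal,East
Dish B,vegetarian,-1,30,-1,main course,-1,-1
Dish C,non vegetarian,20,-1,spicy,main course,Punjab,North
"""

# Any field whose text is "-1" is read as NaN, in numeric and text columns alike.
df = pd.read_csv(io.StringIO(csv_text), na_values=["-1"])

# Count the missing values per column to confirm the replacement.
print(df.isna().sum())
```

With the real file, the path "indian_food.csv" would take the place of the io.StringIO object.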
Print the first and last 10 records of the DataFrame; also print the column names and a summary of the data.
Print information about the data, such as the data type of each column.
Convert the columns course, diet, flavor_profile, state, and region to the categorical data type and
print the data types of the DataFrame using the info function.
Categories are defined as follows.
Course ['dessert' 'main course' 'starter' 'snack']
Flavor_profile ['sweet' 'spicy' 'bitter' 'sour']
State ['West Bengal' 'Rajasthan' 'Punjab' 'Uttar Pradesh' 'Odisha' 'Maharashtra' 'Uttarakhand'
'Assam' 'Bihar' 'Andhra Pradesh' 'Karnataka' 'Telangana' 'Kerala' 'Tamil Nadu' 'Gujarat' 'Tripura'
'Manipur' 'Nagaland' 'NCT of Delhi' 'Jammu & Kashmir' 'Chhattisgarh' 'Haryana' 'Madhya Pradesh'
'Goa']
Region ['East' 'West' 'North' nan 'North East' 'South' 'Central']
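The conversion could be sketched as follows, using the category lists given above. The three sample rows are hypothetical. Two assumptions are worth noting: NaN cannot itself be a category, so it is left out of the region list and stays a missing value, and diet has no list defined above, so its categories are inferred from the data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample rows; the real data comes from indian_food.csv.
df = pd.DataFrame({
    "name": ["Dish A", "Dish B", "Dish C"],
    "course": ["dessert", "main course", "snack"],
    "diet": ["vegetarian", "non vegetarian", "vegetarian"],
    "flavor_profile": ["sweet", "spicy", None],
    "state": ["West Bengal", "Punjab", None],
    "region": ["East", "North", None],
})

categories = {
    "course": ["dessert", "main course", "starter", "snack"],
    "flavor_profile": ["sweet", "spicy", "bitter", "sour"],
    "state": [
        "West Bengal", "Rajasthan", "Punjab", "Uttar Pradesh", "Odisha",
        "Maharashtra", "Uttarakhand", "Assam", "Bihar", "Andhra Pradesh",
        "Karnataka", "Telangana", "Kerala", "Tamil Nadu", "Gujarat",
        "Tripura", "Manipur", "Nagaland", "NCT of Delhi", "Jammu & Kashmir",
        "Chhattisgarh", "Haryana", "Madhya Pradesh", "Goa",
    ],
    # nan is omitted here: missing values are not a category.
    "region": ["East", "West", "North", "North East", "South", "Central"],
}

# Apply the explicit category lists column by column.
for column, values in categories.items():
    df[column] = df[column].astype(CategoricalDtype(categories=values))

# No list is given for diet, so let pandas infer its categories.
df["diet"] = df["diet"].astype("category")

df.info()
```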
Print the names of items whose course is dessert.
Print the count of items whose flavor_profile is sweet.
Print the names of items with cooking_time < prep_time.
Print a summary of the data grouped by the diet column.
Print the average cooking_time and prep_time for the vegetarian diet type.
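The filtering and grouping tasks above can be sketched with boolean masks and groupby. The four rows below are hypothetical, and the column names follow the wording of the tasks rather than a confirmed file layout.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from indian_food.csv.
df = pd.DataFrame({
    "name": ["Dish A", "Dish B", "Dish C", "Dish D"],
    "diet": ["vegetarian", "vegetarian", "non vegetarian", "vegetarian"],
    "prep_time": [10, 40, 15, 70],
    "cooking_time": [30, 20, 45, 10],
    "flavor_profile": ["sweet", "spicy", "spicy", "sweet"],
    "course": ["dessert", "main course", "main course", "dessert"],
})

# Names of items whose course is dessert.
dessert_names = df.loc[df["course"] == "dessert", "name"].tolist()
print(dessert_names)

# Count of items whose flavor_profile is sweet (True counts as 1).
sweet_count = int((df["flavor_profile"] == "sweet").sum())
print(sweet_count)

# Names of items that cook faster than they take to prepare.
quick_to_cook = df.loc[df["cooking_time"] < df["prep_time"], "name"].tolist()
print(quick_to_cook)

# Summary statistics of the numeric columns for each diet type.
print(df.groupby("diet")[["prep_time", "cooking_time"]].describe())

# Average cooking_time and prep_time for the vegetarian rows only.
veg_means = df.loc[df["diet"] == "vegetarian", ["cooking_time", "prep_time"]].mean()
print(veg_means)
```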
S.V.I.T 210410107130
Insert a new column named total_time, containing the sum of cooking_time and prep_time, into the
existing DataFrame.
Print name, cooking_time, prep_time, and total_time of items with total_time >= 500.
Print the count of items for each flavor_profile per region.
# e.g.
# region   flavor_profile
# Central  spicy              2
#          sweet              1
# East     spicy              5
#          sweet             20
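Output of this shape is what groupby on two columns followed by size() produces. A minimal sketch with hypothetical rows (the counts differ from the example above because the data is made up):

```python
import pandas as pd

# Hypothetical sample rows; the last row has a missing region.
df = pd.DataFrame({
    "region": ["Central", "Central", "Central", "East", "East", "East", None],
    "flavor_profile": ["spicy", "spicy", "sweet", "sweet", "sweet", "spicy", "sweet"],
})

# size() counts rows per (region, flavor_profile) pair.
# Rows with a missing region are dropped by groupby by default.
counts = df.groupby(["region", "flavor_profile"]).size()
print(counts)
```

If the two columns have already been converted to the categorical type, passing observed=True to groupby should restrict the result to combinations that actually occur.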
Find and print the records with missing data in the state column.
Fill the missing data in the state column with "-".
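The total_time and missing-state tasks could be sketched as below. The rows are hypothetical. Note that the sum is NaN wherever either time is missing, so such rows never pass the >= 500 filter.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from indian_food.csv.
df = pd.DataFrame({
    "name": ["Dish A", "Dish B", "Dish C"],
    "prep_time": [10, 480, 500],
    "cooking_time": [30, 30, 120],
    "state": ["Punjab", None, "Goa"],
})

# New column: element-wise sum of the two time columns.
df["total_time"] = df["cooking_time"] + df["prep_time"]

# Items that take 500 minutes or more in total.
long_items = df.loc[
    df["total_time"] >= 500,
    ["name", "cooking_time", "prep_time", "total_time"],
]
print(long_items)

# Records whose state is missing.
missing_state = df[df["state"].isna()]
print(missing_state)

# Replace the missing states with "-".
df["state"] = df["state"].fillna("-")
print(df)
```

If state has been converted to a categorical column, "-" would first need to be added with cat.add_categories before fillna accepts it.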
Write regular expressions:
To extract phone numbers (+dd-ddd-dddd) from the following text.
“Hey my number is +01-555-1212 & his number is +01-770-1410”
To extract email addresses from the following text.
“You can contact to
[email protected] or to
[email protected]”.
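Both extractions can be sketched with re.findall. The phone pattern matches the +dd-ddd-dddd shape of the two numbers in the text. The email addresses in the text above are not recoverable, so the sample sentence uses hypothetical placeholder addresses, and the email pattern is a deliberately simple approximation rather than a full address validator.

```python
import re

phone_text = "Hey my number is +01-555-1212 & his number is +01-770-1410"

# A literal "+", two digits, then groups of three and four digits.
phone_pattern = r"\+\d{2}-\d{3}-\d{4}"
phones = re.findall(phone_pattern, phone_text)
print(phones)

# Placeholder addresses (hypothetical, not the ones from the assignment text).
email_text = "You can contact to alice@example.com or to bob@example.org."

# Local part, "@", then a domain with at least one dot-separated label.
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
emails = re.findall(email_pattern, email_text)
print(emails)
```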
Demonstrate stemming and stop word removal using the nltk library for the content given below.
[“Most of the world will make decisions by either guessing or using their gut. They will be
either lucky or wrong.”,
“The goal is to turn data into information and information into insight.”]
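A minimal sketch of stop word removal followed by Porter stemming on the two sentences. Some choices here are assumptions: tokens are extracted with a simple regular expression instead of nltk's tokenizer (which needs extra downloaded data), and if the nltk stop word corpus has not been downloaded, a small hand-written fallback set is used instead of the real list.

```python
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

sentences = [
    "Most of the world will make decisions by either guessing or using "
    "their gut. They will be either lucky or wrong.",
    "The goal is to turn data into information and information into insight.",
]

try:
    # Needs the corpus from nltk.download("stopwords").
    stop_words = set(stopwords.words("english"))
except LookupError:
    # Assumption: tiny hand-written fallback, NOT the full nltk list.
    stop_words = {"the", "is", "to", "and", "into", "of", "will", "by",
                  "or", "be", "their", "they", "most"}

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [token for token in tokens if token not in stop_words]
    return [stemmer.stem(token) for token in kept]


processed = [preprocess(sentence) for sentence in sentences]
for tokens in processed:
    print(tokens)
```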
Using the 20 newsgroups dataset, create and demonstrate a bag-of-words model. Also convert the raw
newsgroup documents into a matrix of TF-IDF features.
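The two steps can be sketched with scikit-learn's CountVectorizer and TfidfTransformer. To keep the example small and runnable offline, three made-up documents stand in for the newsgroup posts; the comment shows how the real dataset could be loaded instead.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical stand-in corpus. For the real data (downloaded on first use):
#   from sklearn.datasets import fetch_20newsgroups
#   docs = fetch_20newsgroups(subset="train").data
docs = [
    "the rocket launch was delayed",
    "the hockey game was delayed by snow",
    "rocket engines and rocket fuel",
]

# Bag of words: one row per document, one column per vocabulary word,
# each cell holding how often that word occurs in that document.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)
print(sorted(vectorizer.vocabulary_))
print(bow.toarray())

# TF-IDF: reweight the raw counts so words common to many documents
# count for less; rows are L2-normalized by default.
tfidf = TfidfTransformer().fit_transform(bow)
print(tfidf.toarray().round(2))
```

TfidfVectorizer combines both steps into one object when the intermediate counts are not needed.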