Pandas Assignment

The document provides instructions for several tasks using pandas and other Python libraries to analyze data. It includes: 1) Reading an XML and CSV file, finding and removing duplicate records from the XML data, and printing summaries of the CSV data. 2) Converting certain columns in the CSV to categorical data types. 3) Adding a new column to the CSV for total time, and printing data meeting criteria. 4) Counting flavor profiles by region in the CSV. 5) Finding and filling missing state values in the CSV. 6) Demonstrating regular expressions, stemming, stop word removal and bag of words modeling on text data.

Uploaded by

hetgoti4911
Python for Data Science 3150713

Practical-7
Assignment for the pandas library

URL for the test.xml file:


https://drive.google.com/file/d/1FqOWhY2XNYkHwCBYOjhAILCzVUo9QEp6/view?usp=sharing

Read the XML file (test.xml), create a dataframe from it, and do the following:
Find and print the duplicate records.
Remove the duplicates and save the result in another dataframe.
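
The two XML tasks above could be approached roughly as follows. This is only a sketch: the layout of test.xml is not shown here, so the inline XML, its id and name fields, and the variable names are made up for illustration. With the real file, its path would be passed to pd.read_xml in place of the StringIO buffer.

```python
import io
import pandas as pd

# Hypothetical stand-in for test.xml; the real file's tags may differ.
xml_text = """<records>
  <record><id>1</id><name>Asha</name></record>
  <record><id>2</id><name>Ravi</name></record>
  <record><id>1</id><name>Asha</name></record>
</records>"""

# parser="etree" uses the standard-library parser, so lxml is not required.
df = pd.read_xml(io.StringIO(xml_text), parser="etree")

# keep=False marks every copy of a repeated row, not only the later ones.
duplicates = df[df.duplicated(keep=False)]
print(duplicates)

# Save the de-duplicated data in a separate dataframe.
df_unique = df.drop_duplicates().reset_index(drop=True)
print(df_unique)
```

Using keep=False is a choice made here so that all copies are printed; the default (keep="first") would print only the later repeats.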

URL for the CSV file (indian_food.csv) used in the rest of this assignment:


https://drive.google.com/file/d/1CNAdqFZ-Amji8kOMd4GovivK8UKVLQ-p/view?usp=sharing

Read the CSV file (indian_food.csv). Treat the value -1 as a missing (NA) value, i.e. replace -1 with
NaN while reading the CSV file.

Print the first and last 10 records of dataframe, also print column names and summary of data.
Print information about data such as data types of each column.
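
A possible way to read the file and print the requested overviews is sketched below. The four rows are hypothetical stand-ins for indian_food.csv, and the column names are assumed from the ones mentioned in this assignment; with the real file, its path would replace the StringIO buffer.

```python
import io
import pandas as pd

# Hypothetical rows standing in for indian_food.csv.
csv_text = """name,diet,prep_time,cooking_time,flavor_profile,course,state,region
Balu shahi,vegetarian,45,25,sweet,dessert,West Bengal,East
Boondi,vegetarian,80,30,sweet,dessert,Rajasthan,West
Chicken Tikka,non vegetarian,-1,30,spicy,starter,Punjab,North
Aloo tikki,vegetarian,5,20,-1,main course,-1,-1
"""

# na_values=[-1] turns every -1 cell into NaN while the file is parsed.
food = pd.read_csv(io.StringIO(csv_text), na_values=[-1])

print(food.head(10))       # first 10 records
print(food.tail(10))       # last 10 records
print(list(food.columns))  # column names
print(food.describe())     # summary statistics of the numeric columns
food.info()                # dtypes and non-null counts (prints directly)
```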

Convert the columns named course, diet, flavor_profile, state and region to the categorical data type, and
print the data types of the dataframe using the info function.

Categories are defined as follows.


Course ['dessert' 'main course' 'starter' 'snack']
Flavor_profile ['sweet' 'spicy' 'bitter' 'sour']
State ['West Bengal' 'Rajasthan' 'Punjab' 'Uttar Pradesh' 'Odisha' 'Maharashtra' 'Uttarakhand'
'Assam' 'Bihar' 'Andhra Pradesh' 'Karnataka' 'Telangana' 'Kerala' 'Tamil Nadu' 'Gujarat' 'Tripura'
'Manipur' 'Nagaland' 'NCT of Delhi' 'Jammu & Kashmir' 'Chhattisgarh' 'Haryana' 'Madhya Pradesh'
'Goa']
Region ['East' 'West' 'North' nan 'North East' 'South' 'Central']
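
One way to apply the category lists above is sketched here, under the assumption that explicit lists are wanted wherever the assignment gives them. The small dataframe is made up for illustration, the state column is omitted only to keep the example short (it would be handled like course), and diet has no list in the text, so its categories are inferred from the data.

```python
import pandas as pd

# Hypothetical sample rows; None stands for a missing value.
food = pd.DataFrame({
    "course": ["dessert", "starter", "snack"],
    "diet": ["vegetarian", "non vegetarian", "vegetarian"],
    "flavor_profile": ["sweet", "spicy", None],
    "region": ["East", "North", None],
})

# Explicit category lists taken from the assignment text.
categories = {
    "course": ["dessert", "main course", "starter", "snack"],
    "flavor_profile": ["sweet", "spicy", "bitter", "sour"],
    # NaN cannot be a category, so it is left out of the region list;
    # missing regions simply stay NaN after the conversion.
    "region": ["East", "West", "North", "North East", "South", "Central"],
}
for column, values in categories.items():
    food[column] = food[column].astype(pd.CategoricalDtype(categories=values))

# No list is given for diet, so its categories are inferred from the data.
food["diet"] = food["diet"].astype("category")

food.info()
```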

Print name of items with course as dessert.


Print count of items with flavor_profile with sweet type.

Print name of items with cooking_time < prep_time.


Print summary of data grouped by diet column.

Print average cooking_time & prep_time for vegetarian diet type.
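
The five queries above might be written along these lines. The rows are hypothetical, and the column names name, cooking_time and prep_time are taken from this assignment's wording, so they may need adjusting to the headers of the actual file.

```python
import pandas as pd

# Hypothetical sample rows.
food = pd.DataFrame({
    "name": ["Balu shahi", "Boondi", "Chicken Tikka", "Aloo tikki"],
    "diet": ["vegetarian", "vegetarian", "non vegetarian", "vegetarian"],
    "prep_time": [45, 80, 20, 5],
    "cooking_time": [25, 30, 30, 20],
    "flavor_profile": ["sweet", "sweet", "spicy", "spicy"],
    "course": ["dessert", "dessert", "starter", "main course"],
})

# Names of items whose course is dessert.
dessert_names = food.loc[food["course"] == "dessert", "name"]
print(dessert_names)

# Number of items with a sweet flavor profile.
sweet_count = int((food["flavor_profile"] == "sweet").sum())
print(sweet_count)

# Names of items that cook faster than they take to prepare.
quick_cook = food.loc[food["cooking_time"] < food["prep_time"], "name"]
print(quick_cook)

# Summary statistics of the numeric columns, one block per diet.
summary_by_diet = food.groupby("diet")[["prep_time", "cooking_time"]].describe()
print(summary_by_diet)

# Average cooking and preparation time for vegetarian items only.
is_veg = food["diet"] == "vegetarian"
veg_means = food.loc[is_veg, ["cooking_time", "prep_time"]].mean()
print(veg_means)
```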

S.V.I.T 210410107130

Insert a new column named total_time, containing the sum of cooking_time and prep_time, into the
existing dataframe.
Print the name, cooking_time, prep_time and total_time of items with total_time >= 500.
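
A short sketch of the total_time tasks, using made-up rows; the item names and times are illustrative only.

```python
import pandas as pd

# Hypothetical sample rows.
food = pd.DataFrame({
    "name": ["Shrikhand", "Pindi chana", "Boondi"],
    "prep_time": [10, 500, 80],
    "cooking_time": [720, 120, 30],
})

# Element-wise sum; a NaN in either column would give NaN for that row.
food["total_time"] = food["cooking_time"] + food["prep_time"]

columns = ["name", "cooking_time", "prep_time", "total_time"]
long_items = food.loc[food["total_time"] >= 500, columns]
print(long_items)
```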
Print the count of items for each flavor_profile per region.
# e.g.
# region   flavor_profile
# Central  spicy              2
#          sweet              1
# East     spicy              5
#          sweet             20
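
Output in that shape can be produced by grouping on both columns and counting rows, as in this sketch with made-up data (the counts here will not match the example above).

```python
import pandas as pd

# Hypothetical sample rows.
food = pd.DataFrame({
    "region": ["Central", "Central", "Central", "East", "East", "East"],
    "flavor_profile": ["spicy", "spicy", "sweet", "spicy", "sweet", "sweet"],
})

# size() counts rows per (region, flavor_profile) pair and returns a
# Series with a two-level index. Rows with NaN in either key are dropped.
counts = food.groupby(["region", "flavor_profile"]).size()
print(counts)
```

If the two columns have already been converted to the categorical type, passing observed=True to groupby is one way to leave out combinations that never occur in the data.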

Find & print records with missing data in the state column.
Fill the missing data in the state column with "-".
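
A minimal sketch of both steps on made-up rows, assuming state is still a plain text column at this point.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; NaN marks a missing state.
food = pd.DataFrame({
    "name": ["Chicken Tikka", "Aloo tikki", "Bebinca"],
    "state": ["Punjab", np.nan, "Goa"],
})

# Records whose state is missing.
missing_state = food[food["state"].isna()]
print(missing_state)

# Replace the missing states with "-".
food["state"] = food["state"].fillna("-")
print(food)
```

If state has already been converted to the categorical type, "-" has to be registered first, for example with food["state"].cat.add_categories("-"), because fillna rejects a value that is not one of the categories.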

Write regular expressions:

To extract phone numbers (+dd-ddd-dddd) from the following text.
“Hey my number is +01-555-1212 & his number is +01-770-1410”
To extract email addresses from the following text.
“You can contact to [email protected] or to [email protected]”.
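
Both extractions can be sketched with re.findall. The phone pattern follows the sample numbers, which have two, three and four digits. The email addresses in the text above are obscured, so the two addresses below are hypothetical placeholders, and the email pattern is a common simplified one, not a full validator.

```python
import re

phone_text = "Hey my number is +01-555-1212 & his number is +01-770-1410"

# A literal "+", two digits, a dash, three digits, a dash, four digits.
phone_pattern = r"\+\d{2}-\d{3}-\d{4}"
phones = re.findall(phone_pattern, phone_text)
print(phones)

# Hypothetical addresses standing in for the obscured ones.
email_text = "You can contact to alice@example.com or to bob.smith@example.org"

# Simplified pattern: local part, "@", domain, a dot, then a 2+ letter suffix.
email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
emails = re.findall(email_pattern, email_text)
print(emails)
```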

Demonstrate stemming & stop word removal using nltk library for content given below.

[“Most of the world will make decisions by either guessing or using their gut. They will be
either lucky or wrong.”,
“The goal is to turn data into information and information into insight.”]
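
Stop word removal and stemming on the two sentences could look like the sketch below. Two simplifications are assumed: tokens are split with a regular expression so that no tokenizer data is needed, and if the NLTK stop word corpus has not been downloaded yet (nltk.download("stopwords")), a small hand-picked set is used in its place. That fallback set is not NLTK's list.

```python
import re
from nltk.stem import PorterStemmer

try:
    from nltk.corpus import stopwords
    stop_words = set(stopwords.words("english"))
except LookupError:
    # Corpus not installed: run nltk.download("stopwords") once.
    # Hand-picked fallback for this example only, not NLTK's list.
    stop_words = {"most", "of", "the", "will", "by", "either", "or",
                  "their", "they", "be", "is", "to", "and", "into"}

documents = [
    "Most of the world will make decisions by either guessing or using "
    "their gut. They will be either lucky or wrong.",
    "The goal is to turn data into information and information into insight.",
]

stemmer = PorterStemmer()
results = []
for text in documents:
    tokens = re.findall(r"[a-z]+", text.lower())        # lowercase words only
    kept = [t for t in tokens if t not in stop_words]   # stop word removal
    stems = [stemmer.stem(t) for t in kept]             # Porter stemming
    results.append((kept, stems))
    print(kept)
    print(stems)
```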

Using the 20 newsgroups dataset, create and demonstrate a bag-of-words model. Also convert the raw
newsgroup documents into a matrix of TF-IDF features.
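
The bag-of-words and TF-IDF steps can be sketched with scikit-learn as below. To keep the example small and runnable offline, three made-up sentences stand in for the newsgroup posts; the commented lines show how the real dataset would be loaded, which downloads it on first use.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Stand-in corpus. For the real data, something like:
#   from sklearn.datasets import fetch_20newsgroups
#   docs = fetch_20newsgroups(subset="train").data
docs = [
    "the rocket launch was delayed",
    "the hockey team won the game",
    "the launch of the new graphics card",
]

# Bag of words: one row per document, one column per vocabulary term,
# each cell holding the raw count of that term in that document.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)
print(bow.shape)
print(sorted(vectorizer.vocabulary_))
print(bow.toarray())

# Re-weight the counts into TF-IDF features of the same shape.
tfidf = TfidfTransformer().fit_transform(bow)
print(tfidf.shape)
```

TfidfVectorizer would combine both steps in one object; the two-step form is used here only because the task asks for the bag-of-words model on its own as well.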
