University of Management and Technology
Machine Learning
Assignment 01 (CLO-02)
Course Instructor: Aqsa Afzal
Name: Roll Number:
Date: Dated:
Semester: Total Marks: 10
INSTRUCTIONS
Complete the assignment in a group of two only.
Late submissions will not be accepted under any circumstances.
Prepare for a viva session, where you may be questioned on the content of your assignment.
Ensure figures and diagrams are clearly labeled.
Submit your assignment both as a PDF and as a hard copy.
Instructor Signature
You can get a quick recap of data wrangling from:
[Link]
Steps for assignment:
* Gathering data from multiple sources
* Assessing the data visually and programmatically to identify quality and tidiness issues
* Ridding each dataframe of every tidiness and quality issue
* Merging the three dataframes into one clean master dataframe
* Analyzing, exploring relationships and visualizing insights from the clean data
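The programmatic part of the assessment step can be sketched roughly as follows. This is a minimal illustration, not the notebook's own code: the helper `assess`, the toy dataframe, and the column names `tweet_id` and `rating` are all hypothetical stand-ins.

```python
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Collect a few common quality signals for one dataframe."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Toy data standing in for one gathered table (values are made up).
toy = pd.DataFrame(
    {
        "tweet_id": [1, 2, 2, 3],
        "rating": [12.0, 13.0, 13.0, None],
    }
)

report = assess(toy)

# One possible cleaning pass: drop exact duplicates, then rows with no rating.
clean = toy.drop_duplicates().dropna(subset=["rating"])
```

Visual assessment (scrolling through the rows in a notebook or spreadsheet) complements a summary like this, since some issues, such as values stored in the wrong column, do not show up in counts.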
Gathering of Data
Gather three datasets, each in a different format and obtained in a different way:
1. Download the [twitter_archive_enhanced.csv]([Link]
August/59a4e958_twitter-archive-enhanced/[Link]) file and read it into
a pandas dataframe.
2. Programmatically download the second file, '[Link]', from the provided [url
here]([Link]
predictions/[Link]) using the Requests library.
3. Query additional data from the Twitter API using the Tweepy library, save it to a text
file, 'tweet_json.txt', and read that file line by line into a pandas dataframe.
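The three gathering steps above might look something like the sketch below. It is written under assumptions: the function names are hypothetical, the URL and destination path are left for the caller to supply, and the fields kept from each tweet (`id`, `retweet_count`, `favorite_count`) are only one plausible choice. The Tweepy query itself is not shown; the sketch starts from an already saved 'tweet_json.txt' with one JSON object per line.

```python
import json

import pandas as pd
import requests


def load_archive(path):
    """Step 1: read the manually downloaded CSV into a dataframe."""
    return pd.read_csv(path)


def download_file(url, dest, session=None):
    """Step 2: fetch a file with Requests and write its raw bytes to dest."""
    session = session or requests.Session()
    response = session.get(url, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx rather than save an error page
    with open(dest, "wb") as fh:
        fh.write(response.content)
    return dest


def load_tweet_json(path, fields=("id", "retweet_count", "favorite_count")):
    """Step 3: parse a file of one JSON object per line into a dataframe."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            # Keep only the requested fields; missing ones become None.
            rows.append({name: tweet.get(name) for name in fields})
    return pd.DataFrame(rows, columns=list(fields))
```

A typical call would be `load_tweet_json("tweet_json.txt")`. Reading line by line avoids loading the whole file as a single JSON document, which would fail because the file is a sequence of objects rather than one array.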
Merge the dataframes into a single dataframe and save it as twitter_archive_master.csv.
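As a rough illustration of the merge, the sketch below joins three toy dataframes on a shared key. The key name `tweet_id`, the other column names, the helper `merge_on_key`, and the choice of an inner join are all assumptions made for the example; the notebook may merge differently.

```python
from functools import reduce

import pandas as pd


def merge_on_key(frames, key="tweet_id"):
    """Inner-join a list of dataframes on one shared key column."""
    return reduce(
        lambda left, right: left.merge(right, on=key, how="inner"),
        frames,
    )


# Toy stand-ins for the three cleaned tables (values are made up).
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "text": ["a", "b", "c"]})
predictions = pd.DataFrame({"tweet_id": [1, 2], "p1": ["pug", "corgi"]})
counts = pd.DataFrame({"tweet_id": [2, 3], "retweet_count": [7, 4]})

master = merge_on_key([archive, predictions, counts])
```

An inner join keeps only the tweets present in all three tables, so only the row with `tweet_id` 2 survives here. Saving the result would then be a single call such as `master.to_csv("twitter_archive_master.csv", index=False)`.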
Most of the work has already been done; you only have to analyze the code and follow the
text and comments provided.
Export the Python notebook as a PDF and submit it. For VS Code, install the Markdown
extension (for comments, text, and PDF).
Don’t include long outputs in the hard copy.
Outputs for the issues (mentioned in the notebook) should be visible.
Project Credit: Nohmie Aguga