
ML Data Wrangling: Assignment 01

The document provides instructions for a machine learning assignment on data wrangling. Students gather three datasets from different sources, assess and clean them, and merge them into a single clean dataframe, saved as a CSV file, for analysis. Students then export their Python notebook, including code, outputs, and analysis, as a PDF and submit it.

Uploaded by shahfaisal gfg
Copyright © All Rights Reserved

University of Management and Technology

Machine Learning
Assignment 01 (CLO-02)
Course Instructor: Aqsa Afzal

Name: Roll Number:

Date: Dated:

Semester: Total Marks: 10

INSTRUCTIONS

* Complete the assignment in groups of two only.
* Late submissions will not be accepted under any circumstances.
* Prepare for a viva session, in which you may be questioned on the content of your assignment.
* Ensure that figures and diagrams are clearly labeled.
* Submit your assignment both in PDF format and as a hard copy.

Instructor Signature

You can get a quick recap of data wrangling from:

[Link]

Steps for the assignment:
* Gathering data from multiple sources
* Assessing the data visually and programmatically to identify quality and tidiness issues
* Ridding each dataframe of every tidiness and quality issue
* Merging the three dataframes into one clean master dataframe
* Analyzing, exploring relationships and visualizing insights from the clean data
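The programmatic assessment and cleaning steps above can be sketched with pandas. This is a minimal illustration on a tiny hypothetical dataframe: the column names (`tweet_id`, `timestamp`, `name`), the sample values, and the particular issues (a duplicated row, the placeholder string "None" standing in for a missing name, timestamps stored as strings) are assumptions for illustration, not taken from the assignment notebook.

```python
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Summarize common quality issues programmatically."""
    return {
        # NaN counts per column (placeholder strings such as "None" are NOT counted)
        "missing": df.isna().sum().to_dict(),
        # Number of rows that exactly duplicate an earlier row
        "duplicated_rows": int(df.duplicated().sum()),
        # Column dtypes, useful for spotting e.g. timestamps stored as strings
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Fix the issues found by assess() on a copy, leaving the raw data untouched."""
    cleaned = df.copy()
    # Quality: drop exact duplicate rows
    cleaned = cleaned.drop_duplicates()
    # Quality (assumed): the string "None" is a placeholder for a missing name
    cleaned["name"] = cleaned["name"].mask(cleaned["name"] == "None")
    # Quality: store timestamps as datetimes instead of strings
    cleaned["timestamp"] = pd.to_datetime(cleaned["timestamp"])
    return cleaned.reset_index(drop=True)


# Hypothetical raw data with one duplicated row
raw = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "timestamp": ["2017-08-01 16:23:56", "2017-07-31 00:18:03",
                  "2017-07-31 00:18:03", "2017-07-30 15:58:51"],
    "name": ["Tilly", "None", "None", "Darla"],
})

print(assess(raw))
clean_df = clean(raw)
print(clean_df)
```

Cleaning a copy keeps the raw dataframe available for comparison, and re-running `assess` on the cleaned frame doubles as a quick test that each fix worked.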

Gathering Data
Three datasets, each in a different format, are gathered in three different ways:
1. Download the [twitter_archive_enhanced.csv]([Link]August/59a4e958_twitter-archive-enhanced/[Link]) file and read it into a pandas dataframe.
2. Programmatically download the second file, '[Link]', from the provided [url here]([Link]predictions/[Link]) using the Requests library.
3. Query additional data from the Twitter API using the Tweepy library, save it to a text file, 'tweet_json.txt', and read it line by line into a pandas dataframe.
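Steps 2 and 3 of the gathering list can be sketched as follows. This is a minimal sketch, not the notebook's code: `download_file` and `read_tweet_json` are hypothetical helper names, and the reader assumes each line of `tweet_json.txt` holds the JSON of one tweet with `id`, `retweet_count`, and `favorite_count` fields (as in Twitter API v1.1 tweet objects). The Tweepy query itself is omitted, and the demo reads a small made-up sample instead of real API output.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def download_file(url: str, path: str) -> Path:
    """Download url to path with the Requests library (step 2)."""
    import requests  # imported here so the reader below works without Requests installed

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    out = Path(path)
    out.write_bytes(response.content)
    return out


def read_tweet_json(path: str) -> pd.DataFrame:
    """Read one JSON tweet object per line into a dataframe (step 3)."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            # Keep only the fields needed later (assumed field names)
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    return pd.DataFrame(rows, columns=["tweet_id", "retweet_count", "favorite_count"])


# Demo on a tiny hypothetical sample instead of real API output
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "tweet_json.txt"
    sample_path.write_text(
        json.dumps({"id": 1, "retweet_count": 10, "favorite_count": 50}) + "\n"
        + json.dumps({"id": 2, "retweet_count": 3, "favorite_count": 7}) + "\n",
        encoding="utf-8",
    )
    tweets = read_tweet_json(str(sample_path))

print(tweets)
```

With the real files, step 1 is simply `pd.read_csv("twitter_archive_enhanced.csv")`, and the URL passed to `download_file` is the one linked in step 2.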

Finally, merge the three dataframes into a single master dataframe and save it as twitter_archive_master.csv.
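The merge into a master dataframe can be sketched with `pandas.merge`, joining on a shared tweet ID. The frames below are small hypothetical stand-ins for the three cleaned datasets; the column names (`tweet_id`, `name`, `breed`, `retweet_count`, `favorite_count`) and the helper `build_master` are assumptions for illustration.

```python
from typing import Optional

import pandas as pd


def build_master(archive: pd.DataFrame, predictions: pd.DataFrame,
                 api_counts: pd.DataFrame, path: Optional[str] = None) -> pd.DataFrame:
    """Merge the three cleaned dataframes on tweet_id and optionally save as CSV."""
    master = (
        archive
        # validate="one_to_one" raises MergeError if tweet_id repeats on either side
        .merge(predictions, on="tweet_id", how="inner", validate="one_to_one")
        .merge(api_counts, on="tweet_id", how="inner", validate="one_to_one")
    )
    if path is not None:
        master.to_csv(path, index=False)  # e.g. "twitter_archive_master.csv"
    return master


# Hypothetical, already-cleaned stand-ins for the three gathered datasets
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "name": ["Tilly", None, "Darla"]})
predictions = pd.DataFrame({"tweet_id": [1, 3, 4], "breed": ["pug", "beagle", "corgi"]})
api_counts = pd.DataFrame({"tweet_id": [1, 2, 3],
                           "retweet_count": [10, 3, 8],
                           "favorite_count": [50, 7, 40]})

master = build_master(archive, predictions, api_counts)
print(master)
```

An inner join keeps only tweets present in all three sources (here IDs 1 and 3); a left join on the archive would instead keep every archived tweet and fill the gaps with NaN. Validating the join catches duplicated IDs that the cleaning step missed.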

Most of the work has already been done; you only need to analyze the code and follow the text and comments provided.

* Export the Python notebook as a PDF and submit it. If you use VS Code, install Markdown support (for the comments, text, and PDF export).
* Do not include long outputs in the hard copy.
* The outputs for the issues mentioned in the notebook must be visible.

Project Credit: Nohmie Aguga
