7001ICT Programming Principles 1
Assignment
School of Information and Communication Technology
Griffith University
Trimester 1, 2018
Due Friday 25 May 2018, 11.59 pm
Goals: To solve some bigger problems with Python 3 programs.
Mark: This will be marked out of 100, contributing 19% of the course total.
Conditions: This assignment is strictly individual; no group or team work is allowed.
Students may discuss the problems with anyone, but may not view or reproduce another person’s code, show their code to or share it with other persons (with the exception of teaching staff), or share it on any online service.
1 Background
Aliens exist! Scientists have intercepted a communications stream from another planet for the first time.
Initial analysis has shown that the data stream is in fact the aliens’ version of Twitter.
The United Nations have decided that we will reach out to these aliens and contact them, but only
after we have learned as much as we can about them. At present we cannot translate their language, but
we can try to learn about them as a society and as individuals by textual analysis of the stream.
2 The data
The data to analyse is saved in two text files:
follows.txt which contains the information about which other users follow each user; and
stream.txt which is the actual alien Twitter stream of tweets.
2.1 User names
In the text files, a user name is a non-empty sequence of alphanumeric characters (length >= 1); for example,
mrmagoo23.
2.2 follows.txt
The file follows.txt contains the follow graph. Each line is of the form
user0<>user1<>user2<>...
where <> represents some whitespace. It means that user0 follows user1, user2, ... If a user doesn’t
follow anyone, the line looks like this:
user0
The lines are in no particular order.
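A minimal sketch of loading follows.txt into a dictionary, assuming each user name starts at most one line (a repeated line is merged rather than reported). The function name read_follows is hypothetical; a real program would pass it an open file, for example open("follows.txt"), rather than the list of sample lines used here.

```python
def read_follows(lines):
    """Build the follow graph from lines in the follows.txt format.

    Each line is "user0 user1 user2 ..." with any whitespace between names,
    meaning user0 follows user1, user2, ... Returns a dict mapping each
    user name to the set of user names that user follows.
    """
    follows = {}
    for line in lines:
        names = line.split()             # split() accepts any run of whitespace
        if not names:
            continue                     # ignore blank lines
        user, followed = names[0], names[1:]
        follows.setdefault(user, set()).update(followed)
    return follows


sample = ["andrew sally\tmrmagoo23\n", "sally andrew\n", "mrmagoo23\n"]
graph = read_follows(sample)
print(sorted(graph["andrew"]))  # ['mrmagoo23', 'sally']
print(graph["mrmagoo23"])       # set()
```

A user who follows nobody still gets an entry, with an empty set, so every user named at the start of a line is known to the program.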
2.3 stream.txt
The file stream.txt contains the Twitter stream in chronological order. Each line is one of:
ordinary tweet
user0<>any text...
where user0 is the username of the author of the tweet, and any text... is any sequence of
characters terminated by the end of the line, the content of the tweet.
retweet
user0<>RT<>@user1<>any text...
where user0 is the username of the user who has retweeted an original tweet authored by user1.
direct message
user0<>DM<>@user1<>any text...
where user0 is the author of a private message, normally seen only by user1.
Within the content of any tweet (any text...) can appear mentions of other users. Mentions start with
@ followed by a user name. The user name is terminated by the end of the line or any non-alphanumeric
character. Any user mentioned will see that tweet, even if it is a direct message. A tweeter might make
a mistake, so a mentioned user, or direct message addressee, might not actually exist.
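The line formats above can be recognised with str.split and a regular expression. This sketch assumes that "alphanumeric" means the ASCII letters and digits, and that a line whose second word is RT or DM but whose third word is not a valid @name is treated as an ordinary tweet; parse_tweet and MENTION are hypothetical names.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
MENTION = re.compile(r"@([A-Za-z0-9]+)")


def parse_tweet(line):
    """Split one stream.txt line into (author, kind, target, text).

    kind is "RT", "DM" or "tweet"; target is the retweeted author or the
    DM addressee, or None for an ordinary tweet. Returns None for a blank line.
    """
    head = line.split(None, 1)           # author, then everything else
    if not head:
        return None
    author = head[0]
    rest = head[1].rstrip() if len(head) > 1 else ""
    parts = rest.split(None, 2)          # possible "RT"/"DM", "@user1", text
    if len(parts) >= 2 and parts[0] in ("RT", "DM"):
        target = MENTION.fullmatch(parts[1])
        if target:
            text = parts[2] if len(parts) == 3 else ""
            return author, parts[0], target.group(1), text
    return author, "tweet", None, rest   # ordinary tweet: the whole rest is text


print(parse_tweet("sally RT @andrew hello @bob!\n"))
# ('sally', 'RT', 'andrew', 'hello @bob!')
print(MENTION.findall("hello @bob! and @mrmagoo23."))
# ['bob', 'mrmagoo23']
```

Because the pattern only accepts letters and digits, a mention such as @bob! stops at the "!", as the specification requires.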
3 Tasks
Your tasks for this assignment are to write Python programs that analyse one or more of the data files to
answer some questions we have about the alien Twitter users.
3.1 Task 1: Who are the most popular users? (10 marks)
A user’s popularity is measured by how many other users follow that user. Write a program that prints the user
name of the most popular user. In the case of a tie, print the user names of all the most popular users,
one per line, in lexicographic order. Sample output:
andrew
mrmagoo23
sally
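A sketch of Task 1, assuming the follow graph has already been loaded into a dictionary mapping each user to the set of users they follow. It also assumes that a user following themselves does not add to their popularity, and that Python's default string ordering (by code point, so upper-case letters sort before lower-case ones) is acceptable as lexicographic order. most_popular is a hypothetical name.

```python
from collections import Counter


def most_popular(follows):
    """Return the names of the most-followed users in lexicographic order.

    follows maps each user name to the set of user names that user follows.
    """
    followers = Counter()                # user name -> number of followers
    for user, followed in follows.items():
        for other in followed:
            if other != user:            # assumption: self-follows are ignored
                followers[other] += 1
    everyone = set(follows) | set(followers)
    if not everyone:
        return []
    best = max(followers[name] for name in everyone)   # Counter gives 0 if absent
    return sorted(name for name in everyone if followers[name] == best)


follows = {
    "andrew": {"sally", "mrmagoo23"},
    "sally": {"andrew", "mrmagoo23"},
    "mrmagoo23": {"andrew", "sally"},
    "john": set(),
}
for name in most_popular(follows):
    print(name)
```

With this made-up graph each of andrew, mrmagoo23 and sally has two followers, so the loop prints the three names of the sample output.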
3.2 Task 2: Who are the top parrots? (20 marks)
Write a program that prints the usernames of the top n parrots and how many tweets each of them retweeted,
where n is a number the program prompts the user to enter. The top parrots are the users who retweeted the
most tweets. Sample output for n = 5:
1034 mrwordy
999 chatty3 d555lucy
10 blabberMcBlabberFace john mrmagoo23
Note:
• how the output is formatted in a table with justified columns;
• that users with the same numbers of tweets were printed in lexicographic order; and
• that even though only the top 5 were requested, because of a tie, more than 5 were printed.
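The tie rule above amounts to adding whole groups of equal counts, highest first, until at least n users have been listed. The sketch below (hypothetical names count_retweets, top_n_rows and format_rows) counts a retweet whenever the second word of a stream line is RT, so users who never retweeted are simply absent; in a real program n would come from something like int(input(...)) rather than being fixed.

```python
from collections import Counter


def count_retweets(lines):
    """Return a Counter mapping each user name to their number of retweets."""
    counts = Counter()
    for line in lines:
        words = line.split(None, 2)      # author, possible "RT", rest of line
        if len(words) >= 2 and words[1] == "RT":
            counts[words[0]] += 1
    return counts


def top_n_rows(counts, n):
    """Return (count, sorted names) rows, highest count first.

    Whole groups of equal counts are kept until at least n users are listed,
    so a tie can make the result longer than n users.
    """
    groups = {}                          # count -> list of user names
    for user, count in counts.items():
        groups.setdefault(count, []).append(user)
    rows, listed = [], 0
    for count in sorted(groups, reverse=True):
        if listed >= n:
            break
        rows.append((count, sorted(groups[count])))
        listed += len(groups[count])
    return rows


def format_rows(rows):
    """Right-justify the counts so that the name column lines up."""
    width = max((len(str(count)) for count, _ in rows), default=0)
    return [f"{count:>{width}} {' '.join(names)}" for count, names in rows]


# Sample counts; a real program would use count_retweets(open("stream.txt"))
# and n = int(input("How many parrots? ")).
counts = {"mrwordy": 1034, "chatty3": 999, "d555lucy": 999,
          "john": 10, "mrmagoo23": 10, "blabberMcBlabberFace": 10, "zed": 3}
for row in format_rows(top_n_rows(counts, 5)):
    print(row)
```

With n = 5 this prints the three rows of the sample output, six names in all, because of the three-way tie at 10; zed is never reached.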
3.3 Task 3: Who are the worst trolls? (20 marks)
Write a program that prints the usernames of the worst n trolls and how many times they mentioned other
users. n is a number prompted for and entered by the user. The worst trolls are those who mentioned
the most other users in their tweets, including addressees in direct messages and the authors of retweeted
tweets. Format the output in the same manner as for task 2.
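For Task 3, the addressee of a direct message and the author of a retweeted tweet both appear as an @name right after DM or RT, so counting every @name after the author's name covers all three cases. This sketch assumes that each occurrence counts (mentioning the same user twice counts twice), that mentions of non-existent users still count, and that users mentioning themselves do not; count_mentions is a hypothetical name. The counts can then be ranked and printed as in Task 2.

```python
import re
from collections import Counter

# Assumption: "alphanumeric" means ASCII letters and digits only.
MENTION = re.compile(r"@([A-Za-z0-9]+)")


def count_mentions(lines):
    """Return a Counter mapping each author to the number of mentions they made.

    The "@user1" after RT or DM is matched like any other mention, so retweeted
    authors and DM addressees are counted too. Self-mentions are skipped
    (assumption); mentions of users who do not exist still count.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)      # author, rest of the line
        if len(parts) < 2:
            continue                     # blank line or empty tweet
        author, rest = parts
        for name in MENTION.findall(rest):
            if name != author:
                counts[author] += 1
    return counts


stream = ["sally DM @andrew hi @john\n",
          "john RT @sally wow @sally\n",
          "andrew talking to @andrew\n"]
print(sorted(count_mentions(stream).items()))  # [('john', 2), ('sally', 2)]
```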
3.4 Task 4: Who are the biggest influencers? (30 marks)
Write a program that prints the usernames of the top n users whose tweets are, on average, seen by the most
users, together with that average number of viewers per tweet. A user will see a tweet if it comes from a
user they follow, or if they are mentioned in the tweet, including as the addressee of a direct message or
as the author of a retweeted tweet. Sample output (note the formatting):
1034.6 popidol34 vacuousTwit
999.0 smartymcsmartyface
10.4 chatty3 d555lucy zippo
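A sketch of Task 4 under several stated assumptions: the set of existing users is everyone named in follows.txt, so mentions of other names are ignored; a direct message is seen only by the users it mentions, not by the author's followers, following the description of direct messages in section 2.3; authors do not count as viewers of their own tweets; and each viewer is counted once per tweet. Only exactly equal averages are treated as ties. The function names average_audience and format_top are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: "alphanumeric" means ASCII letters and digits only.
MENTION = re.compile(r"@([A-Za-z0-9]+)")


def average_audience(lines, follows):
    """Return a dict mapping each author to the mean number of viewers per tweet.

    follows maps each user name to the set of user names that user follows.
    """
    users = set(follows)                 # every user known from follows.txt
    followers = defaultdict(set)         # user name -> set of their followers
    for user, followed in follows.items():
        users |= followed
        for other in followed:
            followers[other].add(user)
    total = defaultdict(int)             # author -> viewers summed over tweets
    tweets = defaultdict(int)            # author -> number of tweets
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue                     # ignore blank lines
        author = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        viewers = set(MENTION.findall(rest)) & users   # drop non-existent names
        if rest.split(None, 1)[:1] != ["DM"]:
            viewers |= followers[author]   # assumption: DMs skip followers
        viewers.discard(author)            # assumption: authors are not viewers
        total[author] += len(viewers)
        tweets[author] += 1
    return {author: total[author] / tweets[author] for author in tweets}


def format_top(averages, n):
    """Format the top n averages (with ties) to one decimal place."""
    groups = defaultdict(list)           # average -> user names
    for user, average in averages.items():
        groups[average].append(user)
    rows, listed = [], 0
    for average in sorted(groups, reverse=True):
        if listed >= n:
            break
        rows.append((f"{average:.1f}", sorted(groups[average])))
        listed += len(groups[average])
    width = max((len(text) for text, _ in rows), default=0)
    return [f"{text:>{width}} {' '.join(names)}" for text, names in rows]


follows = {"popidol34": set(), "vacuousTwit": set(), "fan3": {"vacuousTwit"},
           "fan1": {"popidol34", "vacuousTwit"}, "fan2": {"popidol34"}}
stream = ["popidol34 hello all\n", "popidol34 look @fan3\n",
          "vacuousTwit hi\n", "vacuousTwit yo @fan2\n",
          "fan1 DM @popidol34 love you\n", "fan2 ok @nobody\n"]
for row in format_top(average_audience(stream, follows), 3):
    print(row)
```

With this made-up data, popidol34 and vacuousTwit both reach 5 viewers over 2 tweets, so they tie at 2.5 on the first row, followed by fan1 at 1.0.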
4 Report (10 marks)
Prepare a report in PDF format that:
• identifies:
– the name of the course;
– the year and trimester;
– the title of the assessment item;
– your student number;
– your name; and
– the name of your lab tutor;
• and for each task:
– state the name(s) of the program file(s) for this task;
– outline in English how your program is supposed to work, particularly with reference to the
data structures it uses;
– state whether or not you think your program works; and
– include a screenshot of your program’s results, even if that is just an error message.
There is no set minimum or maximum length for this report, but clarity, completeness, and brevity will
be appreciated.
5 Code presentation (10 marks)
Your code will be assessed on its readability as well as its correctness. Use commenting to:
• identify the task;
• document the purpose and types of variables;
• document your functions’ purposes, parameters and returned values; and
• explain the purpose of sections of your code.
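As an illustration of these conventions only, here is a small hypothetical helper (not required by any task) that checks the user-name rule from section 2.1, assuming names must be non-empty and use only ASCII letters and digits. It needs Python 3.7 or later for str.isascii.

```python
# Task: shared helper (hypothetical), checks the user-name rule of section 2.1.

def is_user_name(text):
    """Report whether text is a valid user name.

    Parameters:
        text (str): candidate user name, without a leading "@".

    Returns:
        bool: True if text is non-empty and contains only ASCII letters
        and digits, False otherwise.
    """
    # isalnum() is False for "" but would accept letters such as "é",
    # so it is combined with isascii() to match the assumed rule.
    return text.isascii() and text.isalnum()


# names: list[str], candidate names taken from a sample line
names = "andrew mr_magoo sally".split()
# valid: list[str], the names that pass the check, in their original order
valid = [name for name in names if is_user_name(name)]
print(valid)  # ['andrew', 'sally']
```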
Assume the reader of your program knows Python at least as well as you do.
6 Submission
Submit your assignment in a zip file to Learning@Griffith. The zip file should only contain:
• your programs as text files;
• your PDF report.