Week 09 Tutorial Sample Answers
1. Below are the current assignment autotests.
Discuss what these print and why:
subset 0: quit
seq 42 44 | 2041 slippy 1q
ANSWER:
42
1 is the address
q is the command
The address 1 selects the first line.
The command q quits.
So 1q will quit on the first line.
slippy prints the current line before it quits,
giving us a single line of output: whatever the first line is.
In this case, the first line is 42 .
2041 slippy 10q < [Link]
ANSWER:
...
...
the first 10 lines of the [Link] file
10 is the address
q is the command
The address 10 selects the 10th line.
The command q quits.
So 10q will quit on the 10th line.
slippy prints the current line before it quits,
giving us 10 lines of output: the first 10 lines.
seq 41 43 | 2041 slippy 4q
ANSWER:
41
42
43
4 is the address
q is the command
The address 4 selects the 4th line.
The command q quits.
So 4q would quit on the 4th line,
but as there are only 3 lines of input, slippy hits EOF first.
Therefore, all three input lines are printed:
the q command never gets the chance to run.
seq 90 110 | 2041 slippy /.1/q
ANSWER:
90
91
/.1/ is the address
q is the command
The address /.1/ selects any line that matches the regex .1 .
The command q quits.
The line 91 matches the regex .1 ,
as the . (any character) matches the 9 .
So we quit on the line 91 ,
and as always slippy prints the current line before quitting.
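We can double-check which line the address matches first with Python's re module:

import re

for number in range(90, 111):          # the lines produced by seq 90 110
    if re.search(r".1", str(number)):
        print(number)                  # 91: the first line matching .1
        break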
2041 slippy '/r.*v/q' < [Link]
ANSWER:
...
...
aardvark
/r.*v/ is the address
q is the command
The address /r.*v/ selects any line that matches the regex r.*v .
The command q quits.
Depending on the contents of the [Link] file, the lines printed may differ.
For my dictionary, 247 lines were printed before aardvark matched the regex r.*v .
All lines up to and including aardvark are printed.
yes | 2041 slippy 3q
ANSWER:
y
y
y
Note: the yes command will print y infinitely.
3 is the address
q is the command
The address 3 selects the 3rd line.
The command q quits.
Because yes prints forever, slippy can't wait until EOF to stop.
slippy also can't read all input lines into an array.
slippy must process lines one at a time.
Note: because of the $ address (the last line) slippy needs to read two lines at a time:
the current line and the next line (to detect when there is no next line).
slippy should not store more than two lines in memory at any time.
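As a rough sketch (not the reference implementation), a streaming loop with one line of lookahead could look like this; only the q command from this example is shown, and is_last_line is unused but shows where the $ address would be checked:

import sys

def process(stream):
    line = stream.readline()            # the current line
    line_number = 0
    while line:
        next_line = stream.readline()   # one line of lookahead
        line_number += 1
        is_last_line = next_line == ""  # how the $ address is detected
        print(line, end="")             # the automatic print of the current line
        if line_number == 3:            # the address 3 of the command 3q
            break                       # q: quit after printing the current line
        line = next_line

process(sys.stdin)

Run against yes, this prints three y lines and exits without waiting for the infinite input to end.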
subset 0: print
seq 41 43 | 2041 slippy 2p
ANSWER:
41
42
42
43
2 is the address
p is the command
The address 2 selects the 2nd line.
The command p prints the current line.
So 2p will print on the second line,
in addition to the automatic printing of the current line that slippy already does.
This causes the second line to be printed twice.
head [Link] | 2041 slippy 3p
ANSWER:
The third line of the [Link] file is printed twice.
seq 41 43 | 2041 slippy -n 2p
ANSWER:
42
The -n option suppresses (turns off) the automatic printing of the current line.
Therefore slippy only prints when explicitly asked to.
We ask it to print the second line, and so get the single line of output 42 .
2041 slippy -n 42p < [Link]
ANSWER:
Similar to the previous example:
only the 42nd line of the [Link] file is printed.
head -n 1000 [Link] | 2041 slippy -n '/z.$/p'
ANSWER:
Similar to the previous example:
a line is only printed if it matches the regex z.$ ,
that is, if the second-last character on the line is a z .
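A minimal sketch of how the p command and the -n option fit together, using this example's regex address (auto_print is a made-up name, not from the spec):

import re
import sys

auto_print = False  # False because -n was given

for line in sys.stdin:
    if re.search(r"z.$", line):  # the address /z.$/
        print(line, end="")      # p: explicit print of the current line
    if auto_print:
        print(line, end="")      # the automatic print, suppressed by -n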
subset 0: substitute
seq 1 5 | 2041 slippy 's/[15]/zzz/'
ANSWER:
Run the substitute command on each line:
replace the first occurrence of 1 or 5 on the line with zzz .
seq 1 5 | 2041 slippy 's/[15]/zzz/g'
ANSWER:
Run the substitute command on each line:
replace every occurrence of 1 or 5 with zzz .
echo "Hello Andrew" | 2041 slippy 's/e//'
ANSWER:
Run the substitute command on each line:
replace the first occurrence of e on the line with the empty string.
echo "Hello Andrew" | 2041 slippy 's/e//g'
ANSWER:
Run the substitute command on each line:
replace every occurrence of e with the empty string.
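In Python the difference between s/e// and s/e//g is just the count argument to re.sub, which is presumably how slippy implements it:

import re

line = "Hello Andrew"
print(re.sub(r"e", "", line, count=1))  # s/e//  prints: Hllo Andrew
print(re.sub(r"e", "", line))           # s/e//g prints: Hllo Andrw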
subset 1: addresses
seq 1 5 | 2041 slippy '$d'
ANSWER:
$ is the special address for the last line.
d is the delete command.
If a line is deleted then processing immediately moves on to the next line.
The line is not automatically printed.
seq 42 44 | 2041 slippy 2,3d
ANSWER:
2,3 is a range address.
The command is applied to all lines within the range (start and end line inclusive).
seq 10 21 | 2041 slippy 3,/2/d
ANSWER:
10
11
21
Similar to the previous example, but the end of the range is a regex.
The range starts at line 3 (the line 12 ) and, as in sed, the end regex is searched from the line after the start:
the first later line matching the regex 2 is the line 20 .
So the lines 12 through 20 are deleted, leaving 10, 11 and 21.
seq 10 21 | 2041 slippy /2/,7d
ANSWER:
10
11
17
18
19
Similar to the previous example, but the start of the range is a regex.
The range starts at the first line matching the regex 2 (the line 12 ) and ends at line 7 (the line 16 ),
so the lines 12 through 16 are deleted.
After the range closes, the start regex can match again:
the lines 20 and 21 each match 2 , and as line 7 has already passed, each matches on its own and is deleted.
seq 10 21 | 2041 slippy /2/,/7/d
ANSWER:
10
11
18
19
Similar to the previous examples, but both ends of the range are regexes.
The range starts at the first line matching the regex 2 (the line 12 ) and ends at the first later line matching the regex 7 (the line 17 ),
so the lines 12 through 17 are deleted.
The start regex then matches again on the line 20 , and as no later line matches 7 ,
the range runs to the end of the input: 20 and 21 are also deleted.
The sketch below shows one way to implement this with an "in range" flag.
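A sketch of the "in range" flag for the last example, assuming slippy mirrors sed's range semantics:

import re
import sys

in_range = False

for line in sys.stdin:
    if not in_range:
        # range is closed: check the start address /2/
        matched = bool(re.search(r"2", line))
        in_range = matched
    else:
        # range is open: this line is in the range;
        # the end address /7/ closes the range after this line
        matched = True
        if re.search(r"7", line):
            in_range = False
    if matched:
        continue                 # d: delete the line (skip the automatic print)
    print(line, end="")

Note that the end address is never tested on the line that opened the range, which is why the end regex is effectively searched from the following line.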
subset 1: substitute
seq 1 5 | 2041 slippy 'sX[15]XzzzX'
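This is equivalent to the earlier s/[15]/zzz/ example: as in sed, the character after s (here X ) becomes the delimiter. Parsing an arbitrary delimiter might look like this (parse_substitute is a made-up helper, and this sketch ignores escaped delimiters):

def parse_substitute(command):
    delimiter = command[1]  # the character after "s"
    regex, replacement, modifiers = command[2:].split(delimiter)
    return regex, replacement, modifiers

print(parse_substitute("sX[15]XzzzX"))  # ('[15]', 'zzz', '')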
subset 1: multiple commands
seq 1 5 | 2041 slippy '4q;/2/d'
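The commands are separated by ; and applied in order to each line: /2/d deletes the line 2 , and 4q quits on (and prints) the 4th line, so the output is 1, 3 and 4. A naive split is enough for this example, though a real parser must not split inside a regex:

script = "4q;/2/d"
commands = script.split(";")  # naive: breaks if a regex contains ";"
print(commands)               # ['4q', '/2/d']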
subset 1: -f
echo "4q" > [Link]
echo "/2/d" >> [Link]
seq 1 5 | 2041 slippy -f [Link]
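The -f option reads the commands from the named file, one per line, so this behaves exactly like the previous example and prints 1, 3 and 4. Reading the script file is simple (the filename below is a stand-in, as the real name is elided above):

with open("commands.slippy") as f:           # stand-in filename
    commands = [line.strip() for line in f]  # ["4q", "/2/d"]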
subset 1: input files
seq 1 2 > [Link]
seq 1 5 > [Link]
2041 slippy '4q;/2/d' [Link] [Link]
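With filenames on the command line, slippy reads the files in order as one stream of lines (here 1 2 1 2 3 4 5) and applies the commands to that stream, so the output is 1, 1 and 2: the 4th line overall is a 2 , and 4q prints it and quits before /2/d gets a chance to delete it. Python's fileinput module is a convenient way to treat several files (or stdin when none are given) as one stream; slippy need not use it, but this sketch shows the idiom:

import fileinput

# iterate over the files named on the command line, or stdin if none
for line in fileinput.input():
    print(line, end="")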
subset 1: whitespace
seq 24 42 | 2041 slippy ' 3, 17 d # comment'
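Whitespace is allowed around addresses and commands, and # starts a comment, so this is equivalent to 3,17d: lines 3 through 17 (the values 26 through 40 ) are deleted, printing 24, 25, 41 and 42. Stripping the decoration could look like this (good enough for this example only; whitespace inside an s command's replacement, or a # inside a regex, would need real parsing):

import re

command = " 3, 17 d # comment"
command = re.sub(r"#.*", "", command)  # drop the trailing comment
command = re.sub(r"\s+", "", command)  # drop the whitespace
print(command)                         # 3,17d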
subset 2: -i
seq 1 5 > [Link]
2041 slippy -i /[24]/d [Link]
cat [Link]
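The -i option writes the output back to each input file instead of stdout: the slippy command itself prints nothing, and cat shows the file now contains 1, 3 and 5 (the lines matching /[24]/ are gone). A simple read-then-rewrite sketch (the filename is a stand-in for the elided one above):

import re

filename = "numbers.txt"  # stand-in filename

with open(filename) as f:
    lines = f.readlines()                 # read everything before rewriting the file

with open(filename, "w") as f:
    for line in lines:
        if not re.search(r"[24]", line):  # /[24]/d deletes matching lines
            f.write(line)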
subset 2: multiple commands
echo 'Punctuation characters include . , ; :' | 2041 slippy 's/;/semicolon/g;/;/q'
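The commands run in order on the current, possibly modified, line: s/;/semicolon/g replaces the ; first, so when /;/q is tested the line no longer contains a ; and slippy does not quit. The output is the single line Punctuation characters include . , semicolon : . A quick demonstration of the ordering:

import re

line = "Punctuation characters include . , ; :"
line = re.sub(r";", "semicolon", line)  # s/;/semicolon/g runs first
print(re.search(r";", line))            # None: /;/q no longer matches
print(line)                             # Punctuation characters include . , semicolon :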
2. Write a Python program, [Link] which given the URL of a web page fetches it by running wget and prints the HTML
tags it uses.
The tags should be converted to lower case and printed in alphabetical order, each with a count of how often it is used.
Don't count closing tags.
Make sure you don't print tags within HTML comments.
$ ./[Link] [Link]
a 141
body 1
br 14
div 161
em 3
footer 1
form 1
h2 2
h4 3
h5 3
head 1
header 1
hr 3
html 1
img 12
input 5
li 99
link 3
meta 4
noscript 1
p 18
script 14
small 3
span 3
strong 4
title 1
ul 25
Note the counts in the above example will not be current - the CSE pages change almost daily.
ANSWER:
#! /usr/bin/env python3
# written by Nasser Malibari and Dylan Brotherston
# fetch the specified web page and count the HTML tags in it

import sys, re, subprocess
from collections import Counter

def main():
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <url>", file=sys.stderr)
        sys.exit(1)

    url = sys.argv[1]

    process = subprocess.run(["wget", "-q", "-O-", url], capture_output=True, text=True)
    # lower-case everything so tag names are counted case-insensitively
    webpage = process.stdout.lower()

    # remove comments
    webpage = re.sub(r"<!--.*?-->", "", webpage, flags=re.DOTALL)

    # get all tags (closing tags don't match, as "/" is not a \w character)
    # note: use of capturing in re.findall returns a list of the captured part
    tags = re.findall(r"<\s*(\w+)", webpage)

    # using collections.Counter; alternatively a dict can be used to count
    tags_counter = Counter()
    for tag in tags:
        tags_counter[tag] += 1

    for tag, counter in sorted(tags_counter.items()):
        print(f"{tag} {counter}")

if __name__ == "__main__":
    main()
3. Add an -f option to [Link] which indicates the tags are to be printed in order of frequency.
$ ./[Link] -f [Link]
head 1
noscript 1
html 1
form 1
title 1
footer 1
header 1
body 1
h2 2
hr 3
h4 3
span 3
link 3
small 3
h5 3
em 3
meta 4
strong 4
input 5
img 12
br 14
script 14
p 18
ul 25
li 99
a 141
div 161
ANSWER:
#! /usr/bin/env python3
# written by Nasser Malibari and Dylan Brotherston
# fetch the specified web page and count the HTML tags in it

import re, subprocess
from collections import Counter
from argparse import ArgumentParser

def main():
    parser = ArgumentParser()
    parser.add_argument("-f", "--frequency", action="store_true", help="print tags by frequency")
    parser.add_argument("url", help="url to fetch")
    args = parser.parse_args()

    process = subprocess.run(["wget", "-q", "-O-", args.url], capture_output=True, text=True)
    # lower-case everything so tag names are counted case-insensitively
    webpage = process.stdout.lower()

    # remove comments
    webpage = re.sub(r"<!--.*?-->", "", webpage, flags=re.DOTALL)

    # get all tags
    # note: use of capturing in re.findall returns a list of the captured part
    tags = re.findall(r"<\s*(\w+)", webpage)

    # using collections.Counter; alternatively a dict can be used to count
    tags_counter = Counter()
    for tag in tags:
        tags_counter[tag] += 1

    if args.frequency:
        # most_common() sorts most frequent first; reversed gives ascending order
        for tag, counter in reversed(tags_counter.most_common()):
            print(f"{tag} {counter}")
    else:
        for tag, counter in sorted(tags_counter.items()):
            print(f"{tag} {counter}")

if __name__ == "__main__":
    main()
4. Modify [Link] to use the requests and beautifulsoup4 modules.
ANSWER:
#! /usr/bin/env python3
# written by Dylan Brotherston
# fetch the specified web page and count the HTML tags in it

from collections import Counter
from argparse import ArgumentParser

import requests
from bs4 import BeautifulSoup

def main():
    parser = ArgumentParser()
    parser.add_argument("-f", "--frequency", action="store_true", help="print tags by frequency")
    parser.add_argument("url", help="url to fetch")
    args = parser.parse_args()

    response = requests.get(args.url)
    webpage = response.text.lower()

    # html5lib parses the page the same way a web browser would,
    # and comments become Comment nodes rather than tags
    soup = BeautifulSoup(webpage, "html5lib")

    # find_all() with no arguments returns every tag in the page
    tags = soup.find_all()
    names = [tag.name for tag in tags]

    tags_counter = Counter()
    for tag in names:
        tags_counter[tag] += 1

    if args.frequency:
        # most_common() sorts most frequent first; reversed gives ascending order
        for tag, counter in reversed(tags_counter.most_common()):
            print(f"{tag} {counter}")
    else:
        for tag, counter in sorted(tags_counter.items()):
            print(f"{tag} {counter}")

if __name__ == "__main__":
    main()
5. If you feel like a harder challenge after finishing the challenge activity in the lab this week, have a look at the following
websites for some problems to solve using regexps:
◦ [Link]
◦ [Link]