Lab Linux Complete
1. Course organization
2. What is Linux
3. Key concepts of Linux
4. First steps inside Linux
5. Source code versioning essentials
6. First steps inside GitLab
7. Submitting assignments
8. Wrapping up
9. Post-class tasks (deadline: March 5)
10. Learning outcomes
Course organization
Because this is a highly practically oriented course, we put great stress on preparation before class.
We strongly believe that you can learn the topics of this course only by trying them yourself. Doing that
during the labs is often not the best course of action, as it allows you to pass the lab without doing much
yet with the feeling of a good understanding. That feeling is usually proved wrong when you complete
the graded assignment after the lab.
Therefore, before coming to the labs you will study the materials yourself, and the lab will
be used to (a) discuss your solutions, (b) answer your questions, and (c) work through more examples.
We will publish lab contents several weeks ahead so you can organize your time as you see fit. Before
coming to the lab, you will submit a set of graded tasks to verify that you understood the basic
principles correctly. We will look at them during the labs and discuss anything unclear, thus strengthening
your knowledge.
There will also be a shorter task that you complete after the class so that you can demonstrate that
you have learned from the lab itself (e.g., from the feedback on your before-class tasks).
Take this as a brief overview; the course organization is described in more detail on a separate page.
This lab is somewhat special as it is the very first one, hence there are no before-class tasks.
However, starting from next week there will be graded tasks that you are supposed to complete
before coming to the next lab.
In this lab we will cover basic concepts of Linux, source code versioning and also how to submit graded
tasks.
Purpose of this course
The purpose of this course is not only to show you a different operating system but also to show you
a different style of work.
We expect that after this course, you will be able to do the following:
• Use Linux as a user for your everyday work. This includes the activities of a normal user, such as reading
e-mail, as well as those of a power user who can really control their machine.
• Use typical Linux tools with ease. We will not spend our time on common software such as a web
browser or image editor that you can find virtually anywhere, but focus on tools that are closer to the
system itself.
• Automate your work a lot. You will learn that many everyday tasks can be simplified by writing small
programs that automate them. Linux offers the right environment for this.
On the other hand, this course does not cover machine administration (except for the fundamentals
required to maintain your laptop) or compiling your own kernel.
What is Linux
Under the term Linux we mean the operating system and software that is typically available on such
system. This includes – but is not limited to – development tools (compilers etc.), graphical
environment, text editors, spreadsheet software, web browsers etc.
Note that for simplicity – both in writing and in speaking – we use the term Linux to name the whole
environment we will be working in.
Strictly speaking, the name Linux refers only to the kernel of the operating system – i.e. the bottom
layer of the software stack (the applications are considered the top layers).
The whole environment is often called GNU/Linux to emphasize that it combines the Linux kernel with
GNU and other free software.
You will also often hear the term Linux distribution. That is a fancy name for a packaging of the Linux
kernel (i.e. the lowest layer of the software stack) and user applications. There are hundreds of
distributions available, some differ only in the default wallpaper, some are specific for a certain
domain (e.g., network testing).
Their fundamental differences are mostly on maintenance level, e.g. how software is installed or how
the system is configured. Most of the time end users do not need to care at all which distribution they
use.
We will be using Fedora which is a generic distribution that can run on servers as well as on desktops.
If you are new to Linux, we strongly recommend staying with our choice and using our installation.
Although many Linux concepts, as well as much of the software available on Linux, are also present in other
operating systems, Linux provides them in one nicely integrated package. We also believe that only Linux
provides an environment for their seamless integration.
Key concepts of Linux
Here we list the key concepts of a Linux environment. Take it as an overview only, we will provide
further details in subsequent labs.
Linux uses open-source software (OSS). That means that you are free to inspect how things are
implemented. You are also free to change the implementation. Do not underestimate this aspect. It is
really important. And as a sidenote: even with OSS one can earn money.
Linux is extremely flexible and customizable. You can run Linux on IoT devices as well as on heavy-
duty routers. Linux is running on cell-phones as well as supercomputers. The user can configure
virtually anything. Traditionally, configuration is stored as plain text files. While it is often possible to
edit the configuration from a GUI-based tool, Linux always allows the user to edit the file manually.
An advanced user virtually needs only a good text editor to configure the whole system.
Linux also has a graphical user interface. But it is an optional part of the system as it is not always
needed. Server-style machines do not need any movable windows to operate. And when you need
the GUI, you can choose from many types to best suit your needs. From the system perspective, GUI
is just another application running in the system, not a part of the system.
Linux excels when controlled through a command-line interface (CLI). While entering textual
commands might seem a very obsolete way of controlling your machine, it is not. After all, most
programming languages are still based on textual source code. And CLI has many advantages over a
GUI: it is explicit and easily automated. It is also perfect for remote access as it is very modest on
resources.
Probably the most important concept is that everything is a file. This means that even your devices
– such as the hard-drive – are available as normal files that can be read or written. This actually
simplifies implementation of the tools and it enables fuller control over the system. But it does not
stop with devices: even information about your system – such as the list of running programs – is available
as the contents of a special file. This is a great thing for a programmer: you need virtually nothing more
than the file API (such as Python's with open("filename", "r") as f:) to get all the information about the system.
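For example, the following minimal sketch reads how long the system has been running from the special file /proc/uptime (its first field is the uptime in seconds):

# read the system uptime from the special file /proc/uptime
with open("/proc/uptime", "r") as f:
    uptime_seconds = float(f.read().split()[0])
print(f"Up for about {uptime_seconds:.0f} seconds")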
Linux is by default a multi-user system. Not only does it allow setting up multiple user accounts, but multiple
users can use the system at the same time. They can be connected remotely, and it is even possible for two
people to use one machine simultaneously, given a dual-head graphics card and two keyboards.
Linux also prides itself on remote access support. As long as your system is connected to a network,
you can configure it to be remotely accessible. This simplifies management of server machines, but it
can be useful even for your laptop left at home. While remote CLI access is usually preferred, it is
possible to connect graphically, too. Actually, you can even connect graphically in several instances at
once, each using a different environment.
Linux simplifies the management of installed software through packages. Package repositories can be roughly
compared to the application stores you may know from your cell phone. They simplify installation as you do
not need to click through any installation wizard, and package managers also keep your system up to
date.
Equally important is the concept that the user is in control of the machine. The philosophy is that
the user is smart enough to control the computer: Linux does virtually nothing without an explicit
action and does not hide information from you. You can configure it to do things automatically, but
that is always a layer on top of the base system. You do not need special tools to look inside.
This may sound scary, but it is actually fun. You will (not might) understand how computers and
software work much better if you use and explore Linux.
Here we assume that you have your USB disk ready or you have your virtual machine running. Please,
refer to another page on how to actually boot your machine to Linux.
Return to this page once you boot from the USB (or using any other method mentioned) to continue
with this lab.
Feel free to bring your laptop to the lab and let us help you with the booting.
Once you boot from the USB disk, you can choose which desktop environment you will use.
On most operating systems, there are not many options for how to control your graphical interface.
With Linux, there is a much wider choice, ranging from rich environments with plenty of eye candy
to very austere ones that do not even employ a mouse. Of course, there are dozens of environments
somewhere in between.
Recall that this is easily possible, because the GUI is actually controlled by a normal application: it is
not hard-wired into the system.
Openbox and i3 are special environments as they do not contain the traditional task bar with a list of
windows and they require a bit more patience before they are mastered. On the other hand, the time
investment, especially for i3 that is driven by keyboard only, pays back in a much more efficient usage
of your computer.
We encourage you to try all of them. Log in to each environment, figure out how applications are
launched, and decide which one you like the most. Note that the environments can be further
customized – from the overall color scheme to keyboard shortcuts.
If you are unable to decide, Plasma is a good choice for ex-Windows users with decent hardware.
Choose LXDE if your machine is less powerful. And after a month of using these, switch to i3 to
become a true power user.
Once you decide on your desktop environment, look around for other applications you will need.
Above all, look around for the text editors available. There are several popular graphical editors
already installed as you can see on the following screenshot.
Note that other editors are available from the command-line: we will talk about these during the next
lab.
Source code versioning essentials
We will now switch to a side track and talk about software projects in general.
Modern software is rarely a product of one-man teams. Rather, it is developed by large teams that
can span several time zones or even continents.
Development in such teams requires that all developers have access to the (most up-to-date version
of the) source code and that they can communicate with other members of the team efficiently.
There are many solutions to this: from e-mails and shared network disks to more sophisticated
solutions. To prepare you a little for the software engineering practice, we will be using one of the
more sophisticated solutions and that is GitLab.
GitLab offers a place where developers can share source code, but also manage a list of existing
bugs, keep documentation, and even automatically test their code. And since it can be integrated with
other tools, GitLab has become the central place for the products of many companies and open-source
projects.
Furthermore, we can use its advantages even when working alone, even if, at the beginning, we use it
only as a smart backup for our source code.
For this course, GitLab will also become the central place for many tasks. You will submit solutions to
it and there is also the Forum project where you can ask questions.
There are other alternatives to GitLab offering similar features. We will be focusing on GitLab in this
course, but the general principles apply to other tools, too.
The central point of any software project is the source code. Without it, there is nothing to be
executed. Therefore, extra care and tooling is provided for source code management itself.
GitLab itself is built around Git. Git is a versioning system. In layman's terms, this means that it watches
your files for changes and remembers previous versions of your files. The big advantage is that
you can freely update your code and still return to its older versions.
We will be working with Git through the whole course. Take this description as a very high-level
overview so you can start working with Git the GUI-way in GitLab.
Practically, Git always works in a certain directory that typically represents one project. The user needs
to tell Git which files are to be tracked and at which point to create a new version.
Git does not track all files, as there is typically no need to version compiled files (because you can always
recreate them). For example, for a Java project you do not need to track *.class files, as you can create them
from the *.java ones by compiling the source code again (the same applies to *.pyc with Python or *.o with
C++). Another example: you do not track the PDF export of a LibreOffice document (though
tracking *.odt files is not something where Git would unleash its full potential).
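As a peek ahead: such exclusions are conventionally listed in a file called .gitignore in the project root, one pattern per line. A minimal sketch matching the examples above:

# compiled artifacts that can always be recreated from the sources
*.class
*.pyc
*.o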
Git does not create the versions automatically as each version is supposed to capture a reasonable
state of the project. Thus, for example, you create a new version (sometimes also called revision) once
you add a new feature to your software. Or when you fix a bug. Or when you fix a typo in the
documentation. Or even when you want to backup your work before going to lunch :-).
It allows you to create a reasonable history of the software that is small enough for reviewing (for
example), but it does not preserve every small typo you made. Versioning does not replace
undo/redo of your editor, it operates one level above that.
And when employed in a team, Git can be used to synchronize changes done by multiple users. For
example, if Alice makes a change to file alpha.txt and Bob at the same time changes the
file bravo.txt, Git allows Carol to work seamlessly on a version that contains changes both from Alice
and Bob.
At this moment we will be using only the graphical interface provided by GitLab in the web browser.
Later in the course we will uncover even the more advanced scenarios.
For this course, we will be using the faculty instance of GitLab at https://gitlab.mff.cuni.cz. Please,
do not confuse it with the instance at gitlab.com that you can freely use, but which is in no way
connected with this course.
For login (username) you will be using your CAS credentials, i.e., the same ones as you use for SIS.
Your first login will activate your account.
Always use your name-based login (e.g. johndoe) not the numerical one.
Please activate your account now, if you have not yet done so. Please, read our Q & A if you have
trouble logging in.
To quickly try GitLab (we will focus on it more in several labs), create a new project (create a Blank
project). You need to fill in a project name, its slug (a short version of the name used in the URL), and
its visibility.
In the example screenshots below, we create a project with our source code from the introductory
programming course. Do not forget to ensure that the project is initialized with a README.
Now open Web IDE which is a simple editor available for on-line editing of the source code files.
Using the icons and the help of the following set of screenshots, create a new file, name
it hello.py and insert a simple Python program.
We will now create a so-called commit. Commit in Git captures the current state of the project and
can be seen as a named version. In fact, whenever you create a commit, Git will ask you for a Commit
message where you are supposed to describe what changes you made.
We highly recommend the article How to Write a Git Commit Message by Chris Beams for nice tips on how
to write a good commit message. However, it might make more sense to return to this article later on once
you know Git a bit more.
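For illustration only, a message following those tips could look like this (a short imperative subject line, a blank line, then an optional body explaining the why):

Fix typo in the installation guide

The example command referred to a directory that was
renamed in the previous release.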
For now, we will be making all changes directly to the Master branch. We will explain the concepts of
branches later on, for now take them as a magic that works :-).
The important thing to remember is that commit assigns a name to a particular state of your source
code (revision).
Often you will see names such as Add icons to the menu or Fix button typo or Finish Czech translation.
As you see, they refer to the state of the project.
On your own: sign in to GitLab again, find your project and create a new file, pasting in some of your
source code. Note that when you click on the filename on the project homepage, you will see its
contents and again a link for its editing.
We will create a special project for each of you here with your CAS login in its name.
For technical reasons, we can create the project only after you sign in to GitLab for the first time. We
create these projects semi-manually, so you may need to wait until the next day for your project to appear.
Each assignment will have a prescribed filename under which to submit the solution. Submitting under a
different filename (or to a different folder) means we will not be able to find your assignment (and
thus we will count it as not submitted). There are about 300 students enrolled in this course and we
need to automate a lot of things: we really cannot manually search your project to
guess whether you have submitted under a different name.
Each submission – more precisely each commit – will launch automated tests on top of your repository.
These tests will check whether you have submitted the solution at all and also check whether it
behaves as it is supposed to.
We have put more details on how to interpret the results on a separate page.
Wrapping up
Each lab also contains so-called learning outcomes. They capture the most important theoretical
knowledge as well as practical skills that you should have after completing the lab.
Use them as you see fit. They can serve as a checklist that you understand a new topic or as a summary
if you are already familiar with some topics.
Post-class tasks (deadline: March 5)
We expect you will solve the following tasks after attending the labs and hearing feedback to your
before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see
here how to interpret their results).
The program will look for a file named README.md (in the current directory) and print its first non-empty
line. We assume that the project name is on the first line of the read-me file.
If all lines are empty or the file does not exist, the program will try to look
for readme.md, README and readme, in this order (stopping the search once a title is found). That is,
if readme.md contains a non-empty line, the program will not even try to look for README.
If no file from the above list contains a non-empty line, the program will print the name of the current
directory (os.path.basename(os.getcwd())).
We consider a line empty if line.strip().lstrip('# ') == ''. Printing the project title should also
strip it of blank spaces and the leading # (i.e., the above code). This ensures that trailing whitespace is
ignored and extra formatting of Markdown titles is removed.
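In other words, the specification boils down to these two helpers (a minimal sketch; the function names are ours, not required by the tests):

def is_empty(line):
    # a line counts as empty once whitespace and leading '#' marks are stripped
    return line.strip().lstrip('# ') == ''

def clean_title(line):
    # the printed title is cleaned up the same way
    return line.strip().lstrip('# ')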
You can also consult the results of the provided tests if behaviour in some situations is not clear from
the description above. Interpreting test results is described on a separate page.
Upload your solution into folder 01 in your GitLab submission repository and name the
file project_name.py. Most of the tasks will be named in this manner: the task name directly
refers to the file where you store your solution.
For this task, we have created a simple skeleton in your repository that you should use as a starting
point.
Important: the first line in the skeleton code from us starting with #! is there for a reason and you
have to keep it there (we will discuss this later on). Also keep the usage of main() and the condition
with __name__ as that represents a proper module-ready code.
Important: do not upload any README files into your project. These files are provided by the tests
automatically (but, obviously, create them on your machine when debugging your solution).
Forum confidential issue (50 points, group git)
An Issue in GitLab is a report typically describing an existing bug in a software project. We will use
such Issues for off-line communications in this course.
When an issue is marked Confidential, only users with certain access rights can see it. In the case of
the Forum project, only teachers can see such issues.
This assignment asks you to create a Confidential Issue on the Forum with the following properties.
Before submitting, the issue should look like this in the Preview tab (colors and fonts may
obviously differ).
Important: make sure you create this issue as confidential. Public issues cause notifications to be sent
to all members of the project; in this case, all of your colleagues would receive an e-mail
they have no interest in.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be
able to explain and/or use after each lesson. They also represent the bare minimum required for
understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting
them into context. Therefore, you should be able to …
• explain why graphical user interface is not a fixed part of Linux
• list several differences between various graphical interfaces available in Linux
• explain in broad terms what is a Linux distribution
• explain what is understood by the term Unix family of operating systems
• list a few types of assets that are typically needed for software projects
• explain in broad terms what is a versioning tool
• explain fundamental high-level operations of versioning tools
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should
be able to …
• boot your own machine into Linux (either via USB, dual-boot or virtualized)
• log in to a graphical Linux environment
• log in to the faculty instance of GitLab
• create a new project in GitLab
• upload a new file to GitLab via its web user interface and create a commit from it
• edit existing files in a GitLab project using its web interface
• customize a selected graphical environment
• create a basic GitLab issue in a given project
Introduction to Linux (NSWI177)
In this lab we will start learning the most effective way to control your Linux machine:
the command-line interface.
The lab starts with a bit of motivation for why we should care about the command-line interface
at all. Then we do a short recap of what a filename and a file path are, and continue with a
brief explanation of the Linux file system hierarchy.
After this more theoretical introduction we dive into using the terminal. You will learn
how to navigate through directories and how to display file contents in the terminal.
First of all, it is explicit and precise. There is no danger that a user would have a different
skin or a different set of taskbars when describing an action to take. Using an exact
command leaves no room for misunderstanding.
Next, it is also rather fast. Once we compare the possible speed of a mouse
clicking on icons versus the speed of keyboard keystrokes, the keyboard is
a clear winner (assuming that in both approaches we know what we want to do).
And partially connected with the above reasons: it is also easy to save the typed
commands into a file and re-run them later.
Such (text) files are called scripts in the Linux world. They can simply be a list of
commands to execute, but they can also consist of loops and conditions to execute
more sophisticated actions. We will devote several labs to these.
And from the machine side, it is also extremely efficient. Especially when we talk about
remote access over an unstable connection. The difference between sharing even a
small 800×600 screen vs. sending keystrokes is substantial. Managing Linux servers
over a flaky 2G connection is possible; managing a server offering only a GUI over such
a poor connection is out of the question.
Actually, it is exactly the same as with any programming language: you need to know
the API before being able to write a program. Except that in Linux, the API is not made of
functions in a typical programming language, but rather of complete programs.
The fact is that the shell we will be using was born about 50 years ago, but it is still
used today. That may mean that we have not been able to come up with anything better for quite
a long time. But, more likely, it suggests that the pros are worth it.
The beginning might be difficult, but you will not regret it in the long run.
From the practical point of view, using the command line is somewhat similar to using
Python in an interactive session. You type commands, you can edit them and once you
are happy, you execute the command by hitting <Enter>.
From now on, our interaction with the system will mostly look like this.
Do not be scared, though. Linux often trades eye candy for efficiency. Try to approach
it more like a new programming language: you need to learn (and remember, too!) a
bit about the constructs and the standard library before writing big programs. In
Linux it is the same. And it is definitely worth it if you are serious about computers.
We mean it. Seriously.
As a matter of fact, you probably know all of this. Feel free to skim over this part if that
is so. We have highlighted the important parts for you.
Basic terms
In our text, we will use the term filename to refer to a plain file name without any
directory specification. Filenames are what you see when you open any kind of file
browser or in an e-mail with attachments.
Note that on Linux we prefer to use the word directory over the term
folder. Folder usually refers to something virtual that is not present on the file system
(i.e., as if not physically existing as-is on the hard drive). Therefore, we can talk about
folders in your e-mail client or in a cloud storage.
Path means that the filename is prefixed with some kind of directory specification.
On Linux the path separator is a forward slash / (i.e., no escaping needed when
writing it in Python). Linux does not have any notion of disk drives: everything is found
under a so-called root which is a single forward slash /.
When you are building paths in Python, you should prefer to use
functions os.path.join() and similar as they ensure that the right separator is used
regardless of the actual platform (instead of pasting dir_name + '/' +
filename manually).
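For example (a minimal sketch):

import os.path

# os.path.join inserts the platform's path separator for you
print(os.path.join("home", "intro", "documents", "letter.odt"))
# prints home/intro/documents/letter.odt on Linux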
A path can be relative or absolute. When a path is absolute, it refers to a specific file
on a given computer. No matter what directory you are currently in. A relative path is
always combined with another directory to form an absolute path.
On Linux, each absolute path must start with a slash; if a path does not start with a
slash, it is treated as a path relative to the working (current) directory. Intuitively, the
working directory refers to the directory that you just opened in the file browser.
Special directories
A path can contain references to parent directories via .. (two dots). For example,
relative path ../documents/letter.odt means that the file is located in
directory documents that is one level up from the current directory. Assuming we are in
directory /home/intro/movies (note that this is an absolute path), the absolute path for
the letter.odt would be /home/intro/movies/../documents/letter.odt which can be
resolved (shortened) to /home/intro/documents/letter.odt.
Apart from the special directory name of .., there is also a special directory . (dot) that
refers to the current directory. Therefore ./bin/run_tests.sh refers to a
file run_tests.sh in a bin directory that is a subdirectory of the current one (i.e., it is
exactly the same as bin/run_tests.sh). Later, we will see why the dot . directory is
needed.
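Python can resolve such paths for you; a small sketch using the example above:

import os.path

# normpath resolves the '.' and '..' components
print(os.path.normpath("/home/intro/movies/../documents/letter.odt"))
# prints /home/intro/documents/letter.odt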
Filename extensions
Linux does not enforce or restrict the use of an extension in the filename
(e.g., .zip or .pdf). In fact, a file can exist without it and it can even have multiple ones.
A typical example of a multi-extension file is file.tar.gz, which denotes that the file is a
tape archive (.tar) that was later compressed with gzip.
Hidden files
On Linux, files whose names start with a dot (so-called dot-files) are hidden by default.
It is important to remember that dot-files are completely normal files (or directories); it is
just a convention not to show them by default. It is not a security measure, it just keeps the
listing a bit less verbose.
Typically, configuration (e.g., which wallpaper you have on your desktop) is stored in dot files
as they are usually supposed to be ignored by the user (at least most of the time) and would
only clutter the listing.
We have started our Linux exploration with paths and filenames for a very good reason.
Virtually everything in a Linux system is a file.
You already know that there are plain files (e.g., the letter.odt file we mentioned
above that represents a word processor document) and directories (for organizing
other files).
Be aware that the word file in Linux can refer to both normal files as well as directories,
i.e., a directory is a file.
There are also other special types of files that can represent hardware devices, system
state, etc. We will talk about these later.
Enough of theory. Please, locate the Terminal program and start it. Depending on your
environment, it will be either Terminal, Console, or perhaps even Shell (although,
technically, shell is the program running inside a terminal emulator).
We recommend you spend some time configuring the look of your terminal, such as
having a nice font family and a reasonable font size. You will be spending quite a lot
of time with it, so make the experience nice. Below are some possibilities of what you
might get :-).
You will see something like [intro@localhost ~] and a blinking cursor after that. This
is called a prompt and if you see it, it means you can enter your commands.
The prompt is displayed by your shell which is an interpreter of the commands you
enter. The shell is actually a full-fledged programming language, but in this lab we will
use it to launch very simple commands only.
Type uptime and start this command by submitting it with <Enter>. Until you
hit <Enter>, you can easily edit the command. Shortcuts such as <Ctrl>-<Arrow> for
jumping over words work, too.
Whenever you select a text in the terminal with your mouse, it is automatically copied.
This text then can be inserted by simply clicking the middle mouse-button (or the
wheel).
Note that the well-known <Ctrl>-C and <Ctrl>-V combinations do not work in the shell
as <Ctrl>-C is used to forcefully terminate a program. However, <Ctrl>-<Shift>-
C usually works.
Note that these are actually two distinct clipboards – the special one bound to the
middle mouse button and the one bound to <Ctrl>-C (<Ctrl>-<Shift>-C) and <Ctrl>-V.
In graphical applications, <Ctrl>-C and <Ctrl>-V work as usual.
To close the terminal, you can simply close the whole window (e.g., via the mouse), but you
can also type exit or hit <Ctrl>-D on an empty line. Because we are moving away from
needing the mouse (in a sense), you should prefer <Ctrl>-D ;-).
Debugging issues
From now on, we will stop inserting screenshots of the terminal and paste
only the output (though you should always run the commands yourself to see
what they do as first-hand experience).
When pasting into our Forum, enclose the text in a fenced block (```) to preserve the
monospace font.
```
ls nonexistent
ls: cannot access 'nonexistent': No such file or directory
```
Navigating through the filesystem
We will start with simple navigation through the file system. Two basic commands will
get you through: ls, which lists the contents of a directory, and cd, which changes the
current directory. Running ls -l in your home directory prints something like the following.
ls -l
total 4
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Desktop
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Documents
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Downloads
-rw-r--r--. 1 intro intro 1022 Jan 9 18:13 gif.md
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Music
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Pictures
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Public
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Templates
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Videos
The -l turned on the so-called long mode where more details about each file are
printed.
We will return to the meaning of some of the columns later on, deciphering the
columns for the last modification time and the file size is straightforward and sufficient
for the moment.
What does the following command do?
cd .
Answer.
Notice that the command prompt changed whenever you switched to a different
directory.
By default, it shows only the last component of the path. To show the full (absolute)
path, we need to run pwd.
pwd
/home/intro/Videos
Tab completion
Typing long filenames can be cumbersome and making typos is annoying. Shell
offers tab completion to help you with this.
For this example, we assume you just launched your terminal and ls prints Desktop
Documents Downloads Templates etc.
If we want to change to directory Templates, start typing cd Te and hit <Tab>. Unless
there is another filename (directory) starting with Te, the name shall be completed for
you and should read the full cd Templates/.
Submitting the command with <Enter> would switch you to the directory as we would
expect. Try it and come back to this directory again.
Now, let us switch to Documents directory. For this example, type cd Do and press <Tab>.
There are two directories with this prefix: Documents and Downloads. Because the shell
cannot know which one you want, it does nothing.
However, pressing <Tab> for the second time shows the possible matches and after
typing c (the next letter), <Tab> can finish the completion.
Note that shells in other operating systems also offer tab completion but in a less organized
manner.
Type just c (as in cd) and hit <Tab>. What happens? Answer.
Home directory
You probably noticed that when you start your terminal, the directory name you see
there is just a ~ even though it should read intro (or your username on that particular
machine) as that is the last component from pwd.
However, the path /home/intro is your home directory and has a special shortcut of
tilde ~.
Furthermore, if you just run the command cd without any extra arguments, it will change
the directory back to your home.
A quick recap
While the use of purely command-line tools such as uptime, ls or cd is cool and
extremely useful for scripts, there are also occasions where a more interactive
approach is faster.
In this sense, Linux typically offers three layers you can choose from: a fully
graphical one called the Graphical User Interface (GUI), a tool with a Text-based User
Interface (TUI), and a pure Command-Line Interface (CLI). Each of these can be useful,
depending on the circumstances.
Actually, there is also a fourth (bottom) layer where you directly access the special files
yourself.
Midnight commander
Run mc and navigate through the files as you have done with ls and cd.
The numbers at the bottom refer to your function keys for typical file operations
(e.g., F5 copies the file).
Note that in a typical setup, MC offers two panels with file listing, you switch between
them via <Tab> and, by default, copying is done to the directory in the other panel.
MC is quite a powerful tool: it can inspect file archives, show files on a remote
machine, etc.
We will briefly mention the most important things that you can do with it. Do try them
:-)
You can quit MC with <F10> or via a menu (activated by <F9>). Note that some terminals
capture <F10> to activate their window menu (but this behaviour can be changed
in Preferences of the terminal application).
Ranger
Ranger is a Vim-inspired file manager for the console. It brings some well-known key
bindings from the Vim realm, together with tabbed browsing.
Navigation
• j - Move down
• k - Move up
• h - Move to the parent directory
• l - Open file or move to directory
• gg - Go to the top of the list
• G - Go to the bottom of the list
• gh - cd ~
• gm - cd /media
• gr - cd /
• q - Quit Ranger
You probably noticed that the Development submenu contains several graphical text
editors that you can use to edit the source code. However, it is also possible to edit
files in TUI editors.
If you are asking why to learn another editor (if you are already happy with some of
the graphical ones), here is the answer. On some machines, you may not have access
to GUI at all. Recall that we talked about remote access earlier: in that case you will
have only TUI available (and you will often need to edit files on the remote machine).
Some users thus never use GUI editors at all; the reasoning is that it is much better to
learn (and customize) one editor properly, and that editor is a TUI-based one.
On our disk, you will find Emacs, Joe, mcedit and Vim.
Each has its own advantages and it is up to you which one you choose. Note
that mcedit is probably the closest to the editors you may know from other
systems. joe is a small one, but perfectly suitable for the script editing that we will be doing
most of the time. Both emacs and vim are extremely powerful tools that can do much more than
just edit files. However, they require a bit of a time investment before you can start using
them effectively.
If you are new to Linux, we recommend using mcedit (either directly
or when editing files in Midnight Commander) and coming back to the other ones later
for the final decision on THE text editor of your choice.
All of these editors can be launched from the command line, giving it the filename to
edit as a parameter (e.g., mcedit quiz.md).
There will be many occasions (including some graded tasks in this course) where you
will be forced to edit files on a remote machine that offers only CLI (TUI) interface.
Learn how to use some TUI editor soon, we will need it in future labs.
Some of the editors mentioned above also offer a GUI version and are multi-platform, so
there is no excuse for not trying something new :-)
For the following, you will need to have the same list of files as we have.
Please, download this archive and unpack its contents. If you want to download it
from the command line, you can use wget URL, otherwise use whatever browser you
like. Use Midnight commander to copy the unpacked content to your home
directory. Hint.
Shell wildcards
So far, we used ls to display all files in a directory. If we are interested in only a subset,
we can specifically name them on the command line.
Move to the directory where you have unpacked nswi177-lab02.tar.gz and check with ls that
the unpacked files are there.
To list only the text files, we can use a wildcard:
ls -l *.txt
It is essential to note that ls (or any other program for that matter) will receive
the expanded list of files – finding the matching files is done by the shell, not by
individual programs. Thus for the above example, from inside ls there is no way of
distinguishing whether the user used the full list or the *.txt wildcard. You will
experiment with this in one of the next labs where we will talk about accessing these
parameters in your favorite programming language. For developers, it means that they
do not need to care about implementing the wildcard expansion themselves. The
program would always receive a list of existing filenames, not a wildcard.
By the way – is the last sentence completely correct? What happens if we run ls -l
*.txxxt? Answer.
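To see what a program actually receives, here is a minimal Python sketch (the filename show_args.py is our choice):

#!/usr/bin/env python3
# show_args.py: print the arguments exactly as the program receives them
import sys

for arg in sys.argv[1:]:
    print(arg)

Running python3 show_args.py *.txt prints the already-expanded filenames, one per line.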
How would you print all files starting with the letter t? Answer.
If we want to print the files whose name starts with o or f, we can use
ls [of]*.txt
If we want to print files that end with any of the letters from a to f, we could use
ls *[a-f].txt
Try it in the a subdirectory.
Note that the files are sorted alphabetically when specified via wildcards.
And now list all files/directories starting with D (recall that Linux is case-sensitive). You
might be surprised, because a straightforward ls D* would actually list the contents of
these directories. This is to be expected: ls Documents is supposed to print
the list of files in that directory. If we do not want ls to descend into directories, we can
add the -d option to prevent that.
What happens when you specify a file that does not exist? And what if only some of
the specified files do not exist?
Apart from * (which matches any part of the filename) and [list-of-characters] (which
matches one character of the filename), there is also ?, which matches any single
character.
Hence x?.txt will match filenames that are 6 characters long, start
with x, and end with .txt (i.e., a two-letter filename starting with x, of plain text type).
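If you want to experiment with these patterns from a program, Python's glob module implements the same matching (a small sketch; here the matching happens inside Python, not in the shell):

import glob

# '?' matches exactly one character, so this finds e.g. xa.txt or x1.txt
print(sorted(glob.glob('x?.txt')))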
More about hidden files
Recall that filenames starting with dot . are hidden. These are by default not listed
by ls. If you want to see these files too, you have to either name them explicitly or use
the -a option.
Again: it is not a security measure, just a way to make the listing less cluttered.
We have already mentioned text editors and MC to look into files when working in the
terminal. They are not the only options.
Text files
The simplest way to dump the contents of any file is to call a program called cat. Its
arguments are filenames to print. The name cat has nothing to do with the mammal
but refers to the middle of the word concatenate as it can be used to actually
concatenate files.
Move to the b subdirectory. Executing cat 000.txt will show the contents of 000.txt on
the screen.
How would you show the contents of all files in this directory? Answer.
Binary files
If we want to dump binary files (such as images), it is usually better to dump their bytes
in hexadecimal; the hexdump utility does exactly that.
We will always use it with the -C switch to print the hex dump and the ASCII characters next to each
other. The dump of the GIF file looks like this:
hexdump -C c/sample.gif
00000000 47 49 46 38 39 61 0a 00 0a 00 91 00 00 ff ff ff |GIF89a..........|
00000010 ff 00 00 00 00 ff 00 00 00 21 f9 04 00 00 00 00 |.........!......|
00000020 00 2c 00 00 00 00 0a 00 0a 00 00 02 16 8c 2d 99 |.,............-.|
00000030 87 2a 1c dc 33 a0 02 75 ec 95 fa a8 de 60 8c 04 |.*..3..u.....`..|
00000040 91 4c 01 00 3b |.L..;|
00000045
Unprintable values (e.g., smaller than 32) are replaced with a dot.
Notice that the first characters are normal ASCII letters (which was a smart decision of the
authors of the file format).
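You can check the signature programmatically, too; a minimal sketch (GIF files start with either GIF87a or GIF89a):

# print True when the first 6 bytes carry a valid GIF signature
with open('c/sample.gif', 'rb') as f:
    print(f.read(6) in (b'GIF87a', b'GIF89a'))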
Guessing file type
Even though the file extension is not mandatory, it is better to use it to explicitly
identify file types.
If you are not sure about the file type, the file utility can identify it for you.
file c/sample.gif
Manual pages
We have seen that the ls behaviour can be modified with -a, -d, and -l; hexdump has -C.
Did you know that uptime accepts -s? And that cat takes -n to print line numbers?
Execute man cmd to access a manual for the cmd program (substitute cmd for the
actual command name). Use arrows for scrolling and q to quit the manual. You can
search inside the page with / (slash) key.
Manual pages are organized into sections and you can specify the section number as
part of the man execution, e.g., man 3 printf opens a help page for printf() function in
the C language because that is the contents of section 3. Note that man printf would
show you the contents of printf manual from section 1, i.e., the shell command.
Open man man to see the full list of sections. Briefly, 1 is for shell commands, 3 is for
library calls, and 4 and 5 are used for specific files (e.g., man 5 proc launches the manual
page for the whole /proc directory).
Note that manual pages are also available on-line, hence you can study your favourite
commands even without access to your Linux machine.
Typical options
Many of the options are more-or-less standardized across multiple programs and are
worth remembering.
Almost all GNU programs that you will have on your machine will print a small help
when executed with --help. Try it for ls or cd.
--version can be used to print the version and copyright information of the executed
program. Sometimes -v or -V works as well.
--verbose or --debug (sometimes -v or -d) launches the program in a verbose mode where
it prints in more detail what it is doing.
--dry-run (sometimes -n) executes the program without performing actual changes
(e.g., it can print which files would be removed without actually deleting any of them).
--interactive (sometimes -i) will typically cause the program to ask for interactive
confirmation of destructive actions.
-- can be used to terminate the list of options if you have filenames starting with a
dash. For a classical example, move into the d subdirectory of nswi177-lab02 and list
information about a file named -a. Then check your result and try again using
the -- delimiter. Answer.
Do not underestimate the need for -- when working with unknown files. It might be
an innocent mistake when a file named -f appears, but the results of running
cmd WILDCARD without -- might be tremendous.
Always use cmd -- WILDCARD when the wildcard starts with * or when the wildcard
potentially comes from the user (i.e., also when the user specifies a list of files on the
command line).
If you create a file called file with spaces.txt and then execute cat file with spaces.txt,
cat will complain about three nonexistent files (file, with, and spaces.txt), because the
shell splits the arguments on spaces. You need to quote the whole name
(cat "file with spaces.txt") or escape the spaces. If you use tab completion, your command
will be completed with the escape characters (cat file\ with\ spaces.txt).
We will mention this again when talking about scripts, but it is something to remember:
spaces in filenames can cause unexpected surprises and it is better to avoid such
naming.
And yes, it is possible to create a file named ' ' (i.e., a single space) and show its contents
with cat " ", but it is not a very sensible idea to do so. It is similar to creating files
starting with a dash: it is possible, and there are ways to bypass the issues (e.g., using
the -- delimiter), but it is simpler to avoid them altogether.
Work efficiently
Do not be afraid of running multiple terminals next to each other. Use one to navigate
with ls and cd, use the other one for Midnight commander to mirror your actions.
Open another one with a manual page for the command you are using.
Most desktop environments allow you to create multiple workspaces or desktops. Each
workspace has its own list of open windows; windows opened on other workspaces
are not visible. This can reduce clutter significantly and – with proper
keyboard shortcuts – speed up your work.
We will talk about this in greater detail in the following lab, for now you can use the
following command to actually run your Python script:
python3 path_to_your_python_script.py
The following tasks must be solved and submitted before attending your lab. If your
lab is on Wednesday at 10:40, the files must be pushed to your repository (project)
at GitLab by Wednesday 10:39 at the latest.
For the virtual lab, the deadline is Tuesday 9:00 AM every week (regardless of vacation
days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
Note that this task is not fully checked by GitLab as it would reveal the answers.
https://d3s.mff.cuni.cz/f/teaching/nswi177/202223/labs/nswi177-task02.tar.gz
For example, if your login is `johndoe`, you should paste contents from files
`0jz.txt`, `1ez.txt` but not from `ajz.txt` or `2wz.txt` or
`0jx.txt`.
Sort the list of files alphabetically before getting their content; duplicate
letters should be ignored (i.e., use the wildcards naturally and you will be fine).
Insert your answer between the markers below, replacing the three dots.
Leading and trailing whitespace in your answer will be ignored but
keep the starred A1 markers without changes (tests will check that).
**Q2** Insert here the wildcard pattern that you have used
(only the pattern without `ls` or any other command you have used).
https://d3s.mff.cuni.cz/f/teaching/nswi177/202223/labs/02/LOGIN.broken.gif
The image is broken because we replaced the signature (the first 3 bytes) with the letters XXX.
This should not prevent you from reading the size of the original GIF image. The following
links will guide you through the internals of the GIF file format (look for the logical screen
descriptor).
Write the answer in the format WIDTHxHEIGHT into 02/gif.txt file (e.g. 50x100).
The automated tests only check the format of your answer (otherwise the solution would
be too easy).
We will upload the files at the beginning of the first week and update them daily (for those
who enroll later).
Hint: you may find the -n option useful to limit scrolling through hexdump output.
We expect you will solve the following tasks after attending the labs and hearing
feedback to your before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
Store the command into 02/file-size.txt without any actual filename. We will
append the filenames during testing; for your experiments, add them manually.
For example, if the file 02/file-size.txt contained stat -f, we would run stat -f
filename.txt during testing and expect it to print the following.
filename.txt 42
Similarly, running stat -f one.txt two.txt will print the following.
one.txt 1
two.txt 2047
Do not append any filenames to your command so that we can properly test it.
As you might have guessed, look into the manual page of stat to find the right
options. Do not print any other information apart from the filename and file size.
/home/../usr/./share/./man/../../lib/../../etc/ssh/.././os-release
The automated tests only check the format of your answer (otherwise the solution would
be too easy).
We are after the best possible (i.e., most precise) answer: certainly, an answer such as "it is
Python code that prints something" is true, but that is not what we are after :-)
stats = {}
with open('/proc/meminfo', 'r') as f:
    for line in f:
        parts = line.split(":")
        stats[parts[0].strip()] = parts[1].split()[0].strip()
print(float(stats['MemFree']) / float(stats['MemTotal']))
1. Prints data of the first two lines from the file /proc/meminfo.
2. Prints the second column of the file /proc/meminfo where the columns are
separated by colons (:).
3. Prints the approximate percentage of free memory on the system.
4. Ensures that /proc/meminfo contains valid data.
5. Reads /proc/meminfo to determine if they are in a correct format.
The automated tests only check the format of your answer (otherwise the solution would
be too easy).
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that
you should be able to explain and/or use after each lesson. They also represent the
bare minimum required for understanding subsequent labs (and other courses as well).
Practical skills
Practical skills are usually about usage of given programs to solve various tasks.
Therefore, you should be able to …
• 2023-02-14: Files for before lab tasks were uploaded, missing links to GIF
specification were added.
• 2023-02-14: Add more info about tests and some hints to graded tasks.
• 2023-02-25: Move task 02/filepath.txt to the admin group.
Lab #3 (February 27 - March 3)
• Linux scripting
• Git principles
• Running tests locally
• Before-class tasks (deadline: start of your lab, week February 27 - March 3)
• Post-class tasks (deadline: March 19)
• Learning outcomes
• This page changelog
The goal of this lab is to introduce you to the Git command-line client and how to write reusable scripts.
We will demonstrate how well Linux is suited for interpreted languages. And we will make our work with
GitLab much more efficient: we will see how to transfer files to and from it via a command-line
client.
Linux scripting
A script in the Linux environment is any program that is interpreted when being run (i.e., the program
is distributed as a source code). In this sense, there are shell scripts (the language is the shell as you
have seen it last time), Python, Ruby or PHP scripts.
The advantage of so-called scripting languages is that they require only a text editor for
development and that they are easily portable. The disadvantage is that you need to install the interpreter
first. Fortunately, Linux typically comes with many interpreters preinstalled, so starting with a scripting
language is very easy.
To write a shell script, we simply write the commands into a file (instead of typing them in a terminal).
Therefore, a simple script that prints some information about your system could be as simple as the
following.
cat /proc/cpuinfo
cat /proc/meminfo
If you store this into a file first.sh, then you can execute it with the following command.
bash first.sh
Notice that we have executed bash – the shell program (interpreter) that we are using – followed by the
name of the input file.
It will cat those two files (note that we could have executed a single cat with two arguments as well).
Recall that your project_name.py script can be executed with the following command (again, we run the
right interpreter).
python3 project_name.py
Shebang and executable bit
Running scripts by specifying the interpreter to use (i.e., the command to run the script file with) is not
very elegant. There is an easier way: we mark the file as executable and Linux handles the rest.
Actually, when we execute the cat command or mc, there is a file (usually in
the /bin or /usr/bin directory) that is named cat or mc and marked executable. (For now, imagine the
special executable mark as a special file attribute.) Notice that there is no file extension.
However, marking the file as executable is only the first half of the solution. Imagine that we create the
following content and store it into a file hello.py marked as executable.
print("Hello")
And then we want to run it.
But wait! How will the system know which interpreter to use? For binary executables (e.g., originally
from C sources), it is easy as the binary is (almost) directly in the machine code. But here we need an
interpreter first.
In Linux, the interpreter is specified via so-called shebang or hashbang. As a matter of fact, you have
already encountered it several times: When the first line of the script starts with #! (hence the name
hash and bang), Linux expects a path to the interpreter after it and will run this interpreter and ask it to
execute the script.
The Linux kernel refuses to execute shebang-less scripts. But if you run them from the shell, the shell will try
interpreting them as shell scripts. It is good practice not to rely on this behavior.
For shell scripts, we will be using #!/bin/bash, for Python we need to use #!/usr/bin/env python3. We
will explain the env later on; for now, please just remember to use this version.
Note that most interpreters use # to denote a comment which means that no extra handling is needed to skip
the first line (as it is really not needed by the interpreter).
You will often encounter #!/bin/sh for shell scripts. For most scripts it actually does not matter: simple
constructs work the same, but /bin/bash offers some nice extensions. We will be using /bin/bash in this
course as the extensions are rather useful.
You may need to use /bin/sh if you are working on older systems or you need to have your script
portable to different flavours of Unix systems.
To complicate things a bit more, on some systems /bin/sh is actually the same program as /bin/bash
(bash is really a superset of sh).
Bottom line is: unless you know what you are doing, stick with #!/bin/bash shebang for now.
Now back to the original question: how is the script executed? The system takes the command from the
shebang, appends the actual filename of the script as a parameter, and runs that. When the user
specifies more arguments (such as --version), they are appended as well.
For example, if hexdump were actually a shell script, it would start with the following:
#!/bin/bash
...
code-to-loop-over-bytes-and-print-them-goes-here
...
Executing hexdump -C file.gif would then actually execute the following command:
/bin/bash /usr/bin/hexdump -C file.gif
The user does not need to care about the implementation language.
We know about the shebang, so we will update our example and also mark the file as an executable
one.
#!/bin/bash
cat /proc/cpuinfo
cat /proc/meminfo
To mark it as executable, we run the following command. For now, please, remember it as a magic that
must be done, more details why it looks like this will come later.
chmod +x first.sh
chmod will not work on file systems that are not Unix/Linux-friendly. That unfortunately includes even NTFS.
GitLab web GUI does not offer any means for setting the executable bit. You need to use Git CLI client instead
(see the second half of this lab).
Now we can easily execute the script with the following command:
./first.sh
The obvious question is: why the redundant ./? It refers to the current directory after all, right (recall
previous lab)? So it refers to the same file!
When you type a command (e.g., cat) without any path (i.e., only bare filename containing the
program), shell looks into so-called $PATH to actually find the file with the program
(usually, $PATH would contain directory /usr/bin where most of the executable binaries are stored).
Unlike in other operating systems, shell does not look into the working directory when program cannot
be found in the $PATH.
To run a program in the current directory, we need to specify its path (when any extra path is provided,
shell ignores $PATH and simply looks for the file). Luckily, it does not have to be an absolute path, but a
relative one is sufficient. Hence the magic spell of ./.
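To see what your $PATH actually contains, simply print it (the exact list differs between distributions, but it will look something like /usr/local/bin:/usr/bin:/bin):
echo $PATH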
If you move to another directory, you can execute it by providing a relative path too, such
as ../first.sh.
Run ls in the directory now. You should see first.sh printed in green. If not, you can try ls --color or
check that you have run chmod correctly.
If you do not have a colorful terminal (unusual but still possible), you can use ls -F to distinguish file
types: directories will have a slash appended, executable files will have an asterisk next to their filename.
Exercise
Create a script that prints all image files in current directory (for now, you can safely assume there will
always be some). Try to run it from different directories using relative and absolute path. Answer.
Create a script that prints information about currently visible disk partitions in the system. For now, it
will only display contents of /proc/partitions. Answer.
Now change the script to move into /proc first:
#!/bin/bash
cd /proc
cat cpuinfo
cat meminfo
Run the script again.
Despite the fact that the script changed directory to /proc, when it terminates, we are still in the original
directory.
Try inserting pwd to ensure that the script really is inside /proc.
This is an essential take away – every process (running program; this includes scripts) has its own current
directory. When it is started, it inherits the directory from its caller (e.g., from the shell it was run from). Then it
can change the current directory, but that does not affect other processes in any way. Thus, when the program
terminates, the caller is still in the same directory.
This also means that cd itself cannot be a normal binary. If it were a normal program (e.g., written
in Python), any directory change it made would be lost the moment it terminated.
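You can verify this easily: the following starts a new shell process that changes its directory and prints it, while the pwd of your own shell afterwards shows that you have not moved anywhere.
bash -c 'cd /proc && pwd'
pwd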
If you want to see what is happening, run the script as bash -x first.sh. Try it now. For longer scripts,
it is better to print your own messages as -x tends to become too verbose.
To print a message to the terminal, you can use the echo command. With a few exceptions (more about
these later), all arguments are simply echoed to the terminal.
Create a script echos.sh with the following content and explain the differences:
#!/bin/bash
echo alpha bravo charlie
echo alpha    bravo     charlie
echo "alpha bravo" charlie
Answer.
Command-line arguments
Command-line arguments (such as -l for ls or -C for hexdump) are the usual way to control the
behaviour of CLI tools in Linux. For us, as developers, it is important to learn how to work with them
inside our programs.
We will talk about using these arguments in shell scripts later on, today we will handle them in Python.
Accessing these arguments in Python is very easy. We need to add import sys to our program and then
we can access these arguments in the sys.argv list.
#!/usr/bin/env python3
import sys

def main():
    for arg in sys.argv:
        print("'{}'".format(arg))

if __name__ == '__main__':
    main()
When we execute it (of course, first we chmod +x it), we will see the following (lines prefixed
with $ denote the command, the rest is command output).
$ ./args.py
'./args.py'
$ ./args.py one two
'./args.py'
'one'
'two'
$ ./args.py "one two"
'./args.py'
'one two'
Note that the zeroth index is occupied by the command itself (we will not use it now, but it can be used
for some clever tricks) and notice how the second and third commands differ from inside Python.
It should not be surprising though, recall the previous lab and handling of filenames with spaces in
them.
Other interpreters
We will now try what other interpreters we can put in the shebang.
Construct an absolute (!) path (hint: man 1 realpath) to the args.py that we have used above. Use it as
a shebang on an otherwise empty file (e.g. use-args) and make this file executable. Hint.
This is essential – when you add a shebang, the interpreter receives the input filename as the first argument. In
other words – every Linux-friendly interpreter shall start evaluating a program passed to it as a filename in the
first argument.
As another example, prepare the following file and store it as experiment (with no file extension) and
make the file executable:
#!/bin/bash
echo Hello
Note that we decided to drop the extension again altogether. The user does not really need to know
which language was used. That is captured by the shebang, after all.
Now change the shebang to #!/bin/cat. Run the program again. What happens? Now run it with an
argument (e.g., ./experiment experiment). What happened? Answer.
We will assume that both my-cat and my-echo are executable scripts in the current directory.
my-cat contains as the only content the following shebang #!/bin/cat and my-echo contains
only #!/bin/echo.
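Before running them, try to guess the output. It should look something like this (recall that the lines prefixed with $ denote the commands, the rest is output):
$ ./my-echo hello
./my-echo hello
$ ./my-cat
#!/bin/cat
In both cases, the interpreter named in the shebang simply received the script filename as its first argument (plus any extra arguments) and processed it as it would process any other file.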
So far, our interaction with GitLab was over its GUI. We will switch to the command line for higher
efficiency now.
Recall that GitLab is built on top of Git which is the actual versioning system used.
Git offers a command-line client that can download the whole project to your machine, track changes
in it, and then upload it back to the server (GitLab in our case, but there are other products, too).
While it is possible to edit many files on-line in GitLab, it is much easier to have them locally and use a better
editor (or IDE). Furthermore, not all tools have their on-line counterparts and you have to run them locally.
Before diving into Git itself, we need to prepare our environment a bit.
Git will often need to run your editor. It is essential to ensure it uses the editor of your choice.
We will explain the following steps in more detail later on, for now ensure that you add the following
line to the end of ~/.bashrc file (replace mcedit with editor of your choice):
export EDITOR=mcedit
Now open a new terminal and run (including the dollar sign):
$EDITOR ~/.bashrc
If you set the above correctly, you should see again .bashrc opened in your favorite text editor.
If not, ensure you have really modified your .bashrc file (in your home directory) to contain the same
as above (no spaces around = etc.).
You need to close all terminals for this change to take effect (i.e., do so before you start using any of the Git
commands mentioned below).
Never use a graphical editor for $EDITOR unless you really know what you are doing. Git expects a certain
behaviour from the editor that is rarely satisfied by GUI editors but is always provided by a TUI-based one.
If you want to know why GUI editors are a bad choice, the explanation is relatively simple: Git will start a new
editor for the commit message (see below) and it will assume that the commit message is ready once the editor
terminates. However, many GUI editors work in a mode where there is a single instance running and you only
open new tabs. In that case, the editor that is launched by Git actually terminates immediately – it only tells
the existing editor to open a new file – and Git sees only an empty commit message.
Git has over 100 subcommands available. Don’t panic, though. We will start with less than 10 of them
and even quite advanced usage requires knowledge of no more than 20 of them.
Configure Git
One of the key concepts in Git is that each commit (change) is authored – i.e., it is known who made it.
(Git also supports cryptographic signatures of commits, so that authorship cannot be forged, but let us
keep things simple for now.)
Thus, we need to tell Git who we are. The following two commands are the absolute minimum you
need to execute on any machine (or account) where you want to use Git (substitute your own name
and address):
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
Note that Git does not check the validity of your e-mail address or your name (indeed, there is no way
how to do it). Therefore, anything can be there. However, if you use your real e-mail address, GitLab
will be able to pair the commit with your account etc. which can be quite useful. The decision is up to
you.
The very first operation you need to perform is so called clone. During cloning, you copy your project
source code from the server (GitLab) to your local machine. The server may require authentication for
cloning to happen.
Cloning also copies the whole history of the project. Once you clone the project, you can view all the
commits you have made so far. Without need for an internet connection.
The clone is often called a working copy. As a matter of fact, the clone is a 1:1 copy, so if someone
deleted the project, you would be able to recreate the source code without any problem. (That is not
true about the Issues or the Wiki as it applies only to the Git-versioned part of the project.)
As you will see, the whole project as you see it on GitLab becomes a directory on your hard drive. As
usual, there are also GUI alternatives to the commands we will be showing here, but we will focus our
attention on the CLI variants only.
Copy the HTTPS address and use it as the correct address for the clone command:
Note that some environments may offer you to use some kind of a keyring or another form of a
credential helper (to store your password). Feel free to use them, later on, we will see how to use SSH
and asymmetric cryptography for seamless work with Git projects without any need for
username/password handling.
It seems that some environments are rather forceful in their propagation of their password helpers (and
if you enter your password incorrectly the first time, they do not provide a simple way to clear it).
Try running the following first if you encounter HTTP Basic: Access denied. and no password prompt is
shown (see also this issue).
export GIT_ASKPASS=""
export SSH_ASKPASS=""
git clone ...
Note that you should have the student-LOGIN directory on your machine now. Move to it and see what
files are there. What about hidden files? Answer.
Unless stated otherwise, all commands will be executed from the student-LOGIN directory.
After the project is cloned, you can start editing files. This is completely orthogonal to Git and until you
explicitly tell Git to do something, it does not touch your files at all.
It is also important to note that Git will not fetch updates from the server automatically for you. That is, if you
clone the project and then modify something on GitLab directly, the changes will not propagate to your
working copy unless you explicitly ask for it.
Once you are finished with your changes (e.g., you fixed a certain bug), it is time to tell Git about the
new revision.
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
Now change something in the project – edit (for example) the README.md – and run git status again.
Read carefully the whole output of this command to understand what it reports.
Create a new file, 03/editor.txt and put into it the name of the editor that you have decided to use
(feel free to create directory 03 in some graphical tool or use mkdir 03).
Again, check how git status reports this change in your project directory.
Run git diff to see how Git tracks the changes you made.
You will see a list of modified files (i.e., their content differs from last commit) and you can also see a
so called diff (sometimes also called a patch) that describes the change.
Note that git diff is also extremely useful to check that the change you made is correct as it focuses
on the context of the change rather than the whole file.
That clearly states what the commit changed. It is actually similar to how you create functions in a
programming language. A single function should do one thing (and do it well). A single commit should
capture one change.
Now prepare your first commit (recall that commit is basically a version or a named state of the project)
– run git add 03/editor.txt. We will take care of the extension in README.md later.
After staging all the relevant changes (i.e. git add-ing all the needed files), you create a commit. The
commit clears the staging status and you can work on fixing another bug :-).
Make your first commit via git commit. Do not forget to use a descriptive commit message!
Note that without any other options, git commit will open your text editor. Write the commit message
there and quit the editor (save the file first). Your commit is done.
For short commit messages, you may use git commit -m "Typo fix" where the whole commit message
is given as argument to the -m option (notice the quotes because of the space).
What will git status look like now? Think about it first before actually running the command!
You basically repeat this as long as you need to make changes. Recall that each commit should capture
a reasonable state of the project that is worth returning to later.
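A typical session (with placeholder names, of course) thus boils down to repeating the following three commands:
git status
git add 03/some_file.txt
git commit -m "Describe what has changed"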
Whenever you make a commit, the commit remains local. It is not propagated back to the server
automatically.
To upload the changes (commits) back to the server, you need to initiate a so-called push. It uploads
all new commits (i.e., those between your clone operation and now) back to the server. The command
is rather simple.
git push
It will again ask for your password and after that, you should see your changes on GitLab.
Exercise
Add the link to Forum as a second commit from the command line.
As a third commit, create a 03/architecture.sh script that contains the right shebang, is executable, and
prints the current machine architecture (if you skipped this task in the previous lab, simply run only uname there
or look up the right switch in the man page now).
Push now the changes to GitLab. Note that all commits were pushed at the same time.
Change the title in the README.md to also contain the text for YOUR NAME (with your actual name
substituted). But this time, make the change on GitLab.
What is the easiest way to ensure that you have also the change in README.md on your machine after git
pull? Answer.
Note that git pull is quite powerful as it can incorporate changes that happened virtually at the same
time in both the GitLab web UI and your local clone. However, understanding this process also requires
knowledge about branches, which is out of scope for this lab.
For now, remember to not mix changes locally and in GitLab UI (or on a different machine) without always
ending with git push and starting with git pull.
Things get a little bit more complex when you work on multiple machines (e.g., mornings at a school
desktop, evenings at your personal notebook).
But for now it is best to ensure the following workflow to minimize introducing incompatible changes.
Note that if things go horribly wrong, you can always do a fresh clone to a different directory, copy the
files manually and remove the broken clone.
As long as you ensure that you work in the following manner, nothing will ever break:
• Before you start working, run git pull to fetch the latest state from the server.
• Commit your changes as you work.
• When you are done (or before you switch to another machine), run git push to upload everything back.
If you forget some of the synchronizing pulls/pushes when switching between machines, problems
can arise. They are easy to solve, but we will talk about that in later labs.
can arise. They are easy to solve, but we will talk about that in later labs.
For now, you can always do a fresh clone and simply copy files with the new changes and commit again
(not the right Git way, but it definitely works).
Going further
The command git log shows plenty of information, but often you are interested in recent changes only.
You use them to refresh your mind about what you were working on etc.
Similarly, typing a short command such as git st instead of git status could save time.
Git supports aliases for exactly this: the following ones, added to the [alias] section of your ~/.gitconfig,
define the shortcuts git st, git ci and a condensed git ll log listing.
st = status
ci = commit
ll = log --format='tformat:%C(yellow)%h%Creset %an (%cr) %C(yellow)%s%Creset' --max-count=20 --first-parent
Try running the full commands first before adding the aliases to your Git.
Because you now know about shebangs, executable bits and scripts in general, you have enough
knowledge to actually run our tests locally without needing GitLab.
It should make your development faster and more natural as you do not need to wait for GitLab.
Simply execute ./bin/run_tests.sh in the root directory of your project and check the results. You can
also run only the tests for a particular task set (or a single task):
./bin/run_tests.sh 03-before
./bin/run_tests.sh 03-post
./bin/run_tests.sh 03-before/architecture
Note: If you are using your own installation of Linux, you might need to install the bats (or bash-bats
or bats-core) package first.
The following tasks must be solved and submitted before attending your lab. If you have lab on
Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at
10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
https://d3s.mff.cuni.cz/f/teaching/nswi177/202223/labs/task-03.git/
There are multiple files in this repository. Copy the one mentioned in the commit messages
to 03/git.txt.
In other words, clone the above repository, view existing commits and in the commit messages, you
will see a filename that you should copy to your own project (as 03/git.txt).
Automated tests only check presence of the file, not that you have copied the right one.
Ensure your script has the right shebang and executable bit set.
We expect you will solve the following tasks after attending the labs and hearing feedback to your
before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
We need to distribute passwords to you for this repository (we do not want to bind it with your SIS
account). We will do that during week 02. For this task you will be using your SIS/GitLab login but a
different password. The password was uploaded to the Wiki that is part of your NSWI177 project, on a
page called Secrets.
You will need the following repository (obviously, replace LOGIN with your SIS/GitLab login). Use the
password from the Wiki page Secrets (recall that pasting can be done simply by selecting the text here
and pasting it into the terminal with middle mouse click). This URL has no browser-friendly version, do
not be surprised by 404 if you open it in a web browser.
https://lab.d3s.mff.cuni.cz/nswi177/git-03/LOGIN.git
After you clone it, create a file 03.txt inside it.
In the first commit, insert 2022 as its only content (i.e., to 03.txt).
The correct answer is printed by the automated tests when you execute them locally (i.e.,
with 03-post/local).
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be
able to explain and/or use after each lesson. They also represent the bare minimum required for
understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting
them into context. Therefore, you should be able to …
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should
be able to …
• 2023-02-27: Warning about password helpers and executable bit in GitLab UI.
• 2023-02-24: Add connection details about the Git repository from post-class task.
Lab #4 (March 6 - March 10)
• Running example
• Standard input and outputs
• Filters
• Pipes (data streaming composition)
• Writing your own filters
• Standard error output
• Under the hood (about file descriptors)
• Advanced I/O redirection
• Program return (exit) code
• Shell customization
• More examples
• Before-class tasks (deadline: start of your lab, week March 6 - March 10)
• Post-class tasks (deadline: March 26)
• Learning outcomes
• This page changelog
The goal of this lab is to define and thoroughly understand the concepts of standard input, output, and
standard error output. This would allow us to understand program I/O redirection and composition of
different programs via pipes. We will also customize our shell environment a little by investigating
command aliases and the .bashrc file.
Running example
We will build this lab around a single example that we will incrementally develop, so that you learn the
basic concepts on a practical example (obviously, there are specific tools that could be used instead, but
we hope that this is better than a completely artificial example).
Data for our example can be downloaded (i.e., git cloned) from this repository where they reside in
the 04/ subdirectory.
They simulate simplified logs from a web server, where the web server records which files (URLs) were
accessed at which time.
Practically, each file represents traffic for one day in a simplified CSV format.
Fields are separated by a comma, there is no header, and for each record we remember the date, the
client’s IP address, the URL that was requested, and the amount of transferred bytes.
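For illustration, a single (completely made-up) record in this format could look like this:
2023-03-06,198.51.100.23,/index.html,3112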
In reality, the data would be also compressed and would probably contain more details about the client (e.g., the
browser used), but otherwise the data recorded represent a fairly typical web server log format.
Our task is to write a program that prints a brief summary of the data: the most frequently visited URLs,
the total amount of transferred bytes, and the days with the most traffic.
We will start the lab with a few definitions of concepts that you probably already know (but maybe not
under exactly these names).
Standard output
Standard output (often shortened to stdout) is the default output that you can use by
calling print("Hello") if you are in Python, for example. Stdout is used by the basic output routines in
almost every programming language.
Generally, this output has the same API as if you were writing to a file. Be it print in
Python, System.out.print in Java or printf in C (where the limitations of the language necessitate the
existence of a pair of printf and fprintf).
This output is usually prepared by the language runtime together with the shell and the operating system
(the technical details are not that important for this course anyway). Practically, the standard output is
printed to the terminal or its equivalent (and when the application is launched graphically, stdout is
typically lost).
Note that in Python you can access it explicitly via sys.stdout that acts as an opened file handle (i.e.,
result of open).
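The following fragment demonstrates that both spellings end up in the same place; the explicit variant just requires us to add the newline ourselves.
import sys
print("Hello")
sys.stdout.write("Hello\n")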
Standard input
Similarly to stdout, almost all languages have access to stdin that represents the default input. By default,
this input comes from the keyboard, although usually through the terminal (i.e., stdin is not used in
graphical applications for reading keyboard input).
Note that the function input() that you may have used in your Python programs is an upgrade on top
of stdin because it offers basic editing functions. Plain standard input does not support any form of
editing (though typically you could use backspace to erase characters at the end of the line).
If you want to access the standard input in Python, you need to use sys.stdin explicitly. As one could
expect, it uses a file API, hence it is possible to read a line from it calling .readline() on it or to iterate
through all lines.
In fact, an iteration of the following form is a quite common pattern for many Linux utilities (they are
usually written in C but the pattern remains the same):
import sys

for line in sys.stdin:
    print(line.rstrip("\n"))  # process the line somehow (here we only echo it back)
Many of the utilities actually read from stdin by default. For example, cut -d : -f 1 prints only the first
column of data of each line (and expects the columns to be delimited by :).
Run it and type the following on the keyboard, terminating each line with <Enter>.
cut -d : -f 1
one:two
alpha:bravo
uno:dos
You should see the first column echoed underneath your input.
What to do when you are done? Typing exit will not help here but <Ctrl>-D works.
Pressing <Ctrl>-D on an empty line will close the standard input. The program cut will realize that there is no
more input to process and will gracefully terminate. Note that this is something else than <Ctrl>-C which
forcefully kills the running process. From the user’s perspective, these look similar in the context of the
utility cut, but the behavior is totally different with important semantics difference (that can be observed when
using other tools).
As a technical detail, we mentioned earlier that the standard input and output are prepared (partially) by
the operating system. This also means that it can be changed (i.e., initialized differently) without changing
the program. And the program may not even “know” about it.
This is called redirection and it allows the user to specify that the standard output would not go to the
screen (terminal), but rather to a file. From the point of view of the program, the API is still the same.
This redirection has to be done before the program is started and it has to be done by the caller. For us,
it means we have to do it in the shell.
It is very simple: at the end of the command we can specify > output.txt and everything that would be
normally printed on a screen goes to output.txt.
Before you start experimenting: the output redirection is a low-level operation and has no form of undo.
Therefore, if the file you redirect to already exists, it will be overwritten without questions. And without
any easy option to restore the original file content (and for small files, the restoration is technically
impossible for most file systems used in Linux).
As a precaution, get into a habit to hit <Tab> after you specify the filename. If the file does not exist, the
cursor will not move. If the file already exists, the tab completion routine will insert a space.
As the simplest example, the following two commands will create files one.txt and two.txt with the
words ONE and TWO inside (including the new line character at the end).
echo ONE >one.txt
echo TWO >two.txt
From the implementation point of view, echo received a single argument; the > filename part is not
passed to the program at all (i.e., do not expect to find > filename in your sys.argv).
If you know Python’s popen or a similar call, they also offer the option to specify which file to use for stdout if
you want to do a redirection in your program (but only for a new program launched, not inside a running
program).
If you recall Lab 02, we mentioned that the program cat is used to concatenate files. With the knowledge
of output redirection, it suddenly starts to make more sense as the (merged) output can be easily stored
in a file.
For the following example, we will need the program tac that reverses the order of individual lines but
otherwise works like cat (note that tac is cat but backwards, what a cool name). Try this first.
UNO
ONE
TWO
Try the following and explain what happens (and why) if you execute
Input redirection
Similarly, the shell offers < for redirecting stdin. Then, instead of reading input typed by the user on the
keyboard, the program reads the input from a file.
Note that programs using Pythonic input() do not work that well with redirected input.
Practically, input() is suitable for interactive programs only. You might want to
use sys.stdin.readline() or for line in sys.stdin instead.
When input is redirected, we do not need to issue <Ctrl>-D to close the input as the input is closed
automatically when reaching the end of the file.
Filters
Many utilities in Linux work as so-called filters. They accept the input from stdin and print their output
to stdout.
One such example is cut that can be used to print only certain columns from the input. For example,
running it as cut -d : -f 1 with /etc/passwd as its input will display a list of accounts (usernames) on the
current machine.
cut -d : -f 1 </etc/passwd
cut -d : -f 1 /etc/passwd
The above behavior is quite common for most filters: you can specify the input file explicitly, but when it
is missing, the program reads from the stdin.
To return to the question above: the difference is that in the first case (with input redirection), the input
file is opened by the shell and opened file is passed to cut. Problems in opening the file are reported by
shell and cut might not be launched at all. In the second case, the file is opened by cut (i.e., cut executes
the open() call and also needs to handle errors).
Armed with this knowledge, we can actually solve the first part of our running example. Recall that we
have files that logged traffic each day and we want to find URLs that are most common in all the files
together.
That means we need to join all files together, keep only the URL and find the three most frequent lines.
And we can do that. Recall that cat can be used to concatenate files and cut can be used to keep only
certain columns (the URL is the third one). We will get to finding the most frequent URL in a while. The
first version, using a helper file, might look like this:
#!/bin/bash
cat *.csv >_logs_merged.csv
cut -d , -f 3 <_logs_merged.csv
The script has one big flaw (we will solve it soon but it needs to be mentioned anyway).
The script writes to a file called _logs_merged.csv. We have prefixed the filename with underscore to mark
it as somewhat special but still: what if the user created such file manually?
That is, the program must be able to work with sys.argv[1] == '-d,' and with (sys.argv[1] == '-d') and
(sys.argv[2] == ',').
Pipes (data streaming composition)
We finally move to the area where Linux excels: program composition. In essence, the whole idea behind
Unix-family of operating systems is to allow easy composition of various small programs together.
Mostly, the programs that are composed together are filters and they operate on text inputs. These
programs do not make any assumptions on the text format and are very generic. Special tools (that are
nevertheless part of Linux software repositories) are needed if the input is more structured, such as XML
or JSON.
The advantage is that composing the programs is very easy and it is very easy to compose them
incrementally too (i.e., add another filter only when the output from the previous ones looks reasonable).
This kind of incremental composition is more difficult in normal languages where printing data requires
extra commands (here it is printed to the stdout without any extra work).
The disadvantage is that complex compositions can become difficult to read. It is up to the developer to
decide when it is time to switch to a better language and process the data there. A typical division of
labour is that shell scripts are used to preprocess the data: they are best when you need to combine data
from multiple files (such as hundreds of various reports, etc.) or when the data needs to be converted to
a reasonable format (e.g. non-structured logs from your web server into a CSV loadable into your favorite
spreadsheet software or R). Computing statistics and similar tasks are best left to specialized tools.
Needless to add, Linux offers plenty of tools for statistical computations or plot-drawing utilities that
can be controlled from the CLI. Mastering these tools is, unfortunately, out of scope for this course.
We already mentioned that the temporary file we used is bad because we might have overwritten
someone else's data.
But it also requires disk space for another copy of the (possibly huge) data.
A bit more subtle but much more dangerous problem is that the path to the temporary file is fixed.
Imagine what happens if you execute the script in two terminals concurrently. Do not be fooled by the
feeling that the script is so short that the probability of concurrent execution is negligible. It is a trap that
is waiting to spring. We will talk about proper use of mktemp(1) later, but in this example no temporary
file is needed at all: we can connect the two commands directly with a pipe (the | character), which
feeds the output of the first command into the input of the second one.
cat *.csv | cut -d , -f 3
The result is the same, but we escaped the pitfalls of using temporary files and the result is actually even
more readable.
For cases when the first command also reads from standard input another syntax is available. For
example, this prints a sorted list of local user accounts (usernames).
cut -d : -f 1 </etc/passwd | sort
We can even move the first < before cut, so that the script can be read left-to-right like “take /etc/passwd,
extract the first column, and then sort it”:
</etc/passwd cut -d : -f 1 | sort
Using the pipe above we can print all the URLs in a single list.
To find the most often visited ones we will use a typical trick where we first sort the lines alphabetically
and then use the program uniq with -c to count unique lines (in effect counting how many times each URL
was visited). We then sort this output by the numbers (in reverse order) and print the first 3 lines.
Hence our program will evolve like this (lines starting with # are obviously comments):
#!/bin/bash
# Merge the logs and keep the URL column only,
# then count how many times each URL appears
# and print the three most frequent ones.
cat *.csv | cut -d , -f 3 | sort | uniq -c | sort -n -r | head -n 3
Exercise
Print the total amount of transferred bytes using the logs from our running example (i.e., the last part of
the task).
First part should be easy: we are interested only in the last column.
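If you are not sure how to sum the numbers with pipes only, one possible sketch (assuming the size is the fourth column) joins the lines with + and hands the resulting expression to the bc calculator:
cat *.csv | cut -d , -f 4 | paste -s -d+ | bc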
Pipes: check you understand the basics. Select all true statements.
• A filter is a program that reads standard input and prints results to standard output.
• A pipe connects the standard output of one program to the standard input of another program.
• Pipes can be replaced with I/O redirection.
• Pipes can split standard input to two programs for further processing.
Let us finish another part of the running example. We want to compute traffic for each day and print
days with the most traffic.
Knowing how we composed things so far, we lack only the middle part of the pipeline. Summing the
sizes for each day.
There is no ready-made solution for this (advanced users might consider installing termsql) but we will
create our own in Python and plug it into our pipeline.
Recall we want to group the traffic by dates, hence our program should be able to do the following
transformation.
# Input
day1 1
day1 2
day2 4
day1 3
day2 1
# Output
day1 6
day2 5
Here is our version of the program. Notice that we have (for now) ignored error handling but allowed
the program to be used as a filter in the middle of the pipeline (i.e., read from stdin when no arguments
are provided) but also easily usable for multiple files.
In your own filters, you should also follow this approach: the amount of source code you need to write
is negligible, but it gives the user flexibility in use.
#!/usr/bin/env python3
import sys

def sum_file(inp, sums):
    # Accumulate the value from the second column, grouping by the first column.
    for line in inp:
        key, value = line.split()
        sums[key] = sums.get(key, 0) + int(value)

def main():
    sums = {}
    if len(sys.argv) == 1:
        sum_file(sys.stdin, sums)
    else:
        for filename in sys.argv[1:]:
            with open(filename, "r") as inp:
                sum_file(inp, sums)
    for key, sum in sums.items():
        print(f"{key} {sum}")

if __name__ == "__main__":
    main()
With such a program in place, we can extend our web statistics script in the following manner (tr
translates the commas into the spaces that group_sum.py expects):
cat *.csv | cut -d , -f 1,4 | tr , ' ' | ./group_sum.py
On your own, extend the solution to print only the top 3 days (sort can order the lines using different
columns than the whole line, too). Answer.
While it often makes sense to redirect the output, you often want to see error messages still on the
screen.
Imagine files one.txt and two.txt exist while nonexistent.txt is not in the directory. We will now execute
the following command.
cat one.txt nonexistent.txt two.txt >merged.txt
No, do not imagine it. Create the files one.txt and two.txt to contain words ONE and TWO yourself on the
command line. Hint. Answer.
Therefore, every Linux program also has a standard error output (often just stderr) that also goes to the
screen but is logically different from stdout and is not subject to > redirection.
In Python, printing to the standard error output looks like this (a fragment from a more robust
group_sum.py; the full version is shown below):
try:
    with open(filename, "r") as inp:
        sum_file(inp, sums)
except IOError as e:
    print(f"Error reading file {filename}: {e}", file=sys.stderr)
The following text provides overview of file descriptors that are abstractions used by the OS and the
application when working with opened files. Understanding this concept is not essential for this course
but it is a general principle that (to some extent) is present in most operating systems and applications
(or programming languages).
Technically, opened files have so-called file descriptors that are used when an application communicates
with the operating system (recall that file operations have to be done by the operating system). The file
descriptor is an integer that serves as an index in a table of opened files that is kept for each process
(i.e., a running instance of a program).
This number — the file descriptor — is then passed to system calls which operate on the opened file.
For example, write gets two arguments: an opened file descriptor and a byte buffer to write (in our
examples, we will pass the string directly for simplicity). Therefore, when your application
calls print("Message", file=some_file), eventually your program would call the operating system
as write(3, "Message\n") where 3 denotes the file descriptor for the opened file represented by
the some_file handle.
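You can even issue such a call yourself: Python's os.write is a thin wrapper above this system call and accepts the raw descriptor number (note that it wants bytes, not a string).
import os
os.write(1, b"Message\n")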
While the above may look like a technical detail, it will help you understand why the standard error
redirection looks the way it does, or why file operations in most programming languages require opening
the file first before writing to it (i.e., why write_to_file(filename, contents) is never a primitive
operation).
In any unix-style environment, the file descriptors 0, 1, and 2 are always used for standard input, standard
output, and standard error output, respectively. That is, the call print("Message") in Python eventually
ends up in calling write(1, "Message\n") and a call to print("Error", file=sys.stderr) calls write(2,
"Error\n").
When a new process is started, it obtains these three file descriptors from its caller (e.g., the shell). By
default, they point to the terminal, but the caller can simply open them to point to a different file. This is
how redirection works.
The fact that stdout and stderr are logically different streams (files) also explains the word probably in
one of the examples above. Even though they both end in the same physical device (the terminal), they
may use a different configuration: typically, the standard output is buffered, i.e., output of your
application goes to the screen only when there is enough of it, while the standard error is not buffered
– it is printed immediately. The reason is probably obvious – error messages should be visible as soon
as possible, while normal output might be delayed to improve performance.
Note that the buffering policy can be more sophisticated, but the essential take away is that any output
to the stderr is displayed immediately while stdout might be delayed.
Try the following invocations of group_sum.py (prepare suitable two-column input files first) and observe
which input is actually processed:
./group_sum.py <one.txt
./group_sum.py one.txt
./group_sum.py one.txt two.txt
./group_sum.py one.txt <two.txt
Has it behaved as you expected?
Trace which paths (i.e. through which lines) the program has taken with the above invocations.
To redirect the standard error output, you can use > again, but this time preceded by the number 2 (that
denotes the stderr file descriptor).
Hence, our cat example can be transformed to the following form, where err.txt would contain the error
message and nothing would be printed on the screen.
cat one.txt nonexistent.txt two.txt >merged.txt 2>err.txt
Consider the following mini-script (first-column.sh) that extracts and sorts the first column (for colon-
delimited data such as in /etc/passwd).
#!/bin/bash
cut -d : -f 1 | sort
Then the user can use the script like this, and the standard input of cut will be properly wired to the
script's standard input, whether redirected from a file or fed through a pipe:
./first-column.sh </etc/passwd
cat /etc/passwd | ./first-column.sh
Generic redirection
Shell allows us to redirect outputs quite freely using file descriptor numbers before and after the greater-
than sign.
For example, >&2 specifies that the standard output is redirected to the standard error output. That may
sound weird, but consider the following mini-script (a sketch with a placeholder URL) that prints its status
message to stderr so that stdout carries only the downloaded data:
#!/bin/bash
echo "Downloading the data, please wait..." >&2
wget -O - https://example.com/data.csv
Take this as an illustration of the concept as wget can be silenced via command-line arguments (--quiet)
as well.
Sometimes, we want to redirect stdout and stderr to one single file. In these situations simple >output.txt
2>output.txt would not work and we have to use >output.txt 2>&1 or &>output.txt (to redirect both at
once). However, what about 2>&1 >output.txt, can we use it as well? Try it yourself! Hint.
We already mentioned that virtually everything in Linux is a file. Many special files representing devices
are in /dev/ subdirectory.
Especially /dev/null is a very useful file as it can be used in any situation when we are not interested in
the output of a program.
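For example, the following variant of our cat example throws away both the regular output and the error messages:
cat one.txt nonexistent.txt two.txt >/dev/null 2>&1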
For many programs you can specify the use of stdin explicitly by using - (dash) as the input filename.
Another option is to use /dev/stdin explicitly: with this name, we can make the example
with group_sum.py work (the last invocation above, where the file given on the command line would
otherwise cause the standard input to be ignored):
./group_sum.py one.txt /dev/stdin <two.txt
/dev/stdout can be used if we want to specify standard output explicitly (this is mostly useful for
programs coming from other environments where the emphasis is not on using stdout that much).
So far, the programs we have used announced errors as messages. That is quite useful for interactive
programs as the user wants to know what went wrong.
However, for non-interactive use, checking for error messages is actually very error-prone. Error
messages change, the users can have their system localized etc. etc. Therefore, Linux offers a different
way of checking whether a program terminated correctly or not.
Whether a program terminates successfully or with a failure, is signalled by its so-called return (or exit)
code. This code is an integer and unlike in other programming languages, zero denotes success and any
non-zero value denotes an error.
Why do you think that the authors decided that zero (that is traditionally reserved for false) means
success and nonzero (traditionally converted to true) means failure? Hint: in how many ways can a
program succeed?
Unless specified otherwise, when your program terminates normally (i.e., main reaches the end and no
exception is raised), the exit code is zero.
If you want to change this behavior, you need to specify this exit code as a parameter to
the exit function. In Python, it is sys.exit.
For C programs, the main function actually returns an int, whose value is the exit code. Use it properly.
The full signature is actually int main(int argc, char *argv[]) so that you can access command-line
options as function arguments (most environments will actually allow you to use plain void
main(void) but it is not recommended).
As an example, the following is a modification of the group_sum.py above, this time with proper exit code
handling.
def main():
    sums = {}
    exit_code = 0
    if len(sys.argv) == 1:
        sum_file(sys.stdin, sums)
    else:
        for filename in sys.argv[1:]:
            try:
                with open(filename, "r") as inp:
                    sum_file(inp, sums)
            except IOError as e:
                print(f"Error reading file {filename}: {e}", file=sys.stderr)
                exit_code = 1
    for key, sum in sums.items():
        print(f"{key} {sum}")
    sys.exit(exit_code)
We will later see that shell control flow (e.g., conditions and loops) is actually controlled by program
exit codes.
Failing fast
So far, we expected that our shell scripts will never fail. We have not prepared them for any kind of
failure.
We will eventually see how exit codes can be tested and used to control our shell scripts more, but for
now we want to stop whenever any failure occurs.
That is actually quite sane behavior: you typically want the whole program to terminate if there is an
unexpected failure (rather than continuing with inconsistent data). Like an uncaught exception in Python.
To enable terminate-on-failure, you need to call set -e. In case of failure, the shell will stop executing
the script and exit with the same exit code as the failed command.
Furthermore, you usually want to terminate the script when an uninitialized variable is used: that is
enabled by set -u. We will talk about variables later but -e and -u are usually set together.
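A minimal sketch of the behavior: when you run the following script, cat fails, set -e stops the script immediately, and the final message is never printed.
#!/bin/bash
set -ue
cat /nonexistent.txt
echo "This line is never reached."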
And there is also a caveat regarding pipes and success of commands: the success of a pipeline is
determined by its last command. Thus, sort /nonexistent | head is a successful command. To make a
failure of any command fail the (whole) pipeline, you need to run set -o pipefail in your script (or shell)
before the pipeline.
Therefore, typically, you want to start your script with the following trio:
set -o pipefail
set -e
set -u
Many commands allow short options (such as -l or -h you know from ls) to be merged like this (note
that -o pipefail has to be last):
set -ueo pipefail
Actually, from now on, the GitLab pipeline will check that this command is a part of your scripts.
set -ueo pipefail can sometimes cause unwanted and quite unexpected behavior.
The following script terminates with a hard-to-explain error, i.e., we never reach the final echo. Note that
the final hexdump is there only to ensure we do not print garbage from /dev/urandom directly on the
terminal.
#!/bin/bash
set -ueo pipefail
cat /dev/urandom | head -n 1 | hexdump
echo "We are done."
The reason comes from the head command. head has a very smart implementation that terminates after
first -n lines were printed. Reasonable right? But that means that the first cat is suddenly writing to a
pipe that no one reads. It is like writing to a file that was already closed. That generates an exception
(well, kind of) and cat terminates with an error. Because of set -o pipefail, the whole pipeline fails.
The truth is that distinguishing whether the closed pipe is a valid situation that shall be handled gracefully
or if it indicates an issue is impossible. Therefore cat terminates with an error (after all, someone just
closed its output without letting it know first) and thus the shell has to mark the whole pipeline as failed.
Solving this is not always easy and several options are available. Each has its pros and cons.
When you know why this can occur, adding || true marks the pipeline as fine (we will learn about || later
on, though).
Shell customization
We already mentioned that you should customize your terminal emulator to make it comfortable to use.
After all, you will spend at least this semester with it and it should be fun to use.
In this lab, we will show some other options how to make your shell more comfortable to use.
Command aliases
You probably noticed that you execute some commands with the same options a lot. One such example
could be ls -l -h that prints a detailed file listing, using human-readable sizes. Or perhaps ls -F to
append a slash to the directories. And probably ls --color, too.
The shell offers so-called aliases, with which you can easily define new commands without creating
full-fledged scripts somewhere.
Try executing the following commands to see how a new command l could be defined (and then used):
alias l='ls -l -h'
l
Some typical aliases that you will probably want to try are, for example, the following ones. Use a manual
page if you are unsure what the alias does. Note that curl is used to retrieve contents from a URL
and wttr.in is really a URL. By the way, try that command even if you do not plan to use this alias :-).
alias ll='ls -l -h --color -F'
alias weather='curl wttr.in'
~/.bashrc
Aliases above are nice, but you probably do not want to define them each time you launch the shell.
However, most shells in Linux have some kind of file that they execute before they enter interactive
mode. Typically, the file resides directly in your home directory and it is named after the shell, ending
with rc (you can remember it as runtime configuration).
For Bash which we are using now (if you are using a different shell, you probably already know where to
find its configuration files), that file is called ~/.bashrc.
You have already used it when setting EDITOR for Git, but you can also add aliases there. Depending on
your distribution, you may already see some aliases or some other commands there.
Add aliases you like there, save the file and launch a new terminal. Check that the aliases work.
The .bashrc file behaves as a shell script and you are not limited to have only aliases there. Virtually any
commands can be there that you want to execute in every terminal that you launch.
The prompt is modified through the PS1 variable. We will talk about variables in more detail later on, for
now we will learn the syntax only.
When setting the variable, we can directly modify it in the shell and immediately observe the result.
PS1=''
The prompt is gone. We have set it to an empty string.
PS1='\w '
Here we set it to print current directory and a space. The special sequence \w will be automatically
replaced by the name of the working directory.
Many users prefer to know as which user they are logged in.
PS1='\u: \w '
The usual tradition is to end the prompt with a dollar sign (the \$ sequence, which is displayed as # when
you are logged in as root):
PS1='\u: \w\$ '
It is also possible to add your own commands to be executed or even make the prompt multi-line.
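For example, the following (where \h stands for the hostname and \n inserts a line break) is a minimal two-line prompt:
PS1='\u@\h: \w\n\$ '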
More examples
The following examples can be solved either by executing multiple commands or by piping basic shell
commands together. To help you find the right program, you can use manual pages. You can also use
our manual as a starting point.
Note that none of the solutions requires anything else than a few pipelines. For advanced users: you
definitely do not need if or while or read or even off-loading to PERL or AWK.
Use the following CSV (the file disk-speeds-data.csv) with data on how long it took to copy the USB disk
image to the USB drives in the library. The first column represents the device, the second the duration
of the copying.
As a matter of fact, the first column also indirectly represents port of the USB hub (this is more by accident
but it stems from the way we organized the copying). As a sidenote: it is interesting to see that some
ports that are supposed to be the same are actually systematically slower.
We want to know what was the longest duration of the copying: in other words, the maximum of column
two.
Solution.
Create a directory a and inside it create a text file --help containing Lorem Ipsum. Print the content of this
file and then delete it. Solution.
Create a directory called b and inside it create files called alpha.txt and *. Then delete the file called * and
watch out what happened to the file alpha.txt. Solution.
Print the content of the file /etc/passwd sorted by the rows. Solution.
Print the first and third column of the file /etc/group. Solution.
Print the last two lines of the files /etc/passwd and /etc/group using a single command. Solution.
Recall the file disk-speeds-data.csv with the disk copying durations. Compute the sum of all
durations. Solution.
Consider a file with the following space-separated content:
Alpha 8 4 5 0
Bravo 12 5 3 2
Charlie 1 0 11 4
Append to each row the sum of its line. You do not need to keep the original alignment (i.e., feel free to
squeeze the spaces). Hint. Solution.
Print the contents of /etc/passwd and /etc/group separated by text Ha ha ha (i.e., contents
of /etc/passwd, line with Ha ha ha and contents of /etc/group). Solution.
Print vendors of your CPU. Use the file /proc/cpuinfo as the starting point.
Solution.
Before-class tasks (deadline: start of your lab, week March 6 - March 10)
The following tasks must be solved and submitted before attending your lab. If you have lab on
Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at
10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
Tests are available.
This lab is about pipes. The shell tasks here must be solved using pipes, not using shell loops (even if you know
them) or by off-loading to another programming language.
List of users is stored either in /etc/passwd or via getent passwd. Your script will assume that the list of
users will come on standard input.
name1,duration_in_seconds_1
name2,duration_in_seconds_2
Print the author of the fastest solution (you can safely assume that the durations are distinct).
We expect you will solve the following tasks after attending the labs and hearing feedback to your
before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
Tests are available.
This lab is about pipes. The shell tasks here must be solved using pipes, not using shell loops (even if you know
them) or by off-loading to another programming language.
We expect that for the following matrix we would get this output.
| 106 179 |
| 188 50 |
| 5 125 |
285
238
130
The script will read input from stdin; there is no limit on the number of columns or rows, but you can
rely on the fixed format as explained above.
04/day_of_week.py (50 points, group devel)
Write a Python filter that converts date to day of week.
The program will convert dates in the first column only (using whitespace for splitting); invalid dates will
be ignored (and the line will be kept as-is). The rest of the columns will be copied to the output. The
program must support the following invocations:
04/day_of_week.py <input.txt
04/day_of_week.py input.txt
cat one.txt two.txt | 04/day_of_week.py
If the file cannot be opened, the program will print an error message to stderr (exact wording is defined
by the tests) and will terminate with exit code 1.
You can expect that the program will not be invoked as 04/day_of_week.py one.txt two.txt.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be
able to explain and/or use after each lesson. They also represent the bare minimum required for
understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting
them into context. Therefore, you should be able to …
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should
be able to …
The goal of this lab is to start using Linux in a network environment and learn basic concepts needed for
machines shared among multiple users. After the lab, you will be able to log in to a remote Linux machine,
use Git over SSH, and also have a look at what programs are currently running on a Linux machine.
We provide a brief overview of several concepts that we believe you should already be familiar with. Feel
free to skim these parts and focus on the new topics only.
Networking introduction
This text assumes that you have basic knowledge of networking. Terms such as IP address, interface or
port should not be new to you. If you need a refresher, we have set up a short page with a brief overview
of networking.
Asymmetric cryptography
Before diving into details about how to access remote machines (over SSH), we need to briefly
refresh some cryptography-related topics.
In this lab, we will be talking a lot about asymmetric cryptography. In general, it is a method of
encryption/decryption where the user needs a different key for decryption of the message than the one
that was used for message encryption.
This is different from symmetric ciphers. For example, the well-known Caesar cipher has a single key (the
alphabet shift step) which is used for both encryption and decryption.
Asymmetric cryptography usually creates a pair of keys: a public key, which is usually used for encryption,
and a private one. For example, if you make your encryption key public and your decryption key private,
everybody can encrypt a message for you, but only you can decrypt it. This is secure if it is impossible (or
hard enough) to derive the private key from the public one, which is usually the case.
This has an obvious advantage: you do not need to create a secret symmetric key for every pair of users
who would want to communicate. Instead, everybody just distributes their public key and guards the
single private key. (This is not as trivial as it looks: When Alice wants to send an encrypted message to
Bob, she has to make sure that the public key does really belong to Bob. Otherwise, you can easily establish
a secure connection, but to an attacker.)
Unfortunately, there is no good example of an asymmetric cipher as simple as the Caesar’s cipher. For an
example, which is more complex, but still approachable, have a look at RSA.
Please note that selecting a good cipher is only a small step in communicating securely. If you want to
learn more, please consult some real textbook on cryptography or attend one of our cryptographic
courses. The above serves as a refresher to ensure we are on the same page for the rest of this lab.
Asymmetric cryptography has two main uses. The first one is obvious: if we know the public key of the
receiver of the message, we can use it to encrypt the message and send it over unprotected medium (and
without fear that anyone else would be able to read it).
But it can also be used in reverse, to authenticate the owner of the private key. (Here we assume we are
able to distribute the public keys safely.)
The mini-protocol is then able to authenticate (i.e., verify) that the other party is who they claim to be by
proving ownership of the private key (i.e., we assume that private keys were not stolen).
The method is very simple – the sender generates a random text and encrypts it with the public key of the
receiver (the one we wish to verify). If the receiver is the real owner, they would be able to decrypt the
random text and send it back to us. Inability to decrypt the text means that the receiver is not the owner
of the private key.
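The following sketch illustrates the idea using the openssl(1) tool (the file names are ours and the
handling of the decrypted output is simplified):

# Receiver: generate a key pair and publish the public part
openssl genpkey -algorithm RSA -out private.pem
openssl pkey -in private.pem -pubout -out public.pem

# Sender: create a random challenge and encrypt it with the receiver's public key
openssl rand -hex 16 >challenge.txt
openssl pkeyutl -encrypt -pubin -inkey public.pem -in challenge.txt -out challenge.bin

# Receiver: prove ownership of the private key by decrypting the challenge
openssl pkeyutl -decrypt -inkey private.pem -in challenge.bin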
Typically a user authenticates to a service with a login and a password. While this method is quite natural
and common for human users, it has several drawbacks.
The most obvious problem is password strength: people rarely have long passwords, and not many people
use any form of password manager. Using the same password at multiple places also allows the administrator
(or a hacker) of one service to impersonate you in other services.
If you do not use a password manager, consider starting. The general idea is that you remember one
(but strong!) password, and this password encrypts the rest of your passwords.
Therefore, all passwords can be generated to be long enough and unique for each service.
There are plenty of managers available; a simple one is pass, which can use a Git backend and has plenty of
GUI clients available, including ones for Android and iOS.
Back to private/public key authentication. Some services allow the user to authenticate with their public
key instead of using username/password.
The user uploads their public key to the server (using login and password for authenticating that
operation) and when they want to log in, the server challenges them with a random secret encrypted with
their public key. As the sole owner of the private key (and hence the only one able to decrypt), the user
can decrypt the secret and confirm their identity. The operation then continues as with any other
authenticated user.
Useful rules
For the public key authentication to work securely, the following is highly recommended (note that most
of these rules apply to any other type of authentication, too).
The private key is like the password – it must not leak. Because the private key is usually a file, you must protect
this file. Having an encrypted hard drive is a typical option for portable machines.
It is possible to protect the key itself with a passphrase (basically, another password). Then even a leaked
private key file is not an immediate threat of identity theft. Note that there are helpers, such as ssh-
agent(1), that can store the passphrase for some time so you do not have to enter it every time you use
the key.
If you have multiple computers, it is preferable to use a different public/private key pair on each machine.
If one machine is compromised, it is sufficient to remove its public key from all applications, while you can
still use the other key pairs.
Using SSH
Enough of theory: let us connect to some remote machine. Let us explore SSH.
What is SSH?
SSH – that stands for Secure Shell – is a protocol for connecting to a different machine and running a shell
there.
From a user perspective, after you SSH from a Linux machine into a different Linux machine, the shell may
look the same and some commands would behave completely the same. Except they might be running
on a different computer.
Note that this is intentional: remote shell is a natural way to control a Linux machine. No need to make it
different from controlling it through a local shell.
SSH practically
Using SSH is very simple (unless you make it complex). To SSH to a remote machine, you need to know
your credentials (i.e., login name and a password) and, of course, the name of the remote machine.
ssh YOUR_LOGIN@REMOTE_MACHINE_NAME
Note that the command ssh is often called an SSH client, as it connects to an SSH server (similar
to curl or wget being web clients connecting to a web server).
We have set up a remote machine for you on linux.ms.mff.cuni.cz. You will be using your GitLab (SIS/CAS)
login and also the same password.
ssh YOUR_LOGIN@linux.ms.mff.cuni.cz
If your GitLab account was created manually, chances are your SIS password will not work on this machine.
Please, contact us via this link and we will create an account for you manually.
The first login to the machine is a bit more complicated. The SSH client wants you to verify that you
trust the remote server. It shows you a so-called server fingerprint:
RSA: SHA256:Z11Qbd6nN6mVmCSY57Y6WmxIJzqEFHFm47ZGiH4QQ9Y
ED25519: SHA256:/CVwDW388z6Z5VlhLJT0JX+o1UzakyQ+S01+34BI0BA
You should have received this fingerprint in a secure way before connecting to the server (for example,
printed out by your employer etc.). In this case, we hope that a potential attacker would not be able to
break both into this web server and the SSH server at once. So we use the HTTPS protocol on the web as
a secure way of verifying the SSH server.
The program then continues to ask you for a password and also informs you that the fingerprint was
stored.
On following logins, the SSH client checks that the fingerprint has not changed. A changed fingerprint
belonging to the same machine (i.e., with the same DNS name) could indicate a man-in-the-middle attack.
Do not connect when the fingerprint has changed. Always contact the administrator (e.g., the teachers
in this particular case) and check with them what is happening.
If you were able to log in, you will see an (almost) empty home directory and an otherwise normal Linux
machine, this time without any graphical applications installed.
Try to run lscpu to see what machine you have logged in to.
Note that this machine is shared by all students of this course. Use it to solve graded tasks or to
experiment with commands we show in the labs. You will be using it for several tasks for the rest of
the course, so please keep it usable.
Do not use it for computationally intensive tasks or other tasks that are not related to this course.
We also strictly prohibit using it for any kind of remote development with tools such as Visual Studio,
IntelliJ IDEA, or similar IDEs. These tools install huge blobs of code/data on the remote machine, and
because they rarely remove old versions, they are very quick in taking all the free space available for
themselves.
If we encounter any form of abusive use, we will block the offending account.
Configuration of $PS1
We have already touched this a little bit in the previous lab as an extra for further reading. The $PS1
variable specifies how your prompt looks. From now on, you should ensure that your $PS1 also shows the
machine name so that you always know where you are (i.e., which machine you are logged into).
Similarly to setting EDITOR in your ~/.bashrc, you should specify (at least) the following if the
default does not display the machine name.
PS1='\u@\h \w\$'
This ensures that you can see your username (\u), the hostname (\h) and also the working directory (\w).
Return to previous lab if you want to see more options about setting your prompt.
As a personal tip: use a colorful prompt without user/machine on your workstation, but keep the non-colored
version with \u@\h on all remote machines. This keeps the prompt short on your personal machine while
providing a visual distinction when working on a remote one.
The command ssh is actually quite powerful and configurable. One important feature is that you can
specify a command after the hostname and run this command directly. In this case, SSH never starts
an interactive shell on the remote machine, but executes only the given command and returns.
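For example, the following runs uname -a on the remote machine and prints its output on your local
terminal (the remote host is our shared machine; any host you can log in to works the same way):

ssh YOUR_LOGIN@linux.ms.mff.cuni.cz uname -a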
To enable authentication with public key over SSH, we need to perform two steps. Generate the public-
private key pair and copy the public key to the remote machine.
To generate the key pair, we need to run the following command (the -C option attaches a comment to
the key; providing your e-mail or login is the usual approach, but keeping it empty is possible, too).
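ssh-keygen -t ed25519 -C "YOUR_LOGIN"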
Once we have the public key ready, we need to upload it to the remote machine. If you have multiple key
pairs, read about the -i switch.
ssh-copy-id LOGIN@REMOTE_MACHINE
If you log in to the remote machine again, the SSH client should use the key pair and log you in without
asking for a password. If not, run SSH with -vvv to debug the issue.
Note that the public key was stored into ~/.ssh/authorized_keys file on the remote machine. You can copy
it there manually but using ssh-copy-id is easier.
If the copying fails with a cryptic message about warning: here-document at line 251 delimited by end-of-
file (wanted EOF), try upgrading the SSH client first.
If you use the image from us, simple sudo dnf upgrade openssh-clients should work.
Using keys only
Note that some services actually require that you authenticate using a key pair instead of a password as
it is considered more secure.
The advantage is that any random attackers could keep guessing your password and still never get access
to your machine.
Automated client ban
Typically, an SSH server is also configured to ban any client that repeatedly tries to log in without
success. Our server does that, too.
Copying files
To copy a file, we can leverage cat, which simply dumps the file as-is to stdout: we pipe its output to
SSH and, on the other machine, run a second cat to store the file. If you can decipher the following
command, you should by now have an idea of how to copy files to and from a remote machine (we are not
saying it is the most effective way; the file names below are illustrative):
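cat local-file.txt | ssh YOUR_LOGIN@linux.ms.mff.cuni.cz 'cat >remote-copy.txt'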
There are also scp and rsync that can be used for copying multiple files over SSH easily but we will talk
about these in later labs.
File managers
Many file managers allow you to use SSH transparently and copy between machines with the same ease
as when working with local files.
For example, in mc, select Shell connection either in left or right panel and specify SSH connection. One of
the panels will show the files on the remote machine. Try using F5 to copy files interactively.
Apart from the machine linux.ms.mff.cuni.cz, there is also a full lab of machines available in the Rotunda
computer lab on Malostranské náměstí.
All the Linux machines in the lab are also reachable via SSH. Again, use your SIS credentials to log in. Note
that all machines in Rotunda (but not linux.ms.mff.cuni.cz!) share the same home directory, i.e., it does
not matter which one you physically connect to. Your files will be available on all machines.
Unfortunately, the file system and authentication mechanism used there do not allow public key
authentication for the Rotunda machines. You always need to type your password.
(linux.ms.mff.cuni.cz does not have this limitation and we expect you will use public key authentication
there.)
• Lab SU1
o u1-1.ms.mff.cuni.cz
o u1-2.ms.mff.cuni.cz
o …
o u1-14.ms.mff.cuni.cz
• Lab SU2
o u2-1.ms.mff.cuni.cz
o u2-2.ms.mff.cuni.cz
o …
o u2-25.ms.mff.cuni.cz
• Rotunda
o u-pl1.ms.mff.cuni.cz
o u-pl2.ms.mff.cuni.cz
o …
o u-pl23.ms.mff.cuni.cz
So far we have silently ignored the fact that there are different user accounts on any Linux machine. And
that users cannot access all files on the machine. In this section we will explain the basics of Unix-style
access rights and how to interpret them.
After all, now you can log in to a shared machine and you should be able to understand what you can
access and what you cannot.
Recall what we said about /etc/passwd earlier – it contains the list of user accounts on that particular
machine (technically, it is not the only source of user records, but it is a good enough approximation for
now).
Every running application, i.e., a process, is owned by one of the users from /etc/passwd (again, we simplify
things a little bit). We also say that the process is running under a specific user.
And every file in the filesystem (including both real files such as ~/.bashrc and virtual ones such
as /dev/sda or /proc/uptime) has some owner.
When a process tries to read or modify a file, the operating system decides whether the operation is
permitted. This decision is based on the owner of the file, the owner of the process, and permissions
defined for the file. If the operation is forbidden, the input/output function in your program raises an
exception (e.g., in Python), or returns an error code (in C).
Since a model based solely on owners would be too inflexible, there are also groups of users (defined
in /etc/group). Every user is a member of one or more groups, one of them is called the primary group.
These are associated with every process of the user. Files have both an owning user and an owning group.
Files are assigned three sets of permissions: one for the owner of the file, one for users in the owning
group, and one for all other users. The exact algorithm for deciding which set will be used is this:
1. If the user running the process is the same as the owner of the file, owner access rights are used
(sometimes also referred to as user access rights).
2. If the user running the process belongs to the group that is set on the file, group access rights are used.
3. Otherwise, the system checks against other access rights.
Every set of permissions contains three rights: read (r), write (w), and execute (x):
• Read right allows reading the contents of the file.
• Write right allows modifying the contents of the file.
• Execute right allows executing the file as a program.
The same permissions also apply to directories. Their meaning is a bit different, though:
• Read right allows the user to list directory entries (files, symlinks, sub-directories, etc.).
• Write right allows the user to create, remove, and rename entries inside that directory. Note that
removing write permission from a file inside a writable directory is pointless as it does not prevent the user
from overwriting the file completely with a new one.
• Execute right on a directory allows the user to open the entries. (If a directory has x, but not r, you can
use the files inside it if you know their names; however, you cannot list them. On the contrary, if a directory
has r, but not x, you can only view the entries, but not use them.)
Permissions of a file or directory can be changed only by its owner, regardless of the current permissions.
That is, the owner can deny access to themselves by removing all access rights, but can always restore
them later.
root account
Apart from accounts for normal users, there is always an account for a so-called superuser – more often
called simply just root – that has administrator privileges and is permitted to do anything with any file in
the system. The permissions checks described above simply do not apply to root-owned processes.
Unlike on other systems, Linux is designed in such a way that end-user programs are always executed under
normal users and never require root privileges. As a matter of fact, some programs (historically, this was
very common behaviour for IRC chat programs) would not even start under root.
Looking at the shortcuts of rwx for individual permissions, you may find them familiar:
Typically, your personal files in your home directory will have you as the owner together with a group with
the same name. That is a default configuration that prevents other users from seeing your files.
Do check that it is true for all directories under /home on the shared machine.
But also note that most of the files in your home directory are actually world-readable (i.e., anyone can
read them).
That is actually quite fine, because if you check the permissions of your ~, you will see that it is typically
drwx------. Only the owner can modify it and cd into it. Since no one else can actually enter your directory,
no one else will be able to read your files (technically, reading a file involves traversing the whole directory
path and checking access rights along the way).
To change the permissions, you can use the chmod program. It has the general format of:
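chmod MODE FILE...

For example, the following removes all rights from the group and other users (the file name is illustrative):

chmod go-rwx secret-notes.txt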
If you execute the following command, you will see a slightly different output than you would probably expect.
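ls -l /usr/bin/passwd
-rwsr-xr-x. 1 root root ... /usr/bin/passwd

(The exact path and the elided fields may differ on your system; note the s where the owner's x would
normally be.)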
The s bit (set-uid) is a bit more tricky. It specifies that no matter who executes the file, passwd will be running
under the user owning the file (i.e., root for this file).
While it may look useless, it is a simple way to allow running certain programs with elevated (higher)
permissions. passwd is a typical example. It allows the user to change their password. However, this
password is stored in a file that is not readable by any user on the system except root (for obvious reasons).
Giving the s bit to the executable means that the process would be running under root and would be able
to modify the user database (i.e., /etc/passwd and /etc/shadow that contains the actual passwords).
Since changing the permissions can be done only by the owner of the file, there is no danger that a
malicious user would add the s bit to other executables.
There are other nuances regarding Unix permissions and their setting, refer to chmod(1) for details.
The permission model described above is a rare example of a concept coming from Unix that is considered
too inflexible for use today. However, it is also considered a typical example of a simple but highly
usable security model.
Many programs copied this model and you can encounter it in other places too. It is definitely something
worth remembering and understanding.
The inflexibility of the system comes from the fact that allowing a set of users to access a particular file
means creating a special group for these users. These groups are defined in /etc/group and changing them
requires administrator privileges.
With an increasing number of users, the number of possibly needed groups grows exponentially. On the
other hand, for most situations, the basic Unix permissions are sufficient.
To tackle this problem, Linux offers also so-called POSIX access control lists where it is possible to assign
an arbitrary list of users to any file to specify the permissions.
getfacl and setfacl are the utilities that control these rights, but since they are rarely needed in practice,
we will leave their knowledge at the level of reading the corresponding manpages and acl(5).
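Just to give a flavor, a minimal sketch (the user name and file are illustrative):

# Allow user lewis to read the file, on top of the normal Unix rights
setfacl -m u:lewis:r test.txt
# Show the access control list of the file
getfacl test.txt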
Access rights checks
Change permission of some of your scripts to be --x. Try to execute them. What happens? Answer.
Remove writable bit for a file and write to it using stdout redirection. What happens?
Assuming the following output of ls -l (script.sh is really a shell script) and assuming user bob is in
group wonderland while user lewis is not.
So far, we have used Git over HTTPS. Git can be used over SSH too. Then, the traffic is basically tunneled
through an SSH connection and Git relies on the SSH wrapper for security as well as for (partial)
authentication.
Virtually all Git servers (GitLab, GitHub, Bitbucket…) will require you to upload your public key if you want
to use Git over SSH.
To actually use Git over SSH, we first need to tell GitLab about our SSH keys (recall the protocol that is
used to authenticate the user).
Copy your public key to GitLab. Navigate to the top-right menu with your avatar, select Preferences and
then SSH keys, or visit this link.
Copy your public key there and name it. Typically, the name should mention your username and your
machine. Note that GitLab will send you an e-mail informing you about a new key. Why? Hint. Answer.
Go to your project and clone it again. This time, use the Clone with SSH URL.
This way, you can clone a Git repository from any SSH server by specifying its remote path there (here,
GitLab does some mangling but the principle holds).
Note that the user we clone with is git – not you. This way, GitLab needs only one physical user account
for handling Git requests and distinguishes the users via their public keys. How? Answer.
ssh git@gitlab.mff.cuni.cz
Note that you should prefer the SSH protocol for working with Git, as it is much more comfortable to use.
Git on other platforms also offers generation of an SSH key but often the key is usable only by one application
(different applications have incompatible key formats), while on Linux a single key is generally usable for Git, other
machines, and other services.
The rest of the work with Git remains the same: git add, git commit, git push, etc. will work as before; only
the communication with GitLab goes through the SSH tunnel. Note that you do not have to re-enter your
credentials when doing git push. This is because Git remembers how you cloned the repository and
will use the same URL (either HTTPS or SSH) for git push as well (unless you configure it otherwise). And
since you used SSH for pulling, it will use SSH for pushing as well, which uses the public/private key
authentication you already set up.
Processes
Files in the system are its passive elements. The active parts are running programs that actually
modify data. Let us have a look at what is actually running on our machine.
When you start a program (i.e., an executable file), it becomes a process. The executable file and a running
process share the code – it is the same in both. However, the process also contains the stack (e.g., for local
variables), heap, current directory, list of opened files, etc. – all this is usually considered the context of
the process. Often, the phrases running program and process are used interchangeably.
To view the list of running processes on our machine, we can use htop to view basic properties of
processes. Similar to Midnight Commander, function keys perform the most important actions, and the
help is visible in the bottom bar. You can also configure htop to display information about your system
such as the amount of free memory or CPU usage.
For non-interactive use we can execute ps -e (or ps -axufw for a more detailed list).
For illustration, this is an example of ps output (with the --forest option used to also depict the
parent/child relations).
However, run ps -ef --forest on the shared machine to also view running processes of your colleagues.
Listing of processes is not protected in any way from other users. Every user on a particular machine can
see what other users are running (including command-line arguments).
Keep in mind to never pass passwords as command-line arguments; always pass them through files
(with proper permissions) or interactively on stdin.
UID PID PPID C STIME TTY TIME CMD
root 2 0 0 Feb22 ? 00:00:00 [kthreadd]
root 3 2 0 Feb22 ? 00:00:00 \_ [rcu_gp]
root 4 2 0 Feb22 ? 00:00:00 \_ [rcu_par_gp]
root 6 2 0 Feb22 ? 00:00:00 \_ [kworker/0:0H-events_highpri]
root 8 2 0 Feb22 ? 00:00:00 \_ [mm_percpu_wq]
root 10 2 0 Feb22 ? 00:00:00 \_ [rcu_tasks_kthre]
root 11 2 0 Feb22 ? 00:00:00 \_ [rcu_tasks_rude_]
root 1 0 0 Feb22 ? 00:00:09 /sbin/init
root 275 1 0 Feb22 ? 00:00:16 /usr/lib/systemd/systemd-journald
root 289 1 0 Feb22 ? 00:00:02 /usr/lib/systemd/systemd-udevd
root 558 1 0 Feb22 ? 00:00:00 /usr/bin/xdm -nodaemon -config /etc/X11/...
root 561 558 10 Feb22 tty2 22:42:35 \_ /usr/lib/Xorg :0 -nolisten tcp -auth /var/lib/xdm/...
root 597 558 0 Feb22 ? 00:00:00 \_ -:0
intro 621 597 0 Feb22 ? 00:00:40 \_ xfce4-session
intro 830 621 0 Feb22 ? 00:05:54 \_ xfce4-panel --display :0.0 --sm-client-id ...
intro 1870 830 4 Feb22 ? 09:32:37 \_ /usr/lib/firefox/firefox
intro 1966 1870 0 Feb22 ? 00:00:01 | \_ /usr/lib/firefox/firefox -contentproc ...
intro 4432 830 0 Feb22 ? 01:14:50 \_ xfce4-terminal
intro 4458 4432 0 Feb22 pts/0 00:00:11 \_ bash
intro 648552 4458 0 09:54 pts/0 00:00:00 | \_ ps -ef --forest
intro 15655 4432 0 Feb22 pts/4 00:00:00 \_ bash
intro 639421 549293 0 Mar02 pts/8 00:02:00 \_ man ps
...
First of all, each process has a process ID, often just called PID. The PID is a number assigned
by the kernel and used by many utilities for process management. PID 1 is used by the first process in the
system, which is always running. (PID 0 is reserved as a special value – see fork(2) if you are interested in
details.) Other processes are assigned their PIDs incrementally (more or less) and PIDs are eventually
reused.
Note that all this information is actually available in /proc/PID/ and that is where ps reads its information
from.
Execute ps -ef --forest again to view all processes on your machine. Because of your graphical interface,
the list will probably be quite long.
Practically, a small server offering web pages, calendar, and SSH access can have about 80 processes; for
a desktop running Xfce with a browser and a few other applications, the number will rise to almost 300 (this
really depends a lot on the configuration, but it is a ballpark estimate). About 50–60 of these are actually
internal kernel threads. In other words, a web/calendar server needs about 20 “real” processes, a desktop
about 200 of them :-).
At this moment we will show probably the most important thing that you can do with processes: their
forceful termination.
Eventually we will learn about the concept of signals; for the moment, we will restrict ourselves to two
basic commands.
The pgrep command can be used to find processes matching a given name.
Open two extra terminals and run sleep 600 in one and sleep 800 in the second one. The sleep program
simply waits the given number of seconds before terminating.
In a third terminal, run the following commands to understand how the searching for the processes is
done.
pgrep sleep
pgrep 600
pgrep -f 600
What have you learnt? Answer.
When we know the PID, we can use the kill utility to actually terminate the program. Try running kill
PID with the PID of one of the sleeps and watch what happens in the terminal with sleep.
Terminated (SIGTERM).
This message informs us that the command was forcefully terminated.
Some programs ignore the kill command and do not terminate. We will explain why that is possible in
one of the next labs, but for now we want to mention that it is possible to add -9 to the kill command,
which instructs the operating system to be a bit more forceful and terminate the program without giving
it any option to disagree ;-).
You can always kill your own processes, but killing processes of other users is not possible (unless you are root).
Going further
A few extra bits that will improve your user experience with SSH a lot, but which you can return to any time later.
tmux
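tmux is a terminal multiplexer: it allows you to run several terminal sessions inside a single one, detach
from them (e.g., before logging out of SSH), and attach to them again later while the programs inside keep
running. To start a new session, simply run:

tmux

All tmux shortcuts described below are typed after a prefix, which is Ctrl-b by default.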
Alternatively, we can start a session with some meaningful name:
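tmux new -s SESSION_NAME

To list the running sessions: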
tmux ls
To connect/attach to the running session run:
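tmux attach -t SESSION_NAME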
In order to detach from the session, we can simply press d (do not forget to type the prefix first!).
c create window
w list windows
n next window
p previous window
f find window
, name window
& kill window
Sometimes it is useful to split the screen into several terminals. These splits are called panes.
% vertical split
" horizontal split
o swap panes
q show pane numbers
x kill pane
← switch to left pane
→ switch to right pane
↑ switch to upward pane
↓ switch to downward pane
Another feature is that you can toggle writing simultaneously to all panes. Performing the same operation
multiple times may not seem very useful, but you can, for example, open several different SSH connections
in advance and then interactively control all those computers at the same time.
To toggle it, type the prefix and then write :set synchronize-panes. If you want to try this in Rotunda,
please do not run computationally intensive tasks…
As usual with Linux tools, you can modify its behavior widely via an rc configuration file. For instance, in
order to navigate among the panes with vim shortcuts, modify your ~/.tmux.conf so it contains, for example:
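bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R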
Personal tip № 1: when you give a lecture, you can attach to the same tmux session from two terminals. Later
on, you push the first one to the projector, while the second one stays on your laptop screen. This eliminates
the necessity of mirroring your screen. Together with pdfpc and a tiling window manager we get a Swiss-
army knife for presentations.
There is much more to say. For more details see this tmux cheatsheet or manual pages.
SSH configuration
The SSH client is configured via the ~/.ssh/config file. Review the syntax of this file via man 5 ssh_config.
The file is divided into sections. Each section is related to one or more remote hosts. The section header
is in the format Host pattern, where pattern might use wildcards.
Host *
IdentityFile ~/.ssh/id_ed25519
Host intro
Hostname linux.ms.mff.cuni.cz
User YOUR_SIS_LOGIN
Host mff1
Hostname u-pl6.ms.mff.cuni.cz
User YOUR_SIS_LOGIN
Host mff2
Hostname u-pl17.ms.mff.cuni.cz
User YOUR_SIS_LOGIN
With this ~/.ssh/config, we can type ssh intro and ssh will start a connection equivalent to
ssh YOUR_SIS_LOGIN@linux.ms.mff.cuni.cz
We recommend using different u-pl* hostnames in your config to distribute the load across multiple
machines. Note that the Rotunda machines may be unevenly loaded, so it is a good idea to bookmark
several of them and re-login if the first one is too slow.
Before-class tasks (deadline: start of your lab, week March 13 - March 17)
The following tasks must be solved and submitted before attending your lab. If you have lab on
Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at
10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of
the tasks there are automated tests that can help you check completeness of your solution (see here how
to interpret their results).
There are multiple options available; separate your answers with spaces or commas, e.g. **[A1]** 1,2
**[/A1]**.
Assume that we have a file `test.txt` for which `ls -l` prints the following:
Which of the following users will be able to read the contents of the file?
Consider that the file from the previous example is stored within
the directory `/data` with the following permissions as printed by `ls -l`:
You can assume that the root directory `/` is readable and executable
by everybody.
Continuing with the previous questions, which commands can be used to make
the file `test.txt` readable and writable only by the owner and nobody else?
Do not create this file in GitLab, we will check it on the remote machine only.
The presence of this file is semi-automatically checked by GitLab pipeline. However, there is a certain delay
before the tests are able to tell you that the file was really found on linux.ms.mff.cuni.cz.
Do not lose the private part of it – we will use it for some other tasks later on.
We expect you will solve the following tasks after attending the labs and hearing feedback on your before-
class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of
the tasks there are automated tests that can help you check completeness of your solution (see here how
to interpret their results).
Then clone the following repository and copy the output from uname -a (from your machine) to a file
called uname.txt in this repository and push it back.
The repositories will be created during week 04 or at the beginning of week 05.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be able
to explain and/or use after each lesson. They also represent the bare minimum required for understanding
subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning of given terms and putting them into context.
Therefore, you should be able to …
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be
able to …
The goal of this lab is to expand our knowledge of shell scripting. We will introduce
variables, command substitution, and also see how to perform basic arithmetic in shell.
We will build this lab around a single example that we will incrementally develop, so
that you learn the basic concepts on a practical example (obviously, there are specific
tools that could be used instead, but we hope that this is better than a completely
artificial example).
Our example will be built around building a small website from Markdown sources
using Pandoc. We will describe Pandoc first and then describe our running example.
Pandoc
Pandoc is a universal document converter that can convert between various formats,
including HTML, Markdown, Docbook, LaTeX, Word, LibreOffice, or PDF.
Ensure that your installation of Pandoc is reasonably up-to-date (i.e., at least version 2.19
that was released about a year ago).
Basic usage
Please, clone our example repository (or git pull it if you still have the clone around).
cat example.md
pandoc example.md
As you can see, the output is a conversion of the Markdown file into HTML, though
without an HTML header.
Markdown can be combined with HTML directly (useful if you want a more
complicated HTML code: Pandoc will copy it as-is).
As mentioned, Pandoc can create OpenDocument, too (the format used mostly in the
OpenOffice/LibreOffice suite).
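pandoc -o example.odt example.md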
You should not commit example.odt into your repository as it can be generated. That
is a general rule for any file that can be created automatically.
Did you know that LibreOffice can be used from the command line, too? For example,
we can ask LibreOffice to convert a document to PDF via the following command:
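libreoffice --headless --convert-to pdf example.odt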
Combined with Pandoc, three commands are enough to create an HTML page and
PDF output from a single source.
Pandoc templates
By default, Pandoc uses its own default template for the final HTML. But we can change
this template, too.
Look inside template.html. When the template is expanded (or rendered), the parts
between dollar signs will be replaced with the actual content.
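To apply the custom template, pass it to Pandoc (note that --template implies standalone output):

pandoc --template template.html example.md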
Pandoc can be used even in more sophisticated ways, but the basic usage (including
templates) is enough for our running example.
Pandoc supports conversion to and from LaTeX and plenty of other formats (try with
--list-output-formats and --list-input-formats).
It can be also used as a universal Markdown parser with -t json (the Python call is not
needed as it only reformats the output).
Running example
Our example is a trivial website where the user edits Markdown files and we use
Pandoc and a custom template to produce the final HTML. At this moment the final
stage of the example is to produce HTML files that would be later copied to a web
server.
If you look at the files, there are some Markdown sources and a build.sh script that builds
the website.
We will now talk more about shell scripting and use our build.sh script to demonstrate
how we can improve it.
Understanding the following is essential because, together with pipes and standard
I/O redirection, it forms the basic building blocks of shell scripts.
First of all, we will introduce a syntax for conditional chaining of program calls.
If we want to execute one command only if the previous one succeeded, we separate
them with && (i.e., a logical and). On the other hand, if we want to execute the
second command only if the first one fails (in other words, execute the first or the
second), we separate them with ||. For example:
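ls example.md && echo "example.md exists"
ls missing-file.md || echo "missing-file.md is missing"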
The example with ls is quite artificial, as ls is quite noisy when an error occurs.
However, there is also a program called test that is silent and can be used to compare
numbers or check file properties. For example, test -d ~/Desktop checks
that ~/Desktop is a directory. If you run it, nothing will be printed. However, in company
with && or ||, we can check its result.
Despite its silence, test is actually a very powerful command – it does not print
anything, but it can be used to control other programs.
It is possible to chain commands, && and || are left-associative and they have the same
priority.
Compare the following commands and how they behave when in a directory where
the file README.md is or is not present:
test -f README.md || echo "README.md missing" && echo "We have README.md"
test -f README.md && echo "We have README.md" || echo "README.md missing"
Extending the running example
You probably noticed that we get the last commit id (that is what git rev-parse --
short HEAD does) and use to create a footer for the web page (using the -A switch of
Pandoc).
That works as long as we are part of a Git repository. Copy the whole web directory
outside a Git repository and run build.sh again.
If we change the line to the following, we ensure that the script can be executed
outside of a Git project, too.
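git rev-parse --short HEAD 2>/dev/null || echo unknown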
Shell variables
Variables in the shell are often called environment variables, as they are (unlike
variables in most other languages) visible in other programs, too.
In this sense shell variables play two important roles. There are normal variables for
shell scripts (i.e., variables with the same meaning as in other programming languages),
but they can also be used to configure other programs.
We have already set the variable EDITOR that is used by Git (and other programs) to
determine which editor to launch. That is, the variable controls behaviour of non-script
programs.
MY_VARIABLE="value"
Note that there can be no spaces around =, as otherwise the shell would consider it a
call of the program MY_VARIABLE with arguments = and value.
The value is usually enclosed in quotes, but you can omit them if the value contains
no spaces or other special characters. Generally, it is safer to always quote the value
unless it looks like a C-style identifier.
To retrieve the value of the variable, prefix its name with the dollar sign $. Occurrences
of $VARIABLE are expanded to the value of the variable. This is similar to how ~ is
expanded to your home directory or wildcards are expanded to the actual file names.
We will discuss expansion in more detail later.
Unlike in other languages, shell variables are always strings. The shell has rudimentary
support for arithmetic with integers encoded as strings of digits.
Bash also supports dictionaries and arrays. While they can be extremely useful, their
usage often marks the boundary where using higher-level language might make more
sense with respect to maintainability of the code. We will not cover them in this course
at all.
Extending the running example
Currently, our files are generated into the same directory as our source files. That makes
copying the HTML files to a web server error-prone, as we might forget some file or
copy a source file that is not really needed.
Let us change the code to copy the files to a separate directory. We will
create public/ directory for that and modify the main part of our script to the
following:
cp main.css public/
All is good. Except the path is hard-coded in several places in the script. That might
complicate maintenance later on.
But we can easily use a variable here to store the path, allowing the user to change the
target directory by modifying the path in one place.
html_dir="public"
...
By default, the shell does not make all variables available to Python (or any other
application, for that matter). Only so-called exported variables are visible outside the
shell. To make your variable visible, simply use one of the following (the first call
assumes VAR was already set):
export VAR
export VAR="value"
It is also possible to export a variable only for a specific command using this shortcut:
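VAR="value" ./your_script.sh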
For the list of all variables, you can execute set (again, as with cd, it is a shell built-in).
Note that some built-ins do not have their own man page but are instead described
in man bash – in the manual page of the shell we are using.
There are several variables worth knowing that are usually present in any shell on any
Linux installation:
• $HOME refers to your home directory. This is what ~ (the tilde) expands to.
• $PWD contains your current working directory.
• $USER contains the name of the current user (e.g., intro).
• $RANDOM contains a random number, different in each expansion (try echo $RANDOM $RANDOM $RANDOM).
$PATH
We already mentioned the $PATH variable. Now, it is the right time to explain it in detail.
There are two basic ways how to specify a command to the shell. It can be given as a
(relative or absolute) path (e.g., ./script.sh or 01/project_name.py or /bin/bash), or as
a bare name without slashes (e.g., ls).
In the first case, the shell just takes the path (relative to the working directory if needed)
and executes the particular file. Of course, the file has to have its executable bit set.
In the second case, the shell looks for the program in all directories specified in the
environment variable $PATH. If there are multiple matches, the first one is used. If there
is none, the shell announces a failure.
The directories in $PATH are separated by a colon (:); typically, $PATH contains at
least /usr/local/bin, /usr/bin, and /bin. Find out what your $PATH looks like
(simply echo it in your terminal).
The concept of a search path exists in other operating systems, too. Unfortunately,
they often use different separators (such as ;) because using colon may not be easily
possible.
However, installed programs are not always installed to the directories listed in it and
thus you typically cannot run them from the command line easily.
Extra pro hint for Windows users: if you use Chocolatey, the programs will be in
the $PATH and installing new software via choco will make the experience at least a bit
less painful :-).
It is possible to add . (the current directory) to the $PATH. This would enable executing
your script as just script.sh instead of ./script.sh. However, do not do that (even if
it is a modus operandi on other systems). This thread explains several reasons why it
is a bad idea.
In short: if you put it at the beginning of $PATH, you will likely execute random files in
the current directory which just happen to be named like a standard command (this is
a security problem!). If you put it at the end, you will likely execute standard commands
you did not even know existed (e.g., test is a shell builtin).
Note that the script filename is appended as another argument, so everything works
as one could expect.
This is something we have not mentioned earlier – the shebang can have one optional
argument (but only one). It is added between the name of the interpreter and the
name of the script.
Therefore, the env-style shebang causes the env program to run with
parameters python3, path-to-the-script.py, and all other arguments. The env then
finds python3 in $PATH, launches it and passes path-to-the-script.py as the first
argument.
Note that this is the same env command we have used to print environment variables.
Without any arguments, it prints the variables. With arguments, it runs the command.
Unix has a long history. Back in the 1970s, the primary purpose of env was to work with the
environment. This included running a program within a modified environment, because the
shell did not know about VAR=value command yet. Decades later, it was discovered that
the side-effect of finding the program in the $PATH is much more useful :-).
We will see in a few weeks why it makes sense to search for Python in the $PATH instead
of using /usr/bin/python3 directly.
The short version is that with env, you can modify the $PATH variable by some clever tricks
and easily switch between different Python versions without any need to modify your code.
Script parameters
In Python, we access script parameters via sys.argv. In shell the situation is a bit more
complicated and unfortunately it is one of the places where the design of the
language/environment is somewhat lacking.
Shell uses special variables $1, $2, … to refer to individual arguments of the
script; $0 contains the script name.
We will later see how we can parse arguments in the usual format of -d, -f ...; for now,
we will use the $1, $2, … variables directly.
Shell also offers a special variable "$@" that can be used to pass all current parameters
to another program. We have explicitly used the quotes here as without them the
argument passing can break for arguments with spaces.
As a typical example of using "$@", we will create a simple wrapper for Pandoc that
adds some common options but allows the user to customize the call further.
#!/bin/bash
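# Forward all user-supplied arguments to Pandoc after some common options
# (a sketch: the concrete options here are illustrative)
exec pandoc --standalone --template template.html "$@"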
If you try to use a variable that was not initialized, the shell will pretend it contains an
empty string. While this can be useful, it can also be a source of nasty surprises.
As we mentioned earlier, you should always start your shell scripts with set -u to warn
you about such situations.
However, you sometimes need to read from a potentially uninitialized variable to check
if it was initialized. For example, we might want to read $EDITOR to get the user’s
preferred editor, but provide a sane default if the variable is not set. This is easily done
using the ${VAR:-default_value} notation. If VAR was set, its value is used,
otherwise default_value is used. This does not trigger the warning produced by set -
u.
So we can write:
"${EDITOR:-mcedit}" file-to-edit.txt
Frequently, it is better to handle the defaults at the beginning of a script using this
idiom:
EDITOR="${EDITOR:-mcedit}"
Later in the script, we may call the editor using just:
"$EDITOR" file-to-edit.txt
Note that it is also possible to write ${EDITOR} to explicitly delimit the variable name.
This is useful if you want to print a variable immediately followed by other text:
file_prefix=nswi177-
echo "Will store into ${file_prefix}log.txt"
echo "Will store into $file_prefixlog.txt"
Extending the running example
We will now extend our running example with several echos so that the script can print
what it is doing.
This is trivial code that checks whether the first argument is --verbose, and if so, sets the
variable verbose to true.
#!/bin/bash
verbose=false
test "${1:-none}" = "--verbose" && verbose=true
...
Such an approach would not work very well if we wanted to add more switches, but it
is good enough for us now.
...
...
How does the code above work? Hint. Answer.
We saw that the shell performs various types of expansion. It expands variables,
wildcards, tildes, arithmetic expressions (see below), and many other things.
It is essential to understand how these expansions interact with each other. Instead of
describing the formal process (which is quite complicated), we will show several
examples to demonstrate typical situations.
We will call args.py from the previous labs to demonstrate what happens. (Of course
you need to call it from the right directory.)
VAR="*.sh"
args.py "$VAR"
args.py $VAR
args.py "\$VAR"
args.py '$VAR'
Run the above again but remove one.sh after assigning to VAR.
VAR=~
echo "$VAR" '$VAR' $VAR
VAR="~"
echo "$VAR" '$VAR' $VAR
The important take-away is that variable expansion is tricky. But it is always very easy
to try it practically instead of remembering all the gotchas. As a matter of fact, if you
keep in mind that spaces and wildcards require special attention, you will be fine :-).
We will do only a small change. We will replace the assignment to $html_dir with the
following code.
html_dir="${html_dir:-public}"
What has changed? Answer.
We can now change the behaviour of the program by two means: users can add
--verbose or modify the variable html_dir. That is definitely not very user friendly. We should
allow our script to be executed with --html=DIR to specify the output directory. We will
get back to this in one of the later labs.
At this moment, take it as an illustration of what options are available. The use
of html_dir="${html_dir:-public}" is a very cheap way to add customizability of the
script that can be sufficient in many situations.
Often, we need to store output from a command into a variable. This also includes
storing content of a file (or part of it) in a variable.
A prominent example is the use of the mktemp(1) command. It solves the problem of
secure creation of temporary files (remember that creating a fixed-name temporary
file in /tmp or elsewhere is dangerous). The mktemp command creates a uniquely-named
file (or a directory) and prints its name to stdout. Obviously, to use the file in further
commands, we need to store its name in a variable.
Shell offers the following syntax for the so-called command substitution:
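my_temp="$( mktemp -d )"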
...
# At the end of the script
rm -rf "$my_temp"
Command substitution is also often used in logging or when transforming filenames
(use man pages to learn what date, basename, and dirname do):
input_filename="/some/path/to/a/file.sh"
backup="$( dirname "$input_filename" )/$( basename "$input_filename" ).bak"
other_backup="$( dirname "$input_filename" )/$( basename "$input_filename" .sh ).bak.sh"
Extending the running example
echo "<p>Version: $( git rev-parse --short HEAD 2>/dev/null || echo unknown )</p>" >version.inc.html
The change is rather small but it makes the generation of the version.inc.html a bit
more compact. We will improve readability of this piece of code with functions in the
next section.
Functions in shell
Recall from your programming classes that functions have one main purpose.
They allow the developer to introduce a higher level of abstraction by naming a certain
block of code, thus better capturing the intent of a larger piece of code.
Functions also reduce code duplication (i.e., the DRY principle: don’t repeat yourself), but
that is mostly a side effect of creating new abstractions.
Functions in shell are rather primitive in their definition as there is never any formal list
of arguments or return type specification.
function_name() {
commands
}
A function has the same interface as a full-fledged shell script. Arguments are passed
as $1, $2, …. The result of the function is an integer with the same semantics as the exit
code. Thus, the () is there just to mark that this is a function; it is not a list of
arguments.
Please consult the following section on variable scoping for details about which
variables are visible inside a function.
We will add several new functions to our example to make it a bit more useful.
log_message() {
echo "$( date '+build.sh | %Y-%m-%d %H:%M:%S |' )" "$@" >&2
}
Run the inner call to date by itself to see what it does (the key is the + at the beginning,
which informs date that we want to use a custom format).
logger=":"
test "${1:-none}" = "--verbose" && logger=log_message
The second trick is the use of the colon :. That is basically a special builtin that does
nothing, but it still behaves as a command. So by setting logger to : or to log_message
and then calling "$logger" with some message, we execute one of the following:
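: "Generating the page"            # when logger=":" – does nothing
log_message "Generating the page"  # when logger="log_message" – prints the log line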
On your own, wrap the version generation into a reasonable function. Solution.
Calling return terminates the function execution; the optional parameter of return is the
exit code.
is_shell_script() {
test "$( head -n 1 "$1" 2>/dev/null )" = '#!/bin/bash' && return 0
return 1
}
Because the exit code of the last program is also the exit code of the whole function,
we can simplify the code to the following.
is_shell_script() {
test "$( head -n 1 "$1" 2>/dev/null )" = '#!/bin/bash'
}
And such a function can be used to control program flow:
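is_shell_script build.sh && echo "build.sh is a shell script"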
The same effect would be obtained by using the following code directly, but using the
function allows us to capture the intent.
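test "$( head -n 1 build.sh 2>/dev/null )" = '#!/bin/bash' && echo "build.sh is a shell script"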
It is also a good idea to give a name to the function argument instead of referring to
it by $1. You can assign it to a variable, but it is preferred to mark the variable
as local (see details below):
is_shell_script() {
local filename="$1"
test "$( head -n 1 "$filename" 2>/dev/null )" = '#!/bin/bash'
}
The code is virtually the same. But by assigning $1 to a properly named variable we
increase the readability: the reader immediately sees that the first argument is a
filename.
Command precedence
You might notice that aliases, functions, built-ins, and regular commands are all called
the same way. Therefore, the shell has a fixed order of precedence: Aliases are checked
first, then functions, then built-ins, and finally regular commands from $PATH.
Regarding that, the built-ins command and builtin might be useful (e.g., for calling a regular
command or a builtin shadowed by a function of the same name).
Take away
This section explains a few rules and facts about the scoping of variables and why some
constructs might not work as expected.
Shell variables are global by default. All variables are visible in all functions,
modification done inside a function is visible in the rest of the script, and so on.
It is often convenient to declare variables within functions as local, which limits the
scope of the variable to the function.
More precisely, the variable is visible in the function and all functions called from it. You can
imagine that the previous value of the variable is saved when you execute the local and
restored upon return from the function. This is unlike what most programming languages
do.
When you run another program (including shell scripts and Python programs), it gets
a copy of all exported variables. When the program modifies the variables, the changes
stay inside the program, not affecting the original shell in any way. (This is similar to
how working directory changes behave.)
However, when you use a pipe, it is equivalent to launching a new shell: variables set
inside the pipeline are not propagated to the outer code. (The only exception is that
the pipeline gets even non-exported variables.)
Read and run the following code to understand the mentioned issues.
global_var="one"
change_global() {
echo "change_global():"
echo " global_var=$global_var"
global_var="two"
echo " global_var=$global_var"
}
change_local() {
echo "change_local():"
echo " global_var=$global_var"
local global_var="three"
echo " global_var=$global_var"
}
echo "global_var=$global_var"
change_global
echo "global_var=$global_var"
change_local
echo "global_var=$global_var"
(
global_var="four"
echo "global_var=$global_var"
)
echo "global_var=$global_var"
echo "loop:"
(
echo "five"
echo "six"
) | while read value; do
global_var="$value"
echo " global_var=$global_var"
done
echo "global_var=$global_var"
The shell is capable of basic arithmetic operations. It is good enough for computing
simple sums, counting the numbers of processed files etc. If you want to solve
differential equations, please choose a different programming language :-).
counter=1
counter=$(( counter + 1 ))
Note that variables are not prefixed with $ inside this environment. As a matter
of fact, in most cases things will work even with $ (e.g., $(( $counter + 1 ))), but it
is not a good habit to get into.
As a last change to our running example, we will measure how long the execution took.
For that we will use date, because with +%s it prints the number of seconds since the
start of the epoch.
As a matter of fact, all Unix systems internally measure time by counting seconds from January 1,
1970 (the start of the epoch), and all displayed dates are computed from this.
Therefore, the following three lines around the whole script can give us the number of seconds
spent running our script (at the moment, the script should not take more
than a second to complete, but we might have more pages or more data eventually).
#!/bin/bash
...
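A minimal sketch of those three lines (the variable name is ours):

start_seconds="$( date +%s )"
# ... the actual work of the script ...
echo "Execution took $(( $( date +%s ) - start_seconds )) seconds." >&2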
More examples
More examples to try your knowledge before attacking the graded tasks.
Return to the examples from Lab 04 and decide where adding a function to the
implementation would improve the readability of the script.
Print information about the last commit; when the script is executed in a directory that is not
part of any Git project, the script shall print only Not inside a Git
repository. Hint. Solution.
The command getent passwd USERNAME prints information about the user
account USERNAME (e.g., intro) on your machine. Write a command that prints information
about the user intro, or the message This is not NSWI177 disk if the user does not
exist. Solution.
The following tasks must be solved and submitted before attending your lab. If you
have lab on Wednesday at 10:40, the files must be pushed to your repository (project)
at GitLab on Wednesday at 10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation
days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
However, if a file .NO_HEADER exists in the current directory, nothing will be printed
(even if HEADER exists).
If neither of the files exists, the program should print Error: HEADER not found. on
standard error and terminate with exit status 1.
Use only && and || to control program flow, do not use if even if you happen to know
these constructs in shell. It is okay to get information about file existence several times
in the script, we will not modify the files while your script is running.
The modification date should be printed in YYYY-MM-DD format, if the file does not exist
(or there is some other issue in reading the modification time) the program should
terminate with non-zero exit code.
Please, complete the following form and after you complete it, please, create an empty
file 06/feedback.txt in your repository.
The link points to different language translations of the same survey, complete only
one of them, please.
(We do not see any other simple way to ensure that the survey remains anonymous.)
We expect you will solve the following tasks after attending the labs and hearing
feedback on your before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
TBA
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that
you should be able to explain and/or use after each lesson. They also represent the
bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
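Conceptual knowledge is about understanding the meaning of given terms and putting
them into context. Therefore, you should be able to …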
Practical skills are usually about usage of given programs to solve various tasks.
Therefore, you should be able to …