Lab Linux Complete
1. Course organization
2. What is Linux
3. Key concepts of Linux
4. First steps inside Linux
5. Source code versioning essentials
6. First steps inside GitLab
7. Submitting assignments
8. Wrapping up
9. Post-class tasks (deadline: March 5)
10. Learning outcomes
Course organization
Because this is a highly practically oriented course, we put great stress on preparation before class.
We strongly believe that you can learn the topics of this course only by trying them yourself. Doing that
during the labs is often not the best course of action, as it allows you to pass the lab without doing much
yet with the feeling of a good understanding. That feeling is usually proved wrong when you complete
the graded assignment after the lab.
Therefore, before coming to the labs you will study the materials yourself, and the lab will
be used to (a) discuss your solutions, (b) answer your questions, and (c) work through more examples.
We will publish lab contents several weeks ahead so you can organize your time as you see fit. Before
coming to the lab, you will submit a set of graded tasks to verify that you understood the basic
principles correctly. We will look at them during the labs and discuss anything unclear, thus strengthening
your knowledge.
There will also be a shorter task that you complete after the class so that you can demonstrate that
you have learned from the lab itself (e.g., from the feedback on your before-class tasks).
Take this as a brief overview; the course organization is described in more detail on a separate page.
This lab is somewhat special as it is the very first one, hence there are no before-class tasks.
However, starting from next week there will be graded tasks that you are supposed to complete
before coming to the next lab.
In this lab we will cover basic concepts of Linux, source code versioning and also how to submit graded
tasks.
Purpose of this course
The purpose of this course is not only to show you a different operating system but also to show you
a different style of work.
We expect that after this course, you will be able to do the following:
• Use Linux as a user for your everyday work. This includes the activities of a normal user, such as reading
e-mail, as well as those of a power user who can really control their machine.
• Use typical Linux tools with ease. We will not spend our time on common software such as a web
browser or image editor that you can find virtually anywhere, but focus on tools that are closer to the
system itself.
• Automate your work a lot. You will learn that many everyday tasks can be simplified by writing small
programs that automate them. Linux offers the right environment for this.
On the other hand, this course does not cover machine administration (except for the fundamentals
required to maintain your laptop) or compiling your own kernel.
What is Linux
Under the term Linux we mean the operating system and software that is typically available on such
system. This includes – but is not limited to – development tools (compilers etc.), graphical
environment, text editors, spreadsheet software, web browsers etc.
Note that for simplicity – both in writing and in speaking – we use the term Linux to name the whole
environment we will be working in.
Strictly speaking, the name Linux refers only to the kernel of the operating system – i.e. the bottom
layer of the software stack (the applications are considered the top layers).
The whole environment is often called GNU/Linux to emphasize that it combines the Linux kernel with
GNU and other free software.
You will also often hear the term Linux distribution. That is a fancy name for a packaging of the Linux
kernel (i.e. the lowest layer of the software stack) and user applications. There are hundreds of
distributions available, some differ only in the default wallpaper, some are specific for a certain
domain (e.g., network testing).
Their fundamental differences are mostly on maintenance level, e.g. how software is installed or how
the system is configured. Most of the time end users do not need to care at all which distribution they
use.
We will be using Fedora which is a generic distribution that can run on servers as well as on desktops.
If you are new to Linux, we strongly recommend staying with our choice and using our installation.
Although many Linux concepts, as well as much of the software available on Linux, are also present in other
operating systems, Linux provides them in one nicely integrated package. We also believe that only Linux
provides an environment for their seamless integration.
Key concepts of Linux
Here we list the key concepts of a Linux environment. Take it as an overview only, we will provide
further details in subsequent labs.
Linux uses open-source software (OSS). That means that you are free to inspect how things are
implemented. You are also free to change the implementation. Do not underestimate this aspect. It is
really important. And as a sidenote: even with OSS one can earn money.
Linux is extremely flexible and customizable. You can run Linux on IoT devices as well as on heavy-
duty routers. Linux is running on cell-phones as well as supercomputers. The user can configure
virtually anything. Traditionally, configuration is stored as plain text files. While it is often possible to
edit the configuration from a GUI-based tool, Linux always allows the user to edit the file manually.
An advanced user virtually needs only a good text editor to configure the whole system.
Linux also has a graphical user interface. But it is an optional part of the system as it is not always
needed. Server-style machines do not need any movable windows to operate. And when you need
the GUI, you can choose from many types to best suit your needs. From the system perspective, GUI
is just another application running in the system, not a part of the system.
Linux excels when controlled through a command-line interface (CLI). While entering textual
commands might seem a very obsolete way of controlling your machine, it is not. After all, most
programming languages are still based on textual source code. And CLI has many advantages over a
GUI: it is explicit and easily automated. It is also perfect for remote access as it is very modest on
resources.
Probably the most important concept is that everything is a file. This means that even your devices
– such as the hard-drive – are available as normal files that can be read or written. This actually
simplifies implementation of the tools and it enables fuller control over the system. But it does not
stop with devices: even information about your system – such as the list of running programs – is available
as the contents of a special file. This is a great thing for a programmer: you need virtually nothing more
than the file API (such as Python's with open("filename", "r") as f:) to get all the information about the system.
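For example, the following minimal sketch reads how long the system has been running from the special file /proc/uptime (its first field is the uptime in seconds):

# read the system uptime from the special file /proc/uptime
with open("/proc/uptime", "r") as f:
    uptime_seconds = float(f.read().split()[0])
print(f"Up for about {uptime_seconds:.0f} seconds")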
Linux is by default a multi-user system. Not only does it allow setting up multiple user accounts, but multiple
users can use the system at the same time. They can be connected remotely, and it is even possible for two
people to use one machine simultaneously, given a dual-head graphics card and two keyboards.
Linux also prides itself on remote access support. As long as your system is connected to a network,
you can configure it to be remotely accessible. This simplifies management of server machines, but it
can be useful even for your laptop left at home. While remote CLI access is usually preferred, it is
possible to connect graphically, too. Actually, you can even connect graphically in several instances at
once, each using a different environment.
Linux simplifies the management of installed software through packages. Package repositories can be roughly
compared to the application stores you may know from your cell phone. They simplify installation as you do
not need to click through any installation wizard, and package managers also keep your system up to
date.
Equally important is the concept that the user is in control of the machine. The philosophy is that
the user is smart enough to control the computer: Linux does virtually nothing without an explicit
action and does not hide information from you. You can configure it to do things automatically, but
that is always a layer on top of the base system. You do not need special tools to look inside.
This may sound scary, but it is actually fun. You will (not might) understand how computers and
software work much better if you use and explore Linux.
Here we assume that you have your USB disk ready or you have your virtual machine running. Please,
refer to another page on how to actually boot your machine to Linux.
Return to this page once you boot from the USB (or using any other method mentioned) to continue
with this lab.
Feel free to bring your laptop to the lab and let us help you with the booting.
Once you boot from the USB disk, you can choose which desktop environment you will use.
On most operating systems, there are not many options for how to control your graphical interface.
With Linux, there is a much wider choice, ranging from rich environments with plenty of eye candy
to very austere ones that do not even employ a mouse. Of course, there are dozens of environments
somewhere in between.
Recall that this is easily possible, because the GUI is actually controlled by a normal application: it is
not hard-wired into the system.
Openbox and i3 are special environments as they do not contain the traditional task bar with a list of
windows and they require a bit more patience before they are mastered. On the other hand, the time
investment, especially for i3 that is driven by keyboard only, pays back in a much more efficient usage
of your computer.
We encourage you to try all of them. Log in to each environment, figure out how applications are
launched, and decide which one you like the most. Note that the environments can be further
customized – from the overall color scheme to keyboard shortcuts.
If you are unable to decide, Plasma is a good choice for ex-Windows users with decent hardware.
Choose LXDE if your machine is less powerful. And after a month of using these, switch to i3 to
become a true power user.
Once you decide on your desktop environment, look around for other applications you will need.
Above all, look around for the text editors available. There are several popular graphical editors
already installed as you can see on the following screenshot.
Note that other editors are available from the command-line: we will talk about these during the next
lab.
Source code versioning essentials
We will now switch to a side track and talk about software projects in general.
Modern software is rarely a product of one-man teams. Rather, it is developed by large teams that
can span several time zones or even continents.
Development in such teams requires that all developers have access to the (most up-to-date version
of the) source code and that they can communicate with other members of the team efficiently.
There are many solutions to this: from e-mails and shared network disks to more sophisticated
solutions. To prepare you a little for the software engineering practice, we will be using one of the
more sophisticated solutions and that is GitLab.
GitLab offers a place where developers can share source code, but also manage a list of existing
bugs, keep documentation, and even automatically test their code. And since it can be integrated with
other tools, GitLab has become the central place for the products of many companies and open-source
projects.
Furthermore, we can use its advantages even when working alone, even if, at the beginning, we use it
only as a smart backup for our source code.
For this course, GitLab will also become the central place for many tasks. You will submit solutions to
it and there is also the Forum project where you can ask questions.
There are other alternatives to GitLab offering similar features. We will be focusing on GitLab in this
course, but the general principles apply to other tools, too.
The central point of any software project is the source code. Without it, there is nothing to be
executed. Therefore, extra care and tooling is provided for source code management itself.
GitLab itself is built around Git. Git is a versioning system. In layman's terms, this means that it watches
your files for changes and remembers previous versions of your files. The big advantage is that
you can freely update your code and still return to its older versions.
We will be working with Git through the whole course. Take this description as a very high-level
overview so you can start working with Git the GUI-way in GitLab.
Practically, Git always works in a certain directory that typically represents one project. The user needs
to tell Git which files are to be tracked and at which point to create a new version.
Git does not track all files, as there is typically no need to version compiled files (because you can always
recreate them). For example, for a Java project you do not need to track *.class files, as you can create them
from the *.java ones by compiling the source code again (the same applies to *.pyc with Python or *.o with
C++). Another example: you do not track the PDF export of a LibreOffice document (though
tracking *.odt files is not something where Git would unleash its full potential).
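As a peek ahead: such exclusions are conventionally listed in a file called .gitignore in the project root, one pattern per line. A minimal sketch matching the examples above:

# compiled artifacts that can always be recreated from the sources
*.class
*.pyc
*.o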
Git does not create the versions automatically as each version is supposed to capture a reasonable
state of the project. Thus, for example, you create a new version (sometimes also called revision) once
you add a new feature to your software. Or when you fix a bug. Or when you fix a typo in the
documentation. Or even when you want to backup your work before going to lunch :-).
It allows you to create a reasonable history of the software that is small enough for reviewing (for
example), but it does not preserve every small typo you made. Versioning does not replace
undo/redo of your editor, it operates one level above that.
And when employed in a team, Git can be used to synchronize changes done by multiple users. For
example, if Alice makes a change to file alpha.txt and Bob at the same time changes the
file bravo.txt, Git allows Carol to work seamlessly on a version that contains changes both from Alice
and Bob.
At this moment we will be using only the graphical interface provided by GitLab in the web browser.
Later in the course we will uncover even the more advanced scenarios.
For this course, we will be using the faculty instance of GitLab at https://gitlab.mff.cuni.cz. Please,
do not confuse it with the instance at gitlab.com that you can freely use, but which is in no way
connected with this course.
For login (username) you will be using your CAS credentials, i.e., the same ones as you use for SIS.
Your first login will activate your account.
Always use your name-based login (e.g. johndoe) not the numerical one.
Please activate your account now, if you have not yet done so. Please, read our Q & A if you have
trouble logging in.
To quickly try GitLab (we will focus on it more in several labs), create a new project (create a Blank
project). You need to fill in a project name, its slug (a short version of the name used in the URL), and
its visibility.
In the example screenshots below, we create a project with our source code from the introductory
programming course. Do not forget to ensure that the project is initialized with a README.
Now open Web IDE which is a simple editor available for on-line editing of the source code files.
Using the icons and the help of the following set of screenshots, create a new file, name
it hello.py and insert a simple Python program.
We will now create a so-called commit. Commit in Git captures the current state of the project and
can be seen as a named version. In fact, whenever you create a commit, Git will ask you for a Commit
message where you are supposed to describe what changes you made.
We highly recommend the article How to Write a Git Commit Message by Chris Beams for nice tips on how
to write a good commit message. However, it might make more sense to return to this article later on once
you know Git a bit more.
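For illustration only, a message following those tips could look like this (a short imperative subject line, a blank line, then an optional body explaining the why):

Fix typo in the installation guide

The example command referred to a directory that was
renamed in the previous release.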
For now, we will be making all changes directly to the Master branch. We will explain the concepts of
branches later on, for now take them as a magic that works :-).
The important thing to remember is that commit assigns a name to a particular state of your source
code (revision).
Often you will see names such as Add icons to the menu or Fix button typo or Finish Czech translation.
As you see, they refer to the state of the project.
On your own: sign in to GitLab again, find your project and create a new file, pasting in some of your
source code. Note that when you click on the filename on the project homepage, you will see its
contents and again a link for its editing.
We will create a special project for each of you here with your CAS login in its name.
For technical reasons, we can create the project only after you sign in to GitLab for the first time. We
create these projects semi-manually, so you may need to wait until the next day for your project to appear.
Each assignment will have a prescribed filename under which to submit the solution. Submitting under a
different filename (or to a different folder) means we will not be able to find your assignment (and
thus we will count it as not submitted). There are about 300 students enrolled in this course and we
need to automate a lot of things: we really cannot manually search your project to
guess whether you have submitted under a different name.
Each submission – more precisely each commit – will launch automated tests on top of your repository.
These tests will check whether you have submitted the solution at all and also check whether it
behaves as it is supposed to.
We have put more details on how to interpret the results on a separate page.
Wrapping up
Each lab also contains so-called learning outcomes. They capture the most important theoretical
knowledge as well as practical skills that you should have after completing the lab.
Use them as you see fit. They can serve as a checklist that you understand a new topic or as a summary
if you are already familiar with some topics.
Post-class tasks (deadline: March 5)
We expect you will solve the following tasks after attending the labs and hearing feedback to your
before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see
here how to interpret their results).
The program will look for a file named README.md (in the current directory) and print its first non-empty
line. We assume that the project name is on the first line of the read-me file.
If all lines are empty or the file does not exist, the program will try to look
for readme.md, README and readme, in this order (stopping the search once a title is found). That is,
if readme.md contains a non-empty line, the program will not even try to look for README.
If no file from the above list contains a non-empty line, the program will print the name of the current
directory (os.path.basename(os.getcwd())).
We consider a line empty if line.strip().lstrip('# ') == ''. Printing the project title should also
strip it of blank spaces and the leading # (i.e., the above code). This ensures that trailing whitespace is
ignored and extra formatting of Markdown titles is removed.
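In other words, the specification boils down to these two helpers (a minimal sketch; the function names are ours, not required by the tests):

def is_empty(line):
    # a line counts as empty once whitespace and leading '#' marks are stripped
    return line.strip().lstrip('# ') == ''

def clean_title(line):
    # the printed title is cleaned up the same way
    return line.strip().lstrip('# ')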
You can also consult the results of the provided tests if behaviour in some situations is not clear from
the description above. Interpreting test results is described on a separate page.
Upload your solution into folder 01 in your GitLab submission repository and name the
file project_name.py. Most of the tasks will be named in this manner: the task name directly
refers to the file where you store your solution.
For this task, we have created a simple skeleton in your repository that you should use as a starting
point.
Important: the first line in the skeleton code from us starting with #! is there for a reason and you
have to keep it there (we will discuss this later on). Also keep the usage of main() and the condition
with __name__ as that represents a proper module-ready code.
Important: do not upload any README files into your project. These files are provided by the tests
automatically (but, obviously, create them on your machine when debugging your solution).
Forum confidential issue (50 points, group git)
An Issue in GitLab is a report typically describing an existing bug in a software project. We will use
such Issues for off-line communications in this course.
When an issue is marked Confidential, only users with certain access rights can see it. In the case of
the Forum project, only teachers can see such issues.
This assignment asks you to create a Confidential Issue on the Forum with the following properties.
Before submitting, the issue should look like this in the Preview tab (colors and fonts may
obviously differ).
Important: make sure you create this issue as confidential. Public issues cause notifications to be sent
to all members of the project; in this case, all of your colleagues would receive an e-mail
they have no interest in.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be
able to explain and/or use after each lesson. They also represent the bare minimum required for
understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting
them into context. Therefore, you should be able to …
• explain why graphical user interface is not a fixed part of Linux
• list several differences between various graphical interfaces available in Linux
• explain in broad terms what is a Linux distribution
• explain what is understood by the term Unix family of operating systems
• list a few types of assets that are typically needed for software projects
• explain in broad terms what is a versioning tool
• explain fundamental high-level operations of versioning tools
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should
be able to …
• boot your own machine into Linux (either via USB, dual-boot or virtualized)
• log in to a graphical Linux environment
• log in to the faculty instance of GitLab
• create a new project in GitLab
• upload a new file to GitLab via its web user interface and create a commit from it
• edit existing files in a GitLab project using its web interface
• customize a selected graphical environment
• create a basic GitLab issue in a given project
Introduction to Linux (NSWI177)
In this lab we will start learning the most effective way to control your Linux machine:
the command-line interface.
The lab starts with a bit of motivation for why we should care about the command-line interface
at all. Then we do a short recap of what a filename and a file path are, and continue with a
brief explanation of the Linux file system hierarchy.
After this more theoretical introduction we dive into using the terminal. You will learn
how to navigate through directories and how to display file contents in the terminal.
First of all, it is explicit and precise. There is no danger that a user would have a different
skin or a different set of taskbars when describing an action to take. Using an exact
command leaves no room for misunderstanding.
Next, it is also rather fast. Once we compare the possible speed of a mouse
clicking on icons versus the speed of keyboard keystrokes, the keyboard is
a clear winner (assuming that in both approaches we know what we want to do).
And partially connected with the above reasons: it is also easy to save the typed
commands into a file and re-run them later.
Such (text) files are called scripts in the Linux world. They can simply be a list of
commands to execute, but they can also consist of loops and conditions to execute
more sophisticated actions. We will devote several labs to these.
And from the machine side, it is also extremely efficient. Especially when we talk about
remote access over an unstable connection. The difference between sharing even a
small 800×600 screen vs. sending keystrokes is substantial. Managing Linux servers
over a flaky 2G connection is possible; managing a server offering only a GUI over such
a poor connection is out of the question.
Actually, it is exactly the same as with any programming language: you need to know
the API before being able to write a program. Except that in Linux, the API is not made of
functions in a typical programming language, but rather of complete programs.
The fact is that the shell we will be using was born about 50 years ago, but it is still
used today. That may mean that we have not been able to come up with anything better for quite
a long time. But, more likely, it suggests that the pros are worth it.
The beginning might be difficult, but you will not regret it in the long run.
From the practical point of view, using the command line is somewhat similar to using
Python in an interactive session. You type commands, you can edit them and once you
are happy, you execute the command by hitting <Enter>.
From now on, our interaction with the system will mostly look like this.
Do not be scared, though. Linux often trades eye candy for efficiency. Try to approach
it more like a new programming language: you need to learn (and remember, too!) a
bit about the constructs and the standard library before writing big programs. In
Linux it is the same. And it is definitely worth it if you are serious about computers.
We mean it. Seriously.
As a matter of fact, you probably know all of this. Feel free to skim over this part if that
is so. We have highlighted the important parts for you.
Basic terms
In our text, we will use the term filename to refer to a plain file name without any
directory specification. Filenames are what you see when you open any kind of file
browser or in an e-mail with attachments.
Note that on Linux we prefer to use the word directory over the term
folder. Folder usually refers to something virtual that is not present on the file system
(i.e., as if not physically existing as-is on the hard drive). Therefore, we can talk about
folders in your e-mail client or in a cloud storage.
Path means that the filename is prefixed with some kind of directory specification.
On Linux the path separator is a forward slash / (i.e., no escaping needed when
writing it in Python). Linux does not have any notion of disk drives: everything is found
under a so-called root which is a single forward slash /.
When you are building paths in Python, you should prefer to use
functions os.path.join() and similar as they ensure that the right separator is used
regardless of the actual platform (instead of pasting dir_name + '/' +
filename manually).
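For example (a minimal sketch):

import os.path

# os.path.join inserts the platform's path separator for you
print(os.path.join("home", "intro", "documents", "letter.odt"))
# prints home/intro/documents/letter.odt on Linux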
A path can be relative or absolute. When a path is absolute, it refers to a specific file
on a given computer. No matter what directory you are currently in. A relative path is
always combined with another directory to form an absolute path.
On Linux, each absolute path must start with a slash; if a path does not start with a
slash, it is treated as a path relative to the working (current) directory. Intuitively, the
working directory refers to the directory that you just opened in the file browser.
Special directories
A path can contain references to parent directories via .. (two dots). For example,
relative path ../documents/letter.odt means that the file is located in
directory documents that is one level up from the current directory. Assuming we are in
directory /home/intro/movies (note that this is an absolute path), the absolute path for
the letter.odt would be /home/intro/movies/../documents/letter.odt which can be
resolved (shortened) to /home/intro/documents/letter.odt.
Apart from the special directory name of .., there is also a special directory . (dot) that
refers to the current directory. Therefore ./bin/run_tests.sh refers to a
file run_tests.sh in a bin directory that is a subdirectory of the current one (i.e., it is
exactly the same as bin/run_tests.sh). Later, we will see why the dot . directory is
needed.
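Python can resolve such paths for you; a small sketch using the example above:

import os.path

# normpath resolves the '.' and '..' components
print(os.path.normpath("/home/intro/movies/../documents/letter.odt"))
# prints /home/intro/documents/letter.odt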
Filename extensions
Linux does not enforce or restrict the use of an extension in the filename
(e.g., .zip or .pdf). In fact, a file can exist without it and it can even have multiple ones.
A typical example of a multi-extension file is file.tar.gz, which denotes that the file is a
tape archive (.tar) that was later compressed with gzip.
Hidden files
On Linux, files whose names start with a dot (so-called dot-files) are hidden by default.
It is important to remember that dot-files are completely normal files (or directories); it is
just a convention not to show them by default. It is not a security measure, it just keeps the
listing a bit less verbose.
Typically, configuration (e.g., which wallpaper you have on your desktop) is stored in dot files
as they are usually supposed to be ignored by the user (at least most of the time) and would
only clutter the listing.
We have started our Linux exploration with paths and filenames for a very good reason.
Virtually everything in a Linux system is a file.
You already know that there are plain files (e.g., the letter.odt file we mentioned
above that represents a word processor document) and directories (for organizing
other files).
Be aware that the word file in Linux can refer to both normal files as well as directories,
i.e., a directory is a file.
There are also other special types of files that can represent hardware devices, system
state, etc. We will talk about these later.
Enough of theory. Please, locate the Terminal program and start it. Depending on your
environment, it will be either Terminal, Console, or perhaps even Shell (although,
technically, shell is the program running inside a terminal emulator).
We recommend you spend some time configuring the look of your terminal, such as
having a nice font family and a reasonable font size. You will be spending quite a lot
of time with it, so make the experience nice. Below are some possibilities of what you
might get :-).
You will see something like [intro@localhost ~] and a blinking cursor after that. This
is called a prompt and if you see it, it means you can enter your commands.
The prompt is displayed by your shell which is an interpreter of the commands you
enter. The shell is actually a full-fledged programming language, but in this lab we will
use it to launch very simple commands only.
Type uptime and start this command by submitting it with <Enter>. Until you
hit <Enter>, you can easily edit the command. Shortcuts such as <Ctrl>-<Arrow> for
jumping over words work, too.
Whenever you select a text in the terminal with your mouse, it is automatically copied.
This text then can be inserted by simply clicking the middle mouse-button (or the
wheel).
Note that the well-known <Ctrl>-C and <Ctrl>-V combinations do not work in the shell
as <Ctrl>-C is used to forcefully terminate a program. However, <Ctrl>-<Shift>-
C usually works.
Note that these are actually two distinct clipboards – the special one bound to the
middle mouse button and the one bound to <Ctrl>-C (<Ctrl>-<Shift>-C) and <Ctrl>-V.
In graphical applications, <Ctrl>-C and <Ctrl>-V work as usual.
To close the terminal, you can simply close the whole window (e.g., via the mouse), but you
can also type exit or hit <Ctrl>-D on an empty line. Because we are moving away from
needing the mouse (in a sense), you should prefer <Ctrl>-D ;-).
Debugging issues
From now on, we will stop inserting screenshots of the terminal and paste
only the output (though you should always run the commands yourself to see
what they do as first-hand experience).
When pasting into our Forum, enclose the text in a fenced block (```) to preserve the
monospace font.
```
ls nonexistent
ls: cannot access 'nonexistent': No such file or directory
```
Navigating through the filesystem
We will start with simple navigation through the file system. Two basic commands will
get you through: ls, which lists the contents of a directory, and cd, which changes the
current directory. Running ls -l in your home directory prints something like the following.
ls -l
total 4
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Desktop
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Documents
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Downloads
-rw-r--r--. 1 intro intro 1022 Jan 9 18:13 gif.md
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Music
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Pictures
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Public
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Templates
drwxr-xr-x. 1 intro intro 0 Feb 10 13:43 Videos
The -l turned on the so-called long mode where more details about each file are
printed.
We will return to the meaning of some of the columns later on, deciphering the
columns for the last modification time and the file size is straightforward and sufficient
for the moment.
What does the following command do?
cd .
Answer.
Notice that the command prompt changed whenever you switched to a different
directory.
By default, it shows only the last component of the path. To show the full (absolute)
path, we need to run pwd.
pwd
/home/intro/Videos
Tab completion
Typing long filenames can be cumbersome and making typos is annoying. Shell
offers tab completion to help you with this.
For this example, we assume you just launched your terminal and ls prints Desktop
Documents Downloads Templates etc.
If we want to change to directory Templates, start typing cd Te and hit <Tab>. Unless
there is another filename (directory) starting with Te, the name shall be completed for
you and should read the full cd Templates/.
Submitting the command with <Enter> would switch you to the directory as we would
expect. Try it and come back to this directory again.
Now, let us switch to Documents directory. For this example, type cd Do and press <Tab>.
There are two directories with this prefix: Documents and Downloads. Because the shell
cannot know which one you want, it does nothing.
However, pressing <Tab> for the second time shows the possible matches and after
typing c (the next letter), <Tab> can finish the completion.
Note that shells in other operating systems also offer tab completion but in a less organized
manner.
Type just c (as in cd) and hit <Tab>. What happens? Answer.
Home directory
You probably noticed that when you start your terminal, the directory name you see
there is just a ~ even though it should read intro (or your username on that particular
machine) as that is the last component from pwd.
However, the path /home/intro is your home directory and has a special shortcut of
tilde ~.
Furthermore, if you just run the command cd without any extra arguments, it will change
the directory back to your home.
A quick recap
While the use of purely command-line tools such as uptime, ls or cd is cool and
extremely useful for scripts, there are also occasions where a more interactive
approach is faster.
In this sense, Linux typically offers three layers you can choose from: a fully
graphical one called the Graphical User Interface (GUI), a tool with a Text-based User
Interface (TUI), and a pure Command-Line Interface (CLI). Each of these can be useful,
depending on the circumstances.
Actually, there is also a fourth (bottom) layer where you directly access the special files
yourself.
Midnight commander
Run mc and navigate through the files as you have done with ls and cd.
The numbers at the bottom refer to your function keys for typical file operations
(e.g., F5 copies the file).
Note that in a typical setup, MC offers two panels with file listing, you switch between
them via <Tab> and, by default, copying is done to the directory in the other panel.
MC is quite a powerful tool: it can inspect file archives, show files on a remote
machine, etc.
We will briefly mention the most important things that you can do with it. Do try them
:-)
You can quit MC with <F10> or via a menu (activated by <F9>). Note that some terminals
capture <F10> to activate their window menu (but this behaviour can be changed
in Preferences of the terminal application).
Ranger
Ranger is a Vim-inspired file manager for the console. It brings some well-known key
bindings from the Vim realm, together with tabbed browsing.
Navigation
• j - Move down
• k - Move up
• h - Move to the parent directory
• l - Open file or move to directory
• gg - Go to the top of the list
• G - Go to the bottom of the list
• gh - cd ~
• gm - cd /media
• gr - cd /
• q - Quit Ranger
You probably noticed that the Development submenu contains several graphical text
editors that you can use to edit the source code. However, it is also possible to edit
files in TUI editors.
If you are asking why to learn another editor (if you are already happy with some of
the graphical ones), here is the answer. On some machines, you may not have access
to GUI at all. Recall that we talked about remote access earlier: in that case you will
have only TUI available (and you will often need to edit files on the remote machine).
Some users thus never use GUI editors at all; the reasoning is that it is much better to
learn (and customize) one editor properly, and that editor is a TUI-based one.
On our disk, you will find Emacs, Joe, mcedit and Vim.
Each has its own advantages and it is up to you which one you choose. Note
that mcedit is probably the closest to the editors you may know from other
systems. joe is a small one, but perfectly suitable for the script editing that we will be doing
most of the time. Both emacs and vim are extremely powerful tools that can do much more than
just edit files. However, they require a bit of a time investment before you can start using
them effectively.
If you are new to Linux, we recommend using mcedit (either directly
or when editing files in Midnight Commander) and coming back to the other ones later
for the final decision on THE text editor of your choice.
All of these editors can be launched from the command line, giving it the filename to
edit as a parameter (e.g., mcedit quiz.md).
There will be many occasions (including some graded tasks in this course) where you
will be forced to edit files on a remote machine that offers only CLI (TUI) interface.
Learn how to use some TUI editor soon, we will need it in future labs.
Some of the editors mentioned above also offer a GUI version and are multi-platform, so
there is no excuse for not trying something new :-)
For the following, you will need to have the same list of files as we have.
Please, download this archive and unpack its contents. If you want to download it
from the command line, you can use wget URL, otherwise use whatever browser you
like. Use Midnight commander to copy the unpacked content to your home
directory. Hint.
Shell wildcards
So far, we used ls to display all files in a directory. If we are interested in only a subset,
we can specifically name them on the command line.
Move to the directory where you have unpacked nswi177-lab02.tar.gz and check with ls that
the unpacked files are there.
To list only the text files, we can use a wildcard:
ls -l *.txt
It is essential to note that ls (or any other program for that matter) will receive
the expanded list of files – finding the matching files is done by the shell, not by
individual programs. Thus for the above example, from inside ls there is no way of
distinguishing whether the user used the full list or the *.txt wildcard. You will
experiment with this in one of the next labs where we will talk about accessing these
parameters in your favorite programming language. For developers, it means that they
do not need to care about implementing the wildcard expansion themselves. The
program would always receive a list of existing filenames, not a wildcard.
By the way – is the last sentence completely correct? What happens if we run ls -l
*.txxxt? Answer.
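To see what a program actually receives, here is a minimal Python sketch (the filename show_args.py is our choice):

#!/usr/bin/env python3
# show_args.py: print the arguments exactly as the program receives them
import sys

for arg in sys.argv[1:]:
    print(arg)

Running python3 show_args.py *.txt prints the already-expanded filenames, one per line.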
How would you print all files starting with the letter t? Answer.
If we want to print the files whose name starts with o or f, we can use
ls [of]*.txt
If we want to print files that end with any of the letters from a to f, we could use
ls *[a-f].txt
Try it in the a subdirectory.
Note that the files are sorted alphabetically when specified via wildcards.
And now list all files/directories starting with D (recall that Linux is case-sensitive). You
might be surprised, because a straightforward ls D* would actually list the contents of
these directories. This is to be expected: ls Documents is supposed to print
the list of files in that directory. If we do not want ls to descend into directories, we can
add the -d option to prevent that.
What happens when you specify a file that does not exist? And what if only some of
the specified files do not exist?
Apart from * (which matches any part of the filename) and [list-of-characters] (which
matches one character of the filename), there is also ?, which matches any single
character.
Hence x?.txt will match filenames that are 6 characters long, start
with x, and end with .txt (i.e., a two-letter filename starting with x, of plain text type).
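If you want to experiment with these patterns from a program, Python's glob module implements the same matching (a small sketch; here the matching happens inside Python, not in the shell):

import glob

# '?' matches exactly one character, so this finds e.g. xa.txt or x1.txt
print(sorted(glob.glob('x?.txt')))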
More about hidden files
Recall that filenames starting with dot . are hidden. These are by default not listed
by ls. If you want to see these files too, you have to either name them explicitly or use
the -a option.
Again: it is not a security measure, just a way to make the listing less cluttered.
We have already mentioned text editors and MC to look into files when working in the
terminal. They are not the only options.
Text files
The simplest way to dump the contents of any file is to call a program called cat. Its
arguments are filenames to print. The name cat has nothing to do with the mammal
but refers to the middle of the word concatenate as it can be used to actually
concatenate files.
Move to the b subdirectory. Executing cat 000.txt will show the contents of 000.txt on
the screen.
How would you show the contents of all files in this directory? Answer.
Binary files
If we want to dump binary files (such as images), it is usually better to dump their bytes
in hexadecimal; the hexdump utility does exactly that.
We will always use it with the -C switch to print the hex dump and the ASCII characters next to each
other. The dump of the GIF file looks like this:
hexdump -C c/sample.gif
00000000 47 49 46 38 39 61 0a 00 0a 00 91 00 00 ff ff ff |GIF89a..........|
00000010 ff 00 00 00 00 ff 00 00 00 21 f9 04 00 00 00 00 |.........!......|
00000020 00 2c 00 00 00 00 0a 00 0a 00 00 02 16 8c 2d 99 |.,............-.|
00000030 87 2a 1c dc 33 a0 02 75 ec 95 fa a8 de 60 8c 04 |.*..3..u.....`..|
00000040 91 4c 01 00 3b |.L..;|
00000045
Unprintable values (e.g., smaller than 32) are replaced with a dot.
Notice that the first characters are normal ASCII letters (which was a smart decision of the
authors of the file format).
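You can check the signature programmatically, too; a minimal sketch (GIF files start with either GIF87a or GIF89a):

# print True when the first 6 bytes carry a valid GIF signature
with open('c/sample.gif', 'rb') as f:
    print(f.read(6) in (b'GIF87a', b'GIF89a'))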
Guessing file type
Even though the file extension is not mandatory, it is better to use it to explicitly
identify file types.
If you are not sure about the file type, the file utility can identify it for you.
file c/sample.gif
Manual pages
We have seen that the ls behaviour can be modified with -a, -d, and -l; hexdump has -C.
Did you know that uptime accepts -s? And that cat takes -n to print line numbers?
Execute man cmd to access a manual for the cmd program (substitute cmd for the
actual command name). Use arrows for scrolling and q to quit the manual. You can
search inside the page with / (slash) key.
Manual pages are organized into sections and you can specify the section number as
part of the man execution, e.g., man 3 printf opens a help page for printf() function in
the C language because that is the contents of section 3. Note that man printf would
show you the contents of printf manual from section 1, i.e., the shell command.
Open man man to see the full list of sections. Briefly, 1 is for shell commands, 3 is for
library calls, and 4 and 5 are used for specific files (e.g., man 5 proc launches the manual
page for the whole /proc directory).
Note that manual pages are also available on-line, hence you can study your favourite
commands even without access to your Linux machine.
Typical options
Many of the options are more-or-less standardized across multiple programs and are
worth remembering.
Almost all GNU programs that you will have on your machine will print a small help
when executed with --help. Try it for ls or cd.
--version can be used to print the version and copyright information of the executed
program. Sometimes -v or -V works as well.
--verbose or --debug (sometimes -v or -d) launches the program in a verbose mode where
it prints in more detail what it is doing.
--dry-run (sometimes -n) executes the program without performing actual changes
(e.g., it can print which files would be removed without actually deleting any of them).
--interactive (sometimes -i) will typically cause the program to ask for interactive
confirmation of destructive actions.
-- can be used to terminate the list of options if you have filenames starting with a
dash. For a classical example, move into the d subdirectory of nswi177-lab02 and list
information about a file named -a. Then check your result and try again using
the -- delimiter. Answer.
Do not underestimate the need for -- when working with unknown files. It might be
an innocent mistake when a file named -f appears, but the results of running
cmd WILDCARD without -- might be tremendous.
Always use cmd -- WILDCARD when the wildcard starts with * or when the wildcard
potentially comes from the user (i.e., also when the user specifies a list of files on the
command line).
If you create a file called file with spaces.txt and then execute cat file with spaces.txt,
cat will complain about three nonexistent files (file, with, and spaces.txt), because the
shell splits the arguments on spaces. You need to quote the whole name
(cat "file with spaces.txt") or escape the spaces. If you use tab completion, your command
will be completed with the escape characters (cat file\ with\ spaces.txt).
We will mention this again when talking about scripts, but it is something to remember:
spaces in filenames can cause unexpected surprises and it is better to avoid such
naming.
And yes, it is possible to create a file named ' ' (i.e., a single space) and show its contents
with cat " ", but it is not a very sensible idea to do so. It is similar to creating files
starting with a dash: it is possible, and there are ways to bypass the issues (e.g., using
the -- delimiter), but it is simpler to avoid them altogether.
Work efficiently
Do not be afraid of running multiple terminals next to each other. Use one to navigate
with ls and cd, use the other one for Midnight commander to mirror your actions.
Open another one with a manual page for the command you are using.
Most desktop environments allow you to create multiple workspaces or desktops. Each
workspace has its own list of open windows; windows opened on other workspaces
are not visible. This can reduce clutter significantly and – with proper
keyboard shortcuts – speed up your work.
We will talk about this in greater detail in the following lab, for now you can use the
following command to actually run your Python script:
python3 path_to_your_python_script.py
The following tasks must be solved and submitted before attending your lab. If your
lab is on Wednesday at 10:40, the files must be pushed to your repository (project)
at GitLab by Wednesday 10:39 at the latest.
For the virtual lab, the deadline is Tuesday 9:00 AM every week (regardless of vacation
days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
Note that this task is not fully checked by GitLab as it would reveal the answers.
https://d3s.mff.cuni.cz/f/teaching/nswi177/202223/labs/nswi177-task02.tar.gz
For example, if your login is `johndoe`, you should paste contents from files
`0jz.txt`, `1ez.txt` but not from `ajz.txt` or `2wz.txt` or
`0jx.txt`.
Sort the list of files alphabetically before getting their content; duplicate
letters should be ignored (i.e., use the wildcards naturally and you will be fine).
Insert your answer between the markers below, replacing the three dots.
Leading and trailing whitespace in your answer will be ignored but
keep the starred A1 markers without changes (tests will check that).
**Q2** Insert here the wildcard pattern that you have used
(only the pattern without `ls` or any other command you have used).
https://d3s.mff.cuni.cz/f/teaching/nswi177/202223/labs/02/LOGIN.broken.gif
The image is broken because we replaced the signature (the first 3 bytes) with the letters XXX.
This should not prevent you from reading the size of the original GIF image. The following
links will guide you through the internals of the GIF file format (look for the logical screen
descriptor).
Write the answer in the format WIDTHxHEIGHT into 02/gif.txt file (e.g. 50x100).
The automated tests only check the format of your answer (otherwise the solution would
be too easy).
We will upload the files at the beginning of the first week and update them daily (for those
who enroll later).
Hint: you may find the -n option useful to limit scrolling through hexdump output.
We expect you will solve the following tasks after attending the labs and hearing
feedback to your before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
Store the command into 02/file-size.txt without any actual filename. We will
append the filenames during testing; for your experiments, add them manually.
For example, if the file 02/file-size.txt contained stat -f, we would run stat -f
filename.txt during testing and expect it to print the following.
filename.txt 42
Similarly, running stat -f one.txt two.txt will print the following.
one.txt 1
two.txt 2047
Do not append any filenames to your command so that we can properly test it.
As you might have guessed, look into the manual page of stat to find the right
options. Do not print any other information apart from the filename and file size.
/home/../usr/./share/./man/../../lib/../../etc/ssh/.././os-release
The automated tests only check the format of your answer (otherwise the solution would
be too easy).
We are after the best possible (i.e., most precise) answer: certainly, an answer such as "it is
Python code that prints something" is true, but that is not what we are after :-)
stats = {}
with open('/proc/meminfo', 'r') as f:
    for line in f:
        parts = line.split(":")
        stats[parts[0].strip()] = parts[1].split()[0].strip()
print(float(stats['MemFree']) / float(stats['MemTotal']))
1. Prints data of the first two lines from the file /proc/meminfo.
2. Prints the second column of the file /proc/meminfo where the columns are
separated by colons (:).
3. Prints the approximate percentage of free memory on the system.
4. Ensures that /proc/meminfo contains valid data.
5. Reads /proc/meminfo to determine if they are in a correct format.
The automated tests only check the format of your answer (otherwise the solution would
be too easy).
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that
you should be able to explain and/or use after each lesson. They also represent the
bare minimum required for understanding subsequent labs (and other courses as well).
Practical skills
Practical skills are usually about usage of given programs to solve various tasks.
Therefore, you should be able to …
• 2023-02-14: Files for before lab tasks were uploaded, missing links to GIF
specification were added.
• 2023-02-14: Add more info about tests and some hints to graded tasks.
• 2023-02-25: Move task 02/filepath.txt to the admin group.
Lab #3 (February 27 - March 3)
• Linux scripting
• Git principles
• Running tests locally
• Before-class tasks (deadline: start of your lab, week February 27 - March 3)
• Post-class tasks (deadline: March 19)
• Learning outcomes
• This page changelog
The goal of this lab is to introduce you to the Git command-line client and how to write reusable scripts.
We will demonstrate how well Linux is suited for interpreted languages. And we will make our work with
GitLab much more efficient: we will see how to transfer files to and from it via a command-line
client.
Linux scripting
A script in the Linux environment is any program that is interpreted when being run (i.e., the program
is distributed as a source code). In this sense, there are shell scripts (the language is the shell as you
have seen it last time), Python, Ruby or PHP scripts.
The advantage of so-called scripting languages is that they require only a text editor for
development and that they are easily portable. The disadvantage is that you need to install the interpreter
first. Fortunately, Linux typically comes with many interpreters preinstalled, so starting with a scripting
language is very easy.
To write a shell script, we simply write the commands into a file (instead of typing them in a terminal).
Therefore, a simple script that prints some information about your system could be as simple as the
following.
cat /proc/cpuinfo
cat /proc/meminfo
If you store this into a file first.sh, then you can execute it with the following command.
bash first.sh
Notice that we have executed bash – the shell program (interpreter) that we are using – followed by the
name of the input file.
It will cat those two files (note that we could have executed a single cat with two arguments as well).
Recall that your project_name.py script can be executed with the following command (again, we run the
right interpreter).
python3 project_name.py
Shebang and executable bit
Running scripts by specifying the interpreter to use (i.e., the command to run the script file with) is not
very elegant. There is an easier way: we mark the file as executable and Linux handles the rest.
Actually, when we execute the cat command or mc, there is a file (usually in
the /bin or /usr/bin directory) that is named cat or mc and marked executable. (For now, imagine the
special executable mark as a special file attribute.) Notice that there is no file extension.
However, marking the file as executable is only the first half of the solution. Imagine that we create the
following content and store it into a file hello.py marked as executable.
print("Hello")
And then we want to run it.
But wait! How will the system know which interpreter to use? For binary executables (e.g., originally
from C sources), it is easy as the binary is (almost) directly in the machine code. But here we need an
interpreter first.
In Linux, the interpreter is specified via so-called shebang or hashbang. As a matter of fact, you have
already encountered it several times: When the first line of the script starts with #! (hence the name
hash and bang), Linux expects a path to the interpreter after it and will run this interpreter and ask it to
execute the script.
The Linux kernel refuses to execute shebang-less scripts. But if you run them from the shell, the shell will try
interpreting them as shell scripts. It is good practice not to rely on this behavior.
For shell scripts, we will be using #!/bin/bash, for Python we need to use #!/usr/bin/env python3. We
will explain the env later on; for now, please just remember to use this version.
Note that most interpreters use # to denote a comment which means that no extra handling is needed to skip
the first line (as it is really not needed by the interpreter).
You will often encounter #!/bin/sh for shell scripts. For most scripts it actually does not matter: simple
constructs work the same, but /bin/bash offers some nice extensions. We will be using /bin/bash in this
course as the extensions are rather useful.
You may need to use /bin/sh if you are working on older systems or you need to have your script
portable to different flavours of Unix systems.
To complicate things a bit more, on some systems /bin/sh is actually the same program as /bin/bash
(bash is really a superset of sh).
Bottom line is: unless you know what you are doing, stick with #!/bin/bash shebang for now.
Now back to the original question: how is the script executed? The system takes the command from the
shebang, appends the actual filename of the script as a parameter, and runs that. When the user
specifies more arguments (such as --version), they are appended as well.
For example, if hexdump were actually a shell script, it would start with the following:
#!/bin/bash
...
code-to-loop-over-bytes-and-print-them-goes-here
...
Executing hexdump -C file.gif would then actually execute the following command:
/bin/bash /usr/bin/hexdump -C file.gif
The user does not need to care about the implementation language.
We know about the shebang, so we will update our example and also mark the file as an executable
one.
#!/bin/bash
cat /proc/cpuinfo
cat /proc/meminfo
To mark it as executable, we run the following command. For now, please, remember it as a magic that
must be done, more details why it looks like this will come later.
chmod +x first.sh
chmod will not work on file systems that are not Unix/Linux-friendly. That unfortunately includes even NTFS.
GitLab web GUI does not offer any means for setting the executable bit. You need to use Git CLI client instead
(see the second half of this lab).
Now we can easily execute the script with the following command:
./first.sh
The obvious question is: why the redundant ./? It refers to the current directory after all, right (recall
previous lab)? So it refers to the same file!
When you type a command (e.g., cat) without any path (i.e., only bare filename containing the
program), shell looks into so-called $PATH to actually find the file with the program
(usually, $PATH would contain directory /usr/bin where most of the executable binaries are stored).
Unlike in other operating systems, shell does not look into the working directory when program cannot
be found in the $PATH.
To run a program in the current directory, we need to specify its path (when any extra path is provided,
shell ignores $PATH and simply looks for the file). Luckily, it does not have to be an absolute path, but a
relative one is sufficient. Hence the magic spell of ./.
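To see what your $PATH actually contains, simply print it (the exact list differs between distributions, but it will look something like /usr/local/bin:/usr/bin:/bin):
echo $PATH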
If you move to another directory, you can execute it by providing a relative path too, such
as ../first.sh.
Run ls in the directory now. You should see first.sh printed in green. If not, you can try ls --color or
check that you have run chmod correctly.
If you do not have a colorful terminal (unusual but still possible), you can use ls -F to distinguish file
types: directories will have a slash appended, executable files will have an asterisk next to their filename.
Exercise
Create a script that prints all image files in current directory (for now, you can safely assume there will
always be some). Try to run it from different directories using relative and absolute path. Answer.
Create a script that prints information about currently visible disk partitions in the system. For now, it
will only display contents of /proc/partitions. Answer.
Now change the script to move into /proc first:
#!/bin/bash
cd /proc
cat cpuinfo
cat meminfo
Run the script again.
Despite the fact that the script changed directory to /proc, when it terminates, we are still in the original
directory.
Try inserting pwd to ensure that the script really is inside /proc.
This is an essential take away – every process (running program; this includes scripts) has its own current
directory. When it is started, it inherits the directory from its caller (e.g., from the shell it was run from). Then it
can change the current directory, but that does not affect other processes in any way. Thus, when the program
terminates, the caller is still in the same directory.
This also means that cd itself cannot be a normal binary. If it were a normal program (e.g., written
in Python), any directory change it made would be lost the moment it terminated.
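You can verify this easily: the following starts a new shell process that changes its directory and prints it, while the pwd of your own shell afterwards shows that you have not moved anywhere.
bash -c 'cd /proc && pwd'
pwd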
If you want to see what is happening, run the script as bash -x first.sh. Try it now. For longer scripts,
it is better to print your own messages as -x tends to become too verbose.
To print a message to the terminal, you can use the echo command. With a few exceptions (more about
these later), all arguments are simply echoed to the terminal.
Create a script echos.sh with the following content and explain the differences:
#!/bin/bash
echo alpha bravo charlie
echo alpha    bravo     charlie
echo "alpha bravo" charlie
Answer.
Command-line arguments
Command-line arguments (such as -l for ls or -C for hexdump) are the usual way to control the
behaviour of CLI tools in Linux. For us, as developers, it is important to learn how to work with them
inside our programs.
We will talk about using these arguments in shell scripts later on, today we will handle them in Python.
Accessing these arguments in Python is very easy. We need to add import sys to our program and then
we can access these arguments in the sys.argv list.
#!/usr/bin/env python3
import sys

def main():
    for arg in sys.argv:
        print("'{}'".format(arg))

if __name__ == '__main__':
    main()
When we execute it (of course, first we chmod +x it), we will see the following (lines prefixed
with $ denote the command, the rest is command output).
$ ./args.py
'./args.py'
$ ./args.py one two
'./args.py'
'one'
'two'
$ ./args.py "one two"
'./args.py'
'one two'
Note that the zeroth index is occupied by the command itself (we will not use it now, but it can be used
for some clever tricks) and notice how the second and third commands differ from inside Python.
It should not be surprising though, recall the previous lab and handling of filenames with spaces in
them.
Other interpreters
We will now try what other interpreters we can put in the shebang.
Construct an absolute (!) path (hint: man 1 realpath) to the args.py that we have used above. Use it as
a shebang on an otherwise empty file (e.g. use-args) and make this file executable. Hint.
This is essential – when you add a shebang, the interpreter receives the input filename as the first argument. In
other words – every Linux-friendly interpreter shall start evaluating a program passed to it as a filename in the
first argument.
As another example, prepare the following file and store it as experiment (with no file extension) and
make the file executable:
#!/bin/bash
echo Hello
Note that we decided to drop the extension again altogether. The user does not really need to know
which language was used. That is captured by the shebang, after all.
Now change the shebang to #!/bin/cat. Run the program again. What happens? Now run it with an
argument (e.g., ./experiment experiment). What happened? Answer.
We will assume that both my-cat and my-echo are executable scripts in the current directory.
my-cat contains as the only content the following shebang #!/bin/cat and my-echo contains
only #!/bin/echo.
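Before running them, try to guess the output. It should look something like this (recall that the lines prefixed with $ denote the commands, the rest is output):
$ ./my-echo hello
./my-echo hello
$ ./my-cat
#!/bin/cat
In both cases, the interpreter named in the shebang simply received the script filename as its first argument (plus any extra arguments) and processed it as it would process any other file.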
So far, our interaction with GitLab was over its GUI. We will switch to the command line for higher
efficiency now.
Recall that GitLab is built on top of Git which is the actual versioning system used.
Git offers a command-line client that can download the whole project to your machine, track changes
in it, and then upload it back to the server (GitLab in our case, but there are other products, too).
While it is possible to edit many files on-line in GitLab, it is much easier to have them locally and use a better
editor (or IDE). Furthermore, not all tools have their on-line counterparts and you have to run them locally.
Before diving into Git itself, we need to prepare our environment a bit.
Git will often need to run your editor. It is essential to ensure it uses the editor of your choice.
We will explain the following steps in more detail later on, for now ensure that you add the following
line to the end of ~/.bashrc file (replace mcedit with editor of your choice):
export EDITOR=mcedit
Now open a new terminal and run (including the dollar sign):
$EDITOR ~/.bashrc
If you set the above correctly, you should see again .bashrc opened in your favorite text editor.
If not, ensure you have really modified your .bashrc file (in your home directory) to contain the same
as above (no spaces around = etc.).
You need to close all terminals for this change to take effect (i.e., do so before you start using any of the Git
commands mentioned below).
Never use a graphical editor for $EDITOR unless you really know what you are doing. Git expects a certain
behaviour from the editor that is rarely satisfied by GUI editors but is always provided by a TUI-based one.
If you want to know why GUI editors are a bad choice, the explanation is relatively simple: Git will start a new
editor for the commit message (see below) and it will assume that the commit message is ready once the editor
terminates. However, many GUI editors work in a mode where there is a single instance running and you only
open new tabs. In that case, the editor that is launched by Git actually terminates immediately – it only tells
the existing editor to open a new file – and Git sees only an empty commit message.
Git has over 100 subcommands available. Don’t panic, though. We will start with less than 10 of them
and even quite advanced usage requires knowledge of no more than 20 of them.
Configure Git
One of the key concepts in Git is that each commit (change) is authored – i.e., it is known who made it.
(Git also supports cryptographic signatures of commits, so that authorship cannot be forged, but let us
keep things simple for now.)
Thus, we need to tell Git who we are. The following two commands are the absolute minimum you
need to execute on any machine (or account) where you want to use Git (substitute your own name
and address):
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
Note that Git does not check the validity of your e-mail address or your name (indeed, there is no way
how to do it). Therefore, anything can be there. However, if you use your real e-mail address, GitLab
will be able to pair the commit with your account etc. which can be quite useful. The decision is up to
you.
The very first operation you need to perform is so called clone. During cloning, you copy your project
source code from the server (GitLab) to your local machine. The server may require authentication for
cloning to happen.
Cloning also copies the whole history of the project. Once you clone the project, you can view all the
commits you have made so far. Without need for an internet connection.
The clone is often called a working copy. As a matter of fact, the clone is a 1:1 copy, so if someone
deleted the project, you would be able to recreate the source code without any problem. (That is not
true about the Issues or the Wiki as it applies only to the Git-versioned part of the project.)
As you will see, the whole project as you see it on GitLab becomes a directory on your hard drive. As
usual, there are also GUI alternatives to the commands we will be showing here, but we will focus our
attention on the CLI variants only.
Copy the HTTPS address and use it as the correct address for the clone command:
Note that some environments may offer you to use some kind of a keyring or another form of a
credential helper (to store your password). Feel free to use them, later on, we will see how to use SSH
and asymmetric cryptography for seamless work with Git projects without any need for
username/password handling.
It seems that some environments are rather forceful in their propagation of their password helpers (and
if you enter your password incorrectly the first time, they do not provide a simple way to clear it).
Try running the following first if you encounter HTTP Basic: Access denied. and no password prompt is
shown (see also this issue).
export GIT_ASKPASS=""
export SSH_ASKPASS=""
git clone ...
Note that you should have the student-LOGIN directory on your machine now. Move to it and see what
files are there. What about hidden files? Answer.
Unless stated otherwise, all commands will be executed from the student-LOGIN directory.
After the project is cloned, you can start editing files. This is completely orthogonal to Git and until you
explicitly tell Git to do something, it does not touch your files at all.
It is also important to note that Git will not fetch updates from the server automatically for you. That is, if you
clone the project and then modify something on GitLab directly, the changes will not propagate to your
working copy unless you explicitly ask for it.
Once you are finished with your changes (e.g., you fixed a certain bug), it is time to tell Git about the
new revision.
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
Now change something in the project – edit (for example) the README.md – and run git status again.
Read carefully the whole output of this command to understand what it reports.
Create a new file, 03/editor.txt and put into it the name of the editor that you have decided to use
(feel free to create directory 03 in some graphical tool or use mkdir 03).
Again, check how git status reports this change in your project directory.
Run git diff to see how Git tracks the changes you made.
You will see a list of modified files (i.e., their content differs from last commit) and you can also see a
so called diff (sometimes also called a patch) that describes the change.
Note that git diff is also extremely useful to check that the change you made is correct as it focuses
on the context of the change rather than the whole file.
That clearly states what the commit changed. It is actually similar to how you create functions in a
programming language. A single function should do one thing (and do it well). A single commit should
capture one change.
Now prepare your first commit (recall that commit is basically a version or a named state of the project)
– run git add 03/editor.txt. We will take care of the extension in README.md later.
After staging all the relevant changes (i.e. git add-ing all the needed files), you create a commit. The
commit clears the staging status and you can work on fixing another bug :-).
Make your first commit via git commit. Do not forget to use a descriptive commit message!
Note that without any other options, git commit will open your text editor. Write the commit message
there and quit the editor (save the file first). Your commit is done.
For short commit messages, you may use git commit -m "Typo fix" where the whole commit message
is given as argument to the -m option (notice the quotes because of the space).
What will git status look like now? Think about it first before actually running the command!
You basically repeat this as long as you need to make changes. Recall that each commit should capture
a reasonable state of the project that is worth returning to later.
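A typical session (with placeholder names, of course) thus boils down to repeating the following three commands:
git status
git add 03/some_file.txt
git commit -m "Describe what has changed"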
Whenever you make a commit, the commit remains local. It is not propagated back to the server
automatically.
To upload the changes (commits) back to the server, you need to initiate a so-called push. It uploads
all new commits (i.e., those between your clone operation and now) back to the server. The command
is rather simple.
git push
It will again ask for your password and after that, you should see your changes on GitLab.
Exercise
Add the link to Forum as a second commit from the command line.
As a third commit, create a 03/architecture.sh script that contains the right shebang, is executable, and
prints the current machine architecture (if you skipped this task in the previous lab, simply run only uname there
or look up the right switch in the man page now).
Push now the changes to GitLab. Note that all commits were pushed at the same time.
Change the title in the README.md to also contain the text for YOUR NAME (with your actual name
substituted). But this time, make the change on GitLab.
What is the easiest way to ensure that you have also the change in README.md on your machine after git
pull? Answer.
Note that git pull is quite powerful as it can incorporate changes that happened virtually at the same
time in both the GitLab web UI and your local clone. However, understanding this process also requires
knowledge about branches, which is out of scope for this lab.
For now, remember to not mix changes locally and in GitLab UI (or on a different machine) without always
ending with git push and starting with git pull.
Things get a little bit more complex when you work on multiple machines (e.g., mornings at a school
desktop, evenings at your personal notebook).
But for now it is best to ensure the following workflow to minimize introducing incompatible changes.
Note that if things go horribly wrong, you can always do a fresh clone to a different directory, copy the
files manually and remove the broken clone.
As long as you ensure that you work in the following manner, nothing will ever break:
• Before you start working, run git pull to fetch the latest state from the server.
• Commit your changes as you work.
• When you are done (or before you switch to another machine), run git push to upload everything back.
If you forget some of the synchronizing pulls/pushes when switching between machines, problems
can arise. They are easy to solve, but we will talk about that in later labs.
can arise. They are easy to solve, but we will talk about that in later labs.
For now, you can always do a fresh clone and simply copy files with the new changes and commit again
(not the right Git way, but it definitely works).
Going further
The command git log shows plenty of information, but often you are interested in recent changes only.
You use them to refresh your mind about what you were working on etc.
Similarly, typing a short command such as git st instead of git status could save time.
Git supports aliases for exactly this: the following ones, added to the [alias] section of your ~/.gitconfig,
define the shortcuts git st, git ci and a condensed git ll log listing.
st = status
ci = commit
ll = log --format='tformat:%C(yellow)%h%Creset %an (%cr) %C(yellow)%s%Creset' --max-count=20 --first-parent
Try running the full commands first before adding the aliases to your Git.
Because you now know about shebangs, executable bits and scripts in general, you have enough
knowledge to actually run our tests locally without needing GitLab.
It should make your development faster and more natural as you do not need to wait for GitLab.
Simply execute ./bin/run_tests.sh in the root directory of your project and check the results. You can
also run only the tests for a particular task set (or a single task):
./bin/run_tests.sh 03-before
./bin/run_tests.sh 03-post
./bin/run_tests.sh 03-before/architecture
Note: If you are using your own installation of Linux, you might need to install the bats (or bash-bats
or bats-core) package first.
The following tasks must be solved and submitted before attending your lab. If you have lab on
Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at
10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
https://d3s.mff.cuni.cz/f/teaching/nswi177/202223/labs/task-03.git/
There are multiple files in this repository. Copy the one mentioned in the commit messages
to 03/git.txt.
In other words, clone the above repository, view existing commits and in the commit messages, you
will see a filename that you should copy to your own project (as 03/git.txt).
Automated tests only check presence of the file, not that you have copied the right one.
Ensure your script has the right shebang and executable bit set.
We expect you will solve the following tasks after attending the labs and hearing feedback to your
before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
We need to distribute passwords to you for this repository (we do not want to bind it with your SIS
account). We will do that during week 02. For this task you will be using your SIS/GitLab login but a
different password. The password was uploaded to the Wiki that is part of your NSWI177 project, on a
page called Secrets.
You will need the following repository (obviously, replace LOGIN with your SIS/GitLab login). Use the
password from the Wiki page Secrets (recall that pasting can be done simply by selecting the text here
and pasting it into the terminal with middle mouse click). This URL has no browser-friendly version, do
not be surprised by 404 if you open it in a web browser.
https://lab.d3s.mff.cuni.cz/nswi177/git-03/LOGIN.git
After you clone it, create a file 03.txt inside it.
In the first commit, insert 2022 as its only content (i.e., to 03.txt).
The correct answer is printed by the automated tests when you execute them locally (i.e.,
with 03-post/local).
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be
able to explain and/or use after each lesson. They also represent the bare minimum required for
understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting
them into context. Therefore, you should be able to …
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should
be able to …
• 2023-02-27: Warning about password helpers and executable bit in GitLab UI.
• 2023-02-24: Add connection details about the Git repository from post-class task.
Lab #4 (March 6 - March 10)
• Running example
• Standard input and outputs
• Filters
• Pipes (data streaming composition)
• Writing your own filters
• Standard error output
• Under the hood (about file descriptors)
• Advanced I/O redirection
• Program return (exit) code
• Shell customization
• More examples
• Before-class tasks (deadline: start of your lab, week March 6 - March 10)
• Post-class tasks (deadline: March 26)
• Learning outcomes
• This page changelog
The goal of this lab is to define and thoroughly understand the concepts of standard input, output, and
standard error output. This would allow us to understand program I/O redirection and composition of
different programs via pipes. We will also customize our shell environment a little by investigating
command aliases and the .bashrc file.
Running example
We will build this lab around a single example that we will incrementally develop, so that you learn the
basic concepts on a practical example (obviously, there are specific tools that could be used instead, but
we hope that this is better than a completely artificial example).
Data for our example can be downloaded (i.e., git cloned) from this repository where they reside in
the 04/ subdirectory.
They simulate simplified logs from a web server, where the web server records which files (URLs) were
accessed at which time.
Practically, each file represents traffic for one day in a simplified CSV format.
Fields are separated by a comma, there is no header, and for each record we remember the date, the
client’s IP address, the URL that was requested, and the amount of transferred bytes.
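For illustration, a single (completely made-up) record in this format could look like this:
2023-03-06,198.51.100.23,/index.html,3112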
In reality, the data would be also compressed and would probably contain more details about the client (e.g., the
browser used), but otherwise the data recorded represent a fairly typical web server log format.
Our task is to write a program that prints a brief summary of the data: the most frequently visited URLs,
the total amount of transferred bytes, and the days with the most traffic.
We will start the lab with a few definitions of concepts that you probably already know (but maybe not
under exactly these names).
Standard output
Standard output (often shortened to stdout) is the default output that you can use by
calling print("Hello") if you are in Python, for example. Stdout is used by the basic output routines in
almost every programming language.
Generally, this output has the same API as if you were writing to a file. Be it print in
Python, System.out.print in Java or printf in C (where the limitations of the language necessitate the
existence of a pair of printf and fprintf).
This output is usually prepared by the language runtime together with the shell and the operating system
(the technical details are not that important for this course anyway). Practically, the standard output is
printed to the terminal or its equivalent (and when the application is launched graphically, stdout is
typically lost).
Note that in Python you can access it explicitly via sys.stdout that acts as an opened file handle (i.e.,
result of open).
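The following fragment demonstrates that both spellings end up in the same place; the explicit variant just requires us to add the newline ourselves.
import sys
print("Hello")
sys.stdout.write("Hello\n")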
Standard input
Similarly to stdout, almost all languages have access to stdin that represents the default input. By default,
this input comes from the keyboard, although usually through the terminal (i.e., stdin is not used in
graphical applications for reading keyboard input).
Note that the function input() that you may have used in your Python programs is an upgrade on top
of stdin because it offers basic editing functions. Plain standard input does not support any form of
editing (though typically you could use backspace to erase characters at the end of the line).
If you want to access the standard input in Python, you need to use sys.stdin explicitly. As one could
expect, it uses a file API, hence it is possible to read a line from it calling .readline() on it or to iterate
through all lines.
In fact, an iteration of the following form is a quite common pattern for many Linux utilities (they are
usually written in C but the pattern remains the same):
import sys

for line in sys.stdin:
    print(line.rstrip("\n"))  # process the line somehow (here we only echo it back)
Many of the utilities actually read from stdin by default. For example, cut -d : -f 1 prints only the first
column of data of each line (and expects the columns to be delimited by :).
Run it and type the following on the keyboard, terminating each line with <Enter>.
cut -d : -f 1
one:two
alpha:bravo
uno:dos
You should see the first column echoed underneath your input.
What to do when you are done? Typing exit will not help here but <Ctrl>-D works.
Pressing <Ctrl>-D on an empty line will close the standard input. The program cut will realize that there is no
more input to process and will gracefully terminate. Note that this is something else than <Ctrl>-C which
forcefully kills the running process. From the user’s perspective, these look similar in the context of the
utility cut, but the behavior is totally different with important semantics difference (that can be observed when
using other tools).
As a technical detail, we mentioned earlier that the standard input and output are prepared (partially) by
the operating system. This also means that it can be changed (i.e., initialized differently) without changing
the program. And the program may not even “know” about it.
This is called redirection and it allows the user to specify that the standard output would not go to the
screen (terminal), but rather to a file. From the point of view of the program, the API is still the same.
This redirection has to be done before the program is started and it has to be done by the caller. For us,
it means we have to do it in the shell.
It is very simple: at the end of the command we can specify > output.txt and everything that would be
normally printed on a screen goes to output.txt.
Before you start experimenting: the output redirection is a low-level operation and has no form of undo.
Therefore, if the file you redirect to already exists, it will be overwritten without questions. And without
any easy option to restore the original file content (and for small files, the restoration is technically
impossible for most file systems used in Linux).
As a precaution, get into a habit to hit <Tab> after you specify the filename. If the file does not exist, the
cursor will not move. If the file already exists, the tab completion routine will insert a space.
As the simplest example, the following two commands will create files one.txt and two.txt with the
words ONE and TWO inside (including the new line character at the end).
echo ONE >one.txt
echo TWO >two.txt
From the implementation point of view, echo received a single argument; the > filename part is not
passed to the program at all (i.e., do not expect to find > filename in your sys.argv).
If you know Python’s popen or a similar call, they also offer the option to specify which file to use for stdout if
you want to do a redirection in your program (but only for a new program launched, not inside a running
program).
If you recall Lab 02, we mentioned that the program cat is used to concatenate files. With the knowledge
of output redirection, it suddenly starts to make more sense as the (merged) output can be easily stored
in a file.
For the following example, we will need the program tac that reverses the order of individual lines but
otherwise works like cat (note that tac is cat but backwards, what a cool name). Try this first.
UNO
ONE
TWO
Try the following and explain what happens (and why) if you execute
Input redirection
Similarly, the shell offers < for redirecting stdin. Then, instead of reading input typed by the user on the
keyboard, the program reads the input from a file.
Note that programs using Pythonic input() do not work that well with redirected input.
Practically, input() is suitable for interactive programs only. You might want to
use sys.stdin.readline() or for line in sys.stdin instead.
When input is redirected, we do not need to issue <Ctrl>-D to close the input as the input is closed
automatically when reaching the end of the file.
Filters
Many utilities in Linux work as so-called filters. They accept the input from stdin and print their output
to stdout.
One such example is cut that can be used to print only certain columns from the input. For example,
running it as cut -d : -f 1 with /etc/passwd as its input will display a list of accounts (usernames) on the
current machine.
cut -d : -f 1 </etc/passwd
cut -d : -f 1 /etc/passwd
The above behavior is quite common for most filters: you can specify the input file explicitly, but when it
is missing, the program reads from the stdin.
To return to the question above: the difference is that in the first case (with input redirection), the input
file is opened by the shell and opened file is passed to cut. Problems in opening the file are reported by
shell and cut might not be launched at all. In the second case, the file is opened by cut (i.e., cut executes
the open() call and also needs to handle errors).
Armed with this knowledge, we can actually solve the first part of our running example. Recall that we
have files that logged traffic each day and we want to find URLs that are most common in all the files
together.
That means we need to join all files together, keep only the URL and find the three most frequent lines.
And we can do that. Recall that cat can be used to concatenate files and cut can be used to keep only
certain columns (the URL is the third one). We will get to finding the most frequent URL in a while. The
first version, using a helper file, might look like this:
#!/bin/bash
cat *.csv >_logs_merged.csv
cut -d , -f 3 <_logs_merged.csv
The script has one big flaw (we will solve it soon but it needs to be mentioned anyway).
The script writes to a file called _logs_merged.csv. We have prefixed the filename with underscore to mark
it as somewhat special but still: what if the user created such file manually?
That is, the program must be able to work with sys.argv[1] == '-d,' and with (sys.argv[1] == '-d') and
(sys.argv[2] == ',').
Pipes (data streaming composition)
We finally move to the area where Linux excels: program composition. In essence, the whole idea behind
Unix-family of operating systems is to allow easy composition of various small programs together.
Mostly, the programs that are composed together are filters and they operate on text inputs. These
programs do not make any assumptions on the text format and are very generic. Special tools (that are
nevertheless part of Linux software repositories) are needed if the input is more structured, such as XML
or JSON.
The advantage is that composing the programs is very easy and it is very easy to compose them
incrementally too (i.e., add another filter only when the output from the previous ones looks reasonable).
This kind of incremental composition is more difficult in normal languages where printing data requires
extra commands (here it is printed to the stdout without any extra work).
The disadvantage is that complex compositions can become difficult to read. It is up to the developer to
decide when it is time to switch to a better language and process the data there. A typical division of
labour is that shell scripts are used to preprocess the data: they are best when you need to combine data
from multiple files (such as hundreds of various reports, etc.) or when the data needs to be converted to
a reasonable format (e.g. non-structured logs from your web server into a CSV loadable into your favorite
spreadsheet software or R). Computing statistics and similar tasks are best left to specialized tools.
Needless to add, Linux offers plenty of tools for statistical computations or plot-drawing utilities that
can be controlled from the CLI. Mastering these tools is, unfortunately, out of scope for this course.
We already mentioned that the temporary file we used is bad because we might have overwritten
someone else's data.
But it also requires disk space for another copy of the (possibly huge) data.
A bit more subtle but much more dangerous problem is that the path to the temporary file is fixed.
Imagine what happens if you execute the script in two terminals concurrently. Do not be fooled by the
feeling that the script is so short that the probability of concurrent execution is negligible. It is a trap that
is waiting to spring. We will talk about proper use of mktemp(1) later, but in this example no temporary
file is needed at all: we can connect the two commands directly with a pipe (the | character), which
feeds the output of the first command into the input of the second one.
cat *.csv | cut -d , -f 3
The result is the same, but we escaped the pitfalls of using temporary files and the result is actually even
more readable.
For cases when the first command also reads from standard input another syntax is available. For
example, this prints a sorted list of local user accounts (usernames).
cut -d : -f 1 </etc/passwd | sort
We can even move the first < before cut, so that the script can be read left-to-right like “take /etc/passwd,
extract the first column, and then sort it”:
</etc/passwd cut -d : -f 1 | sort
Using the pipe above we can print all the URLs in a single list.
To find the most often visited ones we will use a typical trick where we first sort the lines alphabetically
and then use the program uniq with -c to count unique lines (in effect counting how many times each URL
was visited). We then sort this output by the numbers (in reverse order) and print the first 3 lines.
Hence our program will evolve like this (lines starting with # are obviously comments):
#!/bin/bash
# Merge the logs and keep the URL column only,
# then count how many times each URL appears
# and print the three most frequent ones.
cat *.csv | cut -d , -f 3 | sort | uniq -c | sort -n -r | head -n 3
Exercise
Print the total amount of transferred bytes using the logs from our running example (i.e., the last part of
the task).
First part should be easy: we are interested only in the last column.
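If you are not sure how to sum the numbers with pipes only, one possible sketch (assuming the size is the fourth column) joins the lines with + and hands the resulting expression to the bc calculator:
cat *.csv | cut -d , -f 4 | paste -s -d+ | bc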
Pipes: check you understand the basics. Select all true statements.
• A filter is a program that reads standard input and prints results to standard output.
• A pipe connects the standard output of one program to the standard input of another program.
• Pipes can be replaced with I/O redirection.
• Pipes can split standard input to two programs for further processing.
Let us finish another part of the running example. We want to compute traffic for each day and print
days with the most traffic.
Knowing how we composed things so far, we lack only the middle part of the pipeline. Summing the
sizes for each day.
There is no ready-made solution for this (advanced users might consider installing termsql) but we will
create our own in Python and plug it into our pipeline.
Recall we want to group the traffic by dates, hence our program should be able to do the following
transformation.
# Input
day1 1
day1 2
day2 4
day1 3
day2 1
# Output
day1 6
day2 5
Here is our version of the program. Notice that we have (for now) ignored error handling but allowed
the program to be used as a filter in the middle of the pipeline (i.e., read from stdin when no arguments
are provided) but also easily usable for multiple files.
In your own filters, you should also follow this approach: the amount of source code you need to write
is negligible, but it gives the user flexibility in use.
#!/usr/bin/env python3
import sys

def sum_file(inp, sums):
    # Accumulate the value from the second column, grouping by the first column.
    for line in inp:
        key, value = line.split()
        sums[key] = sums.get(key, 0) + int(value)

def main():
    sums = {}
    if len(sys.argv) == 1:
        sum_file(sys.stdin, sums)
    else:
        for filename in sys.argv[1:]:
            with open(filename, "r") as inp:
                sum_file(inp, sums)
    for key, sum in sums.items():
        print(f"{key} {sum}")

if __name__ == "__main__":
    main()
With such a program in place, we can extend our web statistics script in the following manner (tr
translates the commas into the spaces that group_sum.py expects):
cat *.csv | cut -d , -f 1,4 | tr , ' ' | ./group_sum.py
On your own, extend the solution to print only the top 3 days (sort can order the lines using different
columns than the whole line, too). Answer.
While it often makes sense to redirect the output, you often want to see error messages still on the
screen.
Imagine files one.txt and two.txt exist while nonexistent.txt is not in the directory. We will now execute
the following command.
cat one.txt nonexistent.txt two.txt >merged.txt
No, do not imagine it. Create the files one.txt and two.txt to contain words ONE and TWO yourself on the
command line. Hint. Answer.
Therefore, every Linux program also has a standard error output (often just stderr) that also goes to the
screen but is logically different from stdout and is not subject to > redirection.
In Python, printing to the standard error output looks like this (a fragment from a more robust
group_sum.py; the full version is shown below):
try:
    with open(filename, "r") as inp:
        sum_file(inp, sums)
except IOError as e:
    print(f"Error reading file {filename}: {e}", file=sys.stderr)
The following text provides overview of file descriptors that are abstractions used by the OS and the
application when working with opened files. Understanding this concept is not essential for this course
but it is a general principle that (to some extent) is present in most operating systems and applications
(or programming languages).
Technically, opened files have so-called file descriptors that are used when an application communicates
with the operating system (recall that file operations have to be done by the operating system). The file
descriptor is an integer that serves as an index in a table of opened files that is kept for each process
(i.e., a running instance of a program).
This number — the file descriptor — is then passed to system calls which operate on the opened file.
For example, write gets two arguments: an opened file descriptor and a byte buffer to write (in our
examples, we will pass the string directly for simplicity). Therefore, when your application
calls print("Message", file=some_file), eventually your program would call the operating system
as write(3, "Message\n") where 3 denotes the file descriptor for the opened file represented by
the some_file handle.
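You can even issue such a call yourself: Python's os.write is a thin wrapper above this system call and accepts the raw descriptor number (note that it wants bytes, not a string).
import os
os.write(1, b"Message\n")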
While the above may look like a technical detail, it will help you understand why the standard error
redirection looks the way it does, or why file operations in most programming languages require opening
the file first before writing to it (i.e., why write_to_file(filename, contents) is never a primitive
operation).
In any unix-style environment, the file descriptors 0, 1, and 2 are always used for standard input, standard
output, and standard error output, respectively. That is, the call print("Message") in Python eventually
ends up in calling write(1, "Message\n") and a call to print("Error", file=sys.stderr) calls write(2,
"Error\n").
When a new process is started, it obtains these three file descriptors from its caller (e.g., the shell). By
default, they point to the terminal, but the caller can simply open them to point to a different file. This is
how redirection works.
The fact that stdout and stderr are logically different streams (files) also explains the word probably in
one of the examples above. Even though they both end in the same physical device (the terminal), they
may use a different configuration: typically, the standard output is buffered, i.e., output of your
application goes to the screen only when there is enough of it, while the standard error is not buffered
– it is printed immediately. The reason is probably obvious – error messages should be visible as soon
as possible, while normal output might be delayed to improve performance.
Note that the buffering policy can be more sophisticated, but the essential take away is that any output
to the stderr is displayed immediately while stdout might be delayed.
Try the following invocations of group_sum.py (prepare suitable two-column input files first) and observe
which input is actually processed:
./group_sum.py <one.txt
./group_sum.py one.txt
./group_sum.py one.txt two.txt
./group_sum.py one.txt <two.txt
Has it behaved as you expected?
Trace which paths (i.e. through which lines) the program has taken with the above invocations.
To redirect the standard error output, you can use > again, but this time preceded by the number 2 (that
denotes the stderr file descriptor).
Hence, our cat example can be transformed to the following form, where err.txt would contain the error
message and nothing would be printed on the screen.
cat one.txt nonexistent.txt two.txt >merged.txt 2>err.txt
Consider the following mini-script (first-column.sh) that extracts and sorts the first column (for colon-
delimited data such as in /etc/passwd).
#!/bin/bash
cut -d : -f 1 | sort
Then the user can use the script like this, and the standard input of cut will be properly wired to the
script's standard input, whether redirected from a file or fed through a pipe:
./first-column.sh </etc/passwd
cat /etc/passwd | ./first-column.sh
Generic redirection
Shell allows us to redirect outputs quite freely using file descriptor numbers before and after the greater-
than sign.
For example, >&2 specifies that the standard output is redirected to the standard error output. That may
sound weird, but consider the following mini-script (a sketch with a placeholder URL) that prints its status
message to stderr so that stdout carries only the downloaded data:
#!/bin/bash
echo "Downloading the data, please wait..." >&2
wget -O - https://example.com/data.csv
Take this as an illustration of the concept as wget can be silenced via command-line arguments (--quiet)
as well.
Sometimes, we want to redirect stdout and stderr to one single file. In these situations simple >output.txt
2>output.txt would not work and we have to use >output.txt 2>&1 or &>output.txt (to redirect both at
once). However, what about 2>&1 >output.txt, can we use it as well? Try it yourself! Hint.
We already mentioned that virtually everything in Linux is a file. Many special files representing devices
are in /dev/ subdirectory.
Especially /dev/null is a very useful file as it can be used in any situation when we are not interested in
the output of a program.
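For example, the following variant of our cat example throws away both the regular output and the error messages:
cat one.txt nonexistent.txt two.txt >/dev/null 2>&1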
For many programs you can specify the use of stdin explicitly by using - (dash) as the input filename.
Another option is to use /dev/stdin explicitly: with this name, we can make the example
with group_sum.py work (the last invocation above, where the file given on the command line would
otherwise cause the standard input to be ignored):
./group_sum.py one.txt /dev/stdin <two.txt
/dev/stdout can be used if we want to specify standard output explicitly (this is mostly useful for
programs coming from other environments where the emphasis is not on using stdout that much).
So far, the programs we have used announced errors as messages. That is quite useful for interactive
programs as the user wants to know what went wrong.
However, for non-interactive use, checking for error messages is actually very error-prone. Error
messages change, the users can have their system localized etc. etc. Therefore, Linux offers a different
way of checking whether a program terminated correctly or not.
Whether a program terminates successfully or with a failure, is signalled by its so-called return (or exit)
code. This code is an integer and unlike in other programming languages, zero denotes success and any
non-zero value denotes an error.
Why do you think that the authors decided that zero (that is traditionally reserved for false) means
success and nonzero (traditionally converted to true) means failure? Hint: in how many ways can a
program succeed?
Unless specified otherwise, when your program terminates normally (i.e., main reaches the end and no
exception is raised), the exit code is zero.
If you want to change this behavior, you need to specify this exit code as a parameter to
the exit function. In Python, it is sys.exit.
For C programs, the main function actually returns an int, whose value is the exit code. Use it properly.
The full signature is actually int main(int argc, char *argv[]) so that you can access command-line
options as function arguments (most environments will actually allow you to use plain void
main(void) but it is not recommended).
As an example, the following is a modification of the group_sum.py above, this time with proper exit code
handling.
def main():
    sums = {}
    exit_code = 0
    if len(sys.argv) == 1:
        sum_file(sys.stdin, sums)
    else:
        for filename in sys.argv[1:]:
            try:
                with open(filename, "r") as inp:
                    sum_file(inp, sums)
            except IOError as e:
                print(f"Error reading file {filename}: {e}", file=sys.stderr)
                exit_code = 1
    for key, sum in sums.items():
        print(f"{key} {sum}")
    sys.exit(exit_code)
We will later see that shell control flow (e.g., conditions and loops) is actually controlled by program
exit codes.
Failing fast
So far, we expected that our shell scripts will never fail. We have not prepared them for any kind of
failure.
We will eventually see how exit codes can be tested and used to control our shell scripts more, but for
now we want to stop whenever any failure occurs.
That is actually quite sane behavior: you typically want the whole program to terminate if there is an
unexpected failure (rather than continuing with inconsistent data). Like an uncaught exception in Python.
To enable terminate-on-failure, you need to call set -e. In case of failure, the shell will stop executing
the script and exit with the same exit code as the failed command.
Furthermore, you usually want to terminate the script when an uninitialized variable is used: that is
enabled by set -u. We will talk about variables later but -e and -u are usually set together.
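A minimal sketch of the behavior: when you run the following script, cat fails, set -e stops the script immediately, and the final message is never printed.
#!/bin/bash
set -ue
cat /nonexistent.txt
echo "This line is never reached."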
And there is also a caveat regarding pipes and success of commands: the success of a pipeline is
determined by its last command. Thus, sort /nonexistent | head is a successful command. To make a
failure of any command fail the (whole) pipeline, you need to run set -o pipefail in your script (or shell)
before the pipeline.
Therefore, typically, you want to start your script with the following trio:
set -o pipefail
set -e
set -u
Many commands allow short options (such as -l or -h you know from ls) to be merged like this (note
that -o pipefail has to be last):
set -ueo pipefail
Actually, from now on, the GitLab pipeline will check that this command is a part of your scripts.
set -ueo pipefail can sometimes cause unwanted and quite unexpected behavior.
The following script terminates with a hard-to-explain error, i.e., we never reach the final echo. Note that
the final hexdump is there only to ensure we do not print garbage from /dev/urandom directly on the
terminal.
#!/bin/bash
set -ueo pipefail
cat /dev/urandom | head -n 1 | hexdump
echo "We are done."
The reason comes from the head command. head has a very smart implementation that terminates after
first -n lines were printed. Reasonable right? But that means that the first cat is suddenly writing to a
pipe that no one reads. It is like writing to a file that was already closed. That generates an exception
(well, kind of) and cat terminates with an error. Because of set -o pipefail, the whole pipeline fails.
The truth is that distinguishing whether the closed pipe is a valid situation that shall be handled gracefully
or if it indicates an issue is impossible. Therefore cat terminates with an error (after all, someone just
closed its output without letting it know first) and thus the shell has to mark the whole pipeline as failed.
Solving this is not always easy and several options are available. Each has its pros and cons.
When you know why this can occur, adding || true marks the pipeline as fine (we will learn about || later
on, though).
Shell customization
We already mentioned that you should customize your terminal emulator to make it comfortable to use.
After all, you will spend at least this semester with it and it should be fun to use.
In this lab, we will show some other options how to make your shell more comfortable to use.
Command aliases
You probably noticed that you execute some commands with the same options a lot. One such example
could be ls -l -h that prints a detailed file listing, using human-readable sizes. Or perhaps ls -F to
append a slash to the directories. And probably ls --color, too.
The shell offers so-called aliases, with which you can easily define new commands without creating
full-fledged scripts somewhere.
Try executing the following commands to see how a new command l could be defined (and then used):
alias l='ls -l -h'
l
Some typical aliases that you will probably want to try are, for example, the following ones. Use a manual
page if you are unsure what the alias does. Note that curl is used to retrieve contents from a URL
and wttr.in is really a URL. By the way, try that command even if you do not plan to use this alias :-).
alias ll='ls -l -h --color -F'
alias weather='curl wttr.in'
~/.bashrc
Aliases above are nice, but you probably do not want to define them each time you launch the shell.
However, most shells in Linux have some kind of file that they execute before they enter interactive
mode. Typically, the file resides directly in your home directory and it is named after the shell, ending
with rc (you can remember it as runtime configuration).
For Bash which we are using now (if you are using a different shell, you probably already know where to
find its configuration files), that file is called ~/.bashrc.
You have already used it when setting EDITOR for Git, but you can also add aliases there. Depending on
your distribution, you may already see some aliases or some other commands there.
Add aliases you like there, save the file and launch a new terminal. Check that the aliases work.
The .bashrc file behaves as a shell script and you are not limited to have only aliases there. Virtually any
commands can be there that you want to execute in every terminal that you launch.
The prompt is modified through the PS1 variable. We will talk about variables in more detail later on, for
now we will learn the syntax only.
When setting the variable, we can directly modify it in the shell and immediately observe the result.
PS1=''
The prompt is gone. We have set it to an empty string.
PS1='\w '
Here we set it to print current directory and a space. The special sequence \w will be automatically
replaced by the name of the working directory.
Many users prefer to know as which user they are logged in.
PS1='\u: \w '
The usual tradition is to end the prompt with a dollar sign (the \$ sequence, which is displayed as # when
you are logged in as root):
PS1='\u: \w\$ '
It is also possible to add your own commands to be executed or even make the prompt multi-line.
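For example, the following (where \h stands for the hostname and \n inserts a line break) is a minimal two-line prompt:
PS1='\u@\h: \w\n\$ '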
More examples
The following examples can be solved either by executing multiple commands or by piping basic shell
commands together. To help you find the right program, you can use manual pages. You can also use
our manual as a starting point.
Note that none of the solutions requires anything else than a few pipelines. For advanced users: you
definitely do not need if or while or read or even off-loading to PERL or AWK.
Use the following CSV (the file disk-speeds-data.csv) with data on how long it took to copy the USB disk
image to the USB drives in the library. The first column represents the device, the second the duration
of the copying.
As a matter of fact, the first column also indirectly represents port of the USB hub (this is more by accident
but it stems from the way we organized the copying). As a sidenote: it is interesting to see that some
ports that are supposed to be the same are actually systematically slower.
We want to know what was the longest duration of the copying: in other words, the maximum of column
two.
Solution.
Create a directory a and inside it create a text file --help containing Lorem Ipsum. Print the content of this
file and then delete it. Solution.
Create a directory called b and inside it create files called alpha.txt and *. Then delete the file called * and
watch out what happened to the file alpha.txt. Solution.
Print the content of the file /etc/passwd sorted by the rows. Solution.
Print the first and third column of the file /etc/group. Solution.
Print the last two lines of the files /etc/passwd and /etc/group using a single command. Solution.
Recall the file disk-speeds-data.csv with the disk copying durations. Compute the sum of all
durations. Solution.
Consider a file with the following space-separated content:
Alpha 8 4 5 0
Bravo 12 5 3 2
Charlie 1 0 11 4
Append to each row the sum of its line. You do not need to keep the original alignment (i.e., feel free to
squeeze the spaces). Hint. Solution.
Print the contents of /etc/passwd and /etc/group separated by text Ha ha ha (i.e., contents
of /etc/passwd, line with Ha ha ha and contents of /etc/group). Solution.
Print vendors of your CPU. Use the file /proc/cpuinfo as the starting point.
Solution.
Before-class tasks (deadline: start of your lab, week March 6 - March 10)
The following tasks must be solved and submitted before attending your lab. If you have lab on
Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at
10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
Tests are available.
This lab is about pipes. The shell tasks here must be solved using pipes, not using shell loops (even if you know
them) or by off-loading to another programming language.
List of users is stored either in /etc/passwd or via getent passwd. Your script will assume that the list of
users will come on standard input.
name1,duration_in_seconds_1
name2,duration_in_seconds_2
Print the author of the fastest solution (you can safely assume that the durations are distinct).
We expect you will solve the following tasks after attending the labs and hearing feedback to your
before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most
of the tasks there are automated tests that can help you check completeness of your solution (see here
how to interpret their results).
Tests are available.
This lab is about pipes. The shell tasks here must be solved using pipes, not using shell loops (even if you know
them) or by off-loading to another programming language.
We expect that for the following matrix we would get this output.
| 106 179 |
| 188 50 |
| 5 125 |
285
238
130
The script will read input from stdin; there is no limit on the number of columns or rows, but you can
rely on the fixed format as explained above.
04/day_of_week.py (50 points, group devel)
Write a Python filter that converts date to day of week.
The program will convert dates in the first column only (using whitespace for splitting); invalid dates will
be ignored (and the line will be kept as-is). The rest of the columns will be copied to the output. The
program must support the following invocations:
04/day_of_week.py <input.txt
04/day_of_week.py input.txt
cat one.txt two.txt | 04/day_of_week.py
If the file cannot be opened, the program will print an error message to stderr (exact wording is defined
by the tests) and will terminate with exit code 1.
You can expect that the program will not be invoked as 04/day_of_week.py one.txt two.txt.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be
able to explain and/or use after each lesson. They also represent the bare minimum required for
understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting
them into context. Therefore, you should be able to …
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should
be able to …
The goal of this lab is to start using Linux in a network environment and learn basic concepts needed for
machines shared among multiple users. After the lab, you will be able to log in to a remote Linux machine,
use Git over SSH, and also have a look at what programs are currently running on a Linux machine.
We provide a brief overview of several concepts that we believe you should already be familiar with. Feel
free to skim these parts and focus on the new topics only.
Networking introduction
This text assumes that you have basic knowledge of networking. Terms such as IP address, interface or
port should not be new to you. If you need a refresher, we have set up a short page with a brief overview
of networking.
Asymmetric cryptography
Before diving into details about how to access remote machines (over SSH), we need to briefly
refresh some cryptography-related topics.
In this lab, we will be talking a lot about asymmetric cryptography. In general, it is a method of
encryption/decryption where the user needs a different key for decryption of the message than the one
that was used for message encryption.
This is different from symmetric ciphers. For example, the well-known Caesar cipher has a single key (the
alphabet shift step) which is used for both encryption and decryption.
Asymmetric cryptography usually creates a pair of keys: a public key, which is usually used for encryption,
and a private one. For example, if you make your encryption key public and your decryption key private,
everybody can encrypt a message for you, but only you can decrypt it. This is secure if it is impossible (or
hard enough) to derive the private key from the public one, which is usually the case.
This has an obvious advantage: you do not need to create a secret symmetric key for every pair of users
who would want to communicate. Instead, everybody just distributes their public key and guards the
single private key. (This is not as trivial as it looks: When Alice wants to send an encrypted message to
Bob, she has to make sure that the public key does really belong to Bob. Otherwise, you can easily establish
a secure connection, but to an attacker.)
Unfortunately, there is no good example of an asymmetric cipher as simple as the Caesar’s cipher. For an
example, which is more complex, but still approachable, have a look at RSA.
Please note that selecting a good cipher is only a small step in communicating securely. If you want to
learn more, please consult some real textbook on cryptography or attend one of our cryptographic
courses. The above serves as a refresher to ensure we are on the same page for the rest of this lab.
Asymmetric cryptography has two main uses. The first one is obvious: if we know the public key of the
receiver of the message, we can use it to encrypt the message and send it over unprotected medium (and
without fear that anyone else would be able to read it).
But it can also be used in reverse, to authenticate the owner of the private key. (Here we assume we are
able to distribute the public keys safely.)
The mini-protocol is then able to authenticate (i.e., verify) that the other party is who they claim to be by
proving ownership of the private key (i.e., we assume that private keys were not stolen).
The method is very simple – the sender generates a random text and encrypts it with the public key of the
receiver (the one we wish to verify). If the receiver is the real owner, they would be able to decrypt the
random text and send it back to us. Inability to decrypt the text means that the receiver is not the owner
of the private key.
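The following sketch illustrates the idea using the openssl(1) tool (the file names are ours and the
handling of the decrypted output is simplified):

# Receiver: generate a key pair and publish the public part
openssl genpkey -algorithm RSA -out private.pem
openssl pkey -in private.pem -pubout -out public.pem

# Sender: create a random challenge and encrypt it with the receiver's public key
openssl rand -hex 16 >challenge.txt
openssl pkeyutl -encrypt -pubin -inkey public.pem -in challenge.txt -out challenge.bin

# Receiver: prove ownership of the private key by decrypting the challenge
openssl pkeyutl -decrypt -inkey private.pem -in challenge.bin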
Typically a user authenticates to a service with a login and a password. While this method is quite natural
and common for human users, it has several drawbacks.
The most obvious problem is password strength: people rarely have long passwords, and not many people
use any form of password manager. Using the same password at multiple places also allows the administrator
(or a hacker) of one service to impersonate you in other services.
If you do not use a password manager, consider starting. The general idea is that you remember one
(but strong!) password, and this password encrypts the rest of your passwords.
Therefore, all passwords can be generated to be long enough and unique for each service.
There are plenty of managers available; a simple one is pass, which can use a Git backend and has plenty of
GUI clients available, including ones for Android and iOS.
Back to private/public key authentication. Some services allow the user to authenticate with their public
key instead of using username/password.
The user uploads their public key to the server (using login and password for authenticating that
operation) and when they want to log in, the server challenges them with a random secret encrypted with
their public key. As the sole owner of the private key (and hence the only one able to decrypt), the user
can decrypt the secret and confirm their identity. The operation then continues as with any other
authenticated user.
Useful rules
For the public key authentication to work securely, the following is highly recommended (note that most
of these rules apply to any other type of authentication, too).
The private key is like the password – it must not leak. Because the private key is usually a file, you must protect
this file. Having an encrypted hard drive is a typical option for portable machines.
It is possible to protect the key itself with a passphrase (basically, another password). Then even a leaked
private key file is not an immediate threat of identity theft. Note that there are helpers, such as ssh-
agent(1), that can store the passphrase for some time so you do not have to enter it every time you use
the key.
If you have multiple computers, it is preferable to use a different public/private key pair on each machine.
If one machine is compromised, it is sufficient to remove its public key from all applications, while you can
still use the other key pairs.
Using SSH
Enough of theory: let us connect to some remote machine. Let us explore SSH.
What is SSH?
SSH – that stands for Secure Shell – is a protocol for connecting to a different machine and running a shell
there.
From a user perspective, after you SSH from a Linux machine into a different Linux machine, the shell may
look the same and some commands would behave completely the same. Except they might be running
on a different computer.
Note that this is intentional: remote shell is a natural way to control a Linux machine. No need to make it
different from controlling it through a local shell.
SSH practically
Using SSH is very simple (unless you make it complex). To SSH to a remote machine, you need to know
your credentials (i.e., login name and a password) and, of course, the name of the remote machine.
ssh YOUR_LOGIN@REMOTE_MACHINE_NAME
Note that the command ssh is often called an SSH client, as it connects to an SSH server (similar
to curl or wget being web clients connecting to a web server).
We have set up a remote machine for you on linux.ms.mff.cuni.cz. You will be using your GitLab (SIS/CAS)
login and also the same password.
ssh YOUR_LOGIN@linux.ms.mff.cuni.cz
If your GitLab account was created manually, chances are your SIS password will not work on this machine.
Please, contact us via this link and we will create an account for you manually.
The first login to the machine is a bit more complicated. The SSH client wants you to verify that you
trust the remote server. It shows you a so-called server fingerprint:
RSA: SHA256:Z11Qbd6nN6mVmCSY57Y6WmxIJzqEFHFm47ZGiH4QQ9Y
ED25519: SHA256:/CVwDW388z6Z5VlhLJT0JX+o1UzakyQ+S01+34BI0BA
You should have received this fingerprint in a secure way before connecting to the server (for example,
printed out by your employer etc.). In this case, we hope that a potential attacker would not be able to
break both into this web server and the SSH server at once. So we use the HTTPS protocol on the web as
a secure way of verifying the SSH server.
The program then continues to ask you for a password and also informs you that the fingerprint was
stored.
On following logins, the SSH client checks that the fingerprint has not changed. A changed fingerprint
belonging to the same machine (i.e., with the same DNS name) could indicate a man-in-the-middle attack.
Do not connect when the fingerprint has changed. Always contact the administrator (e.g., the teachers
in this particular case) and check with them what is happening.
If you were able to log in, you will see an (almost) empty home directory and an otherwise normal Linux
machine, this time without any graphical applications installed.
Try to run lscpu to see what machine you have logged in to.
Note that this machine is shared by all students of this course. Use it to solve graded tasks or to
experiment with commands we show in the labs. You will be using it for several tasks for the rest of
the course, so please keep it usable.
Do not use it for computationally intensive tasks or other tasks that are not related to this course.
We also strictly prohibit using it for any kind of remote development with tools such as Visual Studio,
IntelliJ IDEA, or similar IDEs. These tools install huge blobs of code/data on the remote machine, and
because they rarely remove old versions, they are very quick in taking all the free space available for
themselves.
If we encounter any form of abusive use, we will block the offending account.
Configuration of $PS1
We have already touched this a little bit in the previous lab as an extra for further reading. The $PS1
variable specifies how your prompt looks. From now on, you should ensure that your $PS1 also shows the
machine name so that you always know where you are (i.e., which machine you are logged into).
Similarly to setting EDITOR in your ~/.bashrc, you should specify (at least) the following if the
default does not display the machine name.
PS1='\u@\h \w\$'
This ensures that you can see your username (\u), the hostname (\h) and also the working directory (\w).
Return to previous lab if you want to see more options about setting your prompt.
As a personal tip: use a colorful prompt without user/machine on your workstation, but keep the non-colored
version with \u@\h on all remote machines. This keeps the prompt short on your personal machine while
providing a visual distinction when working on a remote one.
The command ssh is actually quite powerful and configurable. One important feature is that you can
specify a command after the hostname and run this command directly. In this case, SSH never starts
an interactive shell on the remote machine, but executes only the given command and returns.
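For example, the following runs uname -a on the remote machine and prints its output on your local
terminal (the remote host is our shared machine; any host you can log in to works the same way):

ssh YOUR_LOGIN@linux.ms.mff.cuni.cz uname -a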
To enable authentication with public key over SSH, we need to perform two steps. Generate the public-
private key pair and copy the public key to the remote machine.
To generate the key pair, we need to run the following command (the -C option attaches a comment to
the key; providing your e-mail or login is the usual approach, but keeping it empty is possible, too).
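ssh-keygen -t ed25519 -C "YOUR_LOGIN"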
Once we have the public key ready, we need to upload it to the remote machine. If you have multiple key
pairs, read about the -i switch.
ssh-copy-id LOGIN@REMOTE_MACHINE
If you log in to the remote machine again, the SSH client should use the key pair and log you in without
asking for a password. If not, run SSH with -vvv to debug the issue.
Note that the public key was stored into ~/.ssh/authorized_keys file on the remote machine. You can copy
it there manually but using ssh-copy-id is easier.
If the copying fails with a cryptic message about warning: here-document at line 251 delimited by end-of-
file (wanted EOF), try upgrading the SSH client first.
If you use the image from us, simple sudo dnf upgrade openssh-clients should work.
Using keys only
Note that some services actually require that you authenticate using a key pair instead of a password as
it is considered more secure.
The advantage is that any random attackers could keep guessing your password and still never get access
to your machine.
Automated client ban
Typically, an SSH server is also configured to ban any client that repeatedly tries to log in without
success. Our server does that, too.
Copying files
To copy a file, we can leverage cat, which simply dumps the file as-is to stdout: we pipe its output to
SSH and, on the other machine, run a second cat to store the file. If you can decipher the following
command, you should by now have an idea of how to copy files to and from a remote machine (we are not
saying it is the most effective way; the file names below are illustrative):
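cat local-file.txt | ssh YOUR_LOGIN@linux.ms.mff.cuni.cz 'cat >remote-copy.txt'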
There are also scp and rsync that can be used for copying multiple files over SSH easily but we will talk
about these in later labs.
File managers
Many file managers allow you to use SSH transparently and copy between machines with the same ease
as when working with local files.
For example, in mc, select Shell connection either in left or right panel and specify SSH connection. One of
the panels will show the files on the remote machine. Try using F5 to copy files interactively.
Apart from the machine linux.ms.mff.cuni.cz, there is also a full lab of machines available in the Rotunda
computer lab on Malostranské náměstí.
All the Linux machines in the lab are also reachable via SSH. Again, use your SIS credentials to log in. Note
that all machines in Rotunda (but not linux.ms.mff.cuni.cz!) share the same home directory, i.e., it does
not matter which one you physically connect to. Your files will be available on all machines.
Unfortunately, the file system and authentication mechanism used there do not allow public key
authentication for the Rotunda machines. You always need to type your password.
(linux.ms.mff.cuni.cz does not have this limitation and we expect you will use public key authentication
there.)
• Lab SU1
o u1-1.ms.mff.cuni.cz
o u1-2.ms.mff.cuni.cz
o …
o u1-14.ms.mff.cuni.cz
• Lab SU2
o u2-1.ms.mff.cuni.cz
o u2-2.ms.mff.cuni.cz
o …
o u2-25.ms.mff.cuni.cz
• Rotunda
o u-pl1.ms.mff.cuni.cz
o u-pl2.ms.mff.cuni.cz
o …
o u-pl23.ms.mff.cuni.cz
So far we have silently ignored the fact that there are different user accounts on any Linux machine. And
that users cannot access all files on the machine. In this section we will explain the basics of Unix-style
access rights and how to interpret them.
After all, now you can log in to a shared machine and you should be able to understand what you can
access and what you cannot.
Recall what we said about /etc/passwd earlier – it contains the list of user accounts on that particular
machine (technically, it is not the only source of user records, but it is a good enough approximation for
now).
Every running application, i.e., a process, is owned by one of the users from /etc/passwd (again, we simplify
things a little bit). We also say that the process is running under a specific user.
And every file in the filesystem (including both real files such as ~/.bashrc and virtual ones such
as /dev/sda or /proc/uptime) has some owner.
When a process tries to read or modify a file, the operating system decides whether the operation is
permitted. This decision is based on the owner of the file, the owner of the process, and permissions
defined for the file. If the operation is forbidden, the input/output function in your program raises an
exception (e.g., in Python), or returns an error code (in C).
Since a model based solely on owners would be too inflexible, there are also groups of users (defined
in /etc/group). Every user is a member of one or more groups, one of them is called the primary group.
These are associated with every process of the user. Files have both an owning user and an owning group.
Files are assigned three sets of permissions: one for the owner of the file, one for users in the owning
group, and one for all other users. The exact algorithm for deciding which set will be used is this:
1. If the user running the process is the same as the owner of the file, owner access rights are used
(sometimes also referred to as user access rights).
2. If the user running the process belongs to the group that is set on the file, group access rights are used.
3. Otherwise, the system checks against other access rights.
Every set of permissions contains three rights: read (r), write (w), and execute (x):
• Read right allows reading the contents of the file.
• Write right allows modifying the contents of the file.
• Execute right allows executing the file as a program.
The same permissions also apply to directories. Their meaning is a bit different, though:
• Read right allows the user to list directory entries (files, symlinks, sub-directories, etc.).
• Write right allows the user to create, remove, and rename entries inside that directory. Note that
removing write permission from a file inside a writable directory is pointless as it does not prevent the user
from overwriting the file completely with a new one.
• Execute right on a directory allows the user to open the entries. (If a directory has x, but not r, you can
use the files inside it if you know their names; however, you cannot list them. On the contrary, if a directory
has r, but not x, you can only view the entries, but not use them.)
Permissions of a file or directory can be changed only by its owner, regardless of the current permissions.
That is, the owner can deny access to themselves by removing all access rights, but can always restore
them later.
root account
Apart from accounts for normal users, there is always an account for a so-called superuser – more often
called simply just root – that has administrator privileges and is permitted to do anything with any file in
the system. The permissions checks described above simply do not apply to root-owned processes.
Unlike on other systems, Linux is designed in such a way that end-user programs are always executed under
normal users and never require root privileges. As a matter of fact, some programs (historically, this was
very common behaviour for IRC chat programs) would not even start under root.
Looking at the shortcuts of rwx for individual permissions, you may find them familiar:
Typically, your personal files in your home directory will have you as the owner together with a group with
the same name. That is a default configuration that prevents other users from seeing your files.
Do check that it is true for all directories under /home on the shared machine.
But also note that most of the files in your home directory are actually world-readable (i.e., anyone can
read them).
That is actually quite fine, because if you check the permissions of your ~, you will see that it is typically
drwx------. Only the owner can modify it and cd into it. Since no one else can actually enter your directory,
no one else will be able to read your files (technically, reading a file involves traversing the whole directory
path and checking access rights along the way).
To change the permissions, you can use the chmod program. It has the general format of:
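chmod MODE FILE...

For example, the following removes all rights from the group and other users (the file name is illustrative):

chmod go-rwx secret-notes.txt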
If you execute the following command, you will see a slightly different output than you would probably expect.
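ls -l /usr/bin/passwd
-rwsr-xr-x. 1 root root ... /usr/bin/passwd

(The exact path and the elided fields may differ on your system; note the s where the owner's x would
normally be.)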
The s bit (set-uid) is a bit more tricky. It specifies that no matter who executes the file, passwd will be running
under the user owning the file (i.e., root for this file).
While it may look useless, it is a simple way to allow running certain programs with elevated (higher)
permissions. passwd is a typical example. It allows the user to change their password. However, this
password is stored in a file that is not readable by any user on the system except root (for obvious reasons).
Giving the s bit to the executable means that the process would be running under root and would be able
to modify the user database (i.e., /etc/passwd and /etc/shadow that contains the actual passwords).
Since changing the permissions can be done only by the owner of the file, there is no danger that a
malicious user would add the s bit to other executables.
There are other nuances regarding Unix permissions and their setting, refer to chmod(1) for details.
The permission model described above is a rare example of a concept coming from Unix that is considered
too inflexible for use today. However, it is also considered a typical example of a simple but highly
usable security model.
Many programs copied this model and you can encounter it in other places too. It is definitely something
worth remembering and understanding.
The inflexibility of the system comes from the fact that allowing a set of users to access a particular file
means creating a special group for these users. These groups are defined in /etc/group and changing them
requires administrator privileges.
With an increasing number of users, the number of possibly needed groups grows exponentially. On the
other hand, for most situations, the basic Unix permissions are sufficient.
To tackle this problem, Linux offers also so-called POSIX access control lists where it is possible to assign
an arbitrary list of users to any file to specify the permissions.
getfacl and setfacl are the utilities that control these rights, but since they are rarely needed in practice,
we will leave their knowledge at the level of reading the corresponding manpages and acl(5).
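Just to give a flavor, a minimal sketch (the user name and file are illustrative):

# Allow user lewis to read the file, on top of the normal Unix rights
setfacl -m u:lewis:r test.txt
# Show the access control list of the file
getfacl test.txt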
Access rights checks
Change permission of some of your scripts to be --x. Try to execute them. What happens? Answer.
Remove writable bit for a file and write to it using stdout redirection. What happens?
Assuming the following output of ls -l (script.sh is really a shell script) and assuming user bob is in
group wonderland while user lewis is not.
So far, we have used Git over HTTPS. Git can be used over SSH too. Then, the traffic is basically tunneled
through an SSH connection and Git relies on the SSH wrapper for security as well as for (partial)
authentication.
Virtually all Git servers (GitLab, GitHub, Bitbucket…) will require you to upload your public key if you want
to use Git over SSH.
To actually use Git over SSH, we first need to tell GitLab about our SSH keys (recall the protocol that is
used to authenticate the user).
Copy your public key to GitLab. Navigate to the top-right menu with your avatar, select Preferences and
then SSH keys, or visit this link.
Copy your public key there and name it. Typically, the name should mention your username and your
machine. Note that GitLab will send you an e-mail informing you about a new key. Why? Hint. Answer.
Go to your project and clone it again. This time, use the Clone with SSH URL.
This way, you can clone a Git repository from any SSH server by specifying its remote path there (here,
GitLab does some mangling but the principle holds).
Note that the user we clone with is git – not you. This way, GitLab needs only one physical user account
for handling Git requests and distinguishes the users via their public keys. How? Answer.
ssh git@gitlab.mff.cuni.cz
Note that you should prefer the SSH protocol for working with Git, as it is much more comfortable to use.
Git on other platforms also offers generation of an SSH key but often the key is usable only by one application
(different applications have incompatible key formats), while on Linux a single key is generally usable for Git, other
machines, and other services.
The rest of the work with Git remains the same: git add, git commit, git push, etc. will work as before; only
the communication with GitLab goes through the SSH tunnel. Note that you do not have to re-enter your
credentials when doing git push. This is because Git remembers how you cloned the repository and
will use the same URL (either HTTPS or SSH) for git push as well (unless you configure it otherwise). And
since you used SSH for pulling, it will use SSH for pushing as well, which uses the public/private key
authentication you already set up.
Processes
Files in the system are its passive elements. The active parts are running programs that actually
modify data. Let us have a look at what is actually running on our machine.
When you start a program (i.e., an executable file), it becomes a process. The executable file and a running
process share the code – it is the same in both. However, the process also contains the stack (e.g., for local
variables), heap, current directory, list of opened files, etc. – all this is usually considered the context of
the process. Often, the phrases running program and process are used interchangeably.
To view the list of running processes on our machine, we can use htop to view basic properties of
processes. Similar to Midnight Commander, function keys perform the most important actions, and the
help is visible in the bottom bar. You can also configure htop to display information about your system
such as the amount of free memory or CPU usage.
For non-interactive use we can execute ps -e (or ps -axufw for a more detailed list).
For illustration, this is an example of ps output (with the --forest option used to also depict the
parent/child relations).
However, run ps -ef --forest on the shared machine to also view running processes of your colleagues.
Listing of processes is not protected in any way from other users. Every user on a particular machine can
see what other users are running (including command-line arguments).
Keep in mind to never pass passwords as command-line arguments; always pass them through files
(with proper permissions) or interactively on stdin.
UID PID PPID C STIME TTY TIME CMD
root 2 0 0 Feb22 ? 00:00:00 [kthreadd]
root 3 2 0 Feb22 ? 00:00:00 \_ [rcu_gp]
root 4 2 0 Feb22 ? 00:00:00 \_ [rcu_par_gp]
root 6 2 0 Feb22 ? 00:00:00 \_ [kworker/0:0H-events_highpri]
root 8 2 0 Feb22 ? 00:00:00 \_ [mm_percpu_wq]
root 10 2 0 Feb22 ? 00:00:00 \_ [rcu_tasks_kthre]
root 11 2 0 Feb22 ? 00:00:00 \_ [rcu_tasks_rude_]
root 1 0 0 Feb22 ? 00:00:09 /sbin/init
root 275 1 0 Feb22 ? 00:00:16 /usr/lib/systemd/systemd-journald
root 289 1 0 Feb22 ? 00:00:02 /usr/lib/systemd/systemd-udevd
root 558 1 0 Feb22 ? 00:00:00 /usr/bin/xdm -nodaemon -config /etc/X11/...
root 561 558 10 Feb22 tty2 22:42:35 \_ /usr/lib/Xorg :0 -nolisten tcp -auth /var/lib/xdm/...
root 597 558 0 Feb22 ? 00:00:00 \_ -:0
intro 621 597 0 Feb22 ? 00:00:40 \_ xfce4-session
intro 830 621 0 Feb22 ? 00:05:54 \_ xfce4-panel --display :0.0 --sm-client-id ...
intro 1870 830 4 Feb22 ? 09:32:37 \_ /usr/lib/firefox/firefox
intro 1966 1870 0 Feb22 ? 00:00:01 | \_ /usr/lib/firefox/firefox -contentproc ...
intro 4432 830 0 Feb22 ? 01:14:50 \_ xfce4-terminal
intro 4458 4432 0 Feb22 pts/0 00:00:11 \_ bash
intro 648552 4458 0 09:54 pts/0 00:00:00 | \_ ps -ef --forest
intro 15655 4432 0 Feb22 pts/4 00:00:00 \_ bash
intro 639421 549293 0 Mar02 pts/8 00:02:00 \_ man ps
...
First of all, each process has a process ID, often just called PID. The PID is a number assigned
by the kernel and used by many utilities for process management. PID 1 is used by the first process in the
system, which is always running. (PID 0 is reserved as a special value – see fork(2) if you are interested in
details.) Other processes are assigned their PIDs incrementally (more or less) and PIDs are eventually
reused.
Note that all this information is actually available in /proc/PID/ and that is where ps reads its information
from.
Execute ps -ef --forest again to view all processes on your machine. Because of your graphical interface,
the list will probably be quite long.
Practically, a small server offering web pages, calendar, and SSH access can have about 80 processes; for
a desktop running Xfce with a browser and a few other applications, the number will rise to almost 300 (this
really depends a lot on the configuration, but it is a ballpark estimate). About 50–60 of these are actually
internal kernel threads. In other words, a web/calendar server needs about 20 “real” processes, a desktop
about 200 of them :-).
At this moment we will show probably the most important thing that you can do with processes: their
forceful termination.
Eventually we will learn about the concept of signals; for the moment, we will restrict ourselves to two
basic commands.
The pgrep command can be used to find processes matching a given name.
Open two extra terminals and run sleep 600 in one and sleep 800 in the second one. The sleep program
simply waits the given number of seconds before terminating.
In a third terminal, run the following commands to understand how the searching for the processes is
done.
pgrep sleep
pgrep 600
pgrep -f 600
What have you learnt? Answer.
When we know the PID, we can use the kill utility to actually terminate the program. Try running kill
PID with the PID of one of the sleeps and watch what happens in the terminal with sleep.
Terminated (SIGTERM).
This message informs us that the command was forcefully terminated.
Some programs ignore the kill command and do not terminate. We will explain why that is possible in
one of the next labs, but for now we want to mention that it is possible to add -9 to the kill command,
which instructs the operating system to be a bit more forceful and terminate the program without giving
it any option to disagree ;-).
You can always kill your own processes, but killing processes of other users is not possible (unless you are root).
Going further
A few extra bits that will improve your user experience with SSH a lot, but which you can return to any time later.
tmux
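tmux is a terminal multiplexer: it allows you to run several terminal sessions inside a single one, detach
from them (e.g., before logging out of SSH), and attach to them again later while the programs inside keep
running. To start a new session, simply run:

tmux

All tmux shortcuts described below are typed after a prefix, which is Ctrl-b by default.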
Alternatively, we can start a session with some meaningful name:
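tmux new -s SESSION_NAME

To list the running sessions: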
tmux ls
To connect/attach to the running session run:
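tmux attach -t SESSION_NAME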
In order to detach from the session, we can simply press d (do not forget to type the prefix first!).
c create window
w list windows
n next window
p previous window
f find window
, name window
& kill window
Sometimes it is useful to split the screen into several terminals. These splits are called panes.
% vertical split
" horizontal split
o swap panes
q show pane numbers
x kill pane
← switch to left pane
→ switch to right pane
↑ switch to upward pane
↓ switch to downward pane
Another feature is that you can toggle writing simultaneously to all panes. Performing the same operation
multiple times may not seem very useful, but you can, for example, open several different SSH connections
in advance and then interactively control all those computers at the same time.
To toggle it, type the prefix and then write :set synchronize-panes. If you want to try this in Rotunda,
please do not run computationally intensive tasks…
As usual with Linux tools, you can modify its behavior widely via an rc configuration file. For instance, in
order to navigate among the panes with vim shortcuts, modify your ~/.tmux.conf so it contains, for example:
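bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R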
Personal tip № 1: when you give a lecture, you can attach to the same tmux session from two terminals. Later
on, you push the first one to the projector, while the second one stays on your laptop screen. This eliminates
the necessity of mirroring your screen. Together with pdfpc and a tiling window manager we get a Swiss-
army knife for presentations.
There is much more to say. For more details see this tmux cheatsheet or manual pages.
SSH configuration
The SSH client is configured via the ~/.ssh/config file. Review the syntax of this file via man 5 ssh_config.
The file is divided into sections. Each section is related to one or more remote hosts. The section header
is in the format Host pattern, where pattern might use wildcards.
Host *
IdentityFile ~/.ssh/id_ed25519
Host intro
Hostname linux.ms.mff.cuni.cz
User YOUR_SIS_LOGIN
Host mff1
Hostname u-pl6.ms.mff.cuni.cz
User YOUR_SIS_LOGIN
Host mff2
Hostname u-pl17.ms.mff.cuni.cz
User YOUR_SIS_LOGIN
With this ~/.ssh/config, we can type ssh intro and ssh will start a connection equivalent to
ssh YOUR_SIS_LOGIN@linux.ms.mff.cuni.cz
We recommend using different u-pl* hostnames in your config to distribute the load across multiple
machines. Note that the Rotunda machines may be unevenly loaded, so it is a good idea to bookmark
several of them and re-login if the first one is too slow.
Before-class tasks (deadline: start of your lab, week March 13 - March 17)
The following tasks must be solved and submitted before attending your lab. If you have lab on
Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at
10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of
the tasks there are automated tests that can help you check completeness of your solution (see here how
to interpret their results).
There are multiple options available; separate your answers with spaces or commas, e.g. **[A1]** 1,2
**[/A1]**.
Assume that we have a file `test.txt` for which `ls -l` prints the following:
Which of the following users will be able to read the contents of the file?
Consider that the file from the previous example is stored within
the directory `/data` with the following permissions as printed by `ls -l`:
You can assume that the root directory `/` is readable and executable
by everybody.
Continuing with the previous questions, which commands can be used to make
the file `test.txt` readable and writable only by the owner and nobody else?
Do not create this file in GitLab, we will check it on the remote machine only.
The presence of this file is semi-automatically checked by GitLab pipeline. However, there is a certain delay
before the tests are able to tell you that the file was really found on linux.ms.mff.cuni.cz.
Do not lose the private part of it – we will use it for some other tasks later on.
We expect you will solve the following tasks after attending the labs and hearing feedback on your before-
class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of
the tasks there are automated tests that can help you check completeness of your solution (see here how
to interpret their results).
Then clone the following repository and copy the output from uname -a (from your machine) to a file
called uname.txt in this repository and push it back.
The repositories will be created during week 04 or at the beginning of week 05.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be able
to explain and/or use after each lesson. They also represent the bare minimum required for understanding
subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning of given terms and putting them into context.
Therefore, you should be able to …
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be
able to …
The goal of this lab is to expand our knowledge of shell scripting. We will introduce
variables, command substitution, and also see how to perform basic arithmetic in shell.
We will build this lab around a single example that we will incrementally develop, so
that you learn the basic concepts on a practical example (obviously, there are specific
tools that could be used instead, but we hope that this is better than a completely
artificial example).
Our example will be built around building a small website from Markdown sources
using Pandoc. We will describe Pandoc first and then describe our running example.
Pandoc
Pandoc is a universal document converter that can convert between various formats,
including HTML, Markdown, Docbook, LaTeX, Word, LibreOffice, or PDF.
Ensure that your installation of Pandoc is reasonably up-to-date (i.e., at least version 2.19
that was released about a year ago).
Basic usage
Please, clone our example repository (or git pull it if you still have the clone around).
cat example.md
pandoc example.md
As you can see, the output is a conversion of the Markdown file into HTML, though
without an HTML header.
Markdown can be combined with HTML directly (useful if you want a more
complicated HTML code: Pandoc will copy it as-is).
As mentioned, Pandoc can create OpenDocument, too (the format used mostly in the
OpenOffice/LibreOffice suite).
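pandoc -o example.odt example.md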
You should not commit example.odt into your repository as it can be generated. That
is a general rule for any file that can be created automatically.
Did you know that LibreOffice can be used from the command line, too? For example,
we can ask LibreOffice to convert a document to PDF via the following command:
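libreoffice --headless --convert-to pdf example.odt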
Combined with Pandoc, three commands are enough to create an HTML page and
PDF output from a single source.
Pandoc templates
By default, Pandoc uses its own default template for the final HTML. But we can change
this template, too.
Look inside template.html. When the template is expanded (or rendered), the parts
between dollar signs will be replaced with the actual content.
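To apply the custom template, pass it to Pandoc (note that --template implies standalone output):

pandoc --template template.html example.md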
Pandoc can be used even in more sophisticated ways, but the basic usage (including
templates) is enough for our running example.
Pandoc supports conversion to and from LaTeX and plenty of other formats (try with
--list-output-formats and --list-input-formats).
It can be also used as a universal Markdown parser with -t json (the Python call is not
needed as it only reformats the output).
Running example
Our example is a trivial website where the user edits Markdown files and we use
Pandoc and a custom template to produce the final HTML. At this moment the final
stage of the example is to produce HTML files that would be later copied to a web
server.
If you look at the files, there are some Markdown sources and a build.sh script that builds
the website.
We will now talk more about shell scripting and use our build.sh script to demonstrate
how we can improve it.
Understanding the following is essential because, together with pipes and standard
I/O redirection, it forms the basic building blocks of shell scripts.
First of all, we will introduce a syntax for conditional chaining of program calls.
If we want to execute one command only if the previous one succeeded, we separate
them with && (i.e., a logical and). On the other hand, if we want to execute the
second command only if the first one fails (in other words, execute the first or the
second), we separate them with ||. For example:
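ls example.md && echo "example.md exists"
ls missing-file.md || echo "missing-file.md is missing"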
The example with ls is quite artificial, as ls is quite noisy when an error occurs.
However, there is also a program called test that is silent and can be used to compare
numbers or check file properties. For example, test -d ~/Desktop checks
that ~/Desktop is a directory. If you run it, nothing will be printed. However, in company
with && or ||, we can check its result.
Despite its silence, test is actually a very powerful command – it does not print
anything, but it can be used to control other programs.
It is possible to chain commands, && and || are left-associative and they have the same
priority.
Compare the following commands and how they behave when in a directory where
the file README.md is or is not present:
test -f README.md || echo "README.md missing" && echo "We have README.md"
test -f README.md && echo "We have README.md" || echo "README.md missing"
Extending the running example
You probably noticed that we get the last commit id (that is what git rev-parse --
short HEAD does) and use to create a footer for the web page (using the -A switch of
Pandoc).
That works as long as we are part of a Git repository. Copy the whole web directory
outside a Git repository and run build.sh again.
If we change the line to the following, we ensure that the script can be executed
outside of a Git project, too.
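git rev-parse --short HEAD 2>/dev/null || echo unknown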
Shell variables
Variables in the shell are often called environment variables, as they are (unlike
variables in most other languages) visible in other programs, too.
In this sense shell variables play two important roles. There are normal variables for
shell scripts (i.e., variables with the same meaning as in other programming languages),
but they can also be used to configure other programs.
We have already set the variable EDITOR that is used by Git (and other programs) to
determine which editor to launch. That is, the variable controls behaviour of non-script
programs.
MY_VARIABLE="value"
Note that there can be no spaces around =, as otherwise the shell would consider it a
call of the program MY_VARIABLE with arguments = and value.
The value is usually enclosed in quotes, but you can omit them if the value contains
no spaces or other special characters. Generally, it is safer to always quote the value
unless it looks like a C-style identifier.
To retrieve the value of the variable, prefix its name with the dollar sign $. Occurrences
of $VARIABLE are expanded to the value of the variable. This is similar to how ~ is
expanded to your home directory or wildcards are expanded to the actual file names.
We will discuss expansion in more detail later.
Unlike in other languages, shell variables are always strings. The shell has rudimentary
support for arithmetic with integers encoded as strings of digits.
Bash also supports dictionaries and arrays. While they can be extremely useful, their
usage often marks the boundary where using higher-level language might make more
sense with respect to maintainability of the code. We will not cover them in this course
at all.
Extending the running example
Currently, our files are generated into the same directory as our source files. That makes
copying the HTML files to a web server error-prone, as we might forget some file or
copy a source file that is not really needed.
Let us change the code to copy the files to a separate directory. We will
create public/ directory for that and modify the main part of our script to the
following:
cp main.css public/
All is good. Except the path is hard-coded in several places in the script. That might
complicate maintenance later on.
But we can easily use a variable here to store the path, allowing the user to change the
target directory by modifying the path in one place.
html_dir="public"
...
By default, the shell does not make all variables available to Python (or any other
application, for that matter). Only so-called exported variables are visible outside the
shell. To make your variable visible, simply use one of the following (the first call
assumes VAR was already set):
export VAR
export VAR="value"
It is also possible to export a variable only for a specific command using this shortcut:
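VAR="value" ./your_script.sh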
For the list of all variables, you can execute set (again, as with cd, it is a shell built-in).
Note that some built-ins do not have their own man page but are instead described
in man bash – in the manual page of the shell we are using.
There are several variables worth knowing that are usually present in any shell on any
Linux installation:
• $HOME refers to your home directory. This is what ~ (the tilde) expands to.
• $PWD contains your current working directory.
• $USER contains the name of the current user (e.g., intro).
• $RANDOM contains a random number, different in each expansion (try echo $RANDOM $RANDOM $RANDOM).
$PATH
We already mentioned the $PATH variable. Now, it is the right time to explain it in detail.
There are two basic ways how to specify a command to the shell. It can be given as a
(relative or absolute) path (e.g., ./script.sh or 01/project_name.py or /bin/bash), or as
a bare name without slashes (e.g., ls).
In the first case, the shell just takes the path (relative to the working directory if needed)
and executes the particular file. Of course, the file has to have its executable bit set.
In the second case, the shell looks for the program in all directories specified in the
environment variable $PATH. If there are multiple matches, the first one is used. If there
is none, the shell announces a failure.
The directories in $PATH are separated by a colon (:); typically, $PATH contains at
least /usr/local/bin, /usr/bin, and /bin. Find out what your $PATH looks like
(simply echo it in your terminal).
The concept of a search path exists in other operating systems, too. Unfortunately,
they often use different separators (such as ;) because using colon may not be easily
possible.
However, installed programs are not always installed to the directories listed in it and
thus you typically cannot run them from the command line easily.
Extra pro hint for Windows users: if you use Chocolatey, the programs will be in
the $PATH and installing new software via choco will make the experience at least a bit
less painful :-).
It is possible to add . (the current directory) to the $PATH. This would enable executing
your script as just script.sh instead of ./script.sh. However, do not do that (even if
it is a modus operandi on other systems). This thread explains several reasons why it
is a bad idea.
In short: if you put it at the beginning of $PATH, you will likely execute random files in
the current directory which just happen to be named like a standard command (this is
a security problem!). If you put it at the end, you will likely execute standard commands
you did not even know existed (e.g., test is a shell builtin).
Note that the script filename is appended as another argument, so everything works
as one could expect.
This is something we have not mentioned earlier – the shebang can have one optional
argument (but only one). It is added between the name of the interpreter and the
name of the script.
Therefore, the env-style shebang causes the env program to run with
parameters python3, path-to-the-script.py, and all other arguments. The env then
finds python3 in $PATH, launches it and passes path-to-the-script.py as the first
argument.
Note that this is the same env command we have used to print environment variables.
Without any arguments, it prints the variables. With arguments, it runs the command.
Unix has a long history. Back in the 1970s, the primary purpose of env was to work with the
environment. This included running a program within a modified environment, because the
shell did not know about VAR=value command yet. Decades later, it was discovered that
the side-effect of finding the program in the $PATH is much more useful :-).
We will see in a few weeks why it makes sense to search for Python in the $PATH instead
of using /usr/bin/python3 directly.
The short version is that with env, you can modify the $PATH variable by some clever tricks
and easily switch between different Python versions without any need to modify your code.
Script parameters
In Python, we access script parameters via sys.argv. In shell the situation is a bit more
complicated and unfortunately it is one of the places where the design of the
language/environment is somewhat lacking.
Shell uses special variables $1, $2, … to refer to individual arguments of the
script; $0 contains the script name.
We will later see how we can parse arguments in the usual format of -d, -f ...; for now,
we will use the $1, $2, … variables directly.
Shell also offers a special variable "$@" that can be used to pass all current parameters
to another program. We have explicitly used the quotes here as without them the
argument passing can break for arguments with spaces.
As a typical example of using "$@", we will create a simple wrapper for Pandoc that
adds some common options but allows the user to customize the call further.
#!/bin/bash
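# Forward all user-supplied arguments to Pandoc after some common options
# (a sketch: the concrete options here are illustrative)
exec pandoc --standalone --template template.html "$@"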
If you try to use a variable that was not initialized, the shell will pretend it contains an
empty string. While this can be useful, it can also be a source of nasty surprises.
As we mentioned earlier, you should always start your shell scripts with set -u to warn
you about such situations.
However, you sometimes need to read from a potentially uninitialized variable to check
if it was initialized. For example, we might want to read $EDITOR to get the user’s
preferred editor, but provide a sane default if the variable is not set. This is easily done
using the ${VAR:-default_value} notation. If VAR was set, its value is used,
otherwise default_value is used. This does not trigger the warning produced by set -
u.
So we can write:
"${EDITOR:-mcedit}" file-to-edit.txt
Frequently, it is better to handle the defaults at the beginning of a script using this
idiom:
EDITOR="${EDITOR:-mcedit}"
Later in the script, we may call the editor using just:
"$EDITOR" file-to-edit.txt
Note that it is also possible to write ${EDITOR} to explicitly delimit the variable name.
This is useful if you want to print a variable immediately followed by other text:
file_prefix=nswi177-
echo "Will store into ${file_prefix}log.txt"
echo "Will store into $file_prefixlog.txt"
Extending the running example
We will now extend our running example with several echos so that the script can print
what it is doing.
This is trivial code that checks whether the first argument is --verbose, and if so, sets the
variable verbose to true.
#!/bin/bash
verbose=false
test "${1:-none}" = "--verbose" && verbose=true
...
Such an approach would not work very well if we wanted to add more switches, but it
is good enough for us now.
...
...
How does the code above work? Hint. Answer.
We saw that the shell performs various types of expansion. It expands variables,
wildcards, tildes, arithmetic expressions (see below), and many other things.
It is essential to understand how these expansions interact with each other. Instead of
describing the formal process (which is quite complicated), we will show several
examples to demonstrate typical situations.
We will call args.py from the previous labs to demonstrate what happens. (Of course
you need to call it from the right directory.)
VAR="*.sh"
args.py "$VAR"
args.py $VAR
args.py "\$VAR"
args.py '$VAR'
Run the above again but remove one.sh after assigning to VAR.
VAR=~
echo "$VAR" '$VAR' $VAR
VAR="~"
echo "$VAR" '$VAR' $VAR
The important take-away is that variable expansion is tricky. But it is always very easy
to try it practically instead of remembering all the gotchas. As a matter of fact, if you
keep in mind that spaces and wildcards require special attention, you will be fine :-).
We will do only a small change. We will replace the assignment to $html_dir with the
following code.
html_dir="${html_dir:-public}"
What has changed? Answer.
We can now change the behaviour of the program by two means: users can add
--verbose or modify the variable html_dir. That is definitely not very user friendly. We should
allow our script to be executed with --html=DIR to specify the output directory. We will
get back to this in one of the later labs.
At this moment, take it as an illustration of what options are available. The use
of html_dir="${html_dir:-public}" is a very cheap way to add customizability of the
script that can be sufficient in many situations.
Often, we need to store output from a command into a variable. This also includes
storing content of a file (or part of it) in a variable.
A prominent example is the use of the mktemp(1) command. It solves the problem of
secure creation of temporary files (remember that creating a fixed-name temporary
file in /tmp or elsewhere is dangerous). The mktemp command creates a uniquely-named
file (or a directory) and prints its name to stdout. Obviously, to use the file in further
commands, we need to store its name in a variable.
Shell offers the following syntax for the so-called command substitution:
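my_temp="$( mktemp -d )"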
...
# At the end of the script
rm -rf "$my_temp"
Command substitution is also often used in logging or when transforming filenames
(use man pages to learn what date, basename, and dirname do):
input_filename="/some/path/to/a/file.sh"
backup="$( dirname "$input_filename" )/$( basename "$input_filename" ).bak"
other_backup="$( dirname "$input_filename" )/$( basename "$input_filename" .sh ).bak.sh"
Extending the running example
echo "<p>Version: $( git rev-parse --short HEAD 2>/dev/null || echo unknown )</p>" >version.inc.html
The change is rather small but it makes the generation of the version.inc.html a bit
more compact. We will improve readability of this piece of code with functions in the
next section.
Functions in shell
Recall from your programming classes that functions have one main purpose.
They allow the developer to introduce a higher level of abstraction by naming a certain
block of code, thus better capturing the intent of a larger piece of code.
Functions also reduce code duplication (i.e., the DRY principle: don’t repeat yourself), but
that is mostly a side effect of creating new abstractions.
Functions in shell are rather primitive in their definition as there is never any formal list
of arguments or return type specification.
function_name() {
commands
}
A function has the same interface as a full-fledged shell script. Arguments are passed
as $1, $2, …. The result of the function is an integer with the same semantics as the exit
code. Thus, the () is there just to mark that this is a function; it is not a list of
arguments.
Please consult the following section on variable scoping for details about which
variables are visible inside a function.
We will add several new functions to our example to make it a bit more useful.
log_message() {
echo "$( date '+build.sh | %Y-%m-%d %H:%M:%S |' )" "$@" >&2
}
Run the inner call to date by itself to see what it does (the key is the + at the beginning,
which informs date that we want to use a custom format).
logger=":"
test "${1:-none}" = "--verbose" && logger=log_message
The second trick is the use of the colon :. That is basically a special builtin that does
nothing, but it still behaves as a command. So by setting logger to : or to log_message
and then calling "$logger" with some message, we execute one of the following:
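: "Generating the page"            # when logger=":" – does nothing
log_message "Generating the page"  # when logger="log_message" – prints the log line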
On your own, wrap the version generation into a reasonable function. Solution.
Calling return terminates the function execution; the optional parameter of return is the
exit code.
is_shell_script() {
test "$( head -n 1 "$1" 2>/dev/null )" = '#!/bin/bash' && return 0
return 1
}
Because the exit code of the last program is also the exit code of the whole function,
we can simplify the code to the following.
is_shell_script() {
test "$( head -n 1 "$1" 2>/dev/null )" = '#!/bin/bash'
}
And such a function can be used to control program flow:
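is_shell_script build.sh && echo "build.sh is a shell script"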
The same effect would be obtained by using the following code directly, but using the
function allows us to capture the intent.
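test "$( head -n 1 build.sh 2>/dev/null )" = '#!/bin/bash' && echo "build.sh is a shell script"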
It is also a good idea to give a name to the function argument instead of referring to
it by $1. You can assign it to a variable, but it is preferred to mark the variable
as local (see details below):
is_shell_script() {
local filename="$1"
test "$( head -n 1 "$filename" 2>/dev/null )" = '#!/bin/bash'
}
The code is virtually the same. But by assigning $1 to a properly named variable we
increase the readability: the reader immediately sees that the first argument is a
filename.
Command precedence
You might notice that aliases, functions, built-ins, and regular commands are all called
the same way. Therefore, the shell has a fixed order of precedence: Aliases are checked
first, then functions, then built-ins, and finally regular commands from $PATH.
Regarding that, the built-ins command and builtin might be useful (e.g., for calling a regular
command or a builtin shadowed by a function of the same name).
Take away
This section explains a few rules and facts about the scoping of variables and why some
constructs might not work as expected.
Shell variables are global by default. All variables are visible in all functions,
modification done inside a function is visible in the rest of the script, and so on.
It is often convenient to declare variables within functions as local, which limits the
scope of the variable to the function.
More precisely, the variable is visible in the function and all functions called from it. You can
imagine that the previous value of the variable is saved when you execute the local and
restored upon return from the function. This is unlike what most programming languages
do.
When you run another program (including shell scripts and Python programs), it gets
a copy of all exported variables. When the program modifies the variables, the changes
stay inside the program, not affecting the original shell in any way. (This is similar to
how working directory changes behave.)
However, when you use a pipe, it is equivalent to launching a new shell: variables set
inside the pipeline are not propagated to the outer code. (The only exception is that
the pipeline gets even non-exported variables.)
Read and run the following code to understand the mentioned issues.
global_var="one"
change_global() {
echo "change_global():"
echo " global_var=$global_var"
global_var="two"
echo " global_var=$global_var"
}
change_local() {
echo "change_local():"
echo " global_var=$global_var"
local global_var="three"
echo " global_var=$global_var"
}
echo "global_var=$global_var"
change_global
echo "global_var=$global_var"
change_local
echo "global_var=$global_var"
(
global_var="four"
echo "global_var=$global_var"
)
echo "global_var=$global_var"
echo "loop:"
(
echo "five"
echo "six"
) | while read value; do
global_var="$value"
echo " global_var=$global_var"
done
echo "global_var=$global_var"
The shell is capable of basic arithmetic operations. It is good enough for computing
simple sums, counting the numbers of processed files etc. If you want to solve
differential equations, please choose a different programming language :-).
counter=1
counter=$(( counter + 1 ))
Note that variables are not prefixed with $ inside this environment. As a matter
of fact, in most cases things will work even with $ (e.g., $(( $counter + 1 ))), but it
is not a good habit to get into.
As a last change to our running example, we will measure how long the execution took.
For that we will use date, because with +%s it prints the number of seconds since the
start of the epoch.
As a matter of fact, all Unix systems internally measure time by counting seconds from January 1,
1970 (the start of the epoch), and all displayed dates are computed from this.
Therefore, the following three lines around the whole script can give us the number of seconds
spent running our script (at the moment, the script should not take more
than a second to complete, but we might have more pages or more data eventually).
#!/bin/bash
...
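A minimal sketch of those three lines (the variable name is ours):

start_seconds="$( date +%s )"
# ... the actual work of the script ...
echo "Execution took $(( $( date +%s ) - start_seconds )) seconds." >&2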
More examples
More examples to try your knowledge before attacking the graded tasks.
Return to the examples from Lab 04 and decide where adding a function to the
implementation would improve the readability of the script.
Print information about the last commit; when the script is executed in a directory that is not
part of any Git project, the script shall print only Not inside a Git
repository. Hint. Solution.
The command getent passwd USERNAME prints information about the user
account USERNAME (e.g., intro) on your machine. Write a command that prints information
about the user intro, or the message This is not NSWI177 disk if the user does not
exist. Solution.
The following tasks must be solved and submitted before attending your lab. If you
have lab on Wednesday at 10:40, the files must be pushed to your repository (project)
at GitLab on Wednesday at 10:39 latest.
For virtual lab the deadline is Tuesday 9:00 AM every week (regardless of vacation
days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
However, if a file .NO_HEADER exists in the current directory, nothing will be printed
(even if HEADER exists).
If neither of the files exists, the program should print Error: HEADER not found. on
standard error and terminate with exit status 1.
Use only && and || to control program flow, do not use if even if you happen to know
these constructs in shell. It is okay to get information about file existence several times
in the script, we will not modify the files while your script is running.
The modification date should be printed in YYYY-MM-DD format, if the file does not exist
(or there is some other issue in reading the modification time) the program should
terminate with non-zero exit code.
Please, complete the following form and after you complete it, please, create an empty
file 06/feedback.txt in your repository.
The link points to different language translations of the same survey, complete only
one of them, please.
(We do not see any other simple way to ensure that the survey remains anonymous.)
We expect you will solve the following tasks after attending the labs and hearing
feedback on your before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission
repository. For most of the tasks there are automated tests that can help you check
completeness of your solution (see here how to interpret their results).
TBA
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that
you should be able to explain and/or use after each lesson. They also represent the
bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
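Conceptual knowledge is about understanding the meaning of given terms and putting
them into context. Therefore, you should be able to …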
Practical skills are usually about usage of given programs to solve various tasks.
Therefore, you should be able to …