Programming – Thoughts on Programming, Technology and Software

How to make docker in VS Code devcontainer use the host network

If you have a complex setup, e.g. with VPN and proxy, and docker with an isolated (bridge) network is unable to connect to the internet or even misconfigures the DNS name resolution on your host machine (I will add another post on that), the docker option --network=host often helps. It tells docker to use the host machine's network directly instead of an isolated one.

However, this is not the default and there is no central docker configuration file setting that would make this the default behavior. So when you have downloaded a project that ships its own VS Code devcontainer, chances are that the docker image build will fail (if some RUN commands in the Dockerfile require internet access) or that network access within the running docker devcontainer will not work.

You can add this configuration setting to .devcontainer/devcontainer.json to forward the --network=host option to docker build:

{
  "build": {
    // ...
    "options": [
      "--network=host"
    ]
  }
}

And you can add this configuration setting to .devcontainer/devcontainer.json to forward the --network=host option to docker run:

{
  // ...
  "runArgs": ["--network=host"]
}

Try it out, and you will see the option appear in the docker build / docker run call in the devcontainer start-up log.

Using sh_binary with bazel run to export files from bazel-bin

In my quest to bazelize “side targets” like documentation, I’ve had to extract files which are the output of a bazel target to a folder outside of the “bazel world”.

First, I tried to copy the files over from the <workspace>/bazel-bin directory in a CI script after running the bazel build <target> command. While this worked locally, the files were nowhere to be found when the job was executed by one of the CI/CD workers. This attempt failed for multiple reasons. First, the bazel-bin directory is a symlink, and it simply existed in a different path on the CI worker. So, I queried the bazel-bin directory with bazel info bazel-bin. Now, the action was able to see directories, but no files.

I learned that the root cause for this is that bazel was configured to only download artifacts from the remote cache when they are needed. Since bazel did not know about the script which “needs” the files to exist, it did not download the files. What made all this misleading is that bazel caches the build log (output), so in the log viewer of the CI/CD workflow, it always looked like the target was actually running everything (at least at first sight).

My next attempt was to add --remote_download_outputs=all to the bazel call. This worked, but not reliably. Since I had to copy build outputs from several targets bundled together as filegroups, it may have been more complicated. Some colleague suggested that I extract all the generated files from the build_event JSON file, but another idea was more elegant in the end:

We created a small shell script that does the copying of the files, and added it as the source to a sh_binary rule. The user (or the CI workflow) can now use bazel run <sh_binary_target> and this will copy the files to the export folder outside of the “bazel directories”. The beauty of this approach is that we don’t even have to run bazel build <target> beforehand, because bazel will run this for us if the target is outdated. I also don’t have to tell bazel build to download all remote outputs explicitly: bazel will do so if the bazel run command is invoked, because the outputs are modelled as a data dependency of the sh_binary target.

In order to keep the copy job configurable (different output paths may be provided as command line arguments), I’ve had to jump through some hoops, but here is a simplified version that only copies one file to the export directory:

#!/bin/bash
# export_files.sh
set -euo pipefail

# $1: the file to export, $2: the destination path
source_file=$1
exported_file=$2
cp "$source_file" "$exported_file"

This is the content of BUILD.bazel:

some_target(
    name = "output.tgz",
    srcs = [":all_srcs"],
)

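sh_binary(
    name = "export_files",
    srcs = ["export_files.sh"],
    # The script expects two arguments: the file to copy and the destination.
    args = ["$(location :output.tgz)", "output_path"],
    data = [":output.tgz"],
)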
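
For comparison, the same copy step can be sketched in Python, e.g. as the main script of a py_binary. This is only an illustration, not the script we used: the function name export_file is made up, and it assumes that bazel run sets the BUILD_WORKSPACE_DIRECTORY environment variable, so that a relative destination can be resolved against the workspace root instead of the runfiles directory the binary starts in.

```python
import os
import shutil
from pathlib import Path


def export_file(source: str, destination: str) -> Path:
    """Copy one build output to a destination outside the bazel output tree."""
    target = Path(destination)
    if not target.is_absolute():
        # Assumption: BUILD_WORKSPACE_DIRECTORY is set (as it is under
        # `bazel run`); otherwise fall back to the current directory.
        root = os.environ.get("BUILD_WORKSPACE_DIRECTORY", os.getcwd())
        target = Path(root) / target
    # Create the export folder if it does not exist yet, then copy the file.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target
```

The arguments are in the same order as in export_files.sh: the source file first, the destination second.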

Fix a spurious ruff python linter warning E402

Ruff issues the warning message “E402 Module level import not at top of file” for the following python code. At first I thought this was a bug, because the module imports are at the top of the file. There’s just the shebang, the copyright string, and the module docstring before them, and all of those should appear before the import statements according to python style guides. So this example file should be valid python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__copyright__ = """
(c) 2017-2024 Arwed Starke
The reproduction, distribution and utilization of this file as
well as the communication of its contents to others without express
authorization is prohibited. Offenders will be held liable for the
payment of damages and can be prosecuted. All rights reserved
particularly in the event of the grant of a patent, utility model
or design.
"""

""" This ROS node displays traffic light interface data."""

import rospy
import signal

# ...

Another warning by pydocstyle that the module has no module docstring pointed me to the solution: I have to put the module docstring before the copyright string:

#!/usr/bin/env python
""" This ROS node displays traffic light interface data."""

__copyright__ = """
(c) 2017-2024 Arwed Starke
The reproduction, distribution and utilization of this file as
well as the communication of its contents to others without express
authorization is prohibited. Offenders will be held liable for the
payment of damages and can be prosecuted. All rights reserved
particularly in the event of the grant of a patent, utility model or design.
"""

import rospy
import signal

# ...

It was an easy fix, but hard to find out because the ruff message is kinda unspecific for this issue and there is no auto-fix available.
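
The underlying rule can be checked with python's own ast module: a string literal only counts as the module docstring if it is the first statement of the file. Otherwise it is an ordinary expression statement, which is presumably why the imports after it get flagged. The following sketch uses shortened stand-ins for the two files above; the variable and function names are made up for illustration.

```python
import ast

# Shortened stand-in for the first file: an assignment precedes the docstring.
copyright_first = '''
__copyright__ = "(c) someone"

"""This ROS node displays traffic light interface data."""

import signal
'''

# Shortened stand-in for the fixed file: the docstring is the first statement.
docstring_first = '''
"""This ROS node displays traffic light interface data."""

__copyright__ = "(c) someone"

import signal
'''


def module_docstring(source: str):
    """Return the module docstring of the given source text, or None."""
    return ast.get_docstring(ast.parse(source))


print(module_docstring(copyright_first))  # None
print(module_docstring(docstring_first))  # This ROS node displays traffic light interface data.
```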

Solve pytest import error (while python can)

After refactoring a formerly monolithic python script into several files, I started getting problems related to module imports. I shifted the source code into a package directory, and the test/ directory was now parallel to the package source directory. The full directory layout was like this:

<packagename>
|- <packagename>/
|  |- __init__.py
|  |- module_1.py
|  `- ...
|- test/
|  |- __init__.py
|  |- test_module1.py
|  `- ...
|- __init__.py
|- README.md
`- packagename.py

When running “pytest <packagename>”, I got:

ImportError while importing test module '...'

However, when running “packagename.py” (executable python script via shebang), everything worked fine. Both the test modules and the main script contained the same import statement:

from packagename import module_1

A google search turned up https://stackoverflow.com/questions/41748464/pytest-cannot-import-module-while-python-can, which suggested that the issue must be related to __init__.py.

However, the solution was not to delete __init__.py in the test folder, but to delete the __init__.py in the top-level <packagename> folder (the one next to README.md), while keeping the one inside the inner package directory. I came to this solution based on the pytest documentation: https://docs.pytest.org/en/latest/explanation/goodpractices.html#tests-as-part-of-application-code

The working layout is:

<packagename>
|- <packagename>/
|  |- __init__.py
|  |- module_1.py
|  `- ...
|- test/
|  |- __init__.py
|  |- test_module1.py
|  `- ...
|- README.md
`- packagename.py
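
As far as I understand the pytest documentation, the reason is how the default “prepend” import mode picks the directory it inserts into sys.path: it walks upwards from the test file to the first directory that does not contain an __init__.py. The following is a simplified model of that rule, not pytest's actual implementation, and the function name is made up:

```python
from pathlib import Path


def pytest_basedir(test_file: Path) -> Path:
    """Return the first directory above test_file without an __init__.py."""
    directory = test_file.resolve().parent
    # Keep walking upwards as long as the directory looks like a package
    # (and stop at the filesystem root in any case).
    while (directory / "__init__.py").is_file() and directory.parent != directory:
        directory = directory.parent
    return directory
```

With the first layout, this returns the parent of the top-level <packagename> folder, so “packagename” resolves to the outer folder, which does not contain module_1. With the working layout, it returns the top-level <packagename> folder itself, and “packagename” resolves to the inner package.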

Disclaimer:

I do not want to advertise this directory layout for python packages. In my use case, I have a python script inside a very large repository that is not distributed as a package outside of this repository; I just wanted to improve maintainability.
When you want to create a python package for distribution, you’re faced with many other considerations. There are two competing package structure styles: with or without the package sources inside a “src/” folder. The Python Packaging User Guide recommends the “src/” folder layout. See https://packaging.python.org/en/latest/tutorials/packaging-projects/#a-simple-project for an example of the “<packagename>/src/<packagename>” layout. If in doubt, go with one of the more popular cookiecutter templates.

A good introduction to dependency management in large software projects

I have worked on some parts of the build dependency handling in a large software project in my company (monorepo project, > 1800 packages, > 10 million LOC, 100s of external dependencies). I recently stumbled upon this article in Google’s Bazel documentation and really enjoyed reading it:

https://bazel.build/basics/dependencies

As usual, when it comes to developing software at scale, Google has already written about the experiences you make on your journey. I agree with all of the points about dependency handling that they make.

Modify last commit date in git

Set the date of the last commit to the current date

git commit --amend --no-edit --date "$(date)"

Set the date of the last commit to an arbitrary date

git commit --amend --no-edit --date "Wed 21 Jul 2021 20:21:00 CET"
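
If you would rather not work out the weekday by hand, the date string can be generated. A minimal sketch, assuming you are fine with the RFC 2822 format (one of the date formats git accepts); the function name and the example value are made up:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def git_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 2822 date string."""
    return format_datetime(moment)


# Hypothetical example value: 21 July 2021, 20:21 at UTC+02:00.
moment = datetime(2021, 7, 21, 20, 21, tzinfo=timezone(timedelta(hours=2)))
print(f'git commit --amend --no-edit --date "{git_date(moment)}"')
# git commit --amend --no-edit --date "Wed, 21 Jul 2021 20:21:00 +0200"
```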

Dump the context of a github action

The GitHub Actions documentation on specific events, e.g. “pull_request”, isn’t great. To answer questions like “where can I find the labels added to the PR that triggered the workflow”, the most straightforward way may be to simply dump all information that the workflow receives, i.e. the github context.

You can do this within a GitHub workflow file with this step:

- name: Dump GitHub context
  env:
    GITHUB_CONTEXT: ${{ toJson(github) }}
  run: echo "$GITHUB_CONTEXT"
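
Once you have the dump, answering the labels question is a matter of walking the JSON. The sketch below assumes a workflow triggered by “pull_request”, where the labels are found under event.pull_request.labels; the function name and the heavily shortened sample are made up for illustration.

```python
import json


def pull_request_labels(github_context: dict) -> list[str]:
    """Return the label names of the pull request that triggered the workflow."""
    pull_request = github_context.get("event", {}).get("pull_request", {})
    return [label["name"] for label in pull_request.get("labels", [])]


# Made-up, heavily shortened excerpt of a dumped github context.
dumped = """
{
  "event_name": "pull_request",
  "event": {
    "pull_request": {
      "number": 1,
      "labels": [{"name": "bug"}, {"name": "documentation"}]
    }
  }
}
"""
print(pull_request_labels(json.loads(dumped)))  # ['bug', 'documentation']
```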

Install pip packages from git repo

Pip supports installing a python package directly from a link to a git repository. You can specify such a direct link to the package source on the pip command line or in a requirements.txt file. The following article gives an overview of how to do this.

It can be handy to use pip to install a project dependency directly from a git repository instead of from a Python package index. I’ll show you why you might want to do that and how to do it.

How to pip install from a git repository

Instead of a package name, give pip a git repository URL as parameter:

# general form: pip install git+<repository_url>
# example:
pip install git+https://github.com/arwedus/some-example.git

If this repo does not yet contain a python package structure, you’ll need to add at least a setup.py, so that pip can carry out the install (cf. cookiecutter).

You can explicitly call out the package name that you’re installing with #egg=

# general form: pip install git+<repository_url>#egg=<package_name>
# example:
pip install git+https://github.com/arwedus/some-example.git#egg=example-package

There are also a number of different ways to specify a version of the repository that you want to fetch:

# Use a commit SHA
pip install git+https://github.com/arwedus/some-example.git@4045597
# Use a tag
pip install git+https://github.com/arwedus/some-example.git@<tag>
# Use a branch
pip install git+https://github.com/arwedus/some-example.git@feature/fix-readme

How to include the dependency in a requirements.txt

If you want to share a git repository dependency with other developers, you’ll likely want to add it to your requirements.txt, like so:

# Just put the pip install argument straight into your requirements.txt
package-a==1.2.3
git+https://github.com/arwedus/some-example.git@<tag>
package-b==4.5.6

# Or you can use the preferred PEP 440 direct URL syntax
package-a==1.2.3
some-example @ git+https://github.com/arwedus/some-example.git@deadbeef123
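
To summarize the forms shown above, here is a small helper that composes such requirement strings. It is a sketch for illustration only: the function name vcs_requirement is made up, and it does no validation of its inputs.

```python
def vcs_requirement(repo_url: str, ref: str = "", name: str = "") -> str:
    """Compose a pip requirement string for a git repository."""
    spec = f"git+{repo_url}"
    if ref:
        spec += f"@{ref}"  # commit SHA, tag or branch
    if name:
        spec = f"{name} @ {spec}"  # direct URL form with explicit package name
    return spec


repo = "https://github.com/arwedus/some-example.git"
print(vcs_requirement(repo))
print(vcs_requirement(repo, ref="4045597"))
print(vcs_requirement(repo, ref="deadbeef123", name="some-example"))
```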

Python: Cleaning a virtual environment

If you want to test whether a requirements.txt installs all packages required to run your python application, and you have experimented with the virtual environment before (installed some extra packages etc.), you want to remove all packages and keep only those that the requirements.txt would install. Two solutions come to mind:

1. Delete the virtual environment and create it again.
2. Remove all packages installed into the virtual environment:

pip freeze | xargs pip uninstall -y

Then install all packages from the requirements.txt again:

pip install -r requirements.txt
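
The first option (delete and re-create) can also be scripted with the venv module from the standard library. A minimal sketch; the function name recreate_venv and the “.venv” path in the usage note are assumptions:

```python
import venv
from pathlib import Path


def recreate_venv(env_dir: Path, with_pip: bool = True) -> None:
    """Wipe an existing virtual environment directory and create it afresh."""
    # clear=True deletes the contents of env_dir before the new environment
    # is created, so previously installed extra packages are gone afterwards.
    venv.EnvBuilder(clear=True, with_pip=with_pip).create(env_dir)
```

After calling e.g. recreate_venv(Path(".venv")), activate the environment and run pip install -r requirements.txt as usual.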

Doxygen cannot handle more than one image with the same name

As you might know, you can include images in the output of your Doxygen html documentation like this:

@image html path/to/img.png

In one of my larger projects, I found out that doxygen simply copies all referenced images, which it finds somewhere below the IMAGE_PATH, into the root directory of the output folder. Given the following structure:

src/component1/doc/component1.md
src/component1/doc/img/structure.svg
src/component2/doc/component2.md
src/component2/doc/img/structure.svg

where component1.md and component2.md both have the following content:

...
![Structure](img/structure.svg)
...

the output will show the component2/doc/img/structure.svg for both component1 and component2. There is no warning or error message during doxygen generation about this.

As I don’t think this behavior can be right, I added an issue report for it:

https://github.com/doxygen/doxygen/issues/8362

Nevertheless, this means for the time being: Make sure all image file names referenced by Doxygen are unique.
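
A small script can check this before the documentation is generated. The following is a sketch under assumptions: the set of image suffixes and the function name are made up, and it simply scans a whole source tree rather than evaluating the IMAGE_PATH setting.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: these are the image types the documentation uses.
IMAGE_SUFFIXES = {".png", ".svg", ".jpg", ".jpeg", ".gif"}


def duplicate_image_names(root: Path) -> dict[str, list[Path]]:
    """Map every image file name that occurs more than once to its locations."""
    by_name = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_name[path.name].append(path)
    # Keep only the names that were found in more than one place.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

For the structure above, duplicate_image_names(Path("src")) would report structure.svg with both of its locations.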