Python – Thoughts on Programming, Technology and Software

Fix a spurious ruff python linter warning E402

Ruff issues the warning “E402 Module level import not at top of file” for the following Python code. At first I thought this was a bug, because the module imports are at the top of the file. There’s just the shebang, the module docstring, and the copyright string before them, and all of those should appear before the import statements according to Python style guides. So this example file should be valid Python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__copyright__ = """
(c) 2017-2024 Arwed Starke
The reproduction, distribution and utilization of this file as
well as the communication of its contents to others without express
authorization is prohibited. Offenders will be held liable for the
payment of damages and can be prosecuted. All rights reserved
particularly in the event of the grant of a patent, utility model
or design.
"""

""" This ROS node displays traffic light interface data."""

import rospy
import signal

# ...

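Another warning, this time from pydocstyle, that the module has no module docstring pointed me to the solution: a string literal only counts as the module docstring if it is the first statement in the file, so I have to put the module docstring before the copyright string: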

#!/usr/bin/env python
""" This ROS node displays traffic light interface data."""

__copyright__ = """
(c) 2017-2024 Arwed Starke
The reproduction, distribution and utilization of this file as
well as the communication of its contents to others without express
authorization is prohibited. Offenders will be held liable for the
payment of damages and can be prosecuted. All rights reserved
particularly in the event of the grant of a patent, utility model or design.
"""

import rospy
import signal

# ...

It was an easy fix, but hard to find out because the ruff message is kinda unspecific for this issue and there is no auto-fix available.
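
To double-check the explanation, here is a minimal sketch using Python's ast module. It assumes a shortened copyright string and leaves out the ROS import; the two source strings and the helper name module_docstring are made up for illustration. It shows that the string only counts as the module docstring when it is the first statement in the file:

```python
import ast

# Shortened stand-ins for the two file layouts (assumption: the real
# copyright text and the rospy import do not matter for this check).
BROKEN = '''__copyright__ = """(c) example"""

"""This ROS node displays traffic light interface data."""

import signal
'''

FIXED = '''"""This ROS node displays traffic light interface data."""

__copyright__ = """(c) example"""

import signal
'''


def module_docstring(source):
    """Return the module docstring of the given source code, or None."""
    return ast.get_docstring(ast.parse(source))


broken_doc = module_docstring(BROKEN)
fixed_doc = module_docstring(FIXED)

print(broken_doc)  # None: the string is just an expression statement
print(fixed_doc)   # the docstring text
```

In the first layout the string after the copyright assignment is an ordinary statement, which would explain why the imports after it are no longer considered to be at the top of the file.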

Solve pytest import error (while python can)

After refactoring a formerly monolithic python script into several files, I started getting problems related to module imports. I shifted the source code into a package directory, and the test/ directory was now parallel to the package source directory. The full directory layout was like this:

<packagename>
|- <packagename>/
|  |- __init__.py
|  |- module_1.py
|  `- ...
|- test/
|  |- __init__.py
|  |- test_module1.py
|  `- ...
|- __init__.py
|- README.md
`- packagename.py

When running “pytest <packagename>”, I got:

ImportError while importing test module '...'

However, when running “packagename.py” (executable python script via shebang), everything worked fine. Both the test modules and the main script contained the same import statement:

from packagename import module_1

A google search turned up: https://stackoverflow.com/questions/41748464/pytest-cannot-import-module-while-python-can, which pointed me in the direction that the issue must be related to __init__.py

However, the solution was not to delete __init__.py in the test folder, but to delete __init__.py in the main package folder. I came to this solution based on the documentation of pytest: https://docs.pytest.org/en/latest/explanation/goodpractices.html#tests-as-part-of-application-code

The working layout is:

<packagename>
|- <packagename>/
|  |- __init__.py
|  |- module_1.py
|  `- ...
|- test/
|  |- __init__.py
|  |- test_module1.py
|  `- ...
|- README.md
`- packagename.py
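
Why does removing the top-level __init__.py help? As far as I understand the pytest documentation, the default "prepend" import mode walks upwards from the test file to the first directory that has no __init__.py and puts that directory on sys.path. The following sketch imitates only that directory walk; the function names sys_path_entry_for and build_layout are hypothetical, and the layouts are created in a temporary directory:

```python
import tempfile
from pathlib import Path


def sys_path_entry_for(test_file):
    """Return the first directory above test_file without an __init__.py."""
    directory = test_file.parent
    while (directory / "__init__.py").exists():
        directory = directory.parent
    return directory


def build_layout(base, top_level_init):
    """Create the directory layout from this post and return the test file."""
    top = base / "packagename"
    (top / "packagename").mkdir(parents=True)
    (top / "test").mkdir()
    (top / "packagename" / "__init__.py").touch()
    (top / "packagename" / "module_1.py").touch()
    (top / "test" / "__init__.py").touch()
    test_file = top / "test" / "test_module1.py"
    test_file.touch()
    if top_level_init:
        (top / "__init__.py").touch()
    return test_file


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    broken = sys_path_entry_for(build_layout(base / "broken", True))
    working = sys_path_entry_for(build_layout(base / "working", False))
    broken_rel = broken.relative_to(base).as_posix()
    working_rel = working.relative_to(base).as_posix()

print(broken_rel)   # broken
print(working_rel)  # working/packagename
```

With the top-level __init__.py in place, the directory above the outer <packagename> folder ends up on sys.path, so "packagename" resolves to the outer folder, which contains no module_1. Without it, the outer folder itself is on sys.path and "packagename" resolves to the inner package.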

Disclaimer:

I do not want to advertise this directory layout for python packages. In my use case, I have a python script inside a very large repository that is not distributed as a package outside of this repository; I just wanted to improve maintainability.
When you want to create a Python package for distribution, you’re faced with many other considerations. There are two competing package structure styles: with or without the package sources inside a “src/” folder. The Python Packaging User Guide promotes the “src/” folder layout. See https://packaging.python.org/en/latest/tutorials/packaging-projects/#a-simple-project for an example of the “<packagename>/src/<packagename>” layout. If in doubt, go with one of the more popular cookiecutter templates.

Install pip packages from git repo

Pip supports installing a Python package directly from a link to a git repository. You can specify such a direct link to the package source on the pip command line or in a requirements.txt file. The following article explains how.

It can be handy to use pip to install a project dependency directly from a git repository instead of from a Python package index. I’ll show you why you might want to do that and how to do it.

How to pip install from a git repository

Instead of a package name, give pip a git repository URL as parameter:

# general form: pip install git+<repository_url>
# example:
pip install git+https://github.com/arwedus/some-example.git

If this repo does not yet contain a Python package structure, you’ll need to add at least a setup.py (or a pyproject.toml) so that pip can carry out the install (cf. cookiecutter).

You can explicitly call out the package name that you’re installing with #egg=

# general form: pip install git+<repository_url>#egg=<package_name>
# example:
pip install git+https://github.com/arwedus/some-example.git#egg=example-package

There are also a number of different ways to specify a version of the repository that you want to fetch:

# Use a commit SHA
pip install git+https://github.com/arwedus/some-example.git@4045597
# Use a tag
pip install git+https://github.com/arwedus/some-example.git@<tag>
# Use a branch
pip install git+https://github.com/arwedus/some-example.git@feature/fix-readme

How to include the dependency in a requirements.txt

If you want to share a git repository dependency with other developers, you’ll likely want to add it to your requirements.txt, like so:

# Just put the pip install argument straight into your requirements.txt
package-a==1.2.3
git+https://github.com/arwedus/some-example.git@<tag>
package-b==4.5.6

# Or you can use the preferred PEP 440 direct URL syntax
package-a==1.2.3
some-example @ git+https://github.com/arwedus/some-example.git@deadbeef123
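
If you generate requirements files from a script, the direct URL form is easy to assemble. This is only a small sketch; the helper name git_requirement is hypothetical, and the repository URL and revision are the example values from above:

```python
def git_requirement(name, repo_url, ref=None):
    """Build a 'name @ git+url@ref' requirement line for a git dependency."""
    url = "git+" + repo_url
    if ref:
        # The part after '@' can be a commit SHA, a tag, or a branch name.
        url += "@" + ref
    return f"{name} @ {url}"


line = git_requirement(
    "some-example",
    "https://github.com/arwedus/some-example.git",
    "deadbeef123",
)
print(line)
```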

Python: Cleaning a virtual environment

If you want to test whether a requirements.txt installs all packages required to run your Python application, and you have experimented with the virtual environment before (installed some extra packages etc.), you want to remove all packages and keep only those that the requirements.txt would install. Two solutions come to my mind:

Delete the virtual environment and create it again.
Remove all packages installed into the virtual environment:
pip freeze | xargs pip uninstall -y
Then install all packages from the requirements.txt again:
pip install -r requirements.txt
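
Before wiping everything, it can be useful to see which installed packages are not named in the requirements.txt at all. The following is a rough sketch under several assumptions: it only understands simple "name==version" style lines, it skips comments, options and bare git URLs, and the helper names and sample package lists are made up. Note that packages pulled in as dependencies of your requirements will also show up as extras:

```python
import re


def canonical_name(line):
    """Extract the project name from a requirement line and normalize it."""
    name = re.split(r"[=<>!~@\[; ]", line.strip(), maxsplit=1)[0]
    # Treat '-', '_' and '.' as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", name).lower()


def extra_packages(freeze_lines, requirement_lines):
    """Return installed package names that the requirements do not mention."""
    def names(lines):
        return {
            canonical_name(line)
            for line in lines
            if line.strip()
            and not line.lstrip().startswith(("#", "-", "git+"))
        }
    return sorted(names(freeze_lines) - names(requirement_lines))


# Hypothetical sample data standing in for 'pip freeze' output
# and the contents of requirements.txt.
frozen = ["package-a==1.2.3", "Package_B==4.5.6", "ipython==8.0.0"]
required = ["# pinned deps", "package-a==1.2.3", "package-b==4.5.6"]

extras = extra_packages(frozen, required)
print(extras)  # ['ipython']
```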

Shared post: Using SCons effectively

An overall interesting read to understand core concepts of SCons:

http://blog.bfitz.us/?p=1679

Profiling with Python: cProfile and RunSnakeRun

This is my favorite way to analyze the performance of my Python scripts:

Call the script with cProfile:
python -m cProfile -o profile.dat main.py
Open the profiler data with RunSnakeRun:
%PYTHON_ROOT%\scripts\runsnake.exe
A nice GUI will start up, and you can open the profile.dat file in there.

You have to install RunSnakeRun first, of course. You should choose the easy_install procedure I described here for the runsnake.exe to be installed in the scripts directory. You can find the current RunSnakeRun package on pypi.

RunSnakeRun homepage
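
If you cannot install RunSnakeRun, the same profile data can also be inspected as text with the standard library's pstats module. A minimal sketch, assuming a stand-in function work() instead of a real main.py and a temporary file instead of profile.dat in the working directory:

```python
import cProfile
import io
import os
import pstats
import tempfile


def work():
    """Stand-in for the code that main.py would run."""
    return sum(i * i for i in range(100_000))


with tempfile.TemporaryDirectory() as tmp:
    profile_path = os.path.join(tmp, "profile.dat")

    # Collect profile data and write it in the same format
    # that 'python -m cProfile -o profile.dat' produces.
    profiler = cProfile.Profile()
    profiler.enable()
    work()
    profiler.disable()
    profiler.dump_stats(profile_path)

    # Load the file again and format the ten most expensive
    # entries, sorted by cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profile_path, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    report = stream.getvalue()

print(report)
```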

Libxml2 and Python on Windows

To use XPath and XQuery from my Python scripts for test statistics generation, I decided to try libxml2. I mostly decided against ElementTree because their website told me the XPath subset does not support queries like:

count(//event[evalresult/text()="FALSE"][warnlevel/text()="1"])

Also, I hoped the C-based implementation of libxml2 would be faster. For installation, I only had to download this pre-bundled lxml Windows binary from the Python package repository; it comes with libxml2 included.

The following code is enough to get the above count:

from lxml import etree

doc = etree.parse(filePath)  # filePath: path to the XML file to analyze
result = doc.xpath('//event[evalresult/text()="FALSE"][warnlevel/text()="1"]')
count = len(result)
# alternative: let XPath do the counting (the result is a float)
count2 = doc.xpath('count(//event[evalresult/text()="FALSE"][warnlevel/text()="1"])')

About XQuery: lxml serves as a frontend for libxml2 and libxslt, neither of which supports XQuery.

Further information:
lxml homepage
SketchPath, a good XPath expression evaluation software

Python Pitfalls: References to the same dictionary

Today, I tried to build a dictionary which contains the contents of a list as keys and empty dictionaries as values. First, I went with the method dict.fromkeys(seq, [value]), then I filled the sub-dictionaries in a for loop, like this:

mydict = dict.fromkeys(lstLabels, {})
for strLabel in mydict.keys():
    mydict[strLabel][strLabel] = "x"

What I ended up with was that the keys from lstLabels all referred to the same dictionary instance. This means that whenever you add something to mydict[lstLabels[0]], it appears in mydict[lstLabels[1]], too, and so on.

The correct way to accomplish what I wanted (a separate sub-dictionary instance for each key) was this:

mydict = {}
for strLabel in lstLabels:
    mydict[strLabel] = {}
    mydict[strLabel][strLabel] = "x"

Apparently, the dict.fromkeys() method does something like lstFail = [[]] * 3: the same value object is reused for every key.

Are there more elegant ways to create a dictionary with sub-dictionaries based on a list of strings than the one presented?
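
One candidate is a dict comprehension, which evaluates {} once per key and therefore creates a fresh sub-dictionary each time. A small sketch with a made-up label list that shows the pitfall and the alternative side by side:

```python
labels = ["a", "b", "c"]  # hypothetical stand-in for lstLabels

# Pitfall: fromkeys() stores the one dict object passed in under every key.
shared = dict.fromkeys(labels, {})
shared["a"]["a"] = "x"
print(shared["b"])    # {'a': 'x'}

# Alternative: the comprehension builds a new dict for each key.
separate = {label: {} for label in labels}
separate["a"]["a"] = "x"
print(separate["b"])  # {}
```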

Installing Python packages with easy_install

Python comes with a nice tool called easy_install that enables you to install additional packages as easily as:

easy_install <package_name>

If you are behind an HTTP proxy server, this might fail unless you specify the proxy URL in the environment variable HTTP_PROXY like this:

set http_proxy=http://www.myproxy.org

However, when you are sitting behind a firewall (like we are at my company), you might still not be able to download any packages from the Python package repository. Here’s a quick guide on how to circumvent the issue:

1. Create a folder on your local machine that will contain all packages you have to download.
2. Go to http://pypi.python.org/simple/ and download the package you want to install into your local folder.
3. In the scripts directory of your Python installation, call:
easy_install -H none -f d:\Backup\Programs\python-packages <package_name>

When you disallow all foreign hosts with “-H none”, easy_install will not be able to fetch any packages your package depends on. You have to resolve all dependency errors by repeating steps 1 to 3 for the missing package if easy_install gives you a message like this:
“No local packages or download links found for logilab-common>=0.49.0”

Another solution I found on the internet would be to install and use ASProxy, but I’m not going to try that for now.