grass.script: More carefully setup env in create_project by wenzeslaus · Pull Request #5854 · OSGeo/grass

wenzeslaus · 2025-06-06T19:58:24Z

This changes the condition used to create the runtime environment in create_project so that the environment is not set up not only when GISBASE is not set, but also when GISBASE is not in PATH which (at least at this point) indicates that no tools can run, so a call to g.proj later (and technically the potential g.message call before that) will fail.

This helps when GISBASE is set, even correctly, for some reason, but PATH is not. This may happen when someone is trying to GISBASE in some special way before GRASS actually runs. I have suspicion this is happening on Windows, but hard to tell without reading the setup scripts in details (when running on Windows you could print the environment and see). However, judging from @echoix experiments in #5717, this may be unrelated to the issues with create_project. Anyway, before this PR, the following results in a traceback, but with this PR it works.

GISBASE="/.../dist.x86_64-pc-linux-gnu/" python -c "import grass.script as gs; import tempfile; d = tempfile.TemporaryDirectory(); gs.create_project(d.name, 'project', epsg=3358); d.cleanup();"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
...
    raise OSError(_("Cannot find the executable {0}").format(args[0]))
OSError: Cannot find the executable g.proj

The env setup is done only when needed, so simple XY does not trigger creation of the environment, and so it is more lightweight in terms of things happening and simply faster.

code	time	command
original create_project	558 usec per loop	`python -m timeit -n 1000 -- "import grass.script as gs; import tempfile; d = tempfile.TemporaryDirectory(); gs.create_project(d.name, 'project'); d.cleanup();"`
new create_project	365 usec per loop	`python -m timeit -n 1000 -- "import grass.script as gs; import tempfile; d = tempfile.TemporaryDirectory(); gs.create_project(d.name, 'project'); d.cleanup();"`
low-level calls	309 usec per loop	`python -m timeit -n 1000 -- "import grass.script as gs; import tempfile; d = tempfile.TemporaryDirectory(); gs.core._create_location_xy(d.name, 'project'); gs.core._set_location_description(d.name, 'project', None); d.cleanup();"`

With faster create_project, I replaced the old gs.core._create_location_xy calls in tests by simple gs.create_project calls.

The code to ensure runtime environment is moved to grass.script.setup, broken down and made reusable. The difference for usage in create_project is that os.environ is now always copied even if it already has GISBASE (we saved that copy operation before if GISBASE was already present). The handling of env, however, is consistent with other functions in grass.script.setup.

The env in create_project is not set up not only when GISBASE is not set, but also when GISBASE is not in PATH which at this point indicates that tools will not run. The env setup is done only when needed, so simple XY does not trigger creation of the environment, and so it is more lightweight. The code to ensure runtime environment is moved to grass.script.setup, broken down and made reusable. The difference for usage in create_project is that os.environ is now always copied even if it already has GISBASE (we saved that copy operation before if GISBASE was already present). The handling of env, however, is consistent with other functions in grass.script.setup. With faster create_project, I replaced the old gs.core._create_location_xy calls by simple gs.project calls in tests.

echoix

Other higher level observations:

What's the real difference between gs.create_project() and gs.setup.init() (disregarding the specifics or implementation, looking at a distance above all that...). From the adjustments made in the tests, it seems one would always need to use both (aka boilerplate). If the pattern of using a "with" (context manager) for gs.setup.init() was supposed to be the end-game of the wrappers to make grass work, why would we need another layer above to patch the shortcomings of gs.setup.init()
The topic of performance was brought in into the PRs description. Even though currently, I think (as I'm not able to properly measure and profile due to how all grass+python works currently, or at least do it easily) the biggest slowdowns/bottlenecks are the uncountable process calls (launching a binary), usually in python it would be the number of function calls that would be addressed. These are some parts that no newer version of python* (famous last words...) can really improve and reduce. It's hard to optimize when crossing function boundaries, as there's a whole lot going on when "calling a function".
- Otherwise the next place I would look into, for performance, would be the import times. What is the "cost" of the imports we do, and ending up to "import the world". It's a bit easier to measure, as there's a python cli flag to trace the import times. I suspect temporal import times would be kinda big too. I already know from older exploration that when an import of one of our modules is made to one of our ctypes wrappers, that the import times significantly increase (the proportions are something like 300 for our ctypes part for 80 for normal/the rest: it's been a while and it's by memory). I also noticed in that exploration session a while ago that when there was two imports of two different ctypes modules, the total import time was almost doubled (because of the importance of the ctypes import time). That's why the suspicion of temporal is founded.

python/grass/script/core.py

general/g.mapsets/tests/conftest.py

lib/gis/tests/lib_gis_env_test.py

python/grass/script/core.py

python/grass/script/tests/grass_script_core_location_test.py

python/grass/script/core.py

python/grass/script/setup.py

echoix · 2025-06-06T23:10:20Z

Also, do you want me to redo some tests/exploration on test+pytest, especially on windows, and also randomizing the order to see if this helps the global situation discussed in #4480?

(Ie, if I can get further than just what was ready in #5584)

wenzeslaus · 2025-06-07T01:52:33Z

About redoing the tests, yes, perhaps better done after merging this, but I would definitely appreciate some help with debugging create_project on Windows. For starters, I don't know if we have a test which would actually test the same thing you a @landam did manually when you both got to g.proj not found.

I need more time to respond to the rest.

wenzeslaus · 2025-06-18T17:44:49Z

What's the real difference between gs.create_project() and gs.setup.init() ...From the adjustments made in the tests, it seems one would always need to use both (aka boilerplate)...why would we need another layer above to patch the shortcomings of gs.setup.init()

You need to have a project (data) and an active session using a project (or more precisely: env variables to make the runtime work and connection to a project). gs.setup.init() could create a project if we add additional parameters or it could work on some delayed start basis and having a user set up the project later.

# One-step init + create
gs.setup.init("~/myproject", crs="some PROJ-accepted string")  # crs implies create

# Delayed CRS setup
session = gs.setup.init("~/myproject")  # creates an empty XY project if dir does not exist
session.set_crs("some PROJ-accepted string")

The topic of performance was brought in into the PRs description. Even though currently, I think (as I'm not able to properly measure and profile due to how all grass+python works currently, or at least do it easily) the biggest slowdowns/bottlenecks are the uncountable process calls (launching a binary)...

I measured these times way back when in experiment with small Python processes. While I don't have the code or the results anymore, I'm sure it is still an issue. Tools written in Python, even when executed directly in command line, invoke g.parser to deal with their command line parameters. In #2923, I'm even adding another subprocess call to deal with with command line parameters before the tool is executed. This is not ideal, but usually more stable and straightforward than going through ctypes. Ultimately, having direct library function access to the tools (in C or whatever) would be the best, but it is worth seeing what the best options are now for the subprocesses. From API perspective, that's what #2923 is doing. From performance perspective, we are mostly limited by the subprocess call cost which is out of our hands, but from some experiment at some point, it looked like Linux is much faster than Windows for subprocess execution, so perhaps there is still room for improvement. Let's have this discussion some time later.

...What is the "cost" of the imports we do, and ending up to "import the world". It's a bit easier to measure, as there's a python cli flag to trace the import times...

We did some work to avoid ctypes for stability reasons, but the Python library was originally minimal and started in the time when from x import * was the favorite style, so a lot of that legacy is still there. I'm experimenting with lazy imports through module-level __getattr__ in #2923, so that's perhaps a way how to reduce some of the times. It may not make sense for grass.script imports from core.py, but it may make sense for other things.

I already know from older exploration that when an import of one of our modules is made to one of our ctypes wrappers, that the import times significantly increase...

The complexity of ctypes is one of the reasons why subprocesses, while far from perfect, still seem to be somewhat okay. :-)

…aning GISRC without processing g.proj run.

echoix · 2025-06-19T22:42:37Z

Looks promising and eager to try it out

echoix

I didn't try it out manually yet, but if you need to merge it before I get to it, it still looks right, and tests are right.

github-actions bot added vector Related to vector data processing temporal Related to temporal data processing Python Related code is in Python libraries module general tests Related to Test Suite notebook labels Jun 6, 2025

echoix reviewed Jun 6, 2025

View reviewed changes

wenzeslaus and others added 4 commits June 18, 2025 13:55

Remove three forgotten pylint: disable=protected-access

72c7402

Remove one variable, construct the path directly from a string

c3c478b

Test overwritting a project which for XY project should also test cle…

a885595

…aning GISRC without processing g.proj run.

Merge branch 'main' into better-env-handling-in-create-project

4e43192

echoix approved these changes Jun 20, 2025

View reviewed changes

wenzeslaus merged commit 9dd92f9 into OSGeo:main Jun 20, 2025
27 checks passed

wenzeslaus deleted the better-env-handling-in-create-project branch June 20, 2025 19:13

github-actions bot added this to the 8.5.0 milestone Jun 20, 2025

This was referenced Jun 24, 2025

[Bug] [help-needed] gs.create_project doesn't seem enough to get a working session on Windows #4480

Open

grass.script: Always use env for shutil.which #5717

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comments

grass.script: More carefully setup env in create_project#5854

grass.script: More carefully setup env in create_project#5854
wenzeslaus merged 5 commits intoOSGeo:mainfrom
wenzeslaus:better-env-handling-in-create-project

wenzeslaus commented Jun 6, 2025

Uh oh!

echoix left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

echoix commented Jun 6, 2025

Uh oh!

wenzeslaus commented Jun 7, 2025

Uh oh!

wenzeslaus commented Jun 18, 2025

Uh oh!

echoix commented Jun 19, 2025

Uh oh!

echoix left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Comments

Conversation

wenzeslaus commented Jun 6, 2025

Uh oh!

echoix left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

echoix commented Jun 6, 2025

Uh oh!

wenzeslaus commented Jun 7, 2025

Uh oh!

wenzeslaus commented Jun 18, 2025

Uh oh!

echoix commented Jun 19, 2025

Uh oh!

echoix left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants