Make tmp dims with hard-coded strings instead of generator by jl-wynen · Pull Request #3750 · scipp/scipp

jl-wynen · 2025-08-21T08:15:34Z

Fixes #3747.
Alternative to #3748

#3748 is not thread-safe, which may be a problem, especially for beamlime. The specific usages of tmp dims in Scipp are in non-recursive functions, so using hard-coded dims should be fine.

I chose dim labels that contain . because we encourage using valid Python identifiers as labels, which should reduce the risk of collisions with user-provided labels. And I chose names starting with _ to indicate that these are protected names.
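
To illustrate the naming rationale: a label containing . can never be a valid Python identifier, so users who follow the identifier recommendation cannot produce one by accident. A minimal sketch using the two label styles that appear in this PR; the helper name is hypothetical and not part of Scipp:

```python
# Labels in the style used in this PR: leading "_" plus a "." separator.
TMP_DIMS = ("_reduce.dim_0", "_combine_bins.flat_dim")


def is_identifier_label(label: str) -> bool:
    """Hypothetical helper: True if a label is a valid Python identifier."""
    return label.isidentifier()


# Typical user-provided labels are identifiers; the tmp labels are not.
user_labels = ("x", "time", "detector_id")
print([is_identifier_label(d) for d in user_labels])
print([is_identifier_label(d) for d in TMP_DIMS])
```

This only lowers the collision risk. Nothing stops a user from choosing a label with a dot in it.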

SimonHeybrock · 2025-08-21T08:18:26Z

src/scipp/reduction.py



+def _make_extra_dim(avoid: Sequence[_O]) -> str:
+    used = set(chain(*(x.dims for x in avoid)))
+    for i in range(1000):
+        dim = f"_reduce.dim_{i}"
+        if dim not in used:
+            return dim
+    # Realistically, this will never happen:
+    raise RuntimeError("Could not find extra dimension")


Why are we not just calling uuid once on import to make a unique dim?

I think that would work?
Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...

Why are we not just calling uuid once on import to make a unique dim?

Could do. That would not be guaranteed to work but it would be highly likely.

Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...

That would work but would lead to more generated labels in total. But probably well below 2^16.

Everyone seems to have their favourite solution. I don't really care which one we go with. Just pick one.

I just felt that the iteration seemed wasteful. What is wrong with a single unique but fixed label? Do you expect a (uuid) collision if we essentially hard-code it (because it will appear in LLM training data)?

(because it will appear in LLM training data)?

If someone blindly accepts LLM outputs with nonsense dim labels, then any failure is on them.

I just figured I'd reduce the risk of collisions. But that should be low enough with uuid4, so I can remove the iteration.
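
For comparison, the three options discussed in this thread can be sketched side by side without depending on Scipp. HasDims is a hypothetical stand-in that only models a dims tuple, and the names make_extra_dim, UUID_DIM, and dim_from_dims are illustrative, not the merged implementation:

```python
import uuid
from collections.abc import Iterable
from itertools import chain


class HasDims:
    """Stand-in for a Scipp object; only the dims attribute is modelled."""

    def __init__(self, *dims: str) -> None:
        self.dims = tuple(dims)


def make_extra_dim(avoid: Iterable[HasDims]) -> str:
    """Iterating variant from the diff: first '_reduce.dim_<i>' not in use."""
    used = set(chain.from_iterable(x.dims for x in avoid))
    for i in range(1000):
        dim = f"_reduce.dim_{i}"
        if dim not in used:
            return dim
    raise RuntimeError("Could not find extra dimension")


# Variant: one uuid generated at import and reused by every call.
# It is only read after import, so threads share no mutable state.
UUID_DIM = uuid.uuid4().hex


def dim_from_dims(x: HasDims) -> str:
    """Variant: derive the label from the object's own dims."""
    return str(x.dims)


a = HasDims("x", "y")
b = HasDims("_reduce.dim_0")
print(make_extra_dim([a]))
print(make_extra_dim([a, b]))
print(dim_from_dims(a))
```

The iterating variant is the only one that checks the inputs for a collision. The other two rely on a collision being very unlikely (uuid) or on the derived string being longer than any single label it was built from (str of the dims).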

nvaytet · 2025-08-21T08:32:42Z

src/scipp/core/bin_remapping.py

        dims=unchanged_dims, shape=unchanged_shape
    )
-    params = params.flatten(to=uuid.uuid4().hex)
+    params = params.flatten(to="_combine_bins.flat_dim")


Unless I missed something, I am not sure why we need to find a name here?
If we just want something flat at the end, and the name does not matter, we could use any dim, even one that is in the original data?
So couldn't we just use an empty string "" as the dim? (same in the case below)

What would that gain us?

Nothing from a functionality point of view. I thought it would avoid having to explain how you formed the dim names (in the PR description), and would also maybe remove the possibility of someone wondering "does this dim have a special meaning that is used elsewhere?".

But I think I'm reading too much into it.
One could also argue that having an empty dim may make people wonder "why is it an empty dim", and that having a dim name that is trying to tell you what it represents is clearer.

I really don't have a strong opinion here, it was just a thought.

nvaytet · 2025-08-21T08:37:46Z

src/scipp/reduction.py



+def _make_extra_dim(avoid: Sequence[_O]) -> str:
+    used = set(chain(*(x.dims for x in avoid)))
+    for i in range(1000):
+        dim = f"_reduce.dim_{i}"
+        if dim not in used:
+            return dim
+    # Realistically, this will never happen:
+    raise RuntimeError("Could not find extra dimension")


I think that would work?
Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...

Avoid making dims with uuid

177f9cc

SimonHeybrock reviewed Aug 21, 2025

nvaytet reviewed Aug 21, 2025

Use static dim name

5eed5cc

SimonHeybrock approved these changes Aug 22, 2025

jl-wynen merged commit ece3c8c into main Aug 22, 2025
4 checks passed

jl-wynen deleted the no-uuid-2 branch August 22, 2025 09:04

jl-wynen merged 2 commits into main from no-uuid-2

3 participants
