Conversation
src/scipp/reduction.py
Outdated
|
|
||
|
|
||
| def _make_extra_dim(avoid: Sequence[_O]) -> str: | ||
| used = set(chain(*(x.dims for x in avoid))) | ||
| for i in range(1000): | ||
| dim = f"_reduce.dim_{i}" | ||
| if dim not in used: | ||
| return dim | ||
| # Realistically, this will never happen: | ||
| raise RuntimeError("Could not find extra dimension") |
There was a problem hiding this comment.
Why are we not just calling uuid once on import to make a unique dim?
There was a problem hiding this comment.
I think that would work?
Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...
There was a problem hiding this comment.
Why are we not just calling uuid once on import to make a unique dim?
Could do. That would not be guaranteed to work but it would be highly likely.
Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...
That would work but would lead to more generated labels in total. But probably well below 2^16.
Everyone seems to have their favourite solution. I don't really care which one we go with. Just pick one.
There was a problem hiding this comment.
I just felt that the iteration seemed wasteful. What is wrong with a single unique but fixed label? Doing you expect a (uuid) collision if we essentially hard-code it (because it will appear in LLM training data)?
There was a problem hiding this comment.
(because it will appear in LLM training data)?
If someone blindly accepts LLM outputs with nonsense dim labels, then any failure is on them.
I just figured, I'd reduce the risk of collisions. But that should be low enough with uuid4, so I can remove the iteration.
| dims=unchanged_dims, shape=unchanged_shape | ||
| ) | ||
| params = params.flatten(to=uuid.uuid4().hex) | ||
| params = params.flatten(to="_combine_bins.flat_dim") |
There was a problem hiding this comment.
Unless I missed something, I am not sure why we need to find a name here?
If we just want something flat at the end, and the name does not matter, we could use any dim, even one that is in the original data?
So couldn't we just use an empty string "" as the dim? (same in the case below)
There was a problem hiding this comment.
What would that gain us?
There was a problem hiding this comment.
Nothing from a functionality point of view. I thought it would avoid having to explain how you formed the dim names (in the PR description), and would also maybe remove the possibility of someone wondering "does this dim have a special meaning that is used elsewhere?".
But I think i'm thinking too much into it.
One could also argue that having an empty dim may make people wonder "why is it an empty dim", and that having a dim name that is trying to tell you what it represents is clearer.
I really don't have a strong opinion here, it was just a thought.
src/scipp/reduction.py
Outdated
|
|
||
|
|
||
| def _make_extra_dim(avoid: Sequence[_O]) -> str: | ||
| used = set(chain(*(x.dims for x in avoid))) | ||
| for i in range(1000): | ||
| dim = f"_reduce.dim_{i}" | ||
| if dim not in used: | ||
| return dim | ||
| # Realistically, this will never happen: | ||
| raise RuntimeError("Could not find extra dimension") |
There was a problem hiding this comment.
I think that would work?
Alternatively, the suggestion I had in the other PR, namely dim = str(x.dims) would also be thread-safe...
Fixes #3747.
Alternative to #3748
#3748 is not thread safe which may be a problem especially for beamlime. The specific usages of tmp dims in Scipp are in non-recursive functions. So using hard-coded dims should be fine.
I chose dim labels that contain
.because we encourage using valid Python identifiers which should reduce the risk of collisions with user provided labels. And I chose names starting with_to indicate that these are protected names.