Skip to content

Bring name tracker fix#82

Merged
LiaCastaneda merged 1 commit intobranch-52from
lia/bring-name-tracker-fx
Feb 26, 2026
Merged

Bring name tracker fix#82
LiaCastaneda merged 1 commit intobranch-52from
lia/bring-name-tracker-fx

Conversation

@LiaCastaneda
Copy link

Brings apache#19856. This will be included in v53.

- Closes apache#17508

The previous implementation used UUID-based aliasing as a workaround to
prevent duplicate names for literals in Substrait plans. This approach
had several drawbacks:
- Non-deterministic plan names that made testing difficult (requiring
UUID regex filters)
- Only addressed literal naming conflicts, not the broader issue of name
deduplication
- Added unnecessary dependency on the `uuid` crate
- Didn't properly handle cases where the same qualified name could
appear with different schema representations

  1. Enhanced NameTracker: Refactored to detect two types of conflicts:
- Duplicate schema names: Tracked via schema_name() to prevent
validate_unique_names failures (e.g., two Utf8(NULL) literals)
- Ambiguous references: Tracked via qualified_name() to prevent
DFSchema::check_names failures when a qualified field (e.g.,
left.Utf8(NULL)) and unqualified field (e.g., Utf8(NULL)) share the same
column name
2. **Removed UUID dependency**: Eliminated the `uuid` crate from
`datafusion/substrait`
3. **Removed literal-specific aliasing**: The UUID-based workaround in
`project_rel.rs` is no longer needed as the improved NameTracker handles
all naming conflicts consistently
4. **Deterministic naming**: Name conflicts now use predictable
`__temp__N` suffixes instead of random UUIDs

Note: This doesn't fully fix all the issues in apache#17508 which allow some
special casing of `CAST` which are not included here.

Yes:
- Updated snapshot tests to reflect the new deterministic naming (e.g.,
`Utf8("people")__temp__0` instead of UUID-based names)
- Modified some roundtrip tests to verify semantic equivalence (schema
matching and execution) rather than exact string matching, which is more
robust
- All existing integration tests pass with the new naming scheme

Minimal. The generated plan names are now deterministic and more
readable (using `__temp__N` suffixes instead of UUIDs), but this is
primarily an internal representation change. The functional behavior and
query results remain unchanged.

(cherry picked from commit d59cdfe)
@LiaCastaneda LiaCastaneda marked this pull request as ready for review February 26, 2026 10:45
@LiaCastaneda LiaCastaneda merged commit 7ed98ff into branch-52 Feb 26, 2026
60 of 61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants