[syntax-errors] detect duplicate keyword arguments #17804

Open

eduardorittner wants to merge 1 commit into astral-sh:main from eduardorittner:dup-keyargs

Conversation

@eduardorittner
Summary

Part of #17412

Detects duplicate keyword arguments in function calls.

Test Plan

Added inline tests to validate the intended behavior.

Member

@MichaReiser MichaReiser left a comment

I have a small perf comment, but this otherwise looks good to me. I'll leave it to @ntBre to approve.

Comment on lines 521 to 563
let mut unique_keyword_args = FxHashSet::default();
for key in args.keywords.iter() {
    if let Some(ident) = &key.arg {
        if !unique_keyword_args.insert(ident) {
            let range = ident.range();
            // test_err duplicate_keyword_args
            // def foo(x): ...
            // foo(x=1, x=2)
            // def baz(x, y, z): ...
            // baz(x, y=1, z=3, y=4)

            // test_ok non_duplicate_keyword_args
            // def foo(x): ...
            // foo(x=1)
            // def bar(x, y, z): ...
            // bar(x="a", y=1, z=True)
            Self::add_error(
                ctx,
                SemanticSyntaxErrorKind::DuplicateKeywordArgs(ident.to_string()),
                range,
            );
        }
    }
}
Member

Nit: it would be nice if we could avoid allocating a hash set when we know that there are fewer than 2 keyword arguments.

let mut keyword_arguments = args.keywords.iter().filter_map(|key| key.arg.as_ref()).peekable();

let Some(first) = keyword_arguments.next() else {
    return;
};

if keyword_arguments.peek().is_none() {
    return;
}

let mut unique_args = FxHashSet::default();
unique_args.insert(first);

for arg in keyword_arguments {
    if !unique_args.insert(arg) {
        let range = arg.range();
        // test_err duplicate_keyword_args
        // def foo(x): ...
        // foo(x=1, x=2)
        // def baz(x, y, z): ...
        // baz(x, y=1, z=3, y=4)

        // test_ok non_duplicate_keyword_args
        // def foo(x): ...
        // foo(x=1)
        // def bar(x, y, z): ...
        // bar(x="a", y=1, z=True)
        Self::add_error(
            ctx,
            SemanticSyntaxErrorKind::DuplicateKeywordArgs(arg.to_string()),
            range,
        );
    }
}

Another alternative would be to make the FxHashSet a field on the visitor and reuse it (we'd have to make sure we always call clear()). That would ensure we allocate at most one hash set. I think I'd prefer that, because we could use it for other checks too.

Author

Yeah, that makes sense! By visitor do you mean SemanticSyntaxChecker or one of InvalidExpressionVisitor, ReboundComprehensionVisitor, or MatchPatternVisitor?

Contributor

I think Micha means SemanticSyntaxChecker, which would also require passing a &mut self parameter to this function.

Author

Yeah, that's what I was wondering. In this case, if we wanted to avoid cloning, we'd have a HashSet<&Identifier>, which means adding a lifetime to SemanticSyntaxChecker and to most of the places it's used. The borrow checker was giving me some trouble, but I'll take another shot at it.

Author

I got it to compile using unsafe like this:

    fn duplicate_keyword_args<Ctx: SemanticSyntaxContext>(
        &'a mut self,
        args: &ast::Arguments,
        ctx: &Ctx,
    ) {
        let args: &'a ast::Arguments = unsafe { std::mem::transmute(args) };

        for key in &args.keywords {
            if let Some(ident) = &key.arg {
                if !self.keyword_args.insert(ident) {
                    let range = ident.range();

                    Self::add_error(
                        ctx,
                        SemanticSyntaxErrorKind::DuplicateKeywordArgs(ident.to_string()),
                        range,
                    );
                }
            }
        }

        self.keyword_args.clear();
    }

The problem is that the borrow checker doesn't understand that calling clear() at the end of the function invalidates the references to &Arguments stored in the HashSet, and since mutable references are invariant, it concludes that &Arguments must outlive &mut self. We know that isn't actually required, since any references into &Arguments are always cleared at the end of the function, so I used unsafe to coerce the &Arguments lifetime into the same lifetime as &mut self.

The problems with this approach are:

  1. We are using unsafe. Even though it seems pretty "safe" here, I'm not sure what ruff's stance on using unsafe is.
  2. It raises lifetime errors in tests/fixtures.rs; these can probably be solved but would still require some work.

The other options I can think of are:

  1. Leaving it as is, allocating and deallocating a hash set on every function call.
  2. Reusing the hash set, but making it a FxHashSet<ast::Identifier> and calling clone() on every argument.

Both of these options allocate more memory than necessary, though I'm not sure which is costlier (I'd guess cloning is usually cheaper, but I haven't actually tested it).

Let me know what your thoughts are.
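For illustration, option 2 (a reused set of owned keys, cleared after each call) can be sketched in isolation. This is only a hedged, standalone sketch: Checker stands in for SemanticSyntaxChecker, std's HashSet replaces FxHashSet, and argument names are plain strings rather than ast::Identifier values.

```rust
use std::collections::HashSet;

/// Simplified stand-in for SemanticSyntaxChecker: it owns a set that is
/// reused (and cleared) across calls, so the allocation is made once.
struct Checker {
    keyword_args: HashSet<String>,
}

impl Checker {
    fn new() -> Self {
        Checker { keyword_args: HashSet::new() }
    }

    /// Returns the names of duplicated keyword arguments at one call site.
    /// `None` entries model positional arguments, which have no name.
    /// Cloning each name (`to_string`) sidesteps the lifetime problem, at
    /// the cost of one allocation per keyword argument.
    fn duplicate_keyword_args(&mut self, args: &[Option<&str>]) -> Vec<String> {
        let mut duplicates = Vec::new();
        for arg in args.iter().flatten() {
            if !self.keyword_args.insert((*arg).to_string()) {
                duplicates.push((*arg).to_string());
            }
        }
        // Clearing keeps the backing allocation but drops the owned keys,
        // so the next call starts fresh without borrowing from `args`.
        self.keyword_args.clear();
        duplicates
    }
}

fn main() {
    let mut checker = Checker::new();
    // foo(x=1, x=2) -> `x` is duplicated
    assert_eq!(checker.duplicate_keyword_args(&[Some("x"), Some("x")]), vec!["x"]);
    // baz(x, y=1, z=3, y=4) -> positional `x` is None, `y` repeats
    assert_eq!(
        checker.duplicate_keyword_args(&[None, Some("y"), Some("z"), Some("y")]),
        vec!["y"]
    );
}
```

Because the keys are owned, the set borrows nothing from the arguments, so no lifetime parameter is needed on the checker.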

Author

Funnily enough, I did try something similar before and couldn't get it to work, but now it worked, thanks! Now I should just need to add some lifetime annotations where SemanticSyntaxChecker is used elsewhere in the codebase.

Author

Yeah, I can't do it: changing this lifetime raised a bunch of lifetime errors (close to 80) in other parts of the codebase, mostly in the Visitor implementations and with_semantic_checker calls. Maybe there's a way to fix them with some specific annotations, but I couldn't figure it out, so I'll leave it like this for now.

Contributor

Thanks for trying. I think this is okay. codspeed is only showing a few ~1% performance regressions, which I think is pretty typical for any new rule. I think we just need to resolve the question about the duplicate diagnostic (and the clippy errors) and then we can land this.

Contributor

It may not overlap 100%, but there is an open issue in ty (astral-sh/ty#119) to avoid emitting type inference diagnostics in the presence of syntax errors, which might be enough to ignore the duplication issue for the sake of this PR.

Member

I think this goes somewhat beyond #119, at least beyond what I had in mind for it. The goal of #119 is to avoid type-checker errors for nodes that have non-semantic or version-related syntax errors.

@MichaReiser MichaReiser self-requested a review May 3, 2025 14:26
@github-actions
Contributor

github-actions bot commented May 3, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

mesonbuild/meson-python (error)

warning: Detected debug build without --no-cache.
error: Failed to read tests/packages/symlinks/baz.py: No such file or directory (os error 2)
error: Failed to read tests/packages/symlinks/qux.py: No such file or directory (os error 2)

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

mesonbuild/meson-python (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read tests/packages/symlinks/baz.py: No such file or directory (os error 2)
error: Failed to read tests/packages/symlinks/qux.py: No such file or directory (os error 2)

@eduardorittner
Author

It looks like there is already an error type for duplicate keyword arguments in red_knot. I ran into this when adding integration tests to semantic_syntax_errors.md following #17526. The tests trigger the ParameterAlreadyAssigned error instead of the new DuplicateKeywordArgs.

/// Multiple arguments were provided for a single parameter.
ParameterAlreadyAssigned {
    argument_index: Option<usize>,
    parameter: ParameterContext,
},

How should we proceed?

Contributor

@ntBre ntBre left a comment

Thanks, this looks good to me!

It looks like there is already an error type for duplicate keyword arguments in red_knot. I ran into this when adding integration tests to semantic_syntax_errors.md following #17526. The tests trigger the ParameterAlreadyAssigned error instead of the new DuplicateKeywordArgs.

I think we may want to remove the other check and emit this as a syntax error. What do you think, @MichaReiser?

@MichaReiser
Member

I don't think we can simply replace them because ty checks more. For example:

# error: 13 [missing-argument] "No argument provided for required parameter `x` of function `f`"
# error: 18 [parameter-already-assigned] "Multiple values provided for parameter `x` of function `f`"
reveal_type(f(1, x=2))  # revealed: int

I think we have two options here:

  • We skip this check for ty
  • We change the check in ty to not emit a diagnostic for the simple case where x is repeated on the call site

@dcreager, what's your take on this? You probably know the call checking in ty best.

@ntBre
Contributor

ntBre commented May 5, 2025

Ah I see, and that case (f(1, x=2)) is not actually a syntax error, so we can't just extend the semantic error check either. I'm happy with either of those options, whichever the ty team prefers!

We could just add a filter here, if that's the approach we want to take:

diagnostics.extend_diagnostics(
    index
        .semantic_syntax_errors()
        .iter()
        .map(|error| create_semantic_syntax_diagnostic(file, error)),
);

@MichaReiser
Member

My preference (because of consistency) would be to change the check during type inference to skip over errors that are known syntax errors, unless this is hard for some reason.

Detects duplicate keyword (named) arguments passed to functions. The
hash set allocation is reused between function calls. Ideally it would be
of type FxHashSet<&ast::Identifier> to avoid cloning; I just wasn't able
to do this due to the lifetime errors that arose.