[syntax-errors] detect duplicate keyword arguments #17804

Open

eduardorittner wants to merge 1 commit into astral-sh:main from eduardorittner:dup-keyargs

Conversation

@eduardorittner
Summary

Part of #17412

Detects duplicate keyword arguments in function calls.

Test Plan

Added inline tests to validate the intended behavior.

Member

@MichaReiser MichaReiser left a comment

I have a small perf comment, but this otherwise looks good to me. I'll leave it to @ntBre to approve.

Comment on lines 521 to 563
let mut unique_keyword_args = FxHashSet::default();
for key in args.keywords.iter() {
    if let Some(ident) = &key.arg {
        if !unique_keyword_args.insert(ident) {
            let range = ident.range();
            // test_err duplicate_keyword_args
            // def foo(x): ...
            // foo(x=1, x=2)
            // def baz(x, y, z): ...
            // baz(x, y=1, z=3, y=4)

            // test_ok non_duplicate_keyword_args
            // def foo(x): ...
            // foo(x=1)
            // def bar(x, y, z): ...
            // bar(x="a", y=1, z=True)
            Self::add_error(
                ctx,
                SemanticSyntaxErrorKind::DuplicateKeywordArgs(ident.to_string()),
                range,
            );
        }
    }
}
Member

Nit: it would be nice if we could avoid allocating a hash set when we know that there are fewer than 2 keyword arguments.

let mut keyword_arguments = args.keywords.iter().filter_map(|key| key.arg.as_ref()).peekable();

let Some(first) = keyword_arguments.next() else {
    return;
};

if keyword_arguments.peek().is_none() {
    return;
}

let mut unique_args = FxHashSet::default();
unique_args.insert(first);

for arg in keyword_arguments {
    if !unique_args.insert(arg) {
        let range = arg.range();
        // test_err duplicate_keyword_args
        // def foo(x): ...
        // foo(x=1, x=2)
        // def baz(x, y, z): ...
        // baz(x, y=1, z=3, y=4)

        // test_ok non_duplicate_keyword_args
        // def foo(x): ...
        // foo(x=1)
        // def bar(x, y, z): ...
        // bar(x="a", y=1, z=True)
        Self::add_error(
            ctx,
            SemanticSyntaxErrorKind::DuplicateKeywordArgs(arg.to_string()),
            range,
        );
    }
}

Another alternative would be to make the FxHashSet a field on the visitor and reuse it (we'd have to make sure we always call clear()). That would ensure we allocate at most one hash set. I think I'd prefer that, because we could use it for other checks too.

Author

Yeah, that makes sense! By visitor do you mean SemanticSyntaxChecker or one of InvalidExpressionVisitor, ReboundComprehensionVisitor, or MatchPatternVisitor?

Contributor

I think Micha means SemanticSyntaxChecker, which would also require passing a &mut self parameter to this function.

Author

Yeah, that's what I was wondering. In this case, if we wanted to avoid cloning, we'd have a HashSet<&Identifier>, which means adding a lifetime to SemanticSyntaxChecker and to most of the places it's used. The borrow checker was giving me some trouble, but I'll take another shot at it.

Author

I got it to compile using unsafe like this:

    fn duplicate_keyword_args<Ctx: SemanticSyntaxContext>(
        &'a mut self,
        args: &ast::Arguments,
        ctx: &Ctx,
    ) {
        let args: &'a ast::Arguments = unsafe { std::mem::transmute(args) };

        for key in &args.keywords {
            if let Some(ident) = &key.arg {
                if !self.keyword_args.insert(ident) {
                    let range = ident.range();

                    Self::add_error(
                        ctx,
                        SemanticSyntaxErrorKind::DuplicateKeywordArgs(ident.to_string()),
                        range,
                    );
                }
            }
        }

        self.keyword_args.clear();
    }

The problem is that the borrow checker doesn't understand that calling clear() at the end of the function invalidates the references to &Arguments stored in the HashSet, and since mutable references are invariant, it concludes that &Arguments must outlive &mut self. We know that isn't actually required, since any references into &Arguments are always cleared at the end of the function, so I used unsafe to coerce the &Arguments lifetime into the same lifetime as &mut self.

The problems with this approach are:

  1. We are using unsafe. Even though it seems pretty "safe" here, I'm not sure what ruff's stance on using unsafe is.
  2. It raises lifetime errors in tests/fixtures.rs; these can probably be solved but would still require some work.

The other options I can think of are:

  1. Leaving it as is, allocating and deallocating a hash set on every function call.
  2. Reusing the hash set, but making it a FxHashSet<ast::Identifier> and calling clone() on every argument.

Both of these options allocate more memory than necessary, though I'm not sure which is costlier (I'd guess cloning is usually cheaper, but I haven't actually tested it).

Let me know what your thoughts are.
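For illustration, option 2 (a reused set of owned keys, cleared after each call) can be sketched in isolation. This is only a hedged, standalone sketch: Checker stands in for SemanticSyntaxChecker, std's HashSet replaces FxHashSet, and argument names are plain strings rather than ast::Identifier values.

```rust
use std::collections::HashSet;

/// Simplified stand-in for SemanticSyntaxChecker: it owns a set that is
/// reused (and cleared) across calls, so the allocation is made once.
struct Checker {
    keyword_args: HashSet<String>,
}

impl Checker {
    fn new() -> Self {
        Checker { keyword_args: HashSet::new() }
    }

    /// Returns the names of duplicated keyword arguments at one call site.
    /// `None` entries model positional arguments, which have no name.
    /// Cloning each name (`to_string`) sidesteps the lifetime problem, at
    /// the cost of one allocation per keyword argument.
    fn duplicate_keyword_args(&mut self, args: &[Option<&str>]) -> Vec<String> {
        let mut duplicates = Vec::new();
        for arg in args.iter().flatten() {
            if !self.keyword_args.insert((*arg).to_string()) {
                duplicates.push((*arg).to_string());
            }
        }
        // Clearing keeps the backing allocation but drops the owned keys,
        // so the next call starts fresh without borrowing from `args`.
        self.keyword_args.clear();
        duplicates
    }
}

fn main() {
    let mut checker = Checker::new();
    // foo(x=1, x=2) -> `x` is duplicated
    assert_eq!(checker.duplicate_keyword_args(&[Some("x"), Some("x")]), vec!["x"]);
    // baz(x, y=1, z=3, y=4) -> positional `x` is None, `y` repeats
    assert_eq!(
        checker.duplicate_keyword_args(&[None, Some("y"), Some("z"), Some("y")]),
        vec!["y"]
    );
}
```

Because the keys are owned, the set borrows nothing from the arguments, so no lifetime parameter is needed on the checker.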

Author

Funnily enough, I did try something similar before and couldn't get it to work, but now it worked, thanks! Now I should just need to add some lifetime annotations where SemanticSyntaxChecker is used elsewhere in the codebase.

Author

Yeah, I can't do it: changing this lifetime raised a bunch of lifetime errors (close to 80) in other parts of the codebase, mostly in the Visitor implementations and with_semantic_checker calls. Maybe there's a way to fix them with some specific annotations, but I couldn't figure it out, so I'll leave it like this for now.

Contributor

Thanks for trying. I think this is okay. codspeed is only showing a few ~1% performance regressions, which I think is pretty typical for any new rule. I think we just need to resolve the question about the duplicate diagnostic (and the clippy errors) and then we can land this.

Contributor

It may not overlap 100%, but there is an open issue in ty (astral-sh/ty#119) to avoid emitting type inference diagnostics in the presence of syntax errors, which might be enough to ignore the duplication issue for the sake of this PR.

Member

I think this goes somewhat beyond #119, at least beyond what I had in mind for it. The goal of #119 is to avoid type-checker errors for nodes that have non-semantic or version-related syntax errors.

@MichaReiser MichaReiser self-requested a review May 3, 2025 14:26
@github-actions
Contributor

github-actions bot commented May 3, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

mesonbuild/meson-python (error)

warning: Detected debug build without --no-cache.
error: Failed to read tests/packages/symlinks/baz.py: No such file or directory (os error 2)
error: Failed to read tests/packages/symlinks/qux.py: No such file or directory (os error 2)

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

mesonbuild/meson-python (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read tests/packages/symlinks/baz.py: No such file or directory (os error 2)
error: Failed to read tests/packages/symlinks/qux.py: No such file or directory (os error 2)

@eduardorittner
Author

It looks like there is already an error type for duplicate keyword arguments in red_knot. I ran into this when adding integration tests to semantic_syntax_errors.md following #17526. The tests trigger the ParameterAlreadyAssigned error instead of the new DuplicateKeywordArgs.

/// Multiple arguments were provided for a single parameter.
ParameterAlreadyAssigned {
    argument_index: Option<usize>,
    parameter: ParameterContext,
},

How should we proceed?

Contributor

@ntBre ntBre left a comment

Thanks, this looks good to me!

It looks like there is already an error type for duplicate keyword arguments in red_knot. I ran into this when adding integration tests to semantic_syntax_errors.md following #17526. The tests trigger the ParameterAlreadyAssigned error instead of the new DuplicateKeywordArgs.

I think we may want to remove the other check and emit this as a syntax error. What do you think, @MichaReiser?

@MichaReiser
Member

I don't think we can simply replace them because ty checks more. For example:

# error: 13 [missing-argument] "No argument provided for required parameter `x` of function `f`"
# error: 18 [parameter-already-assigned] "Multiple values provided for parameter `x` of function `f`"
reveal_type(f(1, x=2))  # revealed: int

I think we have two options here:

  • We skip this check for ty
  • We change the check in ty to not emit a diagnostic for the simple case where x is repeated on the call site

@dcreager, what's your take on this? You probably know the call checking in ty best.

@ntBre
Contributor

ntBre commented May 5, 2025

Ah I see, and that case (f(1, x=2)) is not actually a syntax error, so we can't just extend the semantic error check either. I'm happy with either of those options, whichever the ty team prefers!

We could just add a filter here, if that's the approach we want to take:

diagnostics.extend_diagnostics(
    index
        .semantic_syntax_errors()
        .iter()
        .map(|error| create_semantic_syntax_diagnostic(file, error)),
);

@MichaReiser
Member

My preference (because of consistency) would be to change the check during type inference to skip over errors that are known syntax errors, unless this is hard for some reason.

Detects duplicate keyword (named) arguments passed to functions. The
hash set allocation is reused between function calls. Ideally it would be
of type FxHashSet<&ast::Identifier> to avoid cloning; I just wasn't able
to do this due to the lifetime errors that arose.