[syntax-errors] detect duplicate keyword arguments #17804
eduardorittner wants to merge 1 commit into astral-sh:main
Conversation
Force-pushed from c52a8f6 to 0444011
MichaReiser
left a comment
I have a small perf comment, but this otherwise looks good to me. I'll leave it to @ntBre to approve.
```rust
let mut unique_keyword_args = FxHashSet::default();
for key in args.keywords.iter() {
    if let Some(ident) = &key.arg {
        if !unique_keyword_args.insert(ident) {
            let range = ident.range();
            // test_err duplicate_keyword_args
            // def foo(x): ...
            // foo(x=1, x=2)
            // def baz(x, y, z): ...
            // baz(x, y=1, z=3, y=4)

            // test_ok non_duplicate_keyword_args
            // def foo(x): ...
            // foo(x=1)
            // def bar(x, y, z): ...
            // foo(x="a", y=1, z=True)
            Self::add_error(
                ctx,
                SemanticSyntaxErrorKind::DuplicateKeywordArgs(ident.to_string()),
                range,
            );
        }
    }
}
```
Nit: it would be nice if we can avoid allocating a hash set if we know that there are fewer than 2 keyword arguments.
```rust
let mut keyword_arguments = args
    .keywords
    .iter()
    .filter_map(|key| key.arg.as_ref())
    .peekable();

let Some(first) = keyword_arguments.next() else {
    return;
};

if keyword_arguments.peek().is_none() {
    return;
}

let mut unique_args = FxHashSet::default();
unique_args.insert(first);

for arg in keyword_arguments {
    if !unique_args.insert(arg) {
        let range = arg.range();
        // test_err duplicate_keyword_args
        // def foo(x): ...
        // foo(x=1, x=2)
        // def baz(x, y, z): ...
        // baz(x, y=1, z=3, y=4)

        // test_ok non_duplicate_keyword_args
        // def foo(x): ...
        // foo(x=1)
        // def bar(x, y, z): ...
        // foo(x="a", y=1, z=True)
        Self::add_error(
            ctx,
            SemanticSyntaxErrorKind::DuplicateKeywordArgs(arg.to_string()),
            range,
        );
    }
}
```

Another alternative would be to make the `FxHashSet` a field on the visitor and reuse it (we'd have to make sure that we always call `clear()`). This would ensure that we allocate at most one hash set. I think I'd prefer that, because we could use it for other checks too.
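The peek-before-allocating pattern suggested above can be sketched in isolation. This is a hypothetical standalone version, not the PR's code: `find_duplicates` and its plain `&str` items stand in for the real `args.keywords` / `ast::Identifier` types, and `std::collections::HashSet` stands in for `FxHashSet`.

```rust
use std::collections::HashSet;

// Skip the hash-set allocation entirely unless there are at least two
// keyword arguments, since a single argument can never be a duplicate.
fn find_duplicates<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut iter = names.iter().copied().peekable();

    // Zero keyword arguments: nothing to do, no allocation.
    let Some(first) = iter.next() else {
        return Vec::new();
    };
    // Exactly one keyword argument: it cannot be a duplicate.
    if iter.peek().is_none() {
        return Vec::new();
    }

    // Two or more: now it is worth allocating the set.
    let mut seen = HashSet::new();
    seen.insert(first);

    let mut duplicates = Vec::new();
    for name in iter {
        // `insert` returns false if the value was already present.
        if !seen.insert(name) {
            duplicates.push(name);
        }
    }
    duplicates
}

fn main() {
    let empty: [&str; 0] = [];
    assert_eq!(find_duplicates(&empty), Vec::<&str>::new());
    assert_eq!(find_duplicates(&["x"]), Vec::<&str>::new());
    assert_eq!(find_duplicates(&["x", "y", "x"]), vec!["x"]);
    println!("ok");
}
```

The early returns keep the common cases (zero or one keyword argument) allocation-free, which is the whole point of the nit.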
Yeah, that makes sense! By visitor, do you mean `SemanticSyntaxChecker` or one of `InvalidExpressionVisitor`, `ReboundComprehensionVisitor`, or `MatchPatternVisitor`?
I think Micha means `SemanticSyntaxChecker`, which would also require passing a `&mut self` parameter to this function.
Yeah, that's what I was wondering. In this case, if we wanted to avoid cloning, we'd have a `HashSet<&Identifier>`, which means adding a lifetime to `SemanticSyntaxChecker` and most of the places it's used. The borrow checker was giving me some trouble, but I'll take another shot at it.
I got it to compile using `unsafe` like this:

```rust
fn duplicate_keyword_args<Ctx: SemanticSyntaxContext>(
    &'a mut self,
    args: &ast::Arguments,
    ctx: &Ctx,
) {
    let args: &'a ast::Arguments = unsafe { std::mem::transmute(args) };
    for key in &args.keywords {
        if let Some(ident) = &key.arg {
            if !self.keyword_args.insert(ident) {
                let range = ident.range();
                Self::add_error(
                    ctx,
                    SemanticSyntaxErrorKind::DuplicateKeywordArgs(ident.to_string()),
                    range,
                );
            }
        }
    }
    self.keyword_args.clear();
}
```

The problem is that the borrow checker doesn't understand that calling `clear()` at the end of the function invalidates the references to `&Arguments` stored in the `HashSet`, and since mutable references are invariant, it concludes that `&Arguments` must outlive `&mut self`. We know that isn't necessary, since any reference into `&Arguments` is always cleared at the end of the function, so I used `unsafe` to coerce the `&Arguments` lifetime to match that of `&mut self`.
The problems with this approach are:
- We are using `unsafe`; even though it seems pretty "safe" here, I'm not sure what ruff's stance on using `unsafe` is.
- This raises lifetime errors in tests/fixtures.rs; these can probably be solved but would still require some work.

The other options I can think of are:
- Leaving it as is, allocating and deallocating a hash set on every function call
- Reusing the hash set, but making it a `FxHashSet<ast::Identifier>` and calling `clone()` on every argument

Both of these options allocate more memory than necessary, and I'm not sure which is costlier (I suspect cloning would usually be cheaper, but I haven't actually tested it).
Let me know what your thoughts are.
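The second option (reuse the set, clone each identifier) can be sketched without any lifetime gymnastics. This is a hypothetical minimal version, not the PR's code: `Checker` with a `HashSet<String>` stands in for `SemanticSyntaxChecker` with an owned `FxHashSet<ast::Identifier>`.

```rust
use std::collections::HashSet;

// Keep an owned set on the checker and `clear()` it between calls. Cloning
// each identifier avoids the borrowed-lifetime problem at the cost of
// per-argument allocations; the set's own allocation is reused.
struct Checker {
    keyword_args: HashSet<String>,
}

impl Checker {
    fn new() -> Self {
        Checker {
            keyword_args: HashSet::new(),
        }
    }

    // Returns the duplicated names found in one call expression.
    fn check_call(&mut self, keywords: &[&str]) -> Vec<String> {
        let mut duplicates = Vec::new();
        for name in keywords {
            // `to_string()` here is the per-argument clone the discussion mentions.
            if !self.keyword_args.insert(name.to_string()) {
                duplicates.push(name.to_string());
            }
        }
        // Clearing keeps the set's capacity for the next call.
        self.keyword_args.clear();
        duplicates
    }
}

fn main() {
    let mut checker = Checker::new();
    assert_eq!(checker.check_call(&["x", "y", "x"]), vec!["x".to_string()]);
    // State was cleared, so the same names are fresh in the next call.
    assert_eq!(checker.check_call(&["x"]), Vec::<String>::new());
    println!("ok");
}
```

Since `clear()` keeps the set's capacity, only the string clones are repeated allocations; the set itself is allocated at most once.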
Funnily enough, I tried something similar to this earlier and couldn't get it to work, but now it worked, thanks! Now I just need to add some lifetime annotations where `SemanticSyntaxChecker` is used elsewhere in the code base.
Yeah, I mean I can't do it: changing this lifetime raised close to 80 lifetime errors in other parts of the codebase, mostly in the `Visitor` implementations and `with_semantic_checker` calls. Maybe there's a way to fix them using some specific annotations, but I couldn't figure it out, so I'll leave it like this for now.
Thanks for trying. I think this is okay. codspeed is only showing a few ~1% performance regressions, which I think is pretty typical for any new rule. I think we just need to resolve the question about the duplicate diagnostic (and the clippy errors) and then we can land this.
It may not overlap 100%, but there is an open issue in ty (astral-sh/ty#119) to avoid emitting type inference diagnostics in the presence of syntax errors, which might be enough to ignore the duplication issue for the sake of this PR.
It looks like there already is an error type for duplicate keyword arguments in red_knot. I ran into this when adding integration tests (see ruff/crates/ty_python_semantic/src/types/call/bind.rs, lines 1480 to 1484 at fe4051b). How should we proceed?
ntBre
left a comment
Thanks, this looks good to me!
It looks like there already is an error type for duplicate keyword arguments in red_knot. I ran into this when adding integration tests into semantic_syntax_errors.md following #17526. The tests trigger the `ParameterAlreadyAssigned` error instead of the new `DuplicateKeywordArgs`.
I think we may want to remove the other check and emit this as a syntax error. What do you think, @MichaReiser?
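For context on why a syntax error is defensible here: CPython itself rejects a textually repeated keyword argument at compile time, before any call semantics are considered. A minimal demonstration (the `foo` name is arbitrary; `compile` never needs it to be defined):

```python
# CPython reports a SyntaxError for a repeated keyword argument during
# compilation, so the construct never reaches runtime at all.
def check(source):
    try:
        compile(source, "<test>", "eval")
        return None  # compiled cleanly
    except SyntaxError as e:
        return e.msg

ok = check("foo(x=1)")        # unique keywords: compiles fine
dup = check("foo(x=1, x=2)")  # repeated keyword: SyntaxError

print(ok)
print(dup)
```

The exact message wording varies across Python versions, but it always mentions that the keyword argument is repeated.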
I don't think we can simply replace them because ty checks more. For example:

```py
# error: 13 [missing-argument] "No argument provided for required parameter `x` of function `f`"
# error: 18 [parameter-already-assigned] "Multiple values provided for parameter `x` of function `f`"
reveal_type(f(1, x=2))  # revealed: int
```

I think we have two options here:

@dcreager, what's your take on this? You probably know the call checking in ty the best.
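The distinction matters because `f(1, x=2)` contains no textually repeated keyword, so the duplicate-keyword syntax rule cannot fire, yet the call still fails at runtime. A hedged illustration (the `f` here is just a one-parameter stand-in):

```python
# `f(1, x=2)` assigns `x` twice (once positionally, once by keyword),
# but no keyword NAME is repeated, so it is not a syntax error. Only a
# call-binding check like ty's parameter-already-assigned catches it
# statically; at runtime it raises a TypeError.
def f(x):
    return x

# Compiles fine: the syntax rule only fires on repeated keyword names.
compile("f(1, x=2)", "<test>", "eval")

error_message = None
try:
    f(1, x=2)
except TypeError as e:
    error_message = str(e)

print(error_message)
```

So the two checks overlap on `f(x=1, x=2)` but only ty's binding check covers the positional-plus-keyword case, which is why the syntax error cannot simply replace it.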
Ah I see, and that case (…). We could just add a filter here, if that's the approach we want to take: ruff/crates/ty_python_semantic/src/types.rs, lines 97 to 102 at ea62fc9.
My preference (because of consistency) would be to change the check during type inference to skip over errors that are known syntax errors, unless this is hard for some reason.
Force-pushed from 0444011 to 3c2993b
Detects duplicate keyword (named) arguments passed to functions. The hash set allocation is reused between function calls. Ideally it would be of type `FxHashSet<&ast::Identifier>` to avoid cloning; I just wasn't able to do this due to the lifetime errors that arose.
Force-pushed from 3c2993b to ff59236
Summary
Part of #17412
Detects duplicate keyword arguments in function calls.
Test Plan
Added inline tests to validate the intended behavior.