Introduce TypeSignature::Comparable and update NullIf signature#13356
Introduce TypeSignature::Comparable and update NullIf signature#13356jayzhan211 merged 8 commits intoapache:mainfrom
NullIf signature#13356Conversation
Signed-off-by: Jay Zhan <[email protected]>
Signed-off-by: Jay Zhan <[email protected]>
|
@Omega359 ptal |
Omega359
left a comment
There was a problem hiding this comment.
I have concerns about the signature change causing queries that used to work to now fail. I am not certain that the signature change for nullif is for the better for that reason.
With respect to the TypeSignature::Boolean I left some comments.
| return Ok(t); | ||
| } | ||
| } else { | ||
| // TODO: Deprecate this branch after all signatures are well-supported (aka coercion are happend already) |
There was a problem hiding this comment.
| // TODO: Deprecate this branch after all signatures are well-supported (aka coercion are happend already) | |
| // TODO: Deprecate this branch after all signatures are well-supported (aka coercion has happened already) |
| TypeSignature::Boolean(number) => { | ||
| function_length_check(current_types.len(), *number)?; | ||
|
|
||
| // Find common boolean type amongs given types |
There was a problem hiding this comment.
| // Find common boolean type amongs given types | |
| // Find common boolean type amongst the given types |
| function_length_check(current_types.len(), *number)?; | ||
|
|
||
| // Find common boolean type amongs given types | ||
| let mut valid_type = current_types.first().unwrap().to_owned(); |
There was a problem hiding this comment.
Unless I'm mistaken there is only one possible valid type here - Boolean doesn't have multiple types, does it? If so, I don't see the need for this variable nor the code below the for loop. valid_type must be DataType::Boolean, no?
There was a problem hiding this comment.
This is to ensure the given types are all boolean, if it isn't we should return error here. If we got null, we return boolean
| 2, | ||
| SUPPORTED_NULLIF_TYPES.to_vec(), | ||
| signature: Signature::one_of( | ||
| // Hack: String is at the beginning so the return type is String if both args are Nulls |
There was a problem hiding this comment.
Not sure I'd call that a hack tbh
| ---- | ||
| 1 | ||
|
|
||
| query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 |
There was a problem hiding this comment.
This used to work, I just ran this locally against v43. I can't see a reason why this should no longer be supported.
There was a problem hiding this comment.
select nullif(2, 'a');
This query failed in Postgres, I think we should return error for this query
| # TODO: support numeric string | ||
| # This query success in Postgres and DuckDB | ||
| query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 | ||
| select nullif(2, '1'); |
There was a problem hiding this comment.
This also used to work.
query T
select nullif(2, '1');
----
2
Interesting that the type is text vs number but still, it did work.
There was a problem hiding this comment.
This is quite a hard issue to fix, since we can't tell the difference between Unknown type and String type currently
This query pass in main too but it should not.
query T
select nullif('1'::varchar, 2);
----
1
The change did have regression on nullif(2, '1'), but also fix nullif('1'::varchar, 2)
There was a problem hiding this comment.
I don't have strong opinions against this PR because it fixes nullif('1'::varchar, 2) and refines the TypeSignature. We could file a ticket for it and address it in a follow-up.
There was a problem hiding this comment.
Duckdb can run this query
D select nullif('2'::varchar, 2);
┌─────────────────────────────────┐
│ nullif(CAST('2' AS VARCHAR), 2) │
│ varchar │
├─────────────────────────────────┤
│ │
└─────────────────────────────────┘
😕
There was a problem hiding this comment.
To me what is important is not strictly what other systems support - those may/could be a guide to what datafusion will support - but whether the provided arguments to a signature can be losslessly converted to what the signature accepts and whether it logically makes sense to do so.
I personally would rather be lenient for what is accepted and do casting/coercion as required than to be strict and push the onus onto the user to do that. That's just me though, I don't know if that is the general consensus of the community. Perhaps we should file a discussion ticket with the options and decide?
There was a problem hiding this comment.
I just double checked on this branch and it seems good to me (supports things reasonably)
Running `target/debug/datafusion-cli`
DataFusion CLI v43.0.0
> select nullif(2, '1');
+----------------------------+
| nullif(Int64(2),Utf8("1")) |
+----------------------------+
| 2 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.027 seconds.
> select nullif('1'::varchar, 2);
+----------------------------+
| nullif(Utf8("1"),Int64(2)) |
+----------------------------+
| 1 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.Signed-off-by: jayzhan211 <[email protected]>
| TypeSignature::String(arg_count) => get_data_types(&NativeType::String) | ||
| .into_iter() | ||
| .map(|dt| vec![dt; *arg_count]) | ||
| .collect::<Vec<_>>(), |
| # TODO: support numeric string | ||
| # This query success in Postgres and DuckDB | ||
| query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 | ||
| select nullif(2, '1'); |
There was a problem hiding this comment.
I don't have strong opinions against this PR because it fixes nullif('1'::varchar, 2) and refines the TypeSignature. We could file a ticket for it and address it in a follow-up.
Signed-off-by: jayzhan211 <[email protected]>
NullIf signatureNullIf signature
|
I revert to the previous function's behaviour since it makes more sense to me |
|
This is ready for review too |
alamb
left a comment
There was a problem hiding this comment.
Thank you @jayzhan211 -- I think this looks like a nice improvement.
I have some suggestions on improving the documentation to be more explicit about what TypeSignature::Comparable means but I also think we could implement that as a follow on
| function_length_check(current_types.len(), *num)?; | ||
| let mut target_type = current_types[0].to_owned(); | ||
| for data_type in current_types.iter().skip(1) { | ||
| if let Some(dt) = comparison_coercion_numeric(&target_type, data_type) { |
There was a problem hiding this comment.
It seems like TypeSignature::Comparable has different rules that, for example, = or < (it prefers numeric)
I think this is fine, but should probably be documented better / more explicitly as it might be expected
Also the coercion / defaulting to Utf8 might also be a good thing to document
| # TODO: support numeric string | ||
| # This query success in Postgres and DuckDB | ||
| query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 | ||
| select nullif(2, '1'); |
There was a problem hiding this comment.
I just double checked on this branch and it seems good to me (supports things reasonably)
Running `target/debug/datafusion-cli`
DataFusion CLI v43.0.0
> select nullif(2, '1');
+----------------------------+
| nullif(Int64(2),Utf8("1")) |
+----------------------------+
| 2 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.027 seconds.
> select nullif('1'::varchar, 2);
+----------------------------+
| nullif(Utf8("1"),Int64(2)) |
+----------------------------+
| 1 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
alamb
left a comment
There was a problem hiding this comment.
Thank you @jayzhan211 -- I think this looks like a nice improvement.
I have some suggestions on improving the documentation to be more explicit about what TypeSignature::Comparable means but I also think we could implement that as a follow on
| /// arguments like `vec![DataType::Int32]` or `vec![DataType::Float32]` | ||
| /// since i32 and f32 can be casted to f64 | ||
| Coercible(Vec<LogicalTypeRef>), | ||
| /// The number of arguments that are comparable |
There was a problem hiding this comment.
Would it be possible to define what comparable means in this context?
Signed-off-by: Jay Zhan <[email protected]>
Signed-off-by: Jay Zhan <[email protected]>
Which issue does this PR close?
Part of #13301
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?