Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
136 changes: 136 additions & 0 deletions datafusion/sql/tests/sql_integration.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4641,6 +4641,28 @@ fn test_infer_types_from_join() {
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I see your inferred type changes is in another issue. This part of the test shouldn't be removed in this issue, right? You can add new inferred type tests in #16019 along with your changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed them because without the inferred type changes the tests will fail because of the validation linked below, which happen after we call plan.with_param_values(param_values).unwrap(); in the tests, thats one of the reason I needed to introduce the inferred type changes.

/// Verify parameter list length and type
pub fn verify(&self, expect: &[DataType]) -> Result<()> {
match self {
ParamValues::List(list) => {
// Verify if the number of params matches the number of values
if expect.len() != list.len() {
return _plan_err!(
"Expected {} parameters, got {}",
expect.len(),
list.len()
);
}

Copy link
Contributor

@qstommyshu qstommyshu May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I see what you mean. This test can only be corrected once you set up the infer type feature. But the development flow would be weird if both this two PR addresses the same issue.

I think it would be better if we move everything to your new PR, so the new PR would include infer types along with fixing/adding corresponding tests, and we only need to deal with one PR (closes #15577 and #16019). We can also merge this PR first then move on to #16019 if @alamb @kczimm prefers this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no preference here.

}

#[test]
fn test_prepare_statement_infer_types_from_join() {
let sql =
"PREPARE my_plan AS SELECT id, order_id FROM person JOIN orders ON id = customer_id and age = $1";

let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Prepare: "my_plan" []
Projection: person.id, orders.order_id
Inner Join: Filter: person.id = orders.customer_id AND person.age = $1
TableScan: person
TableScan: orders
"#
);

let actual_types = plan.get_parameter_types().unwrap();
let expected_types = HashMap::from([("$1".to_string(), Some(DataType::Int32))]);
assert_eq!(actual_types, expected_types);
}

#[test]
fn test_infer_types_from_predicate() {
let sql = "SELECT id, age FROM person WHERE age = $1";
Expand Down Expand Up @@ -4672,6 +4694,25 @@ fn test_infer_types_from_predicate() {
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for here

}

#[test]
fn test_prepare_statement_infer_types_from_predicate() {
let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age = $1";
let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Prepare: "my_plan" []
Projection: person.id, person.age
Filter: person.age = $1
TableScan: person
"#
);

let actual_types = plan.get_parameter_types().unwrap();
let expected_types = HashMap::from([("$1".to_string(), Some(DataType::Int32))]);
assert_eq!(actual_types, expected_types);
}

#[test]
fn test_infer_types_from_between_predicate() {
let sql = "SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";
Expand Down Expand Up @@ -4707,6 +4748,29 @@ fn test_infer_types_from_between_predicate() {
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

}

#[test]
fn test_prepare_statement_infer_types_from_between_predicate() {
let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age BETWEEN $1 AND $2";

let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Prepare: "my_plan" []
Copy link
Contributor Author

@brayanjuls brayanjuls Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently when the logical plan is created it does not include the type inferred, this result in some issues when we try to call the function plan.with_param_values(param_values) as it tries to extract the type from the plan but it is empty. Are we expecting the plan to contain the type at this time? if so probably there is another bug we need to investigate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb @qstommyshu any advice on where to look to debug why the plan is not properly generating the inferred type?

Copy link
Contributor

@qstommyshu qstommyshu Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brayanjuls ,

the reason why this happen is there is no parameters being passed to the sql, but the sql itself contains positional arguments like $1 and $2, please refer to https://datafusion.apache.org/user-guide/sql/prepared_statements.html#inferred-types.

The simple way you can fix it is by adding the position arguments type to the sql like how it was done in test_prepare_statement_to_plan_one_param. If you want to dig into the details about it, you can check how planner parse PREPARE statement and transfer it into a logical plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but wouldn't that mean the type is not being inferred, but instead we are explicitly declaring it?

Thanks, I would check the planner parser.

Copy link
Contributor

@qstommyshu qstommyshu Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the details for parsing "PREPARE" statement yet, but I think the type inference behaviour depends on how the planner parser is implemented, it could be either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is a good improvment -- it updates the tests to show the current behavior

We can update the behavior in another PR, which is what I think you are working on in

Projection: person.id, person.age
Filter: person.age BETWEEN $1 AND $2
TableScan: person
"#
);

let actual_types = plan.get_parameter_types().unwrap();
let expected_types = HashMap::from([
("$1".to_string(), Some(DataType::Int32)),
("$2".to_string(), Some(DataType::Int32)),
]);
assert_eq!(actual_types, expected_types);
}

#[test]
fn test_infer_types_subquery() {
let sql = "SELECT id, age FROM person WHERE age = (select max(age) from person where id = $1)";
Expand Down Expand Up @@ -4749,6 +4813,31 @@ fn test_infer_types_subquery() {
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

}

#[test]
fn test_prepare_statement_infer_types_subquery() {
let sql = "PREPARE my_plan AS SELECT id, age FROM person WHERE age = (select max(age) from person where id = $1)";

let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Prepare: "my_plan" []
Projection: person.id, person.age
Filter: person.age = (<subquery>)
Subquery:
Projection: max(person.age)
Aggregate: groupBy=[[]], aggr=[[max(person.age)]]
Filter: person.id = $1
TableScan: person
TableScan: person
"#
);

let actual_types = plan.get_parameter_types().unwrap();
let expected_types = HashMap::from([("$1".to_string(), Some(DataType::UInt32))]);
assert_eq!(actual_types, expected_types);
}

#[test]
fn test_update_infer() {
let sql = "update person set age=$1 where id=$2";
Expand Down Expand Up @@ -4786,6 +4875,30 @@ fn test_update_infer() {
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

}

#[test]
fn test_prepare_statement_update_infer() {
let sql = "PREPARE my_plan AS update person set age=$1 where id=$2";

let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Prepare: "my_plan" []
Dml: op=[Update] table=[person]
Projection: person.id AS id, person.first_name AS first_name, person.last_name AS last_name, $1 AS age, person.state AS state, person.salary AS salary, person.birth_date AS birth_date, person.😀 AS 😀
Filter: person.id = $2
TableScan: person
"#
);

let actual_types = plan.get_parameter_types().unwrap();
let expected_types = HashMap::from([
("$1".to_string(), Some(DataType::Int32)),
("$2".to_string(), Some(DataType::UInt32)),
]);
assert_eq!(actual_types, expected_types);
}

#[test]
fn test_insert_infer() {
let sql = "insert into person (id, first_name, last_name) values ($1, $2, $3)";
Expand Down Expand Up @@ -4824,6 +4937,29 @@ fn test_insert_infer() {
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

}

#[test]
fn test_prepare_statement_insert_infer() {
let sql = "PREPARE my_plan AS insert into person (id, first_name, last_name) values ($1, $2, $3)";
let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Prepare: "my_plan" []
Dml: op=[Insert Into] table=[person]
Projection: column1 AS id, column2 AS first_name, column3 AS last_name, CAST(NULL AS Int32) AS age, CAST(NULL AS Utf8) AS state, CAST(NULL AS Float64) AS salary, CAST(NULL AS Timestamp(Nanosecond, None)) AS birth_date, CAST(NULL AS Int32) AS 😀
Values: ($1, $2, $3)
"#
);

let actual_types = plan.get_parameter_types().unwrap();
let expected_types = HashMap::from([
("$1".to_string(), Some(DataType::UInt32)),
("$2".to_string(), Some(DataType::Utf8)),
("$3".to_string(), Some(DataType::Utf8)),
]);
assert_eq!(actual_types, expected_types);
}

#[test]
fn test_prepare_statement_to_plan_one_param() {
let sql = "PREPARE my_plan(INT) AS SELECT id, age FROM person WHERE age = $1";
Expand Down