Skip to content

feat:implement sql style 'ends_with' and 'instr' string function#8862

Merged
alamb merged 2 commits intoapache:mainfrom
zy-kkk:string_fun
Jan 23, 2024
Merged

feat:implement sql style 'ends_with' and 'instr' string function#8862
alamb merged 2 commits intoapache:mainfrom
zy-kkk:string_fun

Conversation

@zy-kkk
Copy link
Member

@zy-kkk zy-kkk commented Jan 14, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

instr: https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_instr
ends_with: Refer to the existing starts_with function

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 14, 2024
/// Returns true if string ends with suffix.
/// ends_with('alphabet', 'abet') = 't'
pub fn ends_with<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
let string_array = as_generic_string_array::<T>(&args[0])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I can call this library function directly in the code or do I not need to implement this function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I can call this library function directly in the code or do I not need to implement this function?

Yes, I think the body of thus function could be reduced to something like

Suggested change
let string_array = as_generic_string_array::<T>(&args[0])?;
compute::comparison::ends_with(&args[0])

BuiltinScalarFunction::DateBin => Volatility::Immutable,
BuiltinScalarFunction::EndsWith => Volatility::Immutable,
BuiltinScalarFunction::InitCap => Volatility::Immutable,
BuiltinScalarFunction::InStr => Volatility::Immutable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the function name we implement need to follow postgresql? If so, I will change the name of this function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this PR @zy-kkk -- it is very well tested and commented 🦾 .

It looks quite close to me. The only thing I think it needs prior to merging is to address the potential use of an arrow kernel rather than reimplementation for ends_with but otherwise it is ready to go 🚀

Thanks again

/// Returns true if string ends with suffix.
/// ends_with('alphabet', 'abet') = 't'
pub fn ends_with<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
let string_array = as_generic_string_array::<T>(&args[0])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I can call this library function directly in the code or do I not need to implement this function?

Yes, I think the body of thus function could be reduced to something like

Suggested change
let string_array = as_generic_string_array::<T>(&args[0])?;
compute::comparison::ends_with(&args[0])

Boolean,
BooleanArray
);
test_function!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for testing with NULL

@zy-kkk
Copy link
Member Author

zy-kkk commented Jan 22, 2024

Hello @alamb, first of all thank you for your review of my PR,
I have used the arrow comparison function for the ends_with function implementation, and by the way changed starts_with to this.
For the instr function, I added more tests, but for this function, its implementation in MySQL is: instr('Helloworld', 'world') = 6, and for postgresql, it is, position('world' in 'Helloworld') = 6, which one do you want me to use, please tell me, currently I still retain the MySQL design
Thanks again for your review

@alamb
Copy link
Contributor

alamb commented Jan 22, 2024

Hello @alamb, first of all thank you for your review of my PR, I have used the arrow comparison function for the ends_with function implementation, and by the way changed starts_with to this. For the instr function, I added more tests, but for this function, its implementation in MySQL is: instr('Helloworld', 'world') = 6, and for postgresql, it is, position('world' in 'Helloworld') = 6, which one do you want me to use, please tell me, currently I still retain the MySQL design Thanks again for your review

Thank you @zy-kkk -- I think we try to follow the postgres style, but making the function packages as compatible as possible is a good thing.

In this case, perhaps you can add an alias for instr so it can also be called with position: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.BuiltinScalarFunction.html#method.aliases

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @zy-kkk -- I reviewed this PR and I think it looks quite good.

It would be nice to add a position alias, but also think we can do that as a follow on PR as well

_ => None,
})
.collect::<BooleanArray>();
let result = arrow::compute::kernels::comparison::starts_with(left, right)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@zy-kkk
Copy link
Member Author

zy-kkk commented Jan 23, 2024

Thank you very much @zy-kkk -- I reviewed this PR and I think it looks quite good.

It would be nice to add a position alias, but also think we can do that as a follow on PR as well

Thank you, @alamb, I will continue to modify this PR so that position('world' in 'Helloworld') = 6 can be run to be compatible with postgresql

@zy-kkk
Copy link
Member Author

zy-kkk commented Jan 23, 2024

Thank you very much @zy-kkk -- I reviewed this PR and I think it looks quite good.
It would be nice to add a position alias, but also think we can do that as a follow on PR as well

Thank you, @alamb, I will continue to modify this PR so that position('world' in 'Helloworld') = 6 can be run to be compatible with postgresql

Hi @alamb, I tested changing instr to position. If I want to support the syntax position('world' in 'Helloworld') = 6, I need to do it at https://github.com/sqlparser-rs/sqlparser-rs Add a syntax parsing in , which may be like adding the overlay function in apache/datafusion-sqlparser-rs#594. What do you suggest?

@alamb
Copy link
Contributor

alamb commented Jan 23, 2024

If I want to support the syntax position('world' in 'Helloworld') = 6, I need to do it at https://github.com/sqlparser-rs/sqlparser-rs Add a syntax parsing in , which may be like adding the overlay function in apache/datafusion-sqlparser-rs#594. What do you suggest?

Hi @zy-kkk -- I didn't realize there was special syntax. Yes in order to support that special syntax we would need a change in sqlparser.

What I suggest in this case is that we file a follow on ticket to support posgresql style and merge this PR. I will file the ticket shortly

@alamb
Copy link
Contributor

alamb commented Jan 23, 2024

I filed #8969 to track adding the position function

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @zy-kkk -- this is a really nice addition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants