improve performance of regexp_count #13364
comphead merged 3 commits into apache:main from Dimchikkk:main
comphead left a comment
Thanks @Dimchikkk for your contribution.
Do you mean the Entry API misbehaved, returning Vacant every time and forcing the regexp pattern to be recompiled?
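For context on the Entry API question: a cache keyed by the pattern string should hit `Entry::Vacant` only the first time a pattern is seen, and every later lookup should hit `Entry::Occupied` and reuse the already compiled value. The following is a minimal sketch under that assumption, not DataFusion's actual code. `CompiledPattern` and `get_or_compile` are hypothetical names, and `CompiledPattern` is a literal-substring stand-in for `regex::Regex` so the example needs only the standard library.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled regex (e.g. `regex::Regex`).
/// It only matches a literal substring, to keep the sketch std-only.
struct CompiledPattern {
    literal: String,
}

impl CompiledPattern {
    /// Counts non-overlapping occurrences of the literal in `haystack`.
    fn count_matches(&self, haystack: &str) -> usize {
        haystack.matches(self.literal.as_str()).count()
    }
}

/// Returns a reference to the cached pattern, "compiling" it only on a cache miss.
/// `compiles` counts how many times the Vacant branch was taken.
fn get_or_compile<'a>(
    cache: &'a mut HashMap<String, CompiledPattern>,
    pattern: &str,
    compiles: &mut usize,
) -> &'a CompiledPattern {
    match cache.entry(pattern.to_string()) {
        // Cache hit: reuse the pattern compiled on an earlier row.
        Entry::Occupied(e) => e.into_mut(),
        // Cache miss: compile once and store it for later rows.
        Entry::Vacant(e) => {
            *compiles += 1;
            e.insert(CompiledPattern {
                literal: pattern.to_string(),
            })
        }
    }
}

fn main() {
    // One pattern per row, as with a non-scalar pattern argument.
    let rows = ["abcabc", "abc", "xyz", "abab"];
    let patterns = ["abc", "abc", "abc", "ab"];

    let mut cache = HashMap::new();
    let mut compiles = 0;
    let counts: Vec<usize> = rows
        .iter()
        .zip(patterns.iter())
        .map(|(row, pat)| get_or_compile(&mut cache, pat, &mut compiles).count_matches(row))
        .collect();

    println!("counts = {:?}, compiles = {}", counts, compiles);
}
```

With these inputs it prints `counts = [2, 1, 0, 2], compiles = 2`: `abc` is compiled once and reused for three rows. If Vacant were hit on every lookup, `compiles` would equal the number of rows instead.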
Hi @comphead,
Please rebase on the latest main to avoid the CI failure. Personally, I like the numbers.
@comphead I actually found the root cause: cloning the regex was what made it expensive. Now the numbers are even sexier:
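A minimal sketch of the root cause described in this reply, contrasting a lookup that clones the cached compiled pattern on every row with one that borrows it. This is not the code from the PR. `CompiledPattern`, `count_cloning`, and `count_borrowing` are hypothetical names, `CompiledPattern` is a literal-substring stand-in for `regex::Regex`, and its clone counter only makes the number of clones visible; it does not model what a real regex clone costs.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical stand-in for `regex::Regex`: matches a literal substring and
/// records every clone in a shared counter so the per-row cost is visible.
struct CompiledPattern<'c> {
    literal: String,
    clones: &'c Cell<usize>,
}

impl Clone for CompiledPattern<'_> {
    fn clone(&self) -> Self {
        self.clones.set(self.clones.get() + 1);
        CompiledPattern {
            literal: self.literal.clone(),
            clones: self.clones,
        }
    }
}

impl CompiledPattern<'_> {
    /// Counts non-overlapping occurrences of the literal in `haystack`.
    fn count_matches(&self, haystack: &str) -> usize {
        haystack.matches(self.literal.as_str()).count()
    }
}

/// Costly variant: clones the cached pattern out of the map for every row.
fn count_cloning(cache: &HashMap<String, CompiledPattern<'_>>, pattern: &str, rows: &[&str]) -> usize {
    rows.iter()
        .map(|row| {
            let re = cache.get(pattern).cloned().expect("pattern is cached");
            re.count_matches(row)
        })
        .sum()
}

/// Cheaper variant: borrows the cached pattern and reuses the reference.
fn count_borrowing(cache: &HashMap<String, CompiledPattern<'_>>, pattern: &str, rows: &[&str]) -> usize {
    let re = cache.get(pattern).expect("pattern is cached");
    rows.iter().map(|row| re.count_matches(row)).sum()
}

fn main() {
    let clones = Cell::new(0);
    let mut cache = HashMap::new();
    cache.insert(
        "ab".to_string(),
        CompiledPattern { literal: "ab".to_string(), clones: &clones },
    );
    let rows = ["abab", "ab", "xyz"];

    let slow = count_cloning(&cache, "ab", &rows);
    let clones_slow = clones.get();
    let fast = count_borrowing(&cache, "ab", &rows);
    let clones_fast = clones.get() - clones_slow;

    println!("cloning: {} matches, {} clones", slow, clones_slow);
    println!("borrowing: {} matches, {} clones", fast, clones_fast);
}
```

Both variants return 3 matches, but the cloning one performs one clone per row (3 here) while the borrowing one performs none. Hoisting the lookup out of the loop in `count_borrowing` assumes a scalar pattern. With one pattern per row, the lookup would stay inside the loop and still return a `&CompiledPattern` instead of a clone.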
comphead left a comment
Thanks @Dimchikkk, it's a great first PR 👍
Since this is your first PR, I'll wait for another member to approve before we merge it.
@Dandandan, would you mind approving?
Yeah, that makes much more sense.
Nice find!
Thank you, guys! Now I am wondering why the other regexp functions are slower than regexp_count :)
That would be a great thing to check. One thing I saw is that some (string) cloning happens in the arrow-rs kernels. It would be great to check and improve that!
I'm wondering if other
It would be nice if you could quickly check whether the other regexp functions can be optimized the same way.
* improve performance of regexp_count
* fix clippy
* collect with Int64Array to eliminate one temp Vec

---------

Co-authored-by: Dima <[email protected]>

Which issue does this PR close?
Closes #13011
Rationale for this change
regexp_count is now the fastest of the regexp functions :)
Are these changes tested?
Are there any user-facing changes?
No