Description
Implementing a new SQL function is usually one of the most suitable ways for newcomers to start participating in Doris development. For future reference, this issue lists the basic process for implementing a new function in Doris.
How to Begin
If you are new to Apache Doris, follow https://doris.apache.org/zh-CN/community/source-install/compilation-with-ldb-toolchain to set up your development environment. Once the build succeeds, follow https://doris.apache.org/zh-CN/docs/install/deploy-manually/integrated-storage-compute-deploy-manually to deploy a cluster. For other tools that may be useful, see https://doris.apache.org/zh-CN/community/developer-guide/be-vscode-dev and the other documents in the same directory.
What to do
To implement a new SQL function, here's what you need to write in your PR:
- The function implementation and registration in BE
- The function signature and visitor for the Nereids planner in FE
- The constant folding implementation in FE, if possible, as https://github.com/apache/doris/pull/40744/files did in functions/executable/NumericArithmetic.java
- A function documentation PR in https://github.com/apache/doris-website that follows our newest documentation specification. See https://github.com/apache/doris-website/pull/1992/files for an example. Before we merge the code PR, a ready documentation PR must be linked under "Does this need documentation?"
- Enough regression-test and BE-UT cases. Refer to the files test_template_{X}_arg(s).groovy in https://github.com/apache/doris/pull/47307/files (these may have been updated since, so use the newest version on the master branch)
You can refer to https://github.com/apache/doris/pull/47307/files as a complete example (it is only missing FE constant folding).
Note: you may see some older PRs that modified doris_builtin_functions.py. This is no longer needed.
Aggregation Functions
For the code structure of an aggregation function, refer to https://github.com/apache/doris/pull/41240/files. It includes everything you need to add.
Key Points
BE Implementations
- Use the existing base templates when you implement a date or arithmetic calculation. You can find them by looking at how similar functions are implemented.
- Execution speed is critical for Doris, so you must eliminate all unnecessary copies and calculations. Operate directly on the raw input and output data. If you can write the result straight into the output Column's memory, do not introduce an intermediate variable and copy from it. Do not call virtual functions inside a loop. If different behavior per type is needed, use templates to generate separate function instantiations so that type checks are resolved outside the loop.
- Pay attention not only to the code you add but also to how it relates to existing code. If the two overlap, consider them together and aim for the best level of abstraction.
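The performance advice above can be illustrated with a minimal sketch. It is not Doris code: `AddOp`, `SubOp`, `binary_vector_vector` and `binary_vector_const` are hypothetical names, and `std::vector<int64_t>` stands in for a real Column. The operation is a template parameter, so the loop body contains no virtual call and no per-row type check, and results are written directly into the output buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical operations; each one is a distinct type, so the compiler
// generates a separate, fully inlined loop per operation.
struct AddOp {
    static int64_t apply(int64_t a, int64_t b) { return a + b; }
};
struct SubOp {
    static int64_t apply(int64_t a, int64_t b) { return a - b; }
};

// Both arguments are full columns. std::vector stands in for a Column here.
template <typename Op>
void binary_vector_vector(const std::vector<int64_t>& lhs,
                          const std::vector<int64_t>& rhs,
                          std::vector<int64_t>& out) {
    assert(lhs.size() == rhs.size());
    const std::size_t n = lhs.size();
    out.resize(n); // size the output once, then write in place
    const int64_t* a = lhs.data();
    const int64_t* b = rhs.data();
    int64_t* r = out.data();
    for (std::size_t i = 0; i < n; ++i) {
        r[i] = Op::apply(a[i], b[i]); // no temporary buffer, no virtual call
    }
}

// The right argument is a constant: handled by its own loop instead of
// materializing a full column filled with the same value.
template <typename Op>
void binary_vector_const(const std::vector<int64_t>& lhs, int64_t rhs,
                         std::vector<int64_t>& out) {
    const std::size_t n = lhs.size();
    out.resize(n);
    const int64_t* a = lhs.data();
    int64_t* r = out.data();
    for (std::size_t i = 0; i < n; ++i) {
        r[i] = Op::apply(a[i], rhs);
    }
}
```

The dispatch on operation and on constant versus non-constant arguments happens once, before the loop, which is the pattern the existing base templates in BE are meant to provide.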
FE Signature
Most functions use one of the following interfaces:
- AlwaysNullable means the function's return type is always wrapped in Nullable. Use it when the function may produce a null value for not-null input.
- AlwaysNotNullable means the function's return type is never wrapped in Nullable. Use it when the function turns every null input into a not-null output.
- PropagateNullable means the output column is Nullable when at least one input column is Nullable, and not Nullable otherwise. It is the right choice when the function computes a result for not-null values and leaves null alone.
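The three rules can be summarized as a small decision function. This is only a model of the semantics described above, written in C++ for illustration; the real interfaces live in the Java FE code, and `NullableRule` and `result_is_nullable` are hypothetical names.

```cpp
#include <vector>

// Hypothetical model of the three FE nullability rules.
enum class NullableRule { AlwaysNullable, AlwaysNotNullable, PropagateNullable };

// Returns whether the function's result type is Nullable, given the rule
// and the nullability of each argument type.
inline bool result_is_nullable(NullableRule rule,
                               const std::vector<bool>& arg_nullable) {
    switch (rule) {
    case NullableRule::AlwaysNullable:
        return true; // result may be null even for not-null input
    case NullableRule::AlwaysNotNullable:
        return false; // null input is always mapped to a not-null output
    case NullableRule::PropagateNullable:
        for (bool nullable : arg_nullable) {
            if (nullable) {
                return true; // one Nullable argument makes the result Nullable
            }
        }
        return false;
    }
    return false; // not reached for valid enum values
}
```

For example, under PropagateNullable a function called with one not-null and one Nullable argument yields a Nullable result, while the same function called with two not-null arguments does not.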
Testcase
The types and number of test cases must not be fewer than those in the corresponding files in https://github.com/apache/doris/pull/47307/files.
The data you use must cover all the boundary values of its datatype and other sensitive values.
Add BE-UT cases using the check_function_all_arg_comb interface to cover the Const combinations of the arguments.
You can run the cases using the scripts run-regression-test.sh and run-be-ut.sh. Both scripts contain detailed explanations of their usage.
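As a rough illustration of the boundary-value requirement, the sketch below lists the values a signed integer argument would typically need to cover. `signed_boundary_values` is a hypothetical helper, not part of Doris's test framework, and the choice of seven values is an assumption; other types (dates, decimals, strings) have their own sensitive values.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical helper: boundary and sign-change values for a signed integer type.
template <typename T>
std::vector<T> signed_boundary_values() {
    static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                  "this sketch only handles signed integer types");
    constexpr T lo = std::numeric_limits<T>::min();
    constexpr T hi = std::numeric_limits<T>::max();
    return {lo, static_cast<T>(lo + 1), T(-1), T(0), T(1),
            static_cast<T>(hi - 1), hi};
}
```

Feeding such values into both arguments of a binary function quickly exposes overflow and sign-handling bugs that ordinary mid-range data would miss.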
Other Advice
- If you don't know how to use the interface of a certain type of object, look at how existing code uses it.
- AI-assisted programming is quite mature now, so ask an AI first when you want to understand how some part of the code works.