[Guidebook - Good first issue] How to implement a new SQL function #48201

@zclllyybb

Implementing a new SQL function is usually one of the most suitable ways for newcomers to start contributing to Doris. For future reference, this issue lists the basic process for implementing a new function in Doris.

How to Begin

If you are new to Apache Doris, follow https://doris.apache.org/zh-CN/community/source-install/compilation-with-ldb-toolchain to set up your development environment. Once the build succeeds, follow https://doris.apache.org/zh-CN/docs/install/deploy-manually/integrated-storage-compute-deploy-manually to deploy your cluster. For other tools that may be useful, see https://doris.apache.org/zh-CN/community/developer-guide/be-vscode-dev and the other docs in the same directory.

What to do

To implement a new SQL function, here's what you need to write in your PR:

  1. The function implementation and registration in BE
  2. The function signature and visitor for the Nereids planner in FE
  3. The constant folding implementation in FE, if possible, just like what https://github.com/apache/doris/pull/40744/files did in functions/executable/NumericArithmetic.java.
  4. A function documentation PR in https://github.com/apache/doris-website, which must follow our newest docs specification. See https://github.com/apache/doris-website/pull/1992/files for an example. Before we merge the code PR, there must be a ready documentation PR linked under "Does this need documentation?"
  5. Enough regression-test and BE-UT cases. Refer to the test_template_{X}_arg(s).groovy files in https://github.com/apache/doris/pull/47307/files (they may have been updated since, so find the newest version on the master branch).

You can refer to https://github.com/apache/doris/pull/47307/files as a complete example (it only lacks FE constant folding).

btw: You may see some PRs that modified doris_builtin_functions.py. That is no longer needed.

Aggregation Functions

For the code structure of an aggregation function, refer to https://github.com/apache/doris/pull/41240/files. It includes everything you need to add.

Key Points

BE Implementations

  1. Use the existing base templates when you implement a date or arithmetic calculation. You can find them by looking at similar functions.
  2. Execution speed is very, very important for Doris. Therefore, you must eliminate all unnecessary copies and calculations. Try to use raw operations on inputs and outputs. If you can write the calculation result directly into the output Column's memory, do not add an intermediate variable and copy it. Don't call any virtual function in a loop. If different behavior is needed for different types, use templates to generate separate function instantiations so that no type check happens per row.
  3. Pay attention not only to the new code you add but also to how it relates to the existing code. If they are related, consider them together to reach the best level of abstraction.
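
The advice in point 2 can be sketched as follows. This is a minimal illustration, not Doris code: std::vector stands in for the input and output Columns, and AddOne, Twice, and execute_unary are hypothetical names. It shows the operation chosen as a template parameter (no virtual call or type check inside the loop) and the result written straight into the output buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-row operations (not Doris classes). Each is a plain
// struct with a static function, so no virtual dispatch is involved.
struct AddOne {
    static int64_t apply(int64_t x) { return x + 1; }
};
struct Twice {
    static int64_t apply(int64_t x) { return x * 2; }
};

// The operation is a template parameter: it is resolved once at compile
// time instead of being checked or dispatched for every row.
template <typename Op>
void execute_unary(const std::vector<int64_t>& input, std::vector<int64_t>& output) {
    const size_t rows = input.size();
    output.resize(rows);               // size the output buffer once
    const int64_t* in = input.data();  // raw pointers keep the loop simple
    int64_t* out = output.data();
    for (size_t i = 0; i < rows; ++i) {
        out[i] = Op::apply(in[i]);     // write directly into output memory
    }
}
```

Calling execute_unary<AddOne>(in, out) and execute_unary<Twice>(in, out) produces two separate instantiations, which is the "different function entities" idea from point 2.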

FE Signature

Most functions use one of the following interfaces:

  1. AlwaysNullable means the function's return type is always wrapped in Nullable. Use it when the function may produce a null value for not-null input.
  2. AlwaysNotNullable means the function's return type is never wrapped in Nullable. Use it when the function turns every null input into a not-null output.
  3. PropagateNullable means the output column is Nullable when the input columns contain at least one Nullable column, and not Nullable otherwise. It's the right choice when you calculate the result for not-null values and leave nulls alone.
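
The three rules above can be summarized as a small decision function. This is only a toy model of the semantics as described in this list: in Doris these are Java interfaces in FE, and the NullableRule enum and result_is_nullable function below are hypothetical names written in C++ for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical enum naming the three nullability rules. In Doris these
// are Java interfaces in FE, not a C++ enum.
enum class NullableRule { AlwaysNullable, AlwaysNotNullable, PropagateNullable };

// Returns whether the function's result type is Nullable, given whether
// each argument type is Nullable.
bool result_is_nullable(NullableRule rule, const std::vector<bool>& arg_nullable) {
    switch (rule) {
    case NullableRule::AlwaysNullable:
        return true;   // result is Nullable regardless of the inputs
    case NullableRule::AlwaysNotNullable:
        return false;  // result is never Nullable
    case NullableRule::PropagateNullable:
        // Nullable only if at least one argument is Nullable
        return std::any_of(arg_nullable.begin(), arg_nullable.end(),
                           [](bool nullable) { return nullable; });
    }
    return true;  // not reached for valid enum values
}
```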

Testcase

The types and number of test cases must not be less than those in the corresponding files in https://github.com/apache/doris/pull/47307/files.

The data you use must cover all the boundary values of its data type and other sensitive values.
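
As a rough illustration of what boundary coverage could look like for a signed integer type, the sketch below lists the type's limits, their neighbors, and the values around zero. The helper name boundary_values and the choice of -1, 0, and 1 as "sensitive" values are assumptions for this example. Real cases would put such values into the regression-test data or BE-UT data sets.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: boundary and near-boundary values for a signed
// integer type T, plus the values around zero.
template <typename T>
std::vector<T> boundary_values() {
    using Limits = std::numeric_limits<T>;
    return {Limits::min(),
            static_cast<T>(Limits::min() + 1),
            T{-1},
            T{0},
            T{1},
            static_cast<T>(Limits::max() - 1),
            Limits::max()};
}
```

For example, boundary_values<int8_t>() starts at -128 and ends at 127.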

Add BE-UT cases with the check_function_all_arg_comb interface to cover Const combinations.
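
Assuming "Const combinations" means every way of passing each argument as either a constant or a regular column, a function with n arguments has 2^n such combinations. The sketch below only enumerates them to show the idea. It does not reproduce check_function_all_arg_comb, and const_combinations is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: for n arguments, list every assignment of
// "constant" (true) or "regular column" (false) to each argument.
std::vector<std::vector<bool>> const_combinations(size_t n) {
    std::vector<std::vector<bool>> result;
    for (size_t mask = 0; mask < (size_t{1} << n); ++mask) {
        std::vector<bool> is_const(n);
        for (size_t i = 0; i < n; ++i) {
            // bit i of mask decides whether argument i is a constant
            is_const[i] = ((mask >> i) & 1) != 0;
        }
        result.push_back(is_const);
    }
    return result;
}
```

A two-argument function therefore has four cases to cover: neither argument constant, only the first, only the second, and both.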

You can run the cases with the scripts run-regression-test.sh and run-be-ut.sh. Both contain detailed explanations.

Other Advice

  1. If you don't know how to use the proper interface of a certain type of object, look at how existing code uses it.
  2. AI-assisted programming is quite mature now, so ask AI first if you want to understand how some part of the code works.
