Skip to content

Extend system.functions with in-source documentation#49222

Merged
rschu1ze merged 10 commits intoClickHouse:masterfrom
DanRoscigno:add-examples-to-functions
Apr 28, 2023
Merged

Extend system.functions with in-source documentation#49222
rschu1ze merged 10 commits intoClickHouse:masterfrom
DanRoscigno:add-examples-to-functions

Conversation

@DanRoscigno
Copy link
Copy Markdown
Contributor

End Goal:

  • source code is the source of truth for docs
  • reduce the burden on developers; when devs edit their code they edit the docs in the same file
  • auto-generate docs in CI; when the code changes generate new markdown

Problem:
There are examples embedded in the source, but they are not being copied to the system tables
when the description information is copied over. This PR adds the embedded examples and
will be extended to add the rest of the available information.

Step 2 will be to increase the amount of embedded documentation by copying from the markdown
files into the source (so that the source is the source of truth for the docs)

Step 3 will be to automate the generation of the docs in CI

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Copies embedded examples to a new field example in system.functions to supplement the field description

@DanRoscigno DanRoscigno added pr-improvement Pull request with some product improvements can be tested Allows running workflows for external contributors labels Apr 26, 2023
@DanRoscigno DanRoscigno requested a review from rschu1ze April 26, 2023 18:52
@DanRoscigno DanRoscigno marked this pull request as draft April 26, 2023 18:52
@rschu1ze

This comment was marked as outdated.

@rschu1ze rschu1ze changed the title Add examples to functions Extend system.functions with in-source documentation Apr 26, 2023
@DanRoscigno
Copy link
Copy Markdown
Contributor Author

This is great Robert, I had started to make a list of things to cover (syntax, etc.) and you have all of the ones I thought of plus more. Thanks so much.

What is the NOLINT for on line 73?

@rschu1ze
Copy link
Copy Markdown
Member

The new fields are the obvious ones I found in the existing docs. Perhaps there are more and it should not be too difficult to add them.

NOLINT is just to suppress a linter warning about a missing explicit specifier. No bad things will happen if this specifier is not there.

Interestingly, FastTest still fails, I remember that it was green in my local environment. Let me check again.

@rschu1ze rschu1ze marked this pull request as ready for review April 27, 2023 12:14
@UnamedRus
Copy link
Copy Markdown
Contributor

I like idea of #36023 issue for arguments (or signature) of functions.

Ie it will look like:

SELECT name, is_aggregate, case_insensitive, origin, arguments, argument_types FROM system.functions WHERE name = 'uniq' OR name = 'has' OR name = 'concat';

image

@DanRoscigno
Copy link
Copy Markdown
Contributor Author

I like idea of #36023 issue for arguments (or signature) of functions.

Ie it will look like:

SELECT name, is_aggregate, case_insensitive, origin, arguments, argument_types FROM system.functions WHERE name = 'uniq' OR name = 'has' OR name = 'concat';

image

Thanks @UnamedRus

Would arguments and syntax provide the functionality that you asked for @sarah-mdv

arguments

syntax

@rschu1ze
Copy link
Copy Markdown
Member

rschu1ze commented Apr 27, 2023

A few thoughts:

  • While this PR adds new columns to system.functions, these are deliberately not publicly documented for now (here), because it is not 100% clear yet which fields make the most sense, how they are populated and formatted in the system table.
  • I generally like the idea of FEATURE: Add function signatures to information returned when querying system.functions  #36023. Using reflection to populate system.functions has a certain charm. Unfortunately, functions currently neither communicate their arguments (mandatory or optional) nor the datatypes of the arguments to the function factory which would be the most natural place to collect such information. Instead, arguments/datatypes are "implicitly" encoded individually for each function, making it infeasible to extract the information automatically (at least for now).
  • This doesn't mean that there can not or should not be a standardized way of instrumenting functions with signature "metadata" for use in reflection in future. Imho, this is orthogonal to this PR which focuses on generating web documentation from embedded documentation. E.g. we could have "duplicate" columns for different representations of the same information in system.functions - a human-readable column and a machine-readable column.
  • About human-readable signature information (i.e. everything below official function docs): Documentation of the arguments and sometimes also of the return type need a description to be useful. So this information needs to be embedded in the source as well.

@DanRoscigno
Copy link
Copy Markdown
Contributor Author

I have one more column to suggest, but I have to think about what it should be named. Let me start with the problem statement:

Our settings documentation has settings that should be set in /etc/clickhouse/server/config.d/filename.xml, some that should be in a specific section of ...users.d/filename.xml, some that should be set on a session basis, and some that should be set in a user's profile in Cloud. (are there more options? I don't think so). I think that we need a list of places/methods for configuring these settings.

Example, if you configure max_concurrent_connections_for_all_users in config.d, ClickHouse will not start.

Maybe the column is configuration?

@rschu1ze
Copy link
Copy Markdown
Member

@DanRoscigno Completely agree. Note that docs for settings docs follow a different "template" than function docs. E.g. only the former has "default value" and "possible values" fields. Also, settings docs are exposed in a different system table (this one). I think we should build (in a separate PR) a second C++ Documentation class specifically tailored to the needs of settings documentation.

@DanRoscigno
Copy link
Copy Markdown
Contributor Author

@DanRoscigno Completely agree. Note that docs for settings docs follow a different "template" than function docs. E.g. only the former has "default value" and "possible values" fields. Also, settings docs are exposed in a different system table (this one). I think we should build (in a separate PR) a second C++ Documentation class specifically tailored to the needs of settings documentation.

Agreed, I will work on that this morning. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants