Skip to content

Rename "inverted index" to "full-text index"#62884

Merged
rschu1ze merged 3 commits intoClickHouse:masterfrom
rschu1ze:inverted-to-fulltext
Apr 30, 2024
Merged

Rename "inverted index" to "full-text index"#62884
rschu1ze merged 3 commits intoClickHouse:masterfrom
rschu1ze:inverted-to-fulltext

Conversation

@rschu1ze
Copy link
Copy Markdown
Member

@rschu1ze rschu1ze commented Apr 23, 2024

This PR renames "inverted indexes" to "full-text indexes", see #50268 (comment)

Review note: Due to conflicts with "full-text indexes" as existing internal name for bloom filter indexes, this PR consists of three steps:

  • f0faac2 moves everything related to bloom filter indexes into a single header / source file. This is technically unrelated to the original purpose, but makes the code organization consistent with the other secondary index implementations.
  • 0ae422d renames MergeTreeIndexFullText* to MergeTreeIndexBloomFilterText*.
  • 56e32e0 renames MergeTreeIndexInverted* to MergeTreeIndexFullText*.

Changelog category (leave one):

  • Backward Incompatible Change

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make to drop such indexes before upgrade and re-create them after upgrade.

@rschu1ze rschu1ze changed the title (*wip*) Rename "inverted index" to "full-text index" (wip) Rename "inverted index" to "full-text index" Apr 23, 2024
@robot-ch-test-poll robot-ch-test-poll added the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 23, 2024
@robot-ch-test-poll
Copy link
Copy Markdown
Contributor

robot-ch-test-poll commented Apr 23, 2024

This is an automated comment for commit 56e32e0 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc❌ failure
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors❌ failure
Successful checks
Check nameDescriptionStatus
A SyncThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
PR CheckThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success

@rschu1ze rschu1ze force-pushed the inverted-to-fulltext branch from 3266823 to 404fa45 Compare April 24, 2024 06:41
@rschu1ze rschu1ze changed the title (wip) Rename "inverted index" to "full-text index" Rename "inverted index" to "full-text index" Apr 24, 2024
@rschu1ze rschu1ze marked this pull request as ready for review April 24, 2024 06:41
@rschu1ze rschu1ze force-pushed the inverted-to-fulltext branch from 404fa45 to 56e32e0 Compare April 24, 2024 07:20
@alexey-milovidov
Copy link
Copy Markdown
Member

How about full_text_ngram, full_text_token?

@rschu1ze
Copy link
Copy Markdown
Member Author

How about full_text_ngram, full_text_token?

You surely mean

  • full_text_token as alias for full_text(0), and
  • full_text_ngram as alias for full_text(N)

during index creation. I generally like this idea and it seems (kind of) consistent with the naming of bloom filter indexes tokenbf_v1 and ngrambf_v1. But perhaps we like to specify tokenizers in future in more detail (think of stemming, lemmatization, etc.). So instead of hard-coding some index type names now, we should first check which flexibility we really need. I better do that in another PR.

@antaljanosbenjamin antaljanosbenjamin self-assigned this Apr 30, 2024
@rschu1ze rschu1ze added this pull request to the merge queue Apr 30, 2024
Merged via the queue into ClickHouse:master with commit 0a01a92 Apr 30, 2024
@rschu1ze rschu1ze deleted the inverted-to-fulltext branch April 30, 2024 20:46
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 30, 2024
@Algunenano
Copy link
Copy Markdown
Member

Changing this to a breaking change. Even though it's experimental, if should appear in the notice list

@Algunenano Algunenano added pr-backward-incompatible Pull request with backwards incompatible changes and removed pr-not-for-changelog This PR should not be mentioned in the changelog labels May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backward-incompatible Pull request with backwards incompatible changes pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants