Skip to content

Better rendering of multiline strings in Pretty formats#59940

Merged
alexey-milovidov merged 18 commits intoClickHouse:masterfrom
Volodyachan:multiline-strings-in-Pretty-formats
May 6, 2024
Merged

Better rendering of multiline strings in Pretty formats#59940
alexey-milovidov merged 18 commits intoClickHouse:masterfrom
Volodyachan:multiline-strings-in-Pretty-formats

Conversation

@Volodyachan
Copy link
Copy Markdown
Contributor

@Volodyachan Volodyachan commented Feb 13, 2024

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Multiline strings with border preservation and column width change

Example console output:

┌─x──────┬───y─┐
│ hello… │ 123 │
│ …world │     │
└────────┴─────┘

UPD: in the position where the value cannot appear

   ┌─x─────┬───y─┐
1. │ hello…│ 123 │
   │…world │     │
   └───────┴─────┘

It also works with line numbering

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Feb 13, 2024

CLA assistant check
All committers have signed the CLA.

@yariks5s yariks5s self-assigned this Feb 13, 2024
@yariks5s yariks5s added the can be tested Allows running workflows for external contributors label Feb 13, 2024
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-improvement Pull request with some product improvements label Feb 13, 2024
@robot-ch-test-poll4
Copy link
Copy Markdown
Contributor

robot-ch-test-poll4 commented Feb 13, 2024

This is an automated comment for commit 34a9ec3 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check nameDescriptionStatus
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help❌ failure
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests❌ failure
Successful checks
Check nameDescriptionStatus
A SyncThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
PR CheckThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success

Copy link
Copy Markdown
Member

@yariks5s yariks5s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Can you please also add tests to check how your changes work?

@yariks5s
Copy link
Copy Markdown
Member

also, try to avoid cases like

if (condition) {

use instead:

if (condition)
{

@Volodyachan
Copy link
Copy Markdown
Contributor Author

Hello! Can you please also add tests to check how your changes work?

Okay, I'll do it later, but right now there's another problem.

@Volodyachan
Copy link
Copy Markdown
Contributor Author

There are 2 tests where my solution gives a different answer

Example of current answer:

┌─name─┬─type─────────────────────────────┬─default_type─┬─default_expression─┬─comment──────┬─codec_expression─┬─ttl_expression─┐
│ id   │ UInt64                           │              │                    │ index column │                  │                │
│ arr  │ Array(UInt64)                    │ DEFAULT      │ [10, 20]           │              │ ZSTD(1)          │                │
│ t    │ Tuple(
    a String,
    b UInt64) │ DEFAULT      │ ('foo', 0)         │              │ ZSTD(1)          │                │
└──────┴──────────────────────────────────┴──────────────┴────────────────────┴──────────────┴──────────────────┴────────────────┘

Example of my answer:

┌─name─┬─type────────────┬─default_type─┬─default_expression─┬─comment──────┬─codec_expression─┬─ttl_expression─┐
│ id   │ UInt64          │              │                    │ index column │                  │                │
│ arr  │ Array(UInt64)   │ DEFAULT      │ [10, 20]           │              │ ZSTD(1)          │                │
│ t    │ Tuple(…         │ DEFAULT      │ ('foo', 0)         │              │ ZSTD(1)          │                │
│      │ …    a String,… │              │                    │              │                  │                │
│      │ …    b UInt64)  │              │                    │              │                  │                │
└──────┴─────────────────┴──────────────┴────────────────────┴──────────────┴──────────────────┴────────────────┘

Can I canonize the answer, and if so, how?

@alexey-milovidov
Copy link
Copy Markdown
Member

@Volodyachan, the new behavior looks much better! Yes, you should update the .reference file for the test.
To do it, you can run the test alone with

clickhouse-client -n < test.sql > test.reference

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Feb 14, 2024

I advise to place in the position where the value cannot appear - for disambiguation:

┌─x─────┬───y─┐
│ hello…│ 123 │
│…world │     │
└───────┴─────┘

Alternatives:

┌─x─────┬───y─┐
│ hello⋯│ 123 │
│⋯world │     │
└───────┴─────┘

┌─x─────┬───y─┐
│ hello↴│ 123 │
│↳world │     │
└───────┴─────┘

┌─x─────┬───y─┐
│ hello⮷│ 123 │
│⮱world │     │
└───────┴─────┘


@robot-ch-test-poll4 robot-ch-test-poll4 added the submodule changed At least one submodule changed in this PR. label Mar 1, 2024
@Volodyachan Volodyachan force-pushed the multiline-strings-in-Pretty-formats branch from 38d770d to d0dbb39 Compare April 26, 2024 14:26
@robot-ch-test-poll3 robot-ch-test-poll3 removed the submodule changed At least one submodule changed in this PR. label Apr 26, 2024
@Volodyachan Volodyachan force-pushed the multiline-strings-in-Pretty-formats branch from bdf099e to eb9e2d7 Compare April 27, 2024 10:24
@Volodyachan Volodyachan requested a review from yariks5s April 27, 2024 14:33
Copy link
Copy Markdown
Member

@yariks5s yariks5s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but would be good to enhance tests

@Volodyachan Volodyachan force-pushed the multiline-strings-in-Pretty-formats branch from 9a6d7f0 to 276cabf Compare April 29, 2024 18:28
@Volodyachan
Copy link
Copy Markdown
Contributor Author

@yariks5s Do all tests have to be passed for merge? Because the dropped tests seem to have nothing to do with my code. If they shouldn't, you can merge? I don't have such opportunity)

@alexey-milovidov
Copy link
Copy Markdown
Member

Yes, it looks like all required tests are passed, and others are not related.

@Volodyachan Volodyachan force-pushed the multiline-strings-in-Pretty-formats branch from 2176083 to 7857d01 Compare May 3, 2024 22:15
@alexey-milovidov alexey-milovidov added this pull request to the merge queue May 6, 2024
Merged via the queue into ClickHouse:master with commit e0c23fb May 6, 2024
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label May 6, 2024
@azat
Copy link
Copy Markdown
Member

azat commented May 16, 2024

azat:.../tmp/tests-perf$ grep -e select_format.query4.run0 right-server-log.log  | less
2024.05.04 02:26:00.467540 [ 1158 ] {select_format.query4.run0} <Debug> executeQuery: (from [::1]:43952) INSERT INTO table_PrettySpace SELECT * FROM test.hits LIMIT 10000 (stage: Complete)
2024.05.04 02:26:00.467661 [ 1158 ] {select_format.query4.run0} <Trace> ContextAccess (default): Access granted: INSERT(WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate, CounterID, ClientIP, ClientIP6, RegionID, UserID, CounterClass, OS, UserAgent, URL, Referer, URLDomain, RefererDo>
2024.05.04 02:26:00.468303 [ 1158 ] {select_format.query4.run0} <Trace> Planner: Query SELECT __table1.WatchID AS WatchID, __table1.JavaEnable AS JavaEnable, __table1.Title AS Title, __table1.GoodEvent AS GoodEvent, __table1.EventTime AS EventTime, __table1.EventDate AS EventDate, __table1.Cou>
2024.05.04 02:26:00.468506 [ 1158 ] {select_format.query4.run0} <Trace> ContextAccess (default): Access granted: SELECT(WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate, CounterID, ClientIP, ClientIP6, RegionID, UserID, CounterClass, OS, UserAgent, URL, Referer, URLDomain, RefererDo>
2024.05.04 02:26:00.469094 [ 1158 ] {select_format.query4.run0} <Trace> Planner: Query SELECT __table1.WatchID AS WatchID, __table1.JavaEnable AS JavaEnable, __table1.Title AS Title, __table1.GoodEvent AS GoodEvent, __table1.EventTime AS EventTime, __table1.EventDate AS EventDate, __table1.Cou>
2024.05.04 02:26:00.470625 [ 1158 ] {select_format.query4.run0} <Debug> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Key condition: unknown
2024.05.04 02:26:00.470643 [ 1158 ] {select_format.query4.run0} <Debug> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): MinMax index condition: unknown
2024.05.04 02:26:00.470666 [ 1158 ] {select_format.query4.run0} <Debug> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 1084/1084 marks by primary key, 1084 marks to read from 1 ranges
2024.05.04 02:26:00.470682 [ 1158 ] {select_format.query4.run0} <Trace> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Spreading mark ranges among streams (default reading)
2024.05.04 02:26:00.470875 [ 1158 ] {select_format.query4.run0} <Trace> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Reading 1 ranges in order from part 201403_10_18_2, approx. 10000 rows starting from 0
2024.05.04 02:26:01.229303 [ 2731 ] {select_format.query4.run0} <Debug> MemoryTracker: Current memory usage (for query): 1.07 GiB.
2024.05.04 02:26:01.822873 [ 2731 ] {select_format.query4.run0} <Debug> MemoryTracker: Current memory usage (for query): 2.07 GiB.
2024.05.04 02:26:02.978317 [ 2731 ] {select_format.query4.run0} <Debug> MemoryTracker: Current memory usage (for query): 4.07 GiB.
2024.05.04 02:26:05.268606 [ 2731 ] {select_format.query4.run0} <Debug> MemoryTracker: Current memory usage (for query): 8.07 GiB.
2024.05.04 02:26:05.268714 [ 2731 ] {select_format.query4.run0} <Debug> MemoryTracker: Current memory usage (total): 18.92 GiB.
2024.05.04 02:26:08.623803 [ 2731 ] {select_format.query4.run0} <Debug> MemoryTracker: Current memory usage (for query): 11.67 GiB.
2024.05.04 02:26:08.623904 [ 2731 ] {select_format.query4.run0} <Debug> MemoryTracker: Current memory usage (total): 22.52 GiB.
2024.05.04 02:26:14.320555 [ 1158 ] {select_format.query4.run0} <Debug> executeQuery: Read 16384 rows, 16.61 MiB in 13.853056 sec., 1182.6993264157743 rows/sec., 1.20 MiB/sec.
2024.05.04 02:26:14.326616 [ 1158 ] {select_format.query4.run0} <Debug> MemoryTracker: Peak memory usage (for query): 11.67 GiB.
2024.05.04 02:26:14.326630 [ 1158 ] {select_format.query4.run0} <Debug> TCPHandler: Processed in 13.858514872 sec.
azat:.../tmp/tests-perf$ grep -e select_format.query4.run0 left-server-log.log  | less
2024.05.04 02:26:00.101605 [ 1156 ] {select_format.query4.run0} <Debug> executeQuery: (from [::1]:37072) INSERT INTO table_PrettySpace SELECT * FROM test.hits LIMIT 10000 (stage: Complete)
2024.05.04 02:26:00.101758 [ 1156 ] {select_format.query4.run0} <Trace> ContextAccess (default): Access granted: INSERT(WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate, CounterID, ClientIP, ClientIP6, RegionID, UserID, CounterClass, OS, UserAgent, URL, Referer, URLDomain, RefererDo>
2024.05.04 02:26:00.102560 [ 1156 ] {select_format.query4.run0} <Trace> Planner: Query SELECT __table1.WatchID AS WatchID, __table1.JavaEnable AS JavaEnable, __table1.Title AS Title, __table1.GoodEvent AS GoodEvent, __table1.EventTime AS EventTime, __table1.EventDate AS EventDate, __table1.Cou>
2024.05.04 02:26:00.102755 [ 1156 ] {select_format.query4.run0} <Trace> ContextAccess (default): Access granted: SELECT(WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate, CounterID, ClientIP, ClientIP6, RegionID, UserID, CounterClass, OS, UserAgent, URL, Referer, URLDomain, RefererDo>
2024.05.04 02:26:00.103380 [ 1156 ] {select_format.query4.run0} <Trace> Planner: Query SELECT __table1.WatchID AS WatchID, __table1.JavaEnable AS JavaEnable, __table1.Title AS Title, __table1.GoodEvent AS GoodEvent, __table1.EventTime AS EventTime, __table1.EventDate AS EventDate, __table1.Cou>
2024.05.04 02:26:00.104611 [ 1156 ] {select_format.query4.run0} <Debug> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Key condition: unknown
2024.05.04 02:26:00.104631 [ 1156 ] {select_format.query4.run0} <Debug> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): MinMax index condition: unknown
2024.05.04 02:26:00.104654 [ 1156 ] {select_format.query4.run0} <Debug> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 1084/1084 marks by primary key, 1084 marks to read from 1 ranges
2024.05.04 02:26:00.104671 [ 1156 ] {select_format.query4.run0} <Trace> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Spreading mark ranges among streams (default reading)
2024.05.04 02:26:00.104857 [ 1156 ] {select_format.query4.run0} <Trace> test.hits (7e4a60c6-566b-4cdb-84a9-47b825d7f4ff) (SelectExecutor): Reading 1 ranges in order from part 201403_10_18_2, approx. 10000 rows starting from 0
2024.05.04 02:26:00.465344 [ 1156 ] {select_format.query4.run0} <Debug> executeQuery: Read 16384 rows, 16.61 MiB in 0.36379 sec., 45036.97187938096 rows/sec., 45.66 MiB/sec.
2024.05.04 02:26:00.466790 [ 1156 ] {select_format.query4.run0} <Debug> MemoryTracker: Peak memory usage (for query): 130.81 MiB.
2024.05.04 02:26:00.466803 [ 1156 ] {select_format.query4.run0} <Debug> TCPHandler: Processed in 0.365560894 sec.

Note, that it also started to use too much RAM, but I haven't look deeply

@Algunenano
Copy link
Copy Markdown
Member

It looks like this PR makes PrettySpace format significantly slowe

Thanks for the analysis! Let's revert (I'll revert)

@azat
Copy link
Copy Markdown
Member

azat commented May 16, 2024

I was thinking about fixing this, but your idea is "better" 😂

@azat
Copy link
Copy Markdown
Member

azat commented May 16, 2024

Actually I think we should do this at least under a setting

@azat
Copy link
Copy Markdown
Member

azat commented May 16, 2024

@alexey-milovidov and BTW the perf tests shows the problem, timeout for queries from select_format.xml (likely it picked the correct query)

@Algunenano
Copy link
Copy Markdown
Member

In theory there is a setting: #63479

But 20x slower is too much, even with a setting. I'd rather revert and we can reintroduce once it's investigated than (myself) spent lots of time analyzing unfamiliar code

@azat
Copy link
Copy Markdown
Member

azat commented May 16, 2024

BTW it looks like the functionality of pager is implemented with this

@Algunenano
Copy link
Copy Markdown
Member

Algunenano commented May 16, 2024

Funny thing: Since more PRs where built on top of this, the reverts means reverting everything:

The good thing: Reverting the revert once the fix is done should be simple

@Algunenano
Copy link
Copy Markdown
Member

Confirmed.

Before the revert:

stage   start   0.005   0.005
stage   parse   0.000   0.006
stage   before-connect  0.000   0.006
server  0       localhost       49000
stage   connect 0.000   0.006
drop    0       0.001971006393432617    DROP TABLE IF EXISTS table_PrettySpace
stage   drop-1  0.002   0.008
stage   settings        0.001   0.009
create  0       0.011439085006713867    CREATE TABLE IF NOT EXISTS table_PrettySpace ENGINE = File(PrettySpace, '/dev/null') AS test.hits
stage   create  0.012   0.021
stage   sync    0.021   0.042
display-name    0       INSERT INTO table_PrettySpace SELECT * FROM test.hits LIMIT 10000
prewarm 0       select_format.query0.prewarm0   0       2.6575686931610107
query   0       select_format.query0.run0       0       1.8274660110473633
client-time     0       1.8275002849986777      1.8274660110473633
profile-total   0
stage   run     4.485   4.527
drop    0       0.0006453990936279297   DROP TABLE IF EXISTS table_PrettySpace
stage   drop-2  0.001   4.527

After the revert:

/mnt/ch/ClickHouse/tests/performance $ ../../tests/performance/scripts/perf.py --runs 1 select_format.xml  --port 49000
stage   start   0.006   0.006
stage   parse   0.000   0.006
stage   before-connect  0.000   0.006
server  0       localhost       49000
stage   connect 0.000   0.006
drop    0       0.002940654754638672    DROP TABLE IF EXISTS table_PrettySpace
stage   drop-1  0.003   0.009
stage   settings        0.001   0.010
create  0       0.01272130012512207     CREATE TABLE IF NOT EXISTS table_PrettySpace ENGINE = File(PrettySpace, '/dev/null') AS test.hits
stage   create  0.013   0.023
stage   sync    0.026   0.049
display-name    0       INSERT INTO table_PrettySpace SELECT * FROM test.hits LIMIT 10000
prewarm 0       select_format.query0.prewarm0   0       0.29047489166259766
query   0       select_format.query0.run0       0       0.23655939102172852
query   0       select_format.query0.run1       0       0.23544073104858398
query   0       select_format.query0.run2       0       0.22042036056518555
query   0       select_format.query0.run3       0       0.22723174095153809
query   0       select_format.query0.run4       0       0.23174118995666504
client-time     0       1.1515622549995896      1.1513934135437012
profile-total   0
stage   run     1.442   1.492
drop    0       0.0005781650543212891   DROP TABLE IF EXISTS table_PrettySpace
stage   drop-2  0.001   1.492

Taking 2 seconds to show 10k lines is not good.

Algunenano added a commit to Algunenano/ClickHouse that referenced this pull request May 16, 2024
…e-strings-in-Pretty-formats"

This reverts commit e0c23fb, reversing
changes made to 02a5b01.
@alexey-milovidov
Copy link
Copy Markdown
Member

@Algunenano, we can give it a chance by carefully rewriting this code. It could have some performance impact, e.g., 2..3x will be ok for Pretty format (with is not meant for large data exports). 20x is too much but it should be easily fixable.

@Algunenano
Copy link
Copy Markdown
Member

@Algunenano, we can give it a chance by carefully rewriting this code. It could have some performance impact, e.g., 2..3x will be ok for Pretty format (with is not meant for large data exports). 20x is too much but it should be easily fixable.

Yes, I'm open to somebody doing it, but in the meantime it's causing issues in CI and if somebody else doesn't act it will go into the next release. We can revert the revert with the fix

@Volodyachan
Copy link
Copy Markdown
Contributor Author

@Algunenano Can you tell me how I can locally test the performance. There is no test.hits table.

@Algunenano
Copy link
Copy Markdown
Member

Volodyachan added a commit to Volodyachan/ClickHouse that referenced this pull request May 18, 2024
…multiline-strings-in-Pretty-formats""

This reverts commit 7bcef97.
@Volodyachan Volodyachan mentioned this pull request May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants