Implement proper array conversion in http driver #146
Merged
Until now, the http driver used rudimentary ClickHouse array detection
and parsing that did not account for single-quoted strings. As a result,
it rendered strings *with the single quotes.* It would also fail to
parse strings that contained ClickHouse array brackets (`[` or `]`).
To address these shortcomings, add proper array parsing to `parser.c`,
based on ClickHouse's own array parsing functions. `ch_http_read_next()`
now detects a field starting with `[` and passes it off to the new
`ch_http_read_array()` function, which detects the start and end of
nested arrays as well as single-quoted strings. The latter it passes off
to `ch_http_read_array_string_literal()`, which converts a ClickHouse
TSV array single-quoted string value to a Postgres double-quoted array
string value.
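The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code added to `parser.c`: the names `sketch_string_literal()` and `sketch_array_to_pg()` are hypothetical, the input is assumed to be already TSV-unescaped, only a small subset of backslash escapes is decoded, and the caller is assumed to supply a buffer of at least twice the input length plus one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the PR's code: copy the single-quoted string at
 * *src into dst as a double-quoted Postgres array element and advance *src
 * past the closing quote. Returns the number of bytes written.
 */
static size_t
sketch_string_literal(const char **src, char *dst)
{
    const char *p = *src;
    size_t      n = 0;

    assert(*p == '\'');
    p++;
    dst[n++] = '"';
    while (*p != '\0' && *p != '\'')
    {
        char c = *p++;

        if (c == '\\' && *p != '\0')
        {
            /* Decode a small, assumed subset of backslash escapes. */
            c = *p++;
            if (c == 'n') c = '\n';
            else if (c == 't') c = '\t';
            else if (c == 'r') c = '\r';
        }
        /* Inside a quoted Postgres element, escape double quote and backslash. */
        if (c == '"' || c == '\\')
            dst[n++] = '\\';
        dst[n++] = c;
    }
    if (*p == '\'')
        p++;
    dst[n++] = '"';
    *src = p;
    return n;
}

/*
 * Hypothetical converter: rewrite a ClickHouse array such as ['a',['b']]
 * into a Postgres array literal. Returns 0 on success, -1 if the input is
 * malformed or dst is smaller than 2 * strlen(src) + 1 bytes.
 */
static int
sketch_array_to_pg(const char *src, char *dst, size_t dstlen)
{
    const char *p = src;
    size_t      n = 0;
    int         depth = 0;

    if (*p != '[' || dstlen < 2 * strlen(src) + 1)
        return -1;
    do
    {
        if (*p == '\'')
            n += sketch_string_literal(&p, dst + n);  /* brackets inside stay literal */
        else if (*p == '[')
        {
            depth++;
            dst[n++] = '{';
            p++;
        }
        else if (*p == ']')
        {
            depth--;
            dst[n++] = '}';
            p++;
        }
        else
            dst[n++] = *p++;  /* numbers, commas, NULL pass through unchanged */
    } while (depth > 0 && *p != '\0');
    dst[n] = '\0';
    /* Succeed only if every bracket closed and nothing trails the array. */
    return (depth == 0 && *p == '\0') ? 0 : -1;
}
```

Under these assumptions, `['num0','num1']` becomes the array literal `{"num0","num1"}`, which Postgres reads as two unquoted-on-output elements, and a bracket inside a quoted string, as in `['a]b']`, no longer ends the array early.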
In all cases, these functions adjust `ch_http_read_state->curpos` to
track position between them; `ch_http_read_next()` no longer uses a
separate `pos` variable.
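To illustrate the shared-cursor idea, here is a small sketch in which a field reader and an array helper both advance one `curpos` member. The struct and function names are hypothetical stand-ins; the real `ch_http_read_state` has other fields, and the real functions also copy and convert the data rather than only measuring it.

```c
#include <stddef.h>

/* Hypothetical stand-in for ch_http_read_state; only curpos mirrors the text. */
typedef struct
{
    const char *data;    /* NUL-terminated TSV response body */
    size_t      curpos;  /* shared cursor advanced by every reader */
} sketch_read_state;

/* Advance curpos past one bracketed array, honoring nesting and quotes. */
static void
sketch_skip_array(sketch_read_state *st)
{
    int depth = 0;
    int in_str = 0;

    for (; st->data[st->curpos] != '\0'; st->curpos++)
    {
        char c = st->data[st->curpos];

        if (in_str)
        {
            if (c == '\\' && st->data[st->curpos + 1] != '\0')
                st->curpos++;          /* skip the escaped character */
            else if (c == '\'')
                in_str = 0;
        }
        else if (c == '\'')
            in_str = 1;
        else if (c == '[')
            depth++;
        else if (c == ']' && --depth == 0)
        {
            st->curpos++;              /* step past the closing bracket */
            break;
        }
    }
}

/*
 * Return the length of the next field and set *start to its first byte,
 * or return -1 when the input is exhausted. No local position variable
 * survives the call: the state's curpos is the only cursor.
 */
static long
sketch_read_next(sketch_read_state *st, const char **start)
{
    size_t begin = st->curpos;
    long   len;

    if (st->data[begin] == '\0')
        return -1;
    if (st->data[begin] == '[')
        sketch_skip_array(st);         /* the helper moves the shared cursor */
    while (st->data[st->curpos] != '\0' &&
           st->data[st->curpos] != '\t' &&
           st->data[st->curpos] != '\n')
        st->curpos++;
    *start = st->data + begin;
    len = (long) (st->curpos - begin);
    if (st->data[st->curpos] != '\0')
        st->curpos++;                  /* step over the field delimiter */
    return len;
}
```

Because the helper leaves `curpos` just past the closing bracket, the caller can resume scanning for the delimiter without recomputing any offset.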
With these improvements to the TSV parser, remove the naive array
parsing from `char_to_datum()` in `pglink.c`. Also remove the
`VARCHAROID` handling, which seems to have no effect.
Add tests to `http_inserts.sql` to ensure the new functions properly
parse arrays and handle all the special characters. Also update older
tests that incorrectly returned arrays with single quotes; e.g., the
invalid Postgres array `{'num0','num1'}` becomes the correct
`{num0,num1}`.
Test the same values with the binary driver, and fix an issue where it
raised an error for an empty array.
While at it, remove the `buflen` field from `ch_http_read_state`, as its
purpose was superseded by the introduction of the `StringInfo` in
5c010e5.
serprex
approved these changes
Feb 11, 2026
Fixes #142.