
Implement proper array conversion in http driver #146

Merged
theory merged 1 commit into main from parse-http-array
Feb 11, 2026

Collaborator

@theory theory commented Feb 3, 2026

Until now, the http driver used rudimentary ClickHouse array detection and parsing that did not account for single-quoted strings. As a result, it rendered strings with their single quotes intact, and it failed to parse strings that contained ClickHouse array brackets (`[` or `]`).

To address these shortcomings, add proper array parsing to `parser.c`, based on ClickHouse's own array parsing functions. `ch_http_read_next()` now detects a field starting with `[` and hands it to the new `ch_http_read_array()` function, which tracks the start and end of nested arrays as well as single-quoted strings. It passes each single-quoted string to `ch_http_read_array_string_literal()`, which converts a single-quoted string value from a ClickHouse TSV array into a double-quoted string value for a Postgres array.
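
To illustrate the conversion described above, here is a minimal standalone sketch. It is not the code in `parser.c`: the names `sketch_append` and `sketch_ch_array_to_pg` are hypothetical, it writes into a fixed caller-supplied buffer rather than a `StringInfo`, and it assumes that inside a single-quoted string a backslash simply escapes the next character (so `\'` and `\\` are handled, but sequences such as `\t` are not decoded).

```c
#include <stddef.h>

/* Hypothetical helper: append one byte, keeping room for the final NUL. */
static int
sketch_append(char *out, size_t outsize, size_t *len, char c)
{
    if (*len + 1 >= outsize)
        return -1;
    out[(*len)++] = c;
    return 0;
}

/*
 * Hypothetical sketch: convert a ClickHouse array such as ['a','b'] to
 * Postgres array text such as {"a","b"}. Returns the number of input bytes
 * consumed, or -1 on malformed input or insufficient output space.
 */
long
sketch_ch_array_to_pg(const char *in, char *out, size_t outsize)
{
    size_t pos = 0, len = 0;
    int    depth = 0;

    if (outsize == 0 || in[0] != '[')
        return -1;

    do {
        char c = in[pos++];

        if (c == '\0')
            return -1;                      /* unterminated array */
        if (c == '[') {
            depth++;
            c = '{';
        } else if (c == ']') {
            depth--;
            c = '}';
        } else if (c == '\'') {
            /* Single-quoted string: emit it as a double-quoted element. */
            if (sketch_append(out, outsize, &len, '"') < 0)
                return -1;
            for (;;) {
                c = in[pos++];
                if (c == '\0')
                    return -1;              /* unterminated string */
                if (c == '\'')
                    break;                  /* closing quote */
                if (c == '\\' && (c = in[pos++]) == '\0')
                    return -1;              /* dangling backslash */
                /* Postgres requires " and \ to be escaped inside quotes. */
                if ((c == '"' || c == '\\') &&
                    sketch_append(out, outsize, &len, '\\') < 0)
                    return -1;
                if (sketch_append(out, outsize, &len, c) < 0)
                    return -1;
            }
            c = '"';
        }
        if (sketch_append(out, outsize, &len, c) < 0)
            return -1;
    } while (depth > 0);

    out[len] = '\0';
    return (long) pos;
}
```

Under these assumptions, the input `['num0','num1']` produces the array text `{"num0","num1"}`, which Postgres displays as `{num0,num1}` once parsed. Brackets inside a quoted string, as in `['a]b']`, are copied as string content and do not affect the nesting depth.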

All of these functions update `ch_http_read_state->curpos` to track the shared read position; `ch_http_read_next()` no longer uses a separate `pos` variable.
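
The shared-cursor pattern can be sketched as follows. This is an illustration only: `sketch_read_state`, `sketch_read_plain`, and `sketch_read_next` are hypothetical names, and the struct holds just the two fields needed here, not the actual layout of `ch_http_read_state`. The point is that the helper advances `curpos` itself, so the caller never keeps its own copy of the position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state; the real struct has other fields. */
typedef struct {
    const char *data;    /* NUL-terminated TSV response body */
    size_t      curpos;  /* offset of the next unread byte */
} sketch_read_state;

/*
 * Consume a plain field up to the next tab, newline, or end of data.
 * Leaves curpos on the delimiter and returns the field length.
 */
static size_t
sketch_read_plain(sketch_read_state *st, const char **start)
{
    size_t n = strcspn(st->data + st->curpos, "\t\n");

    *start = st->data + st->curpos;
    st->curpos += n;
    return n;
}

/*
 * Read the next field. Returns 1 and sets *start and *len when a field was
 * read, or 0 at end of data. Only st->curpos tracks the position.
 */
int
sketch_read_next(sketch_read_state *st, const char **start, size_t *len)
{
    if (st->data[st->curpos] == '\0')
        return 0;
    *len = sketch_read_plain(st, start);
    if (st->data[st->curpos] != '\0')
        st->curpos++;    /* step over the tab or newline */
    return 1;
}
```

A dispatcher like this could call a different helper when the field starts with `[`, and as long as every helper leaves `curpos` just past what it consumed, the next call resumes at the right byte.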

With these improvements to the TSV parser, remove the naive array parsing from `char_to_datum()` in `pglink.c`. Also remove the `VARCHAROID` handling, which seems to have no effect.

Add tests to `http_inserts.sql` to ensure the new functions properly parse arrays and handle all the special characters. Also update older tests whose expected output incorrectly contained single-quoted array elements; e.g., the invalid Postgres array `{'num0','num1'}` becomes the correct `{num0,num1}`.

Test the same values with the binary driver, and fix an issue where it raised an error for an empty array.

While at it, remove the `buflen` field from `ch_http_read_state`, as its purpose was superseded by the introduction of the `StringInfo` in 5c010e5.

Fixes #142.

@theory theory force-pushed the parse-http-array branch 2 times, most recently from 49c15da to f4596e9 Compare February 11, 2026 21:52
@theory theory changed the title from "Parse http array" to "Implement proper array conversion in http driver" Feb 11, 2026
@theory theory requested a review from serprex February 11, 2026 21:53
@theory theory self-assigned this Feb 11, 2026
@theory theory added the bug (Something isn't working) and drivers (Improve binary and/or http driver support) labels Feb 11, 2026
@theory theory marked this pull request as ready for review February 11, 2026 21:54
@theory theory merged commit 957e43e into main Feb 11, 2026
36 checks passed
@theory theory deleted the parse-http-array branch February 11, 2026 23:09

Labels

bug: Something isn't working
drivers: Improve binary and/or http driver support

Development

Successfully merging this pull request may close these issues.

http engine fails to parse arrays with quoted strings

2 participants