HTTP interface: more reliable way to detect errors in the middle of a stream #75175

@slvrtrn

Error detection

For language clients, the HTTP interface needs a more reliable and consistent way to detect errors that occur in the middle of a SELECT stream, in any output format, and to report them to the user. Currently, the behavior varies with the requested format in at least several ways.

Consider the following with a default 24.12 instance and curl:

CSV: exit code 18

curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT CSV" 

0,"FgUi-u.2~X?6j""~C[2>u"
__exception__
Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))
curl: (18) transfer closed with outstanding read data remaining

Default JSONEachRow behavior: exit code 0

curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT JSONEachRow" 
 
{"_":0,"str":"W#~QZ<\"'P-i;Fo,o4cVw"}
{"exception": "Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))"}
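
A client-side heuristic for the default JSONEachRow case might look like the sketch below: treat a line as an error if it parses to an object whose only key is exception. The function name is hypothetical and the sample rows are shortened. The heuristic is ambiguous by construction, since a legitimate row from a result with a single String column named exception has the same shape.

```python
import json


def looks_like_exception_row(line: str) -> bool:
    """Heuristic: the line is a JSON object with the single string key 'exception'."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and list(obj.keys()) == ["exception"]
        and isinstance(obj["exception"], str)
    )


# Shortened versions of the two lines shown in the output above.
rows = [
    '{"_":0,"str":"abc"}',
    '{"exception": "Code: 395. DB::Exception: There was an error in the stream!"}',
]
flags = [looks_like_exception_row(r) for r in rows]

# A legitimate row, e.g. from `SELECT 'oops' AS exception`, is flagged as well.
false_positive = looks_like_exception_row('{"exception":"oops"}')
```

Here flags marks only the second line, but false_positive is also true, so the client cannot tell a server error from data without extra context.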

JSONEachRow with http_write_exception_in_output_format=0 (default is 1): behaves like CSV, exit code 18

curl "http://localhost:8123?http_write_exception_in_output_format=0" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT JSONEachRow"

{"_":0,"str":"^&D~V2Z&c3o0$V1Kf(?6"}
__exception__
Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))
curl: (18) transfer closed with outstanding read data remaining

RowBinary: exit code 18, and the exception is dumped as a plain string (without a LEB128 length prefix, etc.)

curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, number from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT RowBinary" > rowbinary.bin 

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   553    0   402  100   151   159k  61382 --:--:-- --:--:-- --:--:--  270k
curl: (18) transfer closed with outstanding read data remaining
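
To illustrate why this is a problem for binary formats, here is a minimal sketch of a decoder for the row layout of the query above (a UInt8 followed by a UInt64, both fixed width, little-endian). The exact bytes of the appended error text are an assumption (marker plus a shortened message), and decode_rows is a hypothetical helper. Since the rows carry no framing, the decoder reads the plain-text error as more rows.

```python
import struct

# One row of the query above: UInt8 + UInt64, little-endian, 9 bytes in total.
ROW = struct.Struct("<BQ")


def decode_rows(buf: bytes):
    """Decode as many complete (UInt8, UInt64) rows as the buffer holds."""
    count = len(buf) // ROW.size
    return [ROW.unpack_from(buf, i * ROW.size) for i in range(count)]


# Three valid rows, as produced before the exception is thrown.
good = b"".join(ROW.pack(0, n) for n in range(3))

# Assumed shape of the tail: marker and message appended as plain text.
tail = b"__exception__\nCode: 395. DB::Exception: There was an error in the stream!"

decoded = decode_rows(good + tail)
valid_part = decoded[:3]    # the real rows
garbage_part = decoded[3:]  # text bytes misread as rows
```

The first three tuples are the real rows; everything after them is the error text reinterpreted as numbers, with nothing in the byte stream itself to signal the switch.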

Regarding the error detection mechanism:

  • Currently, it is tricky because language clients need, at the very least, to check every received chunk for a row matching __exception__, and this does not work for all formats (see above).
  • It is even more complicated when raw byte streams are passed through without deserialization (e.g. Arrow or Parquet); the language client itself might not even have the dependency required to decode the format properly.
  • On top of that, there are edge cases such as strings containing the keyword __exception__ or, with the default JSONEachRow behavior, a column named exception.
  • The headers cannot be modified if they were already sent, and there are no other push mechanisms in HTTP/1.1. If we have to keep the exception in the response stream and parse it out of there, could it be separated from the rest of the rows in a different way? For example, could a double newline work in this case? This might simplify the flow on the language client side a bit.
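
The chunk-scanning approach from the first bullet can be sketched as follows for line-based text formats. MarkerScanner is a hypothetical helper, not part of any existing client. It carries incomplete lines across chunk boundaries, because the marker can be split between two reads.

```python
MARKER = b"__exception__"


class MarkerScanner:
    """Scan streamed chunks for a line equal to MARKER."""

    def __init__(self) -> None:
        self._pending = b""  # incomplete last line of the previous chunk
        self.found = False

    def feed(self, chunk: bytes) -> None:
        lines = (self._pending + chunk).split(b"\n")
        self._pending = lines.pop()  # may be a partial line, keep for later
        for line in lines:
            if line.rstrip(b"\r") == MARKER:
                self.found = True

    def finish(self) -> None:
        """Check the final line if the stream ended without a newline."""
        if self._pending.rstrip(b"\r") == MARKER:
            self.found = True


# The marker is split across two chunks and is still detected.
split = MarkerScanner()
split.feed(b'0,"abc"\n__exc')
split.feed(b"eption__\nCode: 395. DB::Exception: ...")
split.finish()

# False positive: a TSV result with one String column whose value is the marker.
tsv = MarkerScanner()
tsv.feed(b"first\n__exception__\nlast\n")
tsv.finish()
```

Both scanners end with found set to true, although the second stream contains no error. This is the ambiguity described in the third bullet.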

HTTP streaming and load-balancers

The changes from #68800 (24.11+) may not work as intended if the request goes through a proxy/LB. Consider this sample docker-compose with two CH nodes behind nginx:

curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT CSV" 

0,"1v7_]@J?/nsG/K6=2SO;"
0,"kT/SQ+[M.6pPPc'73Lfo"
0,"gvCJG|,^^H%xAYzn/GJT"
__exception__
Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))

The return code is zero in this case (while it was 18 when we queried the node directly), i.e. it now fully depends on how the LB terminates the connection.
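
Exit code 18 is how curl reports a transfer that closed while more data was still expected. Assuming the response uses chunked transfer encoding, that corresponds to a body that ends without the terminating zero-length chunk. The sketch below checks a raw chunked body for that terminator; chunked_body_is_complete is a hypothetical helper, simplified to ignore trailer fields. A proxy that re-frames the upstream response and closes it cleanly would hand the client the terminated variant, which would be consistent with the exit code 0 above. That explanation is an assumption and has not been verified against nginx.

```python
def chunked_body_is_complete(raw: bytes) -> bool:
    """Return True if a raw HTTP/1.1 chunked body ends with the zero-length chunk."""
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            return False  # chunk-size line is missing or cut off
        # Drop optional chunk extensions after ';' and parse the hex size.
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        try:
            size = int(size_field, 16)
        except ValueError:
            return False
        pos = eol + 2
        if size == 0:
            # Simplification: expect the final CRLF directly, no trailer fields.
            return raw[pos:pos + 2] == b"\r\n"
        end = pos + size
        if raw[end:end + 2] != b"\r\n":
            return False  # chunk data is truncated
        pos = end + 2


# Body as a client would see it from a clean close.
terminated = chunked_body_is_complete(b"5\r\nhello\r\n0\r\n\r\n")

# Body cut off after the last data chunk, with no terminating chunk.
cut_off = chunked_body_is_complete(b"5\r\nhello\r\n")
```

With only this transport-level signal, a client behind an LB cannot distinguish a failed stream from a successful one, so an in-band marker would still be needed.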

Labels: close in a month if not active, st-need-info
