HTTP interface: more reliable way to detect errors in the middle of a stream #75175
Description
Error detection
For use in language clients, the HTTP interface needs a more reliable and consistent way to detect errors that occur in the middle of a SELECT stream, and to report them to the user, regardless of the requested format. Currently, the behavior differs depending on the format.
Consider the following with a default 24.12 instance and curl:
CSV: exit code 18
curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT CSV"
0,"FgUi-u.2~X?6j""~C[2>u"
__exception__
Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))
curl: (18) transfer closed with outstanding read data remaining
Default JSONEachRow behavior: exit code 0
curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT JSONEachRow"
{"_":0,"str":"W#~QZ<\"'P-i;Fo,o4cVw"}
{"exception": "Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))"}
JSONEachRow + http_write_exception_in_output_format=0 (default is 1) = similar behavior to CSV, exit code 18
curl "http://localhost:8123?http_write_exception_in_output_format=0" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT JSONEachRow"
{"_":0,"str":"^&D~V2Z&c3o0$V1Kf(?6"}
__exception__
Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))
curl: (18) transfer closed with outstanding read data remaining
RowBinary: exit code 18, and the exception is dumped as a plain string (without a LEB128 length prefix, etc.)
curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, number from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT RowBinary" > rowbinary.bin
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 553 0 402 100 151 159k 61382 --:--:-- --:--:-- --:--:-- 270k
curl: (18) transfer closed with outstanding read data remaining
Regarding the error detection mechanism:
- Currently, it is tricky because language clients need to, at the very least, constantly check whether there is a row matching __exception__ in the received chunk, but this will not work for all formats (see above).
- It is even more complicated when raw byte streams are passed through without deserialization (e.g. Arrow, Parquet, etc.), as the language client itself might not even have the dependency required to decode the format properly.
- On top of that, there are funny edge cases, such as strings containing the keyword __exception__ or, with the default JSONEachRow behavior, a column named exception.
- The headers cannot be modified once they have been sent, and there are no other push mechanisms in HTTP/1.1. If we have to keep the exception in the response stream and parse it out of there, could it be separated from the rest of the rows differently? For example, could a double newline work in this case? This might simplify the flow on the language client side a bit.
HTTP streaming and load-balancers
The changes from #68800 (24.11+) may not work as intended if the request is going through a proxy/LB. Consider this sample docker-compose with two CH nodes behind nginx:
curl "http://localhost:8123" --data-binary "select throwIf(number = 3, 'There was an error in the stream!') AS _, randomPrintableASCII(20) AS str from system.numbers limit 30000 SETTINGS max_block_size=1 FORMAT CSV"
0,"1v7_]@J?/nsG/K6=2SO;"
0,"kT/SQ+[M.6pPPc'73Lfo"
0,"gvCJG|,^^H%xAYzn/GJT"
__exception__
Code: 395. DB::Exception: There was an error in the stream!: while executing 'FUNCTION throwIf(equals(__table1.number, 3_UInt8) :: 3, 'There was an error in the stream!'_String :: 2) -> throwIf(equals(__table1.number, 3_UInt8), 'There was an error in the stream!'_String) UInt8 : 0'. (FUNCTION_THROW_IF_VALUE_IS_NON_ZERO) (version 24.12.2.29 (official build))
The return code is zero in this case (while it was 18 when we queried the node directly), i.e. it now fully depends on how the LB terminates the connection.
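The difference between the two exit codes can be illustrated with a small Python sketch. It assumes the response body uses HTTP/1.1 chunked transfer encoding (an assumption, not something stated above) and checks whether the raw body ends with the terminating zero-size chunk. The function name and the sample byte strings are hypothetical.

```python
def chunked_body_is_complete(raw: bytes) -> bool:
    """Return True only if `raw` parses as chunks ending with the zero-size chunk.

    Hypothetical helper for illustration; trailers after the last chunk are
    ignored in this sketch.
    """
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            return False  # body ended inside a chunk-size line
        # Drop optional chunk extensions after ';' before parsing the size.
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        try:
            size = int(size_field, 16)
        except ValueError:
            return False  # not a valid hexadecimal chunk size
        if size < 0:
            return False
        pos = eol + 2
        if size == 0:
            return True  # terminating chunk seen: framing is complete
        end = pos + size
        if raw[end:end + 2] != b"\r\n":
            return False  # chunk data truncated or not terminated
        pos = end + 2


# Direct connection: the body stops without the terminating chunk, so the
# framing itself reveals the failure.
direct = b"5\r\nhello\r\n"
print(chunked_body_is_complete(direct))

# Behind a proxy that re-frames the response and closes it cleanly, the same
# payload can arrive with valid framing, so this check no longer helps.
via_proxy = b"5\r\nhello\r\n0\r\n\r\n"
print(chunked_body_is_complete(via_proxy))
```

In other words, a client that relies only on transport-level truncation would treat the proxied response as successful, even though the payload contains the exception text.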
