"buffer exceeded max size" when reading JSON array via auto-detect

Repro is with Zed commit 6288fa9e and the attached test data [nfcapd.json.gz](https://github.com/brimdata/zed/files/8597678/nfcapd.json.gz), which consists of a JSON array of NetFlow records.

Prior to #3555 (cc: @nwt), this input data was auto-detected as JSON and the elements were treated as individual records, such that the following worked ok.

```
$ zq -version
Version: v0.33.0-167-g06c8ed11

$ zq -z 'head 1' nfcapd.json.gz 
{app_latency:0,cli_latency:0,dst4_addr:"10.47.2.154",dst_port:58331,export_sysid:0,fwd_status:0,in_bytes:313,in_packets:1,label:"<none>",proto:17,sampled:0,src4_addr:"10.0.0.100",src_port:53,src_tos:0,srv_latency:0,t_first:"2018-03-23T12:58:22.641",t_last:"2018-03-23T12:58:22.641",tcp_flags:"........",type:"FLOW"}
```

However, since #3555 the input's original array-ness is preserved, so I now need to apply `over this` to remove the array layer. This makes sense, but auto-detect now fails before it gets past the input phase.

```
$ zq -version
Version: v1.0.0-72-g6288fa9e

$ zq -z 'over this | head 1' nfcapd.json.gz 
nfcapd.json.gz: format detection error
	zeek: line 1: bad types/fields definition in zeek header
	zjson: line 1: unexpected end of JSON input
	zson: parse error: string literal: buffer exceeded max size trying to infer input format
	zng: zngio: unknown compression format 0x7b
	zng21: zng type ID out of range
	csv: line 1: no comma found
	json: buffer exceeded max size trying to infer input format
	parquet: auto-detection not supported
	zst: auto-detection not supported
```

I can make it work again by adding an explicit `-i json`, but I'm curious whether this limitation with auto-detect could be seen as a bug or an undesirable limitation that could be addressed. Imagining a future of users bringing arbitrary inputs to the tooling, being forgiving with auto-detect does seem like a desirable goal.
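To illustrate the kind of leniency I have in mind, here is a minimal Python sketch. It is not based on the Zed source: `MAX_PEEK`, `whole_value_detect`, and `first_element_detect` are hypothetical names, and it assumes (unconfirmed) that the failure comes from detection needing one complete top-level value inside a bounded buffer. Under that assumption, a large top-level array can never fit, whereas accepting the input once the array's first element parses would succeed.

```python
import json

MAX_PEEK = 64  # hypothetical detection buffer size, kept tiny for demonstration


def whole_value_detect(data: str, max_peek: int = MAX_PEEK) -> bool:
    """Strict check: a complete top-level JSON value must fit in the buffer."""
    buf = data[:max_peek].lstrip()
    try:
        json.JSONDecoder().raw_decode(buf)
        return True
    except json.JSONDecodeError:
        return False


def first_element_detect(data: str, max_peek: int = MAX_PEEK) -> bool:
    """Lenient check: for a top-level array, accept once its first element parses."""
    buf = data[:max_peek].lstrip()
    if not buf.startswith("["):
        # Not an array, so fall back to the strict check.
        return whole_value_detect(data, max_peek)
    rest = buf[1:].lstrip()
    if rest.startswith("]"):
        return True  # empty array
    try:
        # raw_decode parses one value and ignores any trailing (truncated) text.
        json.JSONDecoder().raw_decode(rest)
        return True
    except json.JSONDecodeError:
        return False


# A JSON array far larger than the buffer, loosely shaped like the NetFlow data.
records = [{"proto": 17, "src_port": 53, "type": "FLOW"} for _ in range(100)]
big_array = json.dumps(records)

print(whole_value_detect(big_array))    # strict check on the oversized array
print(first_element_detect(big_array))  # lenient check on the same input
```

The lenient check trades some strictness for reach: it could misclassify input that merely begins like a JSON array, so a real detector would probably still want to fall back to the other formats if a later read fails.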

The wider context is that I've been drafting updates to the [Custom Brimcap Config](https://github.com/brimdata/brimcap/wiki/Custom-Brimcap-Config) wiki article, which currently shows example command lines that depended on the previous ability to successfully auto-detect this JSON input. Due to the known Brimcap limitation https://github.com/brimdata/brimcap/issues/80, explicitly specifying the equivalent of `-i json` on a `brimcap analyze` command line is not currently possible, so I can't employ the same workaround as I would for `zq`. However, if the limitation identified here is deemed too difficult to address in the short term, I could revise the article to work around it for now in another way, perhaps by using CSV input as it did in the past.


"buffer exceeded max size" when reading JSON array via auto-detect #3865
