Skip to content

User config knob for NULL presentation #5961

@philrz

Description

@philrz

SQL clients often have a user config knob that can be used to specify how NULL values are presented. For formats with a well-defined spec like JSON or SUP it probably makes sense to always stick with the standard, but for looser formats like CSV, TSV, text, and table users would likely appreciate a similar knob with super.

Details

Repro is with super commit 9fcbd22.

We'll run our examples with the following sample data.

$ super -version
Version: 9fcbd222b

$ echo '{a:1} {b:2}' | super -f parquet -o data.parquet -c "fuse" -

$ super -s data.parquet
{a:1,b:null::int64}
{a:null::int64,b:2}

For reasons I think have to do with the roots of super, with text and table formats, right now null values are rendered as -.

$ super -f text data.parquet 
1	-
-	2

$ super -f table data.parquet 
a b
1 -
- 2

As touched on in #5868, with CSV and TSV, they're rendered as empty values.

$ super -f csv data.parquet 
a,b
1,
,2

$ super -f tsv data.parquet 
a	b
1	
	2

By default DuckDB renders them as NULL, but offers the -nullvalue flag to vary this, such as of a user wanted to match what SuperDB currently does.

$ duckdb --version
v1.3.0 71c5c07cdd

$ duckdb --csv -c "SELECT * FROM 'data.parquet';"
a,b
1,NULL
NULL,2

$ duckdb --csv --nullvalue '-' -c "SELECT * FROM 'data.parquet';"
a,b
1,-
-,2

As also touched on in #5868, ClickHouse renders with \N by default, but they have several ways to vary this, including per-format knobs, e.g.,

$ clickhouse --version
ClickHouse local version 25.5.2.47 (official build).

$ clickhouse --format=csv --query "SELECT * FROM 'data.parquet';"
1,\N
\N,2

$ clickhouse --format=csv --format_csv_null_representation='-' --query "SELECT * FROM 'data.parquet';"
1,-
-,2

Postgres psql renders by default as an empty string, but a pset knob can vary this. Using the pg_parquet extension to load data from our Parquet data file:

pg_parquet=# CREATE TABLE data (
  a BIGINT,
  b BIGINT
);
CREATE TABLE

pg_parquet=# COPY data FROM '/Users/phil/work/super-5961/data.parquet';
COPY 2

pg_parquet=# SELECT * FROM data;
 a | b 
---+---
 1 |  
   | 2
(2 rows)

pg_parquet=# \pset null 'NULL'
Null display is "NULL".

pg_parquet=# SELECT * FROM data;
  a   |  b   
------+------
    1 | NULL
 NULL |    2
(2 rows)

Context

I happened to confront this topic because I was working with a sqllogictest that outputs NULL values. Up to that point I was having pretty good luck using super -f text to get the simple outputs that could be diff'ed against the expected outputs in sqllogictest files. super's static use of - creates a challenge for me here personally, but as evidenced by the presence of the knobs in other SQL clients I expect other users may see it as a problem as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions