Skip to content

Conversation

@ctabin
Copy link
Contributor

@ctabin ctabin commented Jul 6, 2023

Hi,

This is a minor enhancement to #3786 to allow H2 to still write/read empty string if the nullString option is set in CSVREAD and CSVWRITE functions to be explicitly different from "" or null.

Typically, when using MSQL/MARIADB modes, a NULL value is not the same as an empty string (as where for Oracle, an empty string is converted to NULL).

@katzyn
Copy link
Contributor

katzyn commented Jul 7, 2023

NULL is different from empty string in the SQL Standard and it almost every database system, including H2.

@manticore-projects
Why #3786 was implemented in that way?

At least PostgreSQL and H2 distinguish nulls from empty strings in the following way:

id,columnWithNull,columnWithEmptyString
1,,""

After that PR H2 reads both columns as NULL.

@katzyn
Copy link
Contributor

katzyn commented Jul 7, 2023

@ctabin
Thank you for your contribution, but I think we need to restore old correct behavior at least by default instead of introduction of additional quirks.

If there are real-world use cases where "" needs to be processed as NULL and there are no other ways, these cases should use some new configuration option. If there are no such cases, PR with a breaking change should be simply reverted from my point of view.

@manticore-projects
Copy link
Contributor

manticore-projects commented Jul 7, 2023

Greetings!

Please check this details of the original case. I am aware that NULL != "" however when reading a quoted CSV column with Decimals and there is a NULL value, it would show as "", e. g.:

"1234.45"
""

This caused an exception before which I have fixed. See example #3785
NULL behavior is not specific in CSV.

@katzyn
Copy link
Contributor

katzyn commented Jul 7, 2023

Do you really need to write numbers with quotes?

@manticore-projects
Copy link
Contributor

Do you really need to write numbers with quotes?

I do not need to, but I have also no control over it since both is allowed and usually "quoting" is the safest option.
All spreadsheet programs I have tried offer quoting and read quoted decimals (incl. NULLs) without any problem.

The problem is not H2, but the CSV spec -- however I think if the spread sheet program can deal with it, H2 must find a way.

@katzyn
Copy link
Contributor

katzyn commented Jul 7, 2023

Please, add a new configuration option for your change, something like quotedNulls. And this option needs to be disabled by default. If this option is enabled, you need to use normal readNull(String) in Csv.readValue() in a branch with quoted string.

Spreadsheets are not database systems, their columns with numeric or any other formats may still contain strings as well and they don't really have nulls.

CSV is used not only with these spreadsheets, some people use them for other purposes.

@ctabin
Copy link
Contributor Author

ctabin commented Jul 7, 2023

Hi @katzyn and @manticore-projects, thanks for jumping in.

Indeed #3786 introduces a breaking change but since it was accepted, I just wanted to minimize the impact of it and allow to at least give a possible way to keep the old behavior (by explicitly set the option nullString=NULL in the csv read/write options), since we cannot update to 2.2.220 because we need to have empty strings instead of NULL values when loading data from CSV (our only solution otherwise would be to completely rewrite the dump/loading of CSVs from/to H2 ourselves).

Let me know if you want to rollback #3786 and then implement the quotedNulls flags or if directly a new PR would be good enough.

@ctabin
Copy link
Contributor Author

ctabin commented Jul 16, 2023

Replaced by #3842

@ctabin ctabin closed this Jul 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants