Skip to content

Conversation

@remram44
Copy link
Contributor

@remram44 remram44 commented Mar 16, 2018

This adds a combo box to select how missing values should be inserted.

Fixes #1339

Current behavior:

if(f->isInteger() && f->notnull()) // If this is an integer column but NULL isn't allowed, insert 0
nullValues << "0";
else if(f->isInteger() && !f->notnull()) // If this is an integer column and NULL is allowed, insert NULL
nullValues << QByteArray();
else // Otherwise (i.e. if this isn't an integer column), insert an empty string
nullValues << "";

The options after my patch are "empty string" (default, matches sqlite3's .import, current behavior) or "NULL" (probably what you want).

Open questions:

  • I didn't change the behavior for INTEGER columns, which is NULL if NULL is allowed, 0 otherwise
  • More options? (fail on missing value, use default for column, insert 0/false/"(missing)"/...)
  • First time contributor; check if I broke tests? Config?

</item>
<item row="7" column="0">
<layout class="QVBoxLayout" name="verticalLayout_2"/>
</item>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have been there by mistake

@justinclift
Copy link
Member

Excellent @remram44, this should move things in the right direction. 😄

@mgrojo or @MKleusberg Would either of you be ok to do a quick review of this code? It looks pretty simple.

@remram44 As a thought, it's probably useful to also add the "use default value for the field, if defined" (possibly with better wording) thing too. I'm kind of expecting that'd be well used as well, though that's more a gut feeling thing than based on any real evidence. 😉

@remram44
Copy link
Contributor Author

From what I understand, the default is added if you don't specify the field at all in the statement, so there would need to be more than one prepared statement to do this. Therefore it is more complex than the case I already added.

@mgrojo
Copy link
Member

mgrojo commented Mar 17, 2018

Thanks for the contribution, @remram44.

I've made a quick review. It's already a useful addition but ideally it should also work for non-existing tables.

I think it is right to leave the current behaviour for integer columns. It makes no sense to insert there an empty string.

I have a doubt about general NOT NULL columns of existing tables when the insert-null option is selected. Should the same check as for integer columns performed and an empty-string be inserted instead? Or should it fail the import? I'd think that it is better to insert the empty-string for that column but allow to insert the NULL for others. With the current PR it would fail (not tested, just assuming it).

@justinclift
Copy link
Member

From what I understand, the default is added if you don't specify the field at all in the statement, so there would need to be more than one prepared statement to do this. Therefore it is more complex than the case I already added.

No worries, it was just a thought. 😄

@justinclift
Copy link
Member

I have a doubt about general NOT NULL columns of existing tables when the insert-null option is selected.

That's a good point. I'm not sure what the ideal behaviour should be, but we should at least warn the user somehow when this situation occurs so they can take some kind of action (eg make a decision, fix things manually, etc).

@remram44
Copy link
Contributor Author

I personally like the option of having it fail if my data has unexpected NULLs, or if my table has an unexpected NOT NULL. It's easy enough to select a different option in the import or remove that NOT NULL constraint, and I definitely wouldn't want things to happen silently.

I can actually see myself selecting the "import as NULL" option to import data into a NOT NULL column when I don't expect missing values.

@remram44
Copy link
Contributor Author

It's already a useful addition but ideally it should also work for non-existing tables.

Seems like I completely skipped over that if block, I can add that.

@justinclift
Copy link
Member

@remram44 Yeah, failing the import for that situation makes sense. 😄

@remram44
Copy link
Contributor Author

The situation when creating a new table is somewhat weird, e3e2e81 is my attempt. It will insert NULLs if that's what was selected, and otherwise will stick to the current behavior; however I'm not sure about that behavior.

@MKleusberg
Copy link
Member

This is looking pretty good! Thanks a lot, @remram44 👍

I have stumbled upon two unexpected peculiarities:

  1. When importing a NULL value into an existing table with a NOT NULL constraint, you get an error as you would expect. But after that it says "Import successful" and closes the dialog. This is definitely not your fault and has already been an issue before this PR. It's just that I noticed this in the "error message on missing value" feature discussed above. We might want to open a separate issue for this if we haven't already.

  2. Again the "error message on missing value" feature. When importing a NULL value into an existing table with a NOT NULL constraint on an INTEGER column, you don't get the expected error message. Even though this confused me for a second, I would leave it this way. It's basically about the fact that the "error on missing value" and the "0 on integer columns" rules are in conflict here. But what do you think? Which rule should win in this special case?

So, as said I would leave it this way and merge this as-is. I certainly couldn't find any problems in the actual implementation 😄

@MKleusberg
Copy link
Member

As for the debate regarding default values: We probably don't need different prepared statements here when we use the f->defaultValue() function in lines 563-573. It shouldn't be much more than an additional else if statement this way. The only question then is what to do when importing into a new table for which we don't have default values: I think we should fall back to empty strings in this case but maybe others disagree here.

Either way, if you want to and have the time you can add this as a third combo box option, @remram44. If not, I will give it a try 😄

@remram44
Copy link
Contributor Author

I think I will let you work on the third option @MKleusberg, as I'm not sure I will be able to get to it for a while.

@karim
Copy link
Member

karim commented May 8, 2018

I just want to say something. 😃

Regarding the expected behavior of DB4S when importing empty values from a CSV.

  1. If the column has a default value then it gets priority over everything else.
    • This is the default action for SQLite, it fills columns with default values when a value is not given.
    • The column being NOT NULL or not doesn't matter, since a value will be set for it anyway.
    • ✔️ No need for a checkbox, it should be the right thing to do. 👍
  2. If the column does not have a default value.
    1. Check if the column does not have NOT NULL, if true:
      • Then just add NULL to it, it should be the default.
      • Be consistent. CSV value is NULL, SQLite table doesn't forbid NULL, so just add NULL.
      • ✔️ Also, no need for a checkbox, and this is what everyone here wants. 🎉
    2. Else if it does have NOT NULL, then we have two options, either:
      1. Fail the import (make sense since you are importing NULL to a NOT NULL column), or
      2. Set a default value for it (e.g. 0 for INTEGER, "" for TEXT)
        • Probably by adding a checkbox to "set default values for empty fields in NOT NULL columns" (need a better wording 😄)

@chrisjlocke
Copy link
Member

That actually makes a lot of sense. 😺
Very logical.

@MKleusberg
Copy link
Member

Maybe I'm overthinking this but what about the case where I want to import a CSV file into my table and keep it exactly as it is (i.e. empty fields not replaced by default values) to then do something with these empty records? Trying to come up with an example: lets say I have a table messages with a column read which is set to 1 if the message was read. The default value would be 0. Now I import many messages from some source via a CSV file and I would like to be able to check if all the imported messages have complete information. I can only do this by checking if there are records with read == '', unless of course those empty values were replaced by the default value. [You could argue that this is bad database design and that there should be a CHECK constraint or whatever but my point only is that there seem to be cases which I can imagine (and imagine myself being in) where I would like to be able to select the behaviour here.]

What do you think? Maybe I'm missing something?

@karim
Copy link
Member

karim commented May 22, 2018

Yes, you might be overthinking this. 😄

I believe the database design should be honored over the imported file. The default value should be inserted whenever no value is given, and that's how SQLite works.

If you want to import some data into a column with default value but do not want it to be set for empty values (violating the database design), then you shouldn't have a default value.

For your example, you can remove the default value, import the csv, then set it again after import to get the result you are after. If it is not user-friendly, or you might forget to do that every time before importing, then a check mark "ignore/don't set default values for empty fields" (with a tool tip explaining that the default behavior is to set the default value when none is given) should be enough to do this automatically (unset default value, import success, restore back).

One thing that might be a problem though is finding out what the user thinks is a NULL value.

Is it value,,value or value,"",value?

User A might think that both are NULL, but user B might think that only the first one is and the second one is just an empty string and should be inserted as-is, while user C might think that none has a NULL value and both are just empty values, and a null value is when it is like value,null,value.

@MKleusberg
Copy link
Member

You're probably right, @karim. I think I might add the checkbox as you have suggested anyway, just to make sure we lose no functionality. And it should be easy to add 😉 If it turns out we would need more checkboxes to cover some other settings I will hide them away behind an 'Advanced' button. The ,, vs ,"", distinction is actually a bit trickier to implement because it's not only requiring changes in the insert code but also in the parser.

But that's definitely beyond this pull request. So I will merge this now and deal with the default settings later because this is already a huge improvement 😄

Thanks again, @remram44, and sorry for taking so much time merging this 😉 If you ever feel like making another pull request you're very welcomed to do so 😄

@MKleusberg MKleusberg merged commit 17f1eab into sqlitebrowser:master May 22, 2018
@remram44
Copy link
Contributor Author

No problem! Sorry I couldn't take care of the rest of the concerns here,

@MKleusberg
Copy link
Member

No worries 😄 I just opened another issue so they aren't lost and will deal with them over the next couple of days. And either way, your pull request did most of the work here already.

@remram44 remram44 mentioned this pull request May 31, 2018
MKleusberg pushed a commit that referenced this pull request Jun 7, 2018
* Add a combo box for missing values in import ui

* Add ui pieces to .cpp file

* Insert NULL values if requested

* Allow inserting NULLs in new table also
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants