Skip to content

Closes #1140 spm load#1141

Merged
jan-petr merged 10 commits intodevelopfrom
feature-#1140_spm_load
Aug 15, 2022
Merged

Closes #1140 spm load#1141
jan-petr merged 10 commits intodevelopfrom
feature-#1140_spm_load

Conversation

@HenkMutsaerts
Copy link
Member

Linked issue

Closes #1140

How to test

Required: if not defined in the linked issue, add a simple test description here

Comments

Optional: add helpful comments for the reviewers here

@HenkMutsaerts HenkMutsaerts linked an issue Aug 3, 2022 that may be closed by this pull request
8 tasks
Copy link
Contributor

@jan-petr jan-petr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but this needs a complete overhaul. Can you send the wrong file?! And I'll code it nicely.

@jan-petr jan-petr self-requested a review August 4, 2022 11:57
@jan-petr jan-petr assigned jan-petr and unassigned maartenhammer Aug 4, 2022
@HenkMutsaerts
Copy link
Member Author

Great refactoring, but you now report and fix all empty cells; whereas we only should report and fix the empty cells at the end of lines (so too many delimiters at end of lines). Details:

171 "If there's a single line-end only in the data, then consider everything a single line a divide (if possible) by the number of columns" -> what do you mean? difficult to read

174 rem(numel(d{1}, N) -> this used to be the check if we can reshape by number of columns, so this check is now redundant?

208 Am I correct that you are here filling empty cells with empty strings? That's not what we should do. Empty cells are not the problem (if they were, this should be filled with 'n/a' per BIDS (or NaN)). By the way, difficult to read for me, I need more comments ;)

217 -> this warning is incorrect, it should only warn for empty cells at row ends, not for any empty cells

222 -> this warning is correct! We should add +1 though to compensate for the header.
And the repaired file is the original file, so the line endings still have too many delimiters (or empty cells)

FixIllegalEmptyCells and DetectIllegalEmptyCells are not used anymore?

@jan-petr
Copy link
Contributor

jan-petr commented Aug 4, 2022

Great refactoring, but you now report and fix all empty cells; whereas we only should report and fix the empty cells at the end of lines (so too many delimiters at end of lines). Details:

It fixes too much or too little cells per line. It reports all empty for convenience of user that he knows he has empty cells, but I can disable that.

HENK: Thanks, no it should only report the illegal empty cells at end of lines (222).

208 Am I correct that you are here filling empty cells with empty strings? That's not what we should do. Empty cells are not the problem (if they were, this should be filled with 'n/a' per BIDS (or NaN)). By the way, difficult to read for me, I need more comments ;)

If cells are empty, then they are initialised as empty. I am only changing type of the empty cell from double to char. Because all the read cells are chars.

HENK: Are you sure it won't read any cells as numeric? If all cells are read as char then this is fine of course; otherwise this is just redundant code and we can skip this whole part. We don't care about empty cells that are not at the eol.

222 -> this warning is correct! We should add +1 though to compensate for the header. And the repaired file is the original file, so the line endings still have too many delimiters (or empty cells)

I added +1 to compensate for header.
Repaired file is correct. I has the correct number of delimiters to have the same number of delimiters for each line. That is - having delimiters also for empty cells. It rectifies the number of delimiters as this should correctly be. Instead of removing the delimiters for empty cells which would have been wrong.

FixIllegalEmptyCells and DetectIllegalEmptyCells are not used anymore?

HENK: ???

NEW TEST:

image

It still mentions all empty cells. It should only mention the empty cells at the end of the line (making a line having more cells than the header has). Did you remove the wrong warning? The warning on line 222 was correct, the warning on line 217 was incorrect.

image

The code still doesn't remove empty cells at the end of the line... It should remove the empty cells at the end of the line, such that the number of delimiters is the same for each line.

@jan-petr
Copy link
Contributor

jan-petr commented Aug 4, 2022

208 Am I correct that you are here filling empty cells with empty strings? That's not what we should do. Empty cells are not the problem (if they were, this should be filled with 'n/a' per BIDS (or NaN)). By the way, difficult to read for me, I need more comments ;)

If cells are empty, then they are initialised as empty. I am only changing type of the empty cell from double to char. Because all the read cells are chars.

HENK: Are you sure it won't read any cells as numeric? If all cells are read as char then this is fine of course; otherwise this is just redundant code and we can skip this whole part. We don't care about empty cells that are not at the eol.

Yes - numbers are read also as chars.
So this code makes it consistent. Maybe not really necessary, but it helps. Maybe not now, but for future functions, it is good to have clean outputs.

FixIllegalEmptyCells and DetectIllegalEmptyCells are not used anymore?
HENK: ???

Yes - these are deleted - not used anymore.

NEW TEST:

image

It still mentions all empty cells. It should only mention the empty cells at the end of the line (making a line having more cells than the header has). Did you remove the wrong warning? The warning on line 222 was correct, the warning on line 217 was incorrect.

Sorry. Yes I deactivated the wrong warning. Function was intact though. Just the warnings were switched.

As before. It finds lines with too few or too much delimiters (or empty cells). It doesn't really care if the cell is empty or not. Just checks really the delimiters. So reporting is fixed, but function unchanged.

image

The code still doesn't remove empty cells at the end of the line... It should remove the empty cells at the end of the line, such that the number of delimiters is the same for each line.

I don't understand. This sentence seems contradictory. The number of delimiters is the same for each line - same as in the header. But that means that there can be empty cells at the end of the line.

So the header has 9 cells. And each line should have 8 delimiters = 9 normal cells, or 5 normal cells and 4 empty, or 4 normal and 5 empty etc. And that's what you see in the figure you've sent - 6 filled cells and then 3 empty ones.

I can of course try to write a function that has 6 normal cells on the line and no empty cells. But if I remove empty cells, then there won't be the delimiters and there will only be 5 delimiters per line = that's not what we want. And probably tsv_write doesn't support this anyway.

Please check and say exactly what to do with empty cells/delimiters at the end of the line. Previous comments were unclear.

Still - all that was in the code before. Only the warnings and comments in code changed....

HENK: Yes you are right, now it works!

@jan-petr jan-petr removed the request for review from maartenhammer August 15, 2022 07:47
@jan-petr jan-petr assigned jan-petr and unassigned maartenhammer Aug 15, 2022
@jan-petr
Copy link
Contributor

The function textscan treats a last empty cell (so a delimiter follow by line-end) as a non-existing cell (not empty, but completely skipping it). So "text,text,text EOL" is three cells (comma being a delimiter). "text,,,text EOL" is four cells. But "text, EOL" one cell. Though "text,,EOL" two cells.

So I've inserted an extra fix that adds an extra cell at the end of the line if it ends with a delimiter.

Now it all works.

Copy link
Contributor

@jan-petr jan-petr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@HenkMutsaerts
Copy link
Member Author

Funnily, I think that none of the changes were necessary for the participants.tsv that I now worked with, except for your very last commit ...

@jan-petr jan-petr force-pushed the feature-#1140_spm_load branch from 71eded8 to 52171bf Compare August 15, 2022 18:17
@jan-petr jan-petr merged commit 52171bf into develop Aug 15, 2022
@jan-petr jan-petr deleted the feature-#1140_spm_load branch August 15, 2022 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

spm_load

3 participants