ARROW-1497: [Java] Fix JsonReader to initialize count correctly#1067
ARROW-1497: [Java] Fix JsonReader to initialize count correctly#1067icexelloss wants to merge 1 commit intoapache:masterfrom
Conversation
|
|
||
| try (VectorSchemaRoot rootFromJson = reader.read();) { | ||
| validateUnionData(count, rootFromJson); | ||
| Validator.compareVectorSchemaRoot(root, rootFromJson); |
There was a problem hiding this comment.
This would fail without the JsonFileReader change
| } | ||
|
|
||
| /* | ||
| * TODO: This method doesn't load some vectors correctly. For instance, it doesn't initialize |
There was a problem hiding this comment.
I can open a follow up Jira for this.
| throw new IllegalArgumentException("Expected field " + field.getName() + " but got " + name); | ||
| } | ||
| int count = readNextField("count", Integer.class); | ||
| vector.allocateNew(); |
There was a problem hiding this comment.
I am still not very familiar with when to call the allocateNew() method. Why is it necessary to set the value count?
There was a problem hiding this comment.
allocateNew allocates the buffer for the vector.
setValueCount sets the number of values in the vector.
We need to call setValueCount to correctly create the vector
There was a problem hiding this comment.
I think we should set lastSet before setting the value count else setValueCount() will corrupt the vector.
There was a problem hiding this comment.
Unfortunately it's hard to do it here unless we do some crazy instance matching here, I don't want to do that because it's too hard to maintain.
The correct way I think is to use the proper loadFieldBuffers in this class which initialize the vectors correctly and set things like lastSet( The TODO above)
There was a problem hiding this comment.
ok, sounds good to me. I think we need lastSet only for VarChar, VarBinary and ListVector?
There was a problem hiding this comment.
I am not too sure, but at least, + their Nullable version.
|
|
||
| VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root")); | ||
| validateUnionData(count, root); | ||
| try (VectorSchemaRoot root = new VectorSchemaRoot(parent.getChild("root"))) { |
There was a problem hiding this comment.
why changing to a nested try blocks?
There was a problem hiding this comment.
Because I need both VectorSchemaRoot to compare equality.
|
@siddharthteotia can you take a look at this? @icexelloss can you change the PR title to start with "ARROW-1497:" (remove the brackets). thanks! |
|
+1 |
| } | ||
| int count = readNextField("count", Integer.class); | ||
| vector.allocateNew(); | ||
| vector.getMutator().setValueCount(count); |
There was a problem hiding this comment.
Would it work to do the following?
- vector.setInitialCapacity(count)
- vector.allocateNew()
- after all inner vectors are read, then vector.getMutator().setValueCount(count)
Doing this, you should be able to clean up all those similar calls in the inner vector loop too. I'm also used to setValueCount being called after values are populated, not sure if that makes a difference though.
There was a problem hiding this comment.
Oops, sorry I commented too late after the merge. I can look into this at another time.
There was a problem hiding this comment.
Can we create a follow up JIRA about this? thanks!
Author: Li Jin <[email protected]> Closes apache#1067 from icexelloss/json-reader-ARROW-1497 and squashes the following commits: 6d4e1df [Li Jin] Fix JsonReader to read union vectors correctly
No description provided.