ARROW-5861: [Java] Initial implement to convert Avro record with primitive types#4812
ARROW-5861: [Java] Initial implement to convert Avro record with primitive types#4812tianchen92 wants to merge 5 commits intoapache:masterfrom
Conversation
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #4812 +/- ##
==========================================
+ Coverage 87.42% 89.58% +2.16%
==========================================
Files 994 661 -333
Lines 140102 96617 -43485
Branches 1418 0 -1418
==========================================
- Hits 122481 86557 -35924
+ Misses 17259 10060 -7199
+ Partials 362 0 -362Continue to review full report at Codecov.
|
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrow.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroStringConsumer.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/test/java/org/apache/arrow/AvroToArrowTest.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrow.java
Outdated
Show resolved
Hide resolved
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
|
|
||
| allocateVectors(root, DEFAULT_BUFFER_SIZE); | ||
|
|
||
| List<Consumer> consumers = createAvroConsumers(root); |
There was a problem hiding this comment.
it seems like we might need to understand more about decoders and the order they are expected to [decode(https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/io/ValidatingDecoder.html) in. This can also be for a follow-up PR
java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void consume(Decoder decoder) throws IOException { | ||
| VarBinaryHolder holder = new VarBinaryHolder(); | ||
| ByteBuffer byteBuffer = decoder.readBytes(null); |
There was a problem hiding this comment.
we should probably keep a cached byte buffer for values below a certain size.
Even better would be if we could create a ByteBuffer that wraps the Vector and puts bytes directly where they need to go. This can be done in a separate PR.
There was a problem hiding this comment.
Can we address the first part of my comment (caching a ByteBuffer instead of creating a new one each time)?
Otherwise LGTM as start. @tianchen92
There was a problem hiding this comment.
Sure, I will create a new JIRA to address this, thanks!
emkornfield
left a comment
There was a problem hiding this comment.
Thanks this looks a lot closer to what I was expecting. It still seems like there might be some additional details we need to learn about the AVRO API but I think this can be done as a follow-up
Many thanks for your feedback :), revise would be provided later. |
|
@emkornfield Hi, Micah, could we get this PR merged? thanks :) And in the follow PR, I resolve most the comments and still have few questions, would you mind explaining a lot? thanks a lot! |
| import org.apache.avro.Schema.Type; | ||
| import org.apache.avro.io.Decoder; | ||
|
|
||
| import sun.reflect.generics.reflectiveObjects.NotImplementedException; |
There was a problem hiding this comment.
Sorry just noticed this, we should be using https://docs.oracle.com/javase/6/docs/api/java/lang/UnsupportedOperationException.html
There was a problem hiding this comment.
Fixed, thanks! hope could merge soon then I could start follow-up works
|
An adapter for CSV would probably be nice as well. I don't know if there is an efficient CSV parser in java that we could potentially already leverage (or if we want bindings to the C++ version). might be worth an e-mail to the mailing list to get opinions. |
Maybe https://commons.apache.org/proper/commons-csv/ is a good choice. |
Seems reasonable to me. Please ask on the mailing list in case anyone else has input. |
Thanks for your effort, I will start discuss later :) |
|
@emkornfield Build passed :) |
|
+1, thank you. |
…itive types Related to [ARROW-5861](https://issues.apache.org/jira/browse/ARROW-5861). Initial implement to support convert Avro record with primitive types to Arrow objects. Author: tianchen <[email protected]> Closes #4812 from tianchen92/ARROW-5861 and squashes the following commits: 2439478 <tianchen> use UnsupportedOperationException fa3f39a <tianchen> resolve comments 7c3a730 <tianchen> add consumers and use GenericDatumReader 61d2dac <tianchen> fix style 54479c8 <tianchen> Initial implement to convert Avro record with primitive types
…itive types Related to [ARROW-5861](https://issues.apache.org/jira/browse/ARROW-5861). Initial implement to support convert Avro record with primitive types to Arrow objects. Author: tianchen <[email protected]> Closes apache#4812 from tianchen92/ARROW-5861 and squashes the following commits: 2439478 <tianchen> use UnsupportedOperationException fa3f39a <tianchen> resolve comments 7c3a730 <tianchen> add consumers and use GenericDatumReader 61d2dac <tianchen> fix style 54479c8 <tianchen> Initial implement to convert Avro record with primitive types
Related to ARROW-5861.
Initial implement to support convert Avro record with primitive types to Arrow objects.