Add arrow io format #10580

Merged
alexey-milovidov merged 39 commits into ClickHouse:master from FawnD2:arrow-io-format
May 10, 2020
FawnD2 (Contributor) commented Apr 29, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Add Arrow IPC File format (Input and Output)
  • Fix incorrect behavior of resetParser() in the Parquet input format
  • Add a zero-copy optimization for ORC when reading from RandomAccessFiles
  • Add the missing half-float type to the Parquet and ORC input formats
    ...

Detailed description / Documentation draft:

...

By adding documentation, you'll allow users to try your new feature immediately, rather than waiting until someone else has time to document it. Documentation is necessary for all features that affect user experience in any way. You can add a brief documentation draft above, or add documentation right into your patch as Markdown files in the docs folder.

If you are doing this for the first time, it's recommended to read the lightweight Contributing to ClickHouse Documentation guide first.

FawnD2 marked this pull request as draft April 29, 2020 11:40
blinkov added the doc-alert and pr-feature (Pull request with new product feature) labels Apr 29, 2020
alexey-milovidov (Member) commented:

Minor:

2020-04-29 15:14:08 In file included from ../src/Processors/Formats/Impl/ArrowBlockInputFormat.cpp:1:
2020-04-29 15:14:08 ../src/Processors/Formats/Impl/ArrowBlockInputFormat.h:35:7: error: no newline at end of file [-Werror,-Wnewline-eof]
2020-04-29 15:14:08 #endif
2020-04-29 15:14:08       ^
2020-04-29 15:14:08 ../src/Processors/Formats/Impl/ArrowBlockInputFormat.cpp:113:7: error: no newline at end of file [-Werror,-Wnewline-eof]
2020-04-29 15:14:08 #endif
2020-04-29 15:14:08       ^


namespace DB
{
class Context;
Review comment (Member):

Unused forward declaration.

copyData(in, file_buffer);
}

std::unique_ptr<arrow::Buffer> local_buffer = std::make_unique<arrow::Buffer>(file_data);
Review comment (Member):

Is it possible to read data without excessive copying and allocating O(n) memory?
That's extremely suboptimal.
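
To make the concern concrete, here is a minimal sketch of the difference between a reader that copies its whole input and one that only refers to memory that already exists. It does not use the Arrow API and is not the change made in this PR: `CopyingReader` and `ViewingReader` are hypothetical stand-ins, and the non-owning variant assumes the source buffer outlives the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in: owns a second copy of the input,
// so it allocates memory proportional to the input size.
struct CopyingReader
{
    explicit CopyingReader(const std::string & source) : data(source) {}

    const char * begin() const { return data.data(); }
    std::size_t size() const { return data.size(); }

    std::string data;
};

// Hypothetical stand-in: stores only a pointer and a length.
// No allocation, but the caller must keep the source alive.
struct ViewingReader
{
    ViewingReader(const char * ptr_, std::size_t len_) : ptr(ptr_), len(len_) {}

    const char * begin() const { return ptr; }
    std::size_t size() const { return len; }

    const char * ptr;
    std::size_t len;
};
```

Both readers expose the same bytes; only the first one duplicates them. The trade-off is lifetime management: a view is cheap but dangles if the underlying buffer is destroyed first.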


std::unique_ptr<arrow::Buffer> local_buffer = std::make_unique<arrow::Buffer>(file_data);

std::shared_ptr<arrow::io::RandomAccessFile> in_stream(new arrow::io::BufferReader(*local_buffer));
Review comment (Member):

make_shared?
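
A minimal sketch of what this suggestion amounts to, using hypothetical stand-in types (`RandomAccessFileLike`, `BufferReaderLike`) rather than the real Arrow classes. `std::make_shared` can typically allocate the object and the reference-count control block together, and the resulting `std::shared_ptr` to the derived type converts implicitly to a pointer to the base.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for a base file interface and a buffer-backed reader.
struct RandomAccessFileLike
{
    virtual ~RandomAccessFileLike() = default;
    virtual std::size_t size() const = 0;
};

struct BufferReaderLike : RandomAccessFileLike
{
    explicit BufferReaderLike(std::string data_) : data(std::move(data_)) {}
    std::size_t size() const override { return data.size(); }

    std::string data;
};

// Pattern from the reviewed line: the object is allocated with `new`,
// then the shared_ptr allocates its control block separately.
inline std::shared_ptr<RandomAccessFileLike> makeWithNew(std::string data)
{
    return std::shared_ptr<RandomAccessFileLike>(new BufferReaderLike(std::move(data)));
}

// Suggested pattern: no naked `new`; the shared_ptr<BufferReaderLike>
// converts implicitly to shared_ptr<RandomAccessFileLike> on return.
inline std::shared_ptr<RandomAccessFileLike> makeWithMakeShared(std::string data)
{
    return std::make_shared<BufferReaderLike>(std::move(data));
}
```

Both functions return an equivalent pointer; the second form is shorter and avoids a window where a raw pointer exists without an owner.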

FawnD2 force-pushed the arrow-io-format branch from f3134b5 to 182de47 May 4, 2020 14:30
FawnD2 marked this pull request as ready for review May 5, 2020 02:23
FawnD2 changed the title from "[wip] Add arrow io format" to "Add arrow io format" May 5, 2020
Labels

pr-feature Pull request with new product feature

3 participants