ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning by jorisvandenbossche · Pull Request #7608 · apache/arrow

jorisvandenbossche · 2020-07-01T15:08:26Z

No description provided.

github-actions · 2020-07-01T15:17:04Z

https://issues.apache.org/jira/browse/ARROW-9288

jorisvandenbossche

The approach here is to also determine field_names_ in HivePartitioningFactory after inspecting (for DirectoryPartitioningFactory, those field names are passed to the constructor), so that we can then trim the schema and have the dictionaries match the order of the schema.

However, thinking about it now: there might still be a problem if the user specified the full dataset schema, so that no inspection happens. So we might need to think of a better solution.

(I should also add some C++ tests)
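To make the idea above concrete, here is a minimal, self-contained sketch. It is not the code from this PR and does not use Arrow's API: ToyHiveFactory, Inspect, Finish, and Record are hypothetical stand-ins, and fields and dictionaries are modelled as plain strings. It only illustrates recording field names during inspection and then trimming a full schema so that each dictionary sits at the same index as its field.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a hive-style partitioning factory.
// Field names and dictionaries are plain strings; this is not Arrow's API.
struct ToyHiveFactory {
  std::vector<std::string> field_names;                // keys, in discovery order
  std::vector<std::vector<std::string>> dictionaries;  // unique values per key

  // Scan every "key=value" path segment and remember unique values per key.
  void Inspect(const std::vector<std::string>& paths) {
    for (const std::string& path : paths) {
      std::size_t start = 0;
      while (start <= path.size()) {
        std::size_t end = path.find('/', start);
        if (end == std::string::npos) end = path.size();
        const std::string segment = path.substr(start, end - start);
        const std::size_t eq = segment.find('=');
        if (eq != std::string::npos) {
          Record(segment.substr(0, eq), segment.substr(eq + 1));
        }
        start = end + 1;
      }
    }
  }

  struct Finished {
    std::vector<std::string> fields;
    std::vector<std::vector<std::string>> dictionaries;
  };

  // Walk the schema in its own order, keep only inspected partition fields,
  // and emit each field's dictionary at the same index as the field.
  Finished Finish(const std::vector<std::string>& schema) const {
    assert(field_names.size() == dictionaries.size());
    Finished out;
    for (const std::string& name : schema) {
      auto it = std::find(field_names.begin(), field_names.end(), name);
      if (it == field_names.end()) continue;  // not a partition field: trimmed
      out.fields.push_back(name);
      out.dictionaries.push_back(dictionaries[it - field_names.begin()]);
    }
    return out;
  }

 private:
  // Append a new key on first sight, then add the value if it is new.
  void Record(const std::string& key, const std::string& value) {
    auto it = std::find(field_names.begin(), field_names.end(), key);
    if (it == field_names.end()) {
      field_names.push_back(key);
      dictionaries.emplace_back();
      it = field_names.end() - 1;
    }
    auto& dict = dictionaries[it - field_names.begin()];
    if (std::find(dict.begin(), dict.end(), value) == dict.end()) {
      dict.push_back(value);
    }
  }
};
```

In this toy version, a schema that contains non-partition columns is reduced to the partition fields only, and the dictionaries are reordered to follow the schema rather than the discovery order.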

jorisvandenbossche · 2020-07-01T15:20:27Z

cpp/src/arrow/dataset/partition.cc

I should probably guard here against the case where field_names_ has not been updated yet (if Finish is called without Inspect being called), with an empty vector?

Absolutely, the first line of this method should just call

auto field_names = FieldNames();

and replace occurrences of the private member.

There is no FieldNames() method on the PartitioningFactory (only the impl has one, but that is not accessible here; that's the reason I added the field_names_ private member to store those)
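One possible shape for the guard discussed in this thread, sketched under assumptions: TrimOrKeepSchema is a hypothetical free function (not part of Arrow and not necessarily what was merged), schemas are modelled as lists of field names, and an empty inspected_names vector stands for "Finish was called without Inspect".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not Arrow's API: choose the schema a Finish-like step
// should use. `inspected_names` is empty when no Inspect-like step has run.
// Returns false if an inspected name is missing from `schema`.
bool TrimOrKeepSchema(const std::vector<std::string>& inspected_names,
                      const std::vector<std::string>& schema,
                      std::vector<std::string>* out) {
  assert(out != nullptr);
  if (inspected_names.empty()) {
    *out = schema;  // nothing was inspected: leave the schema untouched
    return true;
  }
  for (const std::string& name : inspected_names) {
    if (std::find(schema.begin(), schema.end(), name) == schema.end()) {
      return false;  // an inspected field is absent from the schema
    }
  }
  *out = inspected_names;  // trimmed schema, in inspection order
  return true;
}
```

The empty-vector check keeps the no-inspection path from trimming the schema down to nothing; whether that path should instead be an error is a design decision this sketch does not settle.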

wesm

+1

jorisvandenbossche force-pushed the ARROW-9288 branch from 455f6dc to 9f0a90b · July 1, 2020 15:09

jorisvandenbossche commented Jul 1, 2020

jorisvandenbossche requested review from bkietz and fsaintjacques July 1, 2020 15:28

jorisvandenbossche mentioned this pull request Jul 1, 2020

ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type #7536

Closed

ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

81aecfa

jorisvandenbossche force-pushed the ARROW-9288 branch from 9f0a90b to 81aecfa · July 7, 2020 09:12

bkietz requested changes Jul 8, 2020

cpp/src/arrow/dataset/partition.cc (outdated, resolved)

cpp/src/arrow/dataset/partition.cc (outdated, resolved)

jorisvandenbossche and others added 3 commits July 8, 2020 22:05

Update cpp/src/arrow/dataset/partition.cc

bb32e8b

Co-authored-by: Benjamin Kietzman <[email protected]>

Update cpp/src/arrow/dataset/partition.cc

1342f56

Co-authored-by: Benjamin Kietzman <[email protected]>

remove comment

0e89728

wesm approved these changes Jul 12, 2020

wesm closed this in 44aa829 Jul 12, 2020

jorisvandenbossche deleted the ARROW-9288 branch July 13, 2020 08:54

asfimport mentioned this pull request Jul 12, 2020

[C++][Dataset] Discovery of partition field as dictionary type segfaulting with HivePartitioning #25380

Closed

jorisvandenbossche wants to merge 4 commits into apache:master from jorisvandenbossche:ARROW-9288
