Skip to content

Comments

Remove file_type() from FileFormat#10499

Merged
alamb merged 1 commit intoapache:mainfrom
Jefffrey:remove_file_type
May 15, 2024
Merged

Remove file_type() from FileFormat#10499
alamb merged 1 commit intoapache:mainfrom
Jefffrey:remove_file_type

Conversation

@Jefffrey
Copy link
Contributor

Which issue does this PR close?

Closes #8657

Rationale for this change

Was implementing rudimentary ORC integration for DataFusion:

https://github.com/datafusion-contrib/datafusion-orc/blob/main/src/datafusion/file_format.rs

Noticed that to implement FileFormat it requires a file_type(), even though this is not really used anywhere. Removing this can make it clearer for those wanting to extend with more custom file formats.

What changes are included in this PR?

Remove FileFormat::file_type() which was unused.

Are these changes tested?

Are there any user-facing changes?

Yes, function removed from trait

@github-actions github-actions bot added the core Core DataFusion crate label May 14, 2024
@Jefffrey
Copy link
Contributor Author

There are some related issues that I think might be good to close now? #8345 and #8637

I was able to implement basic read support for ORC files via ListingTable provider:

https://github.com/datafusion-contrib/datafusion-orc/blob/main/src/datafusion/mod.rs

Which might be enough to demonstrate those issues are done?

@alamb alamb added the api change Changes the API exposed to users of the crate label May 14, 2024
@alamb
Copy link
Contributor

alamb commented May 14, 2024

Makes sense to me -- thank you @Jefffrey

As you go through implementing ORC support, if you hit anything else that woudl make it easier to add new format support to the core and/or listing table that would be great.

@alamb
Copy link
Contributor

alamb commented May 14, 2024

strictly speaking this is an API change but I think it makes it easier to implement new formats 🚀

@alamb
Copy link
Contributor

alamb commented May 14, 2024

There are some related issues that I think might be good to close now? #8345 and #8637

I was able to implement basic read support for ORC files via ListingTable provider:

https://github.com/datafusion-contrib/datafusion-orc/blob/main/src/datafusion/mod.rs

Which might be enough to demonstrate those issues are done?

Indeed -- it sounds reasonable

In my mind, the ideal example is to make an example in datafusion-examples showing how to create a custom file format

I wonder if we could create a custom CSV reader that ignores comments 🤔 -- for example #10262 that @bbannier has been working on

@Jefffrey
Copy link
Contributor Author

As you go through implementing ORC support, if you hit anything else that woudl make it easier to add new format support to the core and/or listing table that would be great.

Will definitely keep this in mind 👍

@alamb alamb merged commit 33aea7e into apache:main May 15, 2024
@alamb
Copy link
Contributor

alamb commented May 15, 2024

Thanks again @Jefffrey

@Jefffrey Jefffrey deleted the remove_file_type branch May 16, 2024 18:47
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make/Remove FileType enum and replace with a trait

2 participants