Skip to content

Comments

move read/write outside of View#65

Merged
mikss merged 4 commits intogoldenfrom
readwrite_in_io_not_view
Jul 22, 2021
Merged

move read/write outside of View#65
mikss merged 4 commits intogoldenfrom
readwrite_in_io_not_view

Conversation

@mikss
Copy link
Contributor

@mikss mikss commented Jul 21, 2021

We discussed in a call that due to the way we anticipate build working, that View can just be metadata, not data. In this PR, I remove the .data attribute and I move read and write out of View methods.

I do not, however, move into a separate io package, because then that separate io package needs to be imported everywhere we want to use those readers (otherwise they are not registered). By placing the read/write implementations inside the same file as the view declaration (e.g., _read_pickle_localfile_python with class Python(View):) I think we avoid this issue.

@mikss mikss requested a review from JacobHayes July 21, 2021 21:09
Copy link
Member

@JacobHayes JacobHayes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few small things (mostly the -> Any for read), but looking good!

I do not, however, move into a separate io package, because then that separate io package needs to be imported everywhere we want to use those readers (otherwise they are not registered). By placing the read/write implementations inside the same file as the view declaration (e.g., _read_pickle_localfile_python with class Python(View):) I think we avoid this issue.

Not a strong position, but it's possible we could do:

# arti/io/core.py
from arti.formats.core import Format
from arti.storage.core import Storage
from arti.views.core import View

@multidispatch
def read(*, format: Format, storage: Storage, view: View) -> Any: ...

@multidispatch
def write(data: Any, *, format: Format, storage: Storage, view: View) -> None: ...

# arti/views/core.py doesn't import `io` at all

# arti/views/python.py
from arti.views.core import View
from arti.io.core import read, write

[... everything else the same ...]

@codecov
Copy link

codecov bot commented Jul 21, 2021

Codecov Report

Merging #65 (e13085b) into golden (85ff212) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            golden       #65   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           21        22    +1     
  Lines          675       668    -7     
  Branches        75        74    -1     
=========================================
- Hits           675       668    -7     
Impacted Files Coverage Δ
src/arti/io/core.py 100.00% <100.00%> (ø)
src/arti/views/core.py 100.00% <100.00%> (ø)
src/arti/views/python.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85ff212...e13085b. Read the comment docs.

@mikss
Copy link
Contributor Author

mikss commented Jul 21, 2021

Not a strong position, but it's possible we could do:

So in this example, split out read/write interfaces into io, but keep the read/write implementations in the views/{...}.py file? Isn't that confusing for a different reason (read/write in this special directory io but actual implementations elsewhere).

@JacobHayes
Copy link
Member

So in this example, split out read/write interfaces into io, but keep the read/write implementations in the views/{...}.py file?

Yeah

Isn't that confusing for a different reason (read/write in this special directory io but actual implementations elsewhere).

Yeah maybe a bit, but it is exactly why we're using multidispatch (eg: user wants to add their own handlers outside arti/ code).

--

Actually, now that we're thinking the user will be saying build(cls, data: Annotated[pd.DataFrame, MyArtifact]) -> ... instead of importing corresponding arti.views.* classes, it's not a given that these arti.view modules will be imported. We'll probably need some dynamic importing of all submodules (logic would live in arti/__init__.py) beneath arti.views. At that point, we could also move implementations to an arti.io and do the same dynamic importing (doesn't say we should move implementation, just that either way, we have the ~same "cost").

Then any implementations within arti/{views,io}/ OR third_party/arti/{views,io} OR user_module/arti/{views,io} would get registered automatically (any implementations outside arti/ would have to be imported manually by the user).

@mikss
Copy link
Contributor Author

mikss commented Jul 22, 2021

Actually, now that we're thinking the user will be saying build(cls, data: Annotated[pd.DataFrame, MyArtifact]) -> ... instead of importing corresponding arti.views.* classes, it's not a given that these arti.view modules will be imported. We'll probably need some dynamic importing of all submodules (logic would live in arti/__init__.py) beneath arti.views. At that point, we could also move implementations to an arti.io and do the same dynamic importing (doesn't say we should move implementation, just that either way, we have the ~same "cost").

Isn't the key import step at the "framework" level and not the "user" level? I.e., we need to import the "io stuff" at the same time that we resolve the "view stuff", which should hopefully only happen "at build time" (by the executor that is calling build along with fetching all the necessary data). From that perspective, I feel like putting the readers/writers with the view code makes sense (if you're importing the view, you're importing the associated readers and writers).

@mikss
Copy link
Contributor Author

mikss commented Jul 22, 2021

This latest commit e13085b moves the read/write interfaces to io/core.py, but keeps the implementations in corresponding view file (e.g., views/python.py).

In principle, the implementations could live anywhere that imports io.core, and in many cases they will live somewhere not indexed by the view type (e.g., suppose one wishes to implement a new storage medium like AWS S3, and wishes to hook that storage into all existing view variants like pandas, BigQuery, and the like).

For the purposes of the "core" library, however, we will (for now) keep reading and writing implementations with the corresponding view.

@mikss mikss merged commit 182316f into golden Jul 22, 2021
@mikss mikss deleted the readwrite_in_io_not_view branch July 22, 2021 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants