Skip to content

Comments

Introduce python view, pickle format, local storage#62

Merged
mikss merged 9 commits intogoldenfrom
io_format_storage_view
Jul 21, 2021
Merged

Introduce python view, pickle format, local storage#62
mikss merged 9 commits intogoldenfrom
io_format_storage_view

Conversation

@mikss
Copy link
Contributor

@mikss mikss commented Jul 20, 2021

This PR introduces simple example implementations of View (Python), Storage (LocalFile), and Format (Pickle).

The View object is instantiated with .read and persisted with .write, but these methods in fact call out to multiple dispatch functions read and write, that implement the expected behavior based on the types of the passed input arguments (namely, view, storage, format).

In principle, the read/write will have to do quite a bit of heavy lifting, potentially using type_system attributes for conversion, but in this lucky example of Python/Local/Pickle, none of that is necessary.

@mikss mikss changed the base branch from golden to trivial_artifact July 20, 2021 17:20
@codecov
Copy link

codecov bot commented Jul 20, 2021

Codecov Report

Merging #62 (bddd507) into golden (ce84794) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            golden       #62   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           17        21    +4     
  Lines          619       675   +56     
  Branches        74        75    +1     
=========================================
+ Hits           619       675   +56     
Impacted Files Coverage Δ
src/arti/formats/pickle.py 100.00% <100.00%> (ø)
src/arti/internal/utils.py 100.00% <100.00%> (ø)
src/arti/storage/local.py 100.00% <100.00%> (ø)
src/arti/views/core.py 100.00% <100.00%> (ø)
src/arti/views/python.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce84794...bddd507. Read the comment docs.

@mikss mikss changed the title DRAFT: views RAD-2613: introduce python view, pickle format, local storage Jul 20, 2021
@mikss mikss marked this pull request as ready for review July 20, 2021 18:47
@mikss mikss requested review from JacobHayes and joycex99 July 20, 2021 18:48
@JacobHayes JacobHayes force-pushed the io_format_storage_view branch from 0ea31f8 to a2f07c7 Compare July 20, 2021 19:34
Copy link
Member

@JacobHayes JacobHayes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking quite good! Main question is about the View mutation vs return - that may become a bit more clear as we build some of the higher up machinery, but I do have a bit of hesitancy around the mutation (despite arguing for it earlier, iirc 😁).



@write.register
def _write(view: Python, format: Pickle, storage: LocalFile) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Planting this seed (probably should wait to experiment), but iirc multimethod supports class hierarchy based dispatch. In other words, if we built a few mixins (or Protocols, granted multimethod support) for format/storage/view behavior (right now, they're all data), then we could add some high-level read/write handlers that delegate to cover a lot of ground, while still being able to write specific or optimized handlers as needed.

Eg: we could:

  • add a FileLike mixin/Protocol that has an .open method
  • make LocalFile subclass FileLike and implement .open
  • make read/write handlers that have (format: Pickle, storage: FileLike, view: Python) and use storage.open instead of open
  • mix and match with format/view interface mixins for more generalization OR dive into specific classes (like the current PR) for optimizations



@multidispatch
def read(view: View, format: Format, storage: Storage) -> View:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about changing this to -> None and just relying on the view.data assignment? That might be a bit more clear that it works via mutation.

Alternatively, should we get rid of the mutation (can't remember where our prior discussion landed) and return the underlying data/-> Any? I think the challenge (is it one?) we realized was that then write needs an additional data: Any argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we like this idea:

Alternatively, should we get rid of the mutation (can't remember where our prior discussion landed) and return the underlying data/-> Any? I think the challenge (is it one?) we realized was that then write needs an additional data: Any argument

Specifically because then the issue of "where is my data" becomes muddled. One side effect is, as you point out, write then needs to find the data before writing. I think the way we have it as-is, a View is an actual object with the data, rather than just an instruction, which smooths over these issues.

Copy link
Member

@JacobHayes JacobHayes Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still torn because that makes View act a bit different than the strictly metadata Format, Storage, etc (which can be reused across calls to build). For example, if we added a PandasDataFrameView(index=["date"]), than in each call to build (eg: for multiple partitions), we would have to recreate that object w/ the same metadata just to wrap different data.

--

Let's leave View.data is for now and we can reconsider (weighing the costs of an extra write argument) later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would have to recreate that object w/ the same metadata just to wrap different data.

🤷 Is an artifact a thing or a collection of things? Is a view a snapshot of a thing or a collection of snapshots of things? I think if Views support slicing (in this case, rows), different instances per partition feels natural.

Base automatically changed from trivial_artifact to golden July 21, 2021 14:58
@JacobHayes JacobHayes force-pushed the io_format_storage_view branch from cb71a5c to bddd507 Compare July 21, 2021 15:46
@JacobHayes
Copy link
Member

Rebased to resolve conflicts + added two tidying commits (1, 2)

Copy link
Member

@JacobHayes JacobHayes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Punting on View.data mutation until we build some of the higher up machinery.

--

Next up, we probably need to update Producer.build to accept Views rather than Artifacts directly... Thinking about that a bit, if we expand View to have similar matching as the TypeAdapters, then maybe the build signature can be Annotated[pd.DataFrame, MyArtifact] rather than requiring PandasDataFrame[MyArtifact]

More complex cases may still want to use PandasDataFrame directly to pick a specific View or set additional metadata, but the simple cases would stay simple. 🤔

@mikss mikss merged commit 85ff212 into golden Jul 21, 2021
@mikss mikss deleted the io_format_storage_view branch July 21, 2021 16:49
@JacobHayes JacobHayes changed the title RAD-2613: introduce python view, pickle format, local storage Introduce python view, pickle format, local storage Mar 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants