Skip to content

Conversation

@janvanrijn
Copy link
Member

What does this PR implement/fix? Explain your changes.

It allows run objects (including predictions and traces) to be serialized to disk, and reloaded. This functionality is almost used in all my projects, and I can imagine us using this for the benchmark study.

How should this PR be tested?

Unit tests should pass, code should make sense, please check the unit test if run equality check is OK.

Any other comments?

@janvanrijn janvanrijn requested a review from mfeurer April 30, 2018 02:11
Copy link
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good, a few requests for changes.

pp.text(str(self))

@classmethod
def from_filesystem(cls, folder):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a docstring?


return run

def to_filesystem(self, output_directory):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a docstring, here, too?

run_xml = self._create_description_xml()
predictions_arff = arff.dumps(self._generate_arff_dict())

with open(output_directory + '/description.xml', 'w') as f:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use os.path.join as above?


with open(output_directory + '/description.xml', 'w') as f:
f.write(run_xml)
with open(output_directory + '/predictions.arff', 'w') as f:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.


if self.trace_content is not None:
trace_arff = arff.dumps(self._generate_trace_arff_dict())
with open(output_directory + '/trace.arff', 'w') as f:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

run.to_filesystem(cache_path)

run_prime = openml.runs.OpenMLRun.from_filesystem(cache_path)
self._test_run_obj_equals(run, run_prime)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a check here that the trace is available. The function _test_run_obj_equals does not guarantee this.

@janvanrijn
Copy link
Member Author

Agreed to all.

Copy link
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, could you please add a note about the new functionality to our (outdated) changelog (I created an issue to update that here: #460)? We need to do this now as we're on pypi as we'll slowly get more users.

@classmethod
def from_filesystem(cls, folder):
"""
The inverse of the to_filesystem method. Initiates a run based
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that initiate is unfortunate wording here as it also means 'to start'. How about 'instantiates a run object'?

@mfeurer
Copy link
Collaborator

mfeurer commented May 1, 2018

I did not yet merge this to develop/master as you requested in my other PR because we can do this whenever we're ready to start up some experiments, and given the amount of open pull requests I would like to see some more changes before issuing a new release. If you need this badly in a non-develop version, we can merge this into master under a 0.7.1 version tag.

@janvanrijn
Copy link
Member Author

Sure. Where can I find the change log?

@janvanrijn
Copy link
Member Author

Ouch, I extended the unit tests with a publish statement, and apparently the model is always needed to be present in order to be able to upload. Don't know if we should really enforce this, as it seems just a sanity check, but I encountered some more discrepancies. Apparently the to/from xml functions upon which this relied did not work perfectly.

I pushed a fix, also adding more checks to both the serialize/unserialize functions and unit tests. However, I have a feeling the Run to/from XML functionality could use some more extensive unit tests. Please have a critical look at this and feel free to extend.

@mfeurer
Copy link
Collaborator

mfeurer commented May 2, 2018

The changelog is a bit hidden and we need to re-launch it, but it's here: https://github.com/openml/openml-python/blob/master/doc/progress.rst.

Regarding the behavior of publish when no flow is present, I'm not sure if/how we can support this. Instead of having a flow in the run, we would keep the model in the run and change this once we want to upload the run to OpenML? @joaquinvanschoren also opened an issue about this: #457

@janvanrijn
Copy link
Member Author

This is different. The publish function obviously requires a flow id (which is why Joaquin opened the issue), but requires a model (which I didn't store). Not sure if we really should require this, but this is what I fixed in the last PR.

Good to merge?

Copy link
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for the changelog. What additional tests do you have in mind? Could you please open an issue for them?

@janvanrijn
Copy link
Member Author

Almost forgot, but it's now in the change log

Made an issue with some suggestions #465

@mfeurer
Copy link
Collaborator

mfeurer commented May 4, 2018

I think you forgot to push the commit.

@janvanrijn
Copy link
Member Author

I accidentally put it in the other pull request (listing)

@janvanrijn janvanrijn merged commit 870dfbf into develop May 4, 2018
@janvanrijn janvanrijn deleted the serialize_run branch May 4, 2018 15:15
@mfeurer mfeurer mentioned this pull request Jun 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants