added serialize run functionality #459
Conversation
mfeurer left a comment:
Looks mostly good, a few requests for changes.
        pp.text(str(self))

    @classmethod
    def from_filesystem(cls, folder):
Could you please add a docstring?
        return run

    def to_filesystem(self, output_directory):
Could you please add a docstring here, too?
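For illustration, a possible docstring for this method (a sketch only; the wording is mine, with the filenames taken from the method body shown in the hunks below):

```python
def to_filesystem(self, output_directory):
    """Serialize the run to disk.

    Writes description.xml, predictions.arff and, if trace content is
    present, trace.arff into output_directory.

    Parameters
    ----------
    output_directory : str
        Directory in which the files are stored.
    """
```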
openml/runs/run.py (outdated)

        run_xml = self._create_description_xml()
        predictions_arff = arff.dumps(self._generate_arff_dict())

        with open(output_directory + '/description.xml', 'w') as f:
Could you please use os.path.join as above?
openml/runs/run.py (outdated)

        with open(output_directory + '/description.xml', 'w') as f:
            f.write(run_xml)
        with open(output_directory + '/predictions.arff', 'w') as f:
Same here.
openml/runs/run.py (outdated)

        if self.trace_content is not None:
            trace_arff = arff.dumps(self._generate_trace_arff_dict())
            with open(output_directory + '/trace.arff', 'w') as f:
And here.
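For illustration, a hedged sketch of the requested change; the helper below is hypothetical (not part of the PR), and the filenames are taken from the diff hunks above:

```python
import os


def write_run_files(output_directory, run_xml, predictions_arff, trace_arff=None):
    """Illustrative helper: build paths with os.path.join instead of
    concatenating strings with '/'."""
    with open(os.path.join(output_directory, 'description.xml'), 'w') as f:
        f.write(run_xml)
    with open(os.path.join(output_directory, 'predictions.arff'), 'w') as f:
        f.write(predictions_arff)
    if trace_arff is not None:
        with open(os.path.join(output_directory, 'trace.arff'), 'w') as f:
            f.write(trace_arff)
```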
tests/test_runs/test_run.py (outdated)

        run.to_filesystem(cache_path)

        run_prime = openml.runs.OpenMLRun.from_filesystem(cache_path)
        self._test_run_obj_equals(run, run_prime)
You should add a check here that the trace is available. The function _test_run_obj_equals does not guarantee this.
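A hedged sketch of such a check, to be added inside the test shown above; the trace_content attribute name is taken from the to_filesystem hunk in this diff and may not match the final implementation:

```python
# Verify the trace survives the round trip (sketch; assumes the attribute
# is called trace_content, as in the to_filesystem hunk above).
self.assertIsNotNone(run_prime.trace_content)
self.assertEqual(run.trace_content, run_prime.trace_content)
```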
Agreed to all.
mfeurer left a comment:
One more thing: could you please add a note about the new functionality to our (outdated) changelog? I created an issue to update it here: #460. We need to do this now that we're on PyPI and will slowly be getting more users.
openml/runs/run.py (outdated)

    @classmethod
    def from_filesystem(cls, folder):
        """
        The inverse of the to_filesystem method. Initiates a run based
I think that 'initiate' is unfortunate wording here, as it also means 'to start'. How about 'instantiates a run object'?
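A hedged sketch of a docstring along those lines; the parameter and return descriptions are my assumptions based on the diff context, not the PR's final wording:

```python
@classmethod
def from_filesystem(cls, folder):
    """The inverse of the to_filesystem method. Instantiates a run object
    based on the description.xml, predictions.arff and (optional) trace.arff
    files found in the given folder.

    Parameters
    ----------
    folder : str
        Directory previously written by to_filesystem.

    Returns
    -------
    OpenMLRun
        The deserialized run object.
    """
```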
I did not yet merge this into develop/master as you requested in my other PR, because we can do this whenever we're ready to start up some experiments, and given the number of open pull requests I would like to see some more changes before issuing a new release. If you need this badly in a non-develop version, we can merge this into master under a …
Sure. Where can I find the changelog?
Ouch, I extended the unit tests with a publish statement, and apparently the model always needs to be present in order to upload. I don't know if we should really enforce this, as it seems like just a sanity check, but I encountered some more discrepancies: the to/from XML functions this relied upon did not work perfectly. I pushed a fix, also adding more checks to both the serialize/deserialize functions and the unit tests. However, I have a feeling the run to/from XML functionality could use some more extensive unit tests. Please have a critical look at this and feel free to extend.
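For illustration, one extra check of the kind hinted at here (a sketch only; _create_description_xml is the private helper visible in the diff above, and comparing its output before and after the round trip is my assumption, not necessarily what the pushed fix does):

```python
# Sketch: the XML description should be identical before and after the
# filesystem round trip.
self.assertEqual(run._create_description_xml(),
                 run_prime._create_description_xml())
```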
The changelog is a bit hidden and we need to re-launch it, but it's here: https://github.com/openml/openml-python/blob/master/doc/progress.rst. Regarding the behavior of publish when no flow is present, I'm not sure if/how we can support this. Instead of having a flow in the run, we would keep the model in the run and change this once we want to upload the run to OpenML? @joaquinvanschoren also opened an issue about this: #457
This is different. The publish function obviously requires a flow id (which is why Joaquin opened the issue), but it also requires a model (which I didn't store). Not sure if we really should require this, but this is what I fixed in the last PR. Good to merge?
mfeurer left a comment:
Looks good except for the changelog. What additional tests do you have in mind? Could you please open an issue for them?
Almost forgot, but it's now in the changelog. I made an issue with some suggestions: #465
I think you forgot to push the commit.
I accidentally put it in the other pull request (listing).
What does this PR implement/fix? Explain your changes.
It allows run objects (including predictions and traces) to be serialized to disk and reloaded. This functionality is used in almost all of my projects, and I can imagine us using it for the benchmark study.
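A hedged usage sketch of the new round trip; how the run is obtained and the directory handling are assumptions, only to_filesystem and from_filesystem come from this PR:

```python
import os

from openml.runs import OpenMLRun


def save_and_reload(run, directory):
    """Illustrative helper, not part of the PR: serialize an OpenMLRun to
    disk and read it back. `run` is assumed to already hold predictions
    and, optionally, trace content."""
    if not os.path.isdir(directory):
        # Whether to_filesystem creates the directory itself is unclear
        # from the diff, so create it here to be safe.
        os.makedirs(directory)
    run.to_filesystem(directory)  # writes description.xml, predictions.arff (and trace.arff)
    return OpenMLRun.from_filesystem(directory)
```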
How should this PR be tested?
Unit tests should pass and the code should make sense; please check the unit test to verify that the run equality check is OK.
Any other comments?