added serialize run functionality #459

janvanrijn · 2018-04-30T02:11:22Z

What does this PR implement/fix? Explain your changes.

It allows run objects (including predictions and traces) to be serialized to disk and reloaded. I use this functionality in almost all my projects, and I can imagine us using it for the benchmark study.

How should this PR be tested?

Unit tests should pass and the code should make sense. Please check in the unit test whether the run equality check is OK.

Any other comments?

mfeurer

Looks mostly good, a few requests for changes.

mfeurer · 2018-04-30T14:00:14Z

openml/runs/run.py

        pp.text(str(self))

+    @classmethod
+    def from_filesystem(cls, folder):


Could you please add a docstring?

mfeurer · 2018-04-30T14:00:28Z

openml/runs/run.py

+
+        return run
+
+    def to_filesystem(self, output_directory):


Could you please add a docstring, here, too?

mfeurer · 2018-04-30T14:00:52Z

openml/runs/run.py

+        run_xml = self._create_description_xml()
+        predictions_arff = arff.dumps(self._generate_arff_dict())
+
+        with open(output_directory + '/description.xml', 'w') as f:


Could you please use os.path.join as above?
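
A minimal sketch of what the requested change could look like. The helper name `write_run_files` and its string arguments are hypothetical stand-ins for the method body in the diff; only the three file names are taken from the PR.

```python
import os


def write_run_files(output_directory, run_xml, predictions_arff,
                    trace_arff=None):
    """Write run artifacts into output_directory (hypothetical helper)."""
    # Create the target directory if it does not exist yet.
    if not os.path.isdir(output_directory):
        os.makedirs(output_directory)

    # os.path.join picks the separator for the current platform,
    # unlike output_directory + '/description.xml'.
    with open(os.path.join(output_directory, 'description.xml'), 'w') as f:
        f.write(run_xml)
    with open(os.path.join(output_directory, 'predictions.arff'), 'w') as f:
        f.write(predictions_arff)

    # The trace is optional, so it is only written when one is given.
    if trace_arff is not None:
        with open(os.path.join(output_directory, 'trace.arff'), 'w') as f:
            f.write(trace_arff)
```

Building every path with `os.path.join` also avoids a doubled separator when `output_directory` already ends in one.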

mfeurer · 2018-04-30T14:00:59Z

openml/runs/run.py

+
+        with open(output_directory + '/description.xml', 'w') as f:
+            f.write(run_xml)
+        with open(output_directory + '/predictions.arff', 'w') as f:


mfeurer · 2018-04-30T14:01:10Z

openml/runs/run.py

+
+        if self.trace_content is not None:
+            trace_arff = arff.dumps(self._generate_trace_arff_dict())
+            with open(output_directory + '/trace.arff', 'w') as f:


mfeurer · 2018-04-30T14:02:31Z

tests/test_runs/test_run.py

+        run.to_filesystem(cache_path)
+
+        run_prime = openml.runs.OpenMLRun.from_filesystem(cache_path)
+        self._test_run_obj_equals(run, run_prime)


You should add a check here that the trace is available. The function _test_run_obj_equals does not guarantee this.
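
One way to express such a check, sketched with a stand-in object rather than the real `OpenMLRun`. The attribute name `trace_content` appears in the diff above; the class `FakeRun` and the helper `assert_trace_available` are hypothetical and only illustrate the idea.

```python
class FakeRun(object):
    """Hypothetical stand-in exposing only the attribute the check needs."""

    def __init__(self, trace_content=None):
        self.trace_content = trace_content


def assert_trace_available(run, run_prime):
    """Fail if the trace was lost during the save/load round trip."""
    # A helper that only compares values could pass when both traces are
    # missing, so first require that each run actually carries a trace.
    assert run.trace_content is not None, 'original run has no trace'
    assert run_prime.trace_content is not None, 'reloaded run has no trace'
    # Only then compare the contents themselves.
    assert run.trace_content == run_prime.trace_content
```

In the unit test, a call like this would sit next to `self._test_run_obj_equals(run, run_prime)`.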

janvanrijn · 2018-04-30T16:49:11Z

Agreed to all.

mfeurer

One more thing: could you please add a note about the new functionality to our (outdated) changelog? I created an issue to update it: #460. We need to do this now that we're on PyPI, as we'll slowly get more users.

mfeurer · 2018-05-01T07:08:24Z

openml/runs/run.py

    @classmethod
    def from_filesystem(cls, folder):
+        """
+        The inverse of the to_filesystem method. Initiates a run based


I think that initiate is unfortunate wording here as it also means 'to start'. How about 'instantiates a run object'?
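
For illustration, the suggested wording could look like this in a numpydoc-style docstring. This is only a sketch: `SketchRun` is a hypothetical stub, the method bodies are left out, and the docstring text beyond the phrases quoted in this thread is an assumption, not the text that was merged. The parameter names `folder` and `output_directory` are the ones shown in the diff.

```python
class SketchRun(object):
    """Hypothetical stub used only to show the docstring wording."""

    @classmethod
    def from_filesystem(cls, folder):
        """The inverse of the to_filesystem method.

        Instantiates a run object from files stored in a folder.

        Parameters
        ----------
        folder : str
            Path to the folder that to_filesystem wrote to.

        Returns
        -------
        run : SketchRun
            The run object restored from disk.
        """
        raise NotImplementedError('docstring sketch only')

    def to_filesystem(self, output_directory):
        """The inverse of the from_filesystem method.

        Serializes this run object to files in a folder.

        Parameters
        ----------
        output_directory : str
            Path to the folder the run files are written to.
        """
        raise NotImplementedError('docstring sketch only')
```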

mfeurer · 2018-05-01T07:12:26Z

I did not yet merge this to develop/master as you requested in my other PR because we can do this whenever we're ready to start up some experiments, and given the amount of open pull requests I would like to see some more changes before issuing a new release. If you need this badly in a non-develop version, we can merge this into master under a 0.7.1 version tag.

janvanrijn · 2018-05-01T14:39:50Z

Sure. Where can I find the change log?

janvanrijn · 2018-05-01T15:41:50Z

Ouch, I extended the unit tests with a publish statement, and apparently the model always needs to be present in order to upload. I don't know if we should really enforce this, as it seems to be just a sanity check, but I encountered some more discrepancies. Apparently the to/from XML functions that this relies on did not work perfectly.

I pushed a fix, also adding more checks to both the serialize/unserialize functions and unit tests. However, I have a feeling the Run to/from XML functionality could use some more extensive unit tests. Please have a critical look at this and feel free to extend.

mfeurer · 2018-05-02T13:46:13Z

The changelog is a bit hidden and we need to re-launch it, but it's here: https://github.com/openml/openml-python/blob/master/doc/progress.rst.

Regarding the behavior of publish when no flow is present, I'm not sure if/how we can support this. Instead of having a flow in the run, we would keep the model in the run and change this once we want to upload the run to OpenML? @joaquinvanschoren also opened an issue about this: #457

janvanrijn · 2018-05-02T14:49:58Z

This is different. The publish function obviously requires a flow id (which is why Joaquin opened the issue), but it also requires a model (which I didn't store). Not sure if we really should require this, but this is what I fixed in the last PR.

Good to merge?

mfeurer

Looks good except for the changelog. What additional tests do you have in mind? Could you please open an issue for them?

janvanrijn · 2018-05-03T19:06:03Z

Almost forgot, but it's now in the change log

Made an issue with some suggestions #465

mfeurer · 2018-05-04T08:06:40Z

I think you forgot to push the commit.

janvanrijn · 2018-05-04T14:09:02Z

I accidentally put it in the other pull request (listing)

added serialize run functionality

4118a96

janvanrijn requested a review from mfeurer April 30, 2018 02:11

janvanrijn added 2 commits April 29, 2018 22:21

removed exist ok argument

52e301b

fixed unit test

3209892

mfeurer requested changes Apr 30, 2018

changes requested by @mfeurer

050a572

mfeurer requested changes May 1, 2018

updated docstring

d92e9f2

extended unit tests

ec82219

mfeurer requested changes May 3, 2018

mfeurer approved these changes May 4, 2018

janvanrijn merged commit 870dfbf into develop May 4, 2018

janvanrijn deleted the serialize_run branch May 4, 2018 15:15

mfeurer mentioned this pull request Jun 14, 2018

Store run to disk fn #422

Closed
