
Conversation

@ArlindKadra (Member) commented Sep 11, 2018

Reference Issue

#515, #498, #457

What does this PR implement/fix? Explain your changes.

The PR fixes the documentation for run_flow_on_task; it also tackles run_flow_on_task always uploading the flow.

How should this PR be tested?

Call the function and verify that it returns a run object in the end without uploading the flow to OpenML.

@ArlindKadra changed the title from "[WIP] Refactoring run_flow_on_task" to "[WIP] Refactoring run_flow_on_task and doc add for run_model" on Sep 11, 2018
@PGijsbers (Collaborator)

Hi, I am interested in having this functionality and wonder if it is OK for me to work on this. I am a bit at a loss about the changes made so far, so I have some questions to better understand your work:

  • What is the _to_dict function for (in tasks/tasks.py)? It returns some flow_container that is not referenced elsewhere, and it seems to serialize flows even though it is located in tasks.py. Was it meant for flow serialization (for saving to disk)?
  • Preventing flow checks/uploads is not actually handled yet, is that correct?

In general, I thought of updating the process to the following:

  • Check whether the flow exists online; if so, download it. If not, or if there is no internet connection, use the local flow (and possibly publish it if a flag is set and internet is available).
  • Do the run logic.
  • If still using a local flow, reference it when creating an OpenMLRun.

Then upload the flow when:

  • publishing the run: if a local flow was used, upload it first and update the OpenMLRun with the new flow information before uploading the run (a rough sketch follows below), or
  • flow.publish is called.
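
To make the intended behaviour concrete, here is a rough sketch of that publish-time logic; the helper names (flow_exists_on_server, upload_run_to_server) and the exact attributes are illustrative assumptions, not the actual openml-python API.

# Rough sketch only: deferred flow upload when publishing a run.
# flow_exists_on_server / upload_run_to_server are hypothetical helpers.
def publish_run(run):
    flow = run.flow
    if run.flow_id is None:                  # run was created against a local flow
        if not flow_exists_on_server(flow):  # hypothetical server-side existence check
            flow.publish()                   # upload the local flow first
        run.flow_id = flow.flow_id           # point the run at the server-assigned flow id
    upload_run_to_server(run)                # hypothetical call that uploads the run itself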

@ArlindKadra (Member, Author) commented Feb 21, 2019

Hey @PGijsbers

What is the _to_dict function for (in tasks/tasks.py)? It returns some flow_container that is not referenced elsewhere, and it seems to serialize flows even though it is located in tasks.py. Was it meant for flow serialization (for saving to disk)?

It has been quite a bit of time. Judging from the commit name, I was implementing the generation of a task dictionary and just copy-pasted the code from the _to_dict function in the flow class. I started tweaking it and never got back to the PR, which is why there are still flow parts in there. Basically it is not useful, so you should remove that function completely.

This is addressed in #607 .

Preventing flow checks/uploads is not actually handled yet, is that correct?

Yes, it is not handled yet. Preventing flow uploads is not easily done, since a flow id is needed to create the run; the idea, if I remember correctly, was to store the run locally.
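
For context, a minimal illustration of that dependency (the constructor arguments shown here are an assumption about the API, not a verified signature):

# Illustrative only: creating a run references a flow id, so a purely local,
# unpublished flow has no server-side id to point to yet.
run = openml.runs.OpenMLRun(
    task_id=task.task_id,
    flow_id=flow.flow_id,        # None for a local, unpublished flow
    dataset_id=task.dataset_id,
)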

As to how you want to proceed, it seems really good to me. We had a discussion on how to handle this in the last OpenML workshop and this was what we agreed upon. @mfeurer @janvanrijn can you confirm this was and still is the case ?

The functions to get the cached flows should be useful, and the documentation change for run_model_on_task is nice too.

@ArlindKadra (Member, Author)

@mfeurer should the flow cache functions be moved to another PR?

@PGijsbers (Collaborator)

So basically I should undo the last commit (22b1e62) before proceeding?

@ArlindKadra (Member, Author)

Yes, the other commits should be useful.

@codecov-io commented Feb 21, 2019

Codecov Report

Merging #516 into develop will increase coverage by 0.43%.
The diff coverage is 90.21%.


@@             Coverage Diff             @@
##           develop     #516      +/-   ##
===========================================
+ Coverage    90.12%   90.56%   +0.43%     
===========================================
  Files           32       32              
  Lines         3232     3520     +288     
===========================================
+ Hits          2913     3188     +275     
- Misses         319      332      +13
Impacted Files Coverage Δ
openml/flows/sklearn_converter.py 94.39% <100%> (+3.99%) ⬆️
openml/datasets/functions.py 92.73% <100%> (ø) ⬆️
openml/flows/functions.py 87.4% <79.16%> (-5.86%) ⬇️
openml/runs/functions.py 86.43% <92.45%> (-0.63%) ⬇️
openml/runs/run.py 90.56% <93.33%> (+1.46%) ⬆️
openml/exceptions.py 96.87% <94.11%> (-3.13%) ⬇️
openml/flows/flow.py 94.08% <96.77%> (+0.22%) ⬆️
openml/_api_calls.py 83.11% <0%> (-5.2%) ⬇️
openml/tasks/functions.py 86.45% <0%> (-1.94%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 98a73b3...e63e495.

@mfeurer (Collaborator) commented Feb 22, 2019

I think this is what we discussed in Paris. And flow caching should be fine within this PR as it would already test the new flow caching functionality.

@mfeurer (Collaborator) left a review comment

A few more comments regarding the unit tests.

@mfeurer (Collaborator) commented Mar 5, 2019

I played a bit with the code and it appears that it cannot handle a simple random forest at the moment:

import sklearn.ensemble

import openml

openml.config.server = "https://test.openml.org/api/v1/xml"
openml.config.apikey = "610344db6388d9ba34f6db45a3cf71de"

model = sklearn.ensemble.RandomForestClassifier(n_estimators=33)

task = openml.tasks.get_task(12)

run = openml.runs.run_model_on_task(
    model=model,
    task=task,
    avoid_duplicate_runs=False,
    upload_flow=False,
)

print(run)
print(run.flow)
print(run.flow.flow_id)
print(run.flow_id)

run.publish()

as it results in:

[run id: None, task id: 12, flow id: None, flow name: sklearn.ensemble.forest.Ra...]
<openml.flows.flow.OpenMLFlow object at 0x7fe912411438>
None
None
Traceback (most recent call last):
  File "/home/feurerm/sync_dir/projects/openml/python/testing3.py", line 24, in <module>
    run.publish()
  File "/home/feurerm/sync_dir/projects/openml/python/openml/runs/run.py", line 402, in publish
    self.parameter_settings = openml.flows.obtain_parameter_values(self.flow, self.model)
  File "/home/feurerm/sync_dir/projects/openml/python/openml/flows/sklearn_converter.py", line 373, in obtain_parameter_values
    model = model if model else flow.model
  File "/home/feurerm/miniconda/3-4.5.4/envs/openml/lib/python3.6/site-packages/sklearn/ensemble/base.py", line 140, in __len__
    return len(self.estimators_)
AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'

@PGijsbers (Collaborator)

Weird. I can reproduce the problem locally and am looking into it.

@PGijsbers (Collaborator)

It seems that in the absence of an explicit __bool__ operator, Python objects fall back on the __len__ operator to handle 'truthiness'. Since BaseEnsemble has a __len__ method but not a __bool__ method, __len__ was invoked. I have now refactored the problematic line
model = model if model else flow.model
to include an explicit None check:
model = model if model is not None else flow.model
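
A small self-contained illustration of that fallback (a toy class standing in for BaseEnsemble, not the actual sklearn code):

# Toy example: truth-testing an object that defines __len__ but no __bool__
# falls back to __len__, which here raises just like an unfitted ensemble.
class FakeEnsemble:
    def __len__(self):
        # mirrors BaseEnsemble.__len__, which returns len(self.estimators_);
        # estimators_ only exists after fit(), so this raises beforehand
        return len(self.estimators_)

model = FakeEnsemble()

try:
    if model:  # no __bool__, so Python calls __len__ for the truth test
        pass
except AttributeError as err:
    print(err)  # 'FakeEnsemble' object has no attribute 'estimators_'

if model is not None:  # the explicit None check never touches __len__
    print("model is set")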

…flow to ensure it does not exist on the server.
…f associated flow did not exist but was also not uploaded. This gave errors at publish-time.
@mfeurer (Collaborator) left a review comment

From my side we're good to merge. The only nitpick would be that some fields are not filled prior to uploading, so the user can get None when accessing them. However, I think this is more related to #634.

@ArlindKadra do you have any comments as you started this PR?

@PGijsbers (Collaborator)

But that is essentially the same as a user just creating the flow through sklearn_to_flow and then accessing the None fields, right? If so, I would agree that it's not so much related to this PR as to #634.
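
For comparison, the unpublished-flow case looks roughly like this (assuming the sklearn_to_flow converter available in openml-python at the time):

# Illustrative: a flow built locally has no server-assigned fields yet.
import sklearn.tree
import openml

flow = openml.flows.sklearn_to_flow(sklearn.tree.DecisionTreeClassifier())
print(flow.flow_id)  # None until flow.publish() is called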

@ArlindKadra (Member, Author)

@mfeurer @PGijsbers except for a small suggestion, everything looks really good to me :)
