Skip to content

Conversation

@PGijsbers
Copy link
Collaborator

@PGijsbers PGijsbers commented Oct 15, 2019

Several OpenML entities are sharing a lot of code in theory, but have copy paste duplicate code in practice. This PR introduces a base class from which OpenML entities can derive and share common logic. In this draft some bare basics are implemented and OpenMLFlow derives from it.

This will be further extended to make the other main entities (Dataset, Task, Run and Study) derive from the base class.
I will take a look at a few other functions shared between some or all of those types such as _to_xml and publish.

edit: looks like unit tests fail due to illegal type annotation for Py3.6, please ignore. The Flow unit tests worked locally, will fix it to be Py3.6 compatible.

edit: Also fixes #433

@PGijsbers PGijsbers requested a review from mfeurer October 15, 2019 12:19
@codecov-io
Copy link

codecov-io commented Oct 16, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@23d4e6f). Click here to learn what that means.
The diff coverage is 83.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop     #828   +/-   ##
==========================================
  Coverage           ?   88.55%           
==========================================
  Files              ?       37           
  Lines              ?     4248           
  Branches           ?        0           
==========================================
  Hits               ?     3762           
  Misses             ?      486           
  Partials           ?        0
Impacted Files Coverage Δ
openml/evaluations/evaluation.py 66.66% <ø> (ø)
openml/flows/functions.py 87.42% <ø> (ø)
openml/setups/setup.py 44% <ø> (ø)
openml/study/functions.py 88.23% <0%> (ø)
openml/runs/trace.py 90.75% <100%> (ø)
openml/flows/flow.py 93.96% <100%> (ø)
openml/config.py 90% <100%> (ø)
openml/tasks/task.py 83.89% <55.55%> (ø)
openml/study/study.py 72.5% <68.18%> (ø)
openml/utils.py 90.9% <85.71%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23d4e6f...88e9cf0. Read the comment docs.

@PGijsbers
Copy link
Collaborator Author

For the __repr__ header I now parse the class names, which gives nearly identical output with what was before, but is generic and shared between all child classes. For tasks there is now also more information in the header, instead of "OpenML Task" it now says the full task type, e.g.:

OpenML Classification Task
==========================
Task Type...........: Supervised Classification
Task ID.............: 59
Task URL............: https://www.openml.org/t/59
Estimation Procedure: crossvalidation
Evaluation Measure..: predictive_accuracy
Target Feature......: class
# of Classes........: 3
Cost Matrix.........: Available

I propose to remove the 'Task Type' row from the body of the __repr__.

@PGijsbers PGijsbers changed the title Create OpenMLBase, have OpenMLFlow derive from it. Create OpenMLBase, have most OpenML objects derive from it Oct 16, 2019
Copy link
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! I have one more suggestion, but we can really merge this.

# Default values are actually added here in the _setup() function which is
# called at the end of this module
server = _defaults['server']
server = str(_defaults['server']) # so mypy knows it is a string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also use mypy.cast to not have to use a comment here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer this for simple types, as it is clear what is happening (whereas you have to be familiar with mypy.cast to immediately know what's being done and why).

Copy link
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there are right now three type issues which I unfortunately cannot handle from the web interface, could you please check these?

@PGijsbers
Copy link
Collaborator Author

I fixed the mypy errors by using getattr as mypy can't know the attributes exist in child classes. I think this is probably the cleanest way to do this.

@PGijsbers
Copy link
Collaborator Author

Looks like my local mypy reported different errors than CI. Great.

@PGijsbers PGijsbers marked this pull request as ready for review October 17, 2019 07:36
@PGijsbers
Copy link
Collaborator Author

@mfeurer I think this is good to go. The only failure on Appveyor seems like a server issue to me:

================================== FAILURES ===================================
__________________ TestRun.test_run_and_upload_randomsearch ___________________
[gw1] win32 -- Python 3.5.6 c:\miniconda35-x64\python.exe
self = <tests.test_runs.test_run_functions.TestRun testMethod=test_run_and_upload_randomsearch>
    def test_run_and_upload_randomsearch(self):
        randomsearch = RandomizedSearchCV(
            RandomForestClassifier(n_estimators=5),
            {"max_depth": [3, None],
             "max_features": [1, 2, 3, 4],
             "min_samples_split": [2, 3, 4, 5, 6, 7, 8, 9, 10],
             "min_samples_leaf": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
             "bootstrap": [True, False],
             "criterion": ["gini", "entropy"]},
            cv=StratifiedKFold(n_splits=2, shuffle=True),
            n_iter=5)
        # The random states for the RandomizedSearchCV is set after the
        # random state of the RandomForestClassifier is set, therefore,
        # it has a different value than the other examples before
        task_id = self.TEST_SERVER_TASK_SIMPLE[0]
        n_missing_vals = self.TEST_SERVER_TASK_SIMPLE[1]
        n_test_obs = self.TEST_SERVER_TASK_SIMPLE[2]
        run = self._run_and_upload_classification(
            clf=randomsearch,
            task_id=task_id,
            n_missing_vals=n_missing_vals,
            n_test_obs=n_test_obs,
>           flow_expected_rsv='12172',
        )
tests\test_runs\test_run_functions.py:607: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests\test_runs\test_run_functions.py:460: in _run_and_upload_classification
    sentinel=sentinel,
tests\test_runs\test_run_functions.py:424: in _run_and_upload
    raise e
tests\test_runs\test_run_functions.py:420: in _run_and_upload
    fold=0,
openml\runs\functions.py:332: in initialize_model_from_trace
    run_trace = get_run_trace(run_id)
openml\runs\functions.py:274: in get_run_trace
    'get')
openml\_api_calls.py:52: in _perform_api_call
    return _read_url(url, request_method, data)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
url = 'https://test.openml.org/api/v1/xml/run/trace/101829'
request_method = 'get', data = {'api_key': '610344db6388d9ba34f6db45a3cf71de'}
    def _read_url(url, request_method, data=None):
        data = {} if data is None else data
        if config.apikey is not None:
            data['api_key'] = config.apikey
    
        response = send_request(request_method=request_method, url=url, data=data)
        if response.status_code != 200:
>           raise _parse_server_exception(response, url, file_elements=None)
E           openml.exceptions.OpenMLServerException: No successful trace associated with this run. - None
openml\_api_calls.py:99: OpenMLServerException

@mfeurer mfeurer merged commit 43596e0 into develop Oct 17, 2019
@mfeurer mfeurer deleted the OpenMLBase branch October 17, 2019 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add open in browser function for each entity

4 participants