
Conversation

@mfeurer (Collaborator) commented Dec 19, 2018

Fixes #392

@janvanrijn (Member) left a comment

LGTM

@mfeurer (Collaborator, Author) commented Dec 19, 2018

@janvanrijn (Member):

Will immediately make that error more XML-able.

@joaquinvanschoren (Contributor):

Seems like you're uploading a dataset with an ID that's already taken:

    Error Number: 1062
    Duplicate entry 'Pandas_testing_dataset-5133' for key 'nameID'

@janvanrijn (Member):

Looks like this is the result of the following:

  • Due to bad luck, two datasets with the same name were uploaded at the same time. This should not happen in practice.
  • Usually, on the main server this would not produce a MySQL error, as these errors are hidden and replaced with an XML error message (at least most, including this one, are). On the test server we run the database in debug mode, meaning that every error results in a verbose error message like this one. Given that this was only an incident, I feel we should keep it that way.
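The race described above can be sketched in miniature (hypothetical table and names, not OpenML server code): a unique key on the dataset name means the second of two same-named uploads fails with a duplicate-entry error, exactly like MySQL error 1062 here.

```python
import sqlite3

# Minimal sketch of a unique-key collision (sqlite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (name TEXT, UNIQUE(name))")
conn.execute("INSERT INTO dataset VALUES ('Pandas_testing_dataset')")
try:
    # The second concurrent upload with the same name hits the UNIQUE key.
    conn.execute("INSERT INTO dataset VALUES ('Pandas_testing_dataset')")
except sqlite3.IntegrityError as e:
    print("duplicate entry:", e)  # sqlite's analogue of MySQL error 1062
```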

@codecov-io commented Dec 21, 2018

Codecov Report

Merging #609 into develop will increase coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           develop    #609      +/-   ##
==========================================
+ Coverage    89.88%   89.9%   +0.01%     
==========================================
  Files           32      32              
  Lines         3016    3020       +4     
==========================================
+ Hits          2711    2715       +4     
  Misses         305     305
Impacted Files       Coverage Δ
openml/testing.py    93.58% <100%> (+0.34%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7c0a77d...c90cd42.

@mfeurer (Collaborator, Author) commented Dec 21, 2018

Okay, to make sure this cannot happen again (it really confused me and required feedback from the admins), I added a (hopefully) unique sentinel so that all datasets have unique names.
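A sentinel of this kind might look like the following sketch (a hypothetical helper, not the exact code added in this PR): appending a short random suffix means two test runs uploading the same base name can no longer collide on the unique nameID key.

```python
import uuid

def add_sentinel(name: str) -> str:
    # Hypothetical helper: append a short random hex suffix so that
    # concurrently uploaded test datasets never share a name.
    return "%s_%s" % (name, uuid.uuid4().hex[:8])

print(add_sentinel("Pandas_testing_dataset"))
```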

@mfeurer (Collaborator, Author) commented Dec 21, 2018

It appears that there are some rather random test failures:

Unexpected failing examples:
/home/travis/build/openml/openml-python/examples/introduction_tutorial.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/travis/build/openml/openml-python/examples/introduction_tutorial.py", line 70, in <module>
    run = openml.runs.run_flow_on_task(flow, task, avoid_duplicate_runs=False)
  File "/home/travis/build/openml/openml-python/openml/runs/functions.py", line 147, in run_flow_on_task
    "same: '%s' vs '%s'" % (str(flow.flow_id), str(flow_id))
ValueError: Result from API call flow_exists and flow.flow_id are not same: '58748' vs 'False'

and happens in the following code:

    # In case the flow does not exist, flow_id will be False (as returned
    # by flow_exists). Also check that there are no illegal flow.flow_id
    # values (compared to the result of openml.flows.flow_exists).
    if flow_id is False:
        if flow.flow_id is not None:
            raise ValueError('flow.flow_id is not None, but the flow does '
                             'not exist on the server according to '
                             'flow_exists')
        _publish_flow_if_necessary(flow)

    data_content, trace, fold_evaluations, sample_evaluations = res
    if not isinstance(flow.flow_id, int):
        # This is the usual behaviour, where the flow object was initiated off
        # line and requires some additional information (flow_id, input_id for
        # each hyperparameter) to be usable by this library
        server_flow = get_flow(flow_id)
        openml.flows.flow._copy_server_fields(server_flow, flow)
        openml.flows.assert_flows_equal(flow, server_flow,
                                        ignore_parameter_values=True)
    else:
        # This can only happen when the function is called directly, and not
        # through "run_model_on_task"
        if flow.flow_id != flow_id:
            # This should never happen, unless user made a flow-creation fault
            raise ValueError(
                "Result from API call flow_exists and flow.flow_id are not "
                "same: '%s' vs '%s'" % (str(flow.flow_id), str(flow_id))
            )

Could it be that we need to set flow_id after _publish_flow_if_necessary to avoid this error?

@ArlindKadra (Member):

@mfeurer I took a look at the code and I think you are right. If a flow is not yet on the server and does not have an id, and it then gets published successfully:
    _publish_flow_if_necessary(flow)
    data_content, trace, fold_evaluations, sample_evaluations = res
    if not isinstance(flow.flow_id, int):
        # This is the usual behaviour, where the flow object was initiated off
        # line and requires some additional information (flow_id, input_id for
        # each hyperparameter) to be usable by this library
        server_flow = get_flow(flow_id)
        openml.flows.flow._copy_server_fields(server_flow, flow)
        openml.flows.assert_flows_equal(flow, server_flow,
                                        ignore_parameter_values=True)
    else:
        # This can only happen when the function is called directly, and not
        # through "run_model_on_task"
        if flow.flow_id != flow_id:
            # This should never happen, unless user made a flow-creation fault
            raise ValueError(
                "Result from API call flow_exists and flow.flow_id are not "
                "same: '%s' vs '%s'" % (str(flow.flow_id), str(flow_id))
            )

The else branch above always gets triggered and fails, because flow_id is not updated after the flow is published. It is only assigned once, earlier in the function:

    flow_id = flow_exists(flow.name, flow.external_version)
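The fix this points at can be sketched with stub objects (not actual OpenML code; the stubbed id and server behaviour are assumptions for illustration): after publishing a flow that did not exist yet, the local flow_id must be refreshed, otherwise the later equality check compares the server-assigned id against False.

```python
# Stub sketch of the proposed fix: keep flow_id in sync after publishing.

class Flow:
    def __init__(self):
        self.flow_id = None  # offline flows start without a server id

def flow_exists(flow):
    # Stub: pretend the flow is not yet on the server.
    return False

def publish(flow):
    # Stub: the server assigns an id when the flow is published.
    flow.flow_id = 58748

flow = Flow()
flow_id = flow_exists(flow)
if flow_id is False:
    publish(flow)
    flow_id = flow.flow_id  # the missing step: refresh the local variable

assert flow.flow_id == flow_id  # the consistency check now passes
```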

@mfeurer (Collaborator, Author) commented Jan 29, 2019

Thanks @ArlindKadra, this appears to solve the issue. The build failures are unrelated.

@mfeurer mfeurer merged commit 4a7db0e into develop Jan 29, 2019
@mfeurer mfeurer deleted the fix_392_again branch January 29, 2019 08:15
janvanrijn added a commit that referenced this pull request Feb 11, 2019
Subclass all test classes from openml test helper (#609)