Conversation

@mfeurer (Collaborator) commented Apr 15, 2019

Continuation of #647:

  • Remove obtain_arff_trace; instead, return the trace headers from each fold and later check that they are all the same.
  • Simplify run_model_on_fold to accept X and y so that it does not split the data itself.
  • Simplify the return value of run_model_on_fold.
  • [ ] Have functionality for transforming predictions into ARFF lines in the different task classes. (As discussed with JvR offline, we'll keep the task-specific code here until maintenance becomes an issue; for now the code would be more complex if such transformations lived in the task classes.)
  • [ ] Figure out how to best handle dependencies (as raised here); this is beyond the scope of this PR.
  • Add more unit tests.

@mfeurer mfeurer changed the base branch from master to develop April 15, 2019 15:05
task: 'OpenMLTask',
X_train: Union[np.ndarray, scipy.sparse.spmatrix, pd.DataFrame],
y_train: np.ndarray,
rep_no: int,
Member:

can be removed, together with fold_no, sample_no

Collaborator Author (mfeurer):

rep_no and fold_no are necessary to create trace objects.

rep_no: int,
fold_no: int,
sample_no: int,
add_local_measures: bool,
Member:

can be removed as it can be handled a level up

Collaborator Author (mfeurer):

Done

@mfeurer mfeurer force-pushed the improve_extension_interface branch from ba48e55 to 8abfb23 on April 17, 2019 18:16
@mfeurer mfeurer marked this pull request as ready for review April 17, 2019 18:16
@mfeurer mfeurer requested review from PGijsbers and janvanrijn April 17, 2019 18:16
# correct probability array (the actual array might be incorrect if there are some
# classes not present at training time).
proba_y_new = np.zeros((proba_y.shape[0], len(classes)))
for idx, model_class in enumerate(model_classes):
@PGijsbers (Collaborator), Apr 18, 2019:

I don't understand what you're trying to do here.
If targets are mapped to zero-based indices, wouldn't enumerate(model_classes) just evaluate to (0, 0), (1, 1), ..., (Kt-1, Kt-1), where Kt is the number of classes in the training data?
If so, why not do either

proba_y_new = np.zeros((proba_y.shape[0], len(classes)))
proba_y_new[:, :proba_y.shape[1]] = proba_y

or use np.hstack to pad zero-columns?

Collaborator Author (mfeurer):

The zero-columns might need to be in the middle of the array. I added an example to the comment.
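To illustrate why the zero-columns can fall in the middle, here is a minimal, self-contained sketch of the kind of remapping the loop performs; the names proba_y, model_classes and classes follow the snippet above, but the helper function itself is hypothetical and not the code merged in this PR.

import numpy as np

def remap_proba(proba_y, model_classes, classes):
    """Place each column of proba_y at the position of its label in the full
    class list, leaving zero-columns for classes unseen during training."""
    proba_y_new = np.zeros((proba_y.shape[0], len(classes)))
    for idx, model_class in enumerate(model_classes):
        # model_class is the position of this column's label in the full class
        # list; labels the model never saw keep an all-zero column.
        proba_y_new[:, model_class] = proba_y[:, idx]
    return proba_y_new

# The model only saw classes 0 and 2 out of [0, 1, 2]; the zero-column for
# class 1 ends up in the middle of the array, not at the end.
proba = np.array([[0.7, 0.3], [0.2, 0.8]])
print(remap_proba(proba, model_classes=[0, 2], classes=[0, 1, 2]))
# [[0.7 0.  0.3]
#  [0.2 0.  0.8]]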


if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):

if classes is None:
@PGijsbers (Collaborator), Apr 18, 2019:

I would personally prefer argument checking to be done at the start of the method call, so that all constraints are clear and a TypeError leads to a quick fail (not, e.g., after running a whole grid search!). Even though I know it possibly introduces an extra line of code :)

Collaborator Author (mfeurer):

That check will go away when using task.class_labels. I will add a check that X_test and y_train are given at the top of the function.
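A minimal sketch of the fail-fast check mentioned above, assuming the constraint is simply that both arguments must be present for classification-style tasks; the function name and error message are illustrative, not the merged code.

def check_clf_arguments(X_test, y_train):
    """Raise immediately if the data needed for a classification or
    learning-curve task is missing, before any expensive fitting starts."""
    if X_test is None or y_train is None:
        raise TypeError(
            'X_test and y_train are required for classification and '
            'learning-curve tasks (got X_test=%r, y_train=%r)' % (X_test, y_train)
        )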

classes = None

n_fit = 0
for rep_no in range(num_reps):
Collaborator:

maybe replace this with itertools.product(range(num_reps), range(num_folds), range(num_samples)) so we get rid of two levels of indentation and can maybe spare some ugly line breaks below.

Collaborator Author (mfeurer):

Done

for rep_no in range(num_reps):
for fold_no in range(num_folds):
for sample_no in range(num_samples):
n_fit += 1
Collaborator:

I don't see a reason not to use enumerate (it looks like you want to start at 1, which is done with enumerate(sequence, start=1)).

Collaborator Author (mfeurer):

Done, thanks for the tip with start=1.
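Presumably the refactored loop ends up roughly like the sketch below, which combines both suggestions (itertools.product to flatten the three levels of nesting and enumerate(..., start=1) to replace the manual n_fit counter); the variable names follow the snippets above and the loop body is omitted.

from itertools import product

num_reps, num_folds, num_samples = 2, 5, 1  # illustrative values
for n_fit, (rep_no, fold_no, sample_no) in enumerate(
        product(range(num_reps), range(num_folds), range(num_samples)), start=1):
    # run the model on this (repeat, fold, sample) combination
    pass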

else:
raise TypeError(type(task))

arff_datacontent.extend(arff_datacontent_fold)
Collaborator:

What is the use of arff_datacontent_fold? It is defined in line 446 but I don't see it being used anywhere else. Is this an error? Or is it legacy? If it is legacy, I would prefer arff_datacontent.extend([])  # legacy or something equivalent.

Collaborator Author (mfeurer):

It's not used any more so I removed it.

if len(traces) > 0:
if len(traces) != n_fit:
raise ValueError(
'Did not find enough traces (expected %d, found %d)' % (n_fit, len(traces))
Collaborator:

.format

Collaborator Author (mfeurer):

Done
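Presumably the message ends up built with str.format, roughly as in this sketch (illustrative values stand in for the real n_fit and traces):

n_fit, traces = 10, []  # illustrative values
if len(traces) != n_fit:
    raise ValueError(
        'Did not find enough traces (expected {}, found {})'.format(n_fit, len(traces))
    )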

if not isinstance(other, OpenMLTraceIteration):
return False
attributes = [
'repeat', 'fold', 'iteration', 'setup_string', 'evaluation', 'selected', 'paramaters',
Collaborator:

I noticed the typo paramaters here, and also noticed it is present throughout the __init__ method. Could you fix it up there too?
Also, it looks like this code probably isn't tested, since getattr should raise an exception: this object does not have an attribute paramaters, as (somehow) the attribute on the object itself is spelled correctly :)

Collaborator Author (mfeurer):

I fixed the attribute and removed the equals function as it was not used.
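A small, self-contained sketch of why the typo matters, assuming the equality check looped over the attribute names with getattr (Dummy is an illustrative stand-in, not the real class):

class Dummy:
    def __init__(self):
        self.parameters = {'C': 1.0}  # spelled correctly on the object itself

obj = Dummy()
getattr(obj, 'parameters')    # fine
# getattr(obj, 'paramaters') would raise AttributeError, so an __eq__ that
# iterates over the misspelled attribute list can never return True and would
# have failed loudly the first time a test exercised it.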


def get_X_and_y(self):
def get_X_and_y(self, dataset_format='array'):
"""Get data associated with the current task.
Collaborator:

type hint, doc string with legal values for dataset_format parameter.

Collaborator Author (mfeurer):

Done.

Collaborator:

I think documentation on the dataset_format parameter is still missing?
I don't know what the possible valid string values are (I think 'array' and 'dataframe'?).
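A sketch of what the requested documentation could look like, assuming the two valid values are indeed 'array' and 'dataframe' as suggested above; the exact wording is not the merged docstring.

def get_X_and_y(self, dataset_format: str = 'array'):
    """Get data associated with the current task.

    Parameters
    ----------
    dataset_format : str, optional (default='array')
        Format in which to return the data: 'array' for a numpy array /
        scipy sparse matrix, or 'dataframe' for a pandas DataFrame.
    """
    ...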

return cls(run_id, trace)

@classmethod
def merge_traces(cls, traces: List['OpenMLRunTrace']):
Collaborator:

return type not specified in typehint

Collaborator Author (mfeurer):

Done.
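Presumably the annotated signature then reads roughly like this sketch, with a stand-in class body reduced to the signature itself:

from typing import List

class OpenMLRunTrace:  # stand-in so the annotation is self-contained
    @classmethod
    def merge_traces(cls, traces: List['OpenMLRunTrace']) -> 'OpenMLRunTrace':
        ...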

self.assertEqual(len(arff_tracecontent), 0)
self.assertIsNone(trace)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

since neither num_repeats nor num_folds seems to be used in the test other than here, I would prefer the call to just be

self._check_fold_timing_evaluations(fold_evaluations, 
    num_repeats=1, num_folds=1,
    task_type=task.task_type_id, check_scores=False)

which makes that clear.

Collaborator Author (mfeurer):

Done.

self.assertIsInstance(trace, OpenMLRunTrace)
self.assertEqual(len(trace.trace_iterations), 2)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

same note on num_repeats and num_folds

Collaborator Author (mfeurer):

Done.

# trace. SGD does not produce any
self.assertIsNone(trace)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

num_repeats

Collaborator Author (mfeurer):

Done.

# trace. SGD does not produce any
self.assertIsNone(trace)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

num_repeats

Collaborator Author (mfeurer):

Done.

@PGijsbers (Collaborator) left a comment:

I feel fine about accepting the PR despite it not passing the unit tests, as I am now also convinced the failures are server issues and not something introduced by this PR.
That said, there are a few minor remarks left :)

merged_trace[key] = iteration
previous_iteration = key

return cls(None, merged_trace)
Collaborator:

I haven't tested this, but I think this can be a bit more pythonic and clearer if you first check the error condition, and make use of zip to avoid indexing:

if any([sorted(iter1.parameters) != sorted(iter2.parameters)
        for trace in traces
        for (iter1, iter2) in zip(trace, trace[1:])]):
    raise ValueError(...)

merged_trace = {
   (iteration.repeat, iteration.fold, iteration.iteration) : iteration
   for trace in traces for iteration in trace
}

Collaborator:

(Note that list(Dict) and list(Dict.keys()) should be equivalent. If ordering is not a concern, you can use list, which follows insertion order; if the order could be a concern, use sorted instead.)

Collaborator Author (mfeurer):

Thanks for the suggestion. I'm not 100% sure whether this would allow the same flexibility; for example, the first statement would only check the iterations within one trace, and it would be hard to extract a useful error message from it. Therefore, I suggest leaving the code as it is.

Collaborator:

Right, I missed that.

all_iterations = [iter for trace in traces for iter in trace]
for (iter1, iter2) in zip(all_iterations, all_iterations[1:]):
    if sorted(iter1.parameters) != sorted(iter2.parameters):
        raise ValueError(...)

should then work fine? But I am fine leaving this as is (and perhaps refactoring it later if we can find something we all like).


@codecov-io commented Apr 18, 2019

Codecov Report

Merging #673 into develop will decrease coverage by 0.04%.
The diff coverage is 85.97%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #673      +/-   ##
===========================================
- Coverage    90.82%   90.77%   -0.05%     
===========================================
  Files           36       36              
  Lines         3573     3608      +35     
===========================================
+ Hits          3245     3275      +30     
- Misses         328      333       +5
Impacted Files Coverage Δ
openml/tasks/task.py 96.11% <100%> (+0.32%) ⬆️
openml/testing.py 95.37% <100%> (+0.04%) ⬆️
openml/_api_calls.py 83.11% <66.66%> (ø) ⬆️
openml/extensions/extension_interface.py 91.17% <75%> (-0.26%) ⬇️
openml/runs/functions.py 82% <79.36%> (-1.95%) ⬇️
openml/extensions/sklearn/extension.py 90.86% <89.83%> (+0.95%) ⬆️
openml/runs/trace.py 91.22% <90.47%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4152f91...0b01581.

@mfeurer (Collaborator Author) commented Apr 19, 2019

Okay @PGijsbers, I found the issue. After some offline discussion with @janvanrijn I figured out that the reason why there is no trace can be seen in the error log of the run. It turns out that there were three bugs:

  1. There was an indexing error which caused the predictions to have wrong indices and therefore made the evaluation fail. I fixed this.
  2. In case there was no evaluation, the test continued after 200 seconds and raised the cryptic error message that there was no trace associated with the run. I added a useful error message.
  3. In case there were no evaluations, due to the recently added run caching, the test could not retrieve any new evaluations from the server because it kept checking the cached run. I added a flag to ignore the cached run and download it again from the server.

All in all, this should be good to merge now.

@PGijsbers PGijsbers merged commit f656062 into develop Apr 23, 2019
@PGijsbers PGijsbers deleted the improve_extension_interface branch April 23, 2019 08:38