Conversation

@mfeurer (Collaborator) commented Apr 15, 2019

Continuation of #647:

  • Remove obtain_arff_trace; instead, return the trace headers from each fold and later check that they are all the same.
  • Simplify run_model_on_fold to accept X and y so that it does not split the data itself.
  • Simplify the return value of run_model_on_fold.
  • [ ] Have functionality for transforming predictions into ARFF lines in the different task classes. (As discussed with JvR offline, we'll keep the task-specific code here until maintenance becomes an issue; for now the code would be more complex if such transformations lived in the task classes.)
  • [ ] Figure out how to best handle dependencies (as raised here); this is beyond the scope of this PR.
  • Add more unit tests.

@mfeurer mfeurer changed the base branch from master to develop April 15, 2019 15:05
task: 'OpenMLTask',
X_train: Union[np.ndarray, scipy.sparse.spmatrix, pd.DataFrame],
y_train: np.ndarray,
rep_no: int,
Member:

can be removed, together with fold_no, sample_no

Collaborator Author (mfeurer):

rep_no and fold_no are necessary to create trace objects.

rep_no: int,
fold_no: int,
sample_no: int,
add_local_measures: bool,
Member:

can be removed as it can be handled a level up

Collaborator Author (mfeurer):

Done

@mfeurer mfeurer force-pushed the improve_extension_interface branch from ba48e55 to 8abfb23 on April 17, 2019 18:16
@mfeurer mfeurer marked this pull request as ready for review April 17, 2019 18:16
@mfeurer mfeurer requested review from PGijsbers and janvanrijn April 17, 2019 18:16
# correct probability array (the actual array might be incorrect if there are some
# classes not present at training time).
proba_y_new = np.zeros((proba_y.shape[0], len(classes)))
for idx, model_class in enumerate(model_classes):
@PGijsbers (Collaborator), Apr 18, 2019:

I don't understand what you're trying to do here.
If targets are mapped to zero-based indices, wouldn't enumerate(model_classes) just evaluate to (0, 0), (1, 1), ..., (Kt-1, Kt-1), where Kt is the number of classes in the training data?
If so, why not do either

proba_y_new = np.zeros((proba_y.shape[0], len(classes)))
proba_y_new[:, :proba_y.shape[1]] = proba_y

or use np.hstack to pad zero-columns?

Collaborator Author (mfeurer):

The zero-columns might need to be in the middle of the array. I added an example to the comment.
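To illustrate why the zero-columns can fall in the middle, here is a minimal, self-contained sketch of the kind of remapping the loop performs; the names proba_y, model_classes and classes follow the snippet above, but the helper function itself is hypothetical and not the code merged in this PR.

import numpy as np

def remap_proba(proba_y, model_classes, classes):
    """Place each column of proba_y at the position of its label in the full
    class list, leaving zero-columns for classes unseen during training."""
    proba_y_new = np.zeros((proba_y.shape[0], len(classes)))
    for idx, model_class in enumerate(model_classes):
        # model_class is the position of this column's label in the full class
        # list; labels the model never saw keep an all-zero column.
        proba_y_new[:, model_class] = proba_y[:, idx]
    return proba_y_new

# The model only saw classes 0 and 2 out of [0, 1, 2]; the zero-column for
# class 1 ends up in the middle of the array, not at the end.
proba = np.array([[0.7, 0.3], [0.2, 0.8]])
print(remap_proba(proba, model_classes=[0, 2], classes=[0, 1, 2]))
# [[0.7 0.  0.3]
#  [0.2 0.  0.8]]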


if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):

if classes is None:
@PGijsbers (Collaborator), Apr 18, 2019:

I would personally prefer argument checking to be done at the start of the method call, so that all constraints are clear and a TypeError leads to a quick fail (not, e.g., after running a whole grid search!). Even though I know it possibly introduces an extra line of code :)

Collaborator Author (mfeurer):

That check will go away when using task.class_labels. I will add a check that X_test and y_train are given at the top of the function.
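A minimal sketch of the fail-fast check mentioned above, assuming the constraint is simply that both arguments must be present for classification-style tasks; the function name and error message are illustrative, not the merged code.

def check_clf_arguments(X_test, y_train):
    """Raise immediately if the data needed for a classification or
    learning-curve task is missing, before any expensive fitting starts."""
    if X_test is None or y_train is None:
        raise TypeError(
            'X_test and y_train are required for classification and '
            'learning-curve tasks (got X_test=%r, y_train=%r)' % (X_test, y_train)
        )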

classes = None

n_fit = 0
for rep_no in range(num_reps):
Collaborator:

maybe replace this with itertools.product(range(num_reps), range(num_folds), range(num_samples)) so we get rid of two levels of indentation and can maybe spare some ugly line breaks below.

Collaborator Author (mfeurer):

Done

for rep_no in range(num_reps):
for fold_no in range(num_folds):
for sample_no in range(num_samples):
n_fit += 1
Collaborator:

I don't see a reason not to use enumerate (it looks like you want to start at 1, which is done with enumerate(sequence, start=1)).

Collaborator Author (mfeurer):

Done, thanks for the tip with start=1.
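Presumably the refactored loop ends up roughly like the sketch below, which combines both suggestions (itertools.product to flatten the three levels of nesting and enumerate(..., start=1) to replace the manual n_fit counter); the variable names follow the snippets above and the loop body is omitted.

from itertools import product

num_reps, num_folds, num_samples = 2, 5, 1  # illustrative values
for n_fit, (rep_no, fold_no, sample_no) in enumerate(
        product(range(num_reps), range(num_folds), range(num_samples)), start=1):
    # run the model on this (repeat, fold, sample) combination
    pass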

else:
raise TypeError(type(task))

arff_datacontent.extend(arff_datacontent_fold)
Collaborator:

What is the use of arff_datacontent_fold? It is defined in line 446 but I don't see it being used anywhere else. Is this an error? Or is it legacy? If it is legacy, I would prefer arff_datacontent.extend([])  # legacy or something equivalent.

Collaborator Author (mfeurer):

It's not used any more so I removed it.

if len(traces) > 0:
if len(traces) != n_fit:
raise ValueError(
'Did not find enough traces (expected %d, found %d)' % (n_fit, len(traces))
Collaborator:

.format

Collaborator Author (mfeurer):

Done
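Presumably the message ends up built with str.format, roughly as in this sketch (illustrative values stand in for the real n_fit and traces):

n_fit, traces = 10, []  # illustrative values
if len(traces) != n_fit:
    raise ValueError(
        'Did not find enough traces (expected {}, found {})'.format(n_fit, len(traces))
    )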

if not isinstance(other, OpenMLTraceIteration):
return False
attributes = [
'repeat', 'fold', 'iteration', 'setup_string', 'evaluation', 'selected', 'paramaters',
Collaborator:

I noticed the typo paramaters here, and also noticed it is present throughout the __init__ method. Could you fix it up there too?
Also, it looks like this code probably isn't tested, since getattr should raise an exception: this object does not have an attribute paramaters, as (somehow) the attribute on the object itself is spelled correctly :)

Collaborator Author (mfeurer):

I fixed the attribute and removed the equals function as it was not used.
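A small, self-contained sketch of why the typo matters, assuming the equality check looped over the attribute names with getattr (Dummy is an illustrative stand-in, not the real class):

class Dummy:
    def __init__(self):
        self.parameters = {'C': 1.0}  # spelled correctly on the object itself

obj = Dummy()
getattr(obj, 'parameters')    # fine
# getattr(obj, 'paramaters') would raise AttributeError, so an __eq__ that
# iterates over the misspelled attribute list can never return True and would
# have failed loudly the first time a test exercised it.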


def get_X_and_y(self):
def get_X_and_y(self, dataset_format='array'):
"""Get data associated with the current task.
Collaborator:

type hint, doc string with legal values for dataset_format parameter.

Collaborator Author (mfeurer):

Done.

Collaborator:

I think documentation on the dataset_format parameter is still missing?
I don't know what the possible valid string values are (I think 'array' and 'dataframe'?).
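A sketch of what the requested documentation could look like, assuming the two valid values are indeed 'array' and 'dataframe' as suggested above; the exact wording is not the merged docstring.

def get_X_and_y(self, dataset_format: str = 'array'):
    """Get data associated with the current task.

    Parameters
    ----------
    dataset_format : str, optional (default='array')
        Format in which to return the data: 'array' for a numpy array /
        scipy sparse matrix, or 'dataframe' for a pandas DataFrame.
    """
    ...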

return cls(run_id, trace)

@classmethod
def merge_traces(cls, traces: List['OpenMLRunTrace']):
Collaborator:

return type not specified in typehint

Collaborator Author (mfeurer):

Done.
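Presumably the annotated signature then reads roughly like this sketch, with a stand-in class body reduced to the signature itself:

from typing import List

class OpenMLRunTrace:  # stand-in so the annotation is self-contained
    @classmethod
    def merge_traces(cls, traces: List['OpenMLRunTrace']) -> 'OpenMLRunTrace':
        ...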

self.assertEqual(len(arff_tracecontent), 0)
self.assertIsNone(trace)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

since neither num_repeats nor num_folds seems to be used in the test other than here, I would prefer the call to just be

self._check_fold_timing_evaluations(fold_evaluations, 
    num_repeats=1, num_folds=1,
    task_type=task.task_type_id, check_scores=False)

which makes that clear.

Collaborator Author (mfeurer):

Done.

self.assertIsInstance(trace, OpenMLRunTrace)
self.assertEqual(len(trace.trace_iterations), 2)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

same note on num_repeats and num_folds

Collaborator Author (mfeurer):

Done.

# trace. SGD does not produce any
self.assertIsNone(trace)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

num_repeats

Collaborator Author (mfeurer):

Done.

# trace. SGD does not produce any
self.assertIsNone(trace)

self._check_fold_timing_evaluations(fold_evaluations, num_repeats, num_folds,
Collaborator:

num_repeats

Collaborator Author (mfeurer):

Done.

@PGijsbers (Collaborator) left a comment:

I feel fine about accepting the PR despite it not passing the unit tests, as I am now also convinced the failures are server issues and not something introduced by this PR.
That said, there are a few minor remarks left :)

merged_trace[key] = iteration
previous_iteration = key

return cls(None, merged_trace)
Collaborator:

I haven't tested this, but I think this can be a bit more pythonic and clearer if you first check the error condition, and make use of zip to avoid indexing:

if any([sorted(iter1.parameters) != sorted(iter2.parameters)
        for trace in traces
        for (iter1, iter2) in zip(trace, trace[1:])]):
    raise ValueError(...)

merged_trace = {
   (iteration.repeat, iteration.fold, iteration.iteration) : iteration
   for trace in traces for iteration in trace
}

Collaborator:

(Note that list(Dict) and list(Dict.keys()) should be equivalent. If ordering is not a concern, you can use list, which follows insertion order; if the order could be a concern, use sorted instead.)

Collaborator Author (mfeurer):

Thanks for the suggestion. I'm not 100% sure whether this would allow the same flexibility; for example, the first statement would only check the iterations within one trace, and it would be hard to extract a useful error message from it. Therefore, I suggest leaving the code as it is.

Collaborator:

Right, I missed that.

all_iterations = [iter for trace in traces for iter in trace]
for (iter1, iter2) in zip(all_iterations, all_iterations[1:]):
    if sorted(iter1.parameters) != sorted(iter2.parameters):
        raise ValueError(...)

should then work fine? But I am fine leaving this as is (and perhaps refactoring it later if we can find something we all like).


@codecov-io commented Apr 18, 2019

Codecov Report

Merging #673 into develop will decrease coverage by 0.04%.
The diff coverage is 85.97%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #673      +/-   ##
===========================================
- Coverage    90.82%   90.77%   -0.05%     
===========================================
  Files           36       36              
  Lines         3573     3608      +35     
===========================================
+ Hits          3245     3275      +30     
- Misses         328      333       +5
Impacted Files Coverage Δ
openml/tasks/task.py 96.11% <100%> (+0.32%) ⬆️
openml/testing.py 95.37% <100%> (+0.04%) ⬆️
openml/_api_calls.py 83.11% <66.66%> (ø) ⬆️
openml/extensions/extension_interface.py 91.17% <75%> (-0.26%) ⬇️
openml/runs/functions.py 82% <79.36%> (-1.95%) ⬇️
openml/extensions/sklearn/extension.py 90.86% <89.83%> (+0.95%) ⬆️
openml/runs/trace.py 91.22% <90.47%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4152f91...0b01581.

@mfeurer (Collaborator Author) commented Apr 19, 2019

Okay @PGijsbers, I found the issue. After some offline discussion with @janvanrijn I figured out that the reason why there is no trace can be seen in the error log of the run. It turns out that there were three bugs:

  1. There was an indexing error which caused the predictions to have wrong indices and therefore made the evaluation fail. I fixed this.
  2. In case there was no evaluation, the test continued after 200 seconds and raised the cryptic error message that there was no trace associated with the run. I added a useful error message.
  3. In case there were no evaluations, due to the recently added run caching, the test could not retrieve any new evaluations from the server because it kept checking the cached run. I added a flag to ignore the cached run and download it again from the server.

All in all, this should be good to merge now.

@PGijsbers PGijsbers merged commit f656062 into develop Apr 23, 2019
@PGijsbers PGijsbers deleted the improve_extension_interface branch April 23, 2019 08:38