[MRG+1] Fix multi-label issues in IsolationForest benchmark#8638
[MRG+1] Fix multi-label issues in IsolationForest benchmark#8638TomDLT merged 2 commits intoscikit-learn:masterfrom hrjn:fix-iforest-benchmark
Conversation
|
Thanks! I agree that the described fix is appropriate, but now that there are other changes you will need to be patient for a full review. I hope to look soon. |
benchmarks/bench_isolation_forest.py
Outdated
| ax[2].hist(scoring[y_test == 1], bins, color='r', | ||
| label='outliers') | ||
| ax[2].legend(loc="lower right") | ||
| y_pred = model.predict(X_test) |
There was a problem hiding this comment.
you don't need to call predict before decision_function, y_pred is unused,
and it artificially increases the predict time.
benchmarks/bench_isolation_forest.py
Outdated
| bins = np.linspace(-0.5, 0.5, 200) | ||
| ax[0].hist(scoring, bins, color='black') | ||
| ax[0].set_title('Decision function for %s dataset' % dat) | ||
| ax[0].legend(loc="lower right") |
There was a problem hiding this comment.
You can remove this line, since there is no label in this plot.
It will remove the matplotlib warning
benchmarks/bench_isolation_forest.py
Outdated
| ========================================== | ||
|
|
||
| A test of IsolationForest on classical anomaly detection datasets. | ||
| """ |
There was a problem hiding this comment.
Could you put the description on the top of the file
benchmarks/bench_isolation_forest.py
Outdated
|
|
||
| A test of IsolationForest on classical anomaly detection datasets. | ||
| """ | ||
| print(__doc__) |
There was a problem hiding this comment.
You can put the print(__doc__) just below the import and before print_outlier_ratio(y)
benchmarks/bench_isolation_forest.py
Outdated
| fig_roc, ax_roc = plt.subplots(1, 1, figsize=(8, 5)) | ||
|
|
||
| # Set this to true for plotting score histograms for each dataset: | ||
| with_scoring_hists = False |
There was a problem hiding this comment.
To be consistent with the previous example, shouldn't this be set to True by default?
There was a problem hiding this comment.
I found it clearer with just the ROC curves, hence the False by default (but can be changed if required).
There was a problem hiding this comment.
Could you please rename it "with_decision_functions_histograms"?
|
LGTM |
…ges (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif
|
Thanks @hrjn ! |
…earn#8638) * Fixed a legacy multi-label issue and added minor refactoring and changes (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif * rerun CI
…earn#8638) * Fixed a legacy multi-label issue and added minor refactoring and changes (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif * rerun CI
…earn#8638) * Fixed a legacy multi-label issue and added minor refactoring and changes (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif * rerun CI
…earn#8638) * Fixed a legacy multi-label issue and added minor refactoring and changes (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif * rerun CI
…earn#8638) * Fixed a legacy multi-label issue and added minor refactoring and changes (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif * rerun CI
…earn#8638) * Fixed a legacy multi-label issue and added minor refactoring and changes (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif * rerun CI
…earn#8638) * Fixed a legacy multi-label issue and added minor refactoring and changes (mostly esthethic) Minor corrections after code review. Minor corrections after 2nd code review. Minor modif * rerun CI
Reference Issue
Fixes #8637
What does this implement/fix? Explain your changes.
In previous version, using Python 3 and
LabelBinarizerto encode categorical features from SA & SF datasets fromkddcup99led to the error described in #8637. Replacing it withMultipleLabelBinarizerfixes the problem and allows the code to run on both Python 2 and 3.Output obtained with new version:

Any other comments?
Additional minor cleaning/refactoring:
with_scoring_histsflag (set toFalseby default) to avoid plotting all score histograms (might potentially clog the screen with figure windows)