[MRG+1] Add an example and a method to analyse the decision tree stucture by arjoly · Pull Request #5487 · scikit-learn/scikit-learn

arjoly · 2015-10-20T13:42:19Z

Ping @glouppe, @pprett, @ogrisel, @amueller

Suggestions are welcome to improve the example.

It should fix #1105 and #5441.

glouppe · 2015-10-20T13:53:11Z

examples/tree/plot_structure.py

On Python 2, this yields an error.

➜ scikit-learn git:(9c2af10) python examples/tree/plot_structure.py
  File "examples/tree/plot_structure.py", line 66
SyntaxError: Non-ASCII character '\xc2' in file examples/tree/plot_structure.py on line 66, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

(Everything works fine under Python 3 though)
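For context, this is the situation PEP 263 covers: Python 2 assumes ASCII source unless the file declares an encoding on its first or second line. A minimal sketch of the usual fix follows; the string is a hypothetical stand-in, since the actual content of line 66 is not shown in this thread.

```python
# -*- coding: utf-8 -*-
# PEP 263 declaration above: it must sit on the first or second line of the
# file. Python 2 needs it to accept non-ASCII bytes in the source; Python 3
# already defaults to UTF-8, so the line is harmless there.

# Hypothetical stand-in for the offending line (assumption: any non-ASCII
# character in the source triggers the same SyntaxError on Python 2).
label = u"threshold ≤ 0.5"
print(label)
```

Replacing the non-ASCII character with an ASCII equivalent would be the other way to make the file load on Python 2.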

glouppe · 2015-10-20T13:55:14Z

It would be nice to include in the example a visualization of the tree, as done with export_graphviz.

glouppe · 2015-10-20T13:55:57Z

examples/tree/plot_structure.py

could -> can

arjoly · 2015-10-20T16:04:12Z

@glouppe I have taken your comments into account.

arjoly · 2015-10-20T16:21:01Z

ping @jmschrei

arjoly · 2015-10-20T16:21:15Z

ping @jnothman

glouppe · 2015-10-20T18:07:04Z

> It would be nice to include in the example a visualization of the tree, as done with export_graphviz.

What about this? I am not sure it is easily feasible, though, since the example gallery expects plots to be generated with matplotlib (as far as I know).

glouppe · 2015-10-20T18:07:29Z

Other than that, the rest of the code looks good to me. Thanks for this!

jmschrei · 2015-10-20T19:02:51Z

sklearn/tree/_tree.pyx

Might want to mention that it returns a sparse matrix, and the reason (because it has to have as many columns as the maximal path)

Also, I think it should be decision_path, not decision_paths
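As an illustration of the sparse return value, here is a minimal sketch against released scikit-learn, where the method is exposed on the estimator as `decision_path`. The dataset and hyperparameters are arbitrary choices made for this example. In released versions the indicator has one column per node of the fitted tree.

```python
from scipy import sparse
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# One row per sample, one column per node of the fitted tree.
indicator = clf.decision_path(X[:5])
print(sparse.issparse(indicator), indicator.shape)

# Each row only marks the nodes on one root-to-leaf path, so most entries
# are zero, which is why a sparse format is a natural fit.
print(indicator.toarray())
```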

jmschrei · 2015-10-20T19:16:14Z

Do you think it might be worthwhile to wrap some of the attributes of _tree in the DecisionTreeRegressor/Classifier object? Such as node_count, and maybe even the arrays? While it makes sense to me why we have a tree object underlying a tree object, it might be easier for others.
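For readers unfamiliar with the layout being discussed, the sketch below shows how these attributes are reached today through the estimator's `tree_` attribute. The dataset and `max_leaf_nodes=3` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X, y)

# The low-level tree lives on the fitted estimator as `tree_`.
tree = clf.tree_
print("node_count:", tree.node_count)

# Parallel arrays indexed by node id; leaves have children set to -1.
print("children_left:", tree.children_left)
print("children_right:", tree.children_right)
print("feature:", tree.feature)
print("threshold:", tree.threshold)
```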

jmschrei · 2015-10-20T19:25:16Z

> It would be nice to include in the example a visualization of the tree, as done with export_graphviz.

It would be nice to have an option where you can pass in a tree, and a path, and get a picture of the tree with the path highlighted. I understand this may be out of scope for this PR though.

arjoly · 2015-10-20T19:26:24Z

> Do you think it might be worthwhile to wrap some of the attributes of _tree in the DecisionTreeRegressor/Classifier object? Such as node_count, and maybe even the arrays? While it makes sense to me why we have a tree object underlying a tree object, it might be easier for others.

I don't have a strong opinion. However, I believe it will complicate the overall code by adding many properties.

arjoly · 2015-10-20T19:28:43Z

> It would be nice to include in the example a visualization of the tree, as done with export_graphviz.

> It would be nice to have an option where you can pass in a tree, and a path, and get a picture of the tree with the path highlighted. I understand this may be out of scope for this PR though.

This is a good idea; however, I don't see how to do this with tools such as matplotlib.

> It would be nice to include in the example a visualization of the tree, as done with export_graphviz.

> What about this? I am not sure it is easily feasible, though, since the example gallery expects plots to be generated with matplotlib (as far as I know).

I forgot this one. I can use export_graphviz, but I fear that I won't get anything better than a list of unreadable strings.

glouppe · 2015-10-21T05:42:50Z

> I forgot this one. I can use export_graphviz, but I fear that I won't get anything better than a list of unreadable strings.

We could maybe convert it to an image and then "plot" it using plt.imshow?
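A rough sketch of that idea, under stated assumptions: it relies on the external Graphviz `dot` binary and on matplotlib being installed, and skips the rendering step when either is missing. This is not what the PR ended up doing; the dataset and tree depth are arbitrary.

```python
import importlib.util
import io
import shutil
import subprocess

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(clf, out_file=None)

# Rendering needs tools that may be absent (for example on a documentation
# builder), so the step is guarded rather than assumed.
have_dot = shutil.which("dot") is not None
have_mpl = importlib.util.find_spec("matplotlib") is not None
if have_dot and have_mpl:
    import matplotlib
    matplotlib.use("Agg")  # no display required
    import matplotlib.pyplot as plt

    # Pipe the DOT source through `dot` and read the PNG bytes back.
    png_bytes = subprocess.run(
        ["dot", "-Tpng"],
        input=dot_source.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    image = plt.imread(io.BytesIO(png_bytes), format="png")
    plt.imshow(image)
    plt.axis("off")
else:
    print("Graphviz `dot` or matplotlib not available; skipping rendering.")
```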

arjoly · 2015-10-21T08:48:10Z

@jmschrei I have renamed the function, added depth * '\t' when I print the decision tree in the example, and hardcoded some node indicator values.

@glouppe I am not sure that it is possible to use graphviz on the doc builder. Maybe this should be done in another PR.
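The depth-based indentation mentioned above can be sketched as follows. This is an illustrative reconstruction, not the code of the example file in this PR; the dataset, `max_leaf_nodes=3`, and the message wording are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X, y)
tree = clf.tree_

lines = []
stack = [(0, 0)]  # (node_id, depth), starting from the root
while stack:
    node_id, depth = stack.pop()
    indent = depth * "\t"
    left = tree.children_left[node_id]
    right = tree.children_right[node_id]
    if left == right:
        # Both children are -1 for a leaf.
        lines.append(f"{indent}node={node_id} is a leaf")
    else:
        lines.append(
            f"{indent}node={node_id}: go to node {left} if "
            f"X[:, {tree.feature[node_id]}] <= {tree.threshold[node_id]:.3f} "
            f"else to node {right}"
        )
        # Push the right child first so the left subtree is printed first.
        stack.append((right, depth + 1))
        stack.append((left, depth + 1))

print("\n".join(lines))
```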

arjoly · 2015-10-21T08:49:53Z

I am thinking of adding a decision path in the forest. This could be useful to generate new features from the forest.
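For reference, a minimal sketch of the forest-level method as it exists in released scikit-learn, where `decision_path` on a forest returns the stacked indicator together with the column offsets of each tree. The dataset and hyperparameters are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=4, max_depth=3, random_state=0
).fit(X, y)

# indicator: sparse matrix with the per-tree indicators stacked column-wise.
# n_nodes_ptr: column offsets, one more entry than there are trees.
indicator, n_nodes_ptr = forest.decision_path(X)
print(indicator.shape, n_nodes_ptr)

# Columns n_nodes_ptr[i]:n_nodes_ptr[i + 1] belong to the i-th tree.
first_tree_block = indicator[:, n_nodes_ptr[0]:n_nodes_ptr[1]]
print(first_tree_block.shape)
```

The stacked indicator can be fed to a downstream model as a sparse binary feature matrix, which is one way to read "generate new features from the forest".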

arjoly · 2015-10-21T09:10:44Z

This should be ready for a new round of review.

glouppe · 2015-10-21T09:13:12Z

> @glouppe I am not sure that it is possible to use graphviz on the doc builder. Maybe this should be done in another PR.

Okay, fair enough. I'll see what I can do once this is merged.

arjoly · 2015-10-21T15:10:43Z

I have taken into account your comment @amueller.

amueller · 2015-10-21T15:52:27Z

one of the five ;)

arjoly · 2015-10-21T16:09:29Z

Now it's good.

amueller · 2015-10-21T16:10:19Z

thanks :) I didn't review the decision_path (yet?) so I don't want to give a +1.

ngoix · 2015-10-21T16:56:59Z

examples/tree/unveil_tree_structure.py

elements -> element
sthrough -> through

ngoix · 2015-10-21T17:03:26Z

I find the example very useful!

jmschrei · 2015-10-21T23:27:05Z

sklearn/tree/tests/test_tree.py

I think I am misunderstanding these paths. How can the first point go to node 1, then node 1 again, then node 0? My understanding was that the decision path was an array of node IDs, ending in a leaf node.

It is the binary encoded version of this. If path[i] == 1, then the sample traverses node with node_id i.

Okay. So then the later nodes always correspond to later nodes visited, so using it as a mask on the nodes gives you the path. Got it.
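The masking described in this exchange can be sketched as below, using the released `decision_path` and `apply` methods. The dataset and `max_depth=3` are arbitrary, and `sample_id` is just an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

indicator = clf.decision_path(X)  # binary node indicator, one row per sample
leaf_ids = clf.apply(X)           # id of the leaf each sample ends in

sample_id = 0
# Dense 0/1 row: entry i is 1 if the sample traverses the node with id i.
row = indicator[sample_id].toarray().ravel()

# Using the row as a mask on the node ids recovers the path. Children get
# larger ids than their parent, so ascending order is traversal order.
node_ids = np.flatnonzero(row)
print(node_ids, leaf_ids[sample_id])
```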

glouppe · 2015-10-22T08:51:36Z

Anybody else for a review? Or, @jmschrei and @ngoix, can you give your +1 if you think this is ready?

(It would be nice to have this merged quickly, so that work on #4163 can continue.)

ngoix · 2015-10-22T09:54:57Z

sklearn/ensemble/tests/test_forest.py

I don't understand the almost_equal

ngoix · 2015-10-22T11:00:33Z

From what I understood, it looks good to me, but I'm not fluent enough in Cython to give a +1.

agramfort · 2015-10-22T11:31:59Z

Thanks

arjoly changed the title ~~Add an example and a method to analyse the decision tree stucture~~ [MRG] Add an example and a method to analyse the decision tree stucture Oct 20, 2015

arjoly force-pushed the example-node branch from 7ca3b2d to 9c2af10 Compare October 20, 2015 13:47

glouppe mentioned this pull request Oct 21, 2015

[MRG + 1] Isolation forest - new anomaly detection algo #4163

Merged

arjoly added 4 commits October 21, 2015 17:05

typo

405c717

Add a decision_path function to forest estimator

7d4755e

Take into account glouppe comments

c9888a6

Take into account amueller comments

ec8fbef

arjoly force-pushed the example-node branch from e752a0f to ec8fbef Compare October 21, 2015 15:06

wording

4d0ba3c

Rename since we don't plot anything

7ad90f5

Take into account ngoix comment

3fbd3b9

agramfort added a commit that referenced this pull request Oct 22, 2015

Merge pull request #5487 from arjoly/example-node

5a58e56

[MRG+1] Add an example and a method to analyse the decision tree stucture

agramfort merged commit 5a58e56 into scikit-learn:master Oct 22, 2015

glouppe mentioned this pull request Oct 22, 2015

Add an example showing how to extract prediction paths in forests #5441

Closed

ngoix added a commit to ngoix/scikit-learn that referenced this pull request Oct 22, 2015

add decision_path code from scikit-learn#5487 to bench

46ad44a

ngoix added a commit to ngoix/scikit-learn that referenced this pull request Oct 22, 2015

Revert "add decision_path code from scikit-learn#5487 to bench"

0377cd0

This reverts commit 46ad44a.

shkupfer mentioned this pull request Aug 20, 2018

Random Forest Feature Contributions for Individual Predictions #11861

Open
