Consider adding a version of multivariate mutual information from the gist below?
If I'm not mistaken, there is no version of mutual information for multiple continuous random variables in sklearn.
Quite a few people have been landing on this gist: https://gist.github.com/GaelVaroquaux/ead9898bd3c973c40429
Any thoughts on including some version of this MI calculation here?
I wanted to understand the correctness of the above method. Is there enough clarity on it?
@lirus7
It probably needs some testing and cleaning up. A while ago I moved it here and added some notes/reminders about differential entropy (DE) in the README. I haven't gone back to it, as I haven't really used any MI stuff since then.
https://github.com/mutualinfo/mutual_info
@cottrell, are you sure the implementation is correct? It looks like, for the multivariate case, you sum all the marginal entropies and subtract the joint entropy, which is not correct.
@lirus7 Not sure at all. I remember there were some issues with this stuff (see the notes in the README) and with what the actual target of the estimation was. The original gist was not invariant to some things it maybe should have been. I think it is basically an estimator of a kind of divergence and should satisfy some invariance properties: https://en.wikipedia.org/wiki/Differential_entropy ... If you end up digging into that gist or the code, I can add you to that project. I haven't looked at it in a while.
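For what it's worth, the quantity "sum of marginal entropies minus joint entropy" is usually called total correlation (or multi-information); it is one legitimate multivariate generalization of MI, though not the only one. A minimal sketch of that estimator using a Kozachenko-Leonenko k-NN entropy estimate — this is my own illustration of the idea, not the gist's actual code, and it assumes continuous data with no duplicate samples (add a tiny jitter if there are ties):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy (nats)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    tree = cKDTree(x)
    # distance from each point to its k-th nearest neighbour (self excluded)
    eps = tree.query(x, k=k + 1)[0][:, -1]
    # log volume of the unit d-ball: pi^(d/2) / Gamma(d/2 + 1)
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps))

def total_correlation(*cols, k=3):
    """Sum of marginal entropies minus joint entropy ("multi-information")."""
    joint = np.column_stack(cols)
    return sum(kl_entropy(np.asarray(c), k=k) for c in cols) - kl_entropy(joint, k=k)
```

For two variables this reduces to ordinary mutual information; the open question in this thread is whether this is the "right" multivariate target, not whether the formula computes total correlation.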
I was looking for what could be called "partial multivariate mutual information", assuming the concept even makes sense, in analogy with partial autocorrelation from statsmodels.graphics.tsaplots.plot_pacf.
In other words, I am trying to compute:
- mutual information between y0 and y1
- the extra mutual information gained by adding y2 to the comparison
- the extra mutual information gained by adding y3 to all of the above
- ...and so on until yN
I've tried applying mutual_information() in a loop, adding new variables and keeping a tally of the changes in mutual information as each variable is added. Each new variable is a lagged version of y0, with the lag increasing by 1 at each step. The head of each lagged variable is imputed with 0.0.
The results are bizarre: I get negative values a lot. In other words, mutual information sometimes decreases when I add a new variable to the comparison.
Also, mutual_information() does not seem to match sklearn.feature_selection.mutual_info_regression very well when I only have 2 variables.
Any chance I could achieve my goal with the code from the gist?
@FlorinAndrei are your variables discrete or continuous? I think I put a few notes in the README of https://github.com/mutualinfo/mutual_info ... differential entropy can be negative and is not invariant to change of variables. It's been a while since I've used this, though, so I can't remember all the details.
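Both points from the README are easy to demonstrate with SciPy's own estimator, `scipy.stats.differential_entropy` (available since SciPy 1.6) — this is just an illustration of the properties, not the gist's code:

```python
import numpy as np
from scipy.stats import differential_entropy

rng = np.random.default_rng(0)
x = rng.normal(scale=0.1, size=10_000)

# A tightly concentrated distribution has NEGATIVE differential entropy:
# the true value is 0.5 * ln(2 * pi * e * 0.01), roughly -0.88 nats.
h = differential_entropy(x)

# Differential entropy is not invariant to change of variables:
# rescaling the data by a factor c shifts the entropy by ln(c).
h_scaled = differential_entropy(10 * x)

print(h, h_scaled)
```

So negative values from an entropy-based MI estimator on continuous data are not necessarily a bug in themselves; the question is whether the combination used for MI cancels these effects out.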
@cottrell I have a mix of numeric and categorical features, with the categorical ones integer-encoded. Generally there are no float values among the numeric features (c == round(c)), so technically they are all "discrete" in that sense.