-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
MAINT Parameters validation for chi2_kernel with gamma #26153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ling parameter gamma
…heck for X and Y as duplicate
…heck cannot be skipped as it also checks for sparse matrix
| "sklearn.tree.export_text", | ||
| "sklearn.tree.plot_tree", | ||
| "sklearn.utils.gen_batches", | ||
| "sklearn.metrics.pairwise.chi2_kernel", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this line in alphabetic order
sklearn/metrics/pairwise.py
Outdated
| { | ||
| "X": ["array-like", "sparse matrix"], | ||
| "Y": ["array-like", "sparse matrix", None], | ||
| "gamma": [Interval(Real, None, None, closed="neither")], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also surprised here that it does fail because we don't allow ndarray which would be required by Gaussian processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, here we would trigger a regression:
In [2]: from sklearn.datasets import make_friedman2
...: from sklearn.gaussian_process import GaussianProcessRegressor
...: from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel, PairwiseKernel
...: X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
...: kernel = DotProduct() + WhiteKernel() + PairwiseKernel(gamma=1.0, metric="chi2")
...: gpr = GaussianProcessRegressor(kernel=kernel,
...: random_state=0).fit(X, y)InvalidParameterError: The 'gamma' parameter of chi2_kernel must be a float in the range (-inf, inf). Got array([1.]) instead.
So it would be useful to add in test_gpr.py this minimal example to be sure that we have a test triggering this issue.
Here, it means that we need:
| "gamma": [Interval(Real, None, None, closed="neither")], | |
| "gamma": [Interval(Real, None, None, closed="neither"), Hidden(np.ndarray)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rand0wn please also add a test with what @glemaitre points out. Also, why don't we want to document gamma as an array publicly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is an array with unique values. We found out that the only case that this happens is internal to the GP models that provide such entry.
We therefore should still accept this type of entry to avoid regression but we should not document it since we don't support gamma being several values in an array.
…, added ndarray in gamma for gaussian processes
|
Added changes for review again |
jeremiedbb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @rand0wn
Co-authored-by: jeremiedbb <[email protected]>
Co-authored-by: jeremiedbb <[email protected]>
Reference Issues/PRs
Towards #24862
What does this implement/fix? Explain your changes.
Added automatic parameter validation for "sklearn.metrics.pairwise.chi2_kernel".
Any other comments?
The validation of "sklearn.metrics.pairwise.additive_chi2_kernel" doesn't include validation of gamma, so even through X and Y are revalidated, gamma isn't, additive_chi2_kernel validation can be removed as it may not be needed.