Calculates correlation between cubes #555
@niallrobinson - I haven't had time to look closely at what you've done yet, but have you considered taking advantage of the built-in stats functionality in scipy/numpy, e.g.
Hi Ed - I initially wrote this code iterating over the cube using pearsonr(), but it was very slow. I don't know how I could use those functions without iterating, but I might have missed something. (As far as I can see, correlate() and pearsonr() both only accept two 1-D arrays, and correlate2d is for cross-correlation, but there might be something smart you can do.)
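For reference, the iteration can indeed be avoided with plain NumPy: subtract the means and form the anomaly products along the correlation axis in one vectorised pass. This is only an illustrative sketch (the function name and shapes are invented here, not the PR's code), checked against np.corrcoef column by column.

```python
import numpy as np

def pearsonr_along_axis(a, b, axis=0):
    """Vectorised Pearson's r between two equal-shaped arrays along one axis."""
    a_anom = a - a.mean(axis=axis, keepdims=True)
    b_anom = b - b.mean(axis=axis, keepdims=True)
    cov = (a_anom * b_anom).sum(axis=axis)
    denom = np.sqrt((a_anom ** 2).sum(axis=axis) * (b_anom ** 2).sum(axis=axis))
    return cov / denom

rng = np.random.default_rng(0)
a = rng.standard_normal((50, 4))
b = rng.standard_normal((50, 4))

vec = pearsonr_along_axis(a, b, axis=0)
# Same result as correlating each column pair separately, without the loop.
loop = np.array([np.corrcoef(a[:, i], b[:, i])[0, 1] for i in range(4)])
print(np.allclose(vec, loop))  # True
```

The whole-array version does the same arithmetic as the per-column loop but in a handful of array operations, which is where the speed-up over repeated pearsonr() calls comes from.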
@esc24 I actually suggested @niallrobinson do a manual calculation instead of using
I'm wondering now if it might be best to separate this into its own module in analysis. That way we can separate out and generalise the function that manages the reshaping, then pass a function that does the maths operation to the general function, allowing this approach to be used for operations other than Pearson's r: Spearman's, covariance, etc.
lib/iris/analysis/maths.py
(trivial) Please start with a capital letter.
As you suggest, making the list of coords optional sounds like a good idea.
The comment about separating/generalising could optionally be addressed in another ticket.
lib/iris/analysis/maths.py
Please consider making the docstring more clear that the cubes are "assumed to be" comparable in all other dimensions, or adding a check.
Thanks Niall,
lib/iris/analysis/maths.py
Currently, if I provide a coordinate that doesn't exist, it goes unnoticed. It would be better to iterate over the coordinates that are passed in and extract each coordinate from the cube using cube.coord, which will raise an exception if the coordinate does not exist. Then you can use cube.coord_dims to find the dimension number(s) it corresponds to.
👍 You might need to sort the resulting list of dims too, if you do that.
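The suggested pattern (look each name up so a typo raises, then sort the collected dimension numbers) can be sketched as below. FakeCube is a toy stand-in invented here so the sketch runs on its own; real code would call iris.cube.Cube.coord(), which raises CoordinateNotFoundError for unknown names, and Cube.coord_dims().

```python
# Hypothetical stand-in for the iris API being discussed; only the shape of
# coord()/coord_dims() is mirrored, nothing else.
class FakeCube:
    def __init__(self, coord_to_dims):
        self._coord_to_dims = coord_to_dims  # e.g. {"time": (0,)}

    def coord(self, name):
        if name not in self._coord_to_dims:
            raise KeyError("Coordinate %r not found" % name)
        return name

    def coord_dims(self, coord):
        return self._coord_to_dims[coord]

def dims_for_coords(cube, coord_names):
    """Validate each named coord, collect its dims, and return them sorted."""
    dims = []
    for name in coord_names:
        coord = cube.coord(name)          # raises if the coord doesn't exist
        dims.extend(cube.coord_dims(coord))
    return sorted(set(dims))

cube = FakeCube({"time": (0,), "latitude": (1,), "longitude": (2,)})
print(dims_for_coords(cube, ["longitude", "time"]))  # [0, 2]
```

The final sort matters because callers may name the coordinates in any order, while downstream reshaping code typically expects ascending dimension numbers.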
The problem with that is that I also want access to the coordinates that are not named, for slice_ind and slice_shape. I'll just add a check to make sure non-existent coords are caught, though.
I wouldn't consider it calculus, no. If I implied this then I didn't mean to; I was momentarily confused!

Rebase required to fix issues with the latest standard name table; see #586.
lib/iris/analysis/maths.py
The convention in Iris is for multi-line docstrings to start with a blank line. The first line of text should also end with a full stop.
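Applied to a made-up signature (the function and parameter names here are placeholders, not the PR's final API), that convention looks like:

```python
def pearson_corr(cube_a, cube_b, corr_coords):
    """
    Calculate the Pearson correlation coefficient between two cubes.

    The docstring opens with a blank line, and its first line of text
    ends with a full stop.

    """
    raise NotImplementedError  # body omitted; only the docstring style matters here
```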
Humble apologies for the slow response time on this, @niallrobinson.
lib/iris/analysis/maths.py
These should be views, not copies, i.e. data_a = cube_a.data.view().
This sounds similar to the Aggregator objects used by Cube.collapsed() and friends. Are you suggesting a user API which is something like:
Or are you talking about an internal helper function used by a variety of user API functions, such as:
Or have I got the wrong end of the stick altogether? 😉
Thanks guys, I'll get on sorting that stuff out. I wrote this a long time ago, but I guess I was envisaging the latter (internal helper functions); however, the former may be desirable? I think that using the existing aggregators wouldn't be possible, as I have done the calculation explicitly here in order to take advantage of the vectorisation. However, as @bblay said previously
Maybe we should add that to the to-do list for another ticket and get this working first? What do you think?
Seems reasonable, and more agile, to worry about refactoring for these future functions in their future PRs, but if you're going to pave the way, I agree with you on the latter API.
Sorry to litter this. I was so busy convincing myself I had mastered GitHub commands that I forgot to run the unit tests!
lib/iris/analysis/stats.py
These are not dimensions; they are names. Suggest corr_names, corr_coords, or an alternative.
Thanks for the rigorous review of my refactoring. All look sensible to me, so I'll sort them out this morning.
Thanks @niallrobinson, please squash and we'll give the others a chance to comment before merging.
lib/iris/analysis/stats.py
I'm late to the party here, but I find the docstring quite confusing with respect to the coordinates and what output I might expect. Would it be possible to include a short worked example in the docstring?
I agree with @ajdawson. The world only makes sense if I keep the concepts of coordinates and dimensions clearly separated. The docstring speaks of dimensions, but the signature refers to coords.
Commit in response to
Calling @bblay - I think the Travis build has fallen over erroneously.
I just restarted it (there's an option in the settings drop-down to do that).
@bblay That option is greyed out for me, even after logging in with GitHub etc. Also, I think it failed again!
That happens to me sometimes too. Re-restarted.
OK, it's not timed out this time. Those new test failures are not your fault, @niallrobinson. I believe @esc24 is looking into this problem, which affects all Iris tests.
I've implemented a temporary workaround on upstream/master (pinned netCDF4 to version 1.0.2 in .travis.yml), so you could rebase.
🎉
Needs an iris.tests.skip_data decorator to indicate that this test needs real data.
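For anyone unfamiliar with the pattern: Iris provides its own iris.tests.skip_data decorator for exactly this. The generic unittest equivalent of "skip when the sample data isn't available" looks like the sketch below, where DATA_AVAILABLE is an assumed stand-in for the real availability check.

```python
import io
import unittest

DATA_AVAILABLE = False  # stand-in flag for "sample-data repository present"

@unittest.skipUnless(DATA_AVAILABLE, "Test requires external sample data.")
class TestCubeCorrelation(unittest.TestCase):
    def test_pearsonr(self):
        self.fail("Should never run without the sample data.")

# Run the suite quietly; the test is reported as skipped, not failed.
loader = unittest.defaultTestLoader
result = unittest.TextTestRunner(stream=io.StringIO()).run(
    loader.loadTestsFromTestCase(TestCubeCorrelation))
print(len(result.skipped))  # 1
```

Marking data-dependent tests this way keeps the suite green on machines that have not checked out the (large) sample-data repository.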
This is looking great.
@bblay done...? I believe we have realised we need to comment to get attention after a commit.
@bblay done and passed.
Lovely.
Phew, good stuff @niallrobinson!
Hurray! Thanks for sticking with me.
The "statistics" themselves are basically three lines, and I'm perfectly happy with them. It's all the housekeeping (the shape manipulation, etc.) that I need to think carefully about.
Some code to calculate the correlation between cubes over any dimensions (including all of them). Not sure if the tests are up to scratch, but there are some anyway. Maybe I should make the coord list optional and just calculate the correlation over all dimensions in that case?