
Conversation

@niallrobinson
Contributor

Some code to calculate the correlation between cubes over any dimensions (including all of them). I'm not sure the tests are up to scratch, but at least there are some. Maybe we should make the coord list optional and just calculate the correlation over all dimensions in that case?

@esc24
Member

esc24 commented Jun 14, 2013

@niallrobinson - I haven't had time to look closely at what you've done yet, but have you considered taking advantage of the built-in stats functionality in scipy/numpy, e.g. numpy.correlate(), scipy.stats.pearsonr(), scipy.signal.correlate2d(), etc.? These may be of use.
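For reference, pearsonr operates on a single pair of 1-D arrays and returns the coefficient alongside a p-value; a minimal sketch with made-up data:

```python
import numpy as np
from scipy.stats import pearsonr

# Two made-up 1-D series; pearsonr only accepts a pair of 1-D arrays.
x = np.linspace(0.0, 1.0, 100)
y = x + np.random.normal(scale=0.1, size=100)

r, p_value = pearsonr(x, y)  # correlation coefficient and two-sided p-value
print(r, p_value)
```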

@niallrobinson
Contributor Author

Hi Ed - I initially wrote this code iterating over the cube using pearsonr(), but it was very slow. I don't know how I could use those functions without iterating, but I might have missed something. (As far as I can see, correlate() and pearsonr() both only accept two 1-D arrays, and correlate2d() is for cross-correlation, but there might be something smart you can do.)

@ajdawson
Member

@esc24 I actually suggested @niallrobinson do a manual calculation instead of using scipy.stats.pearsonr because of the vectorization opportunity it presents. I've written similar code in the past; expressing it as a matrix multiplication lets one compute all the correlations in one go rather than iterating, and it's fast, since matrix-matrix multiplication is one of the most optimized array operations going.
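A minimal sketch of the kind of vectorised calculation described here, assuming two equally shaped arrays correlated along their first axis (the function name and shapes are illustrative, not the PR's actual code):

```python
import numpy as np


def pearson_along_axis0(a, b):
    """Pearson r between matching series of a and b along axis 0.

    a and b have equal shape, e.g. (time, lat, lon); the result has
    shape a.shape[1:], one coefficient per point, with no Python loop.
    """
    a_anom = a - a.mean(axis=0)
    b_anom = b - b.mean(axis=0)
    covariance = (a_anom * b_anom).sum(axis=0)
    norm = np.sqrt((a_anom ** 2).sum(axis=0) * (b_anom ** 2).sum(axis=0))
    return covariance / norm


# The all-pairs variant is where the matrix multiplication comes in:
# with x_anom of shape (n_samples, n_series) holding unit-length anomaly
# columns, x_anom.T.dot(x_anom) gives the full correlation matrix in one
# product.
```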

@niallrobinson
Contributor Author

I'm wondering now if it might be best to separate this into its own module in analysis. That way we can separate out and generalise the function that manages the reshaping. Then we can pass a function for the maths operation to the general function, allowing this approach to be used for operations other than Pearson's r: Spearman's, covariance, etc.

@ghost assigned bblay on Jun 24, 2013
Contributor

(trivial) Please start with a capital letter.

Contributor

As you suggest, making the list of coords optional sounds like a good idea.

@bblay
Contributor

bblay commented Jun 24, 2013

The comment about separating / generalising could optionally be addressed in another ticket.
However, I must ask (forgive my ignorance), is this type of calculation generally considered to be "calculus"?

Contributor

Please consider making the docstring clearer that the cubes are "assumed to be" comparable in all other dimensions, or adding a check.

@bblay
Contributor

bblay commented Jun 24, 2013

Thanks Niall,
I've reviewed this PR.
No major issues.

Member

Currently if I provide a coordinate that doesn't exist then it goes unnoticed. It would be better to iterate over the coordinates that are passed in and extract the coordinate from the cube using cube.coord which will raise an exception if the coordinate does not exist. Then you can use cube.coord_dims to find the dimension number(s) it corresponds to.

Contributor

👍 You might need to sort the resulting list of dims too, if you do that.
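A minimal sketch combining both suggestions (the helper name is hypothetical; cube.coord and cube.coord_dims are the real iris API):

```python
def resolve_corr_dims(cube, corr_coords):
    """Map user-supplied coordinate names to sorted cube dimensions.

    cube.coord raises iris.exceptions.CoordinateNotFoundError for any
    name the cube lacks, so bad input fails fast.
    """
    dims = set()
    for name in corr_coords:
        coord = cube.coord(name)  # raises if the coordinate is absent
        dims.update(cube.coord_dims(coord))
    return sorted(dims)  # ordering isn't guaranteed across coordinates
```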

Contributor Author

The problem with that is that I also want access to the coordinates that are not named, for slice_ind and slice_shape. I'll add a check to make sure non-existent coords are caught, though.

@niallrobinson
Contributor Author

@bblay

is this type of calculation generally considered to be "calculus"?

I wouldn't consider it calculus, no. If I implied that, I didn't mean to; I was momentarily confused!

@cpelley

cpelley commented Jul 2, 2013

Rebase required to fix issues with the latest standard name table; see #586.

Member

The convention in iris is for multi-line docstrings to start with a blank line. The first line should also have a full stop at the end.
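For illustration, a docstring following that convention (the signature is hypothetical, not necessarily the PR's):

```python
def pearsonr(cube_a, cube_b, corr_coords):
    """
    Calculate the Pearson correlation coefficient over the given coords.

    The docstring opens with a blank line, and the summary line above
    ends with a full stop, per the iris convention.

    """
```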

@bblay
Contributor

bblay commented Jul 24, 2013

Humble apologies for the slow response time on this, @niallrobinson.
Travis fell asleep during the last test.

Member

These should be views, not copies, i.e. data_a = cube_a.data.view().
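For context, a .view() is a new array object over the same underlying buffer, so no data is duplicated; a small demonstration:

```python
import numpy as np

data = np.arange(6).reshape(2, 3)
view = data.view()  # same underlying buffer, no duplication
copy = data.copy()  # independent buffer, full duplication

view[0, 0] = 99
assert data[0, 0] == 99  # the view aliases the original data
assert copy[0, 0] == 0   # the copy is unaffected
```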

@rhattersley
Member

I'm wondering now if it might be best to separate this into its own module in analysis. That way we can separate out and generalise the function that manages the reshaping. Then we can pass a function for the maths operation to the general function, allowing this approach to be used for operations other than Pearson's r: Spearman's, covariance, etc.

This sounds similar to the Aggregator objects used by Cube.collapsed() and friends. Are you suggesting a user API which is something like:

result = iris.analysis.stats.do_stuff(cube_a, cube_b, iris.analysis.stats.PEARSON)
result = iris.analysis.stats.do_stuff(cube_a, cube_b, iris.analysis.stats.SPEARMAN)
result = iris.analysis.stats.do_stuff(cube_a, cube_b, iris.analysis.stats.COVARIANCE)
...

Or are you talking about an internal helper function used by a variety of user API functions, such as:

result = iris.analysis.stats.pearson(cube_a, cube_b)
result = iris.analysis.stats.spearman(cube_a, cube_b)
result = iris.analysis.stats.covariance(cube_a, cube_b)

Or have I got the wrong end of the stick altogether? 😉

@niallrobinson
Contributor Author

Thanks guys, I'll get on sorting that stuff out.

I wrote this a long time ago, but I guess I was envisaging the latter (internal helper functions); however, the former may be desirable? I don't think using the existing aggregators would be possible, as I've done the calculation explicitly here in order to take advantage of the vectorisation.

However, as @bblay said previously

The comment about separating / generalising could optionally be addressed in another ticket.

Maybe we should add that to the to-do list for another ticket and get this working first? What do you think?

@bblay
Contributor

bblay commented Jul 29, 2013

another ticket

Seems reasonable, and more agile, to worry about refactoring for these future functions in their future PRs, but if you're going to pave the way, I agree with you on the latter API.

@niallrobinson
Contributor Author

Sorry to litter this. I was so busy convincing myself I had mastered GitHub commands that I forgot to run the unit tests!

Contributor

These are not dimensions; they are names. I suggest corr_names, corr_coords, or an alternative.

@niallrobinson
Contributor Author

Thanks for the rigorous review of my refactoring. They all look sensible to me, so I'll sort them out this morning.

@bblay
Contributor

bblay commented Aug 6, 2013

Thanks @niallrobinson, please squash and we'll give the others a chance to comment before merging.

Member

I'm late to the party here, but I find the docstring quite confusing with respect to the coordinates and what output I might expect. Would it be possible to include a short but verbose example in the docstring?

Member

I agree with @ajdawson. The world only makes sense if I keep the concepts of coordinates and dimensions clearly separated. The docstring speaks of dimensions, but the signature refers to coords.

@niallrobinson
Contributor Author

@esc24 @ajdawson Hopefully that's slightly clearer now. It's a bit tortured to explain, but the example should make it clearer.

@niallrobinson
Contributor Author

Commit in response to niallrobinson@f4af695#commitcomment-3798884, which isn't shown on this feed for whatever reason.

@niallrobinson
Contributor Author

Calling @bblay - I think the Travis build has fallen over erroneously.

@bblay
Contributor

bblay commented Aug 14, 2013

I just restarted it (there's an option in the settings drop-down to do that).

@niallrobinson
Contributor Author

@bblay That option is greyed out for me, even after logging in with GitHub etc. Also, I think it failed again!

@bblay
Contributor

bblay commented Aug 14, 2013

That happens to me sometimes too. Re-restarted.

@bblay
Contributor

bblay commented Aug 15, 2013

Ok, it's not timed out this time. Those new test failures are not your fault, @niallrobinson. I believe @esc24 is looking into this problem, which affects all Iris tests.

@esc24
Member

esc24 commented Aug 16, 2013

I've implemented a temporary workaround on upstream/master (pinned netCDF4 to version 1.0.2 in .travis.yml), so you could rebase.

@niallrobinson
Contributor Author

🎉

Member

Needs an iris.tests.skip_data decorator to indicate that this test needs real data.
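A minimal sketch of the decorator in use; the test class, method, and data path are hypothetical:

```python
import iris
import iris.tests as tests


class TestPearsonRealData(tests.IrisTest):
    @tests.skip_data  # skipped when the external iris test data is absent
    def test_with_real_data(self):
        # Hypothetical data path, purely for illustration.
        path = tests.get_data_path(('PP', 'globClim1', 'theta.pp'))
        cube = iris.load_cube(path)
        self.assertIsNotNone(cube)


if __name__ == '__main__':
    tests.main()
```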

@bblay
Contributor

bblay commented Aug 22, 2013

This is looking great.
Just added a minor request for a comment.
If you then squash I'll merge it!
Thanks for all your hard work and patience on this.

@niallrobinson
Contributor Author

@bblay done...? I believe we've realised we need to comment to get attention after a commit.

@niallrobinson
Contributor Author

@bblay done and passed

@bblay
Contributor

bblay commented Aug 22, 2013

Lovely.
I'm a little vague on the statistical process itself, but happy that several experts have seen it.

bblay added a commit that referenced this pull request Aug 22, 2013
Calculates correlation between cubes
@bblay merged commit a909184 into SciTools:master on Aug 22, 2013
@bblay
Contributor

bblay commented Aug 22, 2013

Phew, good stuff @niallrobinson!

@niallrobinson
Contributor Author

Hurray! Thanks for sticking with me.

I'm a little vague on the statistical process itself, but happy that several experts have seen it.

The "statistics" themselves are basically three lines, and I'm perfectly happy with them. Its all the housekeeping - the shape manipulation etc - that I need to think carefully about.

@niallrobinson deleted the cube_correlation branch on August 23, 2013 at 08:19