-
Notifications
You must be signed in to change notification settings - Fork 44
Make multimodel work correctly with yearly data #677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Peter9192
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @valeriupredoi , I started studying the code, but I need more time to fully appreciate what's going on. I do feel that some things can still be simplified, both in the PR and in the existing code. Hoping to get back to this tomorrow.
Co-authored-by: Peter Kalverla <[email protected]>
Co-authored-by: Peter Kalverla <[email protected]>
|
yeah cheers mate, better implementation 🍺 - go for it, I was a bit in a hurry when plugging these in, I think too there's room for improvement defo |
|
|
||
| # get the number of days starting from the reference unit | ||
| time_unit = cube.coord('time').units.name | ||
| time_offset = _get_time_offset(time_unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @valeriupredoi , this is one of the points that really confuses me. What happens if different cubes have different time offsets? Don't you want the offset with respect to a common reference time? I'm thinking something like
def _datetime_to_int_days(cube, overlap=False):
"""Return list of int(days) with respect to a common reference.
Cubes may have different calendars. This function extracts the date
information from the cube and re-constructs a default calendar using
the datetime library.
This doesn't work for daily or subdaily data, because different
calendars may have different number of days in the year.
The standard calendar used by datetime provides the to_ordinal()
function which returns "int number of days since the 0001/01/01".
"""
# Convert to a common time format
years = [cell.point.year for cell in cube.coord('time').cells()]
months = [cell.point.month for cell in cube.coord('time').cells()]
if not 0 in np.diff(years):
# yearly data
standard_dates = [datetime(year, 7, 1) for year in years]
elif not 0 in np.diff(months):
# monthly data
standard_dates = [datetime(year, month, 15) for year, month in zip(years, months)]
else:
raise ValueError("Multimodel only supports yearly or monthly data")
return [dt.to_ordinal() for dt in standard_dates]I can work this out further if you like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's exactly that, I use the number of days since the start of the data taken from units forcing a STANDARD_CALENDAR eg days since 1950-01-01, calendar=STANDARD_CALENDAR - this works fine if there is no variation in calendars since the units are all the same for all cubes after cmor.fix unified the units, but I suspect it's the cause of the shifts @LisaBock is seeing when calendars differ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw isn't it toordinal()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw isn't it
toordinal()?
Right you are 🍺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works fine if there is no variation in calendars
So I'd rather just fix it, either to the ordinal or to 1950/01/01. Taking it from the cube seems unnecessary and error-prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah not the best code to understand indeed! but it does do a lot of tricksy stuff nonetheless. Can I suggest we finish up with this PR in terms of functionality for the yearly (normal) data and merge then we look into Lisa's issue, the more I look at it the more it looks to be way separate 🍺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should respect the scope of the PR, but let's not merge to hastily. I'm not so keen on the overlap argument for the yearly data. But removing it in favor of the above suggestion breaks several tests. Can I look into fixing that first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be very much appreciated mate and will buy a 🍺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I fixed most of it, but some tests still fail because test cube 2 has a daily time coordinate, and the code above raises an exception in that case. What to do with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not make the changes - your suggestion below saved the day, so let's keep the changes to a minimum and only commit what's necessary - let me take it from here 🍺
| all_times = [] | ||
| for cube in cubes: | ||
| span = _datetime_to_int_days(cube) | ||
| span = _datetime_to_int_days(cube, overlap=True) | ||
| start, stop = span[0], span[-1] | ||
| all_times.append([start, stop]) | ||
| bounds = [range(b[0], b[-1] + 1) for b in all_times] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can also be simplified:
bounds = []
for cube in cubes:
span = _datetime_to_int_days(cube)
start, stop = span[0], span[-1]
bounds.append(range(start, stop+1))But why even use the range function? Don't we just want the overlapping time points from span?
spans = []
for cube in cubes:
spans.append(_datetime_to_int_days(cube))
time_pts = reduce(np.intersect1d, spans)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mate can we please not start trying to fix something that's not broken? We need to understand the very specific problem that @LisaBock is having and to allow for the correct handling of yearly data - the second one works fine so let's just focus on the scope of this PR. I really don't want to start changing this MM right before the release 🍺
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's fair. It's just that I thought it might help to debug, for I'm still struggling to understand these parts of the code.
|
Hey @valeriupredoi, I think I finally got to the bottom of it! Internally, we're always working with those integer days according to the standard calendar. But finally, we put them back into the if t_axis is None:
times = template_cube.coord('time')
else:
times = iris.coords.DimCoord(
t_axis,
standard_name='time',
units=template_cube.coord('time').units)Now, if So I bet that changing the time units here to |
|
excellent call @Peter9192 - I had completely forgotten about the cube assembly right at the very end. Note that forcing a fixed unit will shift the days if they are not from 1950, but this does the job perfectly: def _put_in_cube(template_cube, cube_data, statistic, t_axis):
"""Quick cube building and saving."""
if t_axis is None:
times = template_cube.coord('time')
else:
unit_name = template_cube.coord('time').units.name
tunits = cf_units.Unit(unit_name, calendar="standard")
times = iris.coords.DimCoord(
t_axis,
standard_name='time',
units=tunits)Cheers for spotting this! 🍺 |
That's why was fixing it at the other end 😉 |
|
ideally we should alter the data as little as possible, I reckon this will do it for now, could you test for your corner case pls? 🍺 |
|
@Peter9192 we can put your suggested changes as per #677 (comment) in a new PR but as far as the yearly data and elimination of the wandering dates I am happy 🍺 |
Done, it seems to work fine. I'm still not keen on the check for yearly data, though. And since I'm now invested: do you mind if I create another PR to suggest some simplifications? Then it's up to you whether or not to include it for the v2 release (if I even make it in time). |
|
yeah was just saying in the comment from the issue - go for it 👍 🍺 |
|
Can you fix the Codacy issue? I will test and merge afterwards. |
LisaBock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works fine for me!
|
cheers for testing and contributing @LisaBock and @Peter9192 - @mattiarighi that codacy issue can be ignored, if you could test when you haz time then this can go in then @Peter9192 can start on tuning the module more 🍺 |
|
@valeriupredoi It would be great if you could fix the Codacy issue and help set a good example. If you think that we should change how we use Codacy I would be happy to discuss that over a beer at the next workshop 🍻 Anyway, it shouldn't be more than a minute or so it fix it. |
|
sorted @bouweandela 👮 |
#685 work in progress |
Before you start, please read our contribution guidelines.
Tasks
If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.
Closes #665