
Conversation

@valeriupredoi (Contributor) commented Jun 16, 2020

Before you start, please read our contribution guidelines.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


Closes #665

@Peter9192 (Contributor) left a comment


Hey @valeriupredoi , I started studying the code, but I need more time to fully appreciate what's going on. I do feel that some things can still be simplified, both in the PR and in the existing code. Hoping to get back to this tomorrow.

valeriupredoi and others added 2 commits June 17, 2020 17:57
Co-authored-by: Peter Kalverla <[email protected]>
Co-authored-by: Peter Kalverla <[email protected]>
@valeriupredoi (Contributor Author)

yeah cheers mate, better implementation 🍺 - go for it, I was a bit in a hurry when plugging these in, and I agree there's definitely room for improvement


    # get the number of days starting from the reference unit
    time_unit = cube.coord('time').units.name
    time_offset = _get_time_offset(time_unit)
Contributor

Hey @valeriupredoi , this is one of the points that really confuses me. What happens if different cubes have different time offsets? Don't you want the offset with respect to a common reference time? I'm thinking something like

def _datetime_to_int_days(cube, overlap=False):
    """Return list of int(days) with respect to a common reference.

    Cubes may have different calendars. This function extracts the date
    information from the cube and re-constructs a default calendar using
    the datetime library.

    This doesn't work for daily or subdaily data, because different
    calendars may have different number of days in the year.

    The standard calendar used by datetime provides the to_ordinal()
    function which returns "int number of days since the 0001/01/01".
    """
    # Convert to a common time format
    years = [cell.point.year for cell in cube.coord('time').cells()]
    months = [cell.point.month for cell in cube.coord('time').cells()]
    if not 0 in np.diff(years):
        # yearly data
        standard_dates = [datetime(year, 7, 1) for year in years]
    elif not 0 in np.diff(months):
        # monthly data
        standard_dates = [datetime(year, month, 15) for year, month in zip(years, months)]
    else:
        raise ValueError("Multimodel only supports yearly or monthly data")

    return [dt.to_ordinal() for dt in standard_dates]

I can work this out further if you like?

Contributor Author

yes, that's exactly it - I use the number of days since the start of the data, taken from the units and forcing a STANDARD_CALENDAR, e.g. days since 1950-01-01, calendar=STANDARD_CALENDAR. This works fine if there is no variation in calendars, since the units are all the same for all cubes after cmor.fix unified them, but I suspect it's the cause of the shifts @LisaBock is seeing when calendars differ

Contributor Author

btw isn't it toordinal()?

Contributor

> btw isn't it toordinal()?

Right you are 🍺
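
For reference, the corrected method is plain datetime.toordinal(), and the numbers it returns look like this (an editorial aside, nothing ESMValCore-specific; the dates are picked arbitrarily):

    from datetime import datetime

    # datetime.toordinal() counts days since 0001-01-01 in the proleptic
    # Gregorian calendar, giving one common integer axis for all cubes.
    print(datetime(1950, 1, 1).toordinal())  # 711858
    print(datetime(1950, 7, 1).toordinal())  # 712039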

Contributor

> this works fine if there is no variation in calendars

So I'd rather just fix it, either to the ordinal or to 1950/01/01. Taking it from the cube seems unnecessary and error-prone.

Contributor Author

yeah, not the best code to understand indeed! but it does do a lot of tricksy stuff nonetheless. Can I suggest we finish up this PR in terms of functionality for the yearly (normal) data and merge, then we look into Lisa's issue? The more I look at it, the more it looks like a separate problem 🍺

Contributor

I agree we should respect the scope of the PR, but let's not merge too hastily. I'm not so keen on the overlap argument for the yearly data. But removing it in favor of the above suggestion breaks several tests. Can I look into fixing that first?

@valeriupredoi (Contributor Author) commented Jun 18, 2020

that would be very much appreciated mate, and I will buy you a 🍺

Contributor

So I fixed most of it, but some tests still fail because test cube 2 has a daily time coordinate, and the code above raises an exception in that case. What to do with that?

Contributor Author

let's not make the changes - your suggestion below saved the day, so let's keep the changes to a minimum and only commit what's necessary - let me take it from here 🍺

Comment on lines 216 to 221
    all_times = []
    for cube in cubes:
-       span = _datetime_to_int_days(cube)
+       span = _datetime_to_int_days(cube, overlap=True)
        start, stop = span[0], span[-1]
        all_times.append([start, stop])
    bounds = [range(b[0], b[-1] + 1) for b in all_times]
Contributor

I think this can also be simplified:

    bounds = []
    for cube in cubes:
        span = _datetime_to_int_days(cube)
        start, stop = span[0], span[-1]
        bounds.append(range(start, stop+1))

But why even use the range function? Don't we just want the overlapping time points from span?

    spans = []
    for cube in cubes: 
        spans.append(_datetime_to_int_days(cube))    
    time_pts = reduce(np.intersect1d, spans)
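
For reference, a tiny self-contained run of this reduce/np.intersect1d pattern on made-up integer day values (an editorial aside, not part of the suggested change):

    from functools import reduce

    import numpy as np

    # Two cubes whose time axes only partly overlap; the intersection
    # keeps just the common time points.
    spans = [
        [730, 760, 791, 821],   # cube 1
        [760, 791, 821, 852],   # cube 2, shifted by one point
    ]
    time_pts = reduce(np.intersect1d, spans)
    print(time_pts)  # [760 791 821]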

Contributor Author

mate, can we please not start trying to fix something that's not broken? We need to understand the very specific problem that @LisaBock is having and allow for the correct handling of yearly data - the second one works fine, so let's just focus on the scope of this PR. I really don't want to start changing this MM right before the release 🍺

Contributor

Sure, that's fair. It's just that I thought it might help to debug, as I'm still struggling to understand these parts of the code.

@Peter9192 (Contributor) commented Jun 18, 2020

Hey @valeriupredoi, I think I finally got to the bottom of it!

Internally, we're always working with those integer days according to the standard calendar. But at the end, we put them back into the template_cube like so (in _put_in_cube):

    if t_axis is None:
        times = template_cube.coord('time')
    else:
        times = iris.coords.DimCoord(
            t_axis,
            standard_name='time',
            units=template_cube.coord('time').units)

Now, if template_cube happens to have a calendar other than the standard one, this is where things get messy. For example, the CCSM4 calendar is "365 days". So in comparison to the standard calendar, 1 day is miscounted every ~4 years. That leads to the shifted dates!

So I bet that changing the time units here to Unit("days since 1950-01-01", calendar='standard') will fix the problem for good!
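
A toy check of this diagnosis (an editorial aside with made-up dates, not the preprocessor code): integer days computed against the standard calendar come back shifted once they are read through a 365-day-calendar unit.

    from datetime import datetime

    import cf_units

    standard = cf_units.Unit("days since 1950-01-01", calendar="standard")
    noleap = cf_units.Unit("days since 1950-01-01", calendar="365_day")

    # The standard calendar counts the leap days of 1952 and 1956, so the
    # same number read back through the 365-day calendar lands two days late.
    days = standard.date2num(datetime(1960, 1, 1))  # 3652.0
    print(noleap.num2date(days))                    # 1960-01-03 00:00:00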

@valeriupredoi (Contributor Author)

excellent call @Peter9192 - I had completely forgotten about the cube assembly right at the very end. Note that forcing a fixed unit will shift the days if they are not from 1950, but this does the job perfectly:

def _put_in_cube(template_cube, cube_data, statistic, t_axis):
    """Quick cube building and saving."""
    if t_axis is None:
        times = template_cube.coord('time')
    else:
        unit_name = template_cube.coord('time').units.name
        tunits = cf_units.Unit(unit_name, calendar="standard")
        times = iris.coords.DimCoord(
            t_axis,
            standard_name='time',
            units=tunits)

Cheers for spotting this! 🍺

@Peter9192 (Contributor)

> Note that forcing a fixed unit will shift the days if they are not from 1950

That's why I was fixing it at the other end 😉

@valeriupredoi (Contributor Author)

ideally we should alter the data as little as possible, I reckon this will do it for now, could you test for your corner case pls? 🍺

@valeriupredoi (Contributor Author)

@Peter9192 we can put your suggested changes as per #677 (comment) in a new PR, but as far as the yearly data and the elimination of the wandering dates go, I am happy 🍺

@Peter9192 (Contributor)

> could you test for your corner case pls

Done, it seems to work fine. I'm still not keen on the check for yearly data, though. And since I'm now invested: do you mind if I create another PR to suggest some simplifications? Then it's up to you whether or not to include it for the v2 release (if I even make it in time).

@valeriupredoi (Contributor Author)

yeah, I was just saying the same in the comment on the issue - go for it 👍 🍺

@mattiarighi (Contributor)

Can you fix the Codacy issue?

I will test and merge afterwards.

@LisaBock (Member) left a comment

Works fine for me!

@valeriupredoi (Contributor Author)

cheers for testing and contributing @LisaBock and @Peter9192 - @mattiarighi, that Codacy issue can be ignored; if you could test when you have time, this can go in, and then @Peter9192 can start on tuning the module more 🍺

@bouweandela (Member)

@valeriupredoi It would be great if you could fix the Codacy issue and help set a good example. If you think that we should change how we use Codacy, I would be happy to discuss that over a beer at the next workshop 🍻 Anyway, it shouldn't take more than a minute or so to fix it.

@valeriupredoi (Contributor Author)

sorted @bouweandela 👮

@mattiarighi merged commit 950cf40 into master Jun 19, 2020
@mattiarighi deleted the make_mm_yearly branch June 19, 2020 13:21
@Peter9192 (Contributor)

> then @Peter9192 can start on tuning the module more

#685 work in progress

@bouweandela added the preprocessor label (Related to the preprocessor) on Jul 2, 2020

Labels

enhancement (New feature or request), preprocessor (Related to the preprocessor)

Development

Successfully merging this pull request may close these issues.

Multi-model statistics shift time coordinate
