Add classes for calibrating average ensemble and ensemble of tragedies by peanutfun · Pull Request #1048 · CLIMADA-project/climada_python

peanutfun · 2025-04-17T14:01:57Z

Changes proposed in this PR:

Add optimizers for calibrating "average ensembles" and "ensembles of tragedies" of impact functions
Add option to assign weights to each calibration data point

emanuel-schmid

Awesome amendment! Very well written module I'd say.
I just wish the pydoc strings were more elaborate and I wonder about the use of dataclasses.

emanuel-schmid · 2025-06-03T09:19:16Z

climada/util/calibrate/ensemble.py

+from ...engine.unsequa.input_var import InputVar
+from ...entity.impact_funcs import ImpactFunc, ImpactFuncSet
+from ..coordinates import country_to_iso
+from .base import Input, Optimizer, Output


PEP 8: "Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages)"

IMO, this is probably true for modules reached via "..". For sibling modules, relative imports are OK.

Suggested change

-from ...engine.unsequa.input_var import InputVar
-from ...entity.impact_funcs import ImpactFunc, ImpactFuncSet
-from ..coordinates import country_to_iso
-from .base import Input, Optimizer, Output
+from climada.engine.unsequa.input_var import InputVar
+from climada.entity.impact_funcs import ImpactFunc, ImpactFuncSet
+from climada.util.coordinates import country_to_iso
+from .base import Input, Optimizer, Output

emanuel-schmid · 2025-06-03T15:10:38Z

climada/util/calibrate/ensemble.py

+@dataclass
+class EnsembleOptimizerOutput:
+    """The collective output of an ensemble optimization"""
+
+    data: pd.DataFrame


As I see it, this stretches the concept of a dataclass quite a lot. For instance, you can't even do this:

EnsembleOptimizerOutput(df1) == EnsembleOptimizerOutput(df2)

Is there a reason why it has to be a dataclass? Wouldn't it be more natural to have an ordinary class that uses from_outputs as __init__?

I am using dataclass to avoid boilerplate code. In my view, the __init__ of the class should be as permissive as possible. Therefore, a usual init will look like this:

class MyClass:
    def __init__(self, attr1, attr2):
        self.attr1 = attr1
        self.attr2 = attr2

I think the code is trivial, and using dataclass to construct it seems reasonable to me. It improves readability, maintainability, and extensibility.

Wouldn't it be more natural to have an ordinary class that uses from_outputs as __init__?

I think from_outputs is too restrictive for making it the default initialization (although I use it exclusively throughout my code). See my comment above.

You can't even do this:

A good point! I will implement a custom __eq__. But I don't think that's a reason to abandon dataclass in this case.

I implemented a custom __eq__ for EnsembleOptimizerOutput
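
For readers following the thread, here is a minimal sketch of what a custom __eq__ on a dataclass holding a DataFrame can look like. It is not the code merged in this PR: the class name SketchOutput is hypothetical, and delegating to pandas' DataFrame.equals is an assumption about one reasonable way to do it. The underlying issue is that the generated __eq__ ends up evaluating df1 == df2, which is element-wise in pandas and therefore does not give a plain True or False.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(eq=False)  # skip the generated __eq__; we supply our own below
class SketchOutput:
    """Hypothetical stand-in for an output class wrapping a DataFrame."""

    data: pd.DataFrame

    def __eq__(self, other):
        # Only compare against the same kind of object
        if not isinstance(other, SketchOutput):
            return NotImplemented
        # DataFrame.equals returns a single bool (same labels and values)
        return self.data.equals(other.data)


df1 = pd.DataFrame({"a": [1, 2]})
df2 = df1.copy()
print(SketchOutput(df1) == SketchOutput(df2))
```

With this in place the dataclass still provides the generated __init__ and __repr__, which is the boilerplate saving argued for above.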

emanuel-schmid · 2025-06-04T07:49:53Z

climada/util/calibrate/ensemble.py

+        return ax
+
+
+@dataclass


Input has no __eq__, hence this class has none either. Here too, I'm not sure why it needs to be a dataclass.
Is it about the implicit immutability of a dataclass object? If so, maybe we should even make it explicit with

Suggested change

-@dataclass
+@dataclass(frozen=True)

Although this would require dropping the __post_init__ methods and InitVars and replacing them with something like

@classmethod
def from_initvars(cls, x, y):
    return cls(something(x, y))

Even so, it would be nice to allow comparing two such objects meaningfully, even if it means that we have to implement __eq__ for a dataclass. Otherwise I honestly don't see the point of calling it a dataclass.

Again, I use dataclass mostly to avoid a trivial init. This is especially useful for derived classes, where you might easily miss that you also need to initialize the base class.

I did not check it, but a default __eq__ should be created by dataclass. I did not really consider freezing the instances, as I think it's not necessary. Input probably is the only sensible candidate for freezing, as results are distorted if one changes it.

Not really convinced, given the need to implement __eq__ and __post_init__,
but I guess it doesn't matter much. 😎
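
To make the frozen=True suggestion from this thread concrete, here is a small sketch of a frozen dataclass that uses a classmethod as an alternative constructor, in place of __post_init__ and InitVar. The class Bounds and the method from_center are made up for illustration and have nothing to do with the CLIMADA classes under review.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical immutable value object."""

    lower: float
    upper: float

    @classmethod
    def from_center(cls, center: float, width: float) -> "Bounds":
        # Derive the stored fields from other arguments before construction,
        # so nothing has to be mutated after __init__ has run
        half = width / 2
        return cls(center - half, center + half)


bounds = Bounds.from_center(center=1.0, width=2.0)
try:
    bounds.lower = -5.0  # assignment is rejected on frozen instances
except FrozenInstanceError:
    print("instances are read-only")
```

The generated __eq__ compares the fields, so two Bounds objects with the same limits compare equal without further code.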

emanuel-schmid · 2025-06-10T10:52:32Z

climada/util/calibrate/ensemble.py

+
+@dataclass
+class TragedyEnsembleOptimizer(EnsembleOptimizer):
+    """An optimizer for the ensemble of tragedies


A very short summary that doesn't give much of a clue what this implementation of the EnsembleOptimizer does.

Should be better now

emanuel-schmid · 2025-06-10T10:53:28Z

climada/util/calibrate/ensemble.py

+            self.samples = rng.choice(self.samples, ensemble_size, replace=False)
+
+    def input_from_sample(self, sample: list[tuple[int, int]]):
+        """Subselect all input"""


I agree that this is not clear from the docs. The idea is the following: Each EnsembleOptimizer only stores a list of samples, which contains the indices of the data points that are to be used in each optimization. The concrete optimizer then defines in input_from_sample how a new Input object is generated for each sample and thus for each optimization. In TragedyEnsembleOptimizer, this is used to drastically reduce the amount of hazard data that needs to be processed by each optimization. I'll try to clarify.

I hope I clarified
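
The pattern described in this thread can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the code in ensemble.py: the classes SketchEnsemble and RowSubsetEnsemble are hypothetical, and a plain nested list stands in for the calibration data and the Input object.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

# One sample: (row, column) indices of the data points used in one optimization
Sample = list[tuple[int, int]]


@dataclass
class SketchEnsemble(ABC):
    """Hypothetical base class: stores only the data and the index samples."""

    data: list[list[float]]
    samples: list[Sample] = field(default_factory=list)

    @abstractmethod
    def input_from_sample(self, sample: Sample) -> list[list[float]]:
        """Build the input for a single optimization from one sample."""


class RowSubsetEnsemble(SketchEnsemble):
    """Keeps only the rows a sample touches, so each run sees less data."""

    def input_from_sample(self, sample: Sample) -> list[list[float]]:
        rows = sorted({row for row, _ in sample})
        return [self.data[row] for row in rows]


ensemble = RowSubsetEnsemble(
    data=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    samples=[[(0, 0), (2, 1)], [(1, 0)]],
)
# One reduced input per sample, i.e. per ensemble member
inputs = [ensemble.input_from_sample(sample) for sample in ensemble.samples]
print(inputs)
```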

emanuel-schmid · 2025-06-10T10:55:14Z

climada/util/calibrate/ensemble.py

+
+@dataclass
+class AverageEnsembleOptimizer(EnsembleOptimizer):
+    """An optimizer for the average ensemble


🤔 Hard to understand. I wish the description was more elaborate. See below.

I hope I clarified

emanuel-schmid · 2025-06-10T10:56:16Z

climada/util/calibrate/ensemble.py

+
+@dataclass
+class EnsembleOptimizer(ABC):
+    """Abstract base class for defining an ensemble optimizer


What is an ensemble optimizer?

Will clarify in the docs. An optimizer yields a single set of optimal parameters (or impact functions), whereas an ensemble optimizer yields an ensemble of them.

peanutfun · 2025-06-13T09:58:27Z

@emanuel-schmid I hope the docs are clearer now!

I also found a bug in the implementation: the replace parameter in AverageEnsembleOptimizer had no effect because the data structure only allowed a sample to be added once. I fixed this by adding a data structure, data_weights, to Input, along with some tests. Effectively, drawing a sample more than once in AverageEnsembleOptimizer now multiplies the weight of that sample. Beyond that, users are now free to weight their data as they see fit (although we did not consider that necessary before).
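
As a rough illustration of the idea (not the actual data_weights implementation), drawing with replacement can be represented by counting how often each data point was drawn and using that count as its weight in the cost function. The helper names below are hypothetical, the base weight of every point is assumed to be 1, and a weighted mean squared error is used purely as an example cost.

```python
import random
from collections import Counter

Point = tuple[int, int]  # (row, column) index of one calibration data point


def draw_with_replacement(points: list[Point], size: int, seed: int = 0) -> Counter:
    """Draw `size` points with replacement; the count acts as the weight."""
    rng = random.Random(seed)
    return Counter(rng.choices(points, k=size))


def weighted_mse(observed: dict, predicted: dict, weights: Counter) -> float:
    """Mean squared error where each point counts as often as it was drawn."""
    total = sum(weights.values())
    squared = sum(
        weight * (observed[point] - predicted[point]) ** 2
        for point, weight in weights.items()
    )
    return squared / total


points = [(0, 0), (0, 1), (1, 0)]
weights = draw_with_replacement(points, size=5, seed=1)
print(dict(weights))  # a point drawn twice gets weight 2, and so on
```

A point that is never drawn simply does not appear in the weights and therefore does not contribute to the cost.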

emanuel-schmid

Thanks for improving the docs! Sorry for taking so long.
It all looks good to me, apart from a few minor, mainly cosmetic, concerns.
Happy to merge.

emanuel-schmid · 2025-09-02T12:29:54Z

climada/util/calibrate/bayesian_optimizer.py

+        return -self.input.cost_func(data, predicted, weights)

-    def run(self, controller: BayesianOptimizerController) -> BayesianOptimizerOutput:
+    def run(self, **opt_kwargs) -> BayesianOptimizerOutput:


Why is the signature not just like the pydoc describes it?

Same argument as we had in the discussion about the unsequa abstract base class:

The Optimizer abstract base class defines an interface that should be kept for all derived classes. However, the derived classes can, or in this case must, take additional, class-specific arguments. The only elegant (?) way I see for doing this is to go with a variadic kwargs argument and then document the method as if it had the actual arguments that are needed here.

tldr: The linter would complain: https://pylint.pycqa.org/en/latest/user_guide/messages/warning/arguments-differ.html

Note that we do the same thing in the ScipyOptimizer:

climada_python/climada/util/calibrate/scipy_optimizer.py

Line 76 in 93aa8b2

def run(self, **opt_kwargs) -> ScipyMinimizeOptimizerOutput:

🤦 aah, right! 👍

emanuel-schmid · 2025-09-02T12:30:52Z

climada/util/calibrate/bayesian_optimizer.py

+        # Take the controller
+        try:
+            controller = opt_kwargs.pop("controller")
+        except KeyError as err:
+            raise RuntimeError(
+                "BayesianOptimizer.run requires 'controller' as keyword argument"
+            ) from err
+


all of this wouldn't be necessary with a signature as described in the pydoc

Yes, but see above.
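
The trade-off discussed in these two threads can be shown in a stripped-down form. The class names below are hypothetical and the return value is only there to make the behaviour visible. The error handling mirrors the diff quoted above, but this is a sketch of the pattern, not the BayesianOptimizer itself.

```python
from abc import ABC, abstractmethod


class BaseOptimizer(ABC):
    """Hypothetical base class fixing the signature of run()."""

    @abstractmethod
    def run(self, **opt_kwargs):
        """Execute the optimization."""


class ControlledOptimizer(BaseOptimizer):
    """Needs an extra argument but keeps the base-class signature."""

    def run(self, **opt_kwargs):
        # The required argument travels inside the variadic kwargs, so the
        # overriding signature stays identical to the abstract one
        try:
            controller = opt_kwargs.pop("controller")
        except KeyError as err:
            raise RuntimeError(
                "ControlledOptimizer.run requires 'controller' as keyword argument"
            ) from err
        # Whatever remains would be forwarded to the underlying optimizer
        return controller, opt_kwargs


print(ControlledOptimizer().run(controller="ctrl", n_iter=3))
```

The cost of this design is that a missing argument is only detected at run time, which is why the explicit check and error message are needed.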

emanuel-schmid · 2025-09-02T15:40:48Z

climada/util/calibrate/ensemble.py

+        return ax
+
+
+@dataclass


Not really convinced, given the need to implement __eq__ and __post_init__,
but I guess it doesn't matter much. 😎

emanuel-schmid · 2025-09-02T15:42:46Z

climada/util/calibrate/ensemble.py

+LOGGER = logging.getLogger(__name__)
+
+
+def sample_data(data: pd.DataFrame, sample: list[tuple[int, int]]):


This function could be private if we wanted (though, with regard to possible extensions, better not).

emanuel-schmid · 2025-09-02T15:43:01Z

climada/util/calibrate/ensemble.py

+    return data_sampled
+
+
+def sample_weights(weights: pd.DataFrame, sample: list[tuple[int, int]]):


emanuel-schmid · 2025-09-02T15:43:12Z

climada/util/calibrate/ensemble.py

+    return weights_sampled
+
+
+def event_info_from_input(inp: Input) -> dict[str, Any]:


emanuel-schmid · 2025-09-25T10:27:44Z

@peanutfun this is ready for merging, isn't it?

peanutfun and others added 30 commits March 31, 2023 16:23

Initial draft for calibration from scipy.optimize

5d8278e

Draft for impact function calibration

443545e

Add first unit tests of calibration module

819aab5

ci: Add bayesian-optimization during Jenkins build

96e3cb3

Add __init__.py for util/calibarte/test module

123c632

Add climada.util.calibrate.test module to test discovery

107a836

Add unit and integration tests, update code base

2af6f09

Start documenting new calibrate module

0d6e80b

Actually add the intregration test

23cae6c

Add some documentation

50f3fd9

commit PLEASE CLEAN UP

d321832

Add more docstrings and simplify imports through __init__

24c0fbc

Add separate Output classes for each optimizer

096a8d4

Merge branch 'develop' into calibrate-impact-functions

37c65d9

Restructure calibration module

e8abb1a

Add tutorial on impact function calibration

3d94151

Update tutorial

ea0eb47

Remove hazard event selection from calibrate.Input

0e5a557

Update calibration tutorial

e1fe68a

Merge branch 'develop' into calibrate-impact-functions

68c421b

# Conflicts: # script/jenkins/branches/Jenkinsfile # tests_runner.py

Update climada/util/calibrate/bayesian_optimizer.py

df03b0d

Use negative cost function as target function in BayesianOptimizer Co-authored-by: Thomas Vogt <[email protected]>

Separate computing cost from transforming impact objects

5ef4a01

Merge branch 'calibrate-impact-functions' of https://github.com/CLIMA…

91cfd83

…DA-project/climada_python into calibrate-impact-functions

Add evaluator for calibration output

4e1f104

Add TestBayesianOptimizer test to test loader

43f40b3

Update code, docs, and tutorial

97d763a

Update tutorial

d43eb8a

Add option to adjust data frame alignment

dda079d

Merge branch 'develop' into calibrate-impact-functions

185866f

Merge branch 'develop' into calibrate-impact-functions

5fdbf4e

peanutfun and others added 3 commits May 9, 2025 14:44

Update calibation tutorial

821d7c0

Merge branch 'develop' into cross-calibrate-impact-functions

1dc5dc0

Merge remote-tracking branch 'origin/develop' into cross-calibrate-im…

c460c73

…pact-functions # Conflicts: # doc/user-guide/climada_util_calibrate.ipynb

emanuel-schmid requested changes Jun 10, 2025

peanutfun commented Jun 11, 2025

peanutfun added 6 commits June 12, 2025 15:14

Use absolute imports for referring out of this module

6cce5a9

Update docs and links in tutorial

3f30b04

Add data_weights to calibration Input

58bf84f

* Use weights for sampling with replacement in AverageEnsembleOptimizer. * Update tests

Add equality comparison to EnsembleOptimizerOutput

b176e5f

Update CHANGELOG.md

26ef863

Update CHANGELOG.md

c6ce680

peanutfun added 7 commits June 13, 2025 15:20

Comply to abstractmethod interface in BayesianOptimizer

62519bb

Fix linter issue where builtin 'input' was shadowed

e33ecc7

Suggest that cost functions take ndarrays

1c125fc

Update tutorial

56a0f15

Make cost functions consume numpy arrays

96cf55c

Update docs

d18a814

Update tutorial

5c988ec

peanutfun requested a review from emanuel-schmid July 9, 2025 07:48

emanuel-schmid added 3 commits September 1, 2025 17:23

Merge branch 'develop' into cross-calibrate-impact-functions

51630a6

remove useless tqdm artefacts

373dbbc

Merge branch 'cross-calibrate-impact-functions' of github.com:CLIMADA…

e3da3ec

…-project/climada_python into cross-calibrate-impact-functions

emanuel-schmid reviewed Sep 2, 2025

peanutfun and others added 2 commits September 3, 2025 14:31

Update climada/util/calibrate/ensemble.py

01ba79f

Co-authored-by: Emanuel Schmid <[email protected]>

Merge branch 'develop' into cross-calibrate-impact-functions

8d568d8

Merge branch 'develop' into cross-calibrate-impact-functions

5b36cbf

emanuel-schmid merged commit 593ebf0 into develop Sep 29, 2025
19 checks passed

emanuel-schmid deleted the cross-calibrate-impact-functions branch September 29, 2025 15:59
