Replies: 7 comments 15 replies
-
I have an initial concern that I also raised in the original PR (#3103), to which you replied. In short: if a Scenario can be mutated after it is used inside a model, throwing a list of Scenario objects into an executor becomes problematic, and it also prevents Scenario reuse.
This feels like a natural fit for a single-file sweep backend. We could extend the SQLDataRecorder (or build a wrapper around it) that creates a centralized scenarios table storing the Scenario parameters under a unique run_id; the time-series/result tables would then just get a run_id (here, replication_id) foreign key. For the much richer outputs you mentioned (like maps over time), can we consider HDF5 or Xarray/NetCDF as a backend target? They are built explicitly for storing complex N-dimensional arrays tightly bound with metadata attributes in a single file.
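A minimal sketch of the schema idea above, using the stdlib `sqlite3` module. Table and column names here (`scenarios`, `model_data`, `happy_agents`) are illustrative assumptions, not the actual SQLDataRecorder schema:

```python
import sqlite3

# Central "scenarios" table keyed by run_id; result tables reference it.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scenarios (
        run_id INTEGER PRIMARY KEY,
        scenario_id INTEGER,
        replication_id INTEGER,
        density REAL,
        minority_pc REAL,
        homophily REAL
    );
    CREATE TABLE model_data (
        run_id INTEGER REFERENCES scenarios(run_id),
        step INTEGER,
        happy_agents INTEGER
    );
    """
)
conn.execute("INSERT INTO scenarios VALUES (1, 0, 0, 0.8, 0.3, 3)")
conn.execute("INSERT INTO model_data VALUES (1, 0, 120)")

# Joining on run_id recovers the parameters that produced each result row.
row = conn.execute(
    "SELECT s.density, m.happy_agents FROM model_data m "
    "JOIN scenarios s ON s.run_id = m.run_id"
).fetchone()
```

The point of the join at the end is that a single file then carries both the experiment design and its outputs.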
-
I share this concern. Conceptually, I want scenarios to be the equivalent of a frozen dataclass. The current implementation is less restrictive: it only disallows modification after the scenario has been used inside a model. I am inclined to change this to be fully frozen upon instantiation. I have to see whether this means I can just use a frozen dataclass, or whether I have to implement the freezing myself.
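For illustration, this is what "fully frozen upon instantiation" would look like with a plain frozen dataclass; the field names are hypothetical Schelling parameters, not Mesa's actual Scenario definition:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Scenario:
    density: float
    minority_pc: float
    homophily: int


s = Scenario(density=0.8, minority_pc=0.3, homophily=3)

# Any mutation after instantiation raises FrozenInstanceError.
try:
    s.density = 0.5
    frozen = False
except FrozenInstanceError:
    frozen = True
```

Whether this suffices depends on whether Scenario needs any mutable internal state (e.g. an rng) that a frozen dataclass would forbid.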
-
Love this update. This has been a headache. RE: data management: if I understand the problem correctly, this feels a little like we don't want too much in core Mesa, but we also want this handled by core Mesa. I want to propose a middle ground: we could define a storage protocol, and Mesa ships with one default implementation for new users (e.g., SQLite). More experienced users can swap it out for more advanced setups. I am not married to this idea; I am throwing it out as an option or jumping-off point.
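A sketch of what such a storage protocol could look like with `typing.Protocol`; the name `SweepStorage` and its method signatures are assumptions for illustration, not a proposed final API:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SweepStorage(Protocol):
    """Anything with these methods can serve as a sweep backend."""

    def store(
        self, scenario_id: int, replication_id: int, data: dict[str, Any]
    ) -> None: ...

    def close(self) -> None: ...


class InMemoryStorage:
    """Trivial implementation satisfying the protocol, for illustration.
    A SQLite-backed default would implement the same two methods."""

    def __init__(self):
        self.rows = []

    def store(self, scenario_id, replication_id, data):
        self.rows.append((scenario_id, replication_id, data))

    def close(self):
        pass


storage = InMemoryStorage()
storage.store(0, 0, {"happy": 42})
```

Because it is a structural protocol, user-provided backends need no inheritance from Mesa classes; they just implement the methods.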
-
I have a concern about how flexible the execution logic should be in the proposed `run_scenario` function:

```python
model = model_cls(scenario=scenario)
model.run_for(n_steps)
data = model.data_recorder.get_data()
```

But models are often executed in different ways (`run_for`, stepping loops, stop conditions, etc.), and experiments also vary in what they want to extract as the result (full recorder output vs. summary statistics or custom metrics). It might be better for `run_scenario` to separate the execution logic from the result extraction, for example by allowing user-provided callables for one or both. That way the worker function stays general enough for different experiment patterns while still giving Mesa a standard way to execute a scenario.
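One way to sketch the callable-based separation, with a stub model standing in for a real Mesa model (`run_scenario`, `StubModel`, and the stop condition are all illustrative names, not Mesa API):

```python
def run_scenario(model_cls, scenario, execute, extract):
    """Worker function: construction is fixed, execution and
    result extraction are supplied by the user."""
    model = model_cls(scenario=scenario)
    execute(model)
    return extract(model)


class StubModel:
    """Minimal stand-in for a Mesa model, for illustration only."""

    def __init__(self, scenario=None):
        self.scenario = scenario
        self.steps_taken = 0

    def step(self):
        self.steps_taken += 1


def run_until_ten(model):
    # A stepping loop with a stop condition instead of a fixed run_for.
    while model.steps_taken < 10:
        model.step()


result = run_scenario(
    StubModel, None, execute=run_until_ten, extract=lambda m: m.steps_taken
)
```

The same worker then covers fixed-horizon runs, stop conditions, and custom metrics simply by swapping the two callables.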
-
With #3493 merged, let's see where we stand. It is now trivial to generate scenarios (see below). What is still missing are the remaining pieces outlined in the opening post.

```python
from __future__ import annotations

from concurrent.futures import ProcessPoolExecutor

import scipy.stats

# Schelling, SchellingScenario, and the Model/Scenario types come from
# the user's model code and Mesa.


class RunSpec:
    def __init__(
        self,
        model_class: type[Model],
        n_steps: int,
    ):
        self.model_class = model_class
        self.n_steps = n_steps

    def __call__(self, scenario: Scenario) -> tuple[int, int, dict[str, pd.DataFrame]]:
        model = self.model_class(scenario=scenario)
        model.run_for(self.n_steps)
        data = {
            name: model.recorder.get_table_dataframe(name)
            for name in model.recorder.storage
        }
        return scenario.scenario_id, scenario.replication_id, data


if __name__ == "__main__":
    # draw 1000 samples
    samples = scipy.stats.qmc.LatinHypercube(d=3).random(1000)
    scenarios = SchellingScenario.from_numpy(
        samples,
        parameter_names=["density", "minority_pc", "homophily"],
        replications=10,
    )
    run_schelling = RunSpec(Schelling, 100)
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(run_schelling, scenarios))
```
-
Building on the RunSpec idea, I think this can be the default execution unit: a small object that takes a Scenario, runs a model, and returns a result, without handling replications or storage.

```python
class RunSpec:
    def __init__(self, model_class, steps: int = 100):
        self.model_class = model_class
        self.steps = steps

    def execute(self, model):
        model.run_for(self.steps)

    def extract(self, model):
        return model.data_registry

    def __call__(self, scenario):
        model = self.model_class(scenario=scenario)
        self.execute(model)
        data = self.extract(model)
        return scenario.scenario_id, scenario.replication_id, data
```

The intent here is that this replaces the current `run_scenario`.
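Because `execute` and `extract` are separate methods, users can override either one; below is a hypothetical subclass that runs step-by-step until a convergence flag instead of a fixed horizon. `RunSpec` here is a minimal self-contained stand-in for the class above, and `StubModel` with its `converged` attribute is purely illustrative:

```python
class RunSpec:
    """Minimal stand-in mirroring the RunSpec sketched above."""

    def __init__(self, model_class, steps: int = 100):
        self.model_class = model_class
        self.steps = steps

    def execute(self, model):
        model.run_for(self.steps)

    def extract(self, model):
        return model.ticks

    def __call__(self, scenario):
        model = self.model_class(scenario=scenario)
        self.execute(model)
        return self.extract(model)


class RunUntilConverged(RunSpec):
    """Override execute: step until a stop condition, capped at self.steps."""

    def execute(self, model):
        while not model.converged and model.ticks < self.steps:
            model.step()


class StubModel:
    def __init__(self, scenario=None):
        self.ticks = 0
        self.converged = False

    def step(self):
        self.ticks += 1
        if self.ticks >= 7:  # pretend the model settles after 7 steps
            self.converged = True

    def run_for(self, n):
        for _ in range(n):
            self.step()


result = RunUntilConverged(StubModel, 100)(None)
```

This keeps the worker signature (`scenario` in, result out) stable while letting the run_while/stop_condition variants live in subclasses.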
-
In my initial implementation of RunSpec in #3641, one thing I didn't account for is that none of the recorders store scenario_id or replication_id. They are single-run by design, so once results are aggregated across runs, there is no way to tell where each row came from unless that information is attached somewhere. So where should run identity live, given that it has to be added before aggregation? One option is to make extract responsible for tagging each dataset with scenario_id and replication_id, since it could be given access to both the model and the scenario; that would require passing scenario into extract. If tagging doesn't live there, it has to happen at the executor or storage layer instead; either way, it needs to be handled explicitly somewhere in the pipeline.
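To make the extract-level option concrete, here is a sketch of tagging each per-run dataframe before aggregation; `tag_results` is a hypothetical helper, not part of any recorder:

```python
import pandas as pd


def tag_results(data: dict[str, pd.DataFrame], scenario_id: int, replication_id: int):
    """Attach run identity to every table so rows stay traceable
    after the per-run outputs are concatenated."""
    for df in data.values():
        df["scenario_id"] = scenario_id
        df["replication_id"] = replication_id
    return data


# Two replications of the same scenario, with toy recorder output.
runs = [
    tag_results({"agents": pd.DataFrame({"wealth": [1, 2]})}, 0, 0),
    tag_results({"agents": pd.DataFrame({"wealth": [3, 4]})}, 0, 1),
]

# Aggregation no longer loses run identity.
combined = pd.concat([r["agents"] for r in runs], ignore_index=True)
```

The same tagging could equally live in the executor or storage layer; the key point is that it happens exactly once, before concatenation.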
-
I have been thinking about a replacement for the batch_runner. As I indicated in Mesa 4 Goals, the current batchrunner does too much. In my view, the generation of samples and the (parallel) execution of these samples is not the responsibility of Mesa but can be handled by other libraries. For example, `scipy.stats.qmc` and `SALib` can be used to create samples, while `concurrent.futures` offers a `ProcessPoolExecutor` for parallel model execution, and this API is mimicked by e.g. `mpi4py` for support on HPC using MPI. However, what would still be needed inside Mesa to make this all work? In short, there are three things still needed: experiment representation including seed management for replications, a clear reusable specification of executing a single sample, and a helper function for storing a sample and its result.
1. Replication support on Scenario
In #3103, we added support for experiment representation via `Scenario`. However, `Scenario` does not yet provide strong support for replications. I suggest we expand `rng` inside `Scenario` to support spawning multiple replications of the same scenario. The basic API would be something along the lines of `scenario.spawn_replications(n)`. This can quite easily be implemented, but it requires some refactoring:

- `scenario.spawn_replications` internally uses `numpy.random.Generator.spawn` to spawn n independent generators and returns a list of copies of itself with just the generator changed and the replication_id specified. See also fix-spawn-generator #3195.
- `scenario_id` becomes a bit trickier than the current always-increment-on-instantiation.

2. A standard run_scenario function which specifies the logic of executing a single scenario
This is the worker function to be used by a `concurrent.futures` executor. There are two open questions here:

- How flexible should the execution logic be? The `run_for` approach is just one option; likely, we want to support a richer set of possibilities, including the still unresolved run_while/stop_condition problem.

3. Some helper function that takes the (scenario, result) tuple and turns this into something that can be stored in an appropriate back-end
When doing parameter sweeps, we ideally want to store the results to disk in an appropriate back-end. This means we need to take each replication and its results, process them so they can be easily stored, and store them. This can be as simple as turning the replication and the results into dictionaries, which can then be combined into several dataframes. But for larger outputs (e.g., maps, timeseries, maps over time), we might need something richer. Moreover, dataframes are just one possible data structure, which can be stored as, e.g., CSV files. Ideally, however, the experiments and all results are stored in a single file rather than spread across multiple files.
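The simple end of this spectrum can be sketched as follows: flatten (scenario, result) pairs into two dataframes, one for the design and one for the outputs. `to_frames` and the `Scenario` fields are illustrative assumptions, not a proposed Mesa API:

```python
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class Scenario:
    """Toy scenario carrying run identity plus one parameter."""

    scenario_id: int
    replication_id: int
    density: float


def to_frames(runs):
    """runs: iterable of (scenario, result_dict) pairs.
    Returns (scenarios_df, results_df), both tagged with run identity,
    ready to be written to a single back-end file."""
    scenario_rows = [asdict(s) for s, _ in runs]
    data_frames = []
    for s, result in runs:
        for name, df in result.items():
            data_frames.append(
                df.assign(
                    table=name,
                    scenario_id=s.scenario_id,
                    replication_id=s.replication_id,
                )
            )
    return pd.DataFrame(scenario_rows), pd.concat(data_frames, ignore_index=True)


scenarios_df, results_df = to_frames(
    [
        (Scenario(0, 0, 0.8), {"model": pd.DataFrame({"step": [0, 1]})}),
        (Scenario(0, 1, 0.8), {"model": pd.DataFrame({"step": [0, 1]})}),
    ]
)
```

For richer outputs (maps over time), the same pairing would instead map onto labeled N-dimensional arrays in an HDF5/NetCDF-style single-file back-end.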