Conversation
Performance benchmarks:
I did some research for (3) and I found that we technically can, but we probably shouldn't.
However, it can be Python-fast (e.g., supporting both AgentSet and WeakAgentSet as discussed in #3128).
One more minor thing that I'd like to address is that
Thanks a lot for this. Very useful pathfinding. The separation of extraction (DataSet) from storage and timing/triggering (future work) is helpful. Always chop problems into smaller problems if you can. Looking back at our original discussions, I have a few questions on how flexible the extraction layer can be:
# Instead of evaluating once at init
registry.track_agents("starving", model.agents.select(lambda a: a.energy < 10))
# Evaluate fresh each time
registry.track_agents("starving", lambda: model.agents.select(lambda a: a.energy < 10))

Of course, this adds some complexity: you're no longer working with a fixed set of agents.
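This eager-vs-lazy distinction can be sketched without Mesa. The `Tracker` and `Agent` classes below are illustrative stand-ins, not the actual registry API: a tracker holding a callable re-evaluates the selection on every access, while one holding a concrete collection is frozen at tracking time.

```python
class Agent:
    def __init__(self, energy):
        self.energy = energy

class Tracker:
    """Toy stand-in for registry.track_agents: stores either a fixed
    collection or a zero-argument callable re-evaluated on every access."""
    def __init__(self, source):
        self.source = source

    @property
    def agents(self):
        # A callable source is evaluated fresh; anything else is fixed.
        return list(self.source()) if callable(self.source) else list(self.source)

agents = [Agent(5), Agent(20)]

fixed = Tracker([a for a in agents if a.energy < 10])
lazy = Tracker(lambda: [a for a in agents if a.energy < 10])

agents[1].energy = 3  # this agent becomes "starving" after tracking started
print(len(fixed.agents), len(lazy.agents))  # fixed misses the change, lazy sees it
```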
wealth_data = AgentDataSet("wealth", model.agents, "wealth")
# Option A: Pass the wealth_data DataSet as the "model"?
gini_data = ModelDataSet("gini", wealth_data, gini=lambda ds: calculate_gini(ds.data))
# Option B: Keep model, but callable references wealth_data?
gini_data = ModelDataSet("gini", model, gini=lambda m: calculate_gini(wealth_data.data))
# Option C: Something else entirely?
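Option B can be sketched with plain Python: the callable simply closes over the upstream dataset, so no new dataset type is needed. `ToyDataSet`, `ToyComputed`, and the `gini` helper below are illustrative stand-ins, not the Mesa classes.

```python
def gini(values):
    """Gini coefficient via the sorted mean-absolute-difference formula."""
    values = sorted(values)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((2 * i - n + 1) * v for i, v in enumerate(values))
    return cum / (n * total)

class ToyDataSet:
    """Minimal stand-in: .data re-reads an attribute from each agent."""
    def __init__(self, agents, field):
        self.agents, self.field = agents, field

    @property
    def data(self):
        return [getattr(a, self.field) for a in self.agents]

class ToyComputed:
    """Option B in miniature: the callable closes over the upstream dataset."""
    def __init__(self, fn):
        self.fn = fn

    @property
    def data(self):
        return self.fn()

class A:
    def __init__(self, wealth):
        self.wealth = wealth

wealth_data = ToyDataSet([A(1), A(2), A(3)], "wealth")
gini_data = ToyComputed(lambda: gini(wealth_data.data))
print(round(gini_data.data, 4))  # 0.2222 (Gini of [1, 2, 3] is 2/9)
```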
This indeed currently does not work (nor does it for the existing data collector). I am actually thinking of making this possible in a custom DynamicAgentDataSet class that listens for agent_registered signals for inclusion and agent_deregistered signals for removal. So no need to filter on every call to
This is related to the
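A minimal sketch of the signal-driven idea, assuming a pub/sub mechanism with `agent_registered`/`agent_deregistered` events (the `Signals` class below is a stand-in, not Mesa's actual observable machinery): membership is updated on (de)registration, so no filtering is needed when the data is read.

```python
class Signals:
    """Tiny pub/sub stand-in for the model's agent (de)registration signals."""
    def __init__(self):
        self.handlers = {}

    def observe(self, name, handler):
        self.handlers.setdefault(name, []).append(handler)

    def emit(self, name, agent):
        for handler in self.handlers.get(name, []):
            handler(agent)

class DynamicAgentDataSetSketch:
    """Keeps membership current via signals instead of filtering per call."""
    def __init__(self, signals):
        self.members = set()
        signals.observe("agent_registered", self.members.add)
        signals.observe("agent_deregistered", self.members.discard)

signals = Signals()
ds = DynamicAgentDataSetSketch(signals)
signals.emit("agent_registered", "a1")
signals.emit("agent_registered", "a2")
signals.emit("agent_deregistered", "a1")
print(ds.members)  # {'a2'}
```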
Just asking this out of curiosity: is it a good idea to make DataSet observable and then derive a new class from ModelDataSet, e.g. a ComputedDataSet?

class ComputedDataSet(DataSet):
    def __init__(
        self,
        name: str,
        parent: DataSet,
        compute_fn: Callable[[dict], Any],
    ):
        self.name = name
        self.parent = parent
        self.compute_fn = compute_fn
        parent.observe(parent.name, SignalType.CHANGE, self.collect)
Something like this might be made to work. However, any dataset, by definition, is already a
You are right that a pure reactive approach would cause a massive signal explosion and a performance nightmare. But that is not what I am proposing. The hybrid flow:
Result: We have the memory buffer.
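A minimal sketch of such a hybrid flow, with hypothetical names: a change signal only flips a dirty flag, and the expensive gather runs lazily on the next `.data` access, so there is no signal explosion.

```python
class HybridDataSet:
    """Hybrid-flow sketch: a CHANGE signal only marks the set dirty;
    the expensive gather happens lazily on the next .data access."""
    def __init__(self, gather_fn):
        self.gather_fn = gather_fn
        self._cache = None
        self._dirty = True

    def on_change(self, *_):
        self._dirty = True  # cheap: just a flag, no recomputation

    @property
    def data(self):
        if self._dirty:
            self._cache = self.gather_fn()
            self._dirty = False
        return self._cache

calls = []
ds = HybridDataSet(lambda: calls.append(1) or len(calls))
ds.data
ds.data          # second access hits the cache, no re-gather
ds.on_change()   # a signal arrives: mark dirty only
ds.data          # dirty again, so gathered a second time
print(len(calls))  # 2
```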
That is a design that indeed makes more sense. However, it is also fragile. Basically, the dataset would become dirty on step. It is clean again after the first call to data. But there is nothing preventing the user from updating agents in the dataset afterwards:

self.agents.shuffle_do("step_a")
self.update_stats()  # --> calls .data on some of our DataSets
self.agents.shuffle_do("step_b")
self.update_stats()  # assumes that the data is clean, which is not true

So that is why a numpy view style design, as we use for the property layers, avoids this problem. However, I want to leave the numpy style agent data set for a future PR to avoid complicating this one too much.
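The difference between a cached-copy design and a view-based one can be shown directly with numpy: a basic slice is a view into the backing array, so it stays in sync with later writes, while a copied snapshot goes stale.

```python
import numpy as np

energy = np.array([5.0, 20.0, 7.0])  # backing store, as a property layer might hold

view = energy[:2]          # basic slicing returns a view, not a copy
snapshot = energy[:2].copy()  # cached-copy style, like a dirty-flag design

energy[0] = 99.0           # agent updated after the "clean" read
print(view[0], snapshot[0])  # the view sees 99.0; the cached copy still holds 5.0
```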
Add convenience property to access model.scenario directly from agents. This follows the same pattern as the existing random and rng properties, making scenario parameters easier to access within agent code.
I added the unique_ids of agents as a separate numpy array, so this is now included. However, I have left

For the data recording, unique_ids do matter. Ideally, you want to store agent data by unique_id so you can trace agents over the simulation. @codebreaker32, I would love your perspective on this and how you think we could include this in the API. There is also a broader point on
I recommend we keep
# inside CollectorListener._store_dataset_snapshot
dataset = self.registry.datasets[name]
data = dataset.data
if hasattr(dataset, "ids"):
    ids = dataset.ids
    # stack IDs (using np.hstack) with data for storage

This way, the
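As a sanity check of the stacking step (shown here with `np.column_stack`, which aligns a 1-D ID array with a 2-D data block row by row; the arrays are made up for illustration):

```python
import numpy as np

ids = np.array([11, 12, 13])          # unique_ids of the tracked agents
data = np.array([[1.0, 5.0],
                 [2.0, 6.0],
                 [3.0, 7.0]])         # one row per agent, one column per field

# column_stack keeps one row per agent; the IDs end up as the first column
stored = np.column_stack([ids, data])
print(stored.shape)  # (3, 3)
```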
Agreed. I rather see the "Listener" as the normalization layer.
I agree with this. I'll add a
Ok, for now, let's keep it this way. We might revisit this depending on whether and how we want to use the DataRegistry in the UI side of things.
From the usage example in the PR description:
I find it a bit weird that you can just pile on arguments. Can we make this a list or a set? I think requiring keywords here might also help (including for future API changes).
I actually thought it was very convenient. You are just passing the different attributes/properties/descriptors you want to collect. So yes, you could also do this via e.g., a single argument |
EwoutH left a comment:
Pre-approving since this is almost fully in the experimental space.
So, before merging this, I would like to know if there is a preference regarding fields. We have three options:

# current implementation
registry.track_model("model_dataset", "attr1", "attr2", "attr3")
# a single fields argument
registry.track_model("model_dataset", ["attr1", "attr2", "attr3"])
# a single fields keyword argument
registry.track_model("model_dataset", fields=["attr1", "attr2", "attr3"])

@EwoutH, @codebreaker32, do either of you have a clear preference? I like the convenience of the current implementation, but I see @EwoutH's point about future extensibility. We also separately indicated a desire to move towards a keyword-preferred design, which would favor option 3.
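Option 3 can be enforced with a keyword-only parameter. The `track_model` below is a toy sketch of the signature, not the actual implementation: `fields` cannot be passed positionally, which keeps call sites self-documenting and leaves room for future keywords.

```python
def track_model(name, *, fields):
    """Sketch of option 3: everything after * is keyword-only, so adding
    future keywords will not break existing call sites."""
    return {"name": name, "fields": list(fields)}

print(track_model("model_dataset", fields=["attr1", "attr2", "attr3"]))
try:
    track_model("model_dataset", ["attr1"])  # positional use is rejected
except TypeError as e:
    print("rejected:", type(e).__name__)
```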
I am fine with option 3 as well. It is self-documenting, extensible, and aligns with the project's shift toward explicit keyword arguments.
I added NumpyAgentDataSet.agent_ids and shifted to fields as a keyword argument. I am merging this so we can move on to finalizing #3145. |
Summary
Expanding on an idea from #3145 as well as past discussion on data collection, this PR adds a novel data-registry approach to Mesa. This new approach rests on the idea of a DataSet: a DataSet contains part of the state of a model at a given instant.

The key point is that current data collection does too much. With this PR, we separate the getting of the state of part of the model at a given instant from the storage of these states over time. With explicit DataSet classes, it's now trivial to extend this if you need your own custom data collection. Another benefit of DataSet classes is that we get rid of the complex dict-style configuration of what to collect. Everything can be handled through args for attributes and kwargs for callables.

This PR adds:

- DataRegistry: a dict-like collection of datasets. It is always available via model.data_registry.
- ModelDataSet: a dataset for gathering model-level data.
- AgentDataSet: a dataset for gathering agent data from an AbstractAgentSet.
- TableDataSet: a dataset for gathering miscellaneous data; works by adding rows to it.
- NumpyAgentDataSet: a numpy array-based dataset containing agent data for a specified Agent class.
- DataSet protocol

Datasets gather data from fields. Fields are always strings and are assumed to be accessible via attribute access. DataSet does not support lambda functions. If you want to do something like that, use properties or descriptors instead.

Data is accessed via DataSet.data. ModelDataSet and AgentDataSet will at that moment gather the data and return it. NumpyAgentDataSet will return a view on the numpy array containing the data. This view is always in sync with the attribute values, so this data is not separately gathered on request. TableDataSet will return the current list of rows.

This PR is a first draft, exploring the idea. Feedback is very much welcome. The focus is more on fleshing out the API than on optimizing the code itself.
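The field-gathering contract can be sketched in a few lines (illustrative stand-ins, not the actual Mesa classes): fields are plain strings resolved with `getattr` on each `.data` access, and properties cover the cases where the old dict-style config would have used a lambda.

```python
class ModelDataSetSketch:
    """Fields are plain strings read via attribute access on each .data call."""
    def __init__(self, model, fields):
        self.model, self.fields = model, fields

    @property
    def data(self):
        return {f: getattr(self.model, f) for f in self.fields}

class ToyModel:
    def __init__(self):
        self.step_count = 0

    @property
    def doubled(self):  # properties work where lambdas are unsupported
        return self.step_count * 2

m = ToyModel()
ds = ModelDataSetSketch(m, ["step_count", "doubled"])
m.step_count = 3
print(ds.data)  # {'step_count': 3, 'doubled': 6}
```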
API