Add DataFrame support to Agent creation #3199
Previously, initializing agents from a pandas DataFrame required verbose conversion of columns to lists using `.tolist()` when calling … This commit improves the ergonomics of agent creation by:

1. Updating `create_agents` to accept `pandas.Series` directly as sequence arguments, eliminating the need for manual conversion.
2. Adding a factory method that automatically maps DataFrame columns to agent attributes, allowing for concise instantiation from tabular data.
Performance benchmarks:
Thanks for this PR. Can you ensure you get to 100% test code coverage, or at least as close as is reasonable? You can check the coverage report to see which parts are not yet covered by tests. @EwoutH, you started the original issue. What do you think? Do you prefer overloading …
@quaquel I got to 100% test code coverage. Also, the performance benchmark results are interesting.
The benchmark results make no sense. Nothing here touches code that is used in the benchmarks. But we have been seeing funky benchmark results a lot lately. I'll try to take a closer look at the PR itself soonish.
Branch updated from 6bbe53d to 5012657.
Branch updated from 754ea34 to 18ba547.
Review comment on `tests/test_agent.py` (outdated):

```python
assert agent.b == 7


def test_agent_create_edge_cases():
```

Why are these added? They are not relevant to this PR.
I do agree with dropping `*args` but keeping `**kwargs`.
If we allow Series as per #3186, allowing `kwargs` here is redundant.
I see the point about redundancy with a Series-capable … Essentially, … Also, as stated by @EwoutH in #3186 (comment), should I split this into two separate PRs?
Branch updated from e4a6efb to 725b280.
I agree. While I would prefer two PRs, keeping it in one is also fine since the `create_agents` changes are small.
Ok, I am fine with adding `kwargs` back to `from_dataframe`, but let's then restrict it to non-sequence data only.
Agreed.
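One way to enforce the restriction agreed on above (extra keyword arguments to `from_dataframe` carry single values shared by every agent, while per-agent data comes from the columns) is to reject sequence-valued keyword arguments up front. A minimal sketch, assuming the DataFrame has already been turned into a list of row dicts; `merge_shared_kwargs` is a hypothetical name, and both the choice to treat `str`/`bytes` as single values and the rule that a column wins over a same-named keyword argument are assumptions.

```python
from collections.abc import Sequence
from typing import Any


def merge_shared_kwargs(
    records: list[dict[str, Any]], **shared: Any
) -> list[dict[str, Any]]:
    """Combine per-row records with keyword arguments shared by every agent.

    Per-agent data must come from the table itself, so any sequence-valued
    keyword argument (other than str/bytes) is rejected with a TypeError.
    """
    for key, value in shared.items():
        if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
            raise TypeError(
                f"Keyword argument {key!r} must be a single value shared by all "
                "agents; put per-agent values in a DataFrame column instead."
            )
    # Shared values are applied first, so a column with the same name wins.
    return [{**shared, **record} for record in records]


rows = merge_shared_kwargs([{"wealth": 1}, {"wealth": 2}], color="red")
print(rows)
```

NumPy arrays and pandas Series are not registered as `collections.abc.Sequence`, so a stricter version would need explicit checks for those types as well.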
Branch updated from 020a935 to 9617bba.
Do we have any idea how this will handle non-pandas tables, like NumPy arrays or Polars DataFrames?
NumPy arrays do not have `to_dict`, so it will fail. Polars DataFrames have a `to_dict`, but with different keyword arguments, so it will fail too.
Co-authored-by: Ewout ter Hoeven <[email protected]>
I can add duck typing for Polars support. Something like:

```python
# In from_dataframe
# Check Polars first: Polars DataFrames also have a `to_dict` method,
# but it does not accept `orient`, so the pandas branch would fail on them.
if hasattr(df, "to_dicts"):  # Polars
    records = df.to_dicts()
elif hasattr(df, "to_dict"):  # pandas
    records = df.to_dict(orient="records")
else:
    raise TypeError("Object must be a pandas or Polars DataFrame.")
```

But this might require slightly more complex error handling if …
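Because Polars DataFrames also define a `to_dict` method (keyword-only `as_series`, no `orient`), the order of the `hasattr` checks decides whether Polars input reaches the pandas branch. The sketch below wraps the dispatch in a hypothetical `to_records` function and exercises it with two stand-in classes that only mimic the relevant method names; they are not real pandas or Polars objects.

```python
from typing import Any


def to_records(df: Any) -> list[dict[str, Any]]:
    """Convert a pandas- or Polars-like table into a list of row dicts."""
    if hasattr(df, "to_dicts"):  # Polars-style: DataFrame.to_dicts()
        return df.to_dicts()
    if hasattr(df, "to_dict"):  # pandas-style: DataFrame.to_dict(orient="records")
        return df.to_dict(orient="records")
    raise TypeError("Object must be a pandas or Polars DataFrame.")


class FakePandasFrame:
    """Stand-in that only supports pandas' to_dict(orient="records") call."""

    def __init__(self, rows: list[dict[str, Any]]):
        self._rows = rows

    def to_dict(self, orient: str = "dict") -> Any:
        if orient != "records":
            raise NotImplementedError(orient)
        return list(self._rows)


class FakePolarsFrame:
    """Stand-in with Polars-style to_dicts() and a to_dict() without 'orient'."""

    def __init__(self, rows: list[dict[str, Any]]):
        self._rows = rows

    def to_dicts(self) -> list[dict[str, Any]]:
        return list(self._rows)

    def to_dict(self, *, as_series: bool = True) -> dict[str, list[Any]]:
        # Column-oriented; passing orient=... raises TypeError.
        return {key: [row[key] for row in self._rows] for key in self._rows[0]}


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
print(to_records(FakePandasFrame(rows)) == rows)  # True
print(to_records(FakePolarsFrame(rows)) == rows)  # True
```

With the two checks swapped, the Polars stand-in would be routed to `to_dict(orient="records")` and raise a `TypeError`, which is the failure mode described for Polars earlier in the thread.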
I suggest we just handle pandas for now and move the Polars conversation, more broadly, to the Mesa 4 discussion.
Summary
This PR improves the integration between pandas DataFrames and agent creation in Mesa. It enables `Agent.create_agents` to accept pandas Series directly and introduces a dedicated `Agent.from_dataframe` factory method.

Motive
Currently, initializing agents from tabular data (e.g., CSVs, Parquet) is verbose because users must manually convert every column to a list using `.tolist()`. As discussed in #3186, improving this functionality aligns Mesa with other data-centric libraries (like Hugging Face Datasets or Ray) and makes working with synthetic populations or large datasets much more ergonomic.

Implementation
1. Updated `Agent.create_agents` to recognize `pandas.Series` as a valid sequence type. This ensures that Series passed as arguments are distributed across agents row by row, rather than being treated as a single atomic value.
2. Added `Agent.from_dataframe(model, df, **kwargs)`. This method uses a row-based loop (`df.to_dict(orient="records")`) for efficiency, as suggested during review.
3. The `from_dataframe` method automatically filters out `model` and `n` columns from the records to avoid clashing with the `Agent` constructor's positional arguments.
4. Performance: Implemented a lazy check for the pandas module in `create_agents` to ensure zero performance overhead for users who have not imported pandas.

Usage Examples
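As a self-contained illustration of the row-based creation pattern and of the `model`/`n` filtering in item 3, the sketch below uses a hypothetical `SketchAgent` stand-in (not Mesa's `Agent`) whose hypothetical `from_records` classmethod takes the list of row dicts that `df.to_dict(orient="records")` produces for a pandas DataFrame. Names other than `model` and `n` are assumptions for illustration.

```python
from typing import Any

# Column names that would clash with the factory's own arguments.
RESERVED_KEYS = {"model", "n"}


class SketchAgent:
    """Hypothetical stand-in for an agent class; not Mesa's Agent."""

    def __init__(self, model: Any, **attributes: Any):
        self.model = model
        for name, value in attributes.items():
            setattr(self, name, value)

    @classmethod
    def from_records(
        cls, model: Any, records: list[dict[str, Any]]
    ) -> list["SketchAgent"]:
        """Create one agent per row, skipping columns named 'model' or 'n'.

        Without the filter, a 'model' column would collide with the
        positional model argument and raise a TypeError.
        """
        agents = []
        for record in records:
            attributes = {k: v for k, v in record.items() if k not in RESERVED_KEYS}
            agents.append(cls(model, **attributes))
        return agents


# With pandas, `records` would come from df.to_dict(orient="records").
records = [
    {"wealth": 10, "age": 30, "n": 99},
    {"wealth": 5, "age": 41, "n": 99},
]
agents = SketchAgent.from_records(model=None, records=records)
print([(a.wealth, a.age) for a in agents])  # [(10, 30), (5, 41)]
print(hasattr(agents[0], "n"))  # False
```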
Additional Notes
`Agent.create_agents()` #3186.