
Add DataFrame support to Agent creation #3199

Merged: quaquel merged 24 commits into mesa:main from falloficarus22:better-dataframe-support on Jan 28, 2026

Conversation

@falloficarus22
Contributor

@falloficarus22 falloficarus22 commented Jan 24, 2026

Summary

This PR improves the integration between pandas DataFrames and agent creation in Mesa. It enables Agent.create_agents to accept pandas Series directly and introduces a dedicated Agent.from_dataframe factory method.

Motive

Currently, initializing agents from tabular data (e.g., CSVs, Parquet) is verbose because users must manually convert every column to a list using .tolist(). As discussed in #3186, improving this functionality aligns Mesa with other data-centric libraries (like Hugging Face Datasets or Ray) and makes working with synthetic populations or large datasets much more ergonomic.

Implementation

  1. Direct Series Support: Updated the argument parsing logic in Agent.create_agents to recognize pandas.Series as a valid sequence type. This ensures that Series passed as arguments are distributed across agents row-by-row, rather than being treated as a single atomic value.
  2. Optimized Factory Method: Added Agent.from_dataframe(model, df, **kwargs). This method uses a row-based loop (df.to_dict(orient="records")) for efficiency, as suggested during review.
  3. Conflict Handling: The from_dataframe method automatically filters out model and n columns from records to avoid clashing with the Agent constructor's positional arguments.
  4. Performance: Implemented a lazy check for the pandas module in create_agents to ensure zero performance overhead for users who do not have pandas imported.
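
The lazy check in item 4 can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the helper name `is_lazy_instance` and its parameters are hypothetical. The idea is to look the module up in `sys.modules` instead of importing it, so code that never imported pandas pays only for a dictionary lookup.

```python
import sys


def is_lazy_instance(value, module_name="pandas", class_name="Series"):
    """Check isinstance against a class from a module without importing it.

    Hypothetical helper: returns False when the module has not been
    imported yet, because the check then has no class to compare against.
    """
    module = sys.modules.get(module_name)
    if module is None:
        return False
    cls = getattr(module, class_name, None)
    # Guard against the attribute being missing or not a class.
    return isinstance(cls, type) and isinstance(value, cls)
```

A plain list is never reported as a Series by this helper, whether or not pandas has been loaded, so existing callers that pass lists would be unaffected.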

Usage Examples

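import pandas as pd
from mesa import Agent, Model


# Minimal agent and model so the examples below can run as written
class MyAgent(Agent):
    def __init__(self, model, age, income, agent_type="resident"):
        super().__init__(model)
        self.age = age
        self.income = income
        self.agent_type = agent_type


model = Model()

# Sample data
df = pd.DataFrame({
    'age': [25, 34, 45],
    'income': [45000, 62000, 51000],
    'model': ['unused', 'unused', 'unused']  # Will be filtered out
})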

# --- create_agents (Now supports Series directly) ---
agents = MyAgent.create_agents(
    model=model,
    n=len(df),
    age=df['age'],
    income=df['income']
)

# --- from_dataframe (New Factory Method) ---
# Automatically maps 'age' and 'income' columns to __init__ arguments
agents = MyAgent.from_dataframe(model, df)

# With additional constant arguments (applied to all agents)
agents = MyAgent.from_dataframe(
    model, 
    df, 
    agent_type='citizen'
)
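
The row-based approach described under Implementation can be illustrated without pandas by working on the list of dicts that df.to_dict(orient="records") returns. This is a sketch under stated assumptions, not the code merged in this PR: `SketchAgent`, `agents_from_records`, and `RESERVED_COLUMNS` are hypothetical names, and a plain class stands in for a Mesa agent.

```python
RESERVED_COLUMNS = {"model", "n"}  # assumed reserved names, per the PR description


class SketchAgent:
    """Stand-in for a mesa.Agent subclass; hypothetical, for illustration only."""

    def __init__(self, model, age, income, agent_type="resident"):
        self.model = model
        self.age = age
        self.income = income
        self.agent_type = agent_type


def agents_from_records(agent_cls, model, records, **constants):
    """Create one agent per record, dropping reserved keys and adding constants."""
    agents = []
    for record in records:
        init_args = {k: v for k, v in record.items() if k not in RESERVED_COLUMNS}
        # A name present both as a column and as a constant raises TypeError here.
        agents.append(agent_cls(model, **init_args, **constants))
    return agents


# Same shape as df.to_dict(orient="records") for the sample data above.
records = [
    {"age": 25, "income": 45000, "model": "unused"},
    {"age": 34, "income": 62000, "model": "unused"},
    {"age": 45, "income": 51000, "model": "unused"},
]
agents = agents_from_records(SketchAgent, model=None, records=records, agent_type="citizen")
```

In this sketch a column name that collides with a constant keyword fails loudly instead of one silently overriding the other; whether the merged method behaves the same way is not stated in this thread.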

Additional Notes

Previously, initializing agents from a pandas DataFrame required verbose conversion of columns to lists using '.tolist()'.

This commit improves the ergonomics of agent creation by:
1. Updating [create_agents] to accept 'pandas.Series' directly as sequence arguments, eliminating the need for manual conversion.
2. Adding a factory method that automatically maps DataFrame columns to agent attributes, allowing concise instantiation from tabular data.
@github-actions

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| BoltzmannWealth | small | 🔵 +0.4% [+0.1%, +0.8%] | 🔵 +0.8% [+0.7%, +1.0%] |
| BoltzmannWealth | large | 🔵 -0.2% [-4.3%, +2.6%] | 🔵 +0.4% [-1.9%, +2.3%] |
| Schelling | small | 🔵 +1.3% [+0.8%, +1.8%] | 🔵 +0.5% [+0.4%, +0.6%] |
| Schelling | large | 🔵 +3.6% [-1.4%, +8.5%] | 🔴 +6.1% [+3.5%, +9.2%] |
| WolfSheep | small | 🔵 +4.5% [+2.7%, +6.2%] | 🔵 +1.8% [+1.5%, +2.1%] |
| WolfSheep | large | 🔵 +6.3% [-6.4%, +18.8%] | 🔵 +3.3% [+1.6%, +4.9%] |
| BoidFlockers | small | 🔵 +1.8% [+1.5%, +2.2%] | 🔵 -0.2% [-0.4%, +0.0%] |
| BoidFlockers | large | 🔵 +1.0% [+0.4%, +1.6%] | 🔵 +0.1% [-0.1%, +0.4%] |

@quaquel
Member

quaquel commented Jan 24, 2026

Thanks for this PR. Can you ensure you get to 100% test code coverage, or at least as close as is reasonable? You can check the coverage report to see which parts are not yet covered by tests.

@EwoutH, you started the original issue. What do you think? Do you prefer overloading create_agents, or a separate from_dataframe method, as done here and as argued for by several people, including me, in the original issue?

falloficarus22 and others added 3 commits January 24, 2026 13:04
@falloficarus22 falloficarus22 marked this pull request as draft January 24, 2026 08:52
falloficarus22 and others added 8 commits January 24, 2026 09:02
@falloficarus22 falloficarus22 marked this pull request as ready for review January 24, 2026 10:25
@github-actions

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| BoltzmannWealth | small | 🟢 -5.3% [-6.6%, -4.0%] | 🔵 +1.3% [+1.1%, +1.5%] |
| BoltzmannWealth | large | 🟢 -7.1% [-11.6%, -3.6%] | 🟢 -14.1% [-16.7%, -11.5%] |
| Schelling | small | 🔵 -3.4% [-4.0%, -2.7%] | 🔵 +0.4% [+0.0%, +0.7%] |
| Schelling | large | 🔵 -1.2% [-6.4%, +3.7%] | 🟢 -10.4% [-14.3%, -6.4%] |
| WolfSheep | small | 🟢 -5.7% [-7.7%, -3.9%] | 🔵 -0.8% [-1.5%, -0.2%] |
| WolfSheep | large | 🔵 -0.6% [-14.6%, +12.4%] | 🔵 -2.4% [-5.8%, +1.1%] |
| BoidFlockers | small | 🔵 +0.0% [-0.5%, +0.6%] | 🔵 +1.1% [+0.9%, +1.4%] |
| BoidFlockers | large | 🔵 +1.5% [+0.5%, +2.4%] | 🔵 +1.6% [+1.2%, +2.0%] |

@falloficarus22
Contributor Author

@quaquel Test coverage is now at 100%. The performance benchmark results are also interesting.
Any thoughts here?

@quaquel
Member

quaquel commented Jan 24, 2026

The benchmark results make no sense. Nothing here touches code that is used in the benchmarks. But we have been seeing funky benchmark results a lot lately. I'll try to take a closer look at the PR itself soonish.

@falloficarus22 falloficarus22 force-pushed the better-dataframe-support branch from 6bbe53d to 5012657 Compare January 25, 2026 14:01
@falloficarus22 falloficarus22 force-pushed the better-dataframe-support branch from 754ea34 to 18ba547 Compare January 26, 2026 07:22
assert agent.b == 7


def test_agent_create_edge_cases():
Member

Why are these added? They are not relevant to this PR.

@EwoutH
Member

EwoutH commented Jan 28, 2026

I do agree with dropping *args but keeping **kwargs.

@quaquel
Member

quaquel commented Jan 28, 2026

If we allow Series as per #3186, allowing kwargs here is redundant.

@falloficarus22
Contributor Author

I see the point about redundancy with a Series-capable create_agents, but I think keeping **kwargs in from_dataframe is worth it for Data Integrity and Ergonomics:

  • The DataFrame often represents the 'Raw Population' (observed data like age, income), while the extra **kwargs represent 'Simulation State' (initial model parameters like is_infected=False, energy=100). Forcing users to inject simulation constants into their raw dataframes just to instantiate agents feels like 'polluting' the data layer.
  • Adding a column of constant scalars to a large DataFrame (e.g., 1M rows) is an $O(N)$ operation that consumes extra memory. Passing a single scalar via **kwargs is $O(1)$ and avoids making pandas allocate memory for a column that would be immediately discarded after iteration.
  • Users coming from data-heavy workflows (HuggingFace, Ray, etc.) will look for .from_dataframe() specifically because they have a table and want to map it. If that method lacks the ability to add a simple constant override, it feels 'broken' compared to other Pythonic factory methods.

Essentially, create_agents is for when you are building a population by columns, and from_dataframe is for when you are building a population from a table. Both should be first-class citizens with simple override support.

Also, as suggested by @EwoutH in #3186 (comment), should I split this into two separate PRs?

@falloficarus22 falloficarus22 force-pushed the better-dataframe-support branch from e4a6efb to 725b280 Compare January 28, 2026 08:44
@EwoutH
Member

EwoutH commented Jan 28, 2026

I agree.

While I would prefer two PRs, keeping it in one is also fine since the create_agents changes are small.

@quaquel
Member

quaquel commented Jan 28, 2026

Ok, I am fine with adding kwargs back to from_dataframe, but let's then restrict it to non-sequence data only.
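
A restriction of this kind could look roughly like the sketch below. The function name `require_constants` is hypothetical and this is not the check that was merged. It only recognises built-in sequences such as lists and tuples; pandas Series and NumPy arrays are not instances of `collections.abc.Sequence`, so a real check would need to handle them separately.

```python
from collections.abc import Sequence


def require_constants(**kwargs):
    """Raise TypeError if any keyword value is a list-like sequence."""
    for name, value in kwargs.items():
        # Strings and bytes are sequences too, but count as single values here.
        if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
            raise TypeError(
                f"{name!r} must be a single value, got {type(value).__name__}"
            )
    return kwargs
```

With this sketch, `require_constants(agent_type="citizen", energy=100)` passes the values through unchanged, while `require_constants(energy=[1, 2, 3])` raises a TypeError.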

@EwoutH
Member

EwoutH commented Jan 28, 2026

Agreed.

Member

@quaquel quaquel left a comment

looks good to me

Member

@EwoutH EwoutH left a comment

A few suggestions for tests.

@EwoutH
Member

EwoutH commented Jan 28, 2026

Do we have any idea how this will handle non-Pandas tables? Like NumPy arrays or Polars dataframes?

@quaquel
Member

quaquel commented Jan 28, 2026

> Do we have any idea how this will handle non-Pandas tables? Like NumPy arrays or Polars dataframes?

NumPy arrays do not have a to_dict method, so it will fail. Polars has a to_dict, but with different keyword arguments, so it will fail as well.

@falloficarus22
Contributor Author

I can add duck typing for Polars support.

Something like:

# In from_dataframe
# Check to_dicts first: Polars also defines to_dict, but without an orient kwarg
if hasattr(df, "to_dicts"):  # Polars
    records = df.to_dicts()
elif hasattr(df, "to_dict"):  # pandas
    records = df.to_dict(orient="records")
else:
    raise TypeError("Object must be a pandas or Polars DataFrame.")

But this might require slightly more complex error handling if to_dict exists but doesn't support the orient kwarg in some other library.
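
One possible shape for that extra error handling is sketched below. The helper name `to_records` is hypothetical, and nothing like it was merged in this PR. It tries `to_dicts` first because, as noted earlier in the thread, Polars also has a `to_dict`, and it turns a rejected `orient` keyword into a clearer error. A caveat of this approach is that any other TypeError raised inside `to_dict` would be reported the same way.

```python
def to_records(table):
    """Return a list of row dicts from a pandas-like or Polars-like table."""
    if hasattr(table, "to_dicts"):  # Polars-style API
        return table.to_dicts()
    if hasattr(table, "to_dict"):  # pandas-style API
        try:
            return table.to_dict(orient="records")
        except TypeError as exc:
            # to_dict exists but rejected orient=..., so it is not pandas-like.
            raise TypeError(
                f"{type(table).__name__}.to_dict does not support orient='records'"
            ) from exc
    raise TypeError("Object must be a pandas or Polars DataFrame.")
```

Objects with neither method, such as a plain list, fall through to the final TypeError.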

@quaquel
Member

quaquel commented Jan 28, 2026

I suggest we just handle pandas for now and move the Polars conversation more broadly to the Mesa 4 discussion.

Member

@EwoutH EwoutH left a comment

Yeah that’s fine

@quaquel quaquel merged commit 2c83d6d into mesa:main Jan 28, 2026
14 checks passed
@falloficarus22 falloficarus22 deleted the better-dataframe-support branch January 29, 2026 02:59
@EwoutH EwoutH added the feature Release notes label label Feb 11, 2026
Labels

feature Release notes label

Development

Successfully merging this pull request may close these issues.

Better DataFrame support for Agent.create_agents()

3 participants