Better DataFrame support for Agent.create_agents() #3186
Closed
Labels
enhancement
Description
I think we can improve the ergonomics of creating agents from pandas DataFrames. Currently it requires verbose .tolist() conversions for each column.
Background
Users may initialize agents from tabular data (CSV, Parquet, database queries).
import pandas as pd

# Example synthetic population data
df = pd.DataFrame({
    'age': [25, 34, 45, 67, 29],
    'bmi': [22.5, 28.1, 31.2, 24.8, 26.3],
    'condition_status': ['healthy', 'at_risk', 'chronic', 'healthy', 'at_risk'],
    'income': [45000, 62000, 51000, 38000, 71000]
})
The current create_agents() API requires converting each DataFrame column to a list:
from mesa import Agent

class HealthAgent(Agent):
    def __init__(self, model, age, bmi, condition_status, income):
        super().__init__(model)
        self.age = age
        self.bmi = bmi
        self.condition_status = condition_status
        self.income = income
# Current approach - verbose and inefficient
agents = HealthAgent.create_agents(
    model=model,
    n=len(df),
    age=df['age'].tolist(),  # Manual conversion
    bmi=df['bmi'].tolist(),  # Manual conversion
    condition_status=df['condition_status'].tolist(),  # Manual conversion
    income=df['income'].tolist()  # Manual conversion
)
Potential solutions
We're considering two approaches (not mutually exclusive):
Option 1: Accept DataFrame columns directly
Allow pandas Series as arguments without manual conversion:
# Proposed - cleaner API
agents = HealthAgent.create_agents(
    model=model,
    n=len(df),
    age=df['age'],  # No .tolist() needed
    bmi=df['bmi'],
    condition_status=df['condition_status'],
    income=df['income']
)
Option 2: Add df parameter for direct DataFrame input
Add a dedicated parameter that accepts a DataFrame:
# Most concise - auto-map all columns
agents = HealthAgent.create_agents(
    model=model,
    df=df
)
# To use only certain columns, filter the DataFrame before passing it:
agents = HealthAgent.create_agents(
    model=model,
    df=df[['age', 'bmi', 'condition_status', 'income']]
)
# Mix DataFrame with additional parameters
agents = HealthAgent.create_agents(
    model=model,
    df=df,
    initial_energy=100  # Same value for all agents
)
# Use DataFrame subset with overrides
agents = HealthAgent.create_agents(
    model=model,
    df=df[['age', 'bmi']],
    condition_status='healthy',  # Same value for all agents
    income=df['adjusted_income']  # Mix with a Series (assumes df has an 'adjusted_income' column)
)
Questions for discussion
- Which option should we implement? Both? Start with Option 1, add Option 2 later?
- For Option 2, how should we handle conflicts if df contains a column 'age' AND the user passes age=... explicitly?
- Should we also support other tabular formats like Polars DataFrames or NumPy structured arrays?
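To make the conflict question concrete, here is a minimal sketch of one possible resolution step that could sit in front of create_agents(). It is not Mesa's implementation: `expand_agent_kwargs` is a hypothetical helper name, and raising on a column/keyword clash is only one of the possible answers (the strictest one). The helper is duck-typed, so it needs nothing beyond `.columns`, `df[name]`, and `.tolist()`, which pandas DataFrames and Series provide.

```python
def expand_agent_kwargs(n=None, df=None, **kwargs):
    """Hypothetical helper: resolve a table plus keyword arguments.

    Returns (n, columns), where columns maps each argument name to a
    list holding one value per agent.
    """
    columns = {}

    if df is not None:
        # Option 2: every column of the table becomes a per-agent argument.
        for name in df.columns:
            columns[str(name)] = df[name].tolist()
        if n is None:
            n = len(df)

    scalars = {}
    for name, value in kwargs.items():
        if name in columns:
            # Assumed policy: refuse to guess which source should win.
            raise ValueError(
                f"'{name}' is both a DataFrame column and a keyword argument"
            )
        if hasattr(value, "tolist"):
            # Option 1: accept Series-like objects without manual conversion.
            value = value.tolist()
        if isinstance(value, (list, tuple)):
            columns[name] = list(value)
            if n is None:
                n = len(value)
        else:
            # Anything else (numbers, strings, ...) is shared by all agents.
            scalars[name] = value

    if n is None:
        raise ValueError("cannot infer the number of agents; pass n, df, or a sequence")

    for name, values in columns.items():
        if len(values) != n:
            raise ValueError(f"'{name}' has {len(values)} values, expected {n}")

    # Broadcast shared values so every argument has one entry per agent.
    for name, value in scalars.items():
        columns[name] = [value] * n

    return n, columns


# Works with plain sequences and scalars as well as tabular inputs.
n, columns = expand_agent_kwargs(age=[25, 34, 45], condition_status="healthy")
```

With this policy, a call such as `expand_agent_kwargs(df=df, age=[1, 2, 3, 4, 5])` raises a ValueError, and users who want an override would first drop the column, as in the "DataFrame subset with overrides" example above. An alternative policy would let the explicit keyword win silently; that is more convenient but makes typos in column names harder to spot.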