Skip to content

Conversation

@andkay
Copy link
Collaborator

@andkay andkay commented Mar 20, 2025

This PR will address #784.

The proposed approach includes some relatively simple "smoke test" of YAML, SPEC, and coefficients config files:

  • The component settings are first loaded into the relevant Pydantic data model.
  • Then, the coefficients file is then read into memory, if defined at the top level of the data model.
  • Then, the SPEC file is read into memory, similarly, if defined at the top level of the data model.
  • Finally, one of the two available methods (simulate.eval_coefficients or simulate.eval_nest_coefficients) is run.

As scoped, validating expressions is not included, and the code isn't worried about tables at all.

Currently, there is a small subset of models that get tested, which include simpler configs, one with a preprocessor, and one with nested coefficients. Some basic logging is included. There may be more cases that need to be solved, but I did want to get some feedback / validation before pressing onward. I'll highlight some of my own thoughts below, but also want to get some eyes on this to make sure the contribution meets the issue-brief and will be useful.

Autodiscovery of Model Settings YAML Files
Currently, there is a dict that maps the names of component models to a Pydantic model class, as well as the relevant model file name. This will be pretty cumbersome to maintain.

A better way forward would be identify the default arguments to model_settings (Pydantic) and model_settings_file_name (YAML file) from the signature of each step. I'm not totally sure how to extract this information from the step wrapper around the callable.

Error Handling
An important design decision (which is still TBD) will be if any custom errors or warnings are raised by the settings_checker module (currently, this is not included. In my thinking, the main point of the settings checker is to surface any exceptions or warnings that would otherwise be raised at runtime earlier in the run process -- not necessarily to introduce any new error handling. If that is a decent assumption, responsibility for raising problems should be delegated upstream to the underlying Pydantic models and ActivitySim methods.

Nested Coefficients
Currently, I am using simulate.eval_nested_coefficients to evaluate models with the NESTS property. So far, this seems to behave as I expect -- but I am a little wary that it is misguided. With the settings checker turned off, I have not been been able to identify anything in the prototype_mtc model that actually calls this function.

Calling the checker
The checker is currently being called from cli.run for convenience of development. But I suspect that it needs to be migrated (or added) to the State.run method.

@andkay andkay requested review from jpn-- and seanmcatee March 20, 2025 22:16

def check_model_settings(state: State) -> None:

components = state.settings.models # _RUNNABLE_STEPS.keys() may be better?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state.settings.models is probably the right thing here not _RUNNABLE_STEPS, as the latter includes every step that might be run, but the former includes only that which we want to run right now. It's totally legit to have incomplete/invalid settings for steps we are not trying to run right now.

logger = logging.getLogger(__name__)
file_logger = logger.getChild("logfile")

COMPONENTS_TO_SETTINGS = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a necessary evil for now.

andkay added 9 commits June 7, 2025 18:01
missing values.

because the fields in the settings classes are inherited, it is not
usually possible for the settings checker to determine if a path to an
external file is actually required for a particular model component.

the workaround is to issue a specific warning that a path *may* be
expected and users should check the YAML file.

the ultimate solution would be to define more robust Pydantic data
models that ensure that fields are marked as required when appropriate.
andkay and others added 18 commits June 9, 2025 23:24
* initial batch of models with pre and post processing added

* nonmand sched and transit pass models

* first pass at preprocessing and annotate functionality in all models

* fixing bugs in jtf and trip purpose

* adding persons back in to locals_d in jtc

* model name missing in tour scheduling

* missing expressions import in tour sched prob

* ci unit tests & fixing estimation test error

* addressing review comments

---------

Co-authored-by: Ali Etezady <[email protected]>
* first uv sync

* explicitly set np.int64 dtype before calling pd set_index()

Pandas 2 got rid of the `Int64Index` class, it was downcasting to int32 for the input land use table

* Add all dependencies from Conda environments

Consider removing unused ones at a later time

* Remove orca

This dependency was replaced with “state” for workflow orchestration.

* Update release instructions for `uv`

* Remove `--no-default-groups` from `uv sync` instructions

* Trial updating Github Action to uv instead of conda

* Bug fixes

* Try downgrading sharrow

* change core test to windows runner

* use win for all core tests

* index type mismatch

* formatting

* unlock sharrow to 2.14

* correct a typo

* Update Model Setup page in user guide

* Update Ways to Run the Model page

* Remove duplicates

* Simplify pyproject.toml for github action and use only this group

* Update remainder of user guide docs

* Update dev guide [makedocs]

* Update lockfile from changes to dependencies

* Remove conda from other github actions

* Debug doc building [makedocs]

* Update install instructions [makedocs]

* update installation instructions [makedocs]

* notes on uv options [makedocs]

* address review comment

---------

Co-authored-by: Josie Kressner <[email protected]>
* identify and flag naming conflicts in 2d vs 3d skims

* add skims doc

* run on workflow_dispatch

* add link to docs

* write error message in exception also

* add error message example

* code block formatting

* unit tests
* update cat columns in df when new cat exists

* blacken

* sort two categoricals before union
* minor fixes for SimOR development

* blacken

* Using PNUM if available in JTFC

Required to maintain backwards compatibility in tour indexes in SANDAG models

---------

Co-authored-by: Jeffrey Newman <[email protected]>
…activitysim version to logfile (ActivitySim#963)

* adding activitysim version to log file

* fix duplicate columns in parking location choice as described in ActivitySim#633

* making tour mode choice logsum optional in destionation choice models ActivitySim#847

* logsum settings in location choice, trip dest, tour_od

* optional logsums in trip destination

* blacken

---------

Co-authored-by: Ali Etezady <[email protected]>
@andkay
Copy link
Collaborator Author

andkay commented Sep 4, 2025

Closed by ActivitySim#950

@andkay andkay closed this Sep 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants