Description
I have been testing the recent chunking update on the 1-Zone SANDAG example. The tests were run on a server with the following specs:
- Processors: 2 Intel Xeon CPU E5-2690 v4 @ 2.60 GHz
- Cores: 28 Cores, 56 Logical Processors
- Memory: 256 GB RAM
The chunk_training_mode, chunk_size, and num_processes settings were set per the documentation; other chunking settings were left unchanged (a sketch of the settings used is included after the table and discussion below). The sequential steps taken and their run times are recorded in the following table:
| Step | Processes | Chunk Size (GB) | Total Run Time (minutes) |
|---|---|---|---|
| Training_1 | 45 | 205 | 1700 |
| Production | 45 | 205 | 573 |
| Production | 56 | 220 | 477 |
| Adaptive_1 | 45 | 205 | 1300 |
| Production | 56 | 220 | 499 |
| Adaptive_2 | 45 | 205 | 1112 |
| Production | 56 | 230 | 529 |
| SERVER REBOOT | | | |
| Production | 56 | 230 | 362 |
| Adaptive_3 | 45 | 205 | |
| Production | 56 | 230 | 514 |
| Production | 45 | 230 | 477 |
As can be seen, the production run times fluctuate heavily, and it is hard to tell which settings are best. The best run time was achieved after a server reboot, but run times returned to their prior level after another round of adaptive training. Other process counts (e.g., 28) were also tried in production runs, but some memory/chunking issues were encountered (specifically at this line).
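For reference, here is a minimal sketch of how the chunking-related settings can be checked for a given run. It assumes the settings live in `configs/settings.yaml` in the example setup and that `chunk_size` is given in bytes as in the documentation; the path is an assumption for this setup, not a confirmed detail of any specific run.

```python
# Minimal sketch: print the chunking-related settings for a run, assuming they
# live in configs/settings.yaml (path is an assumption for this example setup).
import yaml

with open("configs/settings.yaml") as f:
    settings = yaml.safe_load(f)

# chunk_size is assumed to be given in bytes, so convert to GB for display.
for key in ("chunk_training_mode", "num_processes", "chunk_size"):
    value = settings.get(key)
    if key == "chunk_size" and value is not None:
        value = f"{value / 1e9:.0f} GB"
    print(f"{key}: {value}")
```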
Also, the run times we are getting here are much longer than those we were getting prior to the update. Below I compare a pre-update run using 56 processes against the best post-update production run (362 minutes, also with 56 processes); both were run on the same machine:
| model_name | Pre-Update (min) | Post-Update (min) | Pre - Post (min) |
|---|---|---|---|
| mandatory_tour_scheduling | 6.4 | 17.6 | -11.2 |
| joint_tour_destination | 0.9 | 5.3 | -4.4 |
| non_mandatory_tour_destination | 12.9 | 16 | -3.1 |
| non_mandatory_tour_scheduling | 2.2 | 7.8 | -5.6 |
| trip_destination | 26.8 | 65.4 | -38.6 |
| trip_scheduling | 4.8 | 142.5 | -137.7 |
| SUB MODEL RUN TIME | 92.8 | 310.9 | -218.1 |
Only the models with the largest differences are shown in the table above. Trip scheduling appears to be the main bottleneck. While comparing the log files, I noticed that 100 iterations of trip scheduling were run in both cases; however, the post-update run spent much longer in each iteration on what appear to be chunking operations (a sketch of the comparison is included below).
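To make the pre/post comparison above easier to reproduce, here is a minimal sketch of the kind of comparison I did. It assumes the per-model timings from each run have been exported to CSV files with `model_name` and `minutes` columns; the filenames `pre_update.csv` and `post_update.csv` are placeholders, not ActivitySim outputs.

```python
# Minimal sketch: compare per-model run times from two runs. Assumes each run's
# timings were exported to a CSV with columns model_name and minutes; the
# filenames below are placeholders, not ActivitySim output files.
import pandas as pd

pre = pd.read_csv("pre_update.csv").set_index("model_name")["minutes"]
post = pd.read_csv("post_update.csv").set_index("model_name")["minutes"]

comparison = pd.DataFrame({
    "pre": pre,
    "post": post,
    "pre_minus_post": pre - post,
})

# Most negative pre_minus_post = biggest slowdown after the update.
print(comparison.sort_values("pre_minus_post").head(10))
```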
Lastly, regarding memory usage for the post-update production run compared above: the chunk size was set to 230 GB, yet memory usage seems to peak at only ~160 GB.
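As a sanity check on the configured chunk size versus observed memory, here is a minimal sketch of that comparison. It assumes memory samples were logged to a CSV with an `rss` column in bytes; the file name `output/mem.csv` and the column name are assumptions, not confirmed ActivitySim outputs.

```python
# Minimal sketch: compare the configured chunk_size against the observed peak RSS.
# Assumes memory samples were logged to a CSV with an "rss" column in bytes
# (file name and column name are assumptions, not confirmed ActivitySim outputs).
import pandas as pd

CHUNK_SIZE_GB = 230  # chunk_size configured for this run, expressed in GB

mem = pd.read_csv("output/mem.csv")
peak_gb = mem["rss"].max() / 1e9

print(f"configured chunk_size: {CHUNK_SIZE_GB} GB")
print(f"observed peak RSS:     {peak_gb:.1f} GB")
print(f"unused headroom:       {CHUNK_SIZE_GB - peak_gb:.1f} GB")
```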
