Commit 17770ee
[https://nvbugs/6143599][fix] DeepSeek-V3 OOM and artifacts path
- Lower kv_cache_free_gpu_memory_fraction from 0.85 to 0.75 for
DeepSeek-V3/R1; the previous fraction left no headroom for the
transient DeepGEMM MoE workspace and caused OOM at max_batch_size=2048.
- Set PYTORCH_ALLOC_CONF=expandable_segments:True for DeepSeek-V3/R1
to reduce CUDA allocator fragmentation under stress.
- Add an ARTIFACTS_DIR constant anchored to this file's location; pass
it to aiperf via --output-artifact-dir and use it as the default reader
path in extract_stress_test_metrics, so aiperf writes and the metrics
reader use the same directory regardless of pytest's working directory.
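A rough sketch of what the fraction change buys. It assumes, as TensorRT-LLM describes `kv_cache_free_gpu_memory_fraction`, that the fraction is applied to GPU memory still free after the model weights are loaded. The 80 GiB free-memory figure and the `kv_cache_split` helper are hypothetical; the commit does not state the actual free memory or the DeepGEMM workspace size.

```python
def kv_cache_split(free_gib: float, fraction: float) -> tuple[float, float]:
    """Return (KV-cache pool, headroom) in GiB for a given free-memory fraction.

    Hypothetical helper: models the fraction as a share of the memory that is
    still free after weights load; the remainder is what is left for transient
    allocations such as the DeepGEMM MoE workspace.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    kv_pool = free_gib * fraction
    return kv_pool, free_gib - kv_pool


if __name__ == "__main__":
    free_gib = 80.0  # hypothetical free memory per GPU after weights load
    for fraction in (0.85, 0.75):  # old and new values from this commit
        kv_pool, headroom = kv_cache_split(free_gib, fraction)
        print(f"fraction={fraction:.2f}: KV pool {kv_pool:.1f} GiB, "
              f"headroom {headroom:.1f} GiB")
```

Under these assumptions the headroom grows from 15% to 25% of free memory, roughly 1.67x, whatever the absolute free-memory figure is.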
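A minimal sketch of scoping the allocator setting to DeepSeek-V3/R1 by building the environment handed to the server subprocess. The `server_env` helper and its substring match on the model name are hypothetical; the commit only states that `PYTORCH_ALLOC_CONF=expandable_segments:True` is set for those models. The variable has to be in the server process's environment before it initializes CUDA, because PyTorch reads the allocator configuration when its caching allocator starts. Older PyTorch releases read `PYTORCH_CUDA_ALLOC_CONF` instead, so it is worth checking which name the installed version honors.

```python
import os
from typing import Mapping, Optional

# Hypothetical model-name markers; the real test's model matching is not shown.
_DEEPSEEK_V3_FAMILY = ("deepseek-v3", "deepseek-r1")


def server_env(model_name: str,
               base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of `base` (default: os.environ) for launching the server.

    For DeepSeek-V3/R1 it enables expandable segments in the PyTorch
    allocator config; other models get the environment unchanged.
    `base` itself is never mutated.
    """
    env = dict(os.environ if base is None else base)
    if any(tag in model_name.lower() for tag in _DEEPSEEK_V3_FAMILY):
        # Overwrites any existing value for this variable.
        env["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"
    return env
```

A caller would pass the result as `env=server_env(model_name)` to `subprocess.Popen`, where the server command itself stays whatever the test already uses. Building a fresh dict rather than editing `os.environ` keeps the setting from leaking into other test cases in the same pytest process.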
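A minimal sketch of the write/read alignment, assuming a hypothetical `artifacts` folder next to the test file. `build_aiperf_cmd` and `list_metric_files` are hypothetical stand-ins for the writer and reader sides; the only aiperf flag used is `--output-artifact-dir`, as named in the commit, and the rest of the aiperf command is left to the caller. Because `ARTIFACTS_DIR` is resolved from `__file__` once at import time, it does not move when pytest runs from a different working directory, whereas a relative path such as `Path("artifacts")` resolves against whatever `os.getcwd()` returns at the moment it is used.

```python
from pathlib import Path
from typing import Sequence

# Hypothetical layout: artifacts live in an "artifacts" folder next to this
# file. Resolving __file__ (not the cwd) pins the path at import time.
ARTIFACTS_DIR = Path(__file__).resolve().parent / "artifacts"


def build_aiperf_cmd(base_cmd: Sequence[str]) -> list[str]:
    """Writer side: append the artifact-directory flag to a caller-built command."""
    return [*base_cmd, "--output-artifact-dir", str(ARTIFACTS_DIR)]


def list_metric_files(artifacts_dir: Path = ARTIFACTS_DIR) -> list[Path]:
    """Reader side (hypothetical): list JSON files under artifacts_dir.

    Defaulting to ARTIFACTS_DIR means a caller that passes nothing reads
    from the same directory the writer targeted. The "*.json" pattern is an
    assumption about the artifact format.
    """
    root = Path(artifacts_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.json"))
```

Using the constant as the reader's default argument keeps the common path zero-configuration, while a test can still pass an explicit directory when it needs isolation.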
Signed-off-by: Wangshanshan <[email protected]>
1 parent 45d15a1
1 file changed
Lines changed: 13 additions & 3 deletions
| Hunk | Original file lines | New file lines | Change |
|---|---|---|---|
| 1 | 72-77 | 72-83 | 6 lines added after original line 74 (new lines 75-80) |
| 2 | 571-576 | 577-585 | 3 lines added after original line 573 (new lines 580-582) |
| 3 | 582-588 | 591-597 | Original line 585 replaced by new line 594 |
| 4 | 954-959 | 963-970 | 2 lines added after original line 956 (new lines 966-967) |
| 5 | 1365-1372 | 1376-1382 | Original lines 1368-1369 replaced by new line 1379 |