@Tabrizian
Member

Fix lost requests for disaggregated serving

Connection reuse appears to sometimes pick up connections that have already been closed. This change disables connection reuse for now. With the fix applied, the log no longer shows any "broken pipe" or "connection reset error" messages. However, some requests are still lost, and I'm investigating those.

Related issue: aio-libs/aiohttp#6138
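
For context, one common way to turn off connection reuse in an aiohttp client is the `force_close` option of `aiohttp.TCPConnector`, which closes each socket after its request completes instead of returning it to the pool. The sketch below is an assumption about how such a change could look, not the code in this PR; `connector_options` and `make_session` are hypothetical helper names.

```python
from typing import Any, Dict


def connector_options(reuse_connections: bool) -> Dict[str, Any]:
    """Build keyword arguments for aiohttp.TCPConnector (hypothetical helper).

    force_close=True makes aiohttp close the socket after every request,
    so a connection the server already closed is never picked up again.
    """
    return {"force_close": not reuse_connections}


async def make_session(reuse_connections: bool = False):
    """Create a client session; call this from inside a running event loop."""
    import aiohttp  # third-party; imported lazily so the helper above has no dependency

    connector = aiohttp.TCPConnector(**connector_options(reuse_connections))
    return aiohttp.ClientSession(connector=connector)
```

The trade-off is a fresh TCP handshake for every request, so this kind of setting is usually a stopgap while the underlying reuse problem is investigated.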

Before this change

============ Serving Benchmark Result ============
Successful requests:                     11959     
Benchmark duration (s):                  171.29    
Total input tokens:                      2620090   
Total generated tokens:                  2372861   
Request throughput (req/s):              69.82     
Output token throughput (tok/s):         13853.07  
Total Token throughput (tok/s):          29149.50  
User throughput (tok/s):                 12.96     
---------------Time to First Token----------------
Mean TTFT (ms):                          12375.93  
Median TTFT (ms):                        13243.47  
P99 TTFT (ms):                           16061.78  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          4.33      
Median TPOT (ms):                        0.04      
P99 TPOT (ms):                           38.12     
---------------Inter-token Latency----------------
Mean ITL (ms):                           7.60      
Median ITL (ms):                         0.03      
P99 ITL (ms):                            0.61      
==================================================

After this change

============ Serving Benchmark Result ============
Successful requests:                     11964     
Benchmark duration (s):                  149.21    
Total input tokens:                      2620592   
Total generated tokens:                  2376949   
Request throughput (req/s):              80.18     
Output token throughput (tok/s):         15930.54  
Total Token throughput (tok/s):          33494.01  
User throughput (tok/s):                 13.90     
---------------Time to First Token----------------
Mean TTFT (ms):                          8803.55   
Median TTFT (ms):                        9334.60   
P99 TTFT (ms):                           12528.99  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          9.50      
Median TPOT (ms):                        5.58      
P99 TPOT (ms):                           46.91     
---------------Inter-token Latency----------------
Mean ITL (ms):                           15.28     
Median ITL (ms):                         0.02      
P99 ITL (ms):                            856.46    
==================================================

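To make the before/after comparison easier to read, the relative change of a few headline metrics can be computed directly from the two tables above. This is only a small arithmetic sketch; the numbers are copied from the tables, and `pct_change` is a helper name introduced here for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the two benchmark tables above.
metrics = {
    "Request throughput (req/s)": (69.82, 80.18),
    "Mean TTFT (ms)": (12375.93, 8803.55),
    "Mean TPOT (ms)": (4.33, 9.50),
    "P99 ITL (ms)": (0.61, 856.46),
}

for name, (before, after) in metrics.items():
    change = pct_change(before, after)
    print(f"{name:<28} {before:>10.2f} -> {after:>10.2f}  ({change:+.1f}%)")
```

Request throughput rises by roughly 15% and mean TTFT falls by roughly 29%, while P99 ITL goes from under 1 ms to more than 850 ms.
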
@Tabrizian Tabrizian force-pushed the user/imant/fixStreaming branch from 825c9a4 to 42bc0e4 Compare July 8, 2025 03:07
@Tabrizian Tabrizian enabled auto-merge (squash) July 8, 2025 03:07
@Tabrizian Tabrizian requested review from arekay and pcastonguay July 8, 2025 03:07
@Tabrizian
Member Author

/bot run

Collaborator

@pcastonguay pcastonguay left a comment

Do you know why the P99 ITL increased substantially after the changes? Can you reproduce it consistently?

@pcastonguay pcastonguay requested a review from kaiyux July 8, 2025 13:58
@pcastonguay
Collaborator

@kaiyux could you review the changes as well? Thanks.

@Tabrizian
Member Author

Tabrizian commented Jul 8, 2025

@pcastonguay After a rerun, the P99 ITL is much more reasonable. The earlier number might have come from running the benchmark before any warmup:

============ Serving Benchmark Result ============
Successful requests:                     11964     
Benchmark duration (s):                  192.66    
Total input tokens:                      2620592   
Total generated tokens:                  2377148   
Request throughput (req/s):              62.10     
Output token throughput (tok/s):         12338.83  
Total Token throughput (tok/s):          25941.28  
User throughput (tok/s):                 2.39      
---------------Time to First Token----------------
Mean TTFT (ms):                          105621.21 
Median TTFT (ms):                        107776.22 
P99 TTFT (ms):                           189838.78 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          6.32      
Median TPOT (ms):                        0.04      
P99 TPOT (ms):                           65.52     
---------------Inter-token Latency----------------
Mean ITL (ms):                           8.90      
Median ITL (ms):                         0.03      
P99 ITL (ms):                            13.00     
==================================================
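
As a rough illustration of why a cold start could inflate P99 ITL while leaving the median untouched, the sketch below computes nearest-rank percentiles over two made-up latency samples. The sample values are hypothetical and are not taken from the benchmark runs; `percentile` is a helper written for this example, not the benchmark tool's implementation.

```python
import math
from typing import List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical inter-token gaps in milliseconds.
warm = [0.03] * 1000                 # every gap is fast
cold = [0.03] * 980 + [850.0] * 20   # 2% of gaps are slow, e.g. before warmup

for label, sample in (("warm", warm), ("cold", cold)):
    print(label, "median:", percentile(sample, 50), "p99:", percentile(sample, 99))
```

With only 2% of the gaps slow, the median stays at 0.03 ms but the P99 jumps to 850 ms, which matches the general shape of the earlier result (tiny median, very large P99).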

@Tabrizian Tabrizian force-pushed the user/imant/fixStreaming branch from 18e6978 to 52496d3 Compare July 8, 2025 19:03
@Tabrizian
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11340 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11340 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8391 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@Tabrizian Tabrizian force-pushed the user/imant/fixStreaming branch from 52496d3 to 76f1a2c Compare July 8, 2025 23:18
@pcastonguay pcastonguay self-requested a review July 8, 2025 23:19
Tabrizian added 3 commits July 8, 2025 16:21
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian force-pushed the user/imant/fixStreaming branch from 76f1a2c to 826e3b4 Compare July 8, 2025 23:21
@Tabrizian
Member Author

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #11353 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11353 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #11340 for commit 826e3b4

@Tabrizian Tabrizian merged commit c508b99 into NVIDIA:main Jul 8, 2025
3 checks passed
@Tabrizian Tabrizian deleted the user/imant/fixStreaming branch July 9, 2025 16:57
nvzhihanj pushed a commit that referenced this pull request Jul 11, 2025
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025