-
Notifications
You must be signed in to change notification settings - Fork 3k
[WIP] SGLang rollout multiturn support #917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dd0017f to
64c458c
Compare
56fa926 to
a67c247
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments to this yaml. Tell others this is ready to use but the validation score hasn't been improved. Welcome others to contribute and improve the algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, chenyang, for "hasn't been improved", what's the baseline here? Thx
| index_start=val.index_start, | ||
| index_end=val.index_end, | ||
| ) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif "multi-turn":
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state the usage of this file at the beginning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state the usage of this file at the beginning
Co-authored-by: Hanchen Zhang <[email protected]> Co-authored-by: Rui Lu <[email protected]> Co-authored-by: Haoran Wang <[email protected]> Co-authored-by: Yujiang Li <[email protected]> Co-authored-by: zhaochenyang20 <[email protected]>
339102f to
0dd7ce8
Compare
f2d0992 to
c7d0f20
Compare
c7d0f20 to
16d1550
Compare
Co-authored-by: Jiajun Li <[email protected]>
a794fb1 to
31f39fe
Compare
…1037) A redesigned version of #917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [[email protected]](mailto:[email protected]) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [[email protected]](mailto:[email protected]) @zyzshishui (Core-dev) - Chenyang Zhao [[email protected]](mailto:[email protected]) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [[email protected]](mailto:[email protected]) - Haoran Wang [[email protected]](mailto:[email protected]) - Rui Lu [[email protected]](mailto:[email protected]) - Yujiang Li [[email protected]](mailto:[email protected]) - Jiajun Li [[email protected]](mailto:[email protected]) - Jin Pan [[email protected]](mailto:[email protected]) - Zhi Zheng [[email protected]](mailto:[email protected]) @zh-zheng --------- Co-authored-by: zyzshishui <[email protected]> Co-authored-by: guanhua <[email protected]> Co-authored-by: zhaochenyang20 <[email protected]> Co-authored-by: ocss884 <[email protected]> Co-authored-by: Shawn/Yuxuan Tong <[email protected]> Co-authored-by: HL <[email protected]>
…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [[email protected]](mailto:[email protected]) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [[email protected]](mailto:[email protected]) @zyzshishui (Core-dev) - Chenyang Zhao [[email protected]](mailto:[email protected]) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [[email protected]](mailto:[email protected]) - Haoran Wang [[email protected]](mailto:[email protected]) - Rui Lu [[email protected]](mailto:[email protected]) - Yujiang Li [[email protected]](mailto:[email protected]) - Jiajun Li [[email protected]](mailto:[email protected]) - Jin Pan [[email protected]](mailto:[email protected]) - Zhi Zheng [[email protected]](mailto:[email protected]) @zh-zheng --------- Co-authored-by: zyzshishui <[email protected]> Co-authored-by: guanhua <[email protected]> Co-authored-by: zhaochenyang20 <[email protected]> Co-authored-by: ocss884 <[email protected]> Co-authored-by: Shawn/Yuxuan Tong <[email protected]> Co-authored-by: HL <[email protected]>
…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [[email protected]](mailto:[email protected]) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [[email protected]](mailto:[email protected]) @zyzshishui (Core-dev) - Chenyang Zhao [[email protected]](mailto:[email protected]) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [[email protected]](mailto:[email protected]) - Haoran Wang [[email protected]](mailto:[email protected]) - Rui Lu [[email protected]](mailto:[email protected]) - Yujiang Li [[email protected]](mailto:[email protected]) - Jiajun Li [[email protected]](mailto:[email protected]) - Jin Pan [[email protected]](mailto:[email protected]) - Zhi Zheng [[email protected]](mailto:[email protected]) @zh-zheng --------- Co-authored-by: zyzshishui <[email protected]> Co-authored-by: guanhua <[email protected]> Co-authored-by: zhaochenyang20 <[email protected]> Co-authored-by: ocss884 <[email protected]> Co-authored-by: Shawn/Yuxuan Tong <[email protected]> Co-authored-by: HL <[email protected]>
…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [[email protected]](mailto:[email protected]) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [[email protected]](mailto:[email protected]) @zyzshishui (Core-dev) - Chenyang Zhao [[email protected]](mailto:[email protected]) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [[email protected]](mailto:[email protected]) - Haoran Wang [[email protected]](mailto:[email protected]) - Rui Lu [[email protected]](mailto:[email protected]) - Yujiang Li [[email protected]](mailto:[email protected]) - Jiajun Li [[email protected]](mailto:[email protected]) - Jin Pan [[email protected]](mailto:[email protected]) - Zhi Zheng [[email protected]](mailto:[email protected]) @zh-zheng --------- Co-authored-by: zyzshishui <[email protected]> Co-authored-by: guanhua <[email protected]> Co-authored-by: zhaochenyang20 <[email protected]> Co-authored-by: ocss884 <[email protected]> Co-authored-by: Shawn/Yuxuan Tong <[email protected]> Co-authored-by: HL <[email protected]>
…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [[email protected]](mailto:[email protected]) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [[email protected]](mailto:[email protected]) @zyzshishui (Core-dev) - Chenyang Zhao [[email protected]](mailto:[email protected]) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [[email protected]](mailto:[email protected]) - Haoran Wang [[email protected]](mailto:[email protected]) - Rui Lu [[email protected]](mailto:[email protected]) - Yujiang Li [[email protected]](mailto:[email protected]) - Jiajun Li [[email protected]](mailto:[email protected]) - Jin Pan [[email protected]](mailto:[email protected]) - Zhi Zheng [[email protected]](mailto:[email protected]) @zh-zheng --------- Co-authored-by: zyzshishui <[email protected]> Co-authored-by: guanhua <[email protected]> Co-authored-by: zhaochenyang20 <[email protected]> Co-authored-by: ocss884 <[email protected]> Co-authored-by: Shawn/Yuxuan Tong <[email protected]> Co-authored-by: HL <[email protected]>
Description
Current Status
What Has Been Done
What Is Pending
Refactoring Update
The refactoring of the multiturn feature is well underway and expected to be completed by the end of this week. Key changes include:
Further Refactor Plan
The next phases of the refactoring effort will build upon the current foundation to support more advanced rollout scenarios: