Open
Labels: bug (Something isn't working)
Description
System Info
----------Python Info----------
Version : 3.11.10
Compiler : GCC 13.3.0
Build : ('main', 'Oct 16 2024 01:27:36')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 25.2
Directory : /opt/conda/lib/python3.11/site-packages/pip
vllm : 0.10.2
sglang : not found.
ray : 2.50.0
torch : 2.8.0
----------verl Info-----------
Version : 0.7.0.dev
Directory : /workspace/verl/verl
Commit Hash : acfcf98ed0dd8997f9dfbd1795e24c49486fba71
----------Platform Info----------
Platform : Linux-5.10.223-212.873.amzn2.x86_64-x86_64-with-glibc2.35
system : Linux
node : d3a56fc99633
release : 5.10.223-212.873.amzn2.x86_64
version : #1 SMP Wed Aug 7 16:53:32 UTC 2024
----------Environment----------
CUDA Runtime : 12.8
CUDA Compiler : Cuda compilation tools, release 12.4, V12.4.131
----------System Info----------
CPU Memory : 1492.03 GB
GPU Count : 8
GPU 1 Type : NVIDIA L40S
GPU 1 Memory : 44.99 GB
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Any training script based on trainer/config/_generated_ppo_trainer.yaml
Expected behavior
The following error occurs after upgrading from verl 0.5 to 0.6. The same script and code ran smoothly on version 0.5.
  File "/opt/conda/lib/python3.11/site-packages/verl/single_controller/ray/base.py", line 700, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/verl/single_controller/base/decorator.py", line 433, in inner
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/verl/workers/fsdp_workers.py", line 797, in init_model
    self._build_rollout(trust_remote_code=self.config.model.get("trust_remote_code", False))
  File "/opt/conda/lib/python3.11/site-packages/verl/workers/fsdp_workers.py", line 633, in _build_rollout
    loop = asyncio.get_event_loop()
  File "/opt/conda/lib/python3.11/site-packages/uvloop/__init__.py", line 206, in get_event_loop
    raise RuntimeError(
RuntimeError: There is no current event loop in thread 'MainThread'.
Has the community or a maintainer encountered a similar issue, and what would be the recommended path forward?
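For reference, the traceback shows asyncio.get_event_loop() raising from inside uvloop because no event loop is set on the calling thread. A minimal sketch of one possible workaround is below, assuming it is acceptable to create a loop and register it as the thread's current loop before the rollout is built. The helper name get_or_create_event_loop is hypothetical; it is not part of verl or uvloop, and this is not a confirmed fix from the maintainers.

```python
import asyncio


def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    """Return the thread's current event loop, creating and registering one if none is set.

    Hypothetical helper for illustration only; not part of verl or uvloop.
    """
    try:
        # Raises RuntimeError when the active policy has no loop for this thread.
        return asyncio.get_event_loop()
    except RuntimeError:
        # The new loop comes from the active policy, so uvloop stays in use if installed.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```

Under that assumption, a call such as loop = asyncio.get_event_loop() could be replaced by loop = get_or_create_event_loop(). Whether this is appropriate inside _build_rollout depends on how verl expects the loop to be managed, which the traceback alone does not show.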