CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning
CapRL++ is a unified RLVR framework for dense image and video captioning that optimizes caption quality through verifiable rewards from downstream question answering.
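
To make the idea of a verifiable question-answering reward concrete, the following is a minimal sketch, not the CapRL++ implementation. It assumes the reward for a caption is the fraction of held-out questions that a text-only answerer gets right when it sees only the caption. The names `qa_reward` and `toy_answerer`, the exact-match scoring rule, and the yes/no question format are all hypothetical choices made for illustration.

```python
import re
from typing import Callable, Sequence, Tuple

# Hypothetical types: a QA pair is (question, gold_answer); an answerer maps
# (caption, question) to a predicted answer using only the caption text.
QAPair = Tuple[str, str]
Answerer = Callable[[str, str], str]


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so answers compare robustly."""
    return " ".join(text.lower().split())


def qa_reward(caption: str, qa_pairs: Sequence[QAPair], answerer: Answerer) -> float:
    """Fraction of questions answered correctly from the caption alone.

    Assumption: exact-match accuracy stands in for the verifiable reward.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        _normalize(answerer(caption, question)) == _normalize(gold)
        for question, gold in qa_pairs
    )
    return correct / len(qa_pairs)


def toy_answerer(caption: str, question: str) -> str:
    """Toy stand-in for a text-only model: answers "yes" if the last word
    of the question appears in the caption, otherwise "no"."""
    caption_words = set(re.findall(r"[a-z0-9]+", caption.lower()))
    question_words = re.findall(r"[a-z0-9]+", question.lower())
    if question_words and question_words[-1] in caption_words:
        return "yes"
    return "no"


caption = "A brown dog runs on the beach."
qa_pairs = [
    ("Is there a dog?", "yes"),
    ("Is there a beach?", "yes"),
    ("Is there a cat?", "no"),
    ("Is there a ball?", "yes"),  # the caption omits the ball, so this one is missed
]
reward = qa_reward(caption, qa_pairs, toy_answerer)  # 0.75
```

In this sketch a caption that omits a detail loses reward on every question that depends on it, which is the sense in which a question-answering reward can favor dense, accurate captions. In an actual RLVR setup the toy answerer would be replaced by a language model and the scalar would feed a policy-gradient update; both are outside the scope of this illustration.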
