Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models

Under review at ECCV 2026, 2026

Reinforcement Learning from Verifiable Rewards (RLVR) has substantially enhanced the reasoning capabilities of large language models on abstract reasoning tasks. However, its application to Large Vision-Language Models (LVLMs) remains constrained by a structural representational bottleneck. We propose KAWHI (Key-Region Aligned Weighted Harmonic Incentive), a plug-and-play reward reweighting mechanism that explicitly incorporates structured visual information into uniform-reward policy optimization methods. KAWHI adaptively localizes semantically salient regions through hierarchical geometric aggregation, identifies vision-critical attention heads via structured attribution, and performs paragraph-level credit reallocation to align spatial visual evidence with semantically decisive reasoning steps.
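
To make the idea of paragraph-level credit reallocation concrete, the following is a minimal illustrative sketch, not the KAWHI formulation from the paper. It assumes a single sequence-level advantage (as produced by a uniform-reward method) and a hypothetical list of non-negative per-paragraph visual-alignment scores. The function name `reweight_paragraph_advantages`, the `strength` parameter, and the linear blending rule are all assumptions introduced here for illustration. The weights are normalized to have mean 1, so the average credit across paragraphs stays equal to the original advantage.

```python
from typing import List


def reweight_paragraph_advantages(
    advantage: float,
    scores: List[float],
    strength: float = 0.5,
) -> List[float]:
    """Spread one sequence-level advantage across paragraphs (illustrative only).

    advantage: scalar advantage shared by every paragraph under uniform reward.
    scores:    hypothetical non-negative visual-alignment score per paragraph.
    strength:  0.0 keeps the uniform assignment; 1.0 is fully proportional to scores.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    n = len(scores)
    if n == 0:
        return []
    total = sum(scores)
    if total == 0:
        # No visual evidence anywhere: fall back to the uniform assignment.
        return [advantage] * n
    # Blend a uniform weight of 1 with a score-proportional weight.
    # Both components have mean 1, so the blended weights also have mean 1.
    weights = [(1.0 - strength) + strength * n * s / total for s in scores]
    return [advantage * w for w in weights]


if __name__ == "__main__":
    # Three reasoning paragraphs; the second is assumed to rely most on the image.
    print(reweight_paragraph_advantages(1.0, [0.1, 0.6, 0.3], strength=0.5))
```

Under these assumptions, paragraphs with higher alignment scores receive a larger share of the credit, while setting `strength` to 0 recovers the original uniform assignment.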

Recommended citation: Yuhang Han, Yuyang Wu, Zhengbo Jiao, Yiyu Wang, Xuyang Liu, Shaobo Wang, Hanlin Xu, Xuming Hu, and Linfeng Zhang. (2026). "Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models." arXiv preprint arXiv:2603.27375.

@article{han2026kawhi,
  title={Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models},
  author={Han, Yuhang and Wu, Yuyang and Jiao, Zhengbo and Wang, Yiyu and Liu, Xuyang and Wang, Shaobo and Xu, Hanlin and Hu, Xuming and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2603.27375},
  year={2026}
}