OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Published in arXiv Technical Report, 2026

Vision-language model (VLM) agents are increasingly evaluated in interactive game environments, but existing benchmarks often emphasize a single first-attempt score and lack unified protocols across heterogeneous agent classes. OmniGameArena introduces twelve newly built Unreal Engine 5 (UE5) games spanning Solo, PvP, and Coop settings. It pairs them with the Improvement Dynamics Curve framework, which measures how agent performance evolves through iterative reflection and how learned skills generalize to held-out task variants.

Recommended citation: Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, Wei Huang, Yitang Li, Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, and Xiaojuan Qi. (2026). "OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics." arXiv preprint arXiv:2606.09826.
@article{lin2026omnigamearena,
  title={OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics},
  author={Lin, Mingxian and Qian, Shengju and Liu, Yuqi and Huang, Yi-Hua and Wang, Yiyu and Huang, Wei and Li, Yitang and Zhang, Fan and Hu, Zeyu and Zhu, Lingting and Wang, Xin and Qi, Xiaojuan},
  journal={arXiv preprint arXiv:2606.09826},
  year={2026}
}