Yiyu Wang
Streaming Video Understanding / Efficient VideoLLMs / VLM Game Agents
王一宇 · Joint Ph.D. Student at HKUST(GZ) & SJTU
I am a joint Ph.D. student at HKUST(GZ) and SJTU, advised by Prof. Xuming Hu and Prof. Linfeng Zhang. I am currently a Research Intern at Tencent. My research builds efficient multimodal systems for real-time video understanding, visual token compression, and interactive VLM game agents.
Streaming Video
Real-time perception and temporal reasoning for long-horizon video streams.
Token Compression
Efficient visual encoding for VideoLLMs under latency and memory budgets.
VLM Game Agents
UE5 benchmarks and improvement dynamics for interactive decision making.
News & Updates
Selected Publications
* denotes equal contribution, † denotes corresponding author.
Accepted 4
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
CVPR 2026
We propose Streaming Token Compression (STC), a plug-and-play hierarchical token compression framework for streaming VideoLLMs that reduces both ViT encoding and LLM pre-filling latency.
@inproceedings{wang2026stc,
title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
author={Wang, Yiyu and Liu, Xuyang and Gui, Xiyan and Lin, Xinying and Yang, Boxue and Liao, Chenfei and Chen, Tailai and Zhang, Linfeng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}
Variation-aware Vision Token Dropping for Faster Large Vision-Language Models
CVPR 2026
V2Drop progressively removes visual tokens with minimal variation during LVLM inference, preserving image and video performance while reducing generation latency.
@inproceedings{chen2026v2drop,
title={Variation-aware Vision Token Dropping for Faster Large Vision-Language Models},
author={Chen, Junjie and Liu, Xuyang and Wen, Zichen and Wang, Yiyu and Huang, Siteng and Chen, Honggang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
EMNLP 2025 Main
VidCom² adaptively adjusts compression intensity across video frames, retaining performance with a much smaller token budget and lower latency.
@inproceedings{liu2025vidcom2,
title={Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models},
author={Liu, Xuyang and Wang, Yiyu and Ma, Junpeng and Zhang, Linfeng},
booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing},
year={2025}
}
VTC-Bench: Are We Using the Right Benchmark? An Evaluation Framework for Visual Token Compression Methods
ACL 2026
VTC-Bench provides a systematic evaluation framework for visual token compression across image, video, and long-context understanding tasks.
@inproceedings{liao2026vtc,
title={Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods},
author={Liao, Chenfei and Wang, Wensong and Wen, Zichen and Zheng, Xu and Wang, Yiyu and He, Haocong and Lyu, Yuanhuiyi and Jiang, Lutao and Zou, Xin and Fu, Yuqian and Ren, Bin and Zhang, Linfeng and Hu, Xuming},
booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
year={2026}
}
Latest Preprint 1
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
arXiv 2026
OmniGameArena introduces twelve UE5 games for VLM game agents and evaluates improvement dynamics under iterative agentic reflection.
@article{lin2026omnigamearena,
title={OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics},
author={Lin, Mingxian and Qian, Shengju and Liu, Yuqi and Huang, Yi-Hua and Wang, Yiyu and Huang, Wei and Li, Yitang and Zhang, Fan and Hu, Zeyu and Zhu, Lingting and Wang, Xin and Qi, Xiaojuan},
journal={arXiv preprint arXiv:2606.09826},
year={2026}
}