Yiyu Wang

Streaming Video Understanding / Efficient VideoLLMs / VLM Game Agents

王一宇 · Joint Ph.D. Student at HKUST(GZ) & SJTU

I am a joint Ph.D. student at HKUST(GZ) and SJTU, advised by Prof. Xuming Hu and Prof. Linfeng Zhang. I am currently a Research Intern at Tencent. My research builds efficient multimodal systems for real-time video understanding, visual token compression, and interactive VLM game agents.

HKUST(GZ) · SJTU · Tencent · Streaming Video · Efficient VideoLLM · VLM Game Agents

Streaming Video

Real-time perception and temporal reasoning for long-horizon video streams.

Token Compression

Efficient visual encoding for VideoLLMs under latency and memory budgets.

VLM Game Agents

UE5 benchmarks and improvement dynamics for interactive decision making.

News & Updates

ACL 2026 · Accepted · VTC-Bench accepted to ACL 2026.
arXiv 2026 · New Preprint · OmniGameArena released for VLM game-agent evaluation.
ECCV 2026 · Under Review · V-CAST submitted to ECCV 2026.
CVPR 2026 · Accepted · STC accepted to CVPR 2026.
EMNLP 2025 · Accepted · VidCom² accepted to EMNLP 2025.

Selected Publications

* denotes equal contribution, † denotes corresponding author.

Accepted 4

First Author

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, and Linfeng Zhang.

CVPR 2026

We propose Streaming Token Compression (STC), a plug-and-play hierarchical token compression framework for streaming VideoLLMs that reduces both ViT encoding and LLM pre-filling latency.

Citation
Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, and Linfeng Zhang. (2026). "Accelerating Streaming Video Large Language Models via Hierarchical Token Compression." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
BibTeX
@inproceedings{wang2026stc,
  title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
  author={Wang, Yiyu and Liu, Xuyang and Gui, Xiyan and Lin, Xinying and Yang, Boxue and Liao, Chenfei and Chen, Tailai and Zhang, Linfeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}

Variation-aware Vision Token Dropping for Faster Large Vision-Language Models

Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, and Honggang Chen.

CVPR 2026

V2Drop progressively removes visual tokens with minimal variation during LVLM inference, preserving image and video performance while reducing generation latency.

Citation
Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, and Honggang Chen. (2026). "Variation-aware Vision Token Dropping for Faster Large Vision-Language Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
BibTeX
@inproceedings{chen2026v2drop,
  title={Variation-aware Vision Token Dropping for Faster Large Vision-Language Models},
  author={Chen, Junjie and Liu, Xuyang and Wen, Zichen and Wang, Yiyu and Huang, Siteng and Chen, Honggang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}

First Author

Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

Xuyang Liu*, Yiyu Wang*, Junpeng Ma, and Linfeng Zhang.

EMNLP 2025 Main

VidCom² adaptively adjusts compression intensity across video frames, retaining performance with a much smaller token budget and lower latency.

Citation
Xuyang Liu, Yiyu Wang, Junpeng Ma, and Linfeng Zhang. (2025). "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models." Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
BibTeX
@inproceedings{liu2025vidcom2,
  title={Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models},
  author={Liu, Xuyang and Wang, Yiyu and Ma, Junpeng and Zhang, Linfeng},
  booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing},
  year={2025}
}

VTC-Bench: Are We Using the Right Benchmark? An Evaluation Framework for Visual Token Compression Methods

Chenfei Liao, Wensong Wang, Zichen Wen, Xu Zheng, Yiyu Wang, Haocong He, Yuanhuiyi Lyu, Lutao Jiang, Xin Zou, Yuqian Fu, Bin Ren, Linfeng Zhang, and Xuming Hu.

ACL 2026

VTC-Bench provides a systematic evaluation framework for visual token compression across image, video, and long-context understanding tasks.

Citation
Chenfei Liao, Wensong Wang, Zichen Wen, Xu Zheng, Yiyu Wang, Haocong He, Yuanhuiyi Lyu, Lutao Jiang, Xin Zou, Yuqian Fu, Bin Ren, Linfeng Zhang, and Xuming Hu. (2026). "Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods." Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
BibTeX
@inproceedings{liao2026vtc,
  title={Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods},
  author={Liao, Chenfei and Wang, Wensong and Wen, Zichen and Zheng, Xu and Wang, Yiyu and He, Haocong and Lyu, Yuanhuiyi and Jiang, Lutao and Zou, Xin and Fu, Yuqian and Ren, Bin and Zhang, Linfeng and Hu, Xuming},
  booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  year={2026}
}

Latest Preprint 1

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, Wei Huang, Yitang Li, Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, and Xiaojuan Qi.

arXiv 2026

OmniGameArena introduces twelve UE5 games for VLM game agents and evaluates improvement dynamics under iterative agentic reflection.

Citation
Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, Wei Huang, Yitang Li, Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, and Xiaojuan Qi. (2026). "OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics." arXiv preprint arXiv:2606.09826.
BibTeX
@article{lin2026omnigamearena,
  title={{OmniGameArena}: A Unified {UE5} Benchmark for {VLM} Game Agents with Improvement Dynamics},
  author={Lin, Mingxian and Qian, Shengju and Liu, Yuqi and Huang, Yi-Hua and Wang, Yiyu and Huang, Wei and Li, Yitang and Zhang, Fan and Hu, Zeyu and Zhu, Lingting and Wang, Xin and Qi, Xiaojuan},
  journal={arXiv preprint arXiv:2606.09826},
  year={2026}
}

Academic Service

Conference Reviewer
CVPR, ECCV, ACM MM, EMNLP