Yiyu Wang

Yiyu Wang 👋
王一宇 · Joint Ph.D. @ HKUST(GZ) & SJTU
I am a joint Ph.D. student at HKUST(GZ) and SJTU, advised by Prof. Xuming Hu and Prof. Linfeng Zhang. I am currently a Research Intern at Tencent.

My research focuses on Streaming Video Understanding: building systems that perceive, reason, and act under real-time constraints with tight token budgets and precise temporal grounding.
🏫 HKUST(GZ) · 🏫 SJTU · 💼 Tencent · 🎥 Streaming · ⚡ Efficient VideoLLM
📡 Streaming Video: Real-time perception & reasoning
⚡ Token Compression: Efficient visual encoding
📊 Data-Centric AI: Quality over quantity

🔥 News & Updates

ACL 2026 📝 Under Review: VTC-Bench submitted to ACL 2026.
ECCV 2026 📝 Under Review: V-CAST submitted to ECCV 2026.
CVPR 2026 🏆 Accepted: STC accepted to CVPR 2026.
EMNLP 2025 🏆 Accepted: VidCom² accepted to EMNLP 2025.

๐Ÿ“ Publications

* denotes equal contribution, † denotes corresponding author.

Accepted 3

โญ First Author

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, and Linfeng Zhang.
CVPR 2026
We propose Streaming Token Compression (STC), the first plug-and-play hierarchical token compression framework for streaming VideoLLMs. It introduces two components, STC-Cacher and STC-Pruner, and retains up to 99% accuracy while reducing ViT encoding latency and LLM pre-filling latency by 24.5% and 45.3%, respectively.
📋 Citation
Yiyu Wang, Xuyang Liu, Xiyan Gui, Xinying Lin, Boxue Yang, Chenfei Liao, Tailai Chen, and Linfeng Zhang. (2026). "Accelerating Streaming Video Large Language Models via Hierarchical Token Compression." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
๐Ÿ“ BibTeX
@inproceedings{wang2026stc,
  title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
  author={Wang, Yiyu and Liu, Xuyang and Gui, Xiyan and Lin, Xinying and Yang, Boxue and Liao, Chenfei and Chen, Tailai and Zhang, Linfeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}

Variation-aware Vision Token Dropping for Faster Large Vision-Language Models

Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, and Honggang Chen.
CVPR 2026
We propose V2Drop, which progressively removes visual tokens with minimal variation during LVLM inference, maintaining 94.0% and 98.6% of original performance for image and video tasks respectively, while reducing LLM generation latency by 31.5% and 74.2%.
📋 Citation
Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, and Honggang Chen. (2026). "Variation-aware Vision Token Dropping for Faster Large Vision-Language Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
๐Ÿ“ BibTeX
@inproceedings{chen2026v2drop,
  title={Variation-aware Vision Token Dropping for Faster Large Vision-Language Models},
  author={Chen, Junjie and Liu, Xuyang and Wen, Zichen and Wang, Yiyu and Huang, Siteng and Chen, Honggang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
โญ First Author

Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

Xuyang Liu*, Yiyu Wang*, Junpeng Ma, and Linfeng Zhang.
EMNLP 2025 Main
We propose VidCom², a plug-and-play inference acceleration framework for VideoLLMs that adaptively adjusts compression intensity across frames. It achieves 99.6% performance retention with only 25% of the tokens and a 70.8% latency reduction.
📋 Citation
Xuyang Liu, Yiyu Wang, Junpeng Ma, and Linfeng Zhang. (2025). "Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models." Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
๐Ÿ“ BibTeX
@inproceedings{liu2025vidcom2,
  title={Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models},
  author={Liu, Xuyang and Wang, Yiyu and Ma, Junpeng and Zhang, Linfeng},
  booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing},
  year={2025}
}

Under Review 4

VTC-Bench: Are We Using the Right Benchmark? An Evaluation Framework for Visual Token Compression Methods

Chenfei Liao, Wensong Wang, Zichen Wen, Xu Zheng, Yiyu Wang, Haocong He, Yuanhuiyi Lyu, Lutao Jiang, Xin Zou, Yuqian Fu, Bin Ren, Linfeng Zhang, and Xuming Hu.
ACL 2026 (Under Review)
We propose VTC-Bench, the first comprehensive evaluation framework for visual token compression methods across image and video understanding tasks, revealing critical insights about current benchmarks.
📋 Citation
Chenfei Liao, Wensong Wang, Zichen Wen, Xu Zheng, Yiyu Wang, Haocong He, Yuanhuiyi Lyu, Lutao Jiang, Xin Zou, Yuqian Fu, Bin Ren, Linfeng Zhang, and Xuming Hu. (2026). "Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods." Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
๐Ÿ“ BibTeX
@inproceedings{liao2026vtc,
  title={Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods},
  author={Liao, Chenfei and Wang, Wensong and Wen, Zichen and Zheng, Xu and Wang, Yiyu and He, Haocong and Lyu, Yuanhuiyi and Jiang, Lutao and Zou, Xin and Fu, Yuqian and Ren, Bin and Zhang, Linfeng and Hu, Xuming},
  booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  year={2026}
}

🎓 Academic Service

โœ๏ธ
Conference Reviewer
CVPR ECCV ACM MM EMNLP