VTC-Bench: Are We Using the Right Benchmark? An Evaluation Framework for Visual Token Compression Methods
Submitted to ACL 2026 (under review), 2025
Visual token compression has emerged as a key technique for accelerating multimodal large language models. However, existing benchmarks often fail to evaluate these methods comprehensively across diverse scenarios. We propose VTC-Bench, a systematic evaluation framework that assesses visual token compression methods on image understanding, video understanding, and long-context tasks, providing insights for future research directions.
Recommended citation: Chenfei Liao, Wensong Wang, Zichen Wen, Xu Zheng, Yiyu Wang, Haocong He, Yuanhuiyi Lyu, Lutao Jiang, Xin Zou, Yuqian Fu, Bin Ren, Linfeng Zhang, and Xuming Hu. (2026). "Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods." Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
@inproceedings{liao2026vtc,
  title={Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods},
  author={Liao, Chenfei and Wang, Wensong and Wen, Zichen and Zheng, Xu and Wang, Yiyu and He, Haocong and Lyu, Yuanhuiyi and Jiang, Lutao and Zou, Xin and Fu, Yuqian and Ren, Bin and Zhang, Linfeng and Hu, Xuming},
  booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  year={2026}
}