Publications

Expanding mmWave Datasets for Human Pose Estimation with Unlabeled Data and LiDAR Datasets

Published in CVPR, 2026

EMDUL improves mmWave human pose estimation by expanding datasets with pseudo-labeled and LiDAR-translated point clouds, significantly boosting accuracy and generalization.

Recommended citation: Expanding mmWave Datasets for Human Pose Estimation with Unlabeled Data and LiDAR Datasets, Zhuoxuan Peng, Boan Zhu, Xingjian Zhang, Wenying Li, S.-H. Gary Chan, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

Published in BEAM (CVPR Workshop), 2025

This study identifies limitations in existing REC benchmarks and introduces Ref-L4, a comprehensive benchmark with diverse objects, longer expressions, and a larger vocabulary, to better evaluate modern REC models.

Recommended citation: Chen, J., Wei, F., Zhao, J., Song, S., Wu, B., Peng, Z., Chan, S.G., & Zhang, H. (2024). Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 513-524.

Single Domain Generalization for Crowd Counting

Published in CVPR, 2024

We propose MPCount to address two challenges in single domain generalization for crowd counting: the regression nature of the task and label ambiguity.

Recommended citation: Single Domain Generalization for Crowd Counting, Zhuoxuan Peng, S.-H. Gary Chan, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

CounTr: A Novel End-to-End Transformer Approach for Single Image Crowd Counting

Published in IWDSC (ECCV Workshop), 2022

We introduce CounTr, a novel end-to-end transformer approach for crowd counting and density estimation that captures global context in every layer of the Transformer.

Recommended citation: Bai, H., He, H., Peng, Z., Dai, T., Chan, SH.G. (2023). CounTr: An End-to-End Transformer Approach for Crowd Counting and Density Estimation. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_16

PENG Zhuoxuan
