I am a Ph.D. student in the Chinese Information Processing Group (CIP) at the Institute of Automation, Chinese Academy of Sciences, advised by Yu Zhou and Chengqing Zong. I expect to graduate in June 2026. Before that, I received my B.E. degree from the School of Automation Science and Electrical Engineering at Beihang University in 2021.
My research focuses on multimodal large language models, document AI (document QA, understanding, and reasoning), and machine translation. My research has led to several publications at top AI conferences and journals, including ACL, NAACL, EMNLP, TASLP, and TPAMI. I currently serve as a reviewer for top conferences and journals, including ACL, NeurIPS, and TASLP.
We are organizing the first ICDAR 2025 Competition on End-to-end Document Image Machine Translation [DIMT25@ICDAR].
Outside of research, I enjoy playing the erhu (a traditional Chinese instrument), badminton, running, and swimming. I am a member of the string section in the student Chinese orchestras of Beihang University and the University of Chinese Academy of Sciences. I also manage a Bilibili account [1398ๅท็ๅฌๅ] where I share my reviews of anime, movies, and books. Feel free to follow!
News
[2025.07] I gave an online talk about SSR. [Bilibili]
[2025.05] Our new work DocRTN has been accepted by TASLP.
[2025.05] The code for SSR has been released. [GitHub]
[2025.05] The code for M4Doc has been released. [GitHub]
[2025.05] Our new work M4Doc has been accepted by ACL 2025 Main.
[2025.05] Our new works SSR and QRDIT have been accepted by ACL 2025 Findings.
[2025.03] Submission site of DIMT25@ICDAR opens.
[2025.01] Our new work ZoomDIT has been accepted by COLING 2025.
[2025.01] Our new work UniDIT has been accepted by TPAMI.
[2024.04] The DoTA dataset and code for DIMTDA have been released. [GitHub]
[2024.03] Our new work DIMTDA has been accepted by NAACL 2024 Main.
[2024.02] Our new work BabNet has been accepted by LREC-COLING 2024.
[2023.10] Our new work LayoutDIT has been accepted by EMNLP 2023 Findings.
Publications
- Yupu Liang, Yaping Zhang, Zhiyang Zhang, Yang Zhao, Lu Xiang, Chengqing Zong, Yu Zhou. Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). 2025. [ACL Anthology]
- Yupu Liang, Yaping Zhang, Zhiyang Zhang, Zhiyuan Chen, Yang Zhao, Lu Xiang, Chengqing Zong, Yu Zhou. Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency. Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). 2025. [ACL Anthology]
- Yupu Liang, Yaping Zhang, Cong Ma, Zhiyang Zhang, Yang Zhao, Lu Xiang, Chengqing Zong, Yu Zhou. Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024). 2024. [ACL Anthology]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Cong Ma, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. Reading when Translating: Multi-Modal Document Image Machine Translation with Reading Flow Prediction. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). 2025. [IEEE]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Zhiyuan Chen, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration. Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). 2025. [ACL Anthology]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. From Chaotic OCR Words to Coherent Document: A Fine-to-Coarse Zoom-Out Network for Complex-Layout Document Image Translation. Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025). 2025. [ACL Anthology]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Cong Ma, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. Understand Layout and Translate Text: Unified Feature-Conductive End-to-End Document Image Translation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). 2025. [IEEE]
- Cong Ma, Yaping Zhang, Zhiyang Zhang, Yupu Liang, Yang Zhao, Yu Zhou, Chengqing Zong. Born a BabNet with Hierarchical Parental Supervision for End-to-End Text Image Machine Translation. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024. [ACL Anthology]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. LayoutDIT: Layout-Aware End-to-End Document Image Translation with Multi-Step Conductive Decoder. Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). 2023. [ACL Anthology]
- Bing Sun, Yi Xu, Shuhang Xie, Dong Xu, Yupu Liang. Data Processing Methods of Flow Field Based on Artificial Lateral Line Pressure Sensors. Journal of Bionic Engineering (JBE). 2022. [Springer]
Education
[2021.09 - Now] Ph.D. in Computer Science, Institute of Automation, Chinese Academy of Sciences
- Outstanding Student
[2017.09 - 2021.06] B.E. in Automation, School of Automation Science and Electrical Engineering, Beihang University
- National Scholarship
- Outstanding Graduate of Beijing
Internships
[2025.06 - Now] Tencent, Hunyuan Large Language Model Department
- Explore methods to improve LLM's ability to understand long documents in the pre-training and post-training stages
[2025.03 - 2025.05] Xiaohongshu (RedNote), Applied Algorithms Department
- Explored the use of reinforcement learning to improve the image translation capabilities of MLLMs
[2024.06 - 2025.02] Huawei, Celia Department
- Explored the application of end-to-end image translation models in mobile phone photography scenarios