About Me
Yupu Liang is a Ph.D. student in the Chinese Information Processing Group (CIP), Institute of Automation, Chinese Academy of Sciences, advised by Yu Zhou and Chengqing Zong. Before that, he received his B.E. degree from the School of Automation Science and Electrical Engineering at Beihang University in 2021.
His research focuses on multimodal large language models, document AI, and machine translation.
News
[2025.05] Our new work Reading when Translating: Multi-Modal Document Image Machine Translation with Reading Flow Prediction has been accepted by TASLP.
[2025.05] The code for SSR has been released. [GitHub]
[2025.05] The code for M4Doc has been released. [GitHub]
[2025.05] Our new work Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation has been accepted by ACL 2025 Main.
[2025.05] Our new works Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency and A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration have been accepted by ACL 2025 Findings.
[2025.01] Our new work From Chaotic OCR Words to Coherent Document: A Fine-to-Coarse Zoom-Out Network for Complex-Layout Document Image Translation has been accepted by COLING 2025.
[2025.01] Our new work Understand Layout and Translate Text: Unified Feature-Conductive End-to-End Document Image Translation has been accepted by TPAMI.
[2024.04] The DoTA dataset and the code for DIMTDA have been released. [GitHub]
[2024.03] Our new work Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling has been accepted by NAACL 2024 Main.
[2024.02] Our new work Born a BabNet with Hierarchical Parental Supervision for End-to-End Text Image Machine Translation has been accepted by LREC-COLING 2024.
[2023.10] Our new work LayoutDIT: Layout-Aware End-to-End Document Image Translation with Multi-Step Conductive Decoder has been accepted by EMNLP 2023 Findings.
Publications
- Yupu Liang, Yaping Zhang, Zhiyang Zhang, Yang Zhao, Lu Xiang, Chengqing Zong, Yu Zhou. Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). 2025.
- Yupu Liang, Yaping Zhang, Zhiyang Zhang, Zhiyuan Chen, Yang Zhao, Lu Xiang, Chengqing Zong, Yu Zhou. Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency. In Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). 2025.
- Yupu Liang, Yaping Zhang, Cong Ma, Zhiyang Zhang, Yang Zhao, Lu Xiang, Chengqing Zong, Yu Zhou. Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024). 2024. [ACL Anthology]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Cong Ma, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. Reading when Translating: Multi-Modal Document Image Machine Translation with Reading Flow Prediction. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). 2025.
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Zhiyuan Chen, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration. In Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). 2025.
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. From Chaotic OCR Words to Coherent Document: A Fine-to-Coarse Zoom-Out Network for Complex-Layout Document Image Translation. In Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025). 2025. [ACL Anthology]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Cong Ma, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. Understand Layout and Translate Text: Unified Feature-Conductive End-to-End Document Image Translation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). 2025. [IEEE]
- Cong Ma, Yaping Zhang, Zhiyang Zhang, Yupu Liang, Yang Zhao, Yu Zhou, Chengqing Zong. Born a BabNet with Hierarchical Parental Supervision for End-to-End Text Image Machine Translation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024. [ACL Anthology]
- Zhiyang Zhang, Yaping Zhang, Yupu Liang, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong. LayoutDIT: Layout-Aware End-to-End Document Image Translation with Multi-Step Conductive Decoder. In Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). 2023. [ACL Anthology]
- Bing Sun, Yi Xu, Shuhang Xie, Dong Xu, Yupu Liang. Data Processing Methods of Flow Field Based on Artificial Lateral Line Pressure Sensors. Journal of Bionic Engineering (JBE). 2022. [Springer]
Education
[2021.09 - Now] Ph.D. in Computer Science, Institute of Automation, Chinese Academy of Sciences
- Outstanding Student
[2017.09 - 2021.06] B.E. in Automation, School of Automation Science and Electrical Engineering, Beihang University
- National Scholarship
- Outstanding Graduate of Beijing
Internships
[2024.06 - 2025.02] Huawei, Celia Department
Explored the application of end-to-end text image machine translation models to mobile photography scenarios.
The profile picture was generated by Stable Diffusion.