Posts by Collection



Data Processing Methods of Flow Field Based on Artificial Lateral Line Pressure Sensors

Published in Journal of Bionic Engineering, 2022

Estimating the type and parameters of a flow field is important for robotic fish. Existing estimation methods cannot meet the requirements of robotic fish because they lack prior knowledge or under-fit the model. A processing method comprising data preprocessing, feature extraction, feature selection, flow type classification and flow field parameter estimation is proposed, based on data from the pressure sensors in an artificial lateral line. A Probabilistic Neural Network (PNN) is used to classify the flow field type, and the Generalized Regressive Neural Network (GRNN) is found to be the best choice for estimating the flow field parameters. In addition, several filtering methods for data preprocessing, three feature selection methods and nine parameter estimation methods are analyzed in order to choose the better-performing ones. The proposed method is verified by experiments with both simulated and real data.
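As a rough illustration of the two model families named in the abstract, here is a minimal sketch of the textbook Gaussian-kernel forms of a PNN classifier and a GRNN regressor. It is not the paper's implementation: the function names, the `sigma` smoothing value, the two-dimensional feature vectors, the flow-type labels and the flow-speed targets are all assumptions made up for the example.

```python
import math


def gaussian_kernel(x, center, sigma):
    """Gaussian weight of a stored training pattern `center` for query `x`."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def pnn_classify(x, patterns, labels, sigma=0.5):
    """PNN-style decision: choose the class with the largest mean kernel response."""
    sums, counts = {}, {}
    for pattern, label in zip(patterns, labels):
        sums[label] = sums.get(label, 0.0) + gaussian_kernel(x, pattern, sigma)
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])


def grnn_predict(x, patterns, targets, sigma=0.5):
    """GRNN-style estimate: kernel-weighted average of the training targets."""
    weights = [gaussian_kernel(x, pattern, sigma) for pattern in patterns]
    total = sum(weights)
    if total == 0.0:
        # Query is far from every pattern; fall back to the plain mean.
        return sum(targets) / len(targets)
    return sum(w * t for w, t in zip(weights, targets)) / total


# Hypothetical features extracted from pressure-sensor readings.
patterns = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
flow_types = ["uniform", "uniform", "vortex", "vortex"]  # assumed labels
flow_speeds = [0.10, 0.12, 0.30, 0.28]                   # assumed targets

query = [0.05, 0.0]
predicted_type = pnn_classify(query, patterns, flow_types)
predicted_speed = grnn_predict(query, patterns, flow_speeds)
print(predicted_type, round(predicted_speed, 3))
```

Both functions store the training patterns directly and have a single smoothing parameter, `sigma`, which in practice would be tuned on held-out data.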

LayoutDIT: Layout-Aware End-to-End Document Image Translation with Multi-Step Conductive Decoder

Published in EMNLP 2023 Findings, 2023

Document image translation (DIT) aims to translate text embedded in images from one language to another. It is a challenging task that requires understanding visual layout and text semantics simultaneously. However, existing methods struggle to capture the crucial visual layout in real-world complex document images. In this work, we make the first attempt to incorporate layout knowledge into DIT in an end-to-end way. Specifically, we propose a novel Layout-aware end-to-end Document Image Translation (LayoutDIT) model with a multi-step conductive decoder. A layout-aware encoder is first introduced to model visual layout relations with raw OCR results. Then a novel multi-step conductive decoder, unified through hidden-state conduction across three step-decoders, carries out the document translation step by step. Benefiting from the layout-aware end-to-end joint training, our LayoutDIT outperforms state-of-the-art methods with better parameter efficiency. In addition, we create a new multi-domain document image translation dataset to validate the model’s generalization. Extensive experiments show that LayoutDIT generalizes well to diverse and complex layout scenes.

Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling

Published in NAACL 2024 Main, 2024

Text image machine translation (TIMT) is the task of translating source text embedded in an image into a target language. Existing TIMT work mainly focuses on text-line-level images. In this paper, we extend the current TIMT task and propose a novel task, Document Image Machine Translation to Markdown (DIMT2Markdown), which aims to translate a source document image with long context and complex layout structure into a Markdown-formatted target translation. We also introduce a novel framework, Document Image Machine Translation with Dynamic multi-pre-trained models Assembling (DIMTDA). A dynamic model assembler is used to integrate multiple pre-trained models to enhance the model’s layout understanding and translation capabilities. Moreover, we build a novel large-scale Document image machine Translation dataset of ArXiv articles in markdown format (DoTA), containing 126K image-translation pairs. Extensive experiments demonstrate the feasibility of end-to-end translation of rich-text document images and the effectiveness of DIMTDA.



Teaching experience 1

Undergraduate course, University 1, Department, 2014


Teaching experience 2

Workshop, University 1, Department, 2015
