Attention Guidance by Cross-Domain Supervision Signals for Scene Text Recognition

Xue, Fanfu
Sun, Jiande
Xue, Yaqi
Wu, Qiang
Zhu, Lei
Chang, Xiaojun
Cheung, Sen-Ching
Department
Computer Vision
Type
Journal article
Date
2025
Language
English
Abstract
Despite recent advances, scene text recognition remains a challenging problem because of the significant variability, irregularity, and distortion in text appearance and localization. Attention-based methods have become mainstream thanks to their superior vocabulary learning and observation ability. Nonetheless, they are susceptible to attention drift, which can lead to word recognition errors. Most works focus on correcting attention drift during decoding but completely ignore the error accumulated during the encoding process. In this paper, we propose a novel scheme, Attention Guidance by Cross-Domain Supervision Signals for Scene Text Recognition (ACDS-STR), which mitigates attention drift at the feature encoding stage. At the heart of the proposed scheme is the cross-domain attention guidance and feature encoding fusion module (CAFM), which uses the core areas of characters to recursively guide attention learning during the encoding process. With precise attention information sourced from CAFM, we propose a non-attention-based adaptive transformation decoder (ATD) to guarantee decoding performance and improve decoding speed. In the training stage, we fuse manual guidance and subjective learning to learn the core areas of characters, which notably improves the recognition performance of the model. Experiments on public benchmarks show state-of-the-art performance. The source code will be available at https://github.com/xuefanfu/ACDS-STR.
Citation
F. Xue et al., "Attention Guidance by Cross-Domain Supervision Signals for Scene Text Recognition," in IEEE Transactions on Image Processing, vol. 34, pp. 717-728, 2025, doi: 10.1109/TIP.2024.3523799
Source
IEEE Transactions on Image Processing
Keywords
Feature extraction, Text recognition, Decoding, Encoding, Image segmentation, Image coding, Transformers, Iron, Electronic mail, Computer vision
Publisher
IEEE