Abstract
LSTM and attention mechanisms have been widely used for scene text recognition. However, existing LSTM-based recognizers usually convert 2D feature maps into 1D sequences by flattening or pooling, discarding the spatial information of text images. In addition, the attention drift problem, where the model fails to align each target character with the proper feature region, seriously degrades the recognition performance of existing models. To tackle these problems, in this paper we propose a scene text Recognizer with Encoded Location and Focused Attention, i.e., ReELFA. Our ReELFA utilizes one-hot encoded coordinates to indicate the spatial relationships among pixels, and character center masks to help focus attention on the right feature regions. Experiments conducted on the benchmark datasets IIIT5K, SVT, CUTE and IC15 demonstrate that the proposed method achieves comparable performance on regular, low-resolution and noisy text images, and outperforms state-of-the-art approaches on the more challenging curved text images.
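The abstract does not spell out how the one-hot encoded coordinates are attached to the features, so the following is only a minimal sketch of the general idea: appending one-hot row and column index channels to a 2D CNN feature map so that each pixel's absolute position is explicit to downstream attention layers. All names, shapes and the NumPy formulation here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def add_onehot_coords(feature_map):
    """Append one-hot encoded x/y coordinate channels to a 2D feature map.

    feature_map: array of shape (H, W, C).
    Returns an array of shape (H, W, C + H + W): each pixel gains a one-hot
    row index (H extra channels) and a one-hot column index (W extra
    channels), making its spatial position explicit to later layers.
    This is an illustrative sketch, not the paper's exact formulation.
    """
    h, w, _ = feature_map.shape
    row_onehot = np.eye(h)[:, None, :].repeat(w, axis=1)  # (H, W, H)
    col_onehot = np.eye(w)[None, :, :].repeat(h, axis=0)  # (H, W, W)
    return np.concatenate([feature_map, row_onehot, col_onehot], axis=-1)

# Hypothetical CNN features of a text image: height 8, width 25, 64 channels.
fm = np.random.rand(8, 25, 64)
out = add_onehot_coords(fm)
print(out.shape)  # (8, 25, 97)
```

In practice such coordinate channels would be concatenated once and fed through the rest of the network, letting the attention module distinguish otherwise identical features at different positions.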
Original language | English |
---|---|
Pages | 71-76 |
Number of pages | 6 |
DOIs | |
Publication status | Published - 2019 |
Externally published | Yes |
Event | 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop - Sydney, Australia Duration: 21 Sept 2019 → 22 Sept 2019 |
Conference
Conference | 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop |
---|---|
Country/Territory | Australia |
City | Sydney |
Period | 21/09/19 → 22/09/19 |
Keywords
- Attention LSTM
- Attention drift
- Center masks
- Encoded location
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Vision and Pattern Recognition
- Signal Processing
- Media Technology