ReELFA: A scene text recognizer with encoded location and focused attention

Qingqing Wang, Wenjing Jia, Xiangjian He, Yue Lu, Michael Blumenstein, Ye Huang, Shujing Lyu

Research output: Contribution to conference › Paper › peer-review

5 Citations (Scopus)

Abstract

LSTM and attention mechanism have been widely used for scene text recognition. However, the existing LSTM-based recognizers usually convert 2D feature maps into 1D space by flattening or pooling operations, resulting in the neglect of spatial information of text images. Additionally, the attention drift problem, where models fail to align targets at proper feature regions, has a serious impact on the recognition performance of existing models. To tackle the above problems, in this paper, we propose a scene text Recognizer with Encoded Location and Focused Attention, i.e., ReELFA. Our ReELFA utilizes one-hot encoded coordinates to indicate the spatial relationship of pixels and character center masks to help focus attention on the right feature areas. Experiments conducted on the benchmarking datasets IIIT5K, SVT, CUTE and IC15 demonstrate that the proposed method achieves comparable performance on the regular, low-resolution and noisy text images, and outperforms state-of-the-art approaches on the more challenging curved text images.
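The abstract's idea of one-hot encoded coordinates can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the codes are built or where they are attached, so the function name `append_location_channels`, the channel-first nested-list layout, and the choice to append one channel per row and one per column are all assumptions made for illustration.

```python
def append_location_channels(features):
    """Append one-hot row and column codes to a C x H x W feature map.

    Hypothetical helper, not taken from the paper. `features` is a nested
    list indexed as features[c][y][x]. The result has C + H + W channels:
    the original C channels, then H row-indicator channels, then W
    column-indicator channels.
    """
    height = len(features[0])
    width = len(features[0][0])

    # Row channel r holds 1.0 on every pixel of row r and 0.0 elsewhere.
    row_channels = [
        [[1.0 if y == r else 0.0 for _ in range(width)] for y in range(height)]
        for r in range(height)
    ]
    # Column channel k holds 1.0 on every pixel of column k and 0.0 elsewhere.
    col_channels = [
        [[1.0 if x == k else 0.0 for x in range(width)] for _ in range(height)]
        for k in range(width)
    ]
    # Copy the original channels so the input is not mutated.
    original = [[row[:] for row in channel] for channel in features]
    return original + row_channels + col_channels


def pixel_vector(encoded, y, x):
    """Read the full channel vector at pixel (y, x)."""
    return [channel[y][x] for channel in encoded]


# A toy 1 x 2 x 3 feature map (one channel, two rows, three columns).
features = [[[0.5, 0.2, 0.1],
             [0.3, 0.4, 0.9]]]
encoded = append_location_channels(features)
vector = pixel_vector(encoded, 1, 2)
```

With this layout, every pixel's channel vector carries its own row and column index, so a later flattening or pooling step no longer discards where each feature came from.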

Original language: English
Pages: 71-76
Number of pages: 6
Publication status: Published - 2019
Externally published: Yes
Event: 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop - Sydney, Australia
Duration: 21 Sept 2019 to 22 Sept 2019

Conference

Conference: 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop
Country/Territory: Australia
City: Sydney
Period: 21/09/19 to 22/09/19

Keywords

  • Attention LSTM
  • Attention drift
  • Center masks
  • Encoded location

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Media Technology
