Convolutional neural network with spatial pyramid pooling for hand gesture recognition

Yong Soon Tan; Kian Ming Lim; Connie Tee; Chin Poo Lee; Cheng Yaw Low

doi:10.1007/s00521-020-05337-0

Research output: Journal Publication › Article › peer-review

69 Citations (Scopus)

Abstract

Hand gestures provide a means for humans to interact. While hand gestures play a significant role in human–computer interaction, they also break down the communication barrier and simplify communication between the general public and the hearing-impaired community. This paper outlines a convolutional neural network (CNN) integrated with spatial pyramid pooling (SPP), dubbed CNN–SPP, for vision-based hand gesture recognition. SPP mitigates the problem found in conventional pooling by stacking multi-level pooling to extend the features fed into a fully connected layer. Given inputs of varying sizes, SPP also yields a fixed-length feature representation. Extensive experiments have been conducted to evaluate CNN–SPP performance on two well-known American Sign Language (ASL) datasets and one NUS hand gesture dataset. The empirical results show that CNN–SPP outperforms other deep learning-driven approaches.
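
The fixed-length property described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the single-channel list-of-rows input are all assumptions made for illustration. Each level n splits the feature map into an n x n grid of bins and keeps one value per bin, so the output length depends only on the levels, not on the input size.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (a list of rows) into a fixed-length vector.

    Hypothetical helper for illustration only. For each pyramid level n the
    map is divided into an n x n grid of bins and the maximum of each bin is
    kept, so the output has sum(n * n for n in levels) values whatever the
    input height and width are.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end of each bin, in
            # integer arithmetic, so bins cover the map and are never empty.
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two inputs of different sizes produce vectors of the same length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]
large = [[r * 13 + c for c in range(13)] for r in range(9)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

With the assumed levels (1, 2, 4) the vector has 1 + 4 + 16 = 21 entries per channel. In a real network the same pooling would be applied to every channel of the last convolutional layer before the fully connected layer.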

Original language	English
Pages (from-to)	5339-5351
Number of pages	13
Journal	Neural Computing and Applications
Volume	33
Issue number	10
DOIs	https://doi.org/10.1007/s00521-020-05337-0
Publication status	Published - May 2021
Externally published	Yes

Keywords

Convolutional neural network (CNN)
Hand gesture recognition
Sign language recognition
Spatial pyramid pooling (SPP)

ASJC Scopus subject areas

Software
Artificial Intelligence

Access to Document

10.1007/s00521-020-05337-0

Cite this

@article{511fd3b63f1b4c3eae58a431d909fde6,

title = "Convolutional neural network with spatial pyramid pooling for hand gesture recognition",

abstract = "Hand gesture provides a means for human to interact through a series of gestures. While hand gesture plays a significant role in human–computer interaction, it also breaks down the communication barrier and simplifies communication process between the general public and the hearing-impaired community. This paper outlines a convolutional neural network (CNN) integrated with spatial pyramid pooling (SPP), dubbed CNN–SPP, for vision-based hand gesture recognition. SPP is discerned mitigating the problem found in conventional pooling by having multi-level pooling stacked together to extend the features being fed into a fully connected layer. Provided with inputs of varying sizes, SPP also yields a fixed-length feature representation. Extensive experiments have been conducted to scrutinize the CNN–SPP performance on two well-known American sign language (ASL) datasets and one NUS hand gesture dataset. Our empirical results disclose that CNN–SPP prevails over other deep learning-driven instances.",

keywords = "Convolutional neural network (CNN), Hand gesture recognition, Sign language recognition, Spatial pyramid pooling (SPP)",

author = "Tan, {Yong Soon} and Lim, {Kian Ming} and Tee, Connie and Lee, {Chin Poo} and Low, {Cheng Yaw}",

note = "Publisher Copyright: {\textcopyright} 2020, Springer-Verlag London Ltd., part of Springer Nature.",

year = "2021",

month = may,

doi = "10.1007/s00521-020-05337-0",

language = "English",

volume = "33",

pages = "5339--5351",

journal = "Neural Computing and Applications",

issn = "0941-0643",

publisher = "Springer London",

number = "10",

}

TY - JOUR

T1 - Convolutional neural network with spatial pyramid pooling for hand gesture recognition

AU - Tan, Yong Soon

AU - Lim, Kian Ming

AU - Tee, Connie

AU - Lee, Chin Poo

AU - Low, Cheng Yaw

PY - 2021/5

Y1 - 2021/5

N2 - Hand gesture provides a means for human to interact through a series of gestures. While hand gesture plays a significant role in human–computer interaction, it also breaks down the communication barrier and simplifies communication process between the general public and the hearing-impaired community. This paper outlines a convolutional neural network (CNN) integrated with spatial pyramid pooling (SPP), dubbed CNN–SPP, for vision-based hand gesture recognition. SPP is discerned mitigating the problem found in conventional pooling by having multi-level pooling stacked together to extend the features being fed into a fully connected layer. Provided with inputs of varying sizes, SPP also yields a fixed-length feature representation. Extensive experiments have been conducted to scrutinize the CNN–SPP performance on two well-known American sign language (ASL) datasets and one NUS hand gesture dataset. Our empirical results disclose that CNN–SPP prevails over other deep learning-driven instances.

AB - Hand gesture provides a means for human to interact through a series of gestures. While hand gesture plays a significant role in human–computer interaction, it also breaks down the communication barrier and simplifies communication process between the general public and the hearing-impaired community. This paper outlines a convolutional neural network (CNN) integrated with spatial pyramid pooling (SPP), dubbed CNN–SPP, for vision-based hand gesture recognition. SPP is discerned mitigating the problem found in conventional pooling by having multi-level pooling stacked together to extend the features being fed into a fully connected layer. Provided with inputs of varying sizes, SPP also yields a fixed-length feature representation. Extensive experiments have been conducted to scrutinize the CNN–SPP performance on two well-known American sign language (ASL) datasets and one NUS hand gesture dataset. Our empirical results disclose that CNN–SPP prevails over other deep learning-driven instances.

KW - Convolutional neural network (CNN)

KW - Hand gesture recognition

KW - Sign language recognition

KW - Spatial pyramid pooling (SPP)

UR - http://www.scopus.com/inward/record.url?scp=85091021465&partnerID=8YFLogxK

U2 - 10.1007/s00521-020-05337-0

DO - 10.1007/s00521-020-05337-0

M3 - Article

AN - SCOPUS:85091021465

SN - 0941-0643

VL - 33

SP - 5339

EP - 5351

JO - Neural Computing and Applications

JF - Neural Computing and Applications

IS - 10

ER -
