TY - GEN
T1 - Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming
AU - Liu, Jiandong
AU - Bai, Ruibin
AU - Lu, Zheng
AU - Ge, Peiming
AU - Aickelin, Uwe
AU - Liu, Daoyun
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
AB - In the medical field, text classification is one of the most important tasks and can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, it is hard for humans to understand or manually fine-tune the classifiers for better precision and recall, owing to the black-box nature of learning. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) to evolve regular expressions capable of satisfactorily classifying a given medical text inquiry. Given a seed population of regular expressions (randomly initialized or manually constructed by experts), our method evolves the population using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated on real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that can be fully understood, checked and updated by medical doctors, which is crucial for medical practice.
KW - cooccurrence matrix
KW - genetic programming
KW - text classification
UR - http://www.scopus.com/inward/record.url?scp=85092055831&partnerID=8YFLogxK
U2 - 10.1109/CEC48606.2020.9185500
DO - 10.1109/CEC48606.2020.9185500
M3 - Conference contribution
AN - SCOPUS:85092055831
T3 - 2020 IEEE Congress on Evolutionary Computation, CEC 2020 - Conference Proceedings
BT - 2020 IEEE Congress on Evolutionary Computation, CEC 2020 - Conference Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE Congress on Evolutionary Computation, CEC 2020
Y2 - 19 July 2020 through 24 July 2020
ER -