Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming

Jiandong Liu, Ruibin Bai, Zheng Lu, Peiming Ge, Uwe Aickelin, Daoyun Liu

Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

12 Citations (Scopus)

Abstract

In medical fields, text classification is one of the most important tasks: it can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, the black-box nature of learning makes it hard for humans to understand or manually fine-tune the classifiers for better precision and recall. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) to evolve regular expressions that can satisfactorily classify a given medical text inquiry. Given a seed population of regular expressions (randomly initialized or manually constructed by experts), our method evolves a population of regular expressions using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated on real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that medical doctors can fully understand, check and update, which is fundamentally crucial for medical practice.
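To make the general idea concrete, here is a minimal sketch of a GP loop that evolves a regex classifier from labelled texts. It is an illustration under stated assumptions, not the authors' implementation: the individual representation (ordered word sequences joined by alternation), the F1 fitness, tournament selection, elitism, and the helpers `to_regex`, `f1_score`, `mutate`, `crossover` and `evolve` are all hypothetical choices. The paper's own regular expression syntax and reproduction operators are not reproduced here.

```python
import random
import re
from typing import List, Optional, Sequence, Tuple

# ASSUMPTION: a hypothetical, simplified representation, not the paper's syntax.
# An individual is a list of alternatives; each alternative is a tuple of words
# that must occur in order, e.g. [("chest", "pain"), ("fever",)].
Individual = List[Tuple[str, ...]]


def to_regex(ind: Individual) -> str:
    """Compile an individual into a human-readable regular expression."""
    alts = [r"\b" + r"\b.*?\b".join(re.escape(w) for w in alt) + r"\b" for alt in ind]
    return "|".join(f"(?:{a})" for a in alts)


def f1_score(ind: Individual, texts: Sequence[str], labels: Sequence[int]) -> float:
    """Fitness (assumed): F1 of 'regex matches' as the positive-class prediction."""
    pattern = re.compile(to_regex(ind), re.IGNORECASE)
    tp = fp = fn = 0
    for text, y in zip(texts, labels):
        pred = pattern.search(text) is not None
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mutate(ind: Individual, vocab: List[str], rng: random.Random) -> Individual:
    """Hypothetical mutation: insert/delete/replace a word, or add/drop an alternative."""
    child = [list(alt) for alt in ind]
    i = rng.randrange(len(child))
    alt = child[i]
    op = rng.randrange(4)
    if op == 0:
        alt.insert(rng.randrange(len(alt) + 1), rng.choice(vocab))
    elif op == 1 and len(alt) > 1:
        alt.pop(rng.randrange(len(alt)))
    elif op == 2:
        child.append([rng.choice(vocab)])
    elif op == 3 and len(child) > 1:
        child.pop(i)
    else:  # fallback when a deletion would leave nothing: replace a word
        alt[rng.randrange(len(alt))] = rng.choice(vocab)
    return [tuple(a) for a in child]


def crossover(a: Individual, b: Individual, rng: random.Random, max_alts: int = 6) -> Individual:
    """One-point crossover on the alternative lists; always keeps >= 1 alternative."""
    cut_a = rng.randint(1, len(a))
    cut_b = rng.randint(0, len(b) - 1)
    return (a[:cut_a] + b[cut_b:])[:max_alts]


def tournament(pop: List[Individual], fits: List[float], rng: random.Random, k: int = 3) -> Individual:
    """Pick the fittest of k randomly sampled individuals."""
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: fits[i])]


def evolve(texts: Sequence[str], labels: Sequence[int],
           seeds: Optional[Sequence[Individual]] = None,
           pop_size: int = 30, generations: int = 40, seed: int = 0):
    """Evolve from expert seeds and/or random single-word individuals."""
    rng = random.Random(seed)
    vocab = sorted({w for t in texts for w in re.findall(r"[a-z]+", t.lower())})
    pop = [list(s) for s in seeds] if seeds else []
    while len(pop) < pop_size:
        pop.append([(rng.choice(vocab),)])
    history = []  # best F1 per generation
    for _ in range(generations):
        fits = [f1_score(ind, texts, labels) for ind in pop]
        best = max(range(len(pop)), key=lambda i: fits[i])
        history.append(fits[best])
        nxt = [pop[best]]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            p1 = tournament(pop, fits, rng)
            if rng.random() < 0.5:
                child = crossover(p1, tournament(pop, fits, rng), rng)
            else:
                child = mutate(p1, vocab, rng)
            nxt.append(child)
        pop = nxt
    fits = [f1_score(ind, texts, labels) for ind in pop]
    best = max(range(len(pop)), key=lambda i: fits[i])
    return pop[best], fits[best], history
```

An expert-written seed such as `[("chest", "pain")]` can be passed through `seeds`, and the winning individual can be printed with `to_regex`, so the resulting classifier stays readable and editable. Elitism keeps the best F1 non-decreasing across generations; a real system would also evaluate on held-out data rather than only on the training texts.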

Original language: English
Title of host publication: 2020 IEEE Congress on Evolutionary Computation, CEC 2020 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169293
Publication status: Published - Jul 2020
Event: 2020 IEEE Congress on Evolutionary Computation, CEC 2020 - Virtual, Glasgow, United Kingdom
Duration: 19 Jul 2020 to 24 Jul 2020

Publication series

Name: 2020 IEEE Congress on Evolutionary Computation, CEC 2020 - Conference Proceedings

Conference

Conference: 2020 IEEE Congress on Evolutionary Computation, CEC 2020
Country/Territory: United Kingdom
City: Virtual, Glasgow
Period: 19/07/20 to 24/07/20

Keywords

  • cooccurrence matrix
  • genetic programming
  • text classification

ASJC Scopus subject areas

  • Control and Optimization
  • Decision Sciences (miscellaneous)
  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
