TY - GEN
T1 - Automatic Step Recognition with Video and Kinematic Data for Intelligent Operating Room and Beyond
AU - Chng, Chin Boon
AU - Lin, Wenjun
AU - Hu, Yaxin
AU - Hu, Yan
AU - Liu, Jiang
AU - Chui, Chee Kong
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/12/7
Y1 - 2023/12/7
N2 - With the continuous development of intelligent operating room systems, the segmentation and automatic recognition of surgical workflow have become challenging research fields. In recent years, an increasing number of models have been proposed to address this challenge, with deep learning becoming the mainstream approach. In this paper, we propose a multi-stage network for surgical step recognition using surgical video and kinematic data. Firstly, a convolutional neural network (ResNet34) is used to extract visual features from video frames. Next, since surgical videos are a form of sequential data, a Temporal Convolutional Network (TCN) is employed as a temporal extractor to process temporal information between video frames for classification. Finally, a multi-stage TCN network, consisting of Encoder-Decoder TCN and Dilated TCN architectures, is used to refine the result. The proposed network is compared against an LSTM network from our prior work and is evaluated on a surgical dataset named MISAW in two modes: video data with and without kinematic data. Experimental results indicate that kinematic data is crucial for robot motion control in the operating rooms of the future. The technology will also find application in robotic labs for the development and optimization of chemical manufacturing processes.
AB - With the continuous development of intelligent operating room systems, the segmentation and automatic recognition of surgical workflow have become challenging research fields. In recent years, an increasing number of models have been proposed to address this challenge, with deep learning becoming the mainstream approach. In this paper, we propose a multi-stage network for surgical step recognition using surgical video and kinematic data. Firstly, a convolutional neural network (ResNet34) is used to extract visual features from video frames. Next, since surgical videos are a form of sequential data, a Temporal Convolutional Network (TCN) is employed as a temporal extractor to process temporal information between video frames for classification. Finally, a multi-stage TCN network, consisting of Encoder-Decoder TCN and Dilated TCN architectures, is used to refine the result. The proposed network is compared against an LSTM network from our prior work and is evaluated on a surgical dataset named MISAW in two modes: video data with and without kinematic data. Experimental results indicate that kinematic data is crucial for robot motion control in the operating rooms of the future. The technology will also find application in robotic labs for the development and optimization of chemical manufacturing processes.
KW - Automation
KW - Intelligent Operating Room
KW - Multi-stage Model
KW - Step Recognition
KW - Surgical Robotics
KW - Temporal Convolutional Networks
UR - http://www.scopus.com/inward/record.url?scp=85180550841&partnerID=8YFLogxK
U2 - 10.1145/3628797.3628999
DO - 10.1145/3628797.3628999
M3 - Conference contribution
AN - SCOPUS:85180550841
T3 - ACM International Conference Proceeding Series
SP - 599
EP - 606
BT - SOICT 2023 - 12th International Symposium on Information and Communication Technology
PB - Association for Computing Machinery
T2 - 12th International Symposium on Information and Communication Technology, SOICT 2023
Y2 - 7 December 2023 through 8 December 2023
ER -