Abstract
Effective utilization of spatiotemporal information is essential for improving the accuracy and robustness of Video Salient Object Detection (V-SOD). However, current methods do not fully exploit historical frame information, resulting in insufficient integration of complementary semantic information. To address this issue, we propose a novel Transformer-based Inheritance Enhancement Network (IENet). The core of IENet is a Heritable Multi-Frame Attention (HMA) module, which fully exploits long-term context and frame-aware temporal modeling during feature extraction through unidirectional cross-frame enhancement. In contrast to existing methods, our heritable strategy follows a unidirectional inheritance model over attention maps, which ensures that information propagation for each frame is consistent and orderly, avoiding additional interference. Furthermore, we propose an auxiliary attention loss that uses the inherited attention maps to direct the network's focus toward target regions. Experimental results on five popular benchmark datasets demonstrate the effectiveness of IENet in handling challenging scenes. For instance, on VOS and DAVSOD, our method achieves MAE scores of 0.042 and 0.070, respectively, outperforming other competitive models. In particular, IENet excels at inheriting finer details from historical frames even in complex environments. The module and predicted maps are publicly available at https://github.com/TOMMYWHY/IENet.
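The abstract does not spell out the HMA formulation, but the described mechanism, per-frame attention maps inherited unidirectionally from the previous frame and additionally supervised by an auxiliary loss, can be illustrated with a minimal PyTorch-style sketch. All names (`HeritableFrameAttention`, `auxiliary_attention_loss`), the blending rule, and the detached inheritance are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeritableFrameAttention(nn.Module):
    """Sketch of unidirectional, heritable cross-frame attention.

    For each frame t, a spatial attention map is computed from the frame's
    features and blended with the map inherited from frame t-1, so context
    flows forward through the clip in a consistent, orderly way. This is an
    assumed reading of the HMA module, not the published formulation.
    """

    def __init__(self, channels: int, inherit_weight: float = 0.5):
        super().__init__()
        self.to_attn = nn.Conv2d(channels, 1, kernel_size=1)  # per-frame attention logits
        self.inherit_weight = inherit_weight                  # blend ratio with inherited map

    def forward(self, frames: torch.Tensor):
        """frames: (T, C, H, W) features of one clip, in temporal order."""
        enhanced, attn_maps = [], []
        inherited = None                      # first frame has no history
        for feat in frames:                   # unidirectional: t-1 -> t only
            feat = feat.unsqueeze(0)          # (1, C, H, W)
            attn = torch.sigmoid(self.to_attn(feat))  # (1, 1, H, W)
            if inherited is not None:
                # inherit: blend the current map with the previous frame's map
                attn = (1 - self.inherit_weight) * attn + self.inherit_weight * inherited
            inherited = attn.detach()         # propagate forward without back-prop through history
            enhanced.append(feat * attn)      # attention-enhanced features
            attn_maps.append(attn)
        return torch.cat(enhanced, dim=0), torch.cat(attn_maps, dim=0)


def auxiliary_attention_loss(attn_maps: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Supervise the inherited attention maps with the saliency ground truth,
    steering attention toward target regions (one plausible form of the
    auxiliary attention loss described in the abstract)."""
    masks = F.interpolate(masks, size=attn_maps.shape[-2:],
                          mode="bilinear", align_corners=False)
    return F.binary_cross_entropy(attn_maps, masks)
```

In training, the returned attention maps would be supervised by `auxiliary_attention_loss` alongside the main saliency loss, so the inherited maps themselves are pushed toward the target regions rather than merely reweighting features.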
Original language | English |
---|---|
Pages (from-to) | 72007-72026 |
Number of pages | 20 |
Journal | Multimedia Tools and Applications |
Volume | 83 |
Issue number | 28 |
DOIs | |
Publication status | Published - Aug 2024 |
Externally published | Yes |
Keywords
- Feature fusion
- Frame-aware temporal relationships
- Video salient object detection
- Visual transformer
ASJC Scopus subject areas
- Software
- Media Technology
- Hardware and Architecture
- Computer Networks and Communications