Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

Zhenguang Liu; Xinyang Yu; Ruili Wang; Shuai Ye; Zhe Ma; Jianfeng Dong; Sifeng He; Feng Qian; Xiaobo Zhang; Roger Zimmermann; Lei Yang

doi:10.1145/3581783.3612002

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

The self-media era provides us with a tremendous number of high-quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm relies heavily on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to disentangle an original high-dimensional feature into multiple exclusive lower-dimensional sub-features. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyze the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the SVD dataset and also sets a new state of the art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/ in the hope of contributing to the community.

Original language	English
Title of host publication	MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
Publisher	Association for Computing Machinery, Inc
Pages	144-152
Number of pages	9
ISBN (Electronic)	9798400701085
DOIs	https://doi.org/10.1145/3581783.3612002
Publication status	Published - 26 Oct 2023
Externally published	Yes
Event	31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada
Duration	29 Oct 2023 → 3 Nov 2023

Publication series

Name	MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

Conference

Conference	31st ACM International Conference on Multimedia, MM 2023
Country/Territory	Canada
City	Ottawa
Period	29/10/23 → 3/11/23

Keywords

mutual information
neural network
video copyright infringements

ASJC Scopus subject areas

Artificial Intelligence
Computer Graphics and Computer-Aided Design
Human-Computer Interaction
Software

Access to Document

10.1145/3581783.3612002

Cite this

Liu, Z., Yu, X., Wang, R., Ye, S., Ma, Z., Dong, J., He, S., Qian, F., Zhang, X., Zimmermann, R., & Yang, L. (2023). Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. In MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia (pp. 144-152). (MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia). Association for Computing Machinery, Inc. https://doi.org/10.1145/3581783.3612002

@inproceedings{1eb7918b7fea417db520d628f0a2daa5,
  title = "Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization",
  abstract = "The self-media era provides us tremendous high quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm heavily relies on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to disentangle an original high-dimensional feature into multiple sub-features, explicitly disentangling the feature into exclusive lower-dimensional components. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/, hoping to contribute to the community.",
  keywords = "mutual information, neural network, video copyright infringements",
  author = "Zhenguang Liu and Xinyang Yu and Ruili Wang and Shuai Ye and Zhe Ma and Jianfeng Dong and Sifeng He and Feng Qian and Xiaobo Zhang and Roger Zimmermann and Lei Yang",
  note = "Publisher Copyright: {\textcopyright} 2023 ACM.; 31st ACM International Conference on Multimedia, MM 2023 ; Conference date: 29-10-2023 Through 03-11-2023",
  year = "2023",
  month = oct,
  day = "26",
  doi = "10.1145/3581783.3612002",
  language = "English",
  series = "MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia",
  publisher = "Association for Computing Machinery, Inc",
  pages = "144--152",
  booktitle = "MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia",
}

Liu, Z, Yu, X, Wang, R, Ye, S, Ma, Z, Dong, J, He, S, Qian, F, Zhang, X, Zimmermann, R & Yang, L 2023, Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. in MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia. MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia, Association for Computing Machinery, Inc, pp. 144-152, 31st ACM International Conference on Multimedia, MM 2023, Ottawa, Canada, 29/10/23. https://doi.org/10.1145/3581783.3612002

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. / Liu, Zhenguang; Yu, Xinyang; Wang, Ruili et al.
MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia. Association for Computing Machinery, Inc, 2023. p. 144-152 (MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia).

TY - GEN

T1 - Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

AU - Liu, Zhenguang

AU - Yu, Xinyang

AU - Wang, Ruili

AU - Ye, Shuai

AU - Ma, Zhe

AU - Dong, Jianfeng

AU - He, Sifeng

AU - Qian, Feng

AU - Zhang, Xiaobo

AU - Zimmermann, Roger

AU - Yang, Lei

PY - 2023/10/26

Y1 - 2023/10/26

AB - The self-media era provides us tremendous high quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm heavily relies on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to disentangle an original high-dimensional feature into multiple sub-features, explicitly disentangling the feature into exclusive lower-dimensional components. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/, hoping to contribute to the community.

KW - mutual information

KW - neural network

KW - video copyright infringements

UR - http://www.scopus.com/inward/record.url?scp=85179549587&partnerID=8YFLogxK

U2 - 10.1145/3581783.3612002

DO - 10.1145/3581783.3612002

M3 - Conference contribution

AN - SCOPUS:85179549587

T3 - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

SP - 144

EP - 152

BT - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

PB - Association for Computing Machinery, Inc

T2 - 31st ACM International Conference on Multimedia, MM 2023

Y2 - 29 October 2023 through 3 November 2023

ER -

Liu Z, Yu X, Wang R, Ye S, Ma Z, Dong J et al. Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. In MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia. Association for Computing Machinery, Inc. 2023. p. 144-152. (MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia). doi: 10.1145/3581783.3612002
