Unsupervised multi-author document decomposition based on hidden Markov model

Khaled Aldebei, Xiangjian He, Wenjing Jia, Jie Yang

Research output: Chapter in Book/Conference proceedingConference contributionpeer-review

7 Citations (Scopus)

Abstract

This paper proposes an unsupervised approach for segmenting a multiauthor document into authorial components. The key novelty is that we utilize the sequential patterns hidden among document elements when determining their authorships. For this purpose, we adopt Hidden Markov Model (HMM) and construct a sequential probabilistic model to capture the dependencies of sequential sentences and their authorships. An unsupervised learning method is developed to initialize the HMM parameters. Experimental results on benchmark datasets have demonstrated the significant benefit of our idea and our approach has outperformed the state-of-the-arts on all tests. As an example of its applications, the proposed approach is applied for attributing authorship of a document and has also shown promising results.

Original languageEnglish
Title of host publication54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
PublisherAssociation for Computational Linguistics (ACL)
Pages706-714
Number of pages9
ISBN (Electronic)9781510827585
DOIs
Publication statusPublished - 2016
Externally publishedYes
Event54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: 7 Aug 201612 Aug 2016

Publication series

Name54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
Volume2

Conference

Conference54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/TerritoryGermany
CityBerlin
Period7/08/1612/08/16

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Unsupervised multi-author document decomposition based on hidden Markov model'. Together they form a unique fingerprint.

Cite this