In:
Journal of Information Science, SAGE Publications, Vol. 40, No. 3 (2014-06), p. 281-292
Abstract:
The Doubly Correlated Topic Model is a generative probabilistic topic model for automatically identifying topics in a corpus of text documents. It is a mixed-membership model, based on the observation that a document exhibits multiple topics. We used word co-occurrence statistics to identify an initial set of topics, which serves as posterior information for the model. Exact posterior inference in existing models is intractable, so the inference methods they use provide only approximate solutions. Treating co-occurring words as initial topics provides a tighter bound on topic coherence. The proposed model is motivated by the Latent Dirichlet Allocation Model. The Doubly Correlated Topic Model differs from the Latent Dirichlet Allocation Model in its posterior inference: it uses the highest-ranked co-occurring words as initial topics rather than obtaining them from Dirichlet priors. The results suggest that the proposed model achieves some improvement in entropy and topical coherence across different datasets.
Type of Medium:
Online Resource
ISSN:
0165-5515, 1741-6485
DOI:
10.1177/0165551514524678
Language:
English
Publisher:
SAGE Publications
Publication Date:
2014
ZDB-ID:
439125-1, 2025062-9
SSG:
24,1