In:
Applied Mechanics and Materials, Trans Tech Publications, Ltd., Vol. 121-126 (2011-10), p. 1947-1951
Abstract:
The entity semantic representation model (ESRM), which treats an entity as a set of attributes and their corresponding values, is useful for a variety of applications. This paper proposes an approach for extracting new attributes and values from related unstructured documents. In our approach, the extraction process is formulated as a sequence labeling task. The labeled data for training the annotator are obtained automatically from the predefined entity structure. An annotator based on conditional random fields (CRFs) is trained to annotate sentences that may contain new attributes and values. Then, through a decision process with a scoring algorithm, new attributes and values are identified and filled into the predefined entity representation model. Experiments show that the proposed method improves extraction performance, achieving higher accuracy.
Type of Medium:
Online Resource
ISSN:
1662-7482
DOI:
10.4028/www.scientific.net/AMM.121-126
DOI:
10.4028/www.scientific.net/AMM.121-126.1947
Language:
Unknown
Publisher:
Trans Tech Publications, Ltd.
Publication Date:
2011
ZDB-ID:
2251882-4