GLORIA

GEOMAR Library Ocean Research Information Access

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
Filter
  • Hsu, Wen-Lian  (4)
  • Biodiversity Research  (4)
  • 1
    In: Nucleic Acids Research, Oxford University Press (OUP), Vol. 46, No. D1 ( 2018-01-04), p. D296-D302
    Type of Medium: Online Resource
    ISSN: 0305-1048 , 1362-4962
    RVK:
    Language: English
    Publisher: Oxford University Press (OUP)
    Publication Date: 2018
    detail.hit.zdb_id: 1472175-2
    SSG: 12
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 2
    In: Nucleic Acids Research, Oxford University Press (OUP), Vol. 44, No. D1 ( 2016-01-04), p. D239-D247
    Type of Medium: Online Resource
    ISSN: 0305-1048 , 1362-4962
    RVK:
    Language: English
    Publisher: Oxford University Press (OUP)
    Publication Date: 2016
    detail.hit.zdb_id: 1472175-2
    SSG: 12
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 3
    Online Resource
    Online Resource
    Springer Science and Business Media LLC ; 2006
    In:  BMC Bioinformatics Vol. 7, No. 1 ( 2006-12)
    In: BMC Bioinformatics, Springer Science and Business Media LLC, Vol. 7, No. 1 ( 2006-12)
    Abstract: Text mining in the biomedical domain is receiving increasing attention. A key component of this process is named entity recognition (NER). Generally speaking, two annotated corpora, GENIA and GENETAG, are most frequently used for training and testing biomedical named entity recognition (Bio-NER) systems. JNLPBA and BioCreAtIvE are two major Bio-NER tasks using these corpora. Both tasks take different approaches to corpus annotation and use different matching criteria to evaluate system performance. This paper details these differences and describes alternative criteria. We then examine the impact of different criteria and annotation schemes on system performance by retesting systems participated in the above two tasks. Results To analyze the difference between JNLPBA's and BioCreAtIvE's evaluation, we conduct Experiment 1 to evaluate the top four JNLPBA systems using BioCreAtIvE's classification scheme. We then compare them with the top four BioCreAtIvE systems. Among them, three systems participated in both tasks, and each has an F-score lower on JNLPBA than on BioCreAtIvE. In Experiment 2, we apply hypothesis testing and correlation coefficient to find alternatives to BioCreAtIvE's evaluation scheme. It shows that right-match and left-match criteria have no significant difference with BioCreAtIvE. In Experiment 3, we propose a customized relaxed-match criterion that uses right match and merges JNLPBA's five NE classes into two, which achieves an F-score of 81.5%. In Experiment 4, we evaluate a range of five matching criteria from loose to strict on the top JNLPBA system and examine the percentage of false negatives. Our experiment gives the relative change in precision, recall and F-score as matching criteria are relaxed. Conclusion In many applications, biomedical NEs could have several acceptable tags, which might just differ in their left or right boundaries. However, most corpora annotate only one of them. In our experiment, we found that right match and left match can be appropriate alternatives to JNLPBA and BioCreAtIvE's matching criteria. In addition, our relaxed-match criterion demonstrates that users can define their own relaxed criteria that correspond more realistically to their application requirements.
    Type of Medium: Online Resource
    ISSN: 1471-2105
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2006
    detail.hit.zdb_id: 2041484-5
    SSG: 12
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
  • 4
    In: BMC Bioinformatics, Springer Science and Business Media LLC, Vol. 8, No. 1 ( 2007-12)
    Abstract: Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical field. A key IE task in this field is the extraction of biomedical relations, such as protein-protein and gene-disease interactions. However, most biomedical relation extraction systems usually ignore adverbial and prepositional phrases and words identifying location, manner, timing, and condition, which are essential for describing biomedical relations. Semantic role labeling (SRL) is a natural language processing technique that identifies the semantic roles of these words or phrases in sentences and expresses them as predicate-argument structures. We construct a biomedical SRL system called BIOSMILE that uses a maximum entropy (ME) machine-learning model to extract biomedical relations. BIOSMILE is trained on BioProp, our semi-automatic, annotated biomedical proposition bank. Currently, we are focusing on 30 biomedical verbs that are frequently used or considered important for describing molecular events. Results To evaluate the performance of BIOSMILE, we conducted two experiments to (1) compare the performance of SRL systems trained on newswire and biomedical corpora; and (2) examine the effects of using biomedical-specific features. The experimental results show that using BioProp improves the F-score of the SRL system by 21.45% over an SRL system that uses a newswire corpus. It is noteworthy that adding automatically generated template features improves the overall F-score by a further 0.52%. Specifically, ArgM-LOC, ArgM-MNR, and Arg2 achieve statistically significant performance improvements of 3.33%, 2.27%, and 1.44%, respectively. Conclusion We demonstrate the necessity of using a biomedical proposition bank for training SRL systems in the biomedical domain. Besides the different characteristics of biomedical and newswire sentences, factors such as cross-domain framesets and verb usage variations also influence the performance of SRL systems. For argument classification, we find that NE (named entity) features indicating if the target node matches with NEs are not effective, since NEs may match with a node of the parsing tree that does not have semantic role labels in the training set. We therefore incorporate templates composed of specific words, NE types, and POS tags into the SRL system. As a result, the classification accuracy for adjunct arguments, which is especially important for biomedical SRL, is improved significantly.
    Type of Medium: Online Resource
    ISSN: 1471-2105
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2007
    detail.hit.zdb_id: 2041484-5
    SSG: 12
    Location Call Number Limitation Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...