In:
The Computer Journal, Oxford University Press (OUP), Vol. 66, No. 7 ( 2023-07-13), p. 1733-1748
Abstract:
The presence of a schema for eXtensible Markup Language (XML) documents has numerous advantages. Unfortunately, many XML documents in practice are not accompanied by a (valid) schema. Therefore, it is essential to devise algorithms to infer schemas from XML documents, where the fundamental task is learning regular expressions. In this paper, we focus on the learning of disjunctive multiplicity expressions (DMEs), a subclass of regular expressions that are particularly suitable to specify unordered models and have been used as the foundation of the schemas for unordered XML. Previous work for learning DME lacks inference algorithms that support positive and negative examples. Further, presently there has been no algorithm can learn DMEs extended with numeric occurrences. We address these challenges in the present paper and first propose a novel algorithm to learn DMEs from positive and negative examples by using genetic algorithms and parallel techniques. Then we extend DMEs to disjunctive generalized multiplicity expressions (DGMEs), which allow numeric occurrences and develop an algorithm to learn DGMEs from positive and negative examples. Finally, experimental results show that with only positive examples, our algorithm can generate a DME with an acceptable learning time, which can accept all positive examples, and when given both positive and negative examples, we can learn DMEs or DGMEs with high accuracy.
Type of Medium:
Online Resource
ISSN:
0010-4620
,
1460-2067
DOI:
10.1093/comjnl/bxac037
Language:
English
Publisher:
Oxford University Press (OUP)
Publication Date:
2023
detail.hit.zdb_id:
1477172-X
Permalink