In:
Journal of Computer Science and Cybernetics, Publishing House for Science and Technology, Vietnam Academy of Science and Technology (Publications), (2023-12-25), p. 323-342
Abstract:
The exponential growth of bioinformatics in the healthcare domain has revolutionized our understanding of DNA, proteins, and other biomolecular entities. This progress has generated an overwhelming volume of data, necessitating big data technologies for efficient storage and indexing. While big data technologies like Hadoop offer substantial support for storing large XML files, the challenges of index data size and XPath query performance persist. To enhance the efficiency of XPath queries and address the data size problem, a novel approach derived from the spatial indexing methods of the R-tree family is proposed. The proposed method modifies the structure of leaf nodes in the indexing tree to preserve XML sibling connections. New algorithms are then introduced for constructing the new tree structure and for processing sibling queries more efficiently. Experimental results demonstrate superior efficiency for sibling XPath queries with reduced index data sizes, while other XPath queries also exhibit notable performance improvements. This research contributes to the development of more effective indexing methods for managing and querying large XML datasets in bioinformatics applications, ultimately advancing biomedical research and healthcare initiatives.
Type of Medium:
Online Resource
ISSN:
2815-5939, 1813-9663
DOI:
10.15625/1813-9663/19018
Language:
Unknown
Publisher:
Publishing House for Science and Technology, Vietnam Academy of Science and Technology (Publications)
Publication Date:
2023
ZDB ID:
3169641-7
SSG:
24,1