GLORIA

GEOMAR Library Ocean Research Information Access

  • 1
    Publication Date: 2024-04-18
    Repository Name: National Museum of Natural History, Netherlands
    Type: info:eu-repo/semantics/article
    Format: application/pdf
  • 2
    Publication Date: 2024-01-12
    Description: We describe an effective approach to automated text digitisation with respect to natural history specimen labels. These labels contain much useful data about the specimen including its collector, country of origin, and collection date. Our approach to automatically extracting these data takes the form of a pipeline. Recommendations are made for the pipeline's component parts based on state-of-the-art technologies.
      Optical Character Recognition (OCR) can be used to digitise text on images of specimens. However, recognising text quickly and accurately from these images can be a challenge for OCR. We show that OCR performance can be improved by prior segmentation of specimen images into their component parts. This ensures that only text-bearing labels are submitted for OCR processing as opposed to whole specimen images, which inevitably contain non-textual information that may lead to false positive readings. In our testing Tesseract OCR version 4.0.0 offers promising text recognition accuracy with segmented images.
      Not all the text on specimen labels is printed. Handwritten text varies much more and does not conform to standard shapes and sizes of individual characters, which poses an additional challenge for OCR. Recently, deep learning has allowed for significant advances in this area. Google's Cloud Vision, which is based on deep learning, is trained on large-scale datasets, and is shown to be quite adept at this task. This may take us some way towards negating the need for humans to routinely transcribe handwritten text.
      Determining the countries and collectors of specimens has been the goal of previous automated text digitisation research activities. Our approach also focuses on these two pieces of information. An area of Natural Language Processing (NLP) known as Named Entity Recognition (NER) has matured enough to semi-automate this task. Our experiments demonstrated that existing approaches can accurately recognise location and person names within the text extracted from segmented images via Tesseract version 4.0.0.
      We have highlighted the main recommendations for potential pipeline components. The paper also provides guidance on selecting appropriate software solutions. These include automatic language identification, terminology extraction, and integrating all pipeline components into a scientific workflow to automate the overall digitisation process.
    Keywords: automated text digitisation ; natural language processing ; named entity recognition ; optical character recognition ; handwritten text recognition ; language identification ; terminology extraction ; scientific workflows ; natural history specimens ; label data
    Repository Name: National Museum of Natural History, Netherlands
    Type: info:eu-repo/semantics/article
    Format: application/pdf
  • 3
    Publication Date: 2024-04-12
    Description: This paper presents a use-case conducted within the ENVRI FAIR project, examining challenges and opportunities in deploying FAIR-aligned (ensuring Findability, Accessibility, Interoperability and Reusability) scientific name-matching services across Environmental Research Infrastructures (RIs). Six services were tested using various name variations, revealing inconsistencies in match types, status reporting and handling of canonical forms and typos. These diversities pose challenges for RI data pipelines and interoperability. The paper underscores the importance of standardised tools, enhanced communication, training, collaboration and shared resources. Addressing these needs can facilitate more effective FAIR implementation within the ENVRI community and biodiversity research. This, in turn, will empower RIs to seamlessly integrate and leverage scientific names, unlocking the full potential of their data for research and policy implementation.
    Keywords: scientific names ; taxonomy ; biodiversity ; FAIR ; ENVRI
    Repository Name: National Museum of Natural History, Netherlands
    Type: info:eu-repo/semantics/article
    Format: application/pdf
  • 4
    Publication Date: 2024-01-12
    Description: New data on 52 non-indigenous mollusks in the Eastern Mediterranean Sea is reported. Fossarus sp. (aff. aptus sensu Blatterer 2019), Coriophora lessepsiana Albano, Bakker & Sabelli, sp. nov., Cerithiopsis sp. aff. pulvis, Joculator problematicus Albano & Steger, sp. nov., Cerithiopsis sp., Elachisina sp., Iravadia aff. elongata, Vitrinella aff. Vitrinella sp. 1 (sensu Blatterer 2019), Melanella orientalis, Parvioris aff. dilecta, Odostomia cf. dalli, Oscilla virginiae, Parthenina cossmanni, Parthenina typica, Pyrgulina craticulata, Turbonilla funiculata, Cylichna collyra, Musculus coenobitus, Musculus aff. viridulus, Chavania erythraea, Scintilla cf. violescens, Iacra seychellarum and Corbula erythraeensis are new records for the Mediterranean. An unidentified gastropod, Skeneidae indet., Triphora sp., Hypermastus sp., Sticteulima sp., Vitreolina cf. philippi, Odostomia (s.l.) sp. 1, Henrya (?) sp., and Semelidae sp. are further potential new non-indigenous species although their status should be confirmed upon final taxonomic assessment. Additionally, the status of Dikoleps micalii, Hemiliostraca clandestina comb. nov. and H. athenamariae comb. nov. is changed to non-indigenous, range extensions for nine species and the occurrence of living individuals for species previously recorded from empty shells only are reported. Opimaphora blattereri Albano, Bakker & Sabelli, sp. nov. is described from the Red Sea for comparison with the morphologically similar C. lessepsiana Albano, Bakker & Sabelli, sp. nov. The taxonomic part is followed by a discussion on how intensive fieldwork and cooperation among institutions and individuals enabled such a massive report, and how the poor taxonomic knowledge of the Indo-Pacific fauna hampers non-indigenous species detection and identification. 
Finally, the hypothesis that the simultaneous analysis of quantitative benthic death assemblages can support the assignment of non-indigenous status to taxonomically undetermined species is discussed.
    Keywords: Animal Science and Zoology ; Ecology ; Evolution ; Behavior and Systematics ; Cerithiopsidae ; invasion biology ; Lessepsian invasion ; Mollusca ; new species ; Red Sea ; taxonomy ; Triphoridae
    Repository Name: National Museum of Natural History, Netherlands
    Type: info:eu-repo/semantics/article
    Format: application/pdf
  • 5
    Publication Date: 2024-04-18
    Description: Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data, can extract structured primary biodiversity data which can be aggregated with and jointly queried with data from other Darwin Core-compatible sources, and show how visualization of these data can communicate key information contained in biodiversity literature. 
We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.
    Keywords: Araneae ; Biodiversity informatics ; Data mining ; Open access ; Spiders ; Taxonomy ; XML ; markup
    Repository Name: National Museum of Natural History, Netherlands
    Type: info:eu-repo/semantics/article
    Format: application/pdf
  • 6
    Publication Date: 2024-01-12
    Keywords: Ecology ; Evolution ; Behavior and Systematics
    Repository Name: National Museum of Natural History, Netherlands
    Type: info:eu-repo/semantics/article
    Format: application/pdf
  • 7
    Publication Date: 2024-02-01
    Description: The Distributed System of Scientific Collections (DiSSCo) is a pan-European Research Infrastructure (RI) initiative. DiSSCo aims to bring together natural science collections from 175 museums, botanical gardens, universities and research institutes across 23 countries in a distributed infrastructure that makes these collections physically and digitally open and accessible for all forms of research and innovation. DiSSCo RI entered the ESFRI roadmap in 2018 and successfully concluded its Preparatory Phase in early 2023. The RI is now transitioning towards the constitution of its legal entity (an ERIC) and the start of its scaled-up construction (implementation) programme. This publication is an abridged version of the successful grant proposal for the DiSSCo Transition Project which has the goal of ensuring the seamless transition of the DiSSCo RI from its Preparatory Phase to the Construction Phase (expected to start in 2025). In this transition period, the Project will address five objectives building on the outcomes of the Preparatory Phase project:
      1) Advance the DiSSCo ERIC process and complete its policy framework, ensuring the smooth early-phase Implementation of DiSSCo;
      2) Engage & support DiSSCo National Nodes to strengthen national commitments;
      3) Advance the development of core e-services to avoid the accumulation of technical debt before the start of the Implementation Phase;
      4) Continue international collaboration on standards & best practices needed for the DiSSCo service provision; and
      5) Continue supporting DiSSCo RI interim governance bodies and transition them to the DiSSCo ERIC formal governance.
      The Project's impact will be measured against the increase in the RI's overall Implementation Readiness Level (IRL). More specifically, we will monitor its impact towards reaching the required level of maturity in four of the five dimensions of the IRL that can benefit from further developments. These include the organisational, financial, technological and data readiness levels.
    Keywords: natural science collections ; natural history collections ; research infrastructure ; global ; natural science ; digitisation ; data standards ; Distributed System of Scientific Collections ; DiSSCo ; Digital Specimen Architecture ; FAIR Data Ecosystem ; FAIR digital objects
    Repository Name: National Museum of Natural History, Netherlands
    Type: info:eu-repo/semantics/article
    Format: application/pdf