GLORIA

GEOMAR Library Ocean Research Information Access

  • 1
    Publication Date: 2022-09-23
    Description: Highlights: method and application to improve digital soil maps of silt and clay in China; spatial uncertainties are derived within the framework of a DSM approach; spatial uncertainty is based on randomized decision trees; the model calibration set is refined by purposive sampling in areas of high uncertainty; the method and the map refinement are verified using accuracy and uncertainty measures.
    Description: Digital soil mapping (DSM) products represent estimates of spatially distributed soil properties. These estimates carry an element of uncertainty that is not evenly distributed over the area covered by DSM. If the uncertainty is quantified in a spatially explicit way, this information can be used to improve the quality of DSM by optimizing the sampling design. This study follows a DSM approach using a Random Forest regression model, legacy soil samples, and terrain covariates to estimate topsoil silt and clay contents in a small catchment of 4.2 km² in the Three Gorges Reservoir Area, Central China. We aim (i) to introduce a method to derive spatial uncertainty, and (ii) to improve the initial DSM approaches by additional sampling that is guided by the spatial uncertainty. The proposed uncertainty measure is based on multiple realizations of individual and randomized decision tree models. We used the spatial uncertainty of the initial DSM approaches to stratify the study area and thereby to identify potential sampling areas of high uncertainty. Further, we tested how precisely the available legacy samples cover the variability of the covariates within each potential sampling area in order to define the final sampling area and to apply a purposive sampling design. For the final Random Forest model calibration, we combined the legacy sample set with the additional samples. This uncertainty-driven DSM refinement was evaluated by comparing it to a second approach, in which the additional samples were replaced by a random sample set of the same size, obtained from the entire study area. For the comparative analysis, external validation, bootstrap validation, and cross-validation were applied. The DSM approach using the uncertainty-driven refinement performed best. The averaged spatial uncertainty was reduced by 31% for silt and by 27% for clay compared to the initial DSM approach. Using external validation, the accuracy increased by the same proportions, with an overall accuracy of R² = 0.59 for silt and R² = 0.56 for clay.
    Type: Article , PeerReviewed
    Format: text
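
The idea in this record of deriving uncertainty from multiple randomized tree models can be illustrated with a toy sketch. It is not the authors' method: it assumes a single hypothetical covariate, swaps full decision trees for one-split regression stumps fitted on bootstrap resamples, and takes the standard deviation of the ensemble predictions as the uncertainty at a location. All function names and the sample values are invented for illustration.

```python
import random
import statistics


def fit_stump(xs, ys, rng):
    """Fit a one-split regression tree at a randomly chosen threshold."""
    threshold = rng.choice(xs)
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    overall = statistics.fmean(ys)
    # The right side can be empty when the threshold is the largest x.
    left_mean = statistics.fmean(left) if left else overall
    right_mean = statistics.fmean(right) if right else overall
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_ensemble(xs, ys, n_trees=200, seed=0):
    """Fit n_trees randomized stumps, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(xs)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ensemble.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], rng))
    return ensemble


def predict_with_uncertainty(ensemble, x):
    """Return (ensemble mean, ensemble standard deviation) at covariate value x."""
    preds = [predict_stump(stump, x) for stump in ensemble]
    return statistics.fmean(preds), statistics.pstdev(preds)


if __name__ == "__main__":
    # Hypothetical legacy samples: covariate value and clay content (%).
    xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 8.0, 9.5]
    ys = [12.0, 14.0, 13.0, 15.0, 18.0, 17.0, 30.0, 35.0]
    ensemble = fit_ensemble(xs, ys)
    candidates = [1.0, 5.0, 9.0]
    # Rank candidate locations by ensemble spread to guide extra sampling.
    ranked = sorted(candidates, key=lambda x: predict_with_uncertainty(ensemble, x)[1], reverse=True)
    print(ranked)
```

Ranking candidate locations by the ensemble spread mirrors, in a very reduced form, the step of stratifying the study area by uncertainty before placing additional samples.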
  • 2
    Publication Date: 2021-07-04
    Description: Most common machine learning (ML) algorithms usually work well on balanced training sets, that is, datasets in which all classes are approximately represented equally. Otherwise, the accuracy estimates may be unreliable and classes with only a few values are often misclassified or neglected. This is known as the class imbalance problem in machine learning, and datasets that do not meet this criterion are referred to as imbalanced data. Most datasets of soil classes are, therefore, imbalanced data. One of our main objectives is to compare eight resampling strategies that have been developed to counteract the imbalanced data problem. We compared the performance of five of the most common ML algorithms with the resampling approaches. The highest increase in prediction accuracy was achieved with SMOTE (the synthetic minority oversampling technique). In comparison to the baseline prediction on the original dataset, we achieved an increase of about 10, 20 and 10% in the overall accuracy, kappa index and F‐score, respectively. Regarding the ML approaches, random forest (RF) showed the best performance with an overall accuracy, kappa index and F‐score of 66, 60 and 57%, respectively. Moreover, the combination of RF and SMOTE improved the accuracy of the individual soil classes compared to RF trained on the original dataset, and allowed better prediction of soil classes with a low number of samples in the corresponding soil profile database, in our case Chernozems. Our results show that balancing existing soil legacy data using synthetic sampling strategies can significantly improve the prediction accuracy in digital soil mapping (DSM).
    Description: Highlights: The spatial distribution of soil classes in Iran can be predicted using machine learning (ML) algorithms. The synthetic minority oversampling technique overcomes the drawback of imbalanced and highly biased soil legacy data. When a random forest model is combined with synthetic sampling strategies, the prediction accuracy of the soil model improves significantly. The resulting new soil map of Iran has a much higher spatial resolution than existing maps and displays new soil classes that have not yet been mapped in Iran.
    Description: Alexander von Humboldt‐Stiftung http://dx.doi.org/10.13039/100005156
    Description: German Research Foundation http://dx.doi.org/10.13039/501100001659
    Description: Soil and Water Research Institute, Agricultural Research, Education and Extension Organization, Karaj, Iran
    Keywords: 631.4 ; covariates ; imbalanced data ; machine learning ; random forest ; soil legacy data
    Type: article
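
The core interpolation step of SMOTE, named in this record, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' workflow or a library API: the function name `smote`, its parameters, and the two-covariate sample values are hypothetical, and only numeric features are handled. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    # Hypothetical covariate vectors for a rare soil class.
    rare_class = [(1.0, 2.0), (2.0, 2.0), (1.0, 3.0), (2.0, 3.0)]
    for point in smote(rare_class, n_new=5, k=2, seed=42):
        print(point)
```

Because every synthetic point lies between two existing minority samples, the oversampled class stays within the range of covariate values already observed for it.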
  • 3
    Publication Date: 2023-03-14
    Keywords: AUG; Auger; Cation exchange capacity; Clay; digital soil mapping; Field capacity; Laboratory code/label; LATITUDE; LONGITUDE; Lora_del_Rio; Lora del Rio, Andalusia, Spain; multi-scale terrain analysis; Organic carbon, soil; pH; Profile; ResourceCultures; Sand; SFB1070; Silt; soil properties; soil quality; Soil quality rating; spatial modelling
    Type: Dataset
    Format: text/tab-separated-values, 5060 data points
  • 4
    PANGAEA
    In: Supplement to: Rentschler, Tobias; Bartelheim, Martin; Behrens, Thorsten; Díaz-Zorita Bonilla, Marta; Teuber, Sandra; Scholten, Thomas; Schmidt, Karsten (2022): Contextual spatial modelling in the horizontal and vertical domains. Scientific Reports, 12(1), https://doi.org/10.1038/s41598-022-13514-5
    Publication Date: 2023-01-13
    Description: The dataset was used to estimate the relevant range of spatial scales with multi-scale contextual spatial modelling. The modelled soil properties were cation exchange capacity, pH, and water content at field capacity. The soil quality indicator data were modelled and predicted with partial least squares regression models based on NIR and MIR spectroscopy (PANGAEA, doi:10.1594/PANGAEA.938522: “Soil spectroscopy data from 130 soil profiles in Lora del Rio, Andalusia, Spain”). The soil samples were taken in an area of 1000 km² around Lora del Rio, Andalusia, Spain, in the Sierra Morena mountain range (Palaeozoic granite, gneiss, and slate), on the Guadalquivir river flood plain (Pleistocene marl, calcarenite, coarse sand, and Holocene sands and loams), and on the southern Tertiary terraces (coarse gravel and cobble with sands and loams). The soil types present according to USDA Soil Taxonomy are Alfisols, Entisols, Inceptisols, and Vertisols. The basis for the multi-scale terrain analysis was a digital terrain model by the Centro Nacional de Información Geográfica (CNIG) of the Spanish government. The digital terrain model was published under the CC-BY 4.0 license via the Centro de Descargas del CNIG (IGN; doi:10.7419/162.09.2020) with the title Digital Terrain Model - DTM05 (EPSG: 25830) and was last accessed on March 31, 2020. The study area is covered by the MTN50 map sheets 0941, 0942, 0963, 0964, 0985, and 0986. The multi-scale contextual spatial modelling and the derivation of the scaled terrain covariates were based on the Gaussian pyramid (doi:10.1016/j.geoderma.2017.09.015 and doi:10.1038/s41598-018-33516-6), and the estimation of the relevant range of scales was based on exhaustive additive and subtractive machine learning sequences (doi:10.1038/s41598-019-51395-3). The models were trained with the multi-scale terrain covariates extracted from the digital terrain model derivatives at each soil profile location. For each soil depth of the soil dataset (0-10, 10-20, 20-30, 40-60, and 70-100 cm), two model sequences (additive and subtractive) were trained.
    Keywords: digital soil mapping; multi-scale terrain analysis; ResourceCultures; SFB1070; soil properties; soil quality; spatial modelling
    Type: Dataset
    Format: application/zip, 3 datasets
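
The Gaussian pyramid mentioned in this record can be illustrated with a small sketch of the reduce step only: smooth a raster with a 5-tap binomial kernel, then keep every second cell, so that each level represents terrain at a coarser scale. This is a generic illustration, not the implementation used for the dataset. The kernel, the edge handling (clamping), and the function names are assumptions, and the step of deriving terrain covariates from each level is omitted.

```python
# 5-tap binomial approximation of a Gaussian; the weights sum to 1.
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def blur_1d(values):
    """Smooth a 1-D sequence with KERNEL, clamping indices at the edges."""
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for offset, weight in zip(range(-2, 3), KERNEL):
            j = min(max(i + offset, 0), n - 1)
            acc += weight * values[j]
        out.append(acc)
    return out


def reduce_level(grid):
    """Blur rows, then columns, then keep every second row and column."""
    rows = [blur_1d(row) for row in grid]
    cols = [blur_1d(list(col)) for col in zip(*rows)]
    blurred = [list(row) for row in zip(*cols)]
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(grid, levels):
    """Return [original, level 1, ..., level n], stopping early on tiny grids."""
    pyramid = [grid]
    for _ in range(levels):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break
        pyramid.append(reduce_level(current))
    return pyramid


if __name__ == "__main__":
    # Hypothetical 8 x 8 elevation raster: a simple west-to-east ramp.
    dtm = [[float(col) for col in range(8)] for _ in range(8)]
    for level, grid in enumerate(gaussian_pyramid(dtm, levels=2)):
        print(level, len(grid), "x", len(grid[0]))
```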
  • 5
    Publication Date: 2023-01-13
    Keywords: AUG; Auger; digital soil mapping; ELEVATION; LATITUDE; LONGITUDE; Lora_del_Rio; Lora del Rio, Andalusia, Spain; multi-scale terrain analysis; Multi-scale terrain covariate; Profile; ResourceCultures; SFB1070; soil properties; soil quality; spatial modelling
    Type: Dataset
    Format: text/tab-separated-values, 26390 data points
  • 6
    Publication Date: 2023-01-13
    Keywords: AUG; Auger; Increment number; Laboratory code/label; LATITUDE; LONGITUDE; Lora_del_Rio; Lora del Rio, Andalusia, Spain; Profile; ResourceCultures; Sample code/label; SFB1070; UTM Easting, Universal Transverse Mercator; UTM Northing, Universal Transverse Mercator; UTM Zone, Universal Transverse Mercator
    Type: Dataset
    Format: text/tab-separated-values, 3542 data points