In:
PLOS ONE, Public Library of Science (PLoS), Vol. 17, No. 7 (2022-07-21), p. e0271610
Abstract:
Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability.
Method: Here, we propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow for a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Based on this combination, we discover and interpret broad patterns of individual serum TSH concentrations, an important marker of thyroid function.
Results: Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy similar to that of state-of-the-art models (root mean square error 0.66, mean absolute error 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable.
Conclusion: We demonstrate that the combination of random forest and Bayesian network analysis helps to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with the current literature. They may be useful for future thyroid research and for improved dosing of therapeutics.
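For readers unfamiliar with the three accuracy measures quoted in the abstract, the following minimal Python sketch shows their standard textbook definitions. It is not taken from the paper: the function name `regression_metrics` and the toy values are hypothetical and illustrative only, and the paper's exact evaluation protocol (for example, which data split the metrics were computed on) is not described here.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: standard RMSE, MAE and R^2 for a regression model."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # Mean squared error and mean absolute error of the residuals.
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    # R^2 compares the residual sum of squares with the spread around the mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Toy example (illustrative values, not study data).
print(regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4]))
```

Under this definition, an R² of 0.15 means the model's squared prediction error is about 15% lower than that of always predicting the mean of the observed values.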
Type of Medium:
Online Resource
ISSN:
1932-6203
DOI:
10.1371/journal.pone.0271610
DOI:
10.1371/journal.pone.0271610.g001
DOI:
10.1371/journal.pone.0271610.g002
DOI:
10.1371/journal.pone.0271610.t001
DOI:
10.1371/journal.pone.0271610.t002
DOI:
10.1371/journal.pone.0271610.t003
DOI:
10.1371/journal.pone.0271610.t004
DOI:
10.1371/journal.pone.0271610.t005
DOI:
10.1371/journal.pone.0271610.s001
DOI:
10.1371/journal.pone.0271610.s002
DOI:
10.1371/journal.pone.0271610.s003
DOI:
10.1371/journal.pone.0271610.s004
DOI:
10.1371/journal.pone.0271610.s005
DOI:
10.1371/journal.pone.0271610.s006
DOI:
10.1371/journal.pone.0271610.s007
DOI:
10.1371/journal.pone.0271610.s008
DOI:
10.1371/journal.pone.0271610.s009
DOI:
10.1371/journal.pone.0271610.r001
DOI:
10.1371/journal.pone.0271610.r002
DOI:
10.1371/journal.pone.0271610.r003
DOI:
10.1371/journal.pone.0271610.r004
DOI:
10.1371/journal.pone.0271610.r005
Language:
English
Publisher:
Public Library of Science (PLoS)
Publication Date:
2022
ZDB-ID:
2267670-3