Abstract
COVID-19 is responsible for high mortality, but robust machine learning-based predictors of mortality are lacking. This study aimed to generate a model for predicting mortality in patients hospitalized with COVID-19 using Gradient Boosting Decision Trees (GBDT). The Spanish SEMI-COVID-19 registry includes 24,514 pseudo-anonymized cases of patients hospitalized with COVID-19 from 1 February 2020 to 5 December 2021. This registry was used to train a GBDT machine learning model, employing CatBoost and BorutaShap to select the most relevant indicators and generate a mortality prediction model by risk level, ranging from 0 to 1. The model was validated by separating patients according to admission date, using the period 1 February to 31 December 2020 (first and second waves, pre-vaccination period) for training, and 1 January to 30 November 2021 (vaccination period) for the test group. An ensemble of ten models with different random seeds was constructed, using 80% of the training-period patients for training and the 20% from the end of the training period for cross-validation. The area under the receiver operating characteristic curve (AUC) was used as the performance metric. Clinical and laboratory data from 23,983 patients were analyzed. CatBoost mortality prediction models achieved an AUC of 84.76 (standard deviation 0.45) for patients in the test group (potentially vaccinated patients not included in model training) using 16 features. Although it requires a relatively large number of predictors, the 16-parameter GBDT model shows a high predictive capacity for COVID-19 hospital mortality.
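The validation scheme described in the abstract (a split by admission date, a holdout taken from the end of the training period, averaging across seeds, and AUC as the metric) can be illustrated with a minimal sketch. This is not the authors' code: the record fields `admitted` and `died`, the helper names, the toy patients, and the per-seed scores are all hypothetical, and the scores simply stand in for the outputs of separately seeded models.

```python
from datetime import date
from statistics import mean


def temporal_split(patients, test_start):
    """Split records into (train, test) by admission date."""
    train = [p for p in patients if p["admitted"] < test_start]
    test = [p for p in patients if p["admitted"] >= test_start]
    return train, test


def holdout_latest(train, fraction=0.2):
    """Hold out the most recently admitted `fraction` of the training period."""
    ordered = sorted(train, key=lambda p: p["admitted"])
    cut = len(ordered) - int(round(len(ordered) * fraction))
    return ordered[:cut], ordered[cut:]


def auc(labels, scores):
    """Rank-based AUC: chance that a death is scored above a survivor (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_scores(per_seed_scores):
    """Average risk scores (each in [0, 1]) patient by patient across seeds."""
    return [mean(column) for column in zip(*per_seed_scores)]


# Toy records (hypothetical): admission date and in-hospital death flag.
patients = [
    {"admitted": date(2020, 3, 10), "died": 1},
    {"admitted": date(2020, 4, 2), "died": 0},
    {"admitted": date(2020, 8, 15), "died": 0},
    {"admitted": date(2020, 10, 1), "died": 1},
    {"admitted": date(2020, 12, 20), "died": 0},
    {"admitted": date(2021, 1, 5), "died": 1},
    {"admitted": date(2021, 3, 9), "died": 0},
    {"admitted": date(2021, 6, 30), "died": 0},
    {"admitted": date(2021, 11, 2), "died": 1},
]

# Train on admissions before 1 January 2021, test on admissions from that date on.
train, test = temporal_split(patients, date(2021, 1, 1))
fit_part, validation_part = holdout_latest(train, fraction=0.2)

# Placeholder risk scores for the test patients from two differently seeded models.
per_seed = [
    [0.9, 0.2, 0.4, 0.7],
    [0.7, 0.4, 0.2, 0.9],
]
test_labels = [p["died"] for p in test]
test_auc = auc(test_labels, ensemble_scores(per_seed))
print(test_auc)
```

In practice each row of `per_seed` would come from a gradient boosting model fitted on `fit_part` with its own random seed and monitored on `validation_part`; the averaging and AUC steps would stay the same.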
Data availability
J-M C-R and J-M R-R have full access to the data and are the guarantors for the data.
Acknowledgements
We gratefully acknowledge all the investigators and staff from the SEMI-COVID-19 registry who participated in the collection of the patient data (see Appendix 1).
Funding
There are no sources of funding for this manuscript.
Author information
Contributions
As corresponding authors, we warrant that the manuscript, as submitted, has been reviewed and approved by all named authors; that the corresponding authors are authorized by all authors to act on their behalf with respect to submission of the manuscript; that the article is original; that the article does not infringe any copyright or other proprietary right of any third party; that neither the text nor the data submitted have been previously published; and that the article or a substantially similar article is not under consideration by another journal at this time.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Institutional review board statement
The project was approved by the Institutional Research Ethics Committee of Málaga on 27 March 2020 (Ethics Committee code: SEMI-COVID-19 27-03-20), as per the guidelines of the Spanish Agency of Medicines and Medical Products. All patients gave informed consent.
Informed consent
Only patients who had previously given consent for their medical records to be used for medical research were included in this registry. Data confidentiality and patient anonymity were always maintained, in accordance with Spanish regulations on observational studies.
Human and animal rights
All procedures in human participants were approved by the Institutional Research Ethics Committee of Málaga on 27 March 2020 (Ethics Committee code: SEMI-COVID-19 27-03-20), as per the guidelines of the Spanish Agency of Medicines and Medical Products. All patients gave informed consent. This article does not contain any studies with animals performed by any of the authors.
Awards obtained
The RIM Score-COVID project was winner of the EpidemiXs—COVID Warriors Challenge of the V National Health Hackathon. Madrid, November 2020.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
On behalf of the SEMI-COVID-19 Network. A complete list of the SEMI-COVID-19 Network members is provided in the Appendix.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Casas-Rojo, JM., Ventura, P.S., Antón Santos, J.M. et al. Improving prediction of COVID-19 mortality using machine learning in the Spanish SEMI-COVID-19 registry. Intern Emerg Med 18, 1711–1722 (2023). https://doi.org/10.1007/s11739-023-03338-0
DOI: https://doi.org/10.1007/s11739-023-03338-0