Improving prediction of COVID-19 mortality using machine learning in the Spanish SEMI-COVID-19 registry

  • IM - ORIGINAL
Internal and Emergency Medicine

Abstract

COVID-19 is responsible for high mortality, but robust machine learning-based predictors of mortality are lacking. The aim of this study was to generate a model for predicting mortality in patients hospitalized with COVID-19 using Gradient Boosting Decision Trees (GBDT).

The Spanish SEMI-COVID-19 registry includes 24,514 pseudo-anonymized cases of patients hospitalized with COVID-19 from 1 February 2020 to 5 December 2021. Data from this registry were used to build a GBDT machine learning model, with CatBoost as the classifier and BorutaShap to select the most relevant indicators, yielding a mortality risk score ranging from 0 to 1. The model was validated by separating patients according to admission date: patients admitted from 1 February to 31 December 2020 (first and second waves, pre-vaccination period) formed the training group, and those admitted from 1 January to 30 November 2021 (vaccination period) formed the test group. An ensemble of ten models with different random seeds was constructed; within the training period, 80% of the patients were used for training and the 20% admitted at the end of the period were held out for cross-validation. The area under the receiver operating characteristic curve (AUC) was used as the performance metric.

Clinical and laboratory data from 23,983 patients were analyzed. Using 16 features, the CatBoost mortality prediction models achieved an AUC of 84.76 (standard deviation 0.45) in the test group (potentially vaccinated patients not included in model training).

Although it requires a relatively large number of predictors, the 16-parameter GBDT model shows a high capacity for predicting in-hospital COVID-19 mortality.
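The validation scheme summarized above (a temporal split by admission date, the last 20% of the training period held out for validation, several random seeds, and AUC as the metric) can be sketched with the Python standard library alone. This is a hypothetical illustration, not the authors' code: the function names, the parallel-list data layout, the Mann-Whitney formulation of the AUC, averaging probabilities across seeds, and reading the reported standard deviation as the spread of per-seed AUCs are all assumptions. Model fitting itself (CatBoost with BorutaShap feature selection in the study) is left out; the risk scores are assumed to come from already-trained models.

```python
from datetime import date
from statistics import mean, stdev

# Period boundaries as stated in the abstract.
TRAIN_START, TRAIN_END = date(2020, 2, 1), date(2020, 12, 31)  # pre-vaccination
TEST_START, TEST_END = date(2021, 1, 1), date(2021, 11, 30)    # vaccination period


def temporal_split(admission_dates, val_fraction=0.2):
    """Hypothetical helper: return (train, validation, test) index lists.

    Patients admitted in the training period are sorted by admission date,
    and the most recent `val_fraction` of them is held out for validation.
    Patients admitted in the test period form the test set.
    """
    train_period = sorted(
        (i for i, d in enumerate(admission_dates) if TRAIN_START <= d <= TRAIN_END),
        key=lambda i: admission_dates[i],
    )
    test = [i for i, d in enumerate(admission_dates) if TEST_START <= d <= TEST_END]
    cut = round(len(train_period) * (1 - val_fraction))
    return train_period[:cut], train_period[cut:], test


def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a death outranks a survivor.

    Ties count as 0.5. This is O(n_pos * n_neg), fine for a sketch only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_seed_ensemble(labels, per_seed_scores):
    """Mean and SD of per-seed AUC, plus the AUC of the averaged ensemble.

    AUCs are multiplied by 100 to match how the abstract reports them.
    `per_seed_scores` holds one list of predicted risks (0 to 1) per seed.
    """
    aucs = [100 * auc(labels, s) for s in per_seed_scores]
    ensemble = [mean(col) for col in zip(*per_seed_scores)]  # average risk per patient
    return mean(aucs), stdev(aucs), 100 * auc(labels, ensemble)
```

For example, with ten admissions on 1 to 10 March 2020 followed by two in early 2021, `temporal_split` puts the first eight in training, the last two March admissions in validation, and the 2021 admissions in the test set. Splitting by date rather than at random is what lets the test group act as a later, potentially vaccinated population that the models never saw.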

Figs. 1–7

Data availability

J-M C-R and J-M R-R have full access to the data and are the guarantors for the data.

Acknowledgements

We gratefully acknowledge all the investigators and staff of the SEMI-COVID-19 registry who participated in collecting the patient data (see Appendix 1).

Funding

There are no sources of funding for this manuscript.

Author information

Contributions

As corresponding authors, we warrant that the manuscript, as submitted, has been reviewed and approved by all named authors; that the corresponding authors are authorized by all authors to act on their behalf with respect to submission of the manuscript; that the article is original; that the article does not infringe any copyright or other proprietary right of any third party; that neither the text nor the data submitted have been previously published; and that the article or a substantially similar article is not under consideration by another journal at this time.

Corresponding authors

Correspondence to José-Manuel Ramos-Rincón or Alejandro López-Escobar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Institutional review board statement

The project was approved by the Institutional Research Ethics Committee of Málaga on 27 March 2020 (Ethics Committee code: SEMI-COVID-19 27-03-20), as per the guidelines of the Spanish Agency of Medicines and Medical Products. All patients gave informed consent.

Informed consent

Only patients who had previously given consent for their medical records to be used for medical research were included in this registry. Data confidentiality and patient anonymity were always maintained, in accordance with Spanish regulations on observational studies.

Human and animal rights

All procedures involving human participants were approved by the Institutional Research Ethics Committee of Málaga on 27 March 2020 (Ethics Committee code: SEMI-COVID-19 27-03-20), as per the guidelines of the Spanish Agency of Medicines and Medical Products. All patients gave informed consent. This article does not contain any studies with animals performed by any of the authors.

Awards obtained

The RIM Score-COVID project won the EpidemiXs—COVID Warriors Challenge of the V National Health Hackathon (Madrid, November 2020).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors write on behalf of the SEMI-COVID-19 Network. A complete list of the SEMI-COVID-19 Network members is provided in the Appendix.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 27 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Casas-Rojo, JM., Ventura, P.S., Antón Santos, J.M. et al. Improving prediction of COVID-19 mortality using machine learning in the Spanish SEMI-COVID-19 registry. Intern Emerg Med 18, 1711–1722 (2023). https://doi.org/10.1007/s11739-023-03338-0

  • DOI: https://doi.org/10.1007/s11739-023-03338-0
