Comparison of different AI systems for diagnosing sepsis, septic shock, and cardiogenic shock: a retrospective study


ABSTRACT Sepsis, septic shock, and cardiogenic shock are life-threatening conditions associated with high mortality rates, but differentiating them is complex because they share certain


symptoms. Using the Medical Information Mart for Intensive Care (MIMIC)-III database and artificial intelligence (AI), we aimed to increase diagnostic precision, focusing on Bayesian network


classifiers (BNCs) and comparing them with other AI methods. Data from 5970 adults, including 950 patients with cardiogenic shock, 1946 patients with septic shock, and 3074 patients with


sepsis, were extracted for this study. Of the original 51 variables included in the data records, 12 were selected for constructing the predictive model. The data were divided into training


and validation sets at an 80:20 ratio, and the performance of the BNCs was evaluated and compared with that of other AI models, such as the one rule classifier (OneR), classification and


regression tree (CART), and an artificial neural network (ANN), in terms of accuracy, sensitivity, specificity, precision, and F1-score. The BNCs exhibited an accuracy of 87.6% to 91.5%. The


CART model demonstrated a notable 91.6% accuracy when only three decision levels were used, whereas the intricate ANN model reached 90.5% accuracy. Both the BNCs and the CART model allowed


clear interpretation of the predictions. BNCs have the potential to be valuable tools in diagnostic tasks, with an accuracy, sensitivity, and precision comparable, in some cases, to those of


ANNs while demonstrating superior interpretability.


INTRODUCTION Sepsis, septic shock, and cardiogenic shock are critical medical conditions characterised


by life-threatening organ dysfunction and high mortality rates. Distinguishing between these syndromes can be challenging, however, owing to their shared pathophysiological features. Sepsis


is defined as organ dysfunction resulting from a dysregulated host response to infection and can be detected by an acute increase in the total Sequential Organ Failure Assessment (SOFA)


score by ≥ 2 points due to infection. Septic shock is identified by persistent hypotension requiring vasopressors to maintain a mean arterial pressure ≥ 65 mmHg and a serum lactate level > 2.0 mmol/L despite adequate volume administration1,2. Cardiogenic shock, on the other hand, is diagnosed when patients exhibit a systolic blood pressure < 90 mmHg for ≥ 30 min or


require pharmacologic/mechanical circulatory support to maintain adequate blood pressure, accompanied by clinical signs of pulmonary congestion, pulmonary oedema, and impaired end-organ


perfusion3,4. Distinguishing between septic and cardiogenic shock is particularly challenging because of the overlapping pathophysiological mechanisms of the two conditions, as septic shock often involves a complex interplay of distributive shock, hypovolemia, impaired cardiac function, mitochondrial dysfunction, and coagulopathies5,6. Furthermore, the incidences of septic shock (31%) and cardiogenic shock (28%) are nearly identical7, and both represent life-threatening conditions requiring rapid diagnosis and treatment to prevent further deterioration, the development


of serious consequences, or death. In recent years, the integration of artificial intelligence (AI) technologies, driven by a surge in big data, has garnered increasing attention across


various sectors, including healthcare. AI-based applications hold immense potential to revolutionise medical research and clinical practice, consistently demonstrating the capacity to


outperform human counterparts in certain instances8,9. Despite these promising achievements, the widespread implementation of AI in healthcare faces considerable challenges, one of the


foremost of which is the inherent lack of transparency of complex AI algorithms. Deep learning methods, especially artificial neural networks (ANNs), often operate as a "black box," rendering the decision-making process inscrutable and eroding trust in AI-derived solutions10. A potential remedy for this interpretability issue is the use of Bayesian network


classifiers (BNCs) as decision support tools. BNCs offer a unique advantage by graphically depicting direct and indirect variable dependencies, facilitating the understanding of causal


relationships among variables in disease prediction. This retrospective study aimed to fill this crucial gap by developing a reliable diagnostic assistance system. Leveraging patient data


from the Medical Information Mart for Intensive Care (MIMIC) III database, we focused on the intensive care conditions of sepsis, septic shock, and cardiogenic shock and employed various


BNCs for discriminating among them. These classifiers were subsequently benchmarked against conventional approaches such as the One Rule classifier (OneR), the Classification and Regression


Trees (CART) algorithm, and an artificial neural network (ANN), paving the way for a comprehensive evaluation of model performance, interpretability, and potential to enhance clinical


decision-making in critical care scenarios. METHODS DATA SOURCE AND SELECTION Given the difficulties of obtaining primary medical data, the MIMIC-III database was utilised in this study.


This extensive, single-centre database consists of deidentified clinical data from patients admitted to critical care units of the Beth Israel Deaconess Medical Center in Boston,


Massachusetts, between 2001 and 201211. The data are widely accessible to researchers under a data use agreement, so the study was exempt from the need for specific ethical review.


Nevertheless, we adhered to fundamental ethical principles and ensured the validity and fairness of the study process. The database contains data from 53,423 adult hospital admissions (aged


16 years or above)11. Among these patients, this study focused on patients aged ≥ 18 years. The inclusion criteria included admission with a diagnosis of sepsis, septic shock, or cardiogenic


shock according to the International Classification of Diseases, 9th Revision (ICD-9) diagnosis table (codes 785.51, 785.52, 995.91, and 995.92). Due to the constraints of the data in the


MIMIC-III collected prior to 2012, sepsis and septic shock admissions were re-evaluated on the basis of criteria from the 2016 and 2021 Surviving Sepsis Campaign guidelines1,2 and


cardiogenic shock guidelines3,4. A two-step reclassification process was conducted to differentiate between the sepsis and septic shock groups. Initially, patients diagnosed with either


sepsis, severe sepsis, or septic shock who had a SOFA score ≥ 2 were classified as having sepsis. Patients who did not fulfil this criterion were excluded from the dataset. Within this


group, patients were then reclassified as septic shock cases if they also met the diagnostic criteria of vasopressor-dependent circulatory failure and hyperlactatemia (plasma lactate level 


> 2 mmol/L). After being reallocated into the septic shock group, those cases were subsequently removed from the sepsis group. We used MIT-LCP/mimic-code to create the SOFA score table12.
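As a minimal sketch, the two-step reclassification described above amounts to the following rule; the function and argument names are illustrative assumptions, not code from the study:

```python
def reclassify_admission(sofa_score, vasopressor_dependent, lactate_mmol_l):
    """Two-step sepsis/septic-shock reclassification (illustrative names).

    Step 1: only admissions with a SOFA score >= 2 qualify as sepsis;
    all others are excluded from the dataset.
    Step 2: within that group, vasopressor-dependent circulatory failure
    combined with hyperlactatemia (lactate > 2 mmol/L) reallocates the
    case to the septic shock group and removes it from the sepsis group.
    """
    if sofa_score < 2:
        return None  # excluded from the dataset
    if vasopressor_dependent and lactate_mmol_l > 2.0:
        return "septic shock"
    return "sepsis"
```

An admission with a SOFA score of 3, vasopressor dependence, and a lactate level of 3.1 mmol/L would thus be moved from the sepsis group to the septic shock group.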


The criteria for the cardiogenic shock group included a low systolic blood pressure (< 90 mmHg) for a period of ≥ 30 min or the presence of two of three clinical signs from among the


following: bilateral lung sounds, cool/cold skin temperature and pale/mottled/cyanotic skin colour; data on admissions for patients who did not meet these criteria were excluded. Patients


whose data did not adhere to the previously mentioned conditions were either reclassified or entirely excluded from this study. To prevent duplicate diagnoses from affecting the learning and


assessment processes of the models, patients diagnosed with multiple conditions (cardiogenic shock and either sepsis or septic shock) during a single admission were also excluded. The data


adjustment process is shown in Fig. 1. DATA PROCESSING The final dataset comprised 5,970 distinct hospital admissions, each considered a unique observation. To enhance the dataset, the


Elixhauser et al.13 comorbidity index was extracted from the MIMIC III diagnostic table, and selected categories were incorporated as variables in the dataset. This classification, along


with the index value, has been demonstrated to be highly important in predicting prognoses and mortality for a range of diseases and injuries. Compared with detailed ICD9 codes, it allows


better organisation of the corresponding data. The data from Quan et al.14, constructed via the MIT-LCP/mimic-code12 methodology, were employed to generate predictor variables for this


research. In addition, specific predictor variables for conditions such as cardiac arrhythmias, myocardial infarction and infections were formulated by combining different ICD9 codes into


diagnostic groups. We also introduced predictor variables for age, sex, SOFA score ≥ 2, vasopressor dependence, lactate level > 2 mmol/l, and other clinical signs and indicators


(Supplementary Table 1 online offers an overview of the created variables). FEATURE SELECTION A three-phase feature selection process was employed after cleaning the dataset for


restructuring and adaptation to the new guidelines. Initially, variables that were deemed irrelevant to disease diagnosis were excluded. In the second phase, features that did not provide


sufficient value and features exhibiting unsatisfactory importance due to low frequency or low variance between classes were discarded. The final phase involved evaluating the remaining


variables with the mutual information score (MIS), employing the median as the cut-off threshold over the arithmetic mean because of the enhanced resilience of this metric against outliers.
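For binary predictors, this MIS-with-median-cutoff step can be sketched as follows; the helper names, and the choice to retain features scoring at or above the median, are our assumptions rather than the study's implementation:

```python
import math
import statistics
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete variables."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def select_by_mis(features, labels):
    """Keep features whose mutual information score is at or above the
    median of all scores; the median cut-off is more robust to outlier
    scores than the arithmetic mean."""
    scores = {name: mutual_information(col, labels)
              for name, col in features.items()}
    cutoff = statistics.median(scores.values())
    return [name for name, s in scores.items() if s >= cutoff]
```

Applied to binary predictors of this kind, the highest-scoring features align with those the OneR analysis later identified as most predictive.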


After the feature selection process, 12 of the 51 variables were included in the final dataset. Table 1 offers an in-depth overview of the included prediction variables. CLASSIFIERS EMPLOYED


To address the challenge of differentiating between the clinically similar conditions of sepsis, septic shock and cardiogenic shock, Bayesian network classifiers (BNCs) were employed. These


classifiers leverage probabilistic relationships among selected clinical parameters (e.g., vasopressor dependency, lactate levels, SOFA scores, and infection status) to calculate diagnostic


probabilities, thereby enabling nuanced differentiation among these clinically overlapping syndromes. Various BNCs were trained on the dataset and evaluated for their predictive power in


terms of accuracy, sensitivity, specificity, precision, F1-Score, and interpretability. To assess the usefulness of the BNCs, the results were compared with those of other commonly used


classifiers, such as a naive Bayes (NB) classifier, the One rule Classifier (OneR), a Classification and Regression Trees (CART)-based classifier, and a feed forward backpropagation


artificial neural network (ANN). The basic concepts, methods and relevant algorithms of the individual classifiers are briefly summarised below. NAIVE BAYES (NB) A probabilistic algorithm


based on Bayes’ theorem that operates under the assumption that all features are independent of one another. TREE AUGMENTED NAIVE BAYES (TAN) An enhancement of the NB, the TAN algorithm


introduces feature dependencies via a tree structure, enhancing prediction accuracy over its naive counterpart. It employs methods such as the Chow–Liu algorithm (Akaike information


criterion (AIC), Bayesian information criterion (BIC), and log-likelihood (LOG)) for structure optimisation. SEMI-NAIVE BAYES CLASSIFIER (SNBC) SNBC is an adaptation of NB that considers


feature interdependencies and eliminates, selects or joins features to improve classification accuracy. It uses algorithms such as backward sequential elimination and joining (BSEJ) and


forward sequential selection and joining (FSSJ). BENCHMARKING We used three alternative techniques/algorithms as benchmarks to evaluate our models. ONE RULE CLASSIFIER (ONER) This one-rule


algorithm classifies data on the basis of a single attribute. Despite the simplicity of the resulting model, OneR has been shown to deliver good results and thus serves as an excellent benchmark15. CLASSIFICATION AND REGRESSION TREE (CART) A decision tree trained through recursive partitioning. The tree’s structure represents the course of the subsequent classification


process, determined by the iterative division of the data into subgroups16. FEED-FORWARD BACKPROPAGATION NEURAL NETWORK (ANN) This algorithm is loosely modelled on the biological nervous system and is designed to mimic the communication between neurons. The feedforward aspect describes the direction of signal transmission, whereas the backpropagation aspect describes the neural network’s learning process, wherein the loss function is minimised by adjusting the network’s parameters, its weights and biases. EVALUATION METRICS AND PROCEDURES All classifiers


were trained, validated, and tested. First, the data were randomly divided into training and test datasets at a ratio of 80:20. Performance metrics, including accuracy, sensitivity/recall,


specificity, precision, and F1-score, were derived from the resulting confusion matrices. Specifically, the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values presented in the confusion matrix were used to calculate the following metrics for qualifying the classifiers: $$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$ $$Sensitivity/Recall = \frac{TP}{TP + FN}$$ $$Specificity = \frac{TN}{TN + FP}$$ $$Precision = \frac{TP}{TP + FP}$$ $$F1\text{-}Score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$ These metrics are class-specific and must therefore be calculated three times for this multiclass problem. Furthermore, the area under the curve (AUC) was


calculated for the multiclass classification following the methods of Hand and Till17. We performed tenfold cross-validation using accuracy and Cohen’s kappa as internal validation metrics


to assess the accuracy and reliability of the models independent of the choice of randomiser. Additionally, we attempted to address potential imbalances in the models through upsampling and


then compared the results with those obtained without upsampling. COMPUTATION AND VISUALISATION All computations and visualisations were performed with R Version 4.2.2, RStudio 2022.07.1


Build 554 and the packages bnclassify, caret, nnet, OneR, pROC and rpart18,19,20,21,22,23. RESULTS This study explored the efficacy of Bayesian networks in addressing intricate


classification challenges within the field of medical diagnostics. Initially, various BNCs were compared, but the differences in their performance were negligible. The accuracy of the TAN


method ranged from 91.3 to 91.5%, whereas the SNBC model achieved an accuracy of 91.1% with the FSSJ algorithm and 90.9% with the BSEJ algorithm. Moreover, the NB method exhibited a


comparatively modest accuracy of 87.6% (Table 2). To further evaluate the outcomes of the Bayesian networks, the classifiers were benchmarked against established models (Table 2). Utilising


OneR as a preliminary benchmark revealed that vasopressor dependency was the paramount predictor, with 68.5% predictive power, followed by infection at 61.5% and lactate at 59.1%. These


findings are in accordance with the results of the MIS. Although these accuracy values were considerably lower than those of the BNCs, they provided preliminary insights into the


effectiveness of distinct predictors. Given the binary nature of the predictor variables, OneR can identify only two classes, which downgrades its utility from that of a benchmark model to


that of a tool for determining feature importance. In our evaluations, the CART model exhibited an accuracy of 91.6% through its rudimentary tree framework, employing only the predictors


vasopressor dependency, lactate and infection, the same ones that were most prominent in the OneR analysis. The BNCs were subsequently compared with the ANN. Despite numerous iterations


across different hidden layer architectures, the highest accuracy achieved by the ANN, obtained with a singular hidden layer and ten neurons, was 90.5%, whereas different multilayer ANNs


yielded accuracies between 85.0 and 88.0%. In the tenfold cross-validation, the TAN (BIC), TAN (AIC) and FSSJ models showed relatively low variance between quartiles in terms of accuracy and


Cohen’s kappa, indicating consistent performance across different samples, whereas the ANN showed significant performance degradation. The cross-validation process revealed that the TAN


models exhibited the best performance, with high accuracy and kappa values while exhibiting low variance. Nevertheless, all the models demonstrated a high degree of correlation, thereby


indicating the emergence of a common trend. The box plot diagram in Fig. 2 provides a comprehensive overview of the cross-validation results. All the employed models were assessed in terms


of their ability to accurately classify cardiogenic shock. While the sensitivities/recalls for sepsis and septic shock remained consistently above 90% across the models, those for


cardiogenic shock fluctuated between 77.3 and 83.5%. Upsampling was performed to balance the classes and investigate whether an imbalance within the training data may have been responsible


for this discrepancy. While this strategy increased cardiogenic shock sensitivity by an average of 4.5%, it compromised holistic model performance, as the overall accuracy decreased by an


average of 1.7%, and the sensitivities for septic shock and sepsis decreased by 2.9% and 2.8%, respectively. The models demonstrated a mean reduction in accuracy of 2.9% for the sepsis class


and 2.8% for the septic shock class, whereas that for the cardiogenic shock class decreased by 12.0% on average. Furthermore, the mean F1-scores of the models for the different classes decreased by between 0.006 and 0.044 points. DISCUSSION The urgency in diagnosing life-threatening conditions such as sepsis, septic shock, and cardiogenic shock, all of which have high mortality rates,


cannot be overstated. Our findings highlight the effectiveness of BNCs in facilitating prompt diagnoses of these ailments. To our knowledge, this is the first investigation of BNCs in the


differential diagnosis of sepsis, septic shock, and cardiogenic shock. Given the absence of directly analogous studies, we benchmarked our models against other AI algorithms. Prior


techniques for the early diagnosis of these conditions have employed clinical decision rules, logistic regression models, random forests, decision trees, and deep learning


techniques24,25,26,27,28. The latest research emphasises early detection before the disease can truly manifest, making direct comparisons with certain existing methods challenging, as our


focus was on immediate diagnosis24,25,26,27,28. Many AI-based diagnostic methods target a single condition (binary classification), whereas the models developed in our study perform


multiclass differentiation, further increasing the difficulty in directly comparing these methods. Challenging the conventional wisdom of ANNs as the AI gold standard, our BNCs, which are


inherently interpretable, exhibited comparable accuracy. Our findings suggest that BNCs are potent tools for intricate classification tasks in the field of medicine, demonstrating


substantial accuracy in multiclass predictions. Their inherent interpretability also makes them inherently appealing. As indicated above, the prevailing narrative holds that ANN and deep


learning methods are universal solutions, potentially overshadowing the utility of BNCs29, and the latest model is often assumed to perform best30. The findings of this study demonstrate


that BNCs are comparable to ANNs in terms of performance metrics and may even outperform them, thus challenging the prevailing assumption about the novelty and effectiveness of ANN models.


Our findings also call into question the prevailing hypothesis regarding a trade-off between accuracy and interpretability, instead suggesting that increased model complexity does not


necessarily equate to superior performance. The differences were only a few percentage points, however, and it is possible that the ANN could be optimised with additional training and


tuning. Notably, applying a straightforward hidden layer structure resulted in a notable increase in accuracy over more intricate structural configurations. The CART model, despite its


simplicity, outperformed some of the BNCs and the ANN. The rudimentary architecture of the NB model yielded suboptimal results, whereas the nuanced interconnections within the TAN methods


enhanced their accuracy. The SNBCs leveraged their ability to select features but were, unfortunately, not superior to the TANs in our study. Consistent with this notion, the BNCs in our


study demonstrated substantial decomposability, algorithmic transparency, and simulatability. The upsampling procedures we employed to increase the sensitivity/recall for the cardiogenic


shock class highlighted the trade-off between this metric and precision in multiclass approaches. While most of our models improved in predicting cardiogenic shock following upsampling, the


precision for this class significantly decreased. In contrast, the sensitivity/recall decreased for the other two classes, whereas the precision increased slightly. It is important that our


models be able to differentiate between the three classes with high accuracy; in this context, therefore, sensitivity/recall and precision represent essential metrics. The TAN AIC model


demonstrated a good balance between sensitivity/recall and precision without upsampling, whereas after upsampling, it demonstrated only a 2.1% greater sensitivity/recall and an 11.8% lower


precision for the cardiogenic shock category. This is further evidenced by the decrease in the F1-score, by 0.052 points. Moreover, without upsampling, commendable sensitivity/recall and


precision values were already demonstrated for the other two classes with this model. Consequently, the TAN (AIC) model was identified as the preferred option. A shared limitation across the


models was the difficulty in classifying cardiogenic shock. Cardiogenic shock is associated with a wide range of potential causes (e.g. myocardial infarction, post-myocardial syndrome,


valvular heart disease)31, which significantly increases the heterogeneity within this cohort. Potential data- or variable-related deficiencies might be another reason for this issue.


Critical insights from imaging, patient history, and clinical examinations, mainly presented as continuous text within the databases, were not explored. The promise of natural language


processing tools in refining such data warrants exploration. Our dataset exhibited an underrepresentation of myocardial infarction (60%), a critical predictor of cardiogenic shock, compared


with the frequency reported in previous studies on this condition (80%)3. Moreover, the incidence of myocardial infarctions during sepsis and septic shock admissions introduced a number of


uncertainties. Variables such as ‘vasopressor dependency’, ‘infection’ and ‘lactate’ were shown to be pivotal indicators for sepsis and septic shock, which aligns with contemporary sepsis


definitions and guidelines. However, the revision of outdated data may have influenced model learning, potentially skewing the predictive power of these indicators. The absence of temporal


contexts in our final dataset limited the integration of specific measured values, underscoring the need for temporally processable datasets for improving diagnostic accuracy, which should


be investigated in a follow-up study with more current data. Furthermore, because outdated data were employed in this study, prospective statements regarding the results in a clinical


setting cannot be made. Nevertheless, the retrospective outcomes are encouraging, suggesting that external validation may be associated with favourable results. Another limitation of the


study was that patients with concomitant conditions such as septic and cardiogenic shock were explicitly excluded to ensure clinically distinct phenotypes. Furthermore, no distinction was


made between patients with septic shock due to septic cardiomyopathy and those due to vasodilation (reduced vascular tone). While the models in this study displayed commendable performance


levels for a discrete classification, their diagnostic accuracy in individual cases decreases in proportion to pathophysiological complexity. This finding mirrors the clinical reality, where


the presence of overlapping pathophysiological profiles complicates the diagnosis32 and leaves a crucial diagnostic gap for the use of the models in a clinical context. Moreover, the


inherent heterogeneity of the study population presents a challenge to the generalisability of prediction models. Emerging evidence highlights the urgent need to explore sub-phenotypes to


improve clinical care33. Recent studies have shown that machine learning-based subphenotyping is highly relevant to prognosis in critical care34,35,36,37,38 and to the validation of prediction


models39. Future work is needed to determine whether subphenotypes provide better model performance. In subsequent studies, the intention is to undertake external validation, enhance the


model, and execute a systematic evaluation of the model’s performance across sub-phenotypes. CONCLUSIONS This investigation aimed to design a BNC to aid in the diagnosis of sepsis, septic


shock, and cardiogenic shock, drawing data from the MIMIC-III database. Classifiers constructed from TAN algorithms demonstrated optimal performance, characterised by both reproducibility


and interpretability. Our findings highlight the efficacy of selecting BNCs for intricate classification tasks, such as diagnostic support systems. Notably, certain BNCs have the potential


to match the performance of or even outperform ANNs while ensuring consistent learning behaviours, commendable accuracy, and interpretability. Nevertheless, there is an obvious need for


ongoing improvements, particularly by integrating data with temporal relationships. DATA AVAILABILITY The data are widely accessible to researchers under a data use agreement. Researchers


seeking to use the database must formally request access. For details, see https://mimic.mit.edu/docs/gettingstarted/ or https://physionet.org/content/mimiciii/1.4/ ABBREVIATIONS * AIC:


Akaike information criterion * ANN: Feed forward backpropagation artificial neural network * AUC: Area under the curve * BIC: Bayesian information criterion * BNC: Bayesian network


classifier * BSEJ: Backward sequential elimination and joining * CART: Classification and regression tree * FN: False negative * FP: False positive * FSSJ: Forward sequential selection and


joining * LOG: Log-likelihood * MIMIC: Medical Information Mart for Intensive Care * MIS: Mutual information score * MIT-LCP: Massachusetts Institute of Technology-Laboratory for


Computational Physiology * NB: Naive Bayes * OneR: One rule classifier * SNBC: Semi-Naïve Bayes classifier * TAN: Tree augmented Naive Bayes


* TN: True negative * TP: True positive

REFERENCES

* Singer, M. et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). _JAMA_ 315, 801–810 (2016).
* Evans, L. et al. Surviving sepsis campaign: International guidelines for management of sepsis and septic shock 2021. _Intensive Care Med._ 47, 1181–1247 (2021).
* Vahdatpour, C., Collins, D. & Goldberg, S. Cardiogenic shock. _J. Am. Heart Assoc._ 8, e011991 (2019).
* Werdan, K. et al. Kurzversion der 2. Auflage der deutsch-österreichischen S3-Leitlinie infarkt-bedingter kardiogener schock—Diagnose, monitoring und therapie. _Kardiologe_ 14, 364–395 (2020).
* Standl, T. et al. The nomenclature, definition and distinction of types of shock. _Dtsch Arztebl Int._ 115, 757–768 (2018).
* Carbone, F., Liberale, L., Preda, A., Schindler, T. H. & Montecucco, F. Septic cardiomyopathy: From pathophysiology to the clinical setting. _Cells_ 11, 2833 (2022).
* Bloom, J. E. et al. Incidence and outcomes of nontraumatic shock in adults using emergency medical services in Victoria, Australia. _JAMA Netw. Open_ 5, e2145179 (2022).
* Davenport, T. & Kalakota, R. The potential for artificial intelligence in healthcare. _Future Healthc. J._ 6, 94–98 (2019).
* Yu, K. H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. _Nat. Biomed. Eng._ 2, 719–731 (2018).
* Zihni, E. et al. Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome. _PLoS ONE_ 15, e0231166 (2020).
* Johnson, A., Pollard, T. & Mark, R. MIMIC-III clinical database. PhysioNet (2023).
* Johnson, A., Pollard, T., Blundell, J., Gow, B., Erinhong, Paris, N. et al. _MIT-LCP/Mimic-Code: MIMIC Code v2.2.1_. Zenodo (2022).
* Elixhauser, A., Steiner, C., Harris, D. R. & Coffey, R. M. Comorbidity measures for use with administrative data. _Med. Care_ 36, 8–27 (1998).
* Quan, H. et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. _Med. Care_ 43, 1130–1139 (2005).
* Shaik, N. B., Pedapati, S. R., Taqvi, S. A. A., Othman, A. R. & Dzubir, F. A. A. A feed-forward back propagation neural network approach to predict the life condition of crude oil pipeline. _Processes_ 8, 661 (2020).
* Breiman, L. _Classification and Regression Trees_ (Routledge, 1984).
* Hand, D. J. & Till, R. J. A simple generalisation of the area under the ROC curve for multiple class classification problems. _Mach. Learn._ 45, 171–186 (2001).
* Mihaljevic, B., Bielza, C. & Larranaga, P. bnclassify: Learning Bayesian network classifiers. _R J._ 10, 455 (2018).
* Jouanne-Diedrich, H. V. OneR: One rule machine learning classification algorithm with enhancements (2017). https://CRAN.R-project.org/package=OneR. Accessed 23 Feb 2024.
* Venables, B. & Ripley, B. _Modern Applied Statistics with S_ (Springer, 2002).
* Kuhn, M. caret: Classification and regression training (2022). https://CRAN.R-project.org/package=caret. Accessed 12 May 2024.
* Robin, X. et al. pROC: An open-source package for R and S+ to analyze and compare ROC curves. _BMC Bioinf._ https://doi.org/10.1186/1471-2105-12-77 (2011).
* Therneau, T. & Atkinson, B. rpart: Recursive partitioning and regression trees (2022). https://CRAN.R-project.org/package=rpart. Accessed 12 Feb 2024.
* Wu, M., Du, X., Gu, R. & Wei, J. Artificial intelligence for clinical decision support in sepsis. _Front. Med._ 8, 665464 (2021).
* Bai, Z. et al. Development of a machine learning model to predict the risk of late cardiogenic shock in patients with ST-segment elevation myocardial infarction. _Ann. Transl. Med._ 9, 1162 (2021).
* Chang, Y. et al. Early prediction of cardiogenic shock using machine learning. _Front. Cardiovasc. Med._ 9, 862424 (2022).
* Hu, Y., Lui, A., Goldstein, M., Sudarshan, M., Tinsay, A., Tsui, C. et al. A dynamic risk score for early prediction of cardiogenic shock using machine learning. arXiv https://doi.org/10.48550/arXiv.2303.12888 (2023).
* Rahman, F. et al. Using machine learning for early prediction of cardiogenic shock in patients with acute heart failure. _J. Soc. Cardiovasc. Angiogr. Interv._ 1, 100308 (2022).
* Ahmed, M. & Islam, A. K. M. N. Deep learning: Hope or hype. _Ann. Data Sci._ 7, 427–432 (2020).
* Johansson, U., Sönströd, C., Norinder, U. & Boström, H. Trade-off between accuracy and interpretability for predictive in silico modeling. _Future Med. Chem._ 3, 647–663 (2011).
* Ng, R. & Yeghiazarians, Y. Post myocardial infarction cardiogenic shock: A review of current therapies. _J. Intensive Care Med._ 28, 151–165 (2013). https://doi.org/10.1177/0885066611411407
* Sato, R., Hasegawa, D., Guo, S., Nuqali, A. E. & Moreno, J. E. P. Sepsis-induced cardiogenic shock: Controversies and evidence gaps in diagnosis and management. _J. Intensive Care_ 13, 1.


https://doi.org/10.1186/s40560-024-00770-y (2025). Article  PubMed  PubMed Central  Google Scholar  * Soussi, S. et al. Identifying biomarker-driven subphenotypes of cardiogenic shock:


Analysis of prospective cohorts and randomized controlled trials. _EClinicalMedicine_ 79, 103013. https://doi.org/10.1016/j.eclinm.2024.103013 (2025). Article  PubMed  Google Scholar  *


Zweck, E. et al. Phenotyping cardiogenic shock. _J. Am. Heart Assoc._ 10, e020085. https://doi.org/10.1161/JAHA.120.020085 (2021). Article  PubMed  PubMed Central  Google Scholar  * Jentzer,


J. C. et al. Machine learning approaches for phenotyping in cardiogenic shock and critical illness: Part 2 of 2. _JACC Adv._ 1, 100126. https://doi.org/10.1016/j.jacadv.2022.100126 (2022).


Article  PubMed  PubMed Central  Google Scholar  * Hu, C., Li, Y., Wang, F. & Peng, Z. Application of machine learning for clinical subphenotype identification in sepsis. _Infect. Dis.


Ther._ 11, 1949–1964. https://doi.org/10.1007/s40121-022-00684-y (2022). Article  PubMed  PubMed Central  Google Scholar  * Seymour, C. W. et al. Derivation, validation, and potential


treatment implications of novel clinical phenotypes for sepsis. _JAMA_ 321, 2003–2017. https://doi.org/10.1001/jama.2019.5791 (2019). Article  CAS  PubMed  PubMed Central  Google Scholar  *


Yang, J. et al. Identification of clinical subphenotypes of sepsis after laparoscopic surgery. _Laparosc. Endosc. Robot. Surg._ 7, 16–26. https://doi.org/10.1016/j.lers.2024.02.001 (2024).


Article  Google Scholar  * Subbaswamy, A. et al. A data-driven framework for identifying patient subgroups on which an AI/machine learning model may underperform. _NPJ Digit. Med._ 7, 334.


https://doi.org/10.1038/s41746-024-01275-6 (2024). Article  PubMed  PubMed Central  Google Scholar  Download references AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Department of


Anaesthesiology and Critical Care, Klinikum Aschaffenburg-Alzenau, Aschaffenburg, Germany Dirk Obmann & York A. Zausig * Department of Anaesthesiology, University of Regensburg,


Regensburg, Germany Dirk Obmann, Bernhard Graf & York A. Zausig * Faculty of Engineering, Competence Centre for Artificial Intelligence, TH Aschaffenburg (University of Applied


Sciences), Aschaffenburg, Germany Philipp Münch & Holger von Jouanne-Diedrich Authors * Dirk Obmann * Philipp Münch * Bernhard Graf * Holger von Jouanne-Diedrich * York A. Zausig CONTRIBUTIONS HvJD and YZ originated the idea. DO and PM developed the study objectives under the supervision of HvJD and YZ. DO and


PM performed the data preprocessing, programming, feature selection and evaluation of the results. DO was responsible for writing the paper. PM, HvJD, BG and YZ supported the editing of the


manuscript and added essential comments to the paper. All authors read and approved the final manuscript. CORRESPONDING AUTHOR Correspondence to Dirk Obmann. ETHICS DECLARATIONS COMPETING


INTERESTS The authors declare no competing interests. ETHICS APPROVAL AND CONSENT TO PARTICIPATE An inquiry was submitted to the Ethics Committee of the Bavarian Medical Association.


Following an internal review, no further consultation with the ethics committee was deemed necessary. The data are widely accessible to researchers under a data use agreement, so the study


was exempt from the need for specific ethical review. Nevertheless, we adhered to fundamental ethical principles and ensured the validity and fairness of the study process. ADDITIONAL


INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. ELECTRONIC SUPPLEMENTARY MATERIAL Below is


the link to the electronic supplementary material. SUPPLEMENTARY MATERIAL 1 RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons


Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give


appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission


under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons


licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by


statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


http://creativecommons.org/licenses/by-nc-nd/4.0/. ABOUT THIS ARTICLE CITE THIS ARTICLE Obmann, D., Münch, P., Graf, B. _et al._ Comparison of different AI systems


for diagnosing sepsis, septic shock, and cardiogenic shock: a retrospective study. _Sci Rep_ 15, 15850 (2025). https://doi.org/10.1038/s41598-025-00830-9 * Received: 27 December 2024 * Accepted: 30 April 2025 * Published: 06 May 2025 * DOI: https://doi.org/10.1038/s41598-025-00830-9 KEYWORDS * Artificial intelligence * AI * Decision support systems * Sepsis * Septic shock * Cardiogenic shock