Breakthrough in Readmission Prediction: New AI Model Hits 75% AUC Using Only Text

by Text Mining, May 20th, 2025

Too Long; Didn't Read

Using MIMIC-III discharge notes, the BDSS+MLP model achieved 75% AUC and 94% recall in predicting 30-day readmission, outperforming other models on imbalanced data.


Authors:

(1) Rasoul Samani, School of Electrical and Computer Engineering, Isfahan University of Technology (contributed equally to this work);

(2) Mohammad Dehghani, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran (contributed equally to this work; dehghani.mohammad@ut.ac.ir);

(3) Fahime Shahrokh, School of Electrical and Computer Engineering, Isfahan University of Technology.

Abstract and 1. Introduction

2. Related Works

3. Methodology and 3.1 Data

3.2 Data preprocessing

3.3. Predictive models

4. Evaluation

4.1. Evaluation metrics

4.2. Results and discussion

5. Conclusion and References

4.1. Evaluation metrics

In binary classification tasks, data instances are classified as either positive or negative. Here, a positive label signifies readmission, while a negative label indicates no readmission. Each binary prediction falls into one of four categories: a true positive (TP) occurs when a positive instance is correctly predicted as positive, a true negative (TN) occurs when a negative instance is correctly predicted as negative, a false positive (FP) arises when a negative instance is wrongly predicted as positive, and a false negative (FN) occurs when a positive instance is wrongly predicted as negative [45].
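As a minimal sketch of the four outcomes described above, the following Python snippet counts TP, TN, FP, and FN from paired true and predicted labels. The function name `confusion_counts` and the toy label lists are illustrative assumptions, not taken from the study; 1 stands for readmission and 0 for no readmission.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # positive instance correctly predicted
        elif truth == 0 and pred == 0:
            tn += 1          # negative instance correctly predicted
        elif truth == 0 and pred == 1:
            fp += 1          # negative instance wrongly predicted as positive
        else:
            fn += 1          # positive instance wrongly predicted as negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Toy labels, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```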


The primary evaluation metrics for binary classification are accuracy, precision, recall, and F1-score. Accuracy is the percentage of correctly classified instances among all instances (Equation 1). Precision is the proportion of truly positive instances among all instances predicted as positive (Equation 2). Recall, also known as sensitivity, is the proportion of truly positive instances that the model identifies as positive (Equation 3). Finally, the F1-score is the harmonic mean of precision and recall, providing a balanced assessment of the model's performance (Equation 4) [46].
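The four metrics can be sketched in a few lines of Python. Equations 1 to 4 are not reproduced in this version of the text, so the snippet assumes the standard definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The function name and the confusion-matrix counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes the standard textbook definitions; a ratio with a zero
    denominator is reported as 0.0 instead of raising an error.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not results from the study.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as the dataset used here, accuracy alone can look high even when few positive cases are found, which is one reason recall and F1 are reported alongside it.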



Two additional valuable evaluation tools are the ROC (Receiver Operating Characteristic) curve and the AUC. The ROC curve is constructed by plotting the true positive rate against the false positive rate as the classification threshold varies. The curve is non-decreasing within the unit square and runs from (0, 0) to (1, 1) [47]. The area under the ROC curve (AUC) summarizes it in a single number between 0 and 1, where 0.5 corresponds to random guessing and 1 to perfect separation of the two classes, and thus provides insight into the overall performance of the classification model [48].
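To make the construction concrete, here is a small pure-Python sketch that builds ROC points by sweeping the decision threshold from high to low and then integrates them with the trapezoidal rule. The function names, labels, and scores are assumptions made for illustration; the study itself does not specify how its curves were computed.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    Assumes binary labels (1 = positive, 0 = negative) with at least one
    instance of each class. Tied scores are processed together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every instance that shares this score.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy labels and scores, invented for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc_value = area_under_curve(points)
print(points)
print(f"AUC = {auc_value:.4f}")
```

In this toy case 7 of the 9 positive-negative pairs are ranked correctly, so the AUC equals 7/9, which matches the interpretation of AUC as the probability that a randomly chosen positive instance is scored above a randomly chosen negative one.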


4.2. Results and discussion

The dataset initially comprised 51,113 records, of which 49,083 remained after preprocessing. All of these records were used to construct the classification models. The dataset is highly imbalanced; to maintain realism and promote better generalization, no balancing techniques were applied. The data was partitioned into three subsets: 70% for training (34,358 records), 15% for validation (7,363 records), and the remaining 15% for testing (7,362 records). Table 1 provides the distribution of each class.
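The split sizes quoted above can be checked with a few lines of integer arithmetic. This is only a sketch: the function name is hypothetical, and the rounding rule (floor for the training share, with the extra record of an odd remainder going to validation) is an assumption that happens to reproduce the reported counts, not a description of the authors' procedure.

```python
def split_sizes(n_records, train_pct=70):
    """Return (train, validation, test) sizes for a train/val/test split.

    The training share is floored; the remainder is halved, and the
    validation set receives the extra record when the remainder is odd
    (assumed rounding rule).
    """
    train = n_records * train_pct // 100
    remainder = n_records - train
    validation = (remainder + 1) // 2
    test = remainder - validation
    return train, validation, test


raw_records = 51_113      # records before preprocessing (from the text)
clean_records = 49_083    # records after preprocessing (from the text)

removed = raw_records - clean_records
train, validation, test = split_sizes(clean_records)
print(f"removed during preprocessing: {removed}")
print(f"train={train}, validation={validation}, test={test}")
```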



Table 1: Statistics of the dataset.



Table 2 presents the results obtained from the classifiers employed in our study. Notably, the Final Method, which combines the BDSS model with an MLP, outperformed the state-of-the-art models in terms of AUC. This model and logistic regression achieved the highest accuracy, recall, and F1-score, underscoring the continued relevance of traditional machine learning models. Figure 5 shows the ROC curves of the different models; the Final Method achieved the highest AUC, 75%. Logistic regression followed with a rate of 73.2%, outperforming the other machine learning techniques.


In the medical domain, metrics like recall and AUC play a crucial role in evaluating AI models. Recall, which measures the ability of a model to correctly identify positive cases, is particularly important in healthcare settings where identifying all potential cases is paramount. Similarly, AUC provides an overall measure of model performance and is widely used for assessing predictive models in medical applications. The Final Method is considered the best model because of its superior recall and AUC. It is trained on discharge summary data and builds on BDSS, which is pre-trained on a large corpus of text and is adept at capturing the semantic nuances of text.



Table 2: Results of proposed models.




Figure 5: ROC curve.



One advantage of the logistic regression model is its clarity and interpretability. To gain insight into the model's decision-making process, we extracted the features with the greatest impact on the outcomes, as shown in Figure 6. Words such as "milliliter," "mg," and "chronic" had the greatest influence on categorizing patients as readmitted. This can be attributed to the prescription of various drugs with specific doses by the medical practitioners at the patient's discharge: the more drugs are prescribed, the higher the likelihood of readmission. Conversely, the presence of words like "without," "family," "negative," "normal," and "transferred" in the patient's discharge text had the most substantial impact on categorizing patients as not returning to the hospital.



Figure 6: The most effective words in the classification of patients in the logistic regression model.
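The kind of ranking shown in Figure 6 can be sketched as follows: in a logistic regression over word features, a positive coefficient pushes the prediction toward readmission and a negative one pushes it away, so sorting words by coefficient surfaces the most influential terms. The function name and every numeric weight below are invented for illustration; they are not the coefficients learned in the study.

```python
def most_influential_words(coefficients, k=3):
    """Return the k words with the largest positive and negative weights.

    `coefficients` maps each word to its logistic regression weight.
    Positive weights favor the positive class (readmission), negative
    weights favor the negative class (no readmission).
    """
    ranked = sorted(coefficients.items(), key=lambda item: item[1])
    toward_no_readmission = [word for word, _ in ranked[:k]]
    toward_readmission = [word for word, _ in reversed(ranked[-k:])]
    return toward_readmission, toward_no_readmission


# Hypothetical weights: the words come from the text, the numbers do not.
toy_coefficients = {
    "milliliter": 1.2,
    "mg": 0.9,
    "chronic": 0.7,
    "the": 0.01,
    "family": -0.6,
    "without": -0.8,
    "normal": -1.1,
}

readmit_words, no_readmit_words = most_influential_words(toy_coefficients)
print("toward readmission:   ", readmit_words)
print("toward no readmission:", no_readmit_words)
```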



Several previous studies have investigated models for predicting ICU readmission, with logistic regression consistently demonstrating favorable results, achieving AUC rates of 65% [49], 66% [50], and 70% [51]. More recently, Orangi-Fard et al. [41] applied various machine learning techniques to the MIMIC-III dataset to predict patient readmission, and their SVM-RBF model achieved an AUC of 74%. It is worth noting that Orangi-Fard et al. used only a portion of the dataset (4000 for training and 6000 for validation), balanced the data, and employed 825 features. In contrast, our study used the entire dataset and kept its class imbalance. Additionally, we focused solely on textual features, omitting other factors such as demographics. This deliberate choice allowed us to gain deeper insights into the specific aspects we aimed to explore. Furthermore, while previous studies relied solely on machine learning models, our study also incorporated deep learning methods. This highlights the novelty and potential advantages of leveraging deep learning techniques in predicting ICU readmission. Table 3 provides a comparison with existing methods based on the AUC metric.




5. Conclusion

Medical data, particularly EHR data, presents a rich source for text mining studies. These studies hold promise in various healthcare applications. Reducing ICU readmission rates is paramount for hospitals to enhance patient outcomes, conserve ICU resources, and curtail healthcare expenses. In this study, we aimed to leverage patient discharge reports, which offer detailed insights into a patient's medical history, current condition, and treatment recommendations, to develop a predictive model for ICU readmission. Our proposed deep learning-based model demonstrated superior performance compared to traditional machine learning models, achieving a higher AUC. For future research, exploring alternative deep learning architectures beyond MLP could be beneficial. Additionally, Large Language Models (LLMs) can be considered for creating predictive models and conducting comparative analyses with deep learning models. To enhance their effectiveness, we recommend considering larger input data and advanced models such as Longformer. Incorporating summarization techniques during the pre-processing stage can further improve the quality of the input data.

References

[1] T. H. Tulchinsky, E. A. Varavikova, and M. J. Cohen, "Chapter 13 - National health systems," in The New Public Health (Fourth Edition), T. H. Tulchinsky, E. A. Varavikova, and M. J. Cohen Eds. San Diego: Academic Press, 2023, pp. 875-986.


[2] M. Paul, L. Maglaras, M. A. Ferrag, and I. Almomani, "Digitization of healthcare sector: A study on privacy and security concerns," ICT Express, vol. 9, no. 4, pp. 571-588, 2023.


[3] A. I. Stoumpos, F. Kitsios, and M. A. Talias, "Digital transformation in healthcare: technology acceptance and its applications," International journal of environmental research and public health, vol. 20, no. 4, p. 3407, 2023.


[4] D. K. Murala, S. K. Panda, and S. K. Sahoo, "Securing electronic health record system in cloud environment using blockchain technology," in Recent advances in blockchain technology: real-world applications: Springer, 2023, pp. 89-116.


[5] E. G. Ferro et al., "Patient readmission rates for all insurance types after implementation of the hospital readmissions reduction program," Health Affairs, vol. 38, no. 4, pp. 585-593, 2019.


[6] B. Hahn, T. Ball, W. Diab, C. Choi, H. Bleau, and A. Flynn, "Utilization of a multidisciplinary hospital-based approach to reduce readmission rates," SAGE Open Medicine, vol. 12, p. 20503121241226591, 2024.


[7] E. Sheetrit, M. Brief, and O. Elisha, "Predicting unplanned readmissions in the intensive care unit: a multimodality evaluation," Scientific Reports, vol. 13, no. 1, p. 15426, 2023.


[8] A. Gupta and G. C. Fonarow, "The Hospital Readmissions Reduction Program—learning from failure of a healthcare policy," European journal of heart failure, vol. 20, no. 8, pp. 1169-1174, 2018.


[9] U. Balla, S. Malnick, and A. Schattner, "Early readmissions to the department of medicine as a screening tool for monitoring quality of care problems," Medicine, vol. 87, no. 5, pp. 294-300, 2008.


[10] S. F. Jencks, M. V. Williams, and E. A. Coleman, "Rehospitalizations among patients in the Medicare fee-for-service program," New England Journal of Medicine, vol. 360, no. 14, pp. 1418-1428, 2009.


[11] J. Carter, C. Ward, A. Thorndike, K. Donelan, and D. J. Wexler, "Social factors and patient perceptions associated with preventable hospital readmissions," Journal of Patient Experience, vol. 7, no. 1, pp. 19-26, 2020.


[12] E. Owusu, F. Oluwasina, N. Nkire, M. A. Lawal, and V. I. Agyapong, "Readmission of patients to acute psychiatric hospitals: influential factors and interventions to reduce psychiatric readmission rates," in Healthcare, 2022, vol. 10, no. 9: MDPI, p. 1808.


[13] M. A.-A. Al-Tamimi, S. W. Gillani, M. E. Abd Alhakam, and K. G. Sam, "Factors associated with hospital readmission of heart failure patients," Frontiers in pharmacology, vol. 12, p. 732760, 2021.


[14] O. Ben-Assuli and R. Padman, "Analysing repeated hospital readmissions using data mining techniques," Health Systems, vol. 7, no. 2, pp. 120-134, 2018.


[15] L. L. Wang and K. Lo, "Text mining approaches for dealing with the rapidly expanding literature on COVID-19," Briefings in Bioinformatics, vol. 22, no. 2, pp. 781-799, 2021.


[16] M. Dehghani, "Dental Severity Assessment through Few-shot Learning and SBERT Fine-tuning," arXiv preprint arXiv:2402.15755, 2024.


[17] F. Ebrahimi, M. Dehghani, and F. Makkizadeh, "Analysis of Persian Bioinformatics Research with Topic Modeling," BioMed Research International, vol. 2023, 2023.


[18] I. Graf, H. Gerwing, K. Hoefer, D. Ehlebracht, H. Christ, and B. Braumann, "Social media and orthodontics: A mixed-methods analysis of orthodontic-related posts on Twitter and Instagram," American Journal of Orthodontics and Dentofacial Orthopedics, vol. 158, no. 2, pp. 221-228, 2020.


[19] B. Bokharaeian, M. Dehghani, and A. Diaz, "Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method," BMC bioinformatics, vol. 24, no. 1, p. 144, 2023.


[20] M. Dehghani and Z. Yazdanparast, "Discovering the symptom patterns of COVID-19 from recovered and deceased patients using Apriori association rule mining," Informatics in Medicine Unlocked, vol. 42, p. 101351, 2023.


[21] T. Andriotti et al., "The optimal length of stay associated with the lowest readmission risk following surgery," Journal of Surgical Research, vol. 239, pp. 292-299, 2019.


[22] M. T. Mardini and Z. W. Raś, "Extraction of actionable knowledge to reduce hospital readmissions through patients personalization," Information Sciences, vol. 485, pp. 1-17, 2019.


[23] L. R. Brindise and R. J. Steele, "Machine learning-based pre-discharge prediction of hospital readmission," in 2018 International Conference on Computer, Information and Telecommunication Systems (CITS), 2018: IEEE, pp. 1-5.


[24] O. Ben-Assuli, R. Padman, M. Leshno, and I. Shabtai, "Analyzing hospital readmissions using creatinine results for patients with many visits," Procedia Computer Science, vol. 98, pp. 357-361, 2016.


[25] B. Zheng, J. Zhang, S. W. Yoon, S. S. Lam, M. Khasawneh, and S. Poranki, "Predictive modeling of hospital readmissions using metaheuristics and data mining," Expert Systems with Applications, vol. 42, no. 20, pp. 7110-7120, 2015.


[26] A. Hammoudeh, G. Al-Naymat, I. Ghannam, and N. Obied, "Predicting hospital readmission among diabetics using deep learning," Procedia Computer Science, vol. 141, pp. 484-489, 2018.


[27] R. Assaf and R. Jayousi, "30-day hospital readmission prediction using MIMIC data," in 2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT), 2020: IEEE, pp. 1-6.


[28] A. Moerschbacher and Z. He, "Building prediction models for 30-day readmissions among icu patients using both structured and unstructured data in electronic health records," in 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023: IEEE, pp. 4368-4373.


[29] A. E. Johnson et al., "MIMIC-III, a freely accessible critical care database," Scientific data, vol. 3, no. 1, pp. 1-9, 2016.


[30] K. Maharana, S. Mondal, and B. Nemade, "A review: Data pre-processing and data augmentation techniques," Global Transitions Proceedings, vol. 3, no. 1, pp. 91-99, 2022.


[31] B. J. Jansen, K. K. Aldous, J. Salminen, H. Almerekhi, and S.-g. Jung, "Data Preprocessing," in Understanding Audiences, Customers, and Users via Analytics: An Introduction to the Employment of Web, Social, and Other Types of Digital People Data: Springer, 2023, pp. 65-75.


[32] M. Siino, I. Tinnirello, and M. La Cascia, "Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers," Information Systems, vol. 121, p. 102342, 2024.


[33] S. Vijayarani and R. Janani, "Text mining: open source tokenization tools-an analysis," Advanced Computational Intelligence: An International Journal (ACII), vol. 3, no. 1, pp. 37-47, 2016.


[34] M. Gerlach, H. Shi, and L. A. N. Amaral, "A universal information theoretic approach to the identification of stopwords," Nature Machine Intelligence, vol. 1, no. 12, pp. 606-612, 2019.


[35] A. Jalilifard, V. F. Caridá, A. F. Mansano, R. S. Cristo, and F. P. C. da Fonseca, "Semantic sensitive TF-IDF to determine word relevance in documents," in Advances in Computing and Network Communications: Proceedings of CoCoNet 2020, Volume 2, 2021: Springer, pp. 327-337.


[36] E. Alsentzer et al., "Publicly available clinical BERT embeddings," arXiv preprint arXiv:1904.03323, 2019.


[37] "Bio-Discharge Summary BERT model." [Online]. Available: https://huggingface.co/emilyalsentzer/Bio_Discharge_Summary_BERT.


[38] E. Y. Boateng and D. A. Abaye, "A review of the logistic regression model with emphasis on medical research," Journal of data analysis and information processing, vol. 7, no. 04, p. 190, 2019.


[39] M. Kuhn and K. Johnson, Applied predictive modeling. Springer, 2013.


[40] W. Xing and Y. Bei, "Medical health big data classification based on KNN classification algorithm," IEEE Access, vol. 8, pp. 28808-28819, 2019.


[41] N. Orangi-Fard, A. Akhbardeh, and H. Sagreiya, "Predictive model for icu readmission based on discharge summaries using machine learning and natural language processing," in Informatics, 2022, vol. 9, no. 1: MDPI, p. 10.


[42] M. Wankhade, A. C. S. Rao, and C. Kulkarni, "A survey on sentiment analysis methods, applications, and challenges," Artificial Intelligence Review, vol. 55, no. 7, pp. 5731-5780, 2022.


[43] M. Hosseinzadeh et al., "A multiple multilayer perceptron neural network with an adaptive learning algorithm for thyroid disease diagnosis in the internet of medical things," The Journal of Supercomputing, vol. 77, pp. 3616-3637, 2021.


[44] D. Chen, J. Niu, Q. Pan, Y. Li, and M. Wang, "A deep-learning based ultrasound text classifier for predicting benign and malignant thyroid nodules," in 2017 International conference on green informatics (ICGI), 2017: IEEE, pp. 199-204.


[45] W. Zhu, N. Zeng, and N. Wang, "Sensitivity, specificity, accuracy, associated confidence interval and ROC analysis with practical SAS implementations," NESUG proceedings: health care and life sciences, Baltimore, Maryland, vol. 19, p. 67, 2010.


[46] M. Dehmer and S. C. Basak, Statistical and machine learning approaches for network analysis. Wiley Online Library, 2012.


[47] M. S. Pepe, G. Longton, and H. Janes, "Estimation and comparison of receiver operating characteristic curves," The Stata Journal, vol. 9, no. 1, pp. 1-16, 2009.


[48] O. Rainio, J. Teuho, and R. Klén, "Evaluation metrics and statistical tests for machine learning," Scientific Reports, vol. 14, no. 1, p. 6086, 2024.


[49] A. J. Campbell, J. A. Cook, G. Adey, and B. H. Cuthbertson, "Predicting death and readmission after intensive care discharge," British journal of anaesthesia, vol. 100, no. 5, pp. 656-662, 2008.


[50] S. A. Frost et al., "Readmission to intensive care: development of a nomogram for individualising risk," Critical Care and Resuscitation, vol. 12, no. 2, pp. 83-89, 2010.


[51] O. Badawi and M. J. Breslow, "Readmissions and death after ICU discharge: development and validation of two predictive models," PloS one, vol. 7, no. 11, p. e48758, 2012.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

