JANOLI International Journal of Data Science (JIJDS)
ISSN: A/F

Volume 1, Issue 2 - Feb 2025


Enhancing Predictive Accuracy in Healthcare Readmission through Ensemble Learning with Feature Selection and Imbalanced Data Handling

Dr. Sudhir Kumar Sharma, Assistant Professor

Healthcare readmission rates represent a significant burden on healthcare systems globally, contributing to increased costs and potentially indicating suboptimal patient care. This research proposes an enhanced predictive model for healthcare readmission using ensemble learning techniques, specifically focusing on Gradient Boosting Machines (GBM) and Random Forests, augmented with a rigorous feature selection process and strategies to mitigate the challenges posed by imbalanced datasets. We employ a hybrid feature selection approach combining filter and wrapper methods to identify the most relevant predictors. Furthermore, we address the class imbalance problem inherent in readmission data using Synthetic Minority Oversampling Technique (SMOTE) and cost-sensitive learning. The performance of the proposed model is evaluated using various metrics, including AUC-ROC, precision, recall, F1-score, and Brier score. The results demonstrate a significant improvement in predictive accuracy compared to baseline models and existing approaches, offering a promising avenue for proactive intervention and improved patient outcomes. The interpretability of the model is further enhanced through SHAP (SHapley Additive exPlanations) values, providing insights into the factors driving readmission predictions.

Published: 26/05/2025
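The class-imbalance handling and calibration metric named in the readmission abstract above can be illustrated with a minimal plain-Python sketch. It assumes numeric, already-scaled feature vectors for the minority (readmitted) class and uses a simplified SMOTE variant that picks base rows at random rather than looping over every minority row as in the original formulation. The function names `smote` and `brier_score` and all parameters are hypothetical, not taken from the paper.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority-class rows by SMOTE-style interpolation.

    Each new row lies on the segment between a randomly chosen minority
    row and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority rows (brute-force search).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    readmitted = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1]]
    print(smote(readmitted, n_synthetic=3, k=2))
    print(brier_score([1, 0, 1], [0.9, 0.2, 0.6]))  # approximately 0.07
```

SMOTE should be applied only to the training split, after the train/test split, so that synthetic rows never reach the evaluation data. Cost-sensitive learning, the other strategy the abstract mentions, instead leaves the data unchanged and increases the loss weight of minority-class errors.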

Leveraging Distributed Deep Learning and Feature Engineering for Enhanced Predictive Maintenance in Industrial IoT Big Data

Akash Verma, Assistant Professor

This paper explores the application of distributed deep learning techniques, coupled with advanced feature engineering, to enhance predictive maintenance within the Industrial Internet of Things (IIoT). The increasing volume and velocity of data generated by IIoT devices present significant challenges for traditional predictive maintenance approaches. We propose a novel methodology that leverages the distributed processing capabilities of Apache Spark to handle large-scale sensor data, combined with features engineered from time-series analysis and domain expertise. A Long Short-Term Memory (LSTM) network, trained in a distributed manner using TensorFlow on a Spark cluster, is employed to predict equipment failures. Experiments on a simulated industrial dataset show significant improvements in prediction accuracy and reduced false positive rates compared to conventional methods. The results highlight the potential of distributed deep learning and feature engineering to substantially improve predictive maintenance in IIoT environments by reducing downtime, improving operational efficiency, and lowering costs.

Published: 26/05/2025
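The time-series feature engineering described in the predictive-maintenance abstract can be sketched on a single machine as follows. The Spark distribution and the TensorFlow LSTM itself are not shown. The trailing-window statistics (mean, population standard deviation, least-squares slope), the window length, and the rule that labels a sequence positive when a failure occurs within `horizon` steps are illustrative assumptions, not the paper's feature set.

```python
import statistics
from typing import List, Sequence, Tuple

Features = Tuple[float, float, float]  # (mean, std, slope) of one window


def rolling_features(readings: Sequence[float], window: int) -> List[Features]:
    """Sliding-window mean, population std and least-squares slope.

    Output element t summarises readings[t : t + window].
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    xs = range(window)
    x_mean = (window - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    feats: List[Features] = []
    for start in range(len(readings) - window + 1):
        w = readings[start:start + window]
        mu = statistics.fmean(w)
        sd = statistics.pstdev(w)
        # Slope of a straight-line fit of the readings against time index.
        slope = sum((x - x_mean) * (y - mu) for x, y in zip(xs, w)) / sxx
        feats.append((mu, sd, slope))
    return feats


def make_sequences(features: Sequence[Features], failure_steps: Sequence[int],
                   seq_len: int, horizon: int
                   ) -> Tuple[List[List[List[float]]], List[int]]:
    """Build (sequence, label) pairs in [seq_len, n_features] layout.

    A label is 1 if a failure, indexed on the feature timeline, occurs
    within `horizon` steps after the sequence's last step.
    """
    seqs: List[List[List[float]]] = []
    labels: List[int] = []
    for end in range(seq_len, len(features) + 1):
        seqs.append([list(f) for f in features[end - seq_len:end]])
        last = end - 1
        labels.append(int(any(last < f <= last + horizon for f in failure_steps)))
    return seqs, labels


if __name__ == "__main__":
    vibration = [0.1, 0.1, 0.2, 0.4, 0.7, 1.1, 1.6]
    feats = rolling_features(vibration, window=3)
    seqs, labels = make_sequences(feats, failure_steps=[4], seq_len=2, horizon=2)
    print(len(seqs), labels)
```

The resulting `seqs` have shape `[n_samples, seq_len, 3]`, matching the `[batch, timesteps, features]` input layout that LSTM layers conventionally expect. In a Spark pipeline the same per-window aggregates would typically be computed per device before training.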

Federated Learning with Differential Privacy for Preserving Data Utility and Privacy in Healthcare Predictive Modeling

Dr K K Lavania, Assistant Professor

This paper explores the application of Federated Learning (FL) with Differential Privacy (DP) in healthcare predictive modeling. The inherent sensitivity of healthcare data necessitates robust privacy-preserving techniques. Federated learning enables collaborative model training across multiple healthcare institutions without direct data sharing, while differential privacy adds noise to the model updates to further protect individual patient data. This research investigates the trade-off between privacy protection (measured by the privacy budget, epsilon) and model accuracy (data utility) in the context of predicting patient readmission rates. We present a novel framework integrating federated averaging with Gaussian differential privacy and evaluate its performance on a synthetic healthcare dataset. The results demonstrate the feasibility of achieving acceptable prediction accuracy while maintaining a reasonable level of privacy protection, highlighting the potential of this approach for advancing collaborative healthcare research in a privacy-conscious manner.

Published: 26/05/2025
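The federated-averaging step with Gaussian noise described in the privacy-preserving abstract can be sketched as below. This is a hypothetical single-round illustration in the style of DP-FedAvg, not the paper's framework: client updates are L2-clipped to norm `max_norm`, averaged without sample-size weighting, and perturbed with noise calibrated by the classic Gaussian mechanism, σ = √(2 ln(1.25/δ))·Δ/ε, which holds only for 0 < ε < 1. It does not track cumulative privacy loss over many rounds; a real deployment would need a privacy accountant for that.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_clip(update: Sequence[float], max_norm: float) -> Vector:
    """Scale an update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise std of the classic Gaussian mechanism (valid for 0 < epsilon < 1)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("need 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def dp_fedavg_round(global_weights: Sequence[float],
                    client_updates: Sequence[Sequence[float]],
                    max_norm: float, epsilon: float, delta: float,
                    seed: int = 0) -> Vector:
    """One round of federated averaging with clipped, Gaussian-noised updates.

    Each client update is a model delta (local weights minus global weights).
    """
    n = len(client_updates)
    if n == 0:
        raise ValueError("need at least one client update")
    clipped = [l2_clip(u, max_norm) for u in client_updates]
    mean_update = [sum(col) / n for col in zip(*clipped)]
    # With n public, replacing one client's clipped update moves the mean
    # by at most 2 * max_norm / n in L2 norm: the sensitivity of the average.
    sigma = gaussian_sigma(epsilon, delta, sensitivity=2 * max_norm / n)
    rng = random.Random(seed)
    return [w + m + rng.gauss(0.0, sigma)
            for w, m in zip(global_weights, mean_update)]


if __name__ == "__main__":
    updates = [[0.2, -0.1], [0.3, 0.0], [0.1, -0.2]]
    for eps in (0.9, 0.3):
        print(eps, dp_fedavg_round([0.0, 0.0], updates, 1.0, eps, 1e-5))
```

Lowering `epsilon` raises `sigma`, so the same call under a stricter budget returns noisier weights; this is the privacy-utility trade-off the abstract measures.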

Adaptive Ensemble Learning with Dynamic Feature Selection for Enhanced Predictive Accuracy in High-Dimensional Biological Datasets

Leszek Ziora, Assistant Professor

High-dimensional biological datasets present significant challenges for accurate predictive modeling due to the curse of dimensionality and the presence of irrelevant or redundant features. This paper introduces a novel adaptive ensemble learning framework that incorporates dynamic feature selection to enhance predictive accuracy in such datasets. Each base learner is trained on a feature subset chosen by a hybrid selection strategy that integrates filter, wrapper, and embedded methods, and an adaptive weighting mechanism dynamically adjusts each learner's contribution to the ensemble according to its performance on a validation set. We evaluate the proposed method on several benchmark biological datasets, where it outperforms existing ensemble learning and feature selection techniques, with significant improvements in predictive accuracy, robustness, and interpretability. These results make it a promising tool for analyzing complex biological data.

Published: 26/05/2025
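Two ingredients of the adaptive ensemble abstract, a filter-style feature ranking and validation-driven learner weighting, can be sketched in plain Python. The abstract does not specify the exact weighting rule, so this sketch assumes a softmax over each base learner's validation accuracy with a hypothetical `temperature` parameter. The filter step ranks features by absolute Pearson correlation with the target; the wrapper and embedded stages of the hybrid strategy are not shown, and all function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def top_k_by_correlation(columns: Dict[str, Sequence[float]],
                         y: Sequence[float], k: int) -> List[str]:
    """Filter method: keep the k features with the largest |correlation| to y."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], y)),
                    reverse=True)
    return ranked[:k]


def accuracy(y_true: Sequence[int], proba: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of validation rows whose thresholded prediction matches the label."""
    hits = sum(int((p >= threshold) == bool(y)) for y, p in zip(y_true, proba))
    return hits / len(y_true)


def adaptive_weights(val_scores: Sequence[float],
                     temperature: float = 0.05) -> List[float]:
    """Softmax over validation scores: better learners receive larger weights."""
    best = max(val_scores)  # subtract the max for numerical stability
    exps = [math.exp((s - best) / temperature) for s in val_scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_proba(probas: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted average of the base learners' positive-class probabilities."""
    n_rows = len(probas[0])
    return [sum(w * p[i] for w, p in zip(weights, probas)) for i in range(n_rows)]


if __name__ == "__main__":
    y_val = [1, 0, 1, 0]
    learner_probas = [[0.9, 0.2, 0.7, 0.4], [0.6, 0.6, 0.4, 0.3]]
    scores = [accuracy(y_val, p) for p in learner_probas]
    weights = adaptive_weights(scores)
    print(scores, weights, ensemble_proba(learner_probas, weights))
```

A smaller `temperature` concentrates weight on the best validation learner, while a larger one moves the ensemble toward a plain average; recomputing the weights whenever validation scores change is one simple way to make the weighting adaptive.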

Leveraging Ensemble Learning and Feature Engineering for Enhanced Predictive Accuracy in Customer Churn Prediction

Pradeep Upadhyay, Professor

Customer churn prediction is a critical challenge for businesses seeking to maintain and grow their customer base. This research investigates the application of ensemble learning techniques combined with advanced feature engineering to enhance the accuracy of churn prediction models. We explore several ensemble methods, including Random Forest, Gradient Boosting Machines (GBM), and XGBoost, and evaluate their performance against traditional machine learning algorithms. Furthermore, we implement a comprehensive feature engineering strategy, incorporating techniques such as interaction feature generation, polynomial features, and domain-specific feature extraction. Our results demonstrate that the proposed approach significantly improves churn prediction accuracy compared to baseline models, offering valuable insights for customer retention strategies. The study highlights the importance of both model selection and feature engineering in building robust and effective churn prediction systems.

Published: 26/05/2025
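The interaction, polynomial, and domain-specific feature engineering listed in the churn abstract can be illustrated with a small sketch that operates on one customer record stored as a dict. The degree-2 expansion follows the usual idea of polynomial feature generation without a bias term. The column names `tenure_months`, `total_charges`, and `support_calls` and the derived ratios are hypothetical examples, not features reported by the study.

```python
from itertools import combinations, combinations_with_replacement
from typing import Dict


def degree2_features(row: Dict[str, float],
                     interaction_only: bool = False) -> Dict[str, float]:
    """Add pairwise products (and squares unless interaction_only) to a row."""
    names = sorted(row)
    out = dict(row)  # keep the original degree-1 features
    pairs = (combinations(names, 2) if interaction_only
             else combinations_with_replacement(names, 2))
    for a, b in pairs:
        key = f"{a}^2" if a == b else f"{a}*{b}"
        out[key] = row[a] * row[b]
    return out


def add_domain_features(row: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical churn-specific ratios; column names are illustrative only."""
    out = dict(row)
    tenure = max(row["tenure_months"], 1.0)  # avoid dividing by zero for new customers
    out["charges_per_tenure_month"] = row["total_charges"] / tenure
    out["support_calls_per_month"] = row["support_calls"] / tenure
    return out


if __name__ == "__main__":
    customer = {"tenure_months": 12.0, "total_charges": 840.0, "support_calls": 3.0}
    enriched = add_domain_features(customer)
    print(degree2_features(enriched, interaction_only=True))
```

With p input features, the full degree-2 expansion adds p(p+1)/2 columns, so on wide customer tables it is usually paired with feature selection or regularization.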