A Hybrid Feature Selection and Advanced Machine Learning Framework for Forecasting Infant Mortality Rate in Bangladesh
Infant Mortality Rate (IMR) remains a vital indicator of a nation’s socioeconomic development and health system performance, particularly in developing countries such as Bangladesh. This study applies hybrid feature selection techniques to identify the key environmental and demographic factors influencing IMR. Six machine learning models (Gradient Boosting, Random Forest, AdaBoost, K-Nearest Neighbors (KNN) Regressor, Linear Regression, and XGBoost) were used to forecast future IMR trends. Using data from 1970 to 2022 obtained from the World Development Indicators 2025 (WDI 2025), model performance was evaluated with Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R-squared (R²), and Mean Absolute Percentage Error (MAPE). k-fold cross-validation was used to assess model robustness. The results confirm the significant role of environmental and demographic variables in predicting infant mortality in Bangladesh. Among the models tested, Gradient Boosting achieved the highest accuracy (R² = 0.9995; RMSE = 1.0332; MAE = 0.8841; MAPE = 1.62%). The resulting forecast is nearly flat at about 24 to 25 deaths per 1,000 live births from 2023 to 2030, which indicates that infant mortality will not decrease unless special attention is paid to environmental and demographic indicators.
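
For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how MAE, RMSE, R², and MAPE are conventionally computed from observed and predicted values. It is not the study's code: the function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration, and the study's own implementation may differ.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R^2 and MAPE (in percent) for two equal-length sequences.

    Hypothetical helper for illustration; not taken from the study.
    MAPE divides by each observed value, so observed values must be non-zero.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n

    # Root Mean Squared Error: penalises large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of variance in the observations explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot != 0 else float("nan")

    # Mean Absolute Percentage Error, expressed in percent.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}


# Made-up values for illustration only (not data from the study).
observed = [100.0, 80.0, 60.0, 40.0]
predicted = [98.0, 82.0, 59.0, 41.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE are in the units of the target (here, deaths per 1,000), whereas R² and MAPE are unit-free, which is why the four are usually reported together.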
