PhD Dissertation

Application of Advanced Machine Learning Paradigms for Injury Severity Modelling of Motor Vehicle Crashes on Rural Highways in Saudi Arabia

Abstract

Traffic crashes are a leading cause of death, particularly in developing countries. The majority of crash victims are from low- and middle-income nations. In the Kingdom of Saudi Arabia (KSA), traffic-related injuries are the third leading cause of death, accounting for about 4.7% of total mortality. Traffic crashes are also responsible for annual economic losses worth 4.3% of national GDP. In response to growing road safety concerns, different mitigation strategies have recently been introduced; however, these countermeasures appear to be inadequate because highway safety has barely improved. There is a major gap between policy recommendations and law enforcement. Despite the worsening highway safety situation in the country, no comprehensive study has been conducted at the national level to investigate trends in traffic crashes and to identify crash risk and injury severity factors. Injury severity analysis of crashes is a particularly under-researched problem in KSA. To put appropriate countermeasures in place proactively and effectively, a thorough analysis of crash risk variables is required. Crash injury severity analysis constitutes an important research problem in the domain of traffic safety, and the subject has been dominated by the application of different statistical models. However, these models rely on several unrealistic assumptions which, if violated, can easily produce biased predictions. To overcome the limitations of statistical methods, different types of data mining, machine learning (ML), and deep learning techniques have been increasingly employed. Although ML models adapt better to complex data structures, require few or no underlying assumptions, and offer highly accurate predictions, they are often criticized for their lack of interpretability.
The present study proposes the application of six different ML algorithms (Logistic Regression, Naïve Bayes, Random Forest, LightGBM, XGBoost, and CatBoost) for injury severity prediction of traffic crashes that were reported between 2017 and 2019 on interstate rural highways in Saudi Arabia. The injury severity classification performance of the proposed ML algorithms was evaluated in terms of average accuracy, recall, precision, F1 score, kappa, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). Experimental results revealed that CatBoost, with an average accuracy of 88.9%, outperformed the other models. Model comparison by individual severity class also demonstrated the robust performance of the CatBoost classifier. To address the interpretability issue of ML models, two newly developed techniques, namely feature importance and SHAP (SHapley Additive exPlanations) analysis, were employed. The significant injury severity risk factors identified were mostly consistent between the two techniques. Factors linked with a higher likelihood of a fatal or injury crash include crash type, time of day, lighting conditions, speeding, weather status, vehicle and highway type, collisions involving heavy vehicles, and on-site damage characteristics. In addition, the study proposes the application of Information Root Node Variation (IRNV) to extract significant decision rules that highlight the circumstances under which crashes fall into each injury severity category. For comparison purposes, multinomial logit (MNL) models were also developed using the same datasets. Results revealed that although the ML models had better injury severity predictive performance, the severity risk factors identified by both approaches were mostly the same.
During the last phase of this study, GIS-based spatial analytic methods were employed to identify crash hotspots. Crash hotspots were determined for individual crash injury severity categories. The analysis revealed that hotspots were clustered on the outskirts of main cities, close to road intersections and merge/diverge areas. The outcomes and findings of the current study can provide useful and valuable guidance to safety practitioners for the timely and effective implementation of suitable mitigation measures.
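
For readers less familiar with the evaluation metrics named in this abstract, the short Python sketch below shows how accuracy, per-class precision, recall and F1 score, and Cohen's kappa can be computed from true and predicted severity labels. It is an illustration only and not the code used in the study: the function name `severity_metrics`, the three class labels (`fatal`, `injury`, `pdo` for property damage only), and the ten toy records are all assumptions introduced here and carry no results from the dissertation.

```python
from collections import Counter


def severity_metrics(y_true, y_pred):
    """Compute accuracy, Cohen's kappa, and per-class precision/recall/F1.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = Counter(zip(y_true, y_pred))   # confusion-matrix cells
    true_counts = Counter(y_true)          # row totals (actual classes)
    pred_counts = Counter(y_pred)          # column totals (predicted classes)

    # Observed agreement: share of records on the confusion-matrix diagonal.
    accuracy = sum(pairs[(c, c)] for c in labels) / n

    per_class = {}
    for c in labels:
        tp = pairs[(c, c)]
        precision = tp / pred_counts[c] if pred_counts[c] else 0.0
        recall = tp / true_counts[c] if true_counts[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    # Agreement expected by chance, from the row and column totals.
    expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    return {"accuracy": accuracy, "kappa": kappa, "per_class": per_class}


# Toy labels, made up for illustration (not data from the study).
y_true = ["fatal", "fatal", "injury", "injury", "injury",
          "pdo", "pdo", "pdo", "pdo", "pdo"]
y_pred = ["fatal", "injury", "injury", "injury", "pdo",
          "pdo", "pdo", "pdo", "pdo", "injury"]

report = severity_metrics(y_true, y_pred)
print(report["accuracy"], report["kappa"])
for label, scores in report["per_class"].items():
    print(label, scores)
```

Reporting the per-class scores alongside overall accuracy matters when severity classes are imbalanced, because a model can reach high accuracy while rarely identifying the least frequent class. Kappa partly corrects for this by discounting the agreement expected by chance.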