W. Hsu, D. Lamba, E. Fitzsimmons, M. Alsadhan
This paper presents machine learning approaches to classifying historical traffic crashes in Kansas by severity, using a data set that combines highway geometry, weather, and road sensor data. The goal of this work is to identify relevant features by applying a variety of loss measures and feature selection algorithms. This is shown to facilitate discovery of the sensors most relevant to learning to predict severe crashes (those involving bodily injury). The key technical challenges are class imbalance (75% of crashes are non-severe) and a highly correlated, redundant feature set coalesced from multiple sources. The main novel contributions of this work are a random oversampling strategy for data augmentation, combined with the systematic application of multiple feature selection measures across a range of supervised inductive learning models and algorithms. On a data set of 277 initial ground features and 20,000 vehicle crashes collected over 9 years (2007–2015) by the Kansas Department of Transportation (KDOT), this approach yielded models trained on 30 of the 277 features that achieve cross-validation precision and recall comparable to those of models trained on the full feature set. These and other results point toward potential use of the feature selection findings and the resulting models in planning future road construction.
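The abstract names two techniques without giving their details: random oversampling to counter the 75/25 class imbalance, and feature selection over a large, redundant feature set. The sketch below illustrates both under stated assumptions. It is not the authors' implementation. `random_oversample` assumes plain duplication of randomly chosen minority-class rows (sampling with replacement) until every class matches the largest one. `top_k_features` uses absolute Pearson correlation with the label as one example of a filter-style selection measure; the paper's actual loss measures are not specified here. Both function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(rows: Sequence[List[float]], labels: Sequence[int],
                      seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical helper: duplicate randomly chosen rows of each smaller class
    (with replacement) until every class has as many rows as the largest class."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not labels:
        return [], []
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the original examples that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Append (target - n) extra copies drawn with replacement.
        for i in rng.choices(idx, k=target - n):
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_k_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Hypothetical filter: rank feature columns by |corr(feature, label)| and
    return the indices of the k highest-scoring columns (ties keep column order)."""
    n_features = len(X[0])
    if not 0 < k <= n_features:
        raise ValueError("k must be between 1 and the number of features")
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return ranked[:k]
```

One caution on the design: oversampling should be applied only to the training portion of each cross-validation fold. Otherwise, duplicated minority-class rows can land in both the training and validation folds and inflate the measured precision and recall.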
Published on 01/01/2019
Volume 2019, 2019. DOI: 10.5121/csit.2019.90611. Licence: CC BY-NC-SA license