Use of Information Theory measures on driving data extracted from OBD-II for classification and prediction applications
Driver identification, Emissions Prediction, OBD-II, Information Theory, Complexity-Entropy, Classification Algorithm, Prediction Algorithm.
Vehicles carry a growing number of built-in sensors. These sensors are interconnected through an internal network called CAN (Controller Area Network), and their values can be accessed through the OBD-II interface. This makes available a large amount of data on many variables related to the act of driving.
Several works have proposed applications that benefit from the availability of these data. Most applications address one of the following problems: classification, clustering, or prediction. In general, these works follow the same process: data extraction, data cleaning and transformation, model training, and model evaluation.
This work proposes extracting Information Theory measures from the data and adding them to this process. The aim is to obtain new values that supplement or replace those produced by data pre-processing, and then to evaluate the performance of the resulting model. Two applications are analyzed for the evaluation: driver identification (a classification problem) and pollutant gas emission prediction (a prediction problem).
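As an illustration of the kind of Information Theory measure that could be extracted from a window of OBD-II readings, the sketch below computes a normalized permutation entropy (Bandt-Pompe ordinal patterns), one of the quantities used in complexity-entropy analysis. The choice of this particular measure, the function name `permutation_entropy`, the parameters `order` and `delay`, and the sample values are assumptions made for the example and are not taken from this work.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence, in [0, 1].

    Hypothetical helper: `order` is the ordinal pattern length and
    `delay` is the spacing between the samples of each pattern.
    """
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen order and delay")

    counts = Counter()
    for i in range(n_patterns):
        window = [series[i + j * delay] for j in range(order)]
        # Ordinal pattern: the indices that sort the window
        # (sorted() is stable, so ties are broken by position).
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1

    probs = [c / n_patterns for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by log(order!) so that the result lies between 0 and 1.
    return entropy / math.log(math.factorial(order))


# Synthetic values standing in for one window of a sensor signal.
window = [4, 7, 9, 10, 6, 11, 3]
feature = permutation_entropy(window, order=3)
```

A value computed this way for each signal and time window could then be appended to, or used in place of, the pre-processed features before training the model.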
Preliminary results were obtained for driver identification. The experiment was run with a small and with a large amount of data. With the small amount of data, the process from the literature outperformed the proposed process for most classification algorithms. With the large amount of data, the difference between the two processes became small, with the process from the literature remaining slightly superior. The exception was the Naive Bayes algorithm, which performed better with the proposed process but still had lower accuracy than the other classifiers.
The next step is to compare the proposed process with the one commonly used in the literature on a prediction problem: the prediction of pollutant gas emissions.