Understanding Traffic Signs by an Intelligent Advanced Driving Assistance System for Smart Vehicles

Abstract: Recent technologies have made life smarter. Vehicles are vital components of daily life, and they are getting smarter to provide a safer environment. Advanced Driving Assistance Systems (ADAS) are widely used in today's vehicles. They have been a revolutionary approach to making roads safer by assisting the driver in difficult situations such as an imminent collision, or by helping the driver respect road rules. An ADAS is composed of a large number of sensors and processing units that provide the driver with a complete overview of the surrounding objects. In this paper, we introduce a road sign classifier for an ADAS to recognize and understand traffic signs. The classifier is based on deep learning and, in particular, uses Convolutional Neural Networks (CNN). The proposed approach is composed of two stages. The first stage is a data preprocessing technique that filters the input images and enhances their quality to reduce the processing time and improve the recognition accuracy. The second stage is a CNN model with a skip connection that passes semantic features to the top of the network to allow better recognition of traffic signs. Experiments demonstrate the performance of the CNN model for traffic sign classification, with a correct recognition rate of 99.75% on the German Traffic Sign Recognition Benchmark (GTSRB) dataset.


Introduction
Intelligent ADAS are used to assist drivers in difficult driving situations. Indeed, an ADAS can assist them in the continuous control of the vehicle. In addition, the ADAS can access vehicle functionalities such as the engine and brakes, and apply numerous actions to avoid dangerous situations such as collisions with other vehicles or with pedestrians. Generally, an ADAS is composed of several sensors, cameras included. In the research presented here, the visual information provided by the cameras is used to help the ADAS recognize and correctly interpret road signs.
To process visual information, we use a well-known deep learning model, the convolutional neural network (CNN) [1], which is widely exploited in image processing tasks such as object recognition, image classification [2], and object localization. CNN models have been successful in various computer vision tasks [3,4]. Indeed, the CNN processes visual data in a way that mimics the biological visual system: every neuron of the network is applied to a restricted region of the input, its receptive field [5]. The ability of the CNN to learn directly from the image [6], unlike other classification algorithms that need hand-crafted features to learn from, places it among the most widely used algorithms for image processing tasks.
The performance of deep learning algorithms depends on the amount of training data. For an artificial system, extensive computational effort is required to recognize and classify a traffic sign. From one country to another, the shape and the color of the same road sign differ. Figure 1 presents the school sign in different countries (intra-class variability). Each country also has different classes of signs (inter-class variability). Moreover, the same road sign may have a different appearance because of environmental factors such as dust, rain, sun, or the observation point. Therefore, the above-mentioned challenges need to be solved in order to obtain a robust road sign classifier, i.e., one with a minimal error rate.

Figure 1. School signs in different countries
This paper introduces a two-step approach to traffic sign classification that addresses the above-mentioned problems and offers a high processing rate (up to 40 images/s).
The remainder of the paper is organized as follows. Section 2 introduces related work on traffic sign classification. Section 3 outlines the novel two-step approach to traffic sign recognition. Section 4 provides the results of the evaluation of the proposed approach. Section 5 summarizes the paper and proposes extensions.

Related works
A robust traffic sign classifier has become an important part of ADAS. Fleyeh et al. [19] proposed using principal component analysis to classify traffic signs. Their method was evaluated on two different datasets, and the highest accuracy achieved was 97.6%.
In [20], a traffic sign recognition method combines two functional modules. The first module is a feature extraction descriptor, the histogram of oriented gradients (HOG). The second module is a classifier based on the extreme learning machine. The proposed method achieves high performance in both recognition accuracy and computational efficiency.
Berkaya et al. [21] proposed two traffic sign recognition methods for circular signs. The first method is based on a combination of feature extraction techniques (histogram of oriented gradients, local binary patterns, and Gabor features) and a classifier based on support vector machines. The second method is based on a circle detection algorithm and an RGB-based color thresholding technique. Both techniques achieve acceptable accuracy, but it is the second method that is suitable for real-time processing.
Gecer et al. [18] use a Combination of Shifted Filter Responses in a color-blob method to select circular regions from images (as a feature extraction module) and use support vector machines as a classifier. This technique achieves an accuracy of 98.94% on the GTSRB dataset and 89.20% on the butterfly dataset.
Many approaches based on deep learning have also been presented. Rachmadi et al. [9] introduced a cascade Convolutional Neural Network to classify Japanese road signs. The proposed method achieved an inference time of less than 20 ms per image.
A multi-column deep neural network was proposed in [17]. It is a combination of many convolutional neural networks, each trained on differently preprocessed data. The resulting system is invariant to contrast and illumination changes.
Recently, vehicles have started to integrate traffic sign classification techniques; e.g., BMW has integrated a traffic sign recognition system in the BMW 5 Series [22]. Moreover, other vehicle manufacturers have started to implement these technologies [10]. Volkswagen has integrated a traffic sign classifier into the Audi A8 [11].
The literature review on traffic sign recognition shows that this technology is very important for ADAS and even for self-driving cars. The works mentioned above optimize either accuracy or inference speed, but not both. The proposed method aims at a good balance between the two by achieving the highest classification accuracy together with a good inference speed.

Proposed approach
The proposed method focuses on the balance between speed and recognition accuracy by combining a data preparation technique that enhances the image quality with a Convolutional Neural Network.
The two-step approach to traffic sign classification encompasses (1) a data preparation pipeline to enhance data quality and improve performance, and (2) a deep learning model to recognize and classify traffic signs. The data preprocessing method is composed of 4 steps:
1. First, the data are loaded and data augmentation techniques are applied.
2. Second, the images are resized to 32x32 in the 3-channel RGB color space and shuffled [12].
3. Third, local histogram equalization [7,13] is applied.
4. Finally, the images are normalized before being fed to the proposed convolutional neural network.
For traffic sign recognition, a custom CNN model composed of 13 layers, including 6 convolution layers, 3 max-pooling layers, 2 fully connected layers, and a softmax layer, was proposed. The data preparation workflow is presented in Figure 2.
First, the data were loaded and new examples were generated using a data augmentation technique, which was applied to maximize the amount of training data. Second, the input images were resized to 32x32 in the RGB color space. Then, to avoid highly correlated mini-batches, the images were shuffled, so that the training algorithm obtains a different mini-batch for each forward pass. Third, the contrast of the images was enhanced using local histogram equalization [14].
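The shuffling, local histogram equalization, and final normalization steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sliding-window equalization, the window radius, the per-channel application, and the scaling to [0, 1] are all assumptions, since the paper does not specify these details. Data augmentation and resizing are omitted, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_hist_equalize(channel, radius=4):
    """Sliding-window histogram equalization of one 2-D uint8 channel.

    Each pixel is mapped to the value of the cumulative histogram of its
    (2*radius+1) x (2*radius+1) neighbourhood, evaluated at the pixel itself.
    """
    k = 2 * radius + 1
    padded = np.pad(channel, radius, mode="reflect")
    windows = sliding_window_view(padded, (k, k))        # shape (H, W, k, k)
    centre = channel[:, :, None, None]
    local_cdf = (windows <= centre).mean(axis=(2, 3))    # values in (0, 1]
    return np.round(local_cdf * 255).astype(np.uint8)


def prepare(images, labels, seed=0):
    """Shuffle, equalize, and normalize a batch of RGB uint8 images."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))                 # shuffle the examples
    images, labels = images[order], labels[order]
    equalized = np.stack([
        np.stack([local_hist_equalize(img[..., c]) for c in range(3)], axis=-1)
        for img in images
    ])
    # Assumed normalization: scale pixel values to the range [0, 1].
    return equalized.astype(np.float32) / 255.0, labels


# Usage on random stand-in data (not GTSRB images).
demo_images = np.random.default_rng(1).integers(
    0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
demo_labels = np.arange(8)
x, y = prepare(demo_images, demo_labels)
print(x.shape, x.dtype)
```

Equalizing each channel independently can shift colors; applying the equalization to a luminance channel only would be an alternative design under the same description.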

Figure 2. Data preparation process
Overall, the proposed preparation increases the contrast of areas with low local contrast and raises the global contrast. Finally, the data were normalized to bring all examples to the same scale, ensuring a balanced representation of all features. The data preparation process enhances data quality for both training and testing. A CNN model was then proposed for traffic sign recognition. The CNN is a deep learning model used for solving computer vision tasks. It has achieved wide success in image classification, and existing models can be flexibly reused for new applications.
Inspired by ResNet [8], which replaces classic plain blocks with residual blocks, we propose a custom CNN model for traffic sign recognition. A residual block connects a layer's input not only to the next layer but also, through a shortcut, to later layers. Using residual blocks, ResNet achieved a top-5 error of 3.57% [8] in ILSVRC 2015 [15].
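The difference between a plain block and a residual block can be illustrated with a toy NumPy sketch. It uses small fully connected layers instead of convolutions for brevity, which is a simplification; the function names and sizes are hypothetical. The residual block computes y = relu(F(x) + x), where F is the stack of layers being bypassed.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def plain_block(x, w1, w2):
    """Two stacked layers: the output depends on x only through F(x)."""
    return relu(relu(x @ w1) @ w2)


def residual_block(x, w1, w2):
    """The same two layers plus an identity shortcut: y = relu(F(x) + x)."""
    return relu(relu(x @ w1) @ w2 + x)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))       # 4 toy feature vectors of length 16
w_zero = np.zeros((16, 16))

# With all-zero weights the plain block outputs zeros, while the residual
# block still passes relu(x) through the shortcut.
print(np.allclose(plain_block(x, w_zero, w_zero), 0.0))          # True
print(np.allclose(residual_block(x, w_zero, w_zero), relu(x)))   # True
```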
For the proposed traffic sign recognition, a CNN model based on residual connections was proposed. Figure 3 presents the proposed architecture. The network is composed of 7 convolution layers. For the first convolution layer, 3 filters with a 1x1 kernel and a stride of 1 were used to learn features from the whole image. These 3 filters produce feature maps from the 3 channels of the RGB color space. The 1x1 kernel size was used because it improves the network performance. For the other convolution layers, a 3x3 kernel is used, starting with 32 filters, and the number of filters is doubled after each pooling layer. 3x3 kernels were used because smaller kernel sizes result in better performance. For pooling, max-pooling with a 3x3 kernel and a stride of 2 was used. The flatten layer converts tensors to vectors to connect the max-pooling layer to the fully connected layer. Two fully connected layers summarize the features and learn global features. For classification, a softmax layer provides the predictions. As a residual connection, all the max-pooling layers are connected to the fully connected layer to gather features from all the stages of the network. For performance enhancement, dropout was applied after each max-pooling layer and after the fully connected layers.
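To make the effect of the skip connections concrete, the sketch below tracks the feature-map size after each max-pooling layer and the length of the vector that reaches the fully connected layers. It rests on several assumptions not stated in the text: convolutions preserve the spatial size, pooling uses "same" padding by default, the filter counts per stage are 32, 64, and 128 (from the doubling rule), and the three pooling outputs are flattened and concatenated. The function names are hypothetical.

```python
import math


def pooled_size(size, kernel=3, stride=2, padding="same"):
    """Spatial size after one max-pooling layer."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1       # "valid" padding


def skip_feature_length(input_size=32, filters=(32, 64, 128), padding="same"):
    """Shapes of the pooled feature maps and the length of their concatenation."""
    size, total, stages = input_size, 0, []
    for f in filters:
        size = pooled_size(size, padding=padding)
        stages.append((size, size, f))
        total += size * size * f               # flattened length of this stage
    return stages, total


stages, total = skip_feature_length()
print(stages)   # [(16, 16, 32), (8, 8, 64), (4, 4, 128)]
print(total)    # 14336
```

Under the same assumptions, a 96x96 input yields a concatenated vector of 129024 values, which shows how quickly the first fully connected layer grows with the image size.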

Results
Experiments were performed on a desktop with an Intel i7 CPU and an Nvidia GTX960 GPU. The network was built using the TensorFlow framework. For model training and testing, the German traffic sign dataset GTSRB [16] was used. GTSRB is a traffic sign classification benchmark. It contains more than 50,000 natural-scene images from 43 traffic sign classes and was divided into a training set and a testing set.
To evaluate the system performance in detail, various configurations were tested, varying the image size, batch size, dropout rate, and learning algorithm (optimizer).
At the start, the parameters of the experimental environment were set as follows: all images were resized to 32x32, a batch of 1024 samples was used, the dropout rate was 0.25, and stochastic gradient descent was selected as the learning algorithm.
After training and several tests, the best configuration was found to be an image size of 96x96, a mini-batch of 256 samples, a dropout rate of 0.5, and the Adam optimizer.
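The two configurations described above can be written down as a small, immutable record, which makes it easy to see what changed between the starting point and the best setting. The class and field names below are hypothetical; only the values come from the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    image_size: int       # images are image_size x image_size pixels
    batch_size: int
    dropout_rate: float
    optimizer: str


initial = TrainConfig(image_size=32, batch_size=1024,
                      dropout_rate=0.25, optimizer="sgd")
best = TrainConfig(image_size=96, batch_size=256,
                   dropout_rate=0.5, optimizer="adam")

# Map each changed field to its (initial, best) pair of values.
best_values = asdict(best)
changed = {name: (value, best_values[name])
           for name, value in asdict(initial).items()
           if value != best_values[name]}
print(changed)
```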
For model training and testing, we started by preprocessing the data and using it as input to the CNN model. First, the data were loaded and the augmentation technique was applied. Next, the images were resized and shuffled to generate mixed mini-batches, and local histogram equalization was applied. At the final step of the preparation pipeline, the generated data were used to train the proposed CNN. Table 1 summarizes the achieved results compared to existing works reported on the GTSRB dataset. As reported, an accuracy of 99.75% was achieved. Compared to human accuracy, the achieved result is very acceptable. This result outperforms the state-of-the-art models in the traffic sign classification task.
Most of the false-negative predictions are due to images that were totally or partially damaged by the data preprocessing. For real-time implementation, balancing speed and accuracy is important. The proposed CNN is suitable for real-time applications. Usually, an average of 25 images per second is required to meet real-time constraints. The traffic sign recognition system achieved up to 143 FPS using a high-performance GPU. This suggests that moving to low-performance hardware such as a Raspberry Pi while keeping high speed and accuracy is possible.
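The real-time margin implied by these two figures can be checked with a few lines of arithmetic. The sketch converts frame rates into per-frame time budgets and computes the headroom factor; the function and variable names are hypothetical. The headroom only describes the GPU measurement and does not by itself predict the speed on other hardware.

```python
def frame_time_ms(fps):
    """Time available (or spent) per frame, in milliseconds."""
    return 1000.0 / fps


required_fps = 25     # real-time requirement quoted in the text
measured_fps = 143    # best rate reported on the high-performance GPU

headroom = measured_fps / required_fps
print(round(frame_time_ms(required_fps), 1))   # 40.0
print(round(frame_time_ms(measured_fps), 1))   # 7.0
print(round(headroom, 2))                      # 5.72
```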
After training and testing, the CNN model was integrated into a traffic sign classification application in which the images are annotated with human-understandable labels. The application was tested using new images. Figure 4 presents an example of the application output showing the top five predictions.
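A top-five output of this kind can be obtained from the raw scores of the last layer by applying softmax and sorting. The NumPy sketch below shows one way to do it; the class names and scores are made up for illustration and do not correspond to actual GTSRB labels or model outputs.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)     # subtract the max for stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def top_k(logits, class_names, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    best = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in best]


# Hypothetical labels and scores for the 43 classes.
names = [f"class_{i}" for i in range(43)]
logits = np.zeros(43)
logits[[14, 17, 13, 12, 1]] = [9.0, 4.0, 3.0, 2.0, 1.0]

for name, p in top_k(logits, names):
    print(f"{name}: {p:.4f}")
```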

Conclusions
This paper proposed a two-stage method for traffic sign recognition based on data preparation and a CNN model. The data preparation stage enhances the image quality and improves the recognition step. The obtained results demonstrate the efficiency of the proposed approach, which achieves real-time speed and a high accuracy that outperforms human performance and state-of-the-art methods.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.