Final Report

License Plate Recognition


Project summary

In this project, we apply deep learning techniques to the problem of license plate recognition (LPR): given an image of a license plate, we want the machine learning system to return the string of characters and digits on it. The overall procedure is described in the sections below.

Automated license plate recognition is a technology that applies character and digit recognition to images in order to read vehicle registration plates. For example, such systems can alert police to vehicles that violate traffic laws or appear on a "hot list". At the beginning of the project, we did not know how the recognition works, which neural network architecture should be used, or how to make the recognition model more efficient. These questions guided our investigation.

Data set preparation

We use several datasets found on the internet, including the Medialab LPR database, License Plate Detection, Recognition and Automated Storage, and PlatesMania.com.

These images fall mainly into several types: close-up views, middle-distance views, plates obscured by dirt or shadows, and images shot at night.

Image in close-up view:

Image in middle-distance view:

Plate with dirt or shadows:

Image taken at night:

To obtain a high-quality dataset, we decided to crop each license plate from the images manually and segment it into individual characters. Several samples are shown below.

Example 1:

Example 2:

Example 3:

For the project, we have ten image sets with 476 pictures collected in total. After segmentation, there are 3808 (476 * 8) images in total (one is the original picture; the other seven are the individual characters or digits of one license plate). collected_data (download here)

We then design four MATLAB programs, image_to_data.m, illustration.m, label_to_vector.m, and vector_to_label.m, to generate a numerical representation of the dataset, i.e., the feature data and the corresponding label data.

For feature data:

For label data:

For each image set, we use image_to_data.m to generate four files, Plate_Character_Labels_*, Plate_Labels_*, Plates_Character_Images_*, and Plates_Images_* (* denotes the image-set number; we create nine image sets in total for the project). These four files per image set are the load data for our neural networks. all_mat_data (download here)
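
To make the label encoding concrete, the following is a minimal Python sketch of the idea behind label_to_vector.m and vector_to_label.m (the real implementation is in MATLAB; the only assumption here is the 36-class alphabet of digits 0-9 followed by letters A-Z):

import numpy as np

# Assumed 36-class alphabet: digits 0-9 followed by letters A-Z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def label_to_vector(ch):
    # Map a character label, e.g. 'B', to a one-hot vector of length 36.
    vec = np.zeros(len(ALPHABET))
    vec[ALPHABET.index(ch)] = 1.0
    return vec

def vector_to_label(vec):
    # Map a length-36 score or one-hot vector back to its character label.
    return ALPHABET[int(np.argmax(vec))]

# Round trip: 'B' maps to index 11 and back to 'B'.
assert vector_to_label(label_to_vector("B")) == "B"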

Architectures of neural networks

Our two neural networks are based on convolutional neural networks (CNNs), one of the most popular deep learning models, which can be trained via the back-propagation algorithm.

Vanilla_CNN.py (the first neural network)

For this model, we use three convolutional layers and two fully connected layers.

The first convolutional layer: conv1 + relu + maxpool1

The second convolutional layer: conv2 + relu + maxpool2

The third convolutional layer: conv3 + relu + maxpool3

The first fully connected layer: fc1 + relu

The second fully connected layer: fc2

Finally, we construct the loss, the training accuracy, and the testing accuracy.

Parameters used for the neural network (a code sketch of this structure follows the list):

conv1_fsize = 4

conv1_fnum = 20

maxpool1_ksize = 2

conv2_fsize = 4

conv2_fnum = 20

maxpool2_ksize = 2

conv3_fsize = 4

conv3_fnum = 20

maxpool3_ksize = 2

fc1 = 100

fc2 = 36

batch_size = 90

l_rate = 0.005

N_epoch = 500
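
The following is a minimal PyTorch-style sketch of this three-convolution, two-fully-connected structure with the parameters above. It is an illustration only: the actual Vanilla_CNN.py may use a different framework, and the single-channel 32 x 32 input size assumed here is only for concreteness.

import torch.nn as nn

class VanillaCNN(nn.Module):
    # Sketch of the conv1-conv3 / fc1-fc2 structure described above.
    # The 1 x 32 x 32 input shape is an assumption; it yields a 20-dim feature after the last pool.
    def __init__(self, n_classes=36):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=4), nn.ReLU(), nn.MaxPool2d(2),   # conv1 + relu + maxpool1
            nn.Conv2d(20, 20, kernel_size=4), nn.ReLU(), nn.MaxPool2d(2),  # conv2 + relu + maxpool2
            nn.Conv2d(20, 20, kernel_size=4), nn.ReLU(), nn.MaxPool2d(2),  # conv3 + relu + maxpool3
        )
        self.fc1 = nn.Sequential(nn.Flatten(), nn.Linear(20, 100), nn.ReLU())  # fc1 = 100 units
        self.fc2 = nn.Linear(100, n_classes)                                   # fc2 = 36 classes

    def forward(self, x):
        return self.fc2(self.fc1(self.features(x)))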

vanilla_resnet_v3.py (the second neural network)

In this section, we also design a residual network for the experiments and compare the performance of the two architectures: a vanilla CNN and a vanilla ResNet. We analyzed the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, which observes that deeper neural networks can be harder to train; a residual learning framework is therefore presented to ease the training of such networks. In the second neural network, the layers learn residual functions with reference to the layer inputs, instead of learning unreferenced functions.

The network we built for the second approach:

Parameters used for the neural network (a sketch of one residual block follows the list):

vfsize = 3 // Filter size for each convolution layer in the block

vfnum = 16 // The number of filters for each convolution layer in the block

batch_size = 80

l_rate = 0.001
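
Below is a minimal PyTorch-style sketch of one residual block with these parameters. It is only an approximation of what vanilla_resnet_v3.py does: each block learns a residual F(x) with two small convolutions and adds the input x back onto the output.

import torch.nn as nn

class ResidualBlock(nn.Module):
    # One block with two 3x3 convolutions of 16 filters (vfsize = 3, vfnum = 16).
    # Padding keeps the spatial size so the identity shortcut x can be added directly.
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # the learned residual F(x)
        return self.relu(out + x)                   # output = F(x) + x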

Next, we introduce how to train and evaluate the proposed networks, taking the Vanilla_CNN structure as an example to illustrate the idea part by part.
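
As an illustration of the training and evaluation loop (a hypothetical sketch: it assumes a cross-entropy loss over the 36 character classes, a plain SGD optimizer, and the batch size, learning rate, and epoch count listed earlier; the actual scripts may differ):

import torch
import torch.nn as nn

def train_and_evaluate(model, train_loader, test_loader, n_epoch=500, l_rate=0.005):
    optimizer = torch.optim.SGD(model.parameters(), lr=l_rate)  # optimizer choice is an assumption
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(n_epoch):
        model.train()
        for images, labels in train_loader:          # one batch of character crops and class indices
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()                          # back-propagation
            optimizer.step()
        # Character recognition accuracy on the test set.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in test_loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch}: test C-recog acc = {correct / total:.3f}")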

Software

Experimental results

In the training and testing process, we use three machines. The first, a machine in the engineering lab with a high-end NVIDIA GPU, takes about 10 minutes to finish 500 epochs. The second, a node on the Argon HPC server, takes about 8 minutes. The third, a gaming laptop with an NVIDIA GeForce GTX 1060M, takes about 41 minutes. All three machines produce almost identical loss and accuracy results for training and testing, so we show only one set of results for each of our two neural networks.

Vanilla_CNN.py (the first neural network)

loss for Vanilla_CNN:

train loss: the training loss of character recognition in a batch

test loss: the testing loss of character recognition in a batch

accuracy for Vanilla_CNN:

train C-recog acc (or C-recog train accuracy): the training character recognition accuracy in a batch

test C-recog acc (or C-recog test accuracy): the testing character recognition accuracy in a batch

In the loss plot of the vanilla CNN, the training loss decreases toward 0 as the number of epochs increases, while the testing loss drops quickly at the very beginning and then rises slowly. This means the vanilla CNN is slightly overfitting the data, which is what we expect, since the number of data samples is limited by the length of the course. Fortunately, the overfitting is not severe, so we can still hope for good generalization; this is confirmed by the accuracy plot of the vanilla CNN.

In the accuracy plot of the vanilla CNN, the plate character recognition accuracy reaches 100% on the training set and around 90% on the testing set.

However, the results for the vanilla residual network are different. As the loss plot shows, both the training and testing losses go down together as the number of generations increases. Although it converges more slowly, the system appears more robust, without large variance.

One interesting observation is that the loss plot of the vanilla residual network suggests the loss could decrease further as the number of generations increases, yet the accuracy plot shows that the accuracy has saturated and no further improvement is achieved. This is why we terminate the program at 1000 generations. Unfortunately, we have not found a good explanation for this behavior.

Comparing the results of the vanilla CNN and the vanilla residual network, the former is more effective on our problem. Apart from the difference in the number of parameters of the two networks, we suspect the structural difference also accounts for the performance gap. For example, the vanilla CNN does not carry extra information by adding the identity input $x$ as the residual network does, which makes the vanilla CNN more aggressive in searching for a better solution or network configuration.

We now present the plate recognition accuracy plot for the vanilla CNN case.

train P-recog acc: the training plate recognition accuracy in a batch

test P-recog acc: the testing plate recognition accuracy in a batch

From the plate recognition accuracy plot, we see that in the first tens of iterations the plate recognition accuracy remains 0 while the plate character recognition accuracy increases steadily. This is reasonable, since character recognition is easier than plate recognition (a plate counts as recognized only when all of its characters are correct), and the early vanilla CNN can recognize some characters but no full plates. However, as the CNN becomes better tuned, it performs very well in both tasks.
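
For reference, the following short sketch shows how the plate recognition accuracy can be computed from per-character predictions under this all-seven-characters-correct convention (the variable shapes here are illustrative assumptions):

import numpy as np

def plate_accuracy(pred_chars, true_chars):
    # pred_chars, true_chars: integer class arrays of shape (n_plates, 7).
    # A plate is counted as recognized only if all 7 characters match.
    plate_correct = np.all(pred_chars == true_chars, axis=1)
    return plate_correct.mean()

# Toy example: one plate fully correct, one with a single wrong character -> accuracy 0.5.
pred = np.array([[1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 9]])
true = np.array([[1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7]])
print(plate_accuracy(pred, true))  # 0.5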

vanilla_resnet_v3.py (the second neural network)

loss for vanilla_resnet_v3:

train loss: the training loss of character recognition in a batch

test loss: the testing loss of character recognition in a batch

accuracy for vanilla_resnet_v3:

train C-recog acc (or C-recog train accuracy): the training character recognition accuracy in a batch

test C-recog acc (or C-recog test accuracy): the testing character recognition accuracy in a batch

According to the results above, the second model performs worse than the first. We suspect that residual learning may not be well suited to the model we use in this project. After analysis, we believe the degradation of training and testing accuracy may result from several factors. The first potential reason is that the model is not easy to optimize, which degrades both training and testing accuracy. The second is that a shallower but wider architecture may be more appropriate: we may need to add more neurons to each layer and make the network wider. These hypotheses are still under investigation; studying the residual learning framework thoroughly would take considerable time, so we leave this question as one of our future extensions.

Potential future extensions

First, all collected data were pre-processed and segmented manually by our team, which cost a great deal of time. An interesting future extension would be to design another neural network to detect the license plate and a third network to segment the plate characters automatically. This would yield a fully automatic plate recognition system.

Another potential extension is to design neural networks for this problem in a more systematic way and compare their performances. It would also be interesting to enlarge the data set and publish it as a benchmark for intelligent transportation. To the best of our knowledge, there are few such data sets for supervised learning in license plate detection and license plate character segmentation. Since we have manually obtained the car images, plate images, and plate character images, and annotated all of the samples, they are well suited for supervised learning tasks in intelligent transportation.

In this section, we briefly introduce some preliminary exploration in automatic plate detection and segmentation. In the beginning, we planned to design a Python program called catch_number_plate.py, based on OpenCV, to find the license plate automatically and do the segmentation. However, our data come from different online sources, so there is no uniform standard, and in 50% of the samples we fail to find a candidate license plate in the image. Several examples of success and failure are shown below, followed by a sketch of this kind of approach.

Examples of success:

Examples of failure:
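
For reference, the sketch below shows a common contour-based OpenCV pipeline for locating a plate candidate, similar in spirit to what catch_number_plate.py attempts. It is only an illustration and assumes OpenCV 4.x; the thresholds and the actual script may differ.

import cv2

def find_plate_candidate(image_bgr):
    # Return a cropped plate candidate, or None if no 4-sided contour is found.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.bilateralFilter(gray, 11, 17, 17)      # smooth while keeping edges
    edges = cv2.Canny(gray, 30, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # Examine the largest contours first and keep the first roughly rectangular one.
    for c in sorted(contours, key=cv2.contourArea, reverse=True)[:10]:
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4:                          # four corners: plate-like shape
            x, y, w, h = cv2.boundingRect(approx)
            return image_bgr[y:y + h, x:x + w]
    return None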

Moreover, our data set is small, so we consider data augmentation to further improve the performance of the model by perturbing the input by a small amount. We use two methods on our created data. The first is to add noise: we create a Python program, adding_noise.py, that adds Gaussian noise to each segmented character or digit. The second is to tilt the characters or digits segmented from each image: we create a Python program, rotation.py, to implement this. A simplified sketch of both augmentations follows the figures below.

Original data

Adding Gaussian noise

Tilting images
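
The two augmentations can be summarized by the following sketch (a simplified illustration of the ideas behind adding_noise.py and rotation.py; the noise standard deviation and tilt angle are placeholder values):

import cv2
import numpy as np

def add_gaussian_noise(img, sigma=10.0):
    # Add zero-mean Gaussian noise to a grayscale character image (uint8).
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def tilt(img, angle=10.0):
    # Rotate a character image by a small angle around its center.
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h))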

Conclusion and discussion

In conclusion, we successfully designed and implemented a license plate recognition system. After training, given the segmented images of a license plate, our deep learning system can recognize its characters and digits.

In the project, we built two neural networks, which makes it easy to compare them and analyze the system. In our experiments, the first network (Vanilla_CNN.py) performs better than the second approach (vanilla_resnet_v3.py): the accuracy of the first model almost reaches 1.0, while the second fluctuates between 0.55 and 0.6. We expected the residual function of the residual network to provide shortcuts that help with training, but the results show otherwise. We suspect that, with 36 classes of training images, the architecture of the second approach may be slightly too simple, which leads to its poor performance.

On the other hand, our system has two drawbacks:

Our recognition system is successful. However, there is still work to improve in the future, such as addressing the drawbacks raised above and designing a detection system that locates candidate license plates in an image. Finally, we are grateful to the professor and the course for giving us the opportunity to work on this project.

Acknowledgement

We would like to thank the professor and the course TA who provided help throughout the course. We would also like to thank Dr. Yang Yang for helping us prepare the data set and for stimulating discussions.

Reference

These three papers are relevant to our project.

Teammate

The assignments for each member are as follows:

proposal for License Plate Recognition

progress report for License Plate Recognition

codes