
EIE4100 Computer Vision and Pattern Recognition
Part II Laboratory: Neural Networks for Pattern Recognition

Objectives
1. To understand the working principles of a multi-layer perceptron and a convolutional neural network;
2. To get familiar with the error back-propagation and stochastic gradient descent algorithms used to train neural networks; and
3. To use neural networks to solve challenging visual recognition problems.

1 Introduction

In this section, we first introduce two widely used neural network models, the multi-layer perceptron (MLP) and the Softmax classifier. We then describe convolutional neural networks (CNNs), which have recently attracted growing attention from the computer vision and machine learning communities.

1.1 Multi-Layer Perceptron (MLP)

An MLP consists of one input layer, one or several hidden layers and one output layer, as shown in Fig. 1. Nodes in neighboring layers are fully connected. For each node in the hidden layers and the output layer, an activation function is applied:

Fig. 1. The structure of an MLP.

    x^{l+1} = f(u^{l+1}), \quad u^{l+1} = W^{l+1} x^{l} + b^{l+1}

where x^{l+1} denotes the output of layer l+1, x^{l} is the output of layer l, x^{0} is the input, W^{l+1} are the connection weights, b^{l+1} is the bias, and f is normally the sigmoid function, f(x) = 1/(1 + e^{-x}).
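As a concrete illustration, the forward pass of a small two-layer MLP can be written in a few lines of MATLAB. This is only a minimal sketch with made-up layer sizes and variable names (x0, W1, b1, ...); it is not the code used in the CV_GUI toolbox.

    % Minimal MLP forward pass: x^{l+1} = f(W^{l+1} x^l + b^{l+1})
    sigmoid = @(x) 1 ./ (1 + exp(-x));      % f(x) = 1/(1 + e^-x)

    x0 = rand(1024, 1);                     % input, e.g. a 32x32 image flattened into a column vector
    W1 = 0.01 * randn(100, 1024);           % hidden-layer weights (100 hidden nodes)
    b1 = zeros(100, 1);                     % hidden-layer bias
    W2 = 0.01 * randn(10, 100);             % output-layer weights (10 classes)
    b2 = zeros(10, 1);                      % output-layer bias

    u1 = W1 * x0 + b1;  x1 = sigmoid(u1);   % hidden-layer output x^1
    u2 = W2 * x1 + b2;  y  = sigmoid(u2);   % network output y (10 class scores)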

Suppose that we have N training samples and that each training sample is categorized into one of K classes; the ground-truth label t is then a K-dimensional vector of 0s and 1s. For each training sample i, we can obtain the following error:

    e_i = \frac{1}{2} \sum_{j=1}^{K} (y_j^i - t_j^i)^2 = \frac{1}{2} \| y^i - t^i \|_2^2

where e_i is the error, y_j^i is the j-th predicted value, and t_j^i is the j-th ground-truth value of the i-th sample. We can further define the accumulated error over the N training samples:

    E = \frac{1}{N} \sum_{i=1}^{N} e_i = \frac{1}{2N} \sum_{i=1}^{N} \| y^i - t^i \|_2^2

If we consider the regularization term, E can be rewritten as:

    E = \frac{1}{2N} \sum_{i=1}^{N} \| y^i - t^i \|_2^2 + \frac{1}{2} \lambda \| w \|_2^2

where \lambda is the regularization coefficient. The purpose of the regularization term is to push small weights even closer to zero, so that eventually these weights are effectively disconnected. In general, back-propagation with stochastic gradient descent is widely used to train neural networks. We have:

    \delta^{l} = (W^{l+1})^{T} \delta^{l+1} \circ f'(u^{l})

where \delta^{l} is the "error" back-propagated from the output layer, and the operator "\circ" denotes element-wise multiplication. For the output layer, \delta takes a slightly different form:

    \delta^{L} = (y - t) \circ f'(u^{L})

where f(\cdot) is usually the sigmoid function, whose derivative is easily obtained as f'(x) = f(x)(1 - f(x)). Finally, we have

    \frac{\partial E}{\partial W^{l}} = x^{l-1} (\delta^{l})^{T}, \quad \Delta W^{l} = -\eta \frac{\partial E}{\partial W^{l}}

    \frac{\partial E}{\partial b^{l}} = \frac{\partial E}{\partial u^{l}} \frac{\partial u^{l}}{\partial b^{l}} = \delta^{l}, \quad \Delta b^{l} = -\eta \frac{\partial E}{\partial b^{l}}

    W^{l} \leftarrow W^{l} + \Delta W^{l}, \quad b^{l} \leftarrow b^{l} + \Delta b^{l}
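The update rules above can be illustrated with a single-sample MATLAB sketch that continues the variable names assumed in the forward-pass sketch; the regularization term is dropped for brevity, and the gradient is transposed so that it matches the shape of W as defined here. This is not the toolbox implementation.

    % Minimal single-sample back-propagation / SGD step for the two-layer MLP above
    % (reuses sigmoid, x0, W1, b1, W2, b2 from the forward-pass sketch)
    t = zeros(10, 1);  t(3) = 1;                     % example one-hot ground-truth label

    u1 = W1 * x0 + b1;  x1 = sigmoid(u1);            % forward pass (as before)
    u2 = W2 * x1 + b2;  y  = sigmoid(u2);

    delta2 = (y - t) .* (y .* (1 - y));              % output-layer "error": (y - t) o f'(u^L)
    delta1 = (W2' * delta2) .* (x1 .* (1 - x1));     % hidden-layer "error": (W^2)^T delta^2 o f'(u^1)

    eta = 0.1;                                       % learning rate
    W2 = W2 - eta * (delta2 * x1');                  % weight update using delta^l (x^{l-1})^T
    b2 = b2 - eta * delta2;                          % bias update using delta^l
    W1 = W1 - eta * (delta1 * x0');
    b1 = b1 - eta * delta1;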

1.2 Softmax Classifier

A Softmax classifier looks like the MLP, with the difference lying in the activation function and the loss function used. For each hidden layer, the activation function used is the ReLU, f(x) = max(0, x). For each training sample i, the cross-entropy loss is defined as

    L_i = -\log\left( \frac{e^{f_{t_i}}}{\sum_{j} e^{f_j}} \right)

where f_j is the j-th element of the class-score vector f, and t_i indicates the true class of the i-th training sample. The full loss over the training set is the mean over all training samples:

    L = \frac{1}{N} \sum_{i=1}^{N} L_i = -\frac{1}{N} \sum_{i=1}^{N} \log\left( \frac{e^{f_{t_i}}}{\sum_{j} e^{f_j}} \right)

Just like the MLP, back-propagation with gradient descent can be used to train the Softmax classifier. We define p_{t_i} as the probability of a training sample belonging to the t_i-th class; then we have:

    p_{t_i} = \frac{e^{f_{t_i}}}{\sum_{j} e^{f_j}}

The derivative of p_{t_i} with respect to f_j is:

    \frac{\partial p_{t_i}}{\partial f_j} =
    \begin{cases}
        p_{t_i} (1 - p_j) & t_i = j \\
        -p_{t_i} p_j      & t_i \neq j
    \end{cases}

Then the derivative of L_i with respect to f_j is

    \frac{\partial L_i}{\partial f_j} = -\frac{1}{p_{t_i}} \frac{\partial p_{t_i}}{\partial f_j} = p_j - 1\{t_i = j\}

where 1\{t_i = j\} equals 1 when t_i = j and 0 otherwise. We can further apply the chain rule to compute the derivatives of L_i with respect to w and b, as in the MLP classifier.
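The loss and its gradient above can be sketched in MATLAB as follows. The scores and class index are made up, and the subtraction of max(f) is a standard numerical-stability trick rather than part of the handout's formulas; this is not the toolbox code.

    % Softmax probabilities, cross-entropy loss, and its gradient w.r.t. the class scores
    f  = randn(10, 1);                                % class scores for one sample (10 classes)
    ti = 3;                                           % index of the true class (example)

    p  = exp(f - max(f)) ./ sum(exp(f - max(f)));     % p_j = e^{f_j} / sum_k e^{f_k}
    Li = -log(p(ti));                                 % cross-entropy loss L_i

    dLdf     = p;                                     % dL_i/df_j = p_j - 1{t_i = j}
    dLdf(ti) = dLdf(ti) - 1;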

1.3 Convolutional Neural Network

A convolutional neural network (CNN) is a kind of deep learning model. Fig. 2 shows the simple CNN used in our experiment. It consists of two convolutional layers, each followed by a pooling layer. The convolutional layers extract features from the input by sliding a number of trainable filters (kernels) across the input image. Each pooling layer reduces the spatial size of the representation and alleviates overfitting: it takes small square blocks (s × s, generally s = 2) from the convolutional layer and subsamples each block to produce a single output. The most common pooling forms are average pooling and max pooling. After several convolution and pooling layers, the final output is a 1-by-n vector with each node indicating one class. Each neuron of the output layer is fully connected to the nodes of the previous layer, and the output is an n-way softmax predicting the probability distribution over the n classes.

The error back-propagation strategy and gradient descent algorithm used to train an MLP are also adopted for training a CNN. However, training a CNN is more difficult and complex than training an MLP, because the convolutional, pooling and fully-connected layers take different back-propagation (BP) forms. More details can be found on the following websites:

http://deeplearning.net/tutorial/lenet.html#lenet
http://cs231n.github.io/convolutional-networks/

Fig. 2. A simple CNN model used in our experiment.
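The following MATLAB sketch shows what one convolution-plus-pooling stage computes: a single 5x5 filter is slid across a 32x32 image and the resulting feature map is max-pooled with s = 2. The sizes, variable names and single-filter setup are assumptions for illustration only; the toolbox handles several filters and channels per layer.

    img    = rand(32, 32);                            % input image
    kernel = 0.1 * randn(5, 5);                       % one trainable 5x5 filter
    bias   = 0;

    conv_out = conv2(img, kernel, 'valid') + bias;    % slide the filter across the image -> 28x28 feature map
    feat     = 1 ./ (1 + exp(-conv_out));             % sigmoid activation on the feature map

    pooled = zeros(size(feat, 1) / 2, size(feat, 2) / 2);   % 14x14 output after 2x2 max pooling
    for r = 1:size(pooled, 1)
        for c = 1:size(pooled, 2)
            block = feat(2*r-1:2*r, 2*c-1:2*c);       % non-overlapping 2x2 block
            pooled(r, c) = max(block(:));             % keep the maximum of each block
        end
    end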

1.4 Trainable Parameters

Although a CNN is a much deeper neural network than an MLP, it may not have more trainable parameters. In our experiment, suppose we have an MLP with one hidden layer containing 100 hidden nodes, an input of size 32-by-32, and a 1-by-10 output vector. The number of trainable parameters in the hidden layer is (32 × 32) × 100 + 100 = 102500, the number in the output layer is 100 × 10 + 10 = 1010, and the total for the MLP is 102500 + 1010 = 103510.

For the CNN model, note that there are no trainable parameters in the pooling layers, so we only need to count the parameters in the convolutional layers and the output layer. Taking the model shown in Fig. 2 as an example, the C1 layer has (5 × 5) × 8 + 8 = 208 parameters, the C2 layer has (5 × 5) × 8 × 16 + 16 = 3216, and the output layer has (5 × 5 × 16) × 10 + 10 = 4010, giving a total of 208 + 3216 + 4010 = 7434 parameters, which is much less than the MLP.
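These counts can be reproduced with a short MATLAB calculation (the layer sizes are those stated above):

    mlp_hidden = (32*32) * 100 + 100;                 % hidden-layer weights + biases = 102500
    mlp_output = 100 * 10 + 10;                       % output-layer weights + biases = 1010
    mlp_total  = mlp_hidden + mlp_output              % 103510

    cnn_c1    = (5*5) * 8 + 8;                        % C1: 8 filters of size 5x5, plus 8 biases = 208
    cnn_c2    = (5*5) * 8 * 16 + 16;                  % C2: 16 filters over 8 input maps, plus 16 biases = 3216
    cnn_out   = (5*5*16) * 10 + 10;                   % fully-connected output layer = 4010
    cnn_total = cnn_c1 + cnn_c2 + cnn_out             % 7434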


2 Data Sets

In this laboratory exercise, we will use neural networks to solve visual pattern recognition problems. Two data sets, MNIST and CIFAR10, are used in the experiments. MNIST is a handwritten-digit database. It has a training set of 60000 samples and a test set of 10000 samples; every sample is categorized as one of the ten digits 0, 1, 2, ..., 9. The digits have been size-normalized and centered in a fixed-size image. CIFAR10 consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The ten classes are "airplane", "automobile", "horse", "bird", "cat", "deer", "frog", "truck", "ship" and "dog". Fig. 3 shows examples from the two databases.

Fig. 3. Examples from MNIST database (left) and CIFAR10 database (right).

3 CV_GUI Toolbox

The toolbox used in our experiment is CV_GUI, which was written in Matlab. It has been successfully tested on Matlab 2011 and 2013 under the Windows 7 64-bit operating system. The toolbox contains all the necessary files, including the database .mat files, source code files, etc. You are free to change the source code to build your own neural networks or apply them to other applications if necessary. The following is a brief introduction on how to use the toolbox:

1. Download the CV_GUI toolbox from the Blackboard.
2. Open Matlab and add the toolbox to the Matlab current path.
3. If you want to apply the MLP or Softmax classifier, input "Main_NN" in the Matlab command window. If you want to use the CNN, input "Main_CNN" instead.
4. When you input "Main_NN" or "Main_CNN", a GUI (graphical user interface) will pop up, as shown in Fig. 4. There are three blocks on the GUI: "Database", "Train & Test" and "Plot results".
5. In the "Database" block, you can choose the database and click the "Show training samples" button to examine training samples. In the "Train & Test" block, you can set the parameters. For a CNN, note that "conv1" and "conv2" indicate the number of filters used in the convolutional layers. Since the filter size (5x5) is fixed in this GUI, you do not need to set it (you can modify the source code to change the filter size). "pool1" and "pool2" are the pooling strategies for the two pooling layers. After you have set the parameters properly, click the "Train & Test" button to start training and testing. In the Matlab command window, press the "Enter" key to start the training process. You can see the training errors displayed in the window. Finally, you can show the results in the "Plot results" block.
6. There are five different results that can be shown, i.e. "train error", "train accuracy", "test accuracy", "train confusion matrix" and "test confusion matrix". The first three are shown on the GUI directly. For a confusion matrix, another figure named "Image Tool" will pop up, as shown in Fig. 5 (a). Click the second icon (inspect pixel values, circled in Fig. 5 (a)) to show the confusion matrix, which will look like the one shown in Fig. 5 (b).


(a)

(b)

Fig. 4. GUI for the MLP or Softmax classifier (a) and for the CNN (b).

(a)

(b)

Fig. 5. (a) In the Image Tool window, clicking the icon marked with the (red) circle will show the confusion matrix, as shown in (b).

Some training parameters are briefly explained below; a training-loop sketch illustrating how they interact follows this list.

- Training samples: the number of samples used for training.
- Test samples: the number of test samples.
- Batch size: the weight modifications are accumulated over a batch of patterns before the weights are updated.
- Learning rate: controls the step size of the weight updates.
- Regularization: controls the contribution of the regularization term; "0" means that no regularization term is used.
- Train epochs: the number of training iterations (going through the whole training set once counts as one iteration).
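The following self-contained MATLAB sketch shows how these parameters interact in a training loop: "Train epochs" is the outer loop, "Batch size" groups samples per weight update, and "Learning rate" and "Regularization" scale the update. A single linear layer with a squared-error loss stands in for the real network, and all sizes and data are made up; this is not the CV_GUI code.

    num_train = 1000;  dim = 1024;  num_class = 10;
    X = rand(dim, num_train);                              % made-up training inputs (one column per sample)
    T = eye(num_class);  T = T(:, randi(num_class, 1, num_train));   % made-up one-hot targets

    W = 0.01 * randn(num_class, dim);  b = zeros(num_class, 1);
    epochs = 5;  batch_size = 50;  eta = 0.1;  lambda = 0.001;

    for epoch = 1:epochs
        idx = randperm(num_train);                         % shuffle the training set every epoch
        for k = 1:floor(num_train / batch_size)
            batch = idx((k-1)*batch_size + 1 : k*batch_size);
            Y  = W * X(:, batch) + repmat(b, 1, batch_size);    % forward pass for the whole batch
            dY = (Y - T(:, batch)) / batch_size;                % gradient of the mean squared error
            W  = W - eta * (dY * X(:, batch)' + lambda * W);    % accumulated batch update + L2 regularization
            b  = b - eta * sum(dY, 2);                          % bias update
        end
    end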


EIE522 Pattern Recognition: Theory and Applications, Part II Laboratory: Neural Networks for Pattern Recognition

Student Name:____________________ Student Number:_______________

1. Table 1 shows the exercises to be carried out by you. For each of these exercises, you should try different training parameter settings and report your findings.

Table 1. Exercises

    Classifier    Hidden layer    Database
    MLP           0               MNIST / CIFAR10
    MLP           1               MNIST / CIFAR10
    Softmax       0               MNIST / CIFAR10
    Softmax       1               MNIST / CIFAR10
    CNN           /               MNIST
    CNN           /               CIFAR10

2. When you apply an MLP classifier or a Softmax classifier to recognize the handwritten digits and objects, you can set the training parameters as shown in Table 2. Find out how each of these training parameters affects the training process and the classifier performance.

Table 2. Training parameters and performance measures for an MLP or Softmax classifier

    Training parameters:  Classifier, Training samples, Test samples, Learning rate,
                          Regularization, Batch size, Train epochs, Hidden layer, Hidden nodes
    Results:              Train accuracy, Test accuracy


3. When you apply a CNN to recognize the handwritten digits and objects, you can set the training parameters as shown in Table 3. Find out how each of the training parameters affects the training process and the classifier performance.

Table 3. Training parameters and performance measures for a convolutional neural network

    Training parameters:  Training samples, Test samples, Learning rate, Batch size, Train epochs,
                          Conv. layer 1, Pool 1, Conv. layer 2, Pool 2
    Results:              Train accuracy, Test accuracy

4. In your experiment report, you should list the training parameters you have set. The report should also include the following five results: "train loss", "train accuracy", "test accuracy", "train confusion matrix", and "test confusion matrix". Comment on your findings.

5. (Optional) (1) You are encouraged to modify the source code to allow a more flexible setting of parameters. (2) Obtain the kernel weights and draw them as maps.

On MATLAB

(1) Older versions of MATLAB may not run the CNN models properly. It is suggested to use the trial version from MathWorks, which is free for 30 days:
https://www.mathworks.com/programs/trials/trial_request.html?s_iid=coabt_trial_abtus_tb
(2) You may use the computers in CF004 and CF105/CF105a.
(3) The PolyU Virtual Student Computer Centre (vSCC) allows students to remotely access a desktop OS and a list of commonly-used software applications, including Matlab, anywhere, anytime, on or off campus.


PolyU VSCC Setup - Windows - Web Access: https://youtu.be/f4XGlqjqZcU
PolyU Access Link: https://vdesk.polyu.edu.hk/access
PolyU VSCC software available, including MATLAB 9 (R2016a): http://www.polyu.edu.hk/vscc/SoftwareAvailable.html


