Image Processing: Applying Styles to Images
Our Results
We learned a lot while working on this project, and we successfully implemented sections of the original paper. This page describes those results; the related code can be found here.
Understanding CNNs and Implementing them with MatConvNet
This was our first attempt at working with the MATLAB library MatConvNet. During our research we came across a simple example of how 2-D convolution happens in a convolutional layer of a CNN (this can be found here). The example used small matrices rather than full images for simplicity. We replicated it using MatConvNet and got the same output as the example. This exercise was helpful because it verified our understanding of CNN responses, taught us some of the nuances of MatConvNet, and showed us that we can access individual responses (the response of a single layer of the CNN) using this library.
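To illustrate the kind of small-matrix check described above, here is a minimal sketch in plain Python rather than MatConvNet. The function name `conv2d_valid`, the 4x4 input, and the 2x2 filter are all made up for illustration and are not the values from the example we replicated. It assumes no padding and computes the response without flipping the filter, which is how convolutional layers are commonly implemented.

```python
def conv2d_valid(image, kernel, bias=0.0, stride=1):
    """Slide `kernel` over `image` with no padding and return the response map.

    The kernel is not flipped (cross-correlation), an assumption that matches
    how convolutional layers are usually implemented.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    response = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            # Weighted sum of the image patch under the kernel.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        response.append(row)
    return response


# Hypothetical toy data: a 4x4 "image" and a 2x2 filter.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 2],
    [2, 1, 0, 1],
    [1, 0, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

response = conv2d_valid(image, kernel)
for row in response:
    print(row)
```

Each output entry is the top-left pixel of a 2x2 patch minus its bottom-right pixel, so the 3x3 response is easy to verify by hand, which is the point of working with small matrices before moving to full images.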
Gradient Descent with a Full Layer of VGG-19
Our next step was to attempt gradient descent and backpropagation with just one convolutional layer. We used the same loss function and network that A Neural Algorithm uses, except with one convolutional layer instead of many. Our result was similar to the 'content' side of the algorithm described here. Because this algorithm is non-standard, we were unable to use MatConvNet's training function to perform gradient descent and backpropagation, so we implemented both ourselves, which was challenging.
Our goal with this code was to reconstruct an image of a fox using one convolutional layer. An important concept that we learned, and hoped to show with this code, is that content can be extracted by running the desired content image through one convolutional layer, and that this content can be used to alter a white noise image until it matches the content.
Our implementation uses the weight, bias, pad, stride, and dilate properties of the first convolutional layer of a trained VGG-19 network. Fortunately for us, using only one layer gives better reconstruction results, because with more layers the small details are lost and only the high-level content is preserved. For the actual algorithm, the paper uses a higher convolutional layer because higher layers represent higher-level shapes in the image, and the small details of the content image are not needed if a style is to be applied.
On our first attempt at gradient descent, we updated the weights, biases, and white noise image after each iteration. We found that after the first iteration we could see the beginnings of the reconstruction of a fox, but after the second iteration our image approached black or white, representing the min or max pixel values. This output is shown below. Note that the white noise image has diverged to black after 20 iterations, and that the loss function increases to large values (approx. 10^38).
At first we thought that gradient descent was overshooting a local minimum because of a high learning rate. In response we tried very small learning rates, but the loss still diverged after 2 iterations. We then tried leaving the layer weights and biases fixed and using backpropagation only to update the white noise input image. This solved our problem, and we can now successfully reproduce images like the one below. Note that the white noise image becomes progressively more similar to the fox image, and that the loss value decreases towards an asymptote, which suggests that a local minimum was reached. This image reconstruction was done using 30 iterations at a learning rate of 0.01.
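The fix described above, holding the layer's weights and biases fixed and updating only the input image, can be sketched in plain Python. This is not our MATLAB code. It assumes a single-channel layer with no padding and a squared-error content loss (half the sum of squared differences between the two responses). The Sobel-like filter, the 6x6 stand-in for the content image, and all function names are hypothetical. Only the 30 iterations and the 0.01 learning rate come from our experiment.

```python
import random


def conv2d_valid(image, kernel, bias=0.0):
    """Single-channel convolution layer response (no padding, stride 1, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            bias + sum(image[i + di][j + dj] * kernel[di][dj]
                       for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def content_loss_and_grad(x, target_response, kernel, bias):
    """Return 0.5 * sum((F - P)^2) and its gradient with respect to the input x."""
    response = conv2d_valid(x, kernel, bias)
    grad = [[0.0] * len(x[0]) for _ in x]
    loss = 0.0
    for i, row in enumerate(response):
        for j, value in enumerate(row):
            diff = value - target_response[i][j]
            loss += 0.5 * diff * diff
            # Backpropagate through the convolution to the pixels it touched.
            for di, kernel_row in enumerate(kernel):
                for dj, weight in enumerate(kernel_row):
                    grad[i + di][j + dj] += diff * weight
    return loss, grad


def reconstruct(content, kernel, bias=0.0, learning_rate=0.01, iterations=30, seed=0):
    """Gradient descent on the image only; kernel and bias are never updated."""
    rng = random.Random(seed)
    target_response = conv2d_valid(content, kernel, bias)
    x = [[rng.random() for _ in row] for row in content]  # white noise start
    losses = []
    for _ in range(iterations):
        loss, grad = content_loss_and_grad(x, target_response, kernel, bias)
        losses.append(loss)
        x = [[pixel - learning_rate * g for pixel, g in zip(pixel_row, grad_row)]
             for pixel_row, grad_row in zip(x, grad)]
    return x, losses


# Hypothetical stand-ins for the content image and the layer's filter.
content = [[((r * c) % 5) / 4 for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

reconstruction, losses = reconstruct(content, kernel)
print("first loss:", losses[0], "last loss:", losses[-1])
```

With the filter held fixed, this loss is a quadratic function of the pixels, so a small enough learning rate makes it decrease at every step. If the weights and biases are updated as well, the target response no longer matches the layer that produced it, which is one plausible reading of why our first attempt diverged.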
Gradient Descent with More Layers of VGG-19
Our next step was to perform gradient descent using a network with 2 convolutional layers, 2 ReLU layers, and an average pooling layer, in order to see the difference in output between 1 layer and 5 layers. We modeled our 5 layers after those in VGG-19 because that is what the paper did. With this implementation the pixel values start to diverge after a few iterations, and we have been unsuccessful at figuring out why. We tried drastically decreasing the learning rate to rule out the possibility of overshooting a local minimum, but this did not solve the issue, and the values consistently diverged after around 2 iterations.
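For reference, the two layer types added in this step can be sketched in plain Python as forward and backward passes. This is only an illustration of what backpropagation has to do at a ReLU layer and at an average pooling layer. It is not our MATLAB code and it does not explain the divergence. The 2x2 window with stride 2, the single channel, and all names and values are assumptions.

```python
def relu_forward(x):
    """Clamp negative activations to zero."""
    return [[max(0.0, value) for value in row] for row in x]


def relu_backward(x, grad_out):
    """Pass the gradient through only where the forward input was positive."""
    return [[g if value > 0 else 0.0 for value, g in zip(x_row, g_row)]
            for x_row, g_row in zip(x, grad_out)]


def avgpool2x2_forward(x):
    """Average each non-overlapping 2x2 window (assumed window and stride)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def avgpool2x2_backward(x, grad_out):
    """Spread each output gradient equally over the 4 inputs it averaged."""
    grad_in = [[0.0] * len(x[0]) for _ in x]
    for i, row in enumerate(grad_out):
        for j, g in enumerate(row):
            for di in (0, 1):
                for dj in (0, 1):
                    grad_in[2 * i + di][2 * j + dj] = g / 4.0
    return grad_in


# Hypothetical activations coming out of a convolutional layer.
activations = [
    [1, -2, 3, 0],
    [-1, 4, -3, 2],
    [0, 1, -1, -1],
    [2, -2, 5, 1],
]

rectified = relu_forward(activations)
pooled = avgpool2x2_forward(rectified)

# Hypothetical gradient arriving from the loss, pushed back through both layers.
grad_pooled = [[4, 8], [0, -4]]
grad_rectified = avgpool2x2_backward(rectified, grad_pooled)
grad_activations = relu_backward(activations, grad_rectified)

print(pooled)
print(grad_activations)
```

Checking each layer's backward pass on a tiny input like this, the same way we checked the convolution on small matrices, is one way to narrow down where a multi-layer implementation goes wrong.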
Image Sources