
svellaichamy3/Style_Transfer


Introduction

Wouldn't it be amazing to have Picasso or Van Gogh paint your beautiful neighbourhood in their own style? Deep learning helps us do that! We take an image and apply the style of a reference style image to it, giving it a new look. This experiment is inspired by "Image Style Transfer Using Convolutional Neural Networks" (Gatys et al., CVPR 2016).

Idea

The general idea is to take two images (a content image and a style image) and produce a new image that reflects the content of one but the artistic "style" of the other. We do this by first formulating a loss function that matches the content and style of each respective image in the feature space of a deep network, and then performing gradient descent on the pixels of the image itself. In this project, we use SqueezeNet as our feature extractor.

We can generate an image that reflects the content of one image and the style of another by incorporating both in our loss function. We want to penalize deviations from the content of the content image and deviations from the style of the style image. We can then use this hybrid loss function to perform gradient descent not on the parameters of the model, but instead on the pixel values of our original image.
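Concretely, this is a minimal sketch of that optimization loop, assuming a PyTorch setup (which this README does not specify); loss_fn is a hypothetical stand-in for the combined content + style + total variation loss described in the sections below:

    import torch

    def run_style_transfer(img, loss_fn, num_steps=200, lr=0.1):
        # The image itself is the "parameter" being optimized; the network weights stay fixed.
        img = img.clone().requires_grad_(True)
        optimizer = torch.optim.Adam([img], lr=lr)
        for _ in range(num_steps):
            optimizer.zero_grad()
            loss = loss_fn(img)   # combined content + style + total variation loss
            loss.backward()       # gradients with respect to the pixel values
            optimizer.step()      # update the pixels, not the CNN
        return img.detach()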

Content Loss

Content loss measures how much the feature map of the generated image differs from the feature map of the source image. We only care about the content representation of one layer of the network (say, layer l), which has feature maps A^l ∈ R^{1 × C_l × H_l × W_l}, where C_l is the number of channels in layer l and H_l and W_l are its height and width. We work with reshaped versions of these feature maps that combine all spatial positions into one dimension. Let F^l ∈ R^{C_l × M_l} be the reshaped feature map of the current (generated) image and P^l ∈ R^{C_l × M_l} be the reshaped feature map of the content source image, where M_l = H_l × W_l is the number of elements in each feature map. Each row of F^l or P^l represents the vectorized activations of a particular filter, convolved over all positions of the image. Finally, let w_c be the weight of the content loss term in the overall loss function. Then the content loss is given by:
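    L_c = w_c \sum_{i,j} (F^l_{ij} - P^l_{ij})^2

A minimal PyTorch-style sketch of this term (PyTorch and the function name and signature are assumptions here; the repository's actual code may differ):

    import torch

    def content_loss(content_weight, content_current, content_original):
        # content_current:  F^l, features of the generated image at layer l, shape (1, C_l, H_l, W_l)
        # content_original: P^l, features of the content source image at layer l, same shape
        # Sum of squared differences over all entries, scaled by w_c.
        return content_weight * torch.sum((content_current - content_original) ** 2)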

Style Loss

Now we can tackle the style loss. For a given layer l, the style loss is defined as follows. First, compute the Gram matrix G^l, which represents the correlations between the responses of each filter, where F^l is as above. The Gram matrix is an approximation to the covariance matrix: we want the activation statistics of our generated image to match the activation statistics of our style image, and matching the (approximate) covariance is one way to do that. There are a variety of ways to do this, but the Gram matrix is nice because it is easy to compute and works well in practice. Given a feature map F^l of shape (1, C_l, M_l), the Gram matrix has shape (1, C_l, C_l) and its elements are given by:
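    G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}

A sketch of this computation (assuming PyTorch; the optional normalization by the number of neurons is a common choice and an assumption here, not something stated above):

    import torch

    def gram_matrix(features, normalize=True):
        # features: feature map of shape (1, C_l, H_l, W_l)
        N, C, H, W = features.shape
        F = features.reshape(N, C, H * W)        # reshape to (1, C_l, M_l)
        G = torch.bmm(F, F.transpose(1, 2))      # (1, C_l, C_l); G_ij = sum_k F_ik F_jk
        if normalize:
            G = G / (C * H * W)                  # assumed normalization to keep layer scales comparable
        return G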

Assuming G^l is the Gram matrix of the feature map of the current image, A^l is the Gram matrix of the feature map of the source style image, and w_l is a scalar weight term, the style loss for layer l is simply the weighted Euclidean distance between the two Gram matrices:
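    L_s^{(l)} = w_l \sum_{i,j} (G^l_{ij} - A^l_{ij})^2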

In practice we usually compute the style loss at a set of layers L rather than just a single layer l; then the total style loss is the sum of style losses at each layer:
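    L_s = \sum_{l \in \mathcal{L}} L_s^{(l)} = \sum_{l \in \mathcal{L}} w_l \sum_{i,j} (G^l_{ij} - A^l_{ij})^2

A sketch of the summed style loss (assuming PyTorch and reusing the gram_matrix sketch above; the names are illustrative, not necessarily those used in this repository):

    import torch

    def style_loss(feats, style_layers, style_targets, style_weights):
        # feats:         feature maps of the generated image, one per network layer
        # style_layers:  indices of the layers used for the style loss
        # style_targets: precomputed Gram matrices A^l of the style image, one per style layer
        # style_weights: scalar weights w_l, one per style layer
        loss = 0.0
        for i, layer in enumerate(style_layers):
            G = gram_matrix(feats[layer])        # G^l of the generated image
            loss = loss + style_weights[i] * torch.sum((G - style_targets[i]) ** 2)
        return loss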

Total Variation Loss

It turns out that it is also helpful to encourage smoothness in the generated image. We can do this by adding another term to our loss that penalizes wiggles, or "total variation", in the pixel values. This concept is widely used in many computer vision tasks as a regularization term. We compute the total variation as the sum of squared differences between pixel values for all pairs of pixels that are next to each other (horizontally or vertically). Here we sum the total-variation regularization over each of the 3 input channels (RGB) and weight the summed loss by the total variation weight, w_t:
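Writing x_{i,j,c} for the pixel value at row i, column j, channel c:

    L_{tv} = w_t \sum_{c=1}^{3} \sum_{i,j} [ (x_{i,j+1,c} - x_{i,j,c})^2 + (x_{i+1,j,c} - x_{i,j,c})^2 ]

A short PyTorch-style sketch of this term (names illustrative):

    import torch

    def tv_loss(img, tv_weight):
        # img: generated image of shape (1, 3, H, W)
        horizontal = torch.sum((img[:, :, :, 1:] - img[:, :, :, :-1]) ** 2)  # neighbours along width
        vertical   = torch.sum((img[:, :, 1:, :] - img[:, :, :-1, :]) ** 2)  # neighbours along height
        return tv_weight * (horizontal + vertical)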

Results

It was a wonderfully fun experiment in which we captured the style of one image and transferred it to the content of another. Here are a few more results:
