Deep learning approaches in image processing predominantly resort to supervised learning. Most image denoising methods are no exception to this rule and hence demand pairs of noisy and corresponding clean images. Only recently have methods such as Noise2Void [3] emerged, in which a deep neural network learns to denoise solely from noisy images. However, when clean images that do not directly correspond to any of the noisy images are available, there is room for improvement: these clean images contain useful information that fully unsupervised methods do not exploit. In this paper, we propose a method for image denoising in this setting. First, we use a flow-based generative model to learn a prior from clean images. We then use this prior to train a denoising network without the need for any clean targets. We demonstrate the efficacy of our method through extensive experiments and comparisons.
Simply put, a flow-based generative model [2] \((h)\) is a bijection between a pair of random variables. Typically, one of these random variables \((Z)\) has a known and tractable probability density function (e.g. Gaussian), while the other random variable \((X)\) is the one we are interested in modelling. In our case, the latter is a clean image. As in any deep neural network, \(h\) is a composition of many layers \(h_1, h_2, \dots, h_n\). Now, \begin{align} X = h(Z) &= h_n \circ h_{n-1} \circ \dots \circ h_1 (Z)\\ \implies \log{f_X(x)} &= \log{f_Z(z)} - \sum_{i=1}^n \log{|\det J_{h_i}|} \end{align} where \(J_{h_i}\) is the Jacobian of the \(i^{\text{th}}\) layer. The weights of the network \(h\) are trained by maximizing the above log-likelihood of \(X\). Note that you cannot use just any layer in \(h\): it has to be bijective and its Jacobian determinant must be easy to compute. This rules out the most obvious choices such as convolution, pooling, etc. (at least in their typical form). For some examples of layers that fit the bill, see Dinh et al. and Kingma et al. If you are interested in more theoretical details, see this excellent tutorial on flow-based generative models.
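To make this concrete, here is a minimal PyTorch sketch of one bijective layer (an affine coupling layer in the style of Dinh et al.) together with the log-likelihood it yields. It is not the architecture from our paper; the layer widths and the standard-Gaussian base density \(f_Z\) are assumptions made purely for illustration.

```python
# A toy affine coupling layer: bijective, with a triangular Jacobian whose
# log-determinant is just a sum of predicted log-scales.
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        # Small network predicting a per-element log-scale and shift for the second half.
        self.net = nn.Sequential(
            nn.Linear(self.half, 64), nn.ReLU(),
            nn.Linear(64, 2 * (dim - self.half)),
        )

    def forward(self, z):
        # z -> x: transform one half of the vector conditioned on the other half.
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(z1).chunk(2, dim=1)
        x2 = z2 * torch.exp(log_s) + t                        # elementwise affine
        return torch.cat([z1, x2], dim=1), log_s.sum(dim=1)   # log |det J_h| = sum of log-scales

    def inverse(self, x):
        # x -> z: exact inverse of forward, with the negated log-determinant.
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        z2 = (x2 - t) * torch.exp(-log_s)
        return torch.cat([x1, z2], dim=1), -log_s.sum(dim=1)

def log_likelihood(x, layers):
    """log f_X(x) = log f_Z(z) - sum_i log |det J_{h_i}|, computed via the inverse pass."""
    z, sum_log_det = x, 0.0
    for layer in reversed(layers):
        z, log_det_inv = layer.inverse(z)        # log |det J| of the inverse map
        sum_log_det = sum_log_det + log_det_inv
    log_fz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    return log_fz + sum_log_det                  # = log f_Z(z) - sum_i log |det J_{h_i}|
```

Stacking several such couplings (with the two halves swapped between layers) gives an invertible network whose exact log-likelihood can be evaluated and maximized directly.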
Our method proceeds in two stages:
In the first stage, we train a flow-based generative model on clean images. This will act as a prior \((f_C)\) for the denoiser that we train in the next stage.
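As a rough sketch of this stage, reusing the `AffineCoupling` layers and `log_likelihood` function from the snippet above (the optimizer settings, epoch count, and patch handling are placeholder assumptions, not the ones used in the paper):

```python
# Stage 1: fit the flow prior f_C by maximizing log-likelihood on clean data.
import torch

def train_prior(clean_loader, layers, epochs=10, lr=1e-3):
    params = [p for layer in layers for p in layer.parameters()]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for clean in clean_loader:                        # batches of flattened clean patches
            loss = -log_likelihood(clean, layers).mean()  # maximize log f_C(clean)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return layers                                         # the trained prior f_C
```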
In the second stage, we train a ResNet to denoise images. Starting from the maximum a posteriori (MAP) estimate for the denoised image and a little bit of math, we arrive at the following objective, which we minimize: \begin{equation} (B(D) - B(N))^2 - \lambda \log{f_C(D)} \end{equation} where \(D\) is the denoised output image, \(N\) is the noisy input image, \(f_C\) is the prior from stage 1, \(\lambda\) is a hyperparameter that controls the relative weighting between the prior and the squared loss, and \(B\) is a blur kernel. To be precise, \(\lambda\) depends on the noise level \((\sigma)\). But because we want our denoiser to work regardless of the noise strength, we blur the input and the output before computing the squared error. Intuitively speaking, this signals the network to reproduce the low-frequency information from the input (which is largely unaffected by noise), while using the prior to fill in the high-frequency details.
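Here is a rough sketch of how this objective can be computed, assuming a pretrained prior exposed as a `log_f_C` function returning per-sample log-probabilities of the denoised batch, an arbitrary `denoiser` network (a ResNet in our case), and a Gaussian blur standing in for \(B\); the kernel size, blur strength, and \(\lambda\) value below are illustrative assumptions.

```python
# Stage 2: minimize (B(D) - B(N))^2 - lambda * log f_C(D).
import torch
import torch.nn.functional as F

def gaussian_blur(img, sigma=2.0, ksize=9):
    # A simple depthwise Gaussian low-pass filter standing in for the blur kernel B.
    coords = torch.arange(ksize, dtype=img.dtype, device=img.device) - ksize // 2
    k1d = torch.exp(-0.5 * (coords / sigma) ** 2)
    k1d = k1d / k1d.sum()
    k2d = (k1d[:, None] * k1d[None, :]).expand(img.shape[1], 1, ksize, ksize).contiguous()
    return F.conv2d(img, k2d, padding=ksize // 2, groups=img.shape[1])

def stage2_loss(denoiser, log_f_C, noisy, lam=1e-4):
    denoised = denoiser(noisy)                                                  # D
    data_term = (gaussian_blur(denoised) - gaussian_blur(noisy)).pow(2).mean()  # (B(D) - B(N))^2
    prior_term = -log_f_C(denoised).mean()                                      # -log f_C(D)
    return data_term + lam * prior_term
```

Note that the data term only compares low-pass versions of the input and output, so the prior term is what drives the recovery of fine detail.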
I encourage you to look at our paper for more details about the derivation and the training.
Check out our paper for quantitative results.