Introduction
========================================================================
The most common image smoothing techniques, such as Gaussian and mean filtering,
are linear methods that not only suppress the noise
but also displace and destroy important structures in the image.
Smoothing an image using nonlinear diffusion can reduce the noise while
preserving these structures.
In this blog post I will derive and show how nonlinear diffusion for
structure-preserving image smoothing can be implemented in Python.
The examples are all in 1D and 2D but the extension to 3D images is rather trivial.
I test the filter on a computed tomography (CT) image of the abdomen from
the 3Dircadb dataset [#3dircadb].
![Figure [header]: When comparing linear image smoothing to nonlinear smoothing we
can see that the nonlinear method reduces noise while preserving edges and important structures.
This example is of a scalar valued CT image that has been colored for
illustration purposes.](files/figures/ct-example.png width=100%)
The method was introduced by Perona and Malik in their publication from
1990 [#peronamalik90] and started extensive
research on partial differential equations (PDEs) and variational methods in the
context of image analysis and computer vision.
I highly recommend the lecture series on _Variational Methods for Computer
Vision_ by Daniel Cremers [#cvprtum], which is available on YouTube.
The subjects relevant for this blog post are covered in lectures [3](https://www.youtube.com/watch?v=_37TuuwUvpk&list=PLTBdjV_4f-EJ7A2iIH5L5ztqqrWYjP2RI&index=4) and [4](https://www.youtube.com/watch?v=E4TII7ZHPis&list=PLTBdjV_4f-EJ7A2iIH5L5ztqqrWYjP2RI&index=5).
A comprehensive but very theoretical review of this and related methods can
be found in the book _Mathematical problems in image processing_
[#aubertkornprobst2006].
It serves as a very good collection and contextualization of the
academic literature up to its year of publication but it does not
cover much of the numerical schemes needed to practically implement
the methods.
!!! Info
_**Partial differential equations? In my image analysis?**_
I've always considered PDEs to be very hard,
often for good reason since very few of them have analytical
solutions, and then only for very simple geometries.
Outside of these we need numerical methods which can make
problems solvable but in my opinion still not easily so.
When I first heard of their applications in image analysis
I thought that they probably would have to be quite powerful methods.
And even though this particular method does not completely solve
the noise reduction problem I'm very impressed with
some of the results I've seen in my own experiments.
And perhaps just as important is that implementing this
and some related methods was a fun challenge that pushed
the limits of my mathematical knowledge.
Image smoothing as a diffusion process
========================================================================
The diffusion equation
--------------------------------------------------------------------------------
The diffusion equation originates from two other equations.
The first is **Fick's law**, which states that a difference in concentration of some substance
induces a flow of that substance in the negative direction of the concentration
gradient. The speed of this flow is given by
\begin{equation}
\label{eq:fick}
q = -g \nabla u
\end{equation}
where $\nabla u$ denotes the **gradient** of the scalar field $u$.
The second is the **continuity equation**, which states that the concentration at a
certain location can only change over time if there is some difference in the speed of flow such that the
flow out from this point is not the same as the flow in.
\begin{equation}
\label{eq:cont}
\partial_{t} u = -\text{div} q = -\nabla \cdot q = \nabla \cdot (g \nabla u)
\end{equation}
Here the nabla operator denotes the **divergence**, $\nabla \cdot q$, of the
vector field $q$.
This also states that substance can not be created or destroyed, only moved
around.
Using Equation \ref{eq:cont} the **diffusion equation** can be formulated as a partial differential
equation on the image domain in 2D or 3D, $\Omega \subset \Real^{2}$ or $\Omega \subset \Real^{3}$.
\begin{equation}
\left\{
\begin{split}
& \frac{\partial u}{\partial t} = \nabla \cdot (g \nabla u) \quad \text{in} \quad \Omega \\
& \frac{\partial u}{\partial N} = 0 \quad \text{on} \quad \partial \Omega \\
& u(0,\xfat) = u_{0}(\xfat)
\end{split}
\right.
\end{equation}
where $u$ is the image, a scalar field with a value at position $\xfat \in
\Omega$,
$u(\xfat) \in \Real$. $\partial \Omega$ denotes the boundary of the domain and
$u_{0}$ is the original image.
The rate of the diffusion is governed by the **diffusivity**, $g$, and
when categorizing different approaches of diffusion filters in image processing
we look at how the diffusivity is modeled.
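To make the composition of the two equations concrete, here is a small NumPy sketch
(the random test array and the constant diffusivity are made up for illustration) that
evaluates the right hand side $\nabla \cdot (g \nabla u)$ with `np.gradient`.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=(64, 64))    # a made-up scalar field standing in for an image
g = np.ones_like(u)              # constant diffusivity for now

# Fick's law: the flux is q = -g * grad(u).
du_dy, du_dx = np.gradient(u)    # np.gradient returns the axis-0 derivative first
qx = -g * du_dx
qy = -g * du_dy

# Continuity equation: du/dt = -div(q) = div(g * grad(u)).
du_dt = -(np.gradient(qx, axis=1) + np.gradient(qy, axis=0))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~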
Categorization
-------------------------------------------------------------------------------
Diffusion is categorized depending on the diffusivity $g$:
1. $g = 1$ or any constant, $g \in \Real$, is called **_linear_**, **_isotropic_** and
**_homogeneous_** diffusion.
2. $g = g(\xfat)$ such that the amount of diffusivity depends on the spatial
position is called **_inhomogeneous_**.
3. $g = g(u)$ is called **_nonlinear_** since the equation is no longer linear in
$u$. This is the case treated by [#peronamalik90] and covered in this blog post.
4. $g$ is matrix valued such that the diffusivity is not the same in different
directions. This is **_anisotropic_** diffusion.
In Figure [diffusion1d] an example of a linear homogeneous and an inhomogeneous 1D diffusion process of the same $u_{0}$ can be seen.
The first has the same diffusivity at every point and corresponds to a linear smoothing of the signal.
The inhomogeneous one has the same diffusivity everywhere except at two insulated points where $g=0$.
This illustrates how an ideal smoothing could behave when information about the true edges in the image is known beforehand.
By restricting diffusion across certain sections of the domain, distinct structures of
different signal levels can be reconstructed and identified.
![Figure [diffusion1d]: Smoothing of a 1D signal using diffusion processes. The linear diffusion process has the same diffusivity at all points and the smoothing will make the signal approach the mean. The inhomogeneous one illustrates an ideal case where no diffusion takes place across the two insulated points and the reconstructed signal is close to the original noiseless signal.](files/video/diffusion-1d.mp4 width=100%)
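The 1D experiment is straightforward to reproduce. The sketch below is my own
reconstruction of the setup in Figure [diffusion1d], with a made-up signal, noise level
and iteration count, not the exact code behind the video.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
import numpy as np

rng = np.random.default_rng(0)
n = 200
clean = np.where((np.arange(n) > 60) & (np.arange(n) < 140), 1.0, 0.0)
u0 = clean + 0.1 * rng.normal(size=n)

g_lin = np.ones(n - 1)           # one diffusivity per face between two samples
g_inh = np.ones(n - 1)
g_inh[60] = g_inh[139] = 0.0     # insulate the two true edges

def diffuse1d(u, g, tau=0.25, iterations=2000):
    # Explicit diffusion with zero-flux boundaries: the update at each sample
    # is the difference of the fluxes through its two neighboring faces.
    u = u.copy()
    for _ in range(iterations):
        flux = g * np.diff(u)    # flux through the interior faces
        u += tau * np.diff(flux, prepend=0.0, append=0.0)
    return u

u_lin = diffuse1d(u0, g_lin)     # the edges are smeared out
u_inh = diffuse1d(u0, g_inh)     # each insulated segment approaches its own mean
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~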
!!! Warning:
**Confusion regarding nonlinearity and anisotropy**
When I started to look into this subject it became clear that there
has been a lot of confusion regarding the difference between **nonlinear**
and **anisotropic** smoothing.
The origin of this is the title of the 1990 paper by Perona and Malik
[#peronamalik90] that popularized the approach, and on which this blog post is based:
_Scale space and edge detection using **anisotropic** diffusion_,
when what they actually describe is a nonlinear **isotropic** method.
This error can be seen in titles of academic papers decades later
and the faulty naming has also found its way into different codebases.
_E.g._ in SimpleITK the nonlinear diffusion filter based on this method is called
_GradientAnisotropicDiffusionImageFilter_ [#sitk].
Linear diffusion and Gaussian smoothing
-------------------------------------------------------------------------------
Probably the most prevalent image smoothing method is Gaussian smoothing, in which the
image is convolved with a bell-shaped kernel.
This computes a new value for each pixel as the weighted mean of its
neighbors, where the weight decreases with the distance.
It can be shown that this is equivalent to performing a linear diffusion
[#aubertkornprobst2006].
The solution of the two-dimensional linear diffusion equation with $g=1$
\begin{equation}
\label{eq:onedee}
\left\{
\begin{split}
& \partial_{t} u = \nabla^{2} u = \Delta u \\
& u(\xfat,t=0) = u_{0}(\xfat)
\end{split}
\right.
\end{equation}
can be found to be the convolution of the initial distribution $u_{0}$ with
another function $G$
\begin{equation}
u(\xfat,t) = \iint_{\Real^{2}} G_{\sigma}(\xfat-\xfat^{\prime})u_{0}(\xfat^{\prime})d\xfat^{\prime} = (G \ast u_{0})(\xfat)
\end{equation}
It can then be shown (I omit the proof) that $G_{\sigma}$ in this case is in fact the
Gaussian function with $\sigma = \sqrt{2t}$.
\begin{equation}
G = G_{\sigma}(\xfat) = \frac{1}{2 \pi \sigma^{2}} \exp\left(-\frac{|\xfat|^{2}}{2\sigma^{2}}\right), \quad \sigma = \sqrt{2 t}
\end{equation}
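This equivalence is easy to check numerically. The sketch below is a rough comparison
on a made-up noise image rather than a proof: it runs explicit linear diffusion up to
time $t$ and compares the result with a Gaussian convolution with $\sigma = \sqrt{2t}$.
Away from the border the two agree up to discretization error.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
u0 = rng.normal(size=(128, 128))

t_end, tau = 8.0, 0.125
u = u0.copy()
for _ in range(int(t_end / tau)):
    up = np.pad(u, 1, mode='edge')
    laplacian = (up[1:-1, 2:] + up[1:-1, :-2] +
                 up[2:, 1:-1] + up[:-2, 1:-1] - 4.0 * up[1:-1, 1:-1])
    u = u + tau * laplacian      # one explicit step of du/dt = laplace(u)

u_gauss = gaussian_filter(u0, sigma=np.sqrt(2.0 * t_end), mode='nearest')
print(np.abs(u - u_gauss)[16:-16, 16:-16].max())   # small difference in the interior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~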
Nonlinear diffusion by Perona and Malik
-------------------------------------------------------------------------------
The nonlinearity in the diffusion process proposed by Perona and Malik comes
from a diffusivity that is expressed as a function of the current image itself,
$g = g(u)$.
\begin{equation}
\partial_{t} u = \nabla \cdot (g(u) \nabla u)
\end{equation}
The diffusivity is designed to preserve edges in the image
such that only very little diffusion takes place across them.
The magnitude of the gradient of $u$ is used as an edge indicator.
\begin{equation}
|\nabla u| = \sqrt{u_{x}^{2} + u_{y}^{2}}
\end{equation}
The diffusivity $g$ is given as a function of $u$, or rather its gradient $\nabla u$.
\begin{equation}
g(u) = g(|\nabla u|^{2}) = \frac{1}{\sqrt{1 + \frac{|\nabla u|^{2}}{\lambda^{2}}}}
\end{equation}
where $\lambda$ is called the _contrast parameter_.
Parts of the image with very low diffusivity will act as insulation that
separates different regions of the image that ideally should correspond to
different structures or objects.
In Figure [diffusivity] these insulated regions can be seen as parts of the
bowel, organs and bony structures, and Figure [nonlindiffusion1d] shows how
such a nonlinear diffusion can restore a noisy signal.
![Figure [diffusivity]: The diffusivity is computed from the gradient of the current image, $\nabla u$, and evolves during the diffusion process. The darker parts of the diffusivity correspond to parts of $u$ where little diffusion will cross, such that the smoothing will preserve these structures.](files/figures/ct-gradient-magnitude-diffusivity.png)
![Figure [nonlindiffusion1d]: Smoothing of a 1D signal using a nonlinear diffusion process where a high local gradient gives low diffusivity. Compared to the linear diffusion we can see that it restores the original signal much better. The numbers in the parenthesis refer to the classification in Subsection Categorization.](files/video/nonlin-diffusion-1d.mp4 width=100%)
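As a small illustration of this edge-stopping behavior (the step image and the value of
$\lambda$ below are made up), the diffusivity stays close to 1 in flat regions and drops
towards 0 where the gradient magnitude is large compared to $\lambda$.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
import numpy as np

def diffusivity(u, lamb):
    # g(|grad u|^2) = 1 / sqrt(1 + |grad u|^2 / lambda^2)
    du_dy, du_dx = np.gradient(u)
    grad_mag2 = du_dx**2 + du_dy**2
    return 1.0 / np.sqrt(1.0 + grad_mag2 / lamb**2)

u = np.zeros((64, 64))
u[:, 32:] = 100.0                # a step edge of height 100
g = diffusivity(u, lamb=1.0)
print(g[0, 0], g[0, 32])         # ~1.0 in the flat region, ~0.02 at the edge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~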
Discretization using finite differences
-------------------------------------------------------------------------------
Since we are working on discretized images sampled on a grid the equation also
needs to be discretized, in this case by using finite differences to approximate
both the spatial derivatives and the temporal one.
We start with the same equation as before but restrict ourselves to a 2D image.
\begin{equation}
\partial_{t} u = \nabla \cdot (g(u) \nabla u), \quad \Omega \subset \Real^{2}
\end{equation}
Then we expand the divergence into its two spatial components.
\begin{equation}
\label{eq:divergence}
\partial_{t} u = \xdiv ~\color{black}{+}~ \ydiv
\end{equation}
Assuming the grid spacing is 1 in both dimensions, the two spatial-derivative terms in Equation [eq:divergence] are discretized using half-step central differences [#cvprtum].
For the $x$ direction.
$$
\begin{equation}
\begin{split}
\xdiv & \approx (g(u) \partial_{x} u)_{i+\half, j}^{t} - (g(u) \partial_{x} u)_{i-\half, j}^{t} \\
& \approx \xterm
\end{split}
\label{eq:xder}
\end{equation}
$$
And for the $y$ direction.
$$
\begin{equation}
\begin{split}
\ydiv & \approx (g(u) \partial_{y} u)_{i, j+\half}^{t} - (g(u) \partial_{y} u)_{i, j-\half}^{t} \\
& \approx \yterm
\end{split}
\label{eq:yder}
\end{equation}
$$
where
\begin{equation}
g_{i+\half,j} = \sqrt{g_{i+1,j} g_{i,j} }
\end{equation}
I pad both $u$ and $g$ by one pixel. Further, $g$ is set to 0 along
the borders of the image so that no intensity can enter or leave the image
domain.
We discretize the temporal derivative using forward Euler with the time step $\tau$ as
\begin{equation}
\label{eq:tder}
\partial_{t} u \approx \frac{u_{i,j}^{t+1} - u_{i,j}^{t}}{\tau}
\end{equation}
Finally we substitute Equation [eq:tder], Equation [eq:xder] and Equation [eq:yder] in Equation
[eq:divergence] and solve for $u_{i,j}^{t+1}$ to get
$$
\begin{equation}
\label{eq:final}
\begin{split}
u_{i,j}^{t+1} = u_{i,j}^{t} + \tau ~ \times ~ & \left( \xterm \right. \\
\color{black}{+} & \phantom{\big(} \left. \yterm \right)
\end{split}
\end{equation}
$$
Since we are using a forward Euler discretization, stability is an issue that puts constraints
on $\tau$. It can be shown that $\tau \leq \frac{h}{2^{N+1}}$ where $h$ is the
grid spacing, which in our case is 1, and $N$ is the dimensionality of the image,
such that for a 2D image we have $\tau \leq \frac{1}{2^{2+1}} = 0.125$.
!!! Tip:
**Nonlinear diffusion for volumetric, 3D, images**
If you want to do the analogous discretization for 3D images all that is basically needed
is a third term in Equation [eq:divergence] that corresponds to the
derivative along the z-axis.
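One way to make this concrete is a dimension-agnostic version of the update step.
The sketch below is my own vectorized variant of Equation [eq:final], not the loop
based implementation shown in the next section; it loops over the image axes so the
same code handles both 2D and 3D, and the bound above then gives $\tau \leq 0.125$
in 2D and $\tau \leq 0.0625$ in 3D.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
import numpy as np

def diffusionStep(u: np.ndarray, g: np.ndarray, tau: float) -> np.ndarray:
    # One explicit update of du/dt = div(g grad u), assuming a grid spacing of 1.
    # The diffusivity is padded with zeros so the half-step diffusivities vanish
    # on the image border and no intensity can enter or leave the domain.
    up = np.pad(u, 1, mode='edge')
    gp = np.pad(g, 1, mode='constant')
    centre = tuple(slice(1, -1) for _ in range(u.ndim))
    div = np.zeros_like(u, dtype=float)
    for axis in range(u.ndim):
        fwd = list(centre)
        bwd = list(centre)
        fwd[axis] = slice(2, None)     # neighbor at +1 along this axis
        bwd[axis] = slice(None, -2)    # neighbor at -1 along this axis
        fwd, bwd = tuple(fwd), tuple(bwd)
        g_p = np.sqrt(gp[fwd] * gp[centre])    # diffusivity at the +1/2 face
        g_n = np.sqrt(gp[bwd] * gp[centre])    # diffusivity at the -1/2 face
        div += g_p * (up[fwd] - up[centre]) - g_n * (up[centre] - up[bwd])
    return u + tau * div
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~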
Python implementation
========================================================================
**Nonlinear image filter**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
import math

import numpy as np
import scipy.signal
import skimage.color


def nonlinearDiffusionFilter(image: np.ndarray,
                             iterations = 10,
                             lamb = 1.0,
                             tau = 0.125,
                             image_seq = None):
    """
    Execute nonlinear, isotropic, smoothing filter on image.
    The method is described in the 1990 paper by Perona and Malik.
    This smoothing method uses diffusion that preserves edges.
    """
    def computeUpdate(u: np.ndarray, g: np.ndarray):
        """
        Compute the update for the next iteration using the spatial
        derivatives.
        """
        update = np.zeros(u.shape, dtype=float)
        u = np.pad(u, pad_width=1, mode='constant')
        g = np.pad(g, pad_width=1, mode='constant')
        for i in range(1, u.shape[1]-1):
            for j in range(1, u.shape[0]-1):
                # Half-step diffusivities as geometric means of neighbors.
                g_pj = math.sqrt(g[j, i+1] * g[j, i])
                g_nj = math.sqrt(g[j, i-1] * g[j, i])
                g_ip = math.sqrt(g[j+1, i] * g[j, i])
                g_in = math.sqrt(g[j-1, i] * g[j, i])
                # Zero diffusivity along the image border so that no
                # intensity can enter or leave the domain.
                if i == u.shape[1]-2:
                    g_pj = 0
                if i == 1:
                    g_nj = 0
                if j == u.shape[0]-2:
                    g_ip = 0
                if j == 1:
                    g_in = 0
                ux0 = g_pj * (u[j, i+1] - u[j, i])
                ux1 = - g_nj * (u[j, i] - u[j, i-1])
                uy0 = g_ip * (u[j+1, i] - u[j, i])
                uy1 = - g_in * (u[j, i] - u[j-1, i])
                # update is not padded so we need to subtract 1 from i and j
                update[j-1, i-1] = ux0 + ux1 + uy0 + uy1
        return update

    def computeDiffusivity(u: np.ndarray, lamb: float):
        """
        Compute the nonlinear, gradient derived, diffusivity.
        """
        shape = u.shape
        if len(shape) > 2 and shape[2] > 1:
            print("RGB to gray")
            u = skimage.color.rgb2gray(u)
        # Central difference kernels for the x and y derivatives.
        gradkernelx = 0.5 * np.array([[ 0.0, 0.0, 0.0],
                                      [-1.0, 0.0, 1.0],
                                      [ 0.0, 0.0, 0.0]])
        gradkernely = 0.5 * np.array([[ 0.0,-1.0, 0.0],
                                      [ 0.0, 0.0, 0.0],
                                      [ 0.0, 1.0, 0.0]])
        gradx = scipy.signal.convolve2d(u, gradkernelx, mode='same', boundary='symm')
        grady = scipy.signal.convolve2d(u, gradkernely, mode='same', boundary='symm')
        gradm2 = np.power(gradx, 2) + np.power(grady, 2)
        g = 1.0 / np.sqrt(1.0 + gradm2 / (lamb*lamb))
        return g

    u = np.copy(image)
    if len(u.shape) > 2 and u.shape[2] == 1:
        u = np.reshape(u, (u.shape[0], u.shape[1]))
    if image_seq is not None:
        image_seq.append(np.copy(u))
    for i in range(iterations):
        print(f"Iteration: {i+1}/{iterations}")
        g = computeDiffusivity(u, lamb)
        update = computeUpdate(u, g)
        u = u + tau * update
        if image_seq is not None:
            image_seq.append(np.copy(u))
    return u
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
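Below is a minimal usage sketch. The test image, noise level and parameter values are
made up for illustration; the CT example in the next section uses a slice from the
3Dircadb dataset instead.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
import numpy as np
import scipy.ndimage

# Assumes nonlinearDiffusionFilter from the listing above has been defined.
rng = np.random.default_rng(0)
clean = np.zeros((128, 128))
clean[32:96, 32:96] = 1.0                        # a bright square on a dark background
noisy = clean + 0.2 * rng.normal(size=clean.shape)

# Nonlinear, edge preserving, smoothing ...
sequence = []
smoothed = nonlinearDiffusionFilter(noisy, iterations=50, lamb=0.5,
                                    tau=0.125, image_seq=sequence)
# ... compared with plain Gaussian smoothing.
gaussian = scipy.ndimage.gaussian_filter(noisy, sigma=2.0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~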
Example: Smoothing a CT image in 2D
========================================================================
I applied the nonlinear smoothing to a CT image and compared it with linear
smoothing. Overall the nonlinear smoothing succeeds in reducing noise while
preserving relevant structures.
![Figure [ct]: Nonlinear and linear smoothing of a CT image. The nonlinear smoothing preserves the anatomical structures such as bone and organs. Topology is also better preserved which can be seen as the gas filled parts of the intestine do not merge when using the nonlinear smoothing. The image used in the example is from the [#3dircadb] dataset.](files/video/ct-diffusion.mp4 width=100%)
Conclusion
========================================================================
I got quite good results using this method on CT images.
I also tried to use it on a photograph with some artificially
added noise but could not see any clear benefits over the Gaussian
smoothing, and decided to not include this example.
This suggests that the type of noise is an important factor
when choosing a noise reduction method.
Further, I found the two parameters, $\lambda$ and the number of iterations,
quite hard to balance and build intuition for compared with the $\sigma$
in Gaussian image smoothing.
The current state of the art in noise reduction and structure
preserving image smoothing is likely some method using deep learning,
but I find this PDE based method very appealing, both from
a theoretical point of view and because of the nice results.
+++++
**Bibliography**:
[#aubertkornprobst2006]: Gilles Aubert and Pierre Kornprobst. Mathematical Problems in Image Processing: Partial
Differential Equations and the Calculus of Variations, 2nd edition, 2006.
https://link.springer.com/book/10.1007/978-0-387-44588-5
[#cvprtum]: Daniel Cremers. Variational Methods for Computer Vision, lecture series.
https://www.youtube.com/watch?v=fpw26tpHGr8&list=PLTBdjV_4f-EJ7A2iIH5L5ztqqrWYjP2RI
[#peronamalik90]: Pietro Perona and Jitendra Malik. 1990.
Scale-space and edge detection using anisotropic diffusion.
IEEE Transactions on Pattern Analysis and Machine Intelligence.
DOI: https://doi.org/10.1109/34.56205
https://authors.library.caltech.edu/6498/1/PERieeetpami90.pdf
[#sitk]: SimpleITK _GradientAnisotropicDiffusionImageFilter_
https://simpleitk.org/doxygen/latest/html/classitk_1_1simple_1_1GradientAnisotropicDiffusionImageFilter.html
[#3dircadb]: 3D-IRCADb (3D Image Reconstruction for Comparison of Algorithm Database)
https://www.ircad.fr/research/3dircadb/
Distributed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0
[CC 3.0](other/creative-commons-legal-code.pdf)
Code for all figures and examples:
https://bitbucket.org/nils_olovsson/diffusion_image_filters/src/master/