Image registration methods are used to establish geometrical correspondences between different datasets. Various characteristics of image data can be exploited to drive image registration algorithms. Thus, the currently available schemes can be roughly divided into two classes: intensitybased and featurebased registration schemes. In this paper, we present a mathematical framework, based on the calculus of variations, for combining these two classes in order to benefit from the advantages of both strategies. The goal is to obtain a registration algorithm which achieves a good matching of datasets near landmark locations but also away from them (by matching the corresponding intensities). The proposed approach includes the novel formulation of a disparity term which simultaneously takes into account the structural similarity index (a similarity measure which considers spatial dependencies in the images) and the location of outstanding points. Since the iteration which results of the variational formulation is translated into the frequency domain, the implementation of the proposed algorithm offers a good speedperformance tradeoff when compared to other stateoftheart image registration implementations. Experimental results show the advantages, in the medical setting, of the combined SSIM and landmarkbased approach over wellestablished registration techniques which use either landmark or intensity information alone. In particular, the registration of triplephase 3D computed tomographies of the liver under injection of a contrast agent has been chosen for such a comparison. The datasets are acquired at different times depending on the arrival time of the contrast agent in arteries, portal and hepatic veins, so they have to be registered in order to show the liver structures acquired at each phase in a common framework. These multiphase studies provide tumor enhancement on the arterial and portal venous phases that support differential diagnosis of lesions in the liver.
Keywords: Image registration, variational methods, Fourier domain, structural similarity index, landmarkdriven registration, contrastenhanced liver CT
Image registration is the process of finding the spatial correspondence between two different datasets (images or volumes), which typically represent different views of the same object or similar ones; in other words, register is equivalent to align in this context. Particularly, in medical imaging there are several applications that require a registration step (e.g. image fusion, atlas matching, pathological diagnosis). Classical reviews of image registration can be found, for example, in the references [6], [28], [47] or [29]. More recent comprehensive overviews about image registration methods can be found in [37] and [36], where the stateoftheart advances in this field are described in a systematic fashion, and the techniques applied to medical datasets are emphasized as well. When classified according to the image characteristics used to estimate the transformation which geometrically relates the involved datasets, registration algorithms are usually divided into three groups: geometric featurebased, which rely on the segmentation, done generally before the registration process itself, of part or all of the images, therefore obtaining objects that are registered by minimizing some geometrical distance between them; intensitybased, which maximize a intensity similarity measure computed between points lying at the same spatial position; and iconic featurebased, which can be understood as intermediate between the two previous categories, since they use explicitly some type of geometrical distance in addition to the intensity similarity measure, see [8]. The performance of intensitybased image registration methods is highly influenced by the choice of the similarity measure. This measure can be defined directly on image intensities as, e.g., sum of squared differences (SSD), correlation coefficient (CC), correlation ratio (CR), or mutual information (MI). A complete analysis of all these metrics can be found in [27]. These measures often rely on the assumption of independence and stationarity of the intensities from pixel to pixel without considering their spatial dependencies. Alternative metrics have been proposed in order to consider intensity nonstationarities and complex spatiallyvarying intensity distortions. For instance, in [31] a similarity measure based on the residual complexity is described. Another example is the structural similarity index (SSIM), introduced by [43], which achieves a good tradeoff between speed and performance and is able to cope with complex relationships between image intensity values. Due to the mathematical properties of SSIM [7] and its potentials in both theoretical development and practical applications, it is being incorporated into optimization frameworks in order to improve perceived image/video quality in a number of image processing problems, as e.g. [17] or [34]. This similarity measure has been previously used in image registration in [4], [1], [41], [42], [45], or [19], but its use is still not widespread —for example, the evaluation addressed in [33] does not include SSIM and the review in [36] does not deal with similarity measures based on the SSIM—.
Intensitybased approaches are in general fully automatic and usually yield good registration results. However, they may perform poorly for specific, important locations such as anatomical landmarks. On the other hand, geometric featurebased registration techniques are designed to accurately match user specified landmarks, see e.g. [35]. Recent examples of landmarkbased registration methods in medical imaging scenarios can be found in [44] or [21]. A common drawback of most landmarkdriven registration methods is the fact that the intensities of the datasets are completely neglected. Consequently, the registration outcome away from the landmarks may be very poor. In the literature, the registration results are typically obtained by interpolating the transformation with a thinplate spline (TPS) model [5] or by combining iterative closest point (ICP) registration and parametric relaxation [38], among other techniques. Although these approaches produce a smooth transformation from one dataset to another, they do not define a consistent correspondence between the two datasets except at the landmark points. Some previous attempts to combine the sparse correspondences with a intensitybased registration method can be found in [18] or [39]. However, to the authors knowledge, there has been no attempt to formulate a registration method which combines in the distance term both a SSIMbased term and a landmarkbased term. In recent works, such as [32] or [26], the authors of this paper dealt with intensitybased registration approaches within the variational framework first presented in [25], whose formulation in the frequency domain allowed for implementations of high efficiency [40]. Please refer to [2] or [46] for further details regarding variational image registration. The aforementioned approaches involved similarity terms derived from SSD or CRbased measures, so none of them included a SSIMbased nor a landmarkbased distance term. Regardless, the method presented in [26] achieved better results than publicly available stateofthe art approaches such as Elastix [20] and ANTs ().
The paper is organized as follows: Section 2 addresses the formulation, within the variational framework, of a novel registration method which combines in the distance term both a SSIMbased term and a landmarkbased term. Section 3 shows the results of applying the proposed registration algorithm to a medical imaging scenario; in particular, the image registration of triplephase 3D computed tomography datasets of the liver under injection of a contrast agent has been chosen; the suitability of our approach in such a scenario is also proved in this section through the comparison of the results it provides and the outcome obtained with the CRbased version of the algorithm, which was recently reported to outperform other wellknown approaches in the medical setting [26]. The conclusions of the paper as well as the future lines of research are discussed in Section 4. Finally, an appendix closes the paper with the detailed derivation of the external forces field related to the proposed SSIMbased distance term (i.e., the first addend of the corresponding EulerLagrange equation).
Let and be two datasets, a reference and a template respectively, which represent the same object (or similar ones) by using the same or different imaging modalities. These dimensional datasets are defined as , with being the domain where they are supported, and representing the intensity level in each spatial coordinate . We search for a displacement field that makes the transformed template similar to the reference dataset in the geometrical sense, i.e., , where . This problem can be approached in terms of the variational calculus by defining a joint energy functional to be minimized:

(1) 
where is an energy term which quantifies the disparity between the datasets, is a penalty term which measures the roughness of the solution , and is a scalar parameter, usually referred to as the regularization parameter, which is used to control and weight the influence of the penalty term versus the distance term. It should be noted that the present formulation considers the different functions to be continuous in the domain where they are supported; the natural spatial (and temporal) discretization is tackled later in this section.
In the literature, several choices for the dissimilarity measure can be found. Depending on the particular datasets to be compared, statistical measures derived from the MI or the CR are usually the most appropriate candidates. As an alternative, in this work we propose the novel approach of incorporating both the SSIM [43] and the Euclidean distance between identifiable corresponding points (i.e., landmarks) into the variational framework which is being presented. The resulting disparity term is the following:

(2) 
where , , and denote the mathematical expectations and variances of the intensity levels of and , respectively; represents the covariance; and are small constants which aim to characterize the saturation effects of the visual system at low luminance and contrast regions and which assure numerical stability when the expectations or variances are very close to zero; is a scalar used to weight the influence of the landmarkbased term versus the SSIMbased term; is the number of identified landmarks; and and denote the spatial position of the th landmark in and , respectively.
The regularization term is used to add some prior knowledge on the displacement field, thus preferentially obtaining more likely solutions and giving the smoothness characteristics to . In this work, the regularizer is defined by

(3) 
where denotes the dimensional gradient operator, and is the regularization order: if , the resulting term is the diffusion smoother [14]; if , the regularizer is the curvature smoother [15].
According to the variational calculus, a necessary condition for a minimizer of the joint energy functional (1) is that the first variation of in any direction (also known as the Gateaux derivative) vanishes for all possible perturbations :

(4) 
The computation of the previous Gateaux derivative leads to the following EulerLagrange (EL) equation:

(5) 
where represents the hypervolume of ; is a Gaussian kernel (strictly positive and differentiable); denotes the dimensional convolution operator; is the Laplacian operator; finally, we find the term

(6) 
where the variables , , and are defined as

(7) 

(8) 

(9) 

(10) 
with being the intensities of and , respectively.
In 5, the first addend in equation (5) is deduced. The proof of the internal forces field of the EulerLagrange equation (i.e., the last addend in Eq.(5)) can be found in [24]. Finally, the addend which corresponds to the landmarkbased term can be easily obtained by applying the definition of the Gateaux derivative to the last addend in Eq.(2):

(11) 
where denotes the inner (or dot) product in .
In order to solve the EulerLagrange equation (5), an iterative timemarching scheme can be used. For doing so, an artificial time has to be added to the EL equation and then the steadystate solution has to be computed, i.e., (where gathers the first two addends in Eq.(5)). In the steadystate, the displacement field converges and hence . The time is discretized, (with being the iteration index and where is the timestep), and finally the temporal derivative is replaced with its discrete approximation (first backward difference). Due to the fact that digital datasets are usually encoded by uniformly distributed spatial elements in each dimension (e.g. pixels if or voxels if ), the discretization of the spatial variable becomes a natural approach too. Therefore in the following the notation is used, where is the index of the discrete spatial position, with , and being the number of discrete spatial elements in the th dimension. Using this notation, the resulting semiimplicit iteration for the th component of the displacement field is the following:

(12) 
where the discrete operator stands for a kernel which performs the discrete approximation of the spatial derivatives of .
As an alternative to the spatial approach, solving the iteration (12) in the frequency domain provides a stable implementation for the computation of a numerical solution for the displacement field , and in a more efficient way than other existing approaches if the dimensional fast Fourier transform (FFT) is taken into account [40]. Translating the semiimplicit iteration (12) into the frequency domain results in

(13) 
where , and are the dimensional Fourier transforms of , and , respectively, and is the dimensional counterpart in the frequency domain associated to the discrete spatial variable . The operator performs the spatial derivatives in the frequency domain and allows for their calculation by means of the product of spectra (i.e., the translation of the spatial convolution into the frequency domain); as deduced in [40], its analytical expression is the following:

(14) 
It should be noted that, mathematically, any value makes sense in the frequency domain (but not in the spatial domain). This allows for the design of hybrid diffusioncurvature regularizers with adaptable properties, please refer to [25] for further details.
As explained in previous paragraphs, it is taken into account that the datasets are discrete and then the spatial variable gives rise to the discrete spatial index . Similarly, instead of handling continuous spectra, the frequency domain is also discretized and only the uniformly sampled frequencies are evaluated in each dimension. This way the iteration (13) can be implemented efficiently using the fast algorithms for the computation of the DFT which are provided by many programming languages (e.g., MATLAB's builtin function which is based on the FFTW library by [16]).
Finally, the iteration of the proposed registration algorithm is the following:

(15) 
where , , , and . Due to the fact that results in a circulant block matrix, all the spectra products and pseudoinversions become Hadamard (i.e., pointwise) products and divisions, respectively [10].
In order to assess the performance of the proposed methodology, it is tested on 15 medical experiments involving triplephase 3D computed tomography (CT) volumes of the liver under contrast agent injection. Although the acquisition of the different phases is continuous in most cases, there is no exact correspondence between them, so they must be registered in order to show the results in a common 3D framework. First, a study without contrast agent is acquired. Once the contrast agent is injected, it reaches the arteries (arterial phase), then the portal veins and next the hepatic veins. Hepatic and portal veins are then visible in the portal venous phase. Image registration allows to locate exactly the vessels in 3D at each phase of contrastenhanced CT data in order to measure distances and volumes and provide objective parameters of the pathology which facilitate comparisons between patients, the tracking of tumors [9], to make calculations on the volume of the liver to be preserved prior to a liver resection [12], or to generate vessel models for planning surgical procedures [22]. The experiments carried out involve data from 5 patients. The acquisition device is a GE LightSpeed VTC (General Electric Medical Systems). The size of the volumes vary from to voxels, with a resolution of millimeters.
For the assessment of the proposed method, we carry out a comparison with the purely intensitybased variational method recently presented by the authors of this work in [26]. This CRbased approach was reported to outperform publicly available stateofthe art methods such as Elastix and ANTs in the medical setting. As can be seen in Figure 1, the proposed framework shows excellent results for the three considered registration scenarios (arterialportal, arterialnoncontrast and portalnoncontrast), reaching average values of 1.47, 1.44 and 1.52 bits in terms of mutual information, corresponding to the arterialportal, arterialnoncontrast and portalnoncontrast cases, respectively; this represents a mean improvement of 28.9%, 48.45% and 51.16% in relative terms of mutual information, thus outperforming the CRbased registration algorithm, which achieves a mean improvement of 26.48%, 44.22% and 43.25%, respectively. Additionally, due to the analogous behavior (i.e., comparable final values of mutual information) of the proposed method in the three scenarios, all available experiments can be grouped into one ensemble in order to assess a more comprehensive validation of the actual registration error. A ground truth was established by an expert in the form of identifiable anatomical locations (landmarks) for all experiments. The registration errors were then obtained by computing the spatial distance between the corresponding landmarks in the reference and registered template datasets. Figures 2(a) and 2(b) show through box plots the registration error (in millimeters) achieved by the methods under comparison, gathering the results from the three considered registration scenarios. These box plots collect the final spatial distances between corresponding landmarks, along with the median distance error and its statistical significance (notch showing the 95% confidence interval of the true median). According to Figure 2(b), the proposed method significantly improves on the registration error of the CRbased approach, since it reduces the initial median error from 9.50 mm to a residual median distance between landmarks of 1.41 mm, decreasing at the same time the outliers occurrence.
(a) Box plot of the registration error  (b) Closeup view of (a) 
Figure 2: Spatial distances (in millimeters) between corresponding landmarks before and after the registration process. 
In addition to the previous measurements, the visual outcomes of two of the experiments are shown in Figures 3 and 4, whose purpose is to highlight the most illustrative differences (from a medical point of view) between the results provided by the compared methods. In Figure 3, we observe a normal size of the liver, with discretely irregular contours and homogeneous signal intensity. In hepatic segment II, there is a lesion of 40 mm of maximum axis, encapsulated and with welldefined contours and heterogeneous enhancement in arterial phase (after administration of intravenous contrast), suggestive of hepatocellular carcinoma (HCC). In this slice of the CT scan, we can also observe the aorta that shines in the arterial phase, the lower area of the stomach and the upper area of the spleen. In Figure 4, the liver has a normal size with discretely irregular contours in relation to changes due to chronic liver disease. In hepatic segment IV, a 36 mm diameter focal lesion is identified, which has arterial phase enhancement with a small area of necrosis of 13 mm; it corresponds to a HCC previously chemoembolized with partial necrosis. In this slice of CT, we can also observe the aorta, the gastric chamber and the spleen. When comparing the two methods under study, it can be seen how in Figure 3 the resulting registered datasets are very similar. However, looking closely, it can be noticed that in the right part of the image (left side of the patient) the shape and width of the structures corresponding to the stomach and the spleen in Figure 3(d) match better those in the reference dataset. Likewise, the part of the rib at the upper right of the image is more similar to the same region in the reference dataset by using the proposed method. Regarding the experiment shown in Figure 4, it can be easily appreciated how the geometrical matching (with respect to the reference dataset, Figure 4(a)) of the structures in the right side of the image (specially the gastric chamber) is visually more satisfactory in Figure 4(d). Moreover, the area of tumor necrosis which results from the proposed method is also slightly better aligned.
Throughout this work, the SSIMbased term uses the following parameters setting: and . These are the empirically obtained values suggested in the reference paper [43]. Regardless, we find that in the current scenario, the performance of the proposed registration algorithm is fairly insensitive to variations of these values. Regarding the remaining parameters, which are exclusively related to our variational approach, the values of (i.e., curvature smoother, since the deformations to be corrected are notable in some cases), , (only applicable to the combined SSIM and landmarkbased method) and were used for all experiments —we recommend a common set which achieves a good balance between generalization and performance—. This parameter setting was obtained following the guidelines first introduced in [23]. As for the number of iterations, a value of grants convergence in all cases; the cost function stabilizes after 50–70 iterations for both the CRbased and the proposed algorithm. From the results gathered in Table 1, it can be concluded that the computational overhead introduced by the novel approach does not increase significantly the execution time of the CRbased algorithm. Moreover, both variational approaches outperform wellestablished registration methods such as Elastix in terms of efficiency. Elastix was the fastest method which provided better results among the open source methods of image registration which entered in the second phase of Empire10 Challenge [30]. The overall complexity of each iteration of the compared algorithms is —where is the number of voxels of the datasets—, since doubling means doubling the computational time.
Size  CR  SSIM+LM  Elastix 
2.40  2.52  5.69  
5.03  5.27  11.20  
10.42  10.93  22.01 
In this work, a theoretical framework has been presented for approaching the image registration problem. In particular, we have formalized the variational formulation of image registration, making it valid even for a general dimensional case. The resulting semiimplicit iteration, which is solved in the frequency domain, allows for an efficient implementation if fast algorithms for the computation of the DFT are taken into account. The main novelty of this paper lies in the inclusion within such a framework of a disparity energy which combines the information provided by both the structural similarity index (SSIM) and the location of geometrical identifiable points (landmarks). With this purpose, the corresponding external forces fields of the resulting EulerLagrange equation have been deduced, and their analytical expressions are explicitly provided.
The suitability of the proposed methodology in the medical setting has been validated by means of illustrative experiments involving liver CT data under different contrast agent injection. When compared with an intensitybased method which was recently reported to outperform other popular stateoftheart methods, it has been shown that the novel approach achieves higher values for both the similarity measure considered (mutual information) and the actual registration error (distance in space between corresponding landmarks). Moreover, the results provided by the method proposed in this paper are subjectively considered more satisfying after being visually inspected by an expert. However, the main advantage of the proposed registration method is its efficiency. As expected, the FFTbased, C++ implementation of our approach outperforms fast methods such as Elastix in terms of computational cost, while keeping the execution times in the range of its main contestant (the previously proposed CRbased algorithm). In a clinical environment, where computational times should be kept as low as possible, this feature can be of paramount importance. Regardless, it should be noted that comparing the performance of different image registration algorithms is not a trivial task, since each approach has its own set of userdefined parameters, which may heavily influence the final outcome. For instance, if they are tested on datasets with different characteristics (e.g., modality or anatomical region), new optimal parameters need to be determined. In this sense, open challenges are an interesting way to categorize image registration software packages in a common context.
It should be noted that the proposed method considers the input directional field as a periodic signal. Unless this is taken into account, the vectors located near the boundaries will be erroneous. As stated by other authors —see e.g. [13]—, this is hardly a problem if the data is contained within a uniform background (e.g., when dealing with medical images or volumes, as in the current scenario), or else it can be overcome by using a folded algorithm which extends the dataset symmetrically, thus resulting in additional reflections of the original input, see e.g. [3].
The most notable limitation of the present method is that the SSIMbased term would not be an appropriate choice for the combined disparity energy unless matching voxels of the datasets have similar intensities (i.e., in pure monomodal or pseudomonomodal scenarios). In fact, this is an inherent drawback of the structural similarity index, not a problem which arises with our methodology. However, even if the aim was to register multimodal datasets, one could still use the proposed variational approach, but considering an alternative distance term instead (e.g., a combined CR and landmarkbased disparity energy).
In ongoing research, we would like to address the application of our methodology to other scenarios, such as the fusion of datasets from different modalities: for example, anatomical (CT and MRI), metabolic (spectroscopic MRI and PET) and functional (fMRI) volumes in the medical setting. Alternative definitions of the joint functional, including additional energy terms —e.g., a measure of the differences between forward and reverse transformations, so that the resulting displacement field is consistent—, are currently being explored within the presented theoretical framework.
In this work, we propose the opposite of the structural similarity index as the intensitybased part of the disparity term of the joint energy functional (first addend in equation (2)). Thus, a maximization problem is transformed into the minimization of the following cost function:

(A.1) 
The mathematical expectations, variances and covariance , , , and are defined as follows:

(A.2) 

(A.3) 

(A.4) 

(A.5) 

(A.6) 
where and denote the marginal intensity distributions estimated from and , respectively, and stands for an estimate of the joint intensity distribution. The intensities of the datasets, , are considered as random variables whose probability densities are respectively given by and .
In order to obtain the external forces field related to the SSIMbased energy term (i.e., the first addend in the EulerLagrange equation (5)), the computation of the corresponding Gateaux derivative is required. Therefore, we carry out an explicit computation from the definition of the Gateaux derivative (see Eq.(4)):

(A.7) 
with being an intensity operator —whose definition, not necessary at this point, is not explicitly provided for the sake of simplicity—, and where the notation is used.
Assuming a displacement , the joint intensity distribution estimated from and is provided by a nonparametric ParzenRozenblatt density model [11]:

(A.8) 
where denotes the hypervolume of and is a Gaussian kernel (strictly positive and differentiable):

(A.9) 
At this point, derivatives of can be easily computed. In particular,

(A.10) 
We now replace (A.10) in (A.7), and then let :

(A.11) 
In the previous equation, a bidimensional convolution with respect to the intensity variable appears naturally. This convolution conmutes with the derivative (since both are linear operators). Moreover, by identifying the resulting expression with an inner product in , the external forces field of the EulerLagrange equation (5) which corresponds to the SSIMbased energy term can be finally obtained:

(A.12) 
with

(A.13) 
and where the definitions of , , and can be found in Eqs.(7)(10).
[47] Zitová, B., Flusser, J. Image registration methods: a survey. Image and Vision Computing, 21:997–1000, 2003.
Published on 21/03/19
Accepted on 20/03/19
Submitted on 12/09/18
Volume 35, Issue 1, 2019
DOI: 10.23967/j.rimni.2019.03.004
Licence: CC BYNCSA license
Are you one of the authors of this document?