This blog post is a review of the paper "IGCN: Image-to-graph Convolutional Network for 2D/3D Deformable Registration"

Motivation, Contribution

It is common in hospitals to capture high-quality, often 3D, images of the patient before an operation in order to plan the intervention. During the intervention, however, capturing such images is not possible, and mostly 2D images are acquired in real time [3], which contain less information than the pre-operative 3D images. Registration is used to bring two or more images into spatial alignment, where the images may be taken at different times, from different modalities, or from different viewpoints [2]. In this paper, a new method is proposed for 2D/3D image registration, where the 3D images are CT volumes captured before the intervention and the 2D images can be any 2D projection, such as X-ray or the Digitally Reconstructed Radiograph (DRR) images used in this study.

This study proposes an image-to-graph convolutional network (IGCN) that achieves deformable registration of a 3D organ model to a single-viewpoint 2D projection image. The main contribution lies in combining a U-Net-based generative network with a graph convolutional network to perform the registration. The authors claim that the proposed method works for abdominal organs in general, and they quantitatively evaluate its performance on the liver. The proposed technique could be directly applied to the localization of radiation targets and organ-at-risk volumes in radiation therapy, and it could also be applied to a wide range of image-guided interventions [1].


Methodology

IGCN consists of two main parts. First, a displacement map u is derived from the 2D projection image I and a semantic label S obtained from the template 3D mesh M. Then, the feature vectors derived from u and the template 3D mesh M are given as input to a GCN, whose output is the deformed mesh registered to I.

Image Translation

Several different architectures can be used as the image translation function g; the paper uses a U-Net-based network [7]. The function g takes as input a semantic label S derived from the template 3D mesh M and the projection image I, and outputs a displacement map u representing a spatial mapping function in 2D space.
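To make the image-translation step concrete, here is a minimal sketch of what g could look like as a small U-Net-style encoder-decoder in PyTorch. The channel widths, input resolution, and the two-level depth are my own assumptions for illustration; the paper uses a full U-Net [7].

```python
# Minimal sketch of the image-translation step g_theta(I, S) -> u.
# Assumptions (not from the paper): input channel counts, network width,
# and this tiny two-level encoder-decoder standing in for the full U-Net [7].
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Maps a 1-channel projection image I and a 1-channel semantic label S
    to a 3-channel displacement map u (one 3D displacement vector per pixel)."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 1))  # 3 output channels = (dx, dy, dz)

    def forward(self, I, S):
        x = torch.cat([I, S], dim=1)   # condition the network on the attention label S
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d = self.up(e2)
        d = torch.cat([d, e1], dim=1)  # skip connection, as in U-Net
        return self.dec(d)             # displacement map u

# Usage: u = TinyUNet()(I, S) with I, S of shape (B, 1, H, W); u has shape (B, 3, H, W).
```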


Figure 2

Fig. 2 illustrates the learning process of g_\theta for the liver. First, a registered mesh with point-to-point correspondence is obtained through deformable mesh registration (DMR) [8] between the initial and target meshes (Fig. 2(a)). The displacement vector d_i of each vertex v_i is obtained from the corresponding points before and after deformation (Fig. 2(b)). A 3-channel projection image (Fig. 2(c)) is obtained by transforming d_i from Euclidean to color space and using it as the surface color of the initial mesh for rendering. This is a forward displacement map in which a 3D displacement vector is stored in each pixel, and it directly represents u. The 2D region of the patient-specific organ obtained by projecting the initial mesh can be used as the semantic attention label S. Here, the proposed g_\theta defines the transformation:

u = g_\theta(I, S)

Because the mesh is projected onto a 2D image, multiple vertices (e.g., v_\alpha and v_\beta) can map to the same pixel p; in that case, identical displacement vectors would be assigned to all vertices mapped to p. However, v_\alpha and v_\beta may form parts of different organs or different parts of the same organ (e.g., the anterior and posterior surfaces); thus, they must each be able to express different displacements. This problem is resolved in the GCN described below through embedded learning that uses the 3D vectors obtained from the displacement map as well as the local shape and topology at each mesh vertex.

Vertex Transformation

To resolve the issue described above, a graph convolutional network (GCN) is used as the vertex transformation function f. The graph is constructed from the 3D mesh, where each vertex is connected to its neighbors. Every vertex has a feature vector X consisting of the displacement vector obtained for that vertex (from the output of g) concatenated with its 3D coordinates v_i. The function f is then responsible for the spatial transformation:

\hat{v}_i = f_\psi(v_i, u(p_i))
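As a rough illustration of how the per-vertex input features could be assembled, the following sketch samples the displacement map u at each vertex's projected pixel p_i and concatenates it with the vertex coordinates. The helper project_to_pixels is hypothetical, since the projection geometry is not detailed here.

```python
# Sketch of building the vertex features X_i fed to the GCN: the 3D displacement
# read from u at the vertex's projected pixel p_i, concatenated with the vertex
# coordinates v_i. `project_to_pixels` is a hypothetical helper standing in for
# the camera/DRR projection, which is not specified in this post.
import torch

def build_vertex_features(u, vertices, project_to_pixels):
    """u: (3, H, W) displacement map; vertices: (N, 3) template mesh coordinates."""
    px = project_to_pixels(vertices)           # (N, 2) integer pixel coordinates p_i
    disp = u[:, px[:, 1], px[:, 0]].T          # (N, 3) displacement u(p_i) per vertex
    return torch.cat([vertices, disp], dim=1)  # (N, 6) input feature matrix X^(0)
```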

The proposed f is a GCN, which consists of eight sequential graph convolutional layers, each of which is defined in the following equation:

X^{(l+1)} = \sigma(\hat{D}^{-\frac{1}{2}}\hat{A} \hat{D}^{-\frac{1}{2}} X^{(l)} W^{(l)} )

where X^{(l)} and X^{(l+1)} denote the feature matrices before and after convolution, respectively. In the experiments, the input X^{(0)} was the concatenation of the vertex coordinates v_i and displacement vectors u(p_i), and W^{(l)} was the learnable parameter matrix. \hat{A} \in \mathbb{R}^{n \times n} was the adjacency matrix and \hat{D} \in \mathbb{R}^{n \times n} the degree matrix, i.e., a diagonal matrix in which each element \hat{D}_{ii} is the number of edges connected to v_i. The template mesh was deformed by updating X^{(l)}.
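The propagation rule above can be written compactly in dense form. The sketch below is one possible implementation; treating \hat{A} as the adjacency matrix with added self-loops is a common convention and an assumption on my part.

```python
# Sketch of one graph-convolution step X^(l+1) = sigma(D^-1/2 A_hat D^-1/2 X^(l) W^(l)),
# written in dense form for readability. A_hat = A + I (self-loops) is assumed here.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # learnable parameter matrix W^(l)

    def forward(self, X, A):
        A_hat = A + torch.eye(A.shape[0], device=A.device)  # adjacency with self-loops
        deg = A_hat.sum(dim=1)                               # vertex degrees
        D_inv_sqrt = torch.diag(deg.pow(-0.5))               # D^{-1/2}
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ self.W(X))

# Eight such layers stacked in sequence would mirror the architecture described above.
```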

Loss Function

To train the proposed network in an end-to-end fashion, three different loss functions are introduced, and their weighted sum is used as the overall network loss and optimized accordingly.

The first loss function measures the distance between the target and predicted vertex positions of the mesh:

\ell_{pos} = \frac{1}{N} \sum_{i=1}^{N} ||v_i - \hat{v}_i||^2_2


In this problem setting, the organ deformation is spatially non-linear and heterogeneous but expected to remain within a limited range. To preserve the curvature and smoothness of the initial surface, the paper uses a regularization loss \ell_{smooth} that evaluates a discrete Laplacian of the mesh:

\ell_{smooth} = \frac{1}{N} \sum_{i=1}^{N} ||L(v_i) - L(\hat{v}_i)||^2_2

where L(·) is the Laplace-Beltrami operator and L(v_i) is the discrete Laplacian of v_i, defined by

L(v_i) = \sum_{v_j \in \mathcal{N}(v_i)} \frac{v_i - v_j}{|\mathcal{N}(v_i)|}

where \mathcal{N}(v_i) is the set of vertices v_j in the 1-ring neighborhood connected to v_i and |\mathcal{N}(v_i)| is the number of such adjacent vertices. This loss constrains the shape changes from the initial state and avoids the generation of unexpected surface noise and low-quality meshes.
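For illustration, a possible implementation of this uniform (graph) Laplacian and the resulting smoothness term is sketched below, assuming the mesh connectivity is available as a dense 0/1 adjacency matrix A.

```python
# Sketch of the discrete-Laplacian smoothness term. L(v_i) is the offset of v_i
# from the mean of its 1-ring neighbours; the loss compares this quantity on the
# initial and the predicted (deformed) mesh. Dense adjacency is an assumption.
import torch

def uniform_laplacian(V, A):
    """V: (N, 3) vertex coordinates; A: (N, N) 0/1 adjacency matrix."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # |N(v_i)|; clamp guards isolated vertices
    neighbour_mean = (A @ V) / deg                 # average neighbour position
    return V - neighbour_mean                      # L(v_i) = v_i - mean of its neighbours

def smoothness_loss(V_init, V_pred, A):
    diff = uniform_laplacian(V_init, A) - uniform_laplacian(V_pred, A)
    return (diff ** 2).sum(dim=1).mean()
```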

In addition to evaluating the mesh vertex coordinates, accurate prediction of u improves the 2D/3D deformable registration result. The third loss, \ell_{map}, therefore encourages the function g to produce an accurate displacement map:

\ell_{map} = ||u - \hat{u}||_1

Finally, the overall loss is the weighted sum of the aforementioned terms, yielding the following formula:

\ell = \ell_{pos} + \lambda \ell_{smooth} + \mu \ell_{map}

Both \mu and \lambda were set to 1 for training. By minimizing this loss, the parameters of both g and f were optimized simultaneously at each epoch, such that:

g_{\theta^*}, f_{\psi^*} = \operatorname*{argmin}_{g_\theta, f_\psi} \ell(g_\theta, f_\psi)
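Putting the three terms together, the combined objective might look like the following sketch. It reuses smoothness_loss from the earlier snippet, and averaging (rather than summing) the L1 map error is my normalization choice.

```python
# Minimal sketch of the joint training objective. `lam` and `mu` are the weights
# from the equation above (both 1 in the paper's experiments); V_gt and u_gt denote
# the ground-truth registered vertex positions and displacement map.
import torch

def total_loss(V_pred, V_gt, u_pred, u_gt, V_init, A, lam=1.0, mu=1.0):
    l_pos = ((V_pred - V_gt) ** 2).sum(dim=1).mean()   # mean squared vertex position error
    l_smooth = smoothness_loss(V_init, V_pred, A)      # Laplacian regularizer (see sketch above)
    l_map = (u_pred - u_gt).abs().mean()               # L1 error of the displacement map
    return l_pos + lam * l_smooth + mu * l_map

# One optimizer step on this scalar updates the parameters of g (U-Net) and f (GCN) jointly.
```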

Results and Experiments

The results of the proposed method are compared to two existing 2D/3D registration methods, namely Pixel2Mesh (P2M) [4] and IGCN Warp [5].

Dataset

3D-CT volumes of 124 cases and 4D-CT volumes of 35 cases were acquired from patients who underwent intensity-modulated radiotherapy at Kyoto University Hospital. The study was performed in accordance with the Declaration of Helsinki and was approved by the authors' institutional review board (approval number: R1446). Each 4D-CT volume consisted of 10 time phases (t = 0, 10, ..., 90%) of one respiratory cycle and was acquired under respiratory synchronization, with t = 0 and t = 50 corresponding to the end-inhalation and end-exhalation phases, respectively. In total, 474 3D-CT volumes were used (124 + 35 × 10).


Comparison of Results

The 3D shape and position accuracy of the predicted organs is evaluated using three error indices, the mean distance (MD), Hausdorff distance (HD) [6], and mean absolute error (MAE) between surfaces, as well as the Dice similarity coefficient (DSC).
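As a rough idea of how such surface-distance indices can be computed from sampled surface points, here is a hedged sketch using SciPy; the paper's exact surface sampling and the volumetric DSC computation are not reproduced, and representing each surface as a dense point cloud is my assumption.

```python
# Sketch of mean distance (MD) and Hausdorff distance (HD) between two surfaces,
# assuming each surface is sampled as a point cloud of shape (N, 3).
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def mean_distance(P, Q):
    """Symmetric mean surface distance between point sets P and Q."""
    d_pq = cKDTree(Q).query(P)[0]  # nearest-neighbour distance from each point in P to Q
    d_qp = cKDTree(P).query(Q)[0]
    return (d_pq.mean() + d_qp.mean()) / 2

def hausdorff_distance(P, Q):
    """Symmetric Hausdorff distance (maximum of the two directed distances)."""
    return max(directed_hausdorff(P, Q)[0], directed_hausdorff(Q, P)[0])
```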

Table I lists the average values and standard deviations of the evaluation indices obtained for the 350 test data points. Here, "Initial" refers to the magnitude of the deviation from the known 3D shape at the first phase t = 0 and corresponds to the stage at which no deformation prediction was performed. The proposed method exhibited superior performance to P2M and IGCN Warp, and the 3D liver shape was reconstructed with shape errors of MD = 2.1 mm and HD = 9.9 mm, and a shape similarity of DSC = 94.5%. Significant differences (one-way analysis of variance, ANOVA; p < 0.05) with respect to the conventional methods (P2M and IGCN Warp) were confirmed for all indices.


Table II lists the corresponding errors when noise was added to the initial template alignment. For the MD values, the errors increased by 14.6% (0.5 mm) and 15.0% (0.4 mm) for P2M and IGCN Warp, respectively, whereas the increase for the proposed method was limited to 9.0% (0.2 mm). Thus, stable prediction could be achieved even under differences in the initial conditions of the 3D shape alignment. As under the noise-free conditions, significant differences from the conventional methods were confirmed for all indices. A visual comparison between the proposed method and the baseline models, showing a liver registration, can also be seen in the following figure.

Student's Review

Real-time reconstruction of 3D organs from 2D X-rays is potentially clinically valuable. This work presents a novel method toward this goal and quantitatively evaluates its performance on the liver. However, the use of DRRs, which are essentially radiographs reconstructed from full 3D CT scans, leaves open questions about the practical use of the technique: whether the proposed method works on real X-ray data remains ambiguous. The description of the IGCN is given only at a high level, leaving many design choices unclear for reproducing the results. In addition, the data used are not publicly available, which further hinders reproducibility.

The GCN part of the algorithm is not explained or investigated thoroughly. Only a single, simple structure is presented, and there is no discussion of the hyperparameter-tuning experiments that led to the proposed architecture.

References

[1] M. Nakao, M. Nakamura, and T. Matsuda, IGCN: Image-to-graph convolutional network for 2D/3D deformable registration, 2021.

[2] L. G. Brown, A survey of image registration techniques, ACM Computing Surveys, vol. 24, no. 4, pp. 325-376, 1992.

[3] H. Teske, P. Mercea, M. Schwarz, N. H. Nicolay, F. Sterzing, and R. Bendl, Real-time markerless lung tumor tracking in fluoroscopic video: Handling overlapping of projected structures, Med Phys, vol. 42, no. 5, pp. 2540-9, 2015.

[4] N. Wang, Y. Zhang, Z. Li, Y. Fu, H. Yu, W. Liu et al., Pixel2mesh: 3D mesh model generation via image guided deformation, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1, 2020.

[5] M. Nakao, M. Nakamura, and T. Matsuda, Image-to-Graph convolutional network for deformable shape reconstruction from a single projection image, Proc. Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 259-268, 2021.

[6] D. P. Huttenlocher, G. A. Klanderman, and W. A. Rucklidge, Comparing images using the hausdorff distance, IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 850-863, 1993.

[7] O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 234-241, 2015.

[8]

