Fenqiang Zhao, Shunren Xia, Zhengwang Wu, Dingna Duan, Li Wang, Weili Lin, John H. Gilmore, Dinggang Shen, Gang Li
Blog post written by: Milena Eisemann

Convolutional Neural Networks (CNNs) are well established in many machine learning tasks. In medical imaging, however, many tasks involve data with a spherical topology, which poses additional challenges. In this paper [1], the authors introduce adaptations of convolution, pooling and transposed convolution that are specifically designed for this topology. These reworked operations are built on a new filter that is closely tied to the icosahedron expansion process, and they make it easy to transfer well-known architectures such as the U-Net to cortical surface tasks.

The challenges of a different topology

CNNs exploit the properties of Euclidean space [2], for which they were originally developed. One advantage of Euclidean space emphasized by the authors is its consistent neighborhood relationship, on which CNNs rely heavily to apply convolutions consistently and efficiently. However, this does not transfer directly to the spherical topology inherent to many structures in medical imaging, such as cortical surfaces. In addition, such data are often represented by triangular meshes with different numbers of vertices and varying local connectivity across subjects and even across brain regions. Because of these intra- and inter-subject variations, establishing a consistent neighborhood across subjects is not trivial, and conventional CNNs cannot be applied directly.

State of the Art - How others solve the problem

This paper is of course not the first to explore the use of CNNs on spherical data. Previous work has mainly relied on one of two approaches:

Approach 1: Performing convolutions in non-spatial domains
Work in non-spatial domains [3, 4, 5] includes, for instance, approaches that operate in the spectral domain, obtained via the graph Laplacian or Fast Fourier Transforms. However, most recent works focus on omnidirectional image data. The resampling to spherical coordinates required by these methods is not ideal for cortical data, since the resulting grid distributes vertices unevenly over the sphere and crucial structural information could be lost in the process.
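As a toy illustration of the spectral idea (not the exact method of [3], [4] or [5]), the following plain-Python sketch builds the combinatorial graph Laplacian L = D - A of the 12-vertex icosahedron and applies a polynomial spectral filter g(L) x = sum_k theta_k L^k x, whose response on a Laplacian eigenvalue lambda is g(lambda) = sum_k theta_k lambda^k. Writing the filter as a polynomial in L avoids an explicit eigendecomposition; all function names and the coefficients theta are hypothetical.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron_vertices():
    """The 12 vertices of a regular icosahedron with edge length 2 (not normalized)."""
    verts = []
    for a, b in itertools.product((-1, 1), repeat=2):
        verts += [(0, a, b * PHI), (a, b * PHI, 0), (a * PHI, 0, b)]
    return verts


def adjacency(verts):
    """Adjacency matrix: vertices are connected if they are one edge (length 2) apart."""
    n = len(verts)
    adj = [[0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        d2 = sum((p - q) ** 2 for p, q in zip(verts[i], verts[j]))
        if abs(d2 - 4) < 1e-9:
            adj[i][j] = adj[j][i] = 1
    return adj


def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A (D holds the vertex degrees)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matvec(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def polynomial_filter(lap, x, thetas):
    """Hypothetical spectral filter g(L) x = sum_k thetas[k] * L^k x on a vertex signal x."""
    out = [0.0] * len(x)
    power = list(x)  # starts as L^0 x = x
    for theta in thetas:
        out = [o + theta * p for o, p in zip(out, power)]
        power = matvec(lap, power)
    return out


lap = laplacian(adjacency(icosahedron_vertices()))
# Every row of L sums to zero, so a constant signal has frequency 0: only thetas[0] acts on it.
print(polynomial_filter(lap, [1.0] * 12, [0.5, -0.1]))
```

Since the constant signal lies in the null space of L, every vertex simply ends up with theta_0 = 0.5; non-constant signals are shaped by the higher-order terms.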

Approach 2: Projecting data onto certain intrinsic spaces
In this approach, patches from the sphere are projected onto a different space, usually the tangent plane [6, 7, 8]. These patches, now in Euclidean space again, are then fed to a CNN using different sampling strategies. The drawback is that such projections usually introduce feature distortions. Moreover, the required re-interpolation adds complexity and computational cost to the model and increases the risk of lower accuracy.
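To see where such distortions come from, the sketch below uses one common tangent-plane mapping, the gnomonic (central) projection, as an assumed stand-in for the projections used in [6, 7, 8], which may differ in detail. A point at geodesic distance theta from the tangent point lands at planar distance tan(theta), so distances are stretched by a factor tan(theta) / theta that grows towards the edge of a patch. Function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)


def tangent_basis(n):
    """Two orthonormal vectors spanning the plane tangent to the unit sphere at n."""
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(n, helper))
    return e1, cross(n, e1)


def gnomonic(p, n):
    """Central (gnomonic) projection of unit vector p onto the tangent plane at unit vector n.

    Returns 2D coordinates in that plane; only defined on the hemisphere around n.
    """
    c = dot(p, n)
    if c <= 0:
        raise ValueError("point is not on the hemisphere facing the tangent plane")
    q = tuple(x / c for x in p)  # the ray through p hits the plane {x : x . n = 1} at q
    e1, e2 = tangent_basis(n)
    return dot(q, e1), dot(q, e2)


north = (0.0, 0.0, 1.0)
for degrees in (10, 30, 45, 60):
    theta = math.radians(degrees)  # geodesic distance on the unit sphere
    u, v = gnomonic((math.sin(theta), 0.0, math.cos(theta)), north)
    planar = math.hypot(u, v)      # equals tan(theta)
    print(f"{degrees:>2} deg: stretch factor {planar / theta:.3f}")
```

The stretch factor rises from roughly 1.01 at 10° to roughly 1.65 at 60°, so features far from the tangent point are noticeably distorted.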

Given that both approaches have their shortcomings, this paper aims to introduce a method that avoids both by working more closely with the data’s natural topology.

Background knowledge: From Brain to Sphere - Icosahedron-discretized spherical surfaces

One idea that might seem ridiculous at first is to inflate a brain’s folded cortical surface like a balloon. However, this inflation is a very useful picture to keep in mind for the fundamental concept used in this paper, as such a spherical representation is an important step that simplifies any work on this kind of data. Unfortunately, the resulting distribution of vertices on the sphere will not automatically be uniform. This is where discretizing the sphere’s surface with an icosahedron comes into play.
An icosahedron is one of the five so-called Platonic solids: convex 3D shapes whose faces are identical regular polygons, with the same number of faces meeting at each vertex [9]. For our discretization purpose, the basic 12-vertex icosahedron is not yet fine enough. To reach vertex counts high enough to model the attributes of the brain, spheres of higher resolution can be approximated iteratively through the so-called icosahedron expansion [6, 18] shown in Fig. 1.

[Fig 1] By iteratively adding new points (blue) on the existing edges (green), the approximation of the sphere
using an icosahedron can be refined step by step. The approximation leads to a regular, triangular grid.
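The expansion process can be sketched in a few lines of Python. Starting from the 12 vertices and 20 faces of an icosahedron, each step places a new vertex at the midpoint of every edge, pushes it onto the unit sphere and splits every triangle into four. This is a generic icosphere construction written for illustration, not the authors’ code; the function names are hypothetical.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def icosahedron():
    """Unit-sphere vertices and the 20 triangular faces of a regular icosahedron."""
    raw = [p for a, b in itertools.product((-1, 1), repeat=2)
           for p in ((0, a, b * PHI), (a, b * PHI, 0), (a * PHI, 0, b))]

    def is_edge(i, j):
        # Before normalization every edge has length 2, i.e. squared length 4.
        return abs(sum((p - q) ** 2 for p, q in zip(raw[i], raw[j])) - 4) < 1e-9

    # The faces are exactly the triples of mutually adjacent vertices.
    faces = [(i, j, k) for i, j, k in itertools.combinations(range(12), 3)
             if is_edge(i, j) and is_edge(j, k) and is_edge(i, k)]
    return [normalize(v) for v in raw], faces


def expand(verts, faces):
    """One expansion step: add a vertex on every edge (midpoint pushed onto the unit
    sphere) and split each triangle into four. Old vertices keep their indices;
    the new vertices are appended after them."""
    verts, midpoint_of, new_faces = list(verts), {}, []

    def midpoint(i, j):
        key = (min(i, j), max(i, j))  # an edge shared by two faces gets a single new vertex
        if key not in midpoint_of:
            midpoint_of[key] = len(verts)
            verts.append(normalize([(a + b) / 2 for a, b in zip(verts[i], verts[j])]))
        return midpoint_of[key]

    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces


verts, faces = icosahedron()
print(f"level 0: {len(verts)} vertices, {len(faces)} faces")
for level in range(1, 5):
    verts, faces = expand(verts, faces)
    print(f"level {level}: {len(verts)} vertices, {len(faces)} faces")
```

It prints 12, 42, 162, 642 and 2562 vertices (20, 80, 320, 1280 and 5120 faces), i.e. 10 · 4^k + 2 vertices after k expansion steps. Appending the new vertices after the old ones keeps the vertices of every coarser level at the front of the list, which is convenient when pooling later.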

Step 1: The DiNe filter: creating a new consistent neighborhood

Taking a step back to the pixel grid of 2D images, we can see reference directions (x and y in Fig. 2) that allow convolutions to be applied systematically. On the discretized, regular spherical surface from before, a regular grid emerges as well. On the sphere, however, we still lack reference directions, which makes the ordering of neighboring points ambiguous [5].
In a conventional CNN, a kernel is placed over a center pixel, the overlapping elements of kernel and data are multiplied and summed (Fig. 2), and the kernel is then slid to the next position.

[Fig 2] Application of conventional convolutional filter

Defining the new filter around a center vertex still works naturally in our setting: all vertices that can be reached from it via a single edge are considered its neighbors. This definition gives the filter its name: the 1-ring or Direct Neighbor filter (DiNe-Filter).
At this point, we are merely missing an indexing strategy that makes the neighborhood ordering unambiguous. As visualized in Fig. 3, the authors suggest giving index 1 to the center vertex and distributing the remaining indices based on prior information about the normal brain posture at the center vertex, which makes the filter azimuthally rotation equivariant [10] (for a detailed description, refer to [11]). Due to the icosahedron expansion process, some vertices (the 12 vertices of the original icosahedron) have five neighbors, while all others have six. For vertices with only five neighbors, both indices 1 and 2 are assigned to the center vertex. As these indices can be pre-computed, the approach is efficient to implement.
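The pre-computed indices can be stored as an (N, 7) table of vertex indices. The sketch below uses 0-based positions (the blog’s index 1 is position 0). Since the paper’s posture-based ordering is only detailed in [11], the neighbor ordering here is a HYPOTHETICAL substitute: neighbors are sorted by their azimuth angle around the center, measured from the direction of the "north pole" (0, 0, 1) projected onto the tangent plane. The mesh helpers are a generic icosphere construction included so the block is self-contained; all names are illustrative.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return tuple(c / norm for c in v)


def icosahedron():
    """Unit-sphere vertices and the 20 triangular faces of a regular icosahedron."""
    raw = [p for a, b in itertools.product((-1, 1), repeat=2)
           for p in ((0, a, b * PHI), (a, b * PHI, 0), (a * PHI, 0, b))]

    def is_edge(i, j):  # unnormalized edges have squared length 4
        return abs(sum((p - q) ** 2 for p, q in zip(raw[i], raw[j])) - 4) < 1e-9

    faces = [(i, j, k) for i, j, k in itertools.combinations(range(12), 3)
             if is_edge(i, j) and is_edge(j, k) and is_edge(i, k)]
    return [normalize(v) for v in raw], faces


def expand(verts, faces):
    """One expansion step; old vertices keep their indices, edge midpoints are appended."""
    verts, midpoint_of, new_faces = list(verts), {}, []

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:
            midpoint_of[key] = len(verts)
            verts.append(normalize([(a + b) / 2 for a, b in zip(verts[i], verts[j])]))
        return midpoint_of[key]

    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces


def tangent_frame(center):
    """HYPOTHETICAL reference direction: the 'north pole' (0, 0, 1) projected onto the
    tangent plane at `center` (x-axis at the poles). The paper uses brain posture instead."""
    north = (0.0, 0.0, 1.0)
    ref = tuple(n - dot(north, center) * c for n, c in zip(north, center))
    if dot(ref, ref) < 1e-12:
        ref = (1.0, 0.0, 0.0)
    e1 = normalize(ref)
    return e1, cross(center, e1)


def one_ring_table(verts, faces):
    """(N, 7) index table: the center first, then its 1-ring sorted by azimuth.
    Five-neighbor vertices list the center twice (the blog's indices 1 and 2)."""
    rings = [set() for _ in verts]
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            rings[u].add(v)
            rings[v].add(u)
    table = []
    for i, center in enumerate(verts):
        e1, e2 = tangent_frame(center)

        def azimuth(j):
            d = [p - q for p, q in zip(verts[j], center)]
            return math.atan2(dot(d, e2), dot(d, e1))

        ring = sorted(rings[i], key=azimuth)
        table.append([i] * (7 - len(ring)) + ring)
    return table


verts, faces = expand(*icosahedron())  # level 1: 42 vertices
table = one_ring_table(verts, faces)
print(table[0])   # an original icosahedron vertex: center listed twice, five neighbors
print(table[20])  # a vertex added by expansion: six distinct neighbors
```

At every expansion level, exactly the 12 original icosahedron vertices list their center twice; all other rows contain six distinct neighbors. Because the table depends only on the mesh, it is computed once and reused by every layer.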

[Fig 3] Two vertices and their neighborhoods with 5 and 6 neighbors, respectively, due to icosahedron expansion (left).
Indexing of a vertex (blue) and its 1-ring neighborhood (right).

Step 2: DiNe-Convolutions: Adapting how to convolve

With the 1-ring filter available, a convolution becomes a simple weighting of the indexed neighborhood features (Fig. 4). The operation outputs a new feature map with the same number of vertices as before, but with a new feature dimensionality. While the convolution itself works just like a conventional one, two well-known hyperparameters disappear. No stride needs to be set, as the DiNe-Filter is always applied to every vertex. Padding also becomes unnecessary, because we convolve on a closed sphere, which, unlike a 2D image, has no borders to take into account.
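Given the pre-computed neighbor indices from Step 1, arranged as an (N, 7) table, a DiNe-Convolution can be sketched as follows: gather the seven indexed feature vectors of each vertex, concatenate them into one vector of length 7 · C_in, and multiply it by a weight matrix shared by all vertices. This plain-Python sketch uses hypothetical names and a toy 3-vertex table instead of a real mesh; it shows the mechanics, not the authors’ implementation.

```python
def dine_conv(features, table, weight, bias):
    """Sketch of a DiNe-style convolution (hypothetical helper).

    features: N rows of C_in floats (one row per vertex)
    table:    N rows of 7 vertex indices (center first, then its ordered 1-ring)
    weight:   7 * C_in rows of C_out floats, shared by all vertices
    bias:     C_out floats
    returns:  N rows of C_out floats (same vertex count, new feature dimension)
    """
    c_in = len(features[0])
    if len(weight) != 7 * c_in:
        raise ValueError("weight must have 7 * C_in rows")
    out = []
    for ring in table:  # every vertex is visited: no stride, and no padding on a closed sphere
        gathered = [x for j in ring for x in features[j]]  # concatenated 1-ring features
        row = list(bias)
        for x, w_row in zip(gathered, weight):
            for o, w in enumerate(w_row):
                row[o] += x * w
        out.append(row)
    return out


# Toy example (not a real mesh): 3 vertices, 1 input channel, 2 output channels.
toy_table = [[0, 0, 1, 2, 1, 2, 1],
             [1, 0, 2, 0, 2, 0, 2],
             [2, 0, 1, 0, 1, 0, 1]]
features = [[1.0], [2.0], [3.0]]
weight = [[1.0, 1.0]] + [[0.0, 1.0]] * 6  # channel 0: center only; channel 1: sum of ring
print(dine_conv(features, toy_table, weight, bias=[0.0, 0.0]))
```

Output channel 0 only weights the center position and therefore copies the input, while channel 1 sums the seven gathered values. Neither a stride nor a padding parameter appears anywhere.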

[Fig 4] Application of new DiNe-Convolution

Step 3: Icosahedron contraction: the new way to apply pooling

As in standard CNNs, pooling layers are useful to aggregate information and reduce the size of feature maps. Here, this corresponds to moving to a sphere represented by fewer vertices, obtained through icosahedron contraction (the reverse of the expansion process). Again, the DiNe-Filter is applicable, with one adjustment: it is only applied to those vertices of the higher-resolution input that are also present in the next lower-resolution icosahedron resulting from the contraction (blue vertices in Fig. 5, left).
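Pooling can reuse the same neighbor table. The sketch assumes (a hypothetical convention) that the coarser icosahedron’s vertices are the first N_coarse vertices of the finer one, which holds when each expansion step appends its new vertices after the existing ones. Since an icosahedron expanded k times has 10 · 4^k + 2 vertices, the coarser level has (N + 6) / 4 vertices, e.g. 40962 → 10242. Each remaining vertex then aggregates its seven indexed fine-level values by mean or max; the names and the toy table are illustrative only.

```python
from statistics import fmean


def icosahedron_level(n):
    """Return k if n == 10 * 4**k + 2 (vertex count after k expansions), else None."""
    k = 0
    while 10 * 4 ** k + 2 < n:
        k += 1
    return k if 10 * 4 ** k + 2 == n else None


def dine_pool(features, table, mode="mean"):
    """Pool a per-vertex feature map to the next coarser icosahedron level (sketch).

    features: N rows of C floats; table: the fine level's (N, 7) 1-ring index table.
    Assumes (hypothetical convention) that the coarse vertices are the first
    N_coarse = (N + 6) / 4 fine vertices.
    """
    level = icosahedron_level(len(features))
    if not level:  # None (not an icosahedron count) or 0 (no coarser level exists)
        raise ValueError(f"cannot pool {len(features)} vertices")
    aggregators = {"mean": fmean, "max": max}
    if mode not in aggregators:
        raise ValueError("mode must be 'mean' or 'max'")
    aggregate = aggregators[mode]
    n_coarse = 10 * 4 ** (level - 1) + 2  # equals (N + 6) // 4
    return [[aggregate([features[j][c] for j in table[v]]) for c in range(len(features[v]))]
            for v in range(n_coarse)]


# Toy 1-ring table for 42 vertices (not a real mesh): vertex v "sees" v, v+1, ..., v+6.
toy_table = [[(v + k) % 42 for k in range(7)] for v in range(42)]
features = [[float(v), -float(v)] for v in range(42)]
pooled = dine_pool(features, toy_table)
print(len(pooled), pooled[0])
```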

[Fig 5] The same DiNe-Filter can be used for pooling by applying it only to vertices that are still present
in the lower-resolution icosahedron (here orange, blue and red, respectively).


Step 4: Back to the higher resolution: new transposed convolutions

Networks often also need to recover higher-resolution data from lower-resolution feature maps produced by earlier pooling layers. For this, each vertex of the low-resolution sphere is first extended by a ring of vertices using the DiNe-Filter. In a second step, the filter weights are applied, and the values of vertices covered by several overlapping rings are summed to obtain the final values (Fig. 6).
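A transposed convolution can be sketched as the reverse of the gather step: each coarse vertex is expanded into its 1-ring on the finer sphere, its feature vector is multiplied by a weight matrix to give one output vector per ring position, and contributions landing on the same fine vertex are summed. The sketch assumes (hypothetically) that coarse vertex v is fine vertex v, i.e. the coarse vertices come first in the fine mesh. The toy table is not a real mesh, and all names are illustrative.

```python
def dine_transposed_conv(coarse, fine_table, weight, c_out):
    """Sketch of a DiNe-style transposed convolution (upsampling by one icosahedron level).

    coarse:     N_coarse rows of C_in floats
    fine_table: the finer sphere's 1-ring table (rows of 7 indices); coarse vertex v is
                assumed to be fine vertex v
    weight:     C_in rows of 7 * c_out floats (one c_out-vector per ring position)
    returns:    N_fine rows of c_out floats, where N_fine = len(fine_table)
    """
    if any(len(w_row) != 7 * c_out for w_row in weight):
        raise ValueError("each weight row needs 7 * c_out entries")
    out = [[0.0] * c_out for _ in fine_table]
    for v, feat in enumerate(coarse):
        ring = fine_table[v]  # step 1: coarse vertex v expanded into its fine-level 1-ring
        # Step 2a: one c_out-vector of weighted features per ring position.
        projected = [sum(x * w_row[k] for x, w_row in zip(feat, weight))
                     for k in range(7 * c_out)]
        # Step 2b: add each vector to its target vertex; overlapping rings are summed.
        for slot, target in enumerate(ring):
            for o in range(c_out):
                out[target][o] += projected[slot * c_out + o]
    return out


# Toy table for a 6-vertex "fine" level (not a real mesh) and 3 coarse vertices.
toy_fine_table = [[(v + k) % 6 for k in range(7)] for v in range(6)]
up = dine_transposed_conv([[1.0], [1.0], [1.0]], toy_fine_table, [[1.0] * 7], c_out=1)
print(up)
```

With all-ones inputs and weights, each output simply counts how many ring slots point at that vertex. On a real icosahedron, every newly inserted vertex lies on an edge between two coarse vertices and therefore receives contributions from exactly two rings.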

[Fig 6] Transposed convolutions are performed in two steps: first reconstruct each vertex's neighbors (orange neighborhood),
then obtain the values by applying the actual weights and summing where overlaps occur.

Step 5: Making U-Nets spherical - an extension of an existing architecture

With all these adaptations in place, existing architectures can now be transferred to the spherical topology. The paper demonstrates this with a U-Net [12], a popular choice for biomedical image analysis because of its symmetric, hierarchical architecture with skip connections, which allow it to capture contextual and localization information jointly. This only required replacing the standard operations with their DiNe counterparts (along with some minor changes; details can be found in [1]).
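Because contracting the icosahedron maps N to (N + 6) / 4 vertices and expanding maps it back to 4N - 6, every decoder stage ends up at exactly the vertex count of the matching encoder stage, so skip connections can concatenate features vertex by vertex. The sketch below traces these shapes for a hypothetical five-level network; the input size of 40962 vertices (an icosahedron expanded six times) and the channel widths are assumptions for illustration, not the paper’s exact configuration.

```python
def icosahedron_level(n):
    """Return k if n == 10 * 4**k + 2 (vertex count after k expansions), else None."""
    k = 0
    while 10 * 4 ** k + 2 < n:
        k += 1
    return k if 10 * 4 ** k + 2 == n else None


def unet_plan(n_vertices, channels):
    """Hypothetical shape trace: (vertices, channels) per encoder and decoder stage."""
    level = icosahedron_level(n_vertices)
    if level is None or len(channels) > level + 1:
        raise ValueError("need an icosahedron vertex count with enough coarser levels")
    # Encoder: each pooling step contracts the icosahedron by one level, N -> (N + 6) / 4.
    sizes = [10 * 4 ** (level - d) + 2 for d in range(len(channels))]
    encoder = list(zip(sizes, channels))
    # Decoder: a transposed convolution maps N_coarse -> 4 * N_coarse - 6 vertices, which is
    # exactly the matching encoder size. Assumption: it outputs c channels and the skip
    # connection adds another c, giving 2 * c channels right after concatenation.
    decoder = [(n, 2 * c) for n, c in reversed(encoder[:-1])]
    return encoder, decoder


encoder, decoder = unet_plan(40962, [32, 64, 128, 256, 512])  # hypothetical widths
for n, c in encoder:
    print(f"encoder: {n:>5} vertices x {c} channels")
for n, c in decoder:
    print(f"decoder: {n:>5} vertices x {c} channels after concatenation")
```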

Validation

To showcase their model’s flexibility and efficiency, the authors selected two experiments on infant brains: cortical surface parcellation (segmentation) and prediction of attribute map development (regression).

The tested models for the parcellation task can be divided into two subgroups: models with learned upsampling and models without learned upsampling. In the first category, the Spherical U-Net with the DiNe-Filter described above was compared to a version with a Rectangular Patch filter (RePa) [6], representing patch-based methods that use the tangent plane. The results show that even in its reduced version the DiNe network is competitive with the RePa network in terms of Dice ratio, while also being faster and requiring less memory and a smaller model.
Additionally, a base version without upsampling was compared to two SegNet-style [13] versions using different interpolation methods. Here, too, the U-Net structure with the proposed transposed convolutions is preferable, as its output is less noisy and closest to the manual ground truth.
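For reference, the Dice ratio compares a predicted and a manual parcellation label by label: for the vertex sets P and T carrying a given label, Dice = 2 |P ∩ T| / (|P| + |T|), ranging from 0 (no overlap) to 1 (identical). The sketch computes it per label on made-up per-vertex labels; averaging over labels is an assumption about how the scores are summarized, and the function name is hypothetical.

```python
def dice_per_label(pred, truth):
    """Dice ratio 2 * |P & T| / (|P| + |T|) for every label, given per-vertex labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must label the same vertices")
    scores = {}
    for label in sorted(set(pred) | set(truth)):
        p = {i for i, x in enumerate(pred) if x == label}
        t = {i for i, x in enumerate(truth) if x == label}
        scores[label] = 2 * len(p & t) / (len(p) + len(t))
    return scores


pred = [0, 0, 1, 1, 2, 2]   # made-up predicted parcel labels for six vertices
truth = [0, 1, 1, 1, 2, 2]  # made-up manual labels
scores = dice_per_label(pred, truth)
print(scores, sum(scores.values()) / len(scores))
```

Here labels 0, 1 and 2 score 2/3, 0.8 and 1.0, for a mean of about 0.82.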

For the cortical thickness prediction task, the Spherical U-Net was compared to three feature-based approaches: linear regression, polynomial regression and random forest. All of them were trained on 102 features (local and neighborhood information) extracted from the cortical surface [14], whereas the Spherical U-Net was trained only on the first two of those features (sulcal depth and cortical thickness), which provide purely local information.
The Spherical U-Net not only produces visibly the smoothest results, its outputs are also more accurate on average in terms of mean absolute error (MAE) and mean relative error (MRE). Notably, the other three algorithms are computationally heavier because they train a separate model for each vertex, whereas the Spherical U-Net trains a single model for the whole surface, which also makes its results more spatially consistent.
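The two error measures can be written down directly. The sketch assumes the usual definitions, MAE = mean |prediction - truth| and MRE = mean |prediction - truth| / |truth| over all vertices (the latter assumes no ground-truth value is zero); the thickness values are made up for illustration.

```python
def mae(pred, truth):
    """Mean absolute error over all vertices."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def mre(pred, truth):
    """Mean relative error; assumes no ground-truth value is zero."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, truth)) / len(truth)


truth = [2.0, 4.0]  # made-up cortical thickness values (e.g. in mm)
pred = [2.5, 3.0]
print(mae(pred, truth), mre(pred, truth))
```

Both toy vertices are off by 25 %, so MRE is 0.25, while MAE is 0.75 in the units of the input.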

Beyond the paper - Discussion and Conclusion 

By introducing a new filter type and adapting the standard CNN operations, this paper offers a novel alternative to the two approaches commonly used for spherical data. Altogether, the Spherical U-Net shows competitive results and has desirable task- and feature-agnostic properties while remaining efficient.
The paper is easy to read, but some explanations lack detail (e.g. on vertex indexing), which may be due to the limited scope of a conference paper. For deeper explanations, the authors’ follow-up paper [11] is recommended.
The original paper does not mention any planned future work or shortcomings. In their subsequent paper [11], the authors discuss an extension that makes the DiNe-Filter deformable by using additional offsets. This makes the filter more flexible, which is needed to model structures of different sizes and shapes such as those found on cortical surfaces.
Additionally, the approach still requires mapping the surface to a sphere, which can be sensitive to noise and problematic for impaired surfaces.
Furthermore, some works criticize that the DiNe-Filter is only azimuthally, not fully, rotation equivariant. For cortical data this seems less problematic, as it usually has a preferred orientation. In other potential application areas, however, this could lead to distortions near the poles.
Since its release in 2019, the work has been cited by several papers in the medical imaging community, so the approach seems to remain more of a niche topic within that field, but one that is well worth a look.

References

[1] Zhao, Fenqiang, et al. "Spherical U-Net on cortical surfaces: methods and applications." International Conference on Information Processing in Medical Imaging. Springer, Cham, 2019.

[2] https://en.wikipedia.org/wiki/Euclidean_space  (last accessed 02.01.2023)

[3] Bruna, Joan, et al. "Spectral networks and locally connected networks on graphs." arXiv preprint arXiv:1312.6203 (2013).

[4] Cohen, Taco S., et al. "Spherical CNNs." arXiv preprint arXiv:1801.10130 (2018).

[5] Zhang, Ziheng, et al. "Saliency detection in 360 videos." Proceedings of the European conference on computer vision (ECCV). 2018.

[6] Seong, Si-Baek, Chongwon Pae, and Hae-Jeong Park. "Geometric convolutional neural network for analyzing surface-based neuroimaging data." Frontiers in neuroinformatics 12 (2018): 42.

[7] Wu, Zhengwang, et al. "Registration-free infant cortical surface parcellation using deep convolutional neural networks." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2018.

[8] Coors, Benjamin, Alexandru Paul Condurache, and Andreas Geiger. "SphereNet: Learning spherical representations for detection and classification in omnidirectional images." Proceedings of the European conference on computer vision (ECCV). 2018.

[9] http://www.math.utah.edu/~alfeld/math/polyhedra/polyhedra.html (last accessed 02.01.2023)

[10] Toft, Carl, Georg Bökman, and Fredrik Kahl. "Azimuthal rotational equivariance in spherical cnns." (2020).

[11] Zhao, Fenqiang, et al. "Spherical deformable u-net: Application to cortical surface parcellation and development prediction." IEEE transactions on medical imaging 40.4 (2021): 1217-1228.

[12] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.

[13] Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "SegNet: A deep convolutional encoder-decoder architecture for image segmentation." IEEE transactions on pattern analysis and machine intelligence 39.12 (2017): 2481-2495.


