Author: Yilin Tang

Supervisor: Shahrooz Faghihroohi

1. Outline

Introduction
Motivation
SSMs and variations
1. Conventional SSMs
2. From conventional SSMs to DeepSSMs
3. From DeepSSMs to DeepISSMs
4. Other Variations of SSMs
  1. TL-DeepSSMs
  2. DeepSCSMs
Conclusion and comparison
References

2. Introduction

Statistical Shape Models (SSMs) have witnessed a surge in prominence since the 1990s and currently occupy a central position within the domain of medical image analysis. Recent advancements have substantially augmented their efficacy in precision tasks, including the segmentation of osseous structures, intricate three-dimensional (3D) reconstruction, and the classification of bone anatomies. These capabilities are of profound significance in the realms of diagnostics, preventive medicine, and therapeutic interventions.

Figure 1 [4] shows how an SSM is generated and where it is mainly applied in medicine. Moving from left to right, an SSM, whether it models shape or appearance, is derived from clinical images. Three-dimensional (3D) SSMs are conventionally built from computed tomography (CT) scans, while two-dimensional (2D) SSMs are used to analyze dual-energy X-ray absorptiometry (DXA) images or radiographs. The medical images are first segmented to extract the relevant geometries, which are then registered so that they are aligned through established correspondence techniques. Finally, a dimensionality reduction technique such as Principal Component Analysis (PCA) is applied to construct the SSM. The resulting model is versatile and is applied across diverse medical domains.

3. Motivation

The generation of shape models in our research involves capturing the inherent variations in medical images through a low-dimensional representation, particularly for model-based segmentations. Within the scope of our investigation, we address two principal challenges.

The first pertains to the three-dimensional (3D) delineation of anatomical structures, notably applicable in diverse medical scenarios such as the diagnosis, prevention, and treatment of osteoporosis and fragility fractures, cardiovascular (Figure 2) and neurological diseases, as well as liver surgery (Figure 3). This multifaceted issue underscores the need for effective solutions in medical imaging analysis, where accurate delineation holds paramount importance. [1][2][4]

The second challenge involves causal reasoning, emphasizing the exploration of causal effects arising from genetic, environmental, or lifestyle factors on both normal and pathological variations in anatomical phenotypes. This intricate aspect prompts critical questions, exemplified by inquiries like, "How would this patient's brain structure change if they were ten years older?" (Figure 4) The pursuit of answers to such questions aligns with our research objectives, delving into the nuanced interplay of various factors contributing to anatomical variations.[3]

Through our investigation, we aim to contribute valuable insights and methodologies to address these challenges, fostering advancements in the understanding and analysis of medical images with significant implications for clinical applications and research endeavors.

4. SSMs and variations

4.1. Conventional SSMs

Building a conventional SSM starts with the segmentation of the medical imaging data (Step 1), followed by the reconstruction of the anatomical segment (Step 2). These two steps are repeated for each patient and provide the inputs for computing the mean shape (Step 3). Postprocessing (Step 4) with techniques such as principal component analysis (PCA) then allows shape variations to be compared and assessed quantitatively. Figure 5 illustrates this conventional SSM process, from segmentation through replication for all patients to PCA-based postprocessing.

Using an SSM involves three basic steps. First, an anatomical SSM is constructed in the same way as a Point Distribution Model (PDM); Figure 6 illustrates this with a model of a human face. Second, the SSM is fitted to a new image, which aligns the model with the distinctive features of the new anatomical structure (Figure 7). Third, because the PCA loadings at the core of the SSM often fail to capture fine details, local refinements are applied to recover fine-scale surface detail and improve the fidelity of the fitted model.
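The construction and fitting steps can be sketched for a PCA-based PDM as follows. This is a minimal illustration rather than the method of any cited paper: it assumes the landmarks are already in correspondence and aligned, and the function names (`build_pdm`, `project`, `reconstruct`) and the toy ellipse data are hypothetical.

```python
import numpy as np

def build_pdm(shapes, n_modes):
    """Build a PCA shape model from (n_samples, n_features) landmark vectors."""
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # SVD of the centered data yields the PCA modes without forming the covariance matrix.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]  # each row is one orthonormal mode of variation
    variances = singular_values[:n_modes] ** 2 / (shapes.shape[0] - 1)
    return mean_shape, modes, variances

def project(shape, mean_shape, modes):
    """Fit the model to one shape: return its PCA loadings."""
    return modes @ (shape - mean_shape)

def reconstruct(loadings, mean_shape, modes):
    """Map PCA loadings back to landmark coordinates."""
    return mean_shape + loadings @ modes

# Toy data (assumed): 20 ellipses with 16 corresponding landmarks, varying in two radii.
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
radii = 1.0 + 0.2 * rng.standard_normal((20, 2))
shapes = np.stack(
    [np.concatenate([rx * np.cos(angles), ry * np.sin(angles)]) for rx, ry in radii]
)

mean_shape, modes, variances = build_pdm(shapes, n_modes=2)
loadings = project(shapes[0], mean_shape, modes)
approx = reconstruct(loadings, mean_shape, modes)
```

Because the toy shapes vary only in two radii, two modes reconstruct them almost exactly; real anatomy needs more modes, and the residual is what the local refinement step is meant to recover.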

4.2. From conventional SSMs to DeepSSMs

Before deep learning, SSMs were a workhorse. We therefore compare conventional SSMs with deep learning models to understand the pros and cons of each.

	Conventional SSMs	Deep learning models
Advantages	+ Established + Intuitive interpretation + Many extensions	+ No point-based 1-to-1 correspondences needed + Flexible and non-linear
Disadvantages	- Linear nature - Many training samples - Point-based 1-to-1 correspondences	- More training samples - Hard to interpret

To combine the advantages of both and avoid their disadvantages, the first version of DeepSSM was developed. Unlike conventional SSMs, DeepSSM is not a generative framework. It focuses on minimal pre-processing and computes shape descriptors directly from the original images of the anatomy; these descriptors can then be used in shape analysis applications.

Figure 8 shows how DeepSSM is trained and then used to obtain shape descriptors directly from images. In the bottom-right panel, the shapes given by the original input CT scans (the red dots) are augmented by sampling from a normal distribution in the PDM shape space. The resulting correspondences are used to transform the original images of nearby samples, creating new images with known shape parameters. In the top-right panel, a CNN with a few convolutional layers followed by fully connected (FC) layers produces the output. The input to the network is a 3D image, and the output is a set of ordered PCA loadings with respect to the shape space.
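The shape side of this augmentation can be sketched as below. The sketch assumes one independent normal distribution per PCA mode and uses random vectors as stand-in training shapes; the name `augment_shapes` is hypothetical, and the image warping that DeepSSM performs with the sampled correspondences is not shown.

```python
import numpy as np

def augment_shapes(shapes, n_modes, n_new, rng):
    """Sample new shapes from a normal distribution in the PCA shape space."""
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    # Standard deviation of the training loadings along each mode.
    std = singular_values[:n_modes] / np.sqrt(shapes.shape[0] - 1)
    # Assumption: loadings are drawn independently per mode.
    new_loadings = rng.standard_normal((n_new, n_modes)) * std
    new_shapes = mean_shape + new_loadings @ modes
    return new_shapes, new_loadings, mean_shape, modes

rng = np.random.default_rng(42)
# Stand-in training set: 10 shapes, each with 10 flattened 3D points.
train_shapes = rng.standard_normal((10, 30))
new_shapes, new_loadings, mean_shape, modes = augment_shapes(
    train_shapes, n_modes=3, n_new=50, rng=rng
)
```

Each sampled shape comes with known PCA loadings, which is what makes it usable as a regression target for the network.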

Next, the inference pipelines of a contemporary Point Distribution Model (PDM) and of DeepSSM are compared (Figure 9). DeepSSM is the faster and computationally cheaper of the two. This raises the question of why DeepSSM was developed rather than simply using a PDM.

PDM, while established, presents inherent drawbacks. Firstly, it necessitates meticulous point-to-point correspondences across shapes, demanding a laborious and intricate process. Moreover, it mandates a mesh or point-cloud shape representation, posing challenges in compatibility with modern deep learning frameworks. The incorporation of PDM into these frameworks becomes arduous due to these constraints.

4.3. From DeepSSMs to DeepISSMs

To address these limitations, researchers proposed Deep Implicit Statistical Shape Models (DeepISSM, also abbreviated DISSM). This approach eliminates the requirement for point-to-point correspondences and avoids the challenges associated with mesh or point-cloud representations. DeepISSM thus overcomes the hindrances posed by PDMs and fits more naturally into contemporary deep learning frameworks.

DeepISSM is correspondence-free. It uses an agent-based CNN to estimate the pose of new images and a deep level-set-based surface refinement. It can be broken into three steps. (Figure 10)

The first step is constructing an implicit shape model, which requires no point correspondences. As with a PCA basis, the mean shape vector still produces a meaningful shape. The second step is pose estimation. As in classic PDMs, shapes are represented using a PCA basis, so for a new image we need to determine its rigid pose and its non-rigid PCA loadings. One option is to train a CNN encoder to output the pose directly. The last step is fine-scale local refinement. After pose estimation, we have a shape model in the form of a signed distance function (SDF). A fully convolutional network (FCN) is then trained with a deep level-set loss to output a correction to the SDF at each pixel, which retains a high-quality surface representation. Refinements are produced only in a narrow band around the surface, similar to standard level-set practice, and deviations from the SDF obtained with the PCA loadings are penalized. [2] (Figure 10)
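The narrow-band idea in the refinement step can be illustrated with a small numerical sketch. It assumes the convention that the SDF is negative inside the shape, uses a sphere as the shape, and replaces the FCN output with a constant correction; `sphere_sdf` and `refine_in_narrow_band` are hypothetical names, not functions from [2].

```python
import numpy as np

def sphere_sdf(grid_shape, center, radius):
    """Signed distance to a sphere: negative inside, positive outside (assumed convention)."""
    coords = np.indices(grid_shape).astype(float)
    offsets = coords - np.asarray(center, dtype=float).reshape(-1, 1, 1, 1)
    return np.linalg.norm(offsets, axis=0) - radius

def refine_in_narrow_band(sdf, correction, band_width):
    """Add per-voxel corrections only where |SDF| <= band_width."""
    band = np.abs(sdf) <= band_width
    refined = np.where(band, sdf + correction, sdf)
    return refined, band

sdf = sphere_sdf((32, 32, 32), center=(16, 16, 16), radius=8.0)
# Stand-in for the network output: a constant correction that moves the surface outward by one voxel.
correction = np.full(sdf.shape, -1.0)
refined, band = refine_in_narrow_band(sdf, correction, band_width=3.0)
```

Voxels far from the surface keep their original SDF value, so the refinement can only adjust the surface locally and cannot replace the shape given by the PCA loadings.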

Experiment results. The method is tested on pathological liver segmentation. In Figure 12, the left part (Table 2) shows the DISSM results on top of a CT scan after each stage; each stage progressively improves the result. The visual impact of these improvements can be seen in the first column of the box-and-whisker plot on the right, and the table reports the performance of the different models. DISSM exhibits less variability in the Dice similarity coefficient (DSC) and the average symmetric surface distance (ASSD), which indicates better robustness and is clearly visible in the box plot. Because results on clinical data are more convincing, the authors also test on a clinical liver dataset. As Table 3 in Figure 13 highlights, the performance of the other models drops on clinical data, because incidence rates, scanners, patient populations, and practices may differ from what the models expect. In comparison, DISSM's performance is much more stable and still achieves very good results. As shown in Figure 14, it also performs well on the larynx dataset. [2]
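For reference, DSC is read here as the Dice similarity coefficient, the usual overlap measure between a predicted and a reference segmentation mask. The sketch below computes it for two toy 2D masks; the function name and the convention for two empty masks are assumptions, and ASSD is not shown.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient of two binary masks: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom

pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True      # 4 foreground pixels
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:4] = True    # 6 foreground pixels, 4 of them shared with pred
score = dice_coefficient(pred, target)  # 2 * 4 / (4 + 6) = 0.8
```

A lower spread of this score across test cases is what the box plot comparison refers to as better robustness.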

4.4. Other variations of SSMs

4.4.1. TL-DeepSSMs

Regressing PCA scores is enough to capture the linear shape variations in the population, but it fails to capture subtle non-linear changes that might be of interest. Although the data augmentation relies on the PCA space, the correspondences may exhibit non-linear variations, so a shape descriptor based on PCA scores may be too limited to describe the anatomical variation.

To address this, a variant architecture of DeepSSM was proposed that does not rely on PCA scores as the shape descriptor. It takes inspiration from the TL network architecture, which was applied to 2D-to-3D reconstruction, and uses this idea to find a low-dimensional latent representation that is shared by the PDM representation and its corresponding raw image. (Figure 16)

Experiment results. As shown in Figure 17, the shape models compared are the state-of-the-art PDM from ShapeWorks, Base-DeepSSM, and TL-DeepSSM trained without focal loss. The evaluation covers the estimated severity scores: the correlation and AUC of the severity measure obtained with each shape model are reported with respect to the expert ratings, and higher numbers are better. [1]

4.4.2. Deep structural causal shape models (DeepSCSMs)

The causal modeling of non-Euclidean structures is a problem that machine learning research has yet to solve. To address it, a deep structural causal shape model (CSM) of the data generation process of 3D meshes was developed using geometric deep learning components. It targets questions such as "How would this patient's brain structure change if they were ten years older?" Figure 18 shows the corresponding computation graph (left), graphical model (bottom-left), and structural causal model (right). The variables are the brain stem mesh (x), age (a), sex (s), total brain volume (b), and brain stem volume (v). These variables are then used for estimation. (Figure 18)
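The counterfactual question can be made concrete with a toy structural causal model. The sketch below is not the model from [3]: it drops the mesh entirely, uses scalar volumes only, and its linear mechanisms, coefficients, and causal graph are invented for illustration. It only shows the three standard steps of counterfactual inference (abduction, action, prediction).

```python
from dataclasses import dataclass

@dataclass
class Subject:
    age: float
    sex: int
    brain_volume: float
    stem_volume: float

# Hypothetical linear mechanisms; the coefficients are made up for this sketch.
def brain_mechanism(age, sex, noise):
    return 1200.0 - 2.0 * age + 100.0 * sex + noise

def stem_mechanism(age, brain_volume, noise):
    return 0.02 * brain_volume - 0.05 * age + noise

def counterfactual_age(subject, new_age):
    """Answer 'what if this subject had a different age?' in the toy model."""
    # 1. Abduction: recover the subject-specific noise terms from the observation.
    u_b = subject.brain_volume - brain_mechanism(subject.age, subject.sex, 0.0)
    u_v = subject.stem_volume - stem_mechanism(subject.age, subject.brain_volume, 0.0)
    # 2. Action: intervene on age, do(age = new_age).
    # 3. Prediction: push the same noise through the modified model.
    b_cf = brain_mechanism(new_age, subject.sex, u_b)
    v_cf = stem_mechanism(new_age, b_cf, u_v)
    return Subject(new_age, subject.sex, b_cf, v_cf)

patient = Subject(age=60.0, sex=1, brain_volume=1190.0, stem_volume=21.5)
older = counterfactual_age(patient, new_age=70.0)
```

Keeping the noise terms fixed is what preserves subject-specific traits: the counterfactual describes the same patient at a different age, not an average 70-year-old.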

5. Conclusion and comparison

	DeepSSM: A blueprint for image-to-shape deep learning models [1]	Deep implicit statistical shape models for 3d medical image delineation [2]	Deep structural causal shape models [3]
Models	•DeepSSMs •State-of-the-art PDM models •TL-DeepSSMs	•DeepISSMs	•DeepSCSMs
Basic framework	•DeepSSM framework •CNN-based architecture •TL network architecture	•DeepSSM framework •CNN-based architecture •Marginal space learning •Agent-based formula	•DSCM framework
Datasets	•CT scans of 10-15 months children with metopic craniosynostosis •206 LGE MRI images from patients who have AF	•Liver dataset •Larynx dataset	•Brain stem meshes from the UK Biobank
Comparison with other models?	Y	N	N
Evaluate on different datasets?	N	Y	N
Which model performs best?	TL-DeepSSMs	DeepISSMs	N/A
Strengths	+ DeepSSM can be used for downstream applications without the need for resource-heavy processing. + The correspondence model provided by DeepSSM and its variants is comparable in performance with state-of-the-art PDM, if not better in some scenarios.	+ DeepISSM is the first to integrate a true SSM with deep learning technology. + Cross-dataset evaluations on pathological liver segmentation demonstrate that DISSM outperforms leading FCNs.	+ Can generate novel counterfactual meshes that were robust to out-of-distribution interventions, preserved subject-specific traits, and captured the full range of shapes in the true data distribution.
Limitations	- Lack of different types of experiment datasets.	- Did not compare with other methods.	- Did not compare with other methods. - Lack of different types of experiment datasets.

6. References

[1] Bhalodia, Riddhish, et al. "DeepSSM: A blueprint for image-to-shape deep learning models." Medical Image Analysis (2023): 103034.

[2] Raju, Ashwin, et al. "Deep implicit statistical shape models for 3D medical image delineation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 2. 2022.

[3] Rasal, Rajat, et al. "Deep structural causal shape models." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.

[4] Grassi, Lorenzo, Sami P. Väänänen, and Hanna Isaksson. "Statistical shape and appearance models: Development towards improved osteoporosis care." Current Osteoporosis Reports 19 (2021): 676-687.

[5] Nair, Prathap M., and Andrea Cavallaro. "3-D Face Detection, Landmark Localization, and Registration Using a Point Distribution Model." IEEE Transactions on Multimedia 11 (2009): 611-623.

[6] Face Alignment Models (Face Image Modeling and Representation) (Face Recognition) Part 3 (what-when-how.com).

[7] Zheng, Yefeng, et al. "Four-chamber heart modeling and automatic segmentation for 3-D cardiac CT volumes using marginal space learning and steerable features." IEEE Transactions on Medical Imaging 27.11 (2008): 1668-1681.

[8] DeepSSM: A Deep Learning Framework for Statistical Shape Modeling from Raw Images - PMC (nih.gov)

[9] Adams, Jadie, et al. "Learning spatiotemporal statistical shape models for non-linear dynamic anatomies." Frontiers in Bioengineering and Biotechnology 11 (2023): 1086234.
