Abstract

In this blog post, we introduce the topic of sensorless ultrasound (US) compounding. We explain the problem statement and discuss current procedures and methods for 3D ultrasound reconstruction. We then show how deep-learning-based methods are improving the process of reconstructing 3D ultrasound volumes from 2D ultrasound images. For this purpose, two papers have been selected and reviewed thoroughly; each highlights a different deep learning approach to 3D US reconstruction by compounding of 2D US images. Finally, we compare the two approaches and assess how well the papers are written.

Author: Devansh Sharma 

Tutor: Mohammad Farid Azampour 


1. Introduction

1.1 2D Ultrasound Imaging

2D ultrasound imaging is a widely used medical technique that provides valuable insights into the human body. It utilizes high-frequency sound waves to generate real-time, two-dimensional images of organs, tissues, and developing fetuses. The procedure involves a transducer that emits sound waves, which then bounce back and are detected to create images. This non-invasive and safe imaging modality is particularly useful in obstetrics for monitoring fetal development, identifying potential abnormalities, and determining the gender of the baby. It also aids in diagnosing various conditions affecting internal organs, such as gallstones or kidney stones. 2D ultrasound imaging continues to play a crucial role in modern healthcare, providing valuable diagnostic information with minimal risk to patients. 


 

                            Figure 1. Fetal Ultrasound [1]                              


1.2 3D Ultrasound Imaging

In recent years, the field of medical imaging has witnessed significant advancements, one of which is the development of 3D ultrasound image reconstruction methods. These techniques utilize a series of 2D ultrasound images to create a three-dimensional representation of the scanned area. By integrating multiple 2D images captured from different angles, the reconstructed 3D image provides a more comprehensive and detailed view of the internal structures.

To perform 3D ultrasound image reconstruction, specialized devices are used in practice. These devices consist of a transducer, which emits and receives ultrasound waves, and a computer system that processes the collected data. The transducer is maneuvered over the patient's body to capture a sequence of 2D images, covering the desired area from multiple perspectives. The computer then employs sophisticated algorithms to align and merge these images, constructing a coherent 3D representation.

This technology offers numerous benefits in various medical fields. In obstetrics, it enables healthcare professionals to visualize the fetus in three dimensions, aiding in the diagnosis of congenital anomalies and assisting in surgical planning. In cardiology, 3D ultrasound helps assess heart function and detect abnormalities. Additionally, it finds applications in abdominal imaging, urology, and musculoskeletal examinations.

The availability of 3D ultrasound image reconstruction has revolutionized diagnostic capabilities, providing clinicians with a powerful tool for improved visualization and accurate assessment of anatomical structures.


Figure 2. 3D Ultrasound Imaging setup for spine imaging [2]


1.3 Problem Statement 

Current 3D ultrasound image reconstruction methods face several significant challenges that hinder their widespread adoption and effectiveness. One major concern is the high cost associated with external tracking approaches, making them financially prohibitive for many healthcare facilities. Moreover, these methods often have limited use cases, restricting their applicability in various medical scenarios. Additionally, the sensors utilized in external tracking systems are prone to errors, leading to inaccuracies in the reconstructed 3D images.

Optical tracking methods encounter occlusion problems, where certain areas of the scanned object are obscured from the tracking system's view, resulting in incomplete reconstructions. Electromagnetic tracking devices, on the other hand, suffer from interference issues, affecting the reliability and precision of the reconstructed images.

Conventional methods like Voxel-based Nearest Neighbour (VNN) lack the necessary efficiency and accuracy required for precise 3D image reconstructions. Even image processing methods, while promising, still need improvement to overcome various artifacts and limitations inherent in current techniques. Addressing these challenges is crucial to enhance the overall performance and applicability of 3D ultrasound image reconstruction, allowing for better diagnostic capabilities and more informed medical decision-making.


1.4 Deep Learning Solutions

Deep learning methods have made significant contributions to improving the performance and workflow of 3D ultrasound image reconstruction. One notable example is DCL-Net (Deep Contextual Learning Network), which uses deep 3D convolutional neural networks (CNNs) to estimate the relative motion between consecutive frames directly from ultrasound video segments. DCL-Net has demonstrated promising results by reconstructing volumetric images from 2D ultrasound slices without external tracking: it extracts meaningful spatio-temporal features from an input video segment and regresses the probe motion needed to place the slices in 3D space.

Another noteworthy advancement in this field is the use of NeRF (Neural Radiance Fields) based methods. NeRF is a novel approach that represents the volumetric scene as a continuous function, allowing high-quality 3D reconstruction from sparse and unstructured data. By fitting a deep neural network to the 2D ultrasound images of a scan and their poses, NeRF-based methods can estimate the intensity and geometry of the scanned volume, enabling the generation of detailed and coherent 3D reconstructions.

These deep learning-based solutions have improved the overall performance of 3D ultrasound image reconstruction by addressing limitations such as artifacts, inaccuracies, and the need for manual intervention. They offer the potential for more accurate, efficient, and automated reconstruction processes, enhancing the diagnostic capabilities of ultrasound imaging in various medical applications. Continued research and development in deep learning methods, including DCL-Net and NeRF-based approaches, hold promise for further advancements in 3D ultrasound image reconstruction, leading to improved healthcare outcomes and patient care.

Figure 3. An overview of the proposed DCL-Net, which takes one video segment as input
volume and gives the mean motion vector as the output [3]


2. Methodology and Evaluation

We now look at the two selected papers and describe their methodology in more detail, highlighting the experiments conducted and the final results obtained.

2.1 NeRF based Implicit Representation Method for Freehand 3D Ultrasound Image Reconstruction

Motivations

The motivation of this research paper is to address the limitations and artifacts in 3D ultrasound reconstruction of the carotid artery. The authors highlight that carotid atherosclerosis (CA) is a prevalent condition with significant health implications, including stroke, and ultrasound imaging is commonly used for clinical examination due to its non-invasiveness and cost-effectiveness. However, current ultrasound methods only provide 2D information and rely heavily on the sonographer's experience.

To overcome these limitations, the authors propose the development of a novel 3D ultrasound reconstruction algorithm based on deep learning. The objective is to improve the image quality of the reconstructed volume and reduce artifacts. The authors mention that traditional methods, such as voxel-based nearest neighbor (VNN), have been commonly used but may not provide satisfactory results. 

Method Overview

NeRF - Neural Radiance Fields

The method used is based on NeRF (Neural Radiance Field) which is a cutting-edge computer vision method that reconstructs scenes and synthesizes novel views using deep learning. It models a scene as a continuous function mapping 3D coordinates to radiance values. By training a neural network on image and pose data, NeRF generates highly realistic views from any viewpoint, capturing fine details and lighting effects. While computationally demanding, NeRF's advancements have led to optimizations for real-world applications. This technique has the potential to revolutionize virtual reality, video game design, and other fields reliant on accurate scene reconstruction and rendering.


Figure 4. An Overview of NeRF scene representation [4]
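
As background, the core formulation of the original NeRF shown in Figure 4 can be written as follows. These are the standard equations from [4], not the ultrasound-specific variant reviewed below:

```latex
% An MLP F_\Theta maps a 3D location x and viewing direction d to a color c and density \sigma:
F_\Theta : (\mathbf{x}, \mathbf{d}) \mapsto (\mathbf{c}, \sigma)

% A pixel's color is obtained by volume rendering along its camera ray r(t) = o + t d:
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\Big(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\Big)
```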

Data acquisition and preprocessing

The authors acquired 15 original datasets from a local hospital using a portable freehand 3D ultrasound imaging system. These datasets were obtained from clinical sources, rather than being created by the authors themselves. The datasets consisted of 2D transverse images captured using a 10-MHz linear array transducer and corresponding location information obtained from an electromagnetic tracker. Each dataset contained 208 ± 46 transverse frames.

The authors then performed data acquisition and preprocessing on these 15 datasets as part of their research. The acquired 2D images were input into a segmentation neural network to obtain the semantic probabilistic distribution. Subsequently, the outputs from the neural network, along with the original images, were used for reconstructing the 3D volume. In addition, the authors also reconstructed the obtained data using the voxel-based nearest neighbor (VNN) method for comparison purposes.

Figure 5. Overall pipeline for data acquisition and processing [5]
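
Since VNN serves as the comparison baseline throughout the experiments, a minimal sketch of the idea is shown below, assuming the tracked 3D position of every ultrasound pixel is already known. Real implementations additionally limit the search radius and interpolate holes; the function and argument names here are purely illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def vnn_compound(pixel_values, pixel_positions, volume_shape, voxel_size, origin):
    """Voxel-based nearest neighbour (VNN) compounding: every voxel takes the intensity
    of the spatially nearest ultrasound pixel, given the tracked pose of each 2D frame.

    pixel_values:    (n_frames, H, W) grayscale intensities
    pixel_positions: (n_frames, H, W, 3) world coordinates of each pixel (from the tracker)
    """
    tree = cKDTree(pixel_positions.reshape(-1, 3))
    # World coordinates of every voxel center of the target volume
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in volume_shape], indexing="ij"), axis=-1)
    centers = origin + (grid + 0.5) * voxel_size
    _, nearest = tree.query(centers.reshape(-1, 3))
    return pixel_values.reshape(-1)[nearest].reshape(volume_shape)
```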


Network architecture and training process 

The method proposed in this research paper involves a deep learning architecture for reconstructing image volumes based on the neural radiance field (NeRF) approach. The objective is to jointly encode volume intensity and semantic features for improved reconstruction quality.

The neural network architecture is designed to map a 3D coordinate x to an output y, which consists of a semantic vector s and a volume intensity i. The mapping function is written as y = F_\theta(x), where F_\theta is the learned neural network with weights \theta. Position encoding (PE) is used to map each 3D coordinate x to a higher-dimensional space, preserving high-frequency information of the volume intensity.
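
The position encoding is typically the sinusoidal mapping used in NeRF [4], applied to each coordinate of x separately; the number of frequency bands L is a hyperparameter that is not specified here:

```latex
\gamma(p) = \big(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\big)
```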

The network architecture is based on a multi-layer perceptron (MLP) from NeRF. The front layers are shared since semantic information and volume intensity are considered correlated. The width of the network is reduced before the outputs to separately output the semantic distribution and volume intensity. The SIREN activation function is applied to better represent the high-frequency domain. Connecting PE to the middle layer of the MLP aims to improve the quality of the reconstructed volume.

The entire network is trained from scratch using a volume intensity loss L_i and a semantic loss L_s. The volume intensity loss measures the difference between the predicted intensity and the ground truth, while the semantic loss measures the discrepancy between the predicted and ground-truth semantic probabilities. The training loss combines both terms, so the final loss is L = L_i + L_s.
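
A minimal PyTorch sketch of such an implicit network is given below, purely to illustrate the idea. The layer widths, number of frequency bands, SIREN frequency, and the use of cross-entropy for the semantic loss L_s are assumptions rather than the authors' exact configuration, and the skip connection of the position encoding into a middle layer described above is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionalEncoding(nn.Module):
    """NeRF-style PE: map each 3D coordinate to sinusoids of increasing frequency."""
    def __init__(self, num_bands=6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_bands) * torch.pi)

    def forward(self, x):                         # x: (B, 3)
        proj = x[..., None] * self.freqs          # (B, 3, num_bands)
        return torch.cat([proj.sin(), proj.cos()], dim=-1).flatten(-2)   # (B, 6 * num_bands)

class Sine(nn.Module):
    """SIREN-style activation: a sine with frequency scaling w0."""
    def __init__(self, w0=30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * x)

class ImplicitUSNet(nn.Module):
    """Shared MLP trunk with two heads: volume intensity i and semantic probabilities s."""
    def __init__(self, num_classes=3, width=256, pe_bands=6):
        super().__init__()
        self.pe = PositionalEncoding(pe_bands)
        self.shared = nn.Sequential(
            nn.Linear(6 * pe_bands, width), Sine(),
            nn.Linear(width, width), Sine(),
            nn.Linear(width, width), Sine(),
        )
        self.intensity_head = nn.Sequential(nn.Linear(width, 64), Sine(), nn.Linear(64, 1))
        self.semantic_head = nn.Sequential(nn.Linear(width, 64), Sine(), nn.Linear(64, num_classes))

    def forward(self, x):                         # x: (B, 3) sampled 3D locations
        h = self.shared(self.pe(x))
        return self.intensity_head(h), self.semantic_head(h)   # (B, 1) intensity, (B, C) logits

def total_loss(pred_i, gt_i, pred_s, gt_class):
    """L = L_i + L_s: MSE on intensity plus (assumed) cross-entropy on semantic labels."""
    return F.mse_loss(pred_i, gt_i) + F.cross_entropy(pred_s, gt_class)
```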


Figure 6. The network architecture. Locations are fed into the network after position encoding (PE). Volume intensity (i) and semantic probability (s) are functions of the 3D location. [5]


Experiments and Conclusions

Evaluation metrics

The evaluation metrics used in this paper to assess the image quality of the reconstructed volume are discontinuity, curvature, and distortion. These metrics are calculated on the transverse frames of the volume.

  1. Discontinuity: This metric characterizes break points in the region of interest (ROI). If the voxels from the vessel wall in a frame are not connected, that frame is marked as discontinuous.

  2. Curvature: Curvature is calculated from the outer edge of the vessel walls on non-discontinuous frames. The curvature at a voxel is approximated from its nearest adjacent voxels. For a voxel V_{out}, the adjacent voxels are defined as V_{adj} \in \{ V_{out-N}, V_{out-N+1}, \dots, V_{out+N} \}. The curvature at V_{out} is then defined as Curvature(V_{out}) = median(\{ f_c(V_{out-1}, V_{out}, V_{out+1}), \dots, f_c(V_{out-N}, V_{out}, V_{out+N}) \}).
    The smoothness of the boundary is then characterized by the variance of the curvatures along the edge; a smaller variance indicates a smoother boundary shape.

  3. Distortion: Distortion is calculated based on curvature. A predetermined threshold C_{threshold} is used: if the curvature of a voxel exceeds this threshold, it is counted as a distortion point. In this study, C_{threshold} was set to 0.4. More distortion points indicate a more irregular shape of the vessel boundary.

To evaluate the image quality of the reconstructed volume, these metrics provide quantitative measures related to the continuity, smoothness, and regularity of the vessel wall.
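
To make the metrics concrete, the following Python sketch shows how they could be computed for a single transverse frame, assuming the segmented vessel-wall mask and an ordered list of outer-edge points are already available. The neighbourhood size N, the discrete curvature function f_c (a circumscribed-circle estimate here), and all variable names are assumptions for illustration; only the threshold of 0.4 comes from the paper.

```python
import numpy as np
from scipy import ndimage

def is_discontinuous(wall_mask):
    """A frame is marked as discontinuous if its vessel-wall voxels do not form one connected region."""
    _, num_components = ndimage.label(wall_mask)
    return num_components > 1

def point_curvature(p_prev, p, p_next):
    """Discrete curvature estimate f_c: inverse radius of the circle through three edge points."""
    a = np.linalg.norm(p - p_prev)
    b = np.linalg.norm(p_next - p)
    c = np.linalg.norm(p_next - p_prev)
    if a * b * c == 0:
        return 0.0
    # Twice the signed triangle area via the 2D cross product
    area = 0.5 * abs((p[0] - p_prev[0]) * (p_next[1] - p_prev[1])
                     - (p[1] - p_prev[1]) * (p_next[0] - p_prev[0]))
    return 4.0 * area / (a * b * c)

def edge_curvatures(edge_points, N=5):
    """Curvature at each outer-edge voxel: median of f_c over neighbour pairs at offsets 1..N."""
    M = len(edge_points)
    curvatures = []
    for i in range(M):
        vals = [point_curvature(edge_points[(i - k) % M], edge_points[i], edge_points[(i + k) % M])
                for k in range(1, N + 1)]
        curvatures.append(np.median(vals))
    return np.array(curvatures)

def frame_metrics(wall_mask, edge_points, c_threshold=0.4):
    curvs = edge_curvatures(np.asarray(edge_points, dtype=float))
    return {
        "discontinuous": is_discontinuous(wall_mask),
        "curvature_variance": float(np.var(curvs)),      # smaller variance = smoother boundary
        "distortion_points": int(np.sum(curvs > c_threshold)),
    }
```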


Results

Having defined the evaluation metrics, let us look at the results of the experiments. We start with visual comparisons that show the improvement obtained with the proposed method over the standard VNN baseline on the authors' dataset.

Figure 7 provides a visual comparison between the conventional VNN method and the proposed method. In the example image, a noticeable artifact is observed on the outer vessel wall in the VNN volume, as indicated by the white box. The new method rectifies this discontinuity at the same position in the frame, resulting in a more coherent image. The green and red parts of the image are the segmented vessel wall area and lumen area, respectively. The white box marks the position of the discontinuity and its corresponding position in the image from the proposed method.

Figure 7. Illustration of discontinuity rectification: a) the image with discontinuity from the VNN method; and b) the improved image from the new method. [5]

Figure 8 demonstrates an example of distortion in the outer edge of the vessel wall. The top edge appears rugged with the VNN method, whereas the new approach noticeably smoothens the bumpy contour. The green and red parts of the image are the segmented vessel wall area and lumen area, respectively. The white box marks the position of the artifact and its corresponding position in the image from the new method.

Figure 8. Illustration of distortion rectification: a) image from the VNN method; and b) image from the new method. [5]

Figure 9 showcases an example of a reconstructed 3D image. The green area represents the reconstructed tunica externa, and the red area indicates the lumen. Notably, the proposed method yields a smoother reconstructed surface compared to the VNN method.

Figure 9. Illustration of 3D reconstructed volume: a) volume from the VNN method; and b) volume from the new method [5]

Furthermore, Figure 10 compares the number of distortion points for one dataset and shows that the new method reduces distortion in the majority of frames.

Table I [5] presents statistical results across all fifteen subjects, categorizing the cases as improved, unchanged, or deteriorated based on discontinuity, distortion, and variation of curvature. Notably, almost two-thirds of the cases demonstrate improvement with the new method. The approach particularly excels in reducing discontinuity and curvature variation, more so than in reducing distortion.

Table II [5] provides the quantitative differences for all improved cases. It shows that the method consistently reduces artifacts by more than 30% across all evaluated categories.




In Table I we can see that there were 5 deteriorated cases in the experiments, caused by an increased number of distortion points. Training the multi-layer perceptron (MLP) on semantic probabilities with high entropy (i.e., uncertain segmentations) may result in irregular boundaries and more distortion points than the conventional VNN method. Another contributing factor is a large gap between original image frames, which prevents the MLP from filling the gap with appropriate semantic distributions, leading to additional distortions. To address these deteriorated cases, an alternative approach could be to use the semantic probability distribution to construct graphs and apply a graph-cut algorithm to refine the segmentation edge boundaries.

Figure 10. Comparison of distortion number from one dataset between VNN and the new method. [5]

In conclusion, the experiments demonstrate the superior performance of the proposed method in reducing artifacts such as discontinuity and distortion, resulting in smoother and more accurate vessel structures; this has direct implications for the diagnosis of carotid atherosclerosis (CA). However, a few cases deteriorated due to uncertainties in the semantic segmentation and large gaps between image frames. Future work may involve refining the segmentation edges with graph-cut algorithms and validating the diagnostic results on additional and larger clinical datasets. Overall, this research presents a promising approach for enhancing 3D ultrasound reconstruction of the carotid artery.


2.2 Spatial Position Estimation Method for 3D Ultrasound Reconstruction Based on Hybrid Transformers

Motivations

The motivation behind this new transformer-based method is to overcome the limitations and challenges of existing 3D US reconstruction methods, such as:

  • The high cost, occlusion and interference problems of external tracking systems and markers
  • The edge wave effect and low accuracy of speckle-based pose regression methods
  • The lack of global spatial and positional information in CNN-based methods
  • The cumulative error of ultrasound images in the reconstruction process

The authors aim to propose a novel hybrid transformer structure that can combine the local information extraction and the long-range information to improve the efficiency and accuracy of 3D US reconstruction.

Method Overview

Dataset preparation

In this paper, the researchers used two datasets: the human forearm dataset and a clinical dataset. The data collection was conducted with the approval of the local Institutional Review Board (IRB), ensuring ethical considerations.

Ultrasound images were acquired using a Mindray DC 6E Ⅱ ultrasound equipment with B-mode, capturing images at a rate of 60 frames per second. To determine the real spatial poses of the ultrasound probe during training, an optical marker and an NDI Polaris Vicra system were employed. The probes were calibrated in terms of spatial and temporal aspects using the Plus Toolkit.

The forearm dataset consisted of 100 cases, where the ultrasound probe had a movement range of 10-20 cm. On the other hand, the clinical dataset included 160 cases with a probe movement range of 15-25 cm. The clinical dataset had variable models and targets for data collection.

Each case in both datasets comprised 200 frames, with each frame having a size of 224×224 pixels. The researchers split the dataset into training, validation, and testing sets in a ratio of 3:1:1, respectively.


Method implementation

Figure 11. An overview of the proposed structure, with a continuous ultrasound image frame sequence as input [6]

The authors propose a novel deep learning method for 3D ultrasound (US) reconstruction from 2D image sequences. The method consists of five main components, described below; a rough sketch of how they fit together follows the list.

  • Local feature extraction and enhancement: The first step extracts local information within the US image, such as the shape and texture of the tissues. A convolutional neural network (CNN) backbone extracts latent features from each US image, and these are concatenated into a feature sequence. A difference map between adjacent images is also used to enhance the local information: it shows how the image content changes over time, which helps to track the movement of the probe and the tissues. The image features are embedded into feature patches as Z_0 = B(S) + B(I_{diff}) = B([I_1; I_2; \dots; I_N]) + B(I_{diff}), where B is the CNN backbone and S is the image sequence with frames I_i \in \mathbf{R}^{W \times H \times C}, with W, H and C denoting the width, height and number of channels of the image.


  • Temporal transformer with IMU embedding: The second step captures global information across the US image sequence, such as the spatial and temporal relationships between different frames. A transformer structure is used to capture global dependencies among the feature sequence; the transformer is a type of neural network that learns to attend to different parts of the input according to their relevance. The method also incorporates the orientation information from an inertial measurement unit (IMU) as prior information when embedding the features. The IMU measures the rotation and acceleration of the probe, which provides relative positional information for the US images. Combining the positional embedding with the features gives Z_1 = Z_0 + E_{pos}, with Z_1 \in \mathbf{R}^{d \times N}. The IMU readings R are then embedded through an MLP, E_{IMU} = MLP(R) with E_{IMU} \in \mathbf{R}^{d \times N}, and added to give Z_2 = Z_1 + E_{IMU}.


  • Multi-head self-attention (MSA): The third step models the location information from different subspaces using multiple attention heads. The attention heads are parts of the transformer that can focus on different aspects of the input, such as shape, texture and motion, and their outputs are combined into a final representation, allowing the method to capture more diverse and comprehensive information from the US image sequence. The transformer encoder has L layers; in each layer, MSA is applied together with Layer Normalization (LN). Given the embedded feature sequence Z_2 from the previous step as the encoder input (denoted Z_0 in the equations below), the structure can be written as:
    Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1}, \quad l = 1, 2, \dots, L
    Z_l = MLP(LN(Z'_l)) + Z'_l, \quad l = 1, 2, \dots, L
    Y = LN(Z_L)

           

  • Regression head: The fourth step predicts the relative pose between adjacent images in the sequence, which is essential for 3D US reconstruction. The relative pose is a 6-degree-of-freedom (DoF) vector describing the translation and rotation between two images. A regression layer maps the output of the transformer encoder, Y \in \mathbf{R}^{d \times N}, to the predicted pose vector in \mathbf{R}^{6}.


  • Loss function: The final step measures the error between the predicted relative pose and the ground truth, i.e. the actual relative pose between two images obtained from the external tracking system during training. The method uses the mean squared error (MSE) as the loss function, which averages the squared differences between the predicted and ground-truth pose parameters.
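
Below is a rough PyTorch sketch of how these components could be wired together, intended only to illustrate the data flow described above. The backbone choice (ResNet-18), the feature dimension, the number of encoder layers and heads, the 3-element IMU orientation input, and the decision to regress one mean 6-DoF pose per sequence are all assumptions, since the paper's code is not public; the class name is a placeholder.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class USFormerSketch(nn.Module):
    """CNN backbone -> positional + IMU embeddings -> transformer encoder -> 6-DoF regression head."""
    def __init__(self, seq_len=200, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)  # grayscale US frames
        backbone.fc = nn.Linear(backbone.fc.in_features, d_model)
        self.backbone = backbone                                          # B(.)
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))   # E_pos
        self.imu_mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, d_model))  # E_IMU = MLP(R)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)             # L layers of LN + MSA + MLP
        self.regression_head = nn.Linear(d_model, 6)                      # 6-DoF relative pose

    def forward(self, images, diff_maps, imu_rot):
        # images, diff_maps: (batch, seq_len, 1, 224, 224); imu_rot: (batch, seq_len, 3) orientations
        # (difference maps are assumed padded to the same length as the image sequence)
        b, n = images.shape[:2]
        feat_img = self.backbone(images.flatten(0, 1)).view(b, n, -1)
        feat_dif = self.backbone(diff_maps.flatten(0, 1)).view(b, n, -1)
        z0 = feat_img + feat_dif                                          # Z_0 = B(S) + B(I_diff)
        z2 = z0 + self.pos_embed[:, :n] + self.imu_mlp(imu_rot)           # Z_1 = Z_0 + E_pos; Z_2 = Z_1 + E_IMU
        y = self.encoder(z2)                                              # Y: transformer output
        return self.regression_head(y.mean(dim=1))                        # mean relative 6-DoF pose

# Training would minimize the MSE between predicted and tracked relative poses:
# loss = nn.functional.mse_loss(model(images, diff_maps, imu_rot), gt_relative_pose)
```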


Experiments and Conclusions


To quantitatively compare their results with previous methods, the authors employed two commonly used evaluation metrics: final drift error and average distance error. The final drift error measures the distance between the center point of the final frame as placed by the ground-truth (labeled) poses and as placed by the predicted poses; it reflects the inaccuracies accumulated over the reconstruction process. The average distance error calculates the distance between corresponding corner points in each pair of frames throughout the entire sequence, providing insight into the changes in orientation within the sequence. Together, these indicators allow the accuracy and consistency of the method to be assessed against existing approaches.
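
As a rough illustration, the two metrics could be computed from the predicted and ground-truth frame poses as in the sketch below; the exact center- and corner-point conventions of [6] may differ, and all names are placeholders.

```python
import numpy as np

def transform_points(pose, points):
    """Apply a 4x4 homogeneous pose matrix to an (M, 3) array of image-plane points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (pose @ homogeneous.T).T[:, :3]

def final_drift(pred_poses, gt_poses, center_point):
    """Distance between the center of the last frame under the predicted vs. ground-truth trajectory."""
    p_pred = transform_points(pred_poses[-1], center_point[None])[0]
    p_gt = transform_points(gt_poses[-1], center_point[None])[0]
    return float(np.linalg.norm(p_pred - p_gt))

def average_distance_error(pred_poses, gt_poses, corner_points):
    """Mean distance between corresponding corner points over every frame of the sequence."""
    errors = [np.linalg.norm(transform_points(p, corner_points) - transform_points(g, corner_points), axis=1).mean()
              for p, g in zip(pred_poses, gt_poses)]
    return float(np.mean(errors))
```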


Figure 12. Results of the comparison and ablation experiments on the forearm and clinical datasets. [6]

The results of the comparison and ablation experiments are summarized in Figure 12. On the forearm dataset, the proposed method was more effective than the variants without the transformer encoder, supporting the hypothesis that the transformer encoder captures global information across long distances in ultrasound image sequences. In terms of average distance error, the method outperformed the ResNet+ViT approach without IMU embedding, indicating that incorporating IMU information as prior pose knowledge effectively complements the transformer. Compared to the ViT model, the proposed method achieved superior results in both the maximum- and minimum-error cases thanks to its ability to extract local information, and it exhibited a 4.58% improvement and a 0.76 mm reduction in final drift compared to ViT with direct image input, highlighting the effectiveness of combining local and global information.

Regarding the final drift results, the transformer encoder achieved a significant 25.84% improvement and a 5.52 mm reduction compared to the structure with CNN-I-O, and it also outperformed the DCL-Net model with the same attention blocks, achieving a 6.49% improvement and a 1.10 mm reduction. On the more challenging clinical dataset, the transformer-based model showed notable reductions in cumulative errors compared to models relying solely on local information extraction, and including the IMU information further reduced the average distance errors. However, using difference maps to enhance the sequence information did not have a significant impact on the results.

Figure 13. Results of feature map visualization of ultrasound images at the sequence scale [6]


To help understand the role of the transformer for ultrasound (US) images, attention maps based on Grad-CAM [7] were used for visualization. Unlike traditional methods that focus on local image information, the transformer captures long-range relationships within image sequences. To visualize this, the authors generated attention maps and overlaid them on the difference maps, which serve as a representation of the features at the sequence scale.

In Figure 13, the visualization results are presented. The highlighted regions of the attention maps correspond to the changed regions of the difference maps, indicating that the transformer learned a meaningful relationship between the images within the sequence.

The significance of the transformer in this context becomes apparent through these visualizations. It enables the model to capture and analyze global information spanning multiple frames, allowing it to understand the spatial and temporal context of the US image sequence. By leveraging this long-range relationship modeling, the transformer empowers our method to make accurate predictions and reduce cumulative errors.

 

Table III. Comparison of the final drift obtained by different methods on image sequences of different lengths.


Figure 14. Comparison of the final drift obtained by different methods on image sequences of different lengths.

Figure 15. Comparison of the reconstruction results.


To further demonstrate the advantages of the proposed method in capturing long-range information, the authors evaluated it on ultrasound (US) image sequences of varying lengths. The comparison results are presented in Figure 14. For sequences with a length of 50 frames, all compared methods showed similar final drift errors. However, as the sequence length increased, USFormer gradually outperformed the CNN-based methods, with significant improvements in cases where the sequence length exceeded 100 frames. This indicates that the ability to capture global information effectively reduces cumulative errors in the reconstruction process. For the quantitative results, see Table III.

To qualitatively showcase the reconstruction performance, the authors selected four test cases of varying quality: the best, the worst, and two medium-quality cases reconstructed with USFormer, compared against CNN, DCL-Net, and the ground truth. Figure 15 illustrates these results. The proposed method outperformed the other approaches in terms of reducing final drift. However, even with the best method and the gold standard (ground truth), there were significant errors when reconstructing complex data from clinical scenarios. This may be attributed to the probe not moving strictly in a single direction during data collection, as round-trip movements can introduce additional errors.

In conclusion, USFormer demonstrated superior reconstruction performance, particularly in capturing long-range information, compared to the CNN-based approaches. However, it is crucial to acknowledge the inherent challenges and limitations when dealing with complex clinical data, where various factors can contribute to errors in the reconstruction process. The paper proposes a method that combines local and global information within the image and further embeds IMU information to improve the accuracy of 3D ultrasound reconstruction. The method reduces the drift error in the reconstruction process compared to state-of-the-art methods. However, the paper’s dataset is limited in type and number. In future work, the authors plan to pay more attention to complex reconstruction in different US imaging targets and scanning procedures.

3. Comparison

Comparison between the two methods

| NeRF based method | Hybrid Transformer method |
| --- | --- |
| Focuses on the carotid artery and carotid atherosclerosis (CA) | More generalized approach |
| Utilizes NeRF, with a simple and effective approach | Utilizes transformers very well |
| Interesting evaluation metrics | Standard metrics and loss function |
| Improves over current practice and provides improvement ideas for the future | Improves on previous deep learning and attention-based architectures |
| Utilizes semantic segmentation information and clever data preprocessing | Utilizes a CNN backbone first, then IMU and positional embeddings as input to the transformer encoder |

Strengths

| NeRF based method | Hybrid Transformer method |
| --- | --- |
| Concise writing | Clear explanation of method and experiments |
| More focused use case | More general approach |
| Practical and visual 3D results | Comparison with multiple methods |
| Attention to detail and novel architecture | Novel and complex architecture with clear structure |

Shortcomings


| NeRF based method | Hybrid Transformer method |
| --- | --- |
| Fewer data acquisition details, smaller dataset | Images and result visualizations a bit small and crowded |
| Only one method to compare against, i.e. VNN | Significant error on clinical data |
| Some implementation and mathematical details are difficult to understand | Dataset limited in type and number |
| No public data or code available | No public data or code available |

4. Review

During our discussion, we explored two distinct approaches to 3D ultrasound (US) reconstruction, which leverage advanced deep learning and computer vision techniques. These methods offer significant advantages by eliminating the requirement for precise and costly sensors typically used in traditional approaches. Notably, both methods demonstrated superior performance compared to the current state-of-the-art techniques in this field.

The first paper we examined showcased the potential of the proposed method to facilitate more accurate diagnoses of various medical conditions, particularly carotid atherosclerosis (CA). By harnessing deep learning and computer vision algorithms, this approach exhibited promising results, opening up avenues for future research with larger and more diverse clinical datasets. The utilization of this method holds great promise in enhancing the accuracy and reliability of CA diagnoses, potentially leading to improved patient outcomes.

In the second paper, we delved into a technique that effectively addressed the issue of drift error encountered in the reconstruction process. This novel approach, incorporating deep learning and computer vision methodologies, demonstrated significant success in minimizing or eliminating the drift error. Moreover, this breakthrough paves the way for future investigations exploring different ultrasound imaging targets and scanning procedures. By mitigating the impact of drift error, this method enhances the overall quality and precision of 3D ultrasound reconstructions, offering researchers and clinicians greater confidence in the acquired data.

Overall, these two research papers highlight the immense potential of deep learning and computer vision in revolutionizing 3D ultrasound reconstruction. These approaches not only alleviate the need for expensive sensors but also surpass the performance of existing state-of-the-art methods. Furthermore, they open up exciting prospects for future research, expanding the application of these techniques to various clinical scenarios and enabling more accurate diagnoses and improved medical interventions.

5. References

[1] https://www.fda.gov/files/ultrasoundpregnancy.jpg

[2] Purnama, Ketut & Wilkinson, Michael & Veldhuizen, Albert & van Ooijen, Peter & Lubbers, Jaap & Burgerhof, Johannes & Sardjono, Tri & Verkerke, Gijbertus (2010). A framework for human spine imaging using a freehand 3D ultrasound system. Technology and Health Care, 18(1), 1-17. doi:10.3233/THC-2010-0565

[3] Guo, H., Xu, S., Wood, B., Yan, P. (2020). Sensorless Freehand 3D Ultrasound Reconstruction via Deep Contextual Learning. In: , et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science(), vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_44 

[4] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis," ECCV 2020.

[5] S. Song, Y. Huang, J. Li, M. Chen and R. Zheng, "Development of Implicit Representation Method for Freehand 3D Ultrasound Image Reconstruction of Carotid Vessel," 2022 IEEE International Ultrasonics Symposium (IUS), Venice, Italy, 2022, pp. 1-4, doi: 10.1109/

[6] G. Ning, H. Liang, L. Zhou, X. Zhang and H. Liao, "Spatial Position Estimation Method for 3D Ultrasound Reconstruction Based on Hybrid Transformers," 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 2022, pp. 1-5, doi: 10.1109/

[7] R. R. Selvaraju, M. Cogswell, A. Das, et al., "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization," International Journal of Computer Vision, vol. 128, no. 2, pp. 336-359, Feb. 2020.
