INTRODUCTION
The thyroid gland is part of the endocrine system, which regulates the release of hormones in the body. It consists of two lobes connected by a band of tissue called the isthmus [7]. The thyroid is subject to two main types of disease: hyperthyroidism and hypothyroidism. Both cause an imbalance in the amount of hormones released into the circulatory system [3]. To help diagnose these diseases, the volume of the gland can be measured regularly to detect pathological anomalies [6]. Figure 1 shows the thyroid in beige.
Figure 1. Anatomy of the thyroid gland and surrounding area.
Source: CFCF from wikipedia.org
Conventionally, 2D ultrasound is used to estimate thyroid volume. The width, length and depth measured from the ultrasound images are used to fit an ellipsoid to each of the thyroid’s lobes and estimate its volume. Ultrasound is preferred over other imaging modalities such as CT or MRI because it is inexpensive and easy to use [7].
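The ellipsoidal estimate described above can be sketched in a few lines. This is a minimal illustration, assuming the three measurements are the full axes of the ellipsoid (so the volume is π/6 times their product) and that the gland volume is taken as the sum of the two lobes. The function names and the sample measurements are hypothetical and do not come from the paper.

```python
import math


def ellipsoid_lobe_volume_ml(width_cm, length_cm, depth_cm):
    """Volume of an ellipsoid whose full axes are the three measurements.

    With semi-axes a, b, c the volume is 4/3 * pi * a * b * c, which equals
    pi/6 * width * length * depth for full axes. 1 cm^3 equals 1 mL.
    """
    return math.pi / 6.0 * width_cm * length_cm * depth_cm


def thyroid_volume_ml(left_lobe, right_lobe):
    """Sum of both lobe volumes (assumption: the isthmus is ignored)."""
    return ellipsoid_lobe_volume_ml(*left_lobe) + ellipsoid_lobe_volume_ml(*right_lobe)


# Hypothetical (width, length, depth) measurements in cm, for illustration only.
left = (1.5, 4.0, 1.5)
right = (1.6, 4.2, 1.6)
print(f"Estimated volume: {thyroid_volume_ml(left, right):.2f} mL")
```

Because the model depends on only three measurements per lobe, any deviation of the real lobe from an ellipsoid goes straight into the volume error.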
Another method uses 3D ultrasound (3DUS). It consists of recording a stack of 2D ultrasound images at various positions, segmenting them, performing a 3D reconstruction, and calculating the volume. With manual tracing for the segmentation, this method has been shown to perform as well as CT or MRI. Manual segmentation has also been shown to give more accurate volume estimates than the ellipsoidal model, for both normal and abnormal thyroid shapes [7].
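The last step of the 3DUS pipeline, going from segmented slices to a volume, can be sketched by counting segmented voxels. This is a simplified illustration that assumes parallel, evenly spaced slices with known spacing. Real freehand sweeps are not parallel and need a proper 3D reconstruction first. The function name, the nested-list mask format and the spacing values are assumptions made for the example.

```python
def volume_from_masks_ml(masks, pixel_spacing_mm, slice_spacing_mm):
    """Estimate a volume from a stack of binary segmentation masks.

    masks: list of slices, each a 2D list of 0/1 values (1 = thyroid).
    pixel_spacing_mm: (dx, dy) size of one pixel in millimetres.
    slice_spacing_mm: distance between adjacent slices in millimetres.
    """
    dx, dy = pixel_spacing_mm
    voxel_mm3 = dx * dy * slice_spacing_mm
    voxel_count = sum(sum(row) for mask in masks for row in mask)
    return voxel_count * voxel_mm3 / 1000.0  # 1000 mm^3 = 1 mL


# Two tiny hypothetical slices, for illustration only.
stack = [
    [[0, 1, 1], [0, 1, 0]],
    [[1, 1, 1], [0, 1, 0]],
]
print(volume_from_masks_ml(stack, pixel_spacing_mm=(0.1, 0.1), slice_spacing_mm=0.2))
```

In practice an array library would replace the nested lists. Plain Python is used here only to keep the sketch self-contained.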
However, efficiency matters in a clinical environment. This requires segmentation techniques that are accurate but also easy for doctors to use. Ease of use is characterized by the level of user interaction and the computation time of the segmentation [7].
The authors noticed a gap in comparative work based on practical criteria for thyroid volume estimation with 3DUS methods. They attempted to fill it by comparing three segmentation methods (level set, graph cut, feature-based classifier) under two criteria: segmentation quality and usability. The authors have also made their sixteen ground truth 3DUS records public as a reference to encourage further research [7].
METHODOLOGY
Sixteen freehand-tracked 3DUS ground truth records were collected by a medical expert from MeVisLab by performing multiple sweeps using the GE Logiq E9 XDclear 2.0 device and the GE ML6-15 probe. These records were manually segmented by the medical expert [7]. In freehand-tracked ultrasound, the probe is guided by the user’s hand and its position is tracked by sensors, so that a 3D reconstruction can be built from the stack of 2D ultrasound images collected [1].
The first segmentation algorithm used was a level set method called active contours without edges (ACWE). It is a PDE-based method that takes an initialized “snake” and tries to evolve it to represent the edge of an object by minimizing an energy function [5]. It requires the user to choose a slice of the thyroid and draw a rectangle inside the thyroid to initialize it. No further interaction is then required, as adjacent slices are segmented using previously segmented slices. If the resulting segmentation is not satisfactory, it can be restarted with a different initialization. The “without edges” version was chosen because, due to the noisiness of ultrasound images, the edges of the thyroid are not clearly defined. ACWE’s accuracy depends heavily on the user initialization [7].
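To give a sense of what "minimizing an energy function" means for a region-based method like ACWE, the sketch below evaluates only the data-fitting part of the standard Chan-Vese energy: the squared deviation of each pixel from the mean intensity of its region (inside or outside the contour). It deliberately omits the contour length and area regularization terms and does not evolve the contour, so it is not the authors' implementation. The function names, the weights `lambda1` and `lambda2`, and the toy image are assumptions for illustration.

```python
def region_means(image, mask):
    """Mean intensity inside (mask == 1) and outside (mask == 0) the contour."""
    inside = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]
    outside = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if not m]
    c1 = sum(inside) / len(inside) if inside else 0.0
    c2 = sum(outside) / len(outside) if outside else 0.0
    return c1, c2


def fitting_energy(image, mask, lambda1=1.0, lambda2=1.0):
    """Data-fitting term of the Chan-Vese energy (regularization omitted)."""
    c1, c2 = region_means(image, mask)
    energy = 0.0
    for row, mrow in zip(image, mask):
        for p, m in zip(row, mrow):
            energy += lambda1 * (p - c1) ** 2 if m else lambda2 * (p - c2) ** 2
    return energy


# Toy image: a bright region on the right, dark background on the left.
image = [[0, 0, 10, 10],
         [0, 0, 10, 10]]
good_mask = [[0, 0, 1, 1],
             [0, 0, 1, 1]]
bad_mask = [[0, 1, 1, 1],
            [0, 1, 1, 1]]
print(fitting_energy(image, good_mask), fitting_energy(image, bad_mask))
```

A mask that separates the two homogeneous regions gets a lower energy than one that mixes them, without relying on any image gradient. This is why the approach suits images whose edges are poorly defined.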
The second segmentation algorithm used was a graph cut method called GrabCut, from the OpenCV library. It requires the user to draw a purple contour around the thyroid, as well as a yellow scribble inside it. This allows the algorithm to segment the remaining slices by estimating the color distribution of the thyroid and of the area around it. The user can correct a slice’s segmentation, which triggers resegmentation of other slices. The accuracy increases with the number of user corrections. Since clinical usability is important, the number of user corrections has to be balanced against the accuracy of the segmentation. The authors found that a good balance is one correction every 10 slices, or every 2 mm [7].
The third segmentation algorithm used was a feature-based classifier, more specifically a decision tree. The tree’s feature selection was based on a coefficient of variation computed from the mean and standard deviation of the color in a neighborhood of pixels. The user only needs to select a few samples from inside and outside the thyroid and let the decision tree train, which is quick; the remaining slices are then segmented extremely fast. The speed and minimal input of this method allow the user to select new samples and retrain the tree. Since the segmentation contained some noise and regions disconnected from the thyroid, the authors applied post-processing to the final segmentation [7]. Figure 2 shows an example of the segmentation on one slice using each method.
Figure 2. Segmentation by each of the methods: ground truth (white), level set (red), graph cut (green) and feature-based classifier (blue).
Source: [7]
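The coefficient of variation mentioned for the feature-based classifier is the ratio of the standard deviation to the mean. A minimal sketch of computing it over a pixel neighborhood follows. The neighborhood radius, the use of the population standard deviation and the function name are assumptions, since the review does not give these details.

```python
from statistics import mean, pstdev


def coefficient_of_variation(image, row, col, radius=1):
    """Std / mean of the intensities in a square window centred on (row, col).

    The window is clipped at the image border. Returns 0.0 when the mean is 0
    to avoid a division by zero.
    """
    rows = range(max(0, row - radius), min(len(image), row + radius + 1))
    cols = range(max(0, col - radius), min(len(image[0]), col + radius + 1))
    values = [image[r][c] for r in rows for c in cols]
    mu = mean(values)
    return pstdev(values) / mu if mu else 0.0


# Hypothetical 3x3 patches: one homogeneous, one textured.
smooth = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
textured = [[1, 9, 1], [9, 1, 9], [1, 9, 1]]
print(coefficient_of_variation(smooth, 1, 1), coefficient_of_variation(textured, 1, 1))
```

A homogeneous patch yields a value of zero, while a strongly textured patch yields a larger value, which gives a classifier a simple texture cue to split on.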
More recently, other methods have been used, resulting in more accurate segmentations. One example is 3D U-Net, a convolutional neural network (CNN) trained on labelled data. Its advantage is that it does not require the user to select features, as it uses 3D kernels to learn them automatically during training. Using a 3D kernel means that it classifies the pixels using information from neighboring cross-sectional slices [6]. The segmentation is also rapid once the network is trained. The main disadvantage is the large and varied data set required to train it [4].
RESULTS AND CONCLUSION
To compare the accuracy of the three segmentation algorithms, MeVisLab was used to visualize the thyroid and carotid artery in 3D. The similarity between the ground truth and the algorithmic segmentations was evaluated using the Dice Coefficient (DC), a similarity statistic [7].
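The Dice Coefficient of two binary segmentations A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). A minimal sketch on 2D masks is given below. The function name, the mask format and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
def dice_coefficient(segmentation, ground_truth):
    """Dice Coefficient between two binary masks given as 2D lists of 0/1."""
    seg = [v for row in segmentation for v in row]
    ref = [v for row in ground_truth for v in row]
    intersection = sum(1 for s, r in zip(seg, ref) if s and r)
    total = sum(1 for s in seg if s) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * intersection / total if total else 1.0


# Hypothetical masks: two segmented pixels each, one of them shared.
algorithm_mask = [[1, 1, 0],
                  [0, 0, 0]]
expert_mask = [[0, 1, 1],
               [0, 0, 0]]
print(dice_coefficient(algorithm_mask, expert_mask))  # 0.5
```

The same definition applies to 3D volumes by flattening all voxels instead of the pixels of a single slice.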
All three algorithms performed similarly in terms of accuracy, with an average DC of around 0.7. Differences were noticed in the range of the DC across the sixteen samples. The level set and feature-based classifier methods produced a larger number of under- and over-segmented thyroids. This happened particularly in thin thyroids and in the area of the isthmus. Furthermore, the graph cut algorithm had the largest range, because its accuracy depends on the number of user corrections [7]. Figure 3 shows the DC distribution of the three algorithms. As for 3D U-Net, on a data set where the level set and graph cut methods reached a DC of around 0.78, it achieved around 0.88, a substantial increase [4].
Figure 3. Box plot of the Dice Coefficient of the segmentation of all sixteen thyroids by segmentation method.
Source: [7]
The clinical usability was compared using the user interaction and computation times. The level set method had a short interaction time because the user only had to draw a rectangle in one slice, but the processing time of the algorithm was relatively long. Moreover, due to bad segmentations, the level set method had to be reinitialized 6.7 times on average. The graph cut method had the highest amount of user interaction, which was expected because it allows user corrections. The average user interaction time on each thyroid was 36 seconds. The feature-based classifier had the fewest user interactions, since it only required the user to click inside and outside the thyroid, followed by a quick segmentation. The average time spent during initialization was 13 seconds. With a bad initialization the segmentation had to be restarted, but this was a minor disadvantage since the segmentation computation is quick [7]. Finally, the 3D U-Net method requires no user interaction and its computation time is shorter than that of the level set and graph cut methods [4].
STUDENT REVIEW
This paper presented an algorithm from each of the three main families of segmentation methods, giving us an idea of a wide range of techniques. Their accuracy was measured by the Dice Coefficient, allowing us to compare the methods with others from more recent papers, especially if the same data set is used. Their comparison was not limited to the accuracy of the methods; a clinical scenario was evaluated by quantifying the user effort needed when using each method and the waiting time due to the segmentation computation. These are all strong points of this research paper which, accompanied by a clear introduction of the thyroid segmentation problem, gave the reader a good presentation of the methods at the time when the paper was written.
One weaker point was the lack of unhealthy or deformed thyroids in the data. The level set and feature-based classifier methods have trouble segmenting thin portions of the thyroid, so it would have been interesting to see how they perform on non-standard thyroids. It would also have been interesting to see how the algorithms perform with nodules or cysts in the thyroid, as they produce different textures. Another weakness was the lack of a clear elapsed time comparison of the methods. While the authors mentioned how much time different aspects of the methods took, a table clearly showing the average time, from start to finish, to calculate the volume using each method would have been helpful.
For future research, the authors have already suggested using pathological data sets, combining the methods shown, or trying fully automatic ones. One could also have multiple users with different levels of experience perform these segmentations and see how much their different initializations and interactions with the algorithms affect the Dice Coefficient and interaction time.
REFERENCES
[1] Cenni, Francesco [Science in the Break]. 3D Freehand Ultrasonography: a video tutorial. YouTube. Retrieved November 10, 2021, from https://www.youtube.com/watch?v=lMJnpthHP2k.
[2] Chen, Junying & You, Haijun & Li, Kai. (2020). A Review of Thyroid Gland Segmentation and Thyroid Nodule Segmentation Methods for Medical Ultrasound Images. Computer Methods and Programs in Biomedicine. 185. 1-18. 10.1016/j.cmpb.2020.105329.
[3] Cleveland Clinic Medical Professional (Ed.). Thyroid disease: Causes, symptoms, risk factors, testing & treatment. Cleveland Clinic. Retrieved November 10, 2021, from https://my.clevelandclinic.org/health/diseases/8541-thyroid-disease.
[4] Gulame, Mayuresh & Dixit, Vaibhav & Suresh, M. (2021). Thyroid nodules segmentation methods in clinical ultrasound images: A review. Materials Today: Proceedings. 45. 10.1016/j.matpr.2020.10.259.
[5] Poudel, Prabal & Hansen, Christian & Sprung, Julian & Friebe, Michael. (2016). 3D Segmentation of Thyroid Ultrasound Images using Active Contours. Current Directions in Biomedical Engineering. 2016. 10.1515/cdbme-2016-0103.
[6] Poudel, Prabal & Illanes, Alfredo & Sheet, Debdoot & Friebe, Michael. (2018). Evaluation of Commonly Used Algorithms for Thyroid Ultrasound Images Segmentation and Improvement Using Machine Learning Approaches. Journal of Healthcare Engineering. 2018. 1-13. 10.1155/2018/8087624.
[7] Wunderling, Tom & Golla, Björn & Poudel, Prabal & Arens, Christoph & Friebe, Michael & Hansen, Christian. (2017). Comparison of thyroid segmentation techniques for 3D ultrasound. Proc. SPIE 10133, Medical Imaging 2017: Image Processing. 10.1117/12.2254234.