Thorsten Busch, summer semester 2017


Nowadays, non-destructive testing (NDT) is a key aspect of quality certification of components. Innovative methods are developed and approved for the serial inspection of new materials and components. It is therefore important to know which defect sizes can be detected. The smallest detectable flaw size is often quoted, but much more important and critical is the largest flaw that could be missed by the NDT method. Before testing, it is not known which flaws are inside a component. Consequently, a statistical analysis must be carried out to calculate the probability of (not) detecting flaws of various sizes under real testing conditions. A quantitative result is given by the probability of detection (POD) and the probability of false alarm (PFA).

Statistical background

The confidence interval and the confidence level are two important statistical terms used to describe the accuracy of data. The confidence interval gives a range around the measured value in which most values are expected to lie. The confidence level states the probability that a value falls within the confidence interval; in most cases it is set to 95 % or 99 %.[1] In general, every measurement varies around a value. If one specific flaw is measured more than once, the registered values will scatter around the true value, which is not exactly known. For instance, suppose the depth of a flaw is to be determined by ultrasonic testing. After many measurements, most of the measured values lie in a range (e.g. 4 to 6 mm) around the true value of 5 mm. A confidence interval can then be stated, in this case 2 mm wide. Some values will still fall outside the confidence interval. The proportion of values that lie within the confidence interval gives the confidence level, e.g. 95 %; this means that 95 % of all measured values lie in the range of 4 to 6 mm (a small numerical sketch follows the list below). Confidence interval and confidence level are correlated with each other and determined by several factors. Generally, the larger the confidence interval, the…

  • ... smaller the sample size
  • ... larger the standard error
  • ... larger the confidence level
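
As an illustration of these terms, the following minimal Python sketch computes such an interval for a set of hypothetical repeated depth measurements of one flaw, assuming the measurement scatter is approximately normal; the values and the chosen 95 % level are examples only.

    import numpy as np
    from scipy import stats

    # Hypothetical repeated depth measurements (mm) of one and the same flaw;
    # the true depth (here about 5 mm) is not known to the inspector.
    depths = np.array([4.6, 5.3, 4.9, 5.8, 4.2, 5.1, 4.7, 5.5, 5.0, 4.4])

    mean = depths.mean()
    s = depths.std(ddof=1)   # sample standard deviation
    level = 0.95             # chosen confidence level
    t_crit = stats.t.ppf(0.5 + level / 2, df=len(depths) - 1)

    # Interval around the mean that is expected to contain roughly 95 % of single
    # measurements (simplified: the uncertainty of mean and s is neglected).
    low, high = mean - t_crit * s, mean + t_crit * s
    print(f"{level:.0%} interval: {low:.2f} mm .. {high:.2f} mm")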

POD Determination

Hit/miss method

The hit/miss method gives a binary analysis of a testing signal: a defect is either detected (hit) or missed (miss). Consequently, the test results fall into one of four possible configurations, depending on whether an existing or a non-existing defect is detected or not:

Table 1: The four possible diagnosis results of testing

Defect       | Detected                                                           | Not detected                                                     | Probability
Existing     | True positive (TP): the existing defect is detected (hit)         | False negative (FN): the existing defect is not detected (miss) | TP + FN = 100 %
Non-existing | False positive (FP): a defect is detected even though none exists | True negative (TN): no defect is detected where no defect exists | FP + TN = 100 %


The POD is equal to the probability of a true positive and can be calculated in the following way:[2]

POD = P(TP) = \dfrac{TP}{TP + FN}

For one specific flaw size a, this equation can also be expressed as the number of positive tests divided by the total number of tests:[3]

POD(a)= \dfrac{n_{pos} (a)}{n_{tot} (a)}

The hit/miss method requires a clearly defined hit/miss criterion, e.g. a defined signal threshold.
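
As a small illustration of the hit/miss evaluation, the following Python sketch counts hits per flaw size for some hypothetical inspection results and computes POD(a) = n_pos(a)/n_tot(a); the data and size classes are invented for demonstration (a real study would contain far more inspections per size).

    from collections import defaultdict

    # Hypothetical hit/miss results as (flaw size in mm, detected?) pairs
    results = [
        (0.5, False), (0.5, False), (0.5, True),
        (1.0, False), (1.0, True),  (1.0, True),
        (2.0, True),  (2.0, True),  (2.0, True),
    ]

    hits, totals = defaultdict(int), defaultdict(int)
    for size, detected in results:
        totals[size] += 1
        if detected:
            hits[size] += 1

    # POD(a) = n_pos(a) / n_tot(a) for each inspected flaw size a
    for size in sorted(totals):
        print(f"a = {size:.1f} mm: POD = {hits[size] / totals[size]:.2f} "
              f"({hits[size]}/{totals[size]})")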

â vs. a method

Another way to determine the POD is the â vs. a method, in which the whole response signal is analysed. In contrast to the hit/miss method, the signal amplitude â corresponding to a specific flaw size a is also evaluated. This procedure contains more information about the detected flaw, such as its size and location. To decide whether a defect is detected, a threshold depending on the noise level must be defined. All amplitudes that exceed this threshold are registered and analysed. Defining this threshold on the basis of the accuracy of the inspection is very difficult, because it must be ensured that no noise amplitudes are evaluated; if a noise amplitude exceeds the threshold, the detection corresponds to the FP configuration of the hit/miss method.[4]
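
One common way to turn â vs. a data into a POD curve (a general regression approach, not described in detail in this article) is a linear fit of ln(â) against ln(a); the POD at a flaw size a is then the probability that the predicted amplitude exceeds the decision threshold â_dec. The following Python sketch shows this idea with purely illustrative data and threshold.

    import numpy as np
    from scipy.stats import norm

    # Illustrative â vs. a data: flaw sizes a (mm) and response amplitudes â
    a_sizes = np.array([0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0])
    a_hat   = np.array([3.0, 4.5, 6.0, 8.0, 12.0, 17.0, 26.0, 35.0])

    a_hat_dec = 10.0  # assumed decision threshold above the noise level

    # Linear regression in log-log space: ln(â) = b0 + b1*ln(a) + scatter
    x, y = np.log(a_sizes), np.log(a_hat)
    b1, b0 = np.polyfit(x, y, 1)
    sigma = np.std(y - (b0 + b1 * x), ddof=2)  # residual standard deviation

    def pod(a):
        """POD(a): probability that ln(â) predicted for size a exceeds ln(â_dec)."""
        return norm.cdf((b0 + b1 * np.log(a) - np.log(a_hat_dec)) / sigma)

    for a in (0.5, 1.0, 2.0, 3.0):
        print(f"a = {a:.1f} mm: POD = {pod(a):.3f}")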

POD curves

All POD curves have a similar characteristic: the POD is plotted against the flaw size a. A typical POD curve is shown in figure 1. The curve can be divided into three regions. In the first region (1), very small defects can hardly be detected and only with a very low probability. In the transition region (2), more and larger defects are detected with increasing probability. An important flaw size is called a90/95: a flaw of this size is detected with a probability of 90 % at a confidence level of 95 %. All flaw sizes larger than a90/95 belong to the third region (3), the region of high detectability, in which a reliable inspection is possible. For an NDT system, the defects to be found must be larger than the a90/95 value, otherwise a trustworthy and reliable defect detection is not guaranteed. Therefore, the determination of the POD curve is very important to ensure the reliability of the inspection system. The a90/95 value depends both on the NDT technique and on the sensitivity of the measuring system.

Figure 1: A typical POD curve
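
As a numerical illustration of reading off this important flaw size, the sketch below evaluates an assumed lognormal POD model and solves for the size a90 at which the mean curve reaches 90 %; the additional 95 % confidence bound needed for a90/95 follows from the statistical uncertainty of the fitted model and is not shown here. All parameters are illustrative.

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import norm

    # Assumed lognormal POD model: POD(a) = Phi((ln a - mu) / sigma)
    mu, sigma = np.log(1.2), 0.35

    def pod(a):
        return norm.cdf((np.log(a) - mu) / sigma)

    # a90: flaw size at which the mean POD curve reaches 90 %
    a90 = brentq(lambda a: pod(a) - 0.90, 0.1, 10.0)
    print(f"a90 (mean curve) = {a90:.2f} mm")

    # Closed-form check: a90 = exp(mu + sigma * Phi^-1(0.9))
    print(f"closed form      = {np.exp(mu + sigma * norm.ppf(0.9)):.2f} mm")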

Probability of false alarm (PFA)

Besides the POD, there is a second quantity that describes the reliability of an inspection system. The PFA concentrates on false calls, i.e. cases in which a defect is indicated although none exists; such false calls are important because they can lead to the unnecessary rejection or rework of sound components. The PFA is calculated analogously to the POD, but using the FP and TN diagnosis results instead of TP and FN:

PFA = P(FP) = \dfrac{FP}{FP + TN}
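
Both quantities can be computed directly from the counts of the four diagnosis cases, as in the following small Python sketch with hypothetical counts.

    def pod_pfa(tp, fn, fp, tn):
        """POD = TP/(TP+FN) and PFA = FP/(FP+TN) from the four diagnosis counts."""
        return tp / (tp + fn), fp / (fp + tn)

    # Hypothetical counts from an inspection trial
    pod, pfa = pod_pfa(tp=45, fn=5, fp=8, tn=92)
    print(f"POD = {pod:.2f}, PFA = {pfa:.2f}")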

Receiver Operating Characteristic (ROC)

The ROC curve is used to characterize the accuracy of an NDT system: the POD is plotted against the PFA. Some ROC curves and important points are shown in figure 2. Point 1 in the diagram marks the result of a perfect tester with a PFA of 0 and a POD of 1, which is not achievable in practice. The opposite result, a purely guessing tester, is represented by point 2. Following one specific ROC curve from the lower left to the upper right corner, the sensitivity of the system increases. The lower part of the curve contains only the strongest signals and hardly any false calls or noise signals. Moving towards the upper part of the curve, more and smaller defects are detected, but a greater amount of noise is also registered. The closer the ROC curve lies to the upper left corner, the better the inspection technique. The straight line connecting the lower left corner with the upper right corner is the result of random guessing.[5]

The four cases of possible diagnosis results can also be illustrated for whole response signals. For this purpose, the amplitudes of the noise and of the defect signals are approximated by continuous distributions (figure 3). Depending on where the threshold is set, the regions corresponding to the different cases vary. The further to the left (i.e. the lower) the threshold is set, the more existing defects are found, but the more false alarms occur as well. This corresponds to the position on the ROC curve.

Figure 2: Different ROC curves
Figure 3: Cases of diagnosis results depending on the threshold
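
Such a threshold sweep can be sketched in Python by assuming continuous (here Gaussian) amplitude distributions for noise and defect signals, cf. figure 3: every threshold position yields one (PFA, POD) pair, i.e. one point of the ROC curve. The distributions and parameters are purely illustrative.

    import numpy as np
    from scipy.stats import norm

    # Assumed amplitude distributions for pure noise and for defect signals
    noise = norm(loc=5.0, scale=1.5)
    signal = norm(loc=9.0, scale=2.0)

    # Sweep the decision threshold from high to low; each threshold gives one ROC point
    thresholds = np.linspace(15.0, 0.0, 7)
    pfa = noise.sf(thresholds)    # P(noise amplitude exceeds threshold)  -> false calls
    pod = signal.sf(thresholds)   # P(defect amplitude exceeds threshold) -> hits

    for t, x, y in zip(thresholds, pfa, pod):
        print(f"threshold {t:5.2f}: PFA = {x:.3f}, POD = {y:.3f}")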

Determination of POD curves in real life

The determination of a POD curve is a very important method to validate the usability and accuracy of an inspection system for a specific NDT task. Defects in real components are not known in advance, so synthetic test blocks must be built. For a significant POD curve, many different flaw sizes need to be inspected. Consequently, the test blocks must be very large and contain a wide variation of flaw sizes. Furthermore, every inspection is specific with respect to the material, the testing depth, the transducers used, etc., which means that a dedicated test block is required for every inspection. One inspection requires a minimum of 40 different flaws.[3] Altogether, the production of many different test blocks with a convincing number of flaws is very expensive. That is the reason why there are still only a few real POD curves. However, with growing awareness of how important a statistical evaluation of an NDT method is, more and more POD curves are generated and analysed before a technique is used in serial inspection.

Literature

  1. "Survey System," [Online]. Available: https://www.surveysystem.com/sscalc.htm. [Accessed 04.06.2017].
  2. F. Fücsök, C. Müller and M. Scharmach, "MEASURING OF THE RELIABILITY OF NDE," in Application of Contemporary Non-Destructive Testing in Engineering, Portorož, Slovenia, 2005.
  3. J. Farnhammer, Bestimmung und Verbesserung der Fehlerauffindwahrscheinlichkeit bei der Ultraschallprüfung von Triebwerksbauteilen, Dresden, 2016.
  4. J. H. Kurz, A. Jüngert, S. Dugan and G. Dobmann, "Probability of Detection (POD) determination using ultrasound phased array for considering NDT in probabilistic damage assessments," in 18th World Conference of Nondestructive Testing, Durban, 2012.
  5. F. Fücsök, C. Müller and M. Scharmach, "MEASURING OF THE RELIABILITY OF NDE," in Application of Contemporary Non-Destructive Testing in Engineering, Portorož, Slovenia, 2005.