Constant CV% in Multi-Lab P & B

D. Stokes, May 2023

(In memory of Mike Thomas, who finished his journey among us late last year.)

Libraries

Code
library(tidyverse)
library(sn)
library(latex2exp)

Part 1: for Pedestrians

“I can’t make it any clearer.”

– Mike Thomas, overheard in many conversations

Introduction

To a reasonable approximation, many ASTM test methods exhibit a constant Coefficient of Variation percentage (CV%) when results of inter-laboratory studies are analyzed.

Such situations are often described in their corresponding Precision and Bias Statements as ‘this test method has been shown to have a constant CV% of (some value), and therefore results of two properly conducted tests of the same materials at different laboratories should not differ by more than (some percentage) of their mean, nineteen times out of twenty’.

The constant CV% value is known as the \(1s\%\) value, and the percentage difference from the mean is known as the \(d2s\%\), as per ASTM C670, “Standard Practice for Preparing Precision and Bias Statements for Test Methods for Construction Materials”.

For situations as discussed here (i.e., a constant CV%), \(d2s\%\) is simply \(1s\%\) multiplied by \(2\sqrt{2}\).
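The \(2\sqrt{2}\) factor follows from standard propagation of error for the difference of two results. A brief sketch of the reasoning (assuming independent, normally distributed results, in line with the reasoning underlying ASTM C670): each result has standard deviation \(\sigma\) equal to \(1s\%\) of the mean, so the difference of two independent results has standard deviation \(\sigma\sqrt{2}\), and the two-sided 95% limit on \(|Y_2 - Y_1|\) is

```latex
\[
d2s\% \;=\; 1.96 \cdot \sqrt{2} \cdot 1s\% \;\approx\; 2\sqrt{2} \cdot 1s\%
\]
```

where 1.96 (the two-sided 95% normal quantile) is rounded to 2.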

This work aims to provide a visual representation of the range of results expected from a test method exhibiting a constant CV%: given a result from one laboratory, what range of results should another laboratory obtain when testing a duplicate (e.g., a split) sample?

Upper and Lower Values

From the phrase ‘results of two properly conducted tests of the same materials from different laboratories should not differ by more than (some percentage) of their mean’ we can derive upper and lower limits.

Designating two results from different laboratories as a pair of values \(Y_1\) and \(Y_2\), then the mean of these values is \((Y_1 + Y_2)/2\), and the difference between the two is the absolute value of the difference: \(|Y_2 - Y_1|\).

For the algebraic manipulation to follow, \(|Y_2 - Y_1|\) can be broken into \((Y_2 - Y_1)\) for the case \(Y_2 > Y_1\), and \((Y_1 - Y_2)\) for the case \(Y_1 > Y_2\).

When \(Y_2 > Y_1\), solving for \(Y_2\) yields the upper limit; when \(Y_1 > Y_2\), solving for \(Y_2\) yields the lower limit.

Using \(d2s\%\) to stand for the maximum percent difference from the mean of the two values \(Y_1\) and \(Y_2\) (nineteen times out of twenty) we can then write the following relationship:

\[ \frac{(Y_1 + Y_2)}{2} * \frac{d2s\%}{100} = |Y_2 - Y_1| \tag{1}\]

To find the upper limit, we take the specific case \(Y_2 > Y_1\) and solve this equation:

\[ \frac{(Y_1 + Y_2)}{2} * \frac{d2s\%}{100} = (Y_2 - Y_1) \tag{2}\] After suitable manipulation we arrive at the result:

\[ Y_2 = Y_1*\frac{(2+\frac{d2s\%}{100})}{(2-\frac{d2s\%}{100})} \tag{3}\] Similarly, we would find the following for the lower limit:

\[ Y_2 = Y_1*\frac{(2-\frac{d2s\%}{100})}{(2+\frac{d2s\%}{100})} \tag{4}\]
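As a sanity check on the algebra, one can verify numerically that the bounds from equations \((3)\) and \((4)\) satisfy equation \((1)\) exactly (the values below are arbitrary, chosen only for illustration):

```r
d2s <- 65                          # an illustrative d2s% value
Y1  <- 0.30                        # an arbitrary first result

Y2_up  <- Y1 * (2 + d2s/100) / (2 - d2s/100)   # Eq. (3)
Y2_low <- Y1 * (2 - d2s/100) / (2 + d2s/100)   # Eq. (4)

# Both bounds reproduce Eq. (1): (mean of pair) * d2s%/100 = |difference|
all.equal((Y1 + Y2_up)/2  * d2s/100, abs(Y2_up  - Y1))   # TRUE
all.equal((Y1 + Y2_low)/2 * d2s/100, abs(Y2_low - Y1))   # TRUE
```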

Let’s Do an Example

ASTM C39, “Standard Test Method for Compressive Strength of Cylindrical Concrete Specimens”, has a multi-lab CV% of 5.0% (\(1s\%\)). The difference between two results from independent laboratories should therefore not exceed 14% (\(d2s\%\)) of their mean, nineteen times out of twenty (a 95% confidence level).

One way to visualize this range is to plot a line representing a spectrum of values that might initially be obtained from one lab, and then draw the upper and lower bounds surrounding this line. These bounds represent the limits of results that might be obtained from another lab performing the same test on a duplicate sample, 19 times out of 20.

We’ll look at ‘initial’ values (\(Y_1\)) ranging from 3000 psi to 9000 psi. For each value along this spectrum of ‘initial’ values, we calculate the upper and lower bounds using equations \((3)\) and \((4)\).

Code
Y1 <- seq(3000, 9000, by=10)
X <- Y1
d2s_c39 <- 14
Y2_upper <- Y1*(2+d2s_c39/100)/(2-d2s_c39/100) #Eq. (3)
Y2_lower <- Y1*(2-d2s_c39/100)/(2+d2s_c39/100) #Eq. (4)
c39_df <- data.frame(X=X, Y1=Y1, Y2_upper=Y2_upper,
                     Y2_lower=Y2_lower)
p_c39_1 <- ggplot(c39_df, aes(x=X, y= Y1)) +geom_line() +
  geom_line(aes(y=Y2_upper), color="red") +
  geom_line(aes(y=Y2_lower), color="red") +
  theme_classic() + ylim(0, 18000) + 
  labs(x="Strength (psi) (First Result)", 
       y="Strength (psi) (Second Result)",
  title="ASTM C39 - Upper and Lower Bounds from Different Lab Results") +
  theme(axis.text=element_text(color="black", size=12), 
        panel.grid.major=element_line(color="black",
        linewidth = 0.25, linetype = 2), 
        axis.title=element_text(size=14))
p_c39_1  + 
  annotate("text", x = 4300, y = 8000, size=5, 
  label = "Given any result \n along this line") +
  annotate("text", x = 8000, y = 2750, size=5, 
  label = "Results from another \n lab should fit 
  between these limits, \n 19 times out of 20") +
  annotate("segment", x = 5200, xend = 5680, y = 7500, 
  yend = 5870,colour = "black", linewidth = 0.8, 
  arrow = arrow()) +
  annotate("segment", x = 6960, xend = 6100, y = 2500, 
  yend = 5200,colour = "red", linewidth = 0.8, 
  arrow = arrow()) +
  annotate("segment", x = 7700, xend = 7600, y = 5400, 
  yend = 8600,colour = "red", linewidth = 0.8, 
  arrow = arrow())

Fig. 1 - ASTM C39 multi-lab upper and lower bounds

At first glance, this looks reasonable, if perhaps a little broader than we might expect. This is, however, precisely what the Precision and Bias Statement is saying.

(Although it is very difficult to see on this plot, the upper and lower bounds are not equidistant from the line for initial values. The upper bound is farther away from \(Y_1\) than the lower bound is. It is not simply a “plus or minus” value. We’ll see this more clearly below.)
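A quick numeric illustration of that asymmetry, using the C39 \(d2s\%\) of 14 and an assumed first result of 5000 psi:

```r
d2s <- 14
Y1  <- 5000                                      # assumed first result (psi)
Y2_upper <- Y1 * (2 + d2s/100) / (2 - d2s/100)   # Eq. (3)
Y2_lower <- Y1 * (2 - d2s/100) / (2 + d2s/100)   # Eq. (4)
Y2_upper - Y1   # distance up to the upper bound: ~752.7 psi
Y1 - Y2_lower   # distance down to the lower bound: ~654.2 psi
```

The upper bound sits roughly 100 psi farther from \(Y_1\) than the lower bound does.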

How Large Can a CV% Value Be, and the Method Still Be Viable (i.e., Useful)?

There is no straightforward mathematical answer to this question, so let’s look at a hypothetical situation. What would things look like if ASTM C39 (compressive strength) had the far lower precision of, say, ASTM C1293 (“Standard Test Method for Determination of Length Change of Concrete Due to Alkali-Silica Reaction”)?

From the C1293 Precision and Bias Statement, the multi-lab CV% for linear expansion greater than 0.014% (\(1s\%\)) is 23%. The \(d2s\%\) value is 65%.

If we were to overlay these bounds over the ASTM C39 example, it would look like this:

Code
# Now we'll add the far lower precision C1293 bounds
d2s_c1293 <- 65
Y2_upper_1293_h <- Y1*(2+d2s_c1293/100)/(2-d2s_c1293/100) #Eq. (3)
Y2_lower_1293_h <- Y1*(2-d2s_c1293/100)/(2+d2s_c1293/100) #Eq. (4)
c39_df$Y2_upper_1293_h <- Y2_upper_1293_h
c39_df$Y2_lower_1293_h <- Y2_lower_1293_h

p_c39_1 + geom_line(aes(y=Y2_upper_1293_h), color="red",
                    linetype = "dashed") +
  geom_line(aes(y=Y2_lower_1293_h), color="red",
            linetype = "dashed") +
  annotate("text", x = 5000, y = 17900, size=5, 
  label = "(If C39 had the same precision as C1293)") +
  annotate("text", x = 5000, y = 15000, size=5, 
  label = "Much larger bounds based on the
  much lower precision of C1293") +
  annotate("segment", x = 4800, xend = 5000, y = 12500, 
  yend = 2550,colour = "red", linewidth = 0.8, 
  arrow = arrow()) +
  annotate("segment", x = 5000, xend = 5200, y = 12700, 
  yend = 10400,colour = "red", linewidth = 0.8, 
  arrow = arrow())

Fig. 2 - If C39 had C1293 precision

Considering this last figure, the viability of C39 as a test method would be very questionable if it had the same precision as C1293. At the very least, a large number of replicates would likely become necessary.

(Notice in this figure the asymmetry of the upper and lower bounds around the \(Y_1\) values, i.e., the black line. While perhaps surprising, this is absolutely correct.)

Now let’s move from the hypothetical and consider ASTM C1293 itself.

Code
Y1_1293 <- seq(0.04, 1, by=0.001)
X_1293 <- Y1_1293
Y2_upper_1293 <- Y1_1293*(2+d2s_c1293/100)/(2-d2s_c1293/100) #Eq. (3)
Y2_lower_1293 <- Y1_1293*(2-d2s_c1293/100)/(2+d2s_c1293/100) #Eq. (4)
c1293_df <- data.frame(X=X_1293, Y1=Y1_1293,
    Y2_upper=Y2_upper_1293, Y2_lower=Y2_lower_1293)
p_c1293_1 <- ggplot(c1293_df, aes(x=X, y= Y1)) + geom_line() +
  geom_line(aes(y=Y2_upper), color="red") +
  geom_line(aes(y=Y2_lower), color="red") +
  theme_classic() + ylim(0, 2) + 
  labs(x="%Expansion (First Result)", 
       y="%Expansion (Second Result)",
  title="ASTM C1293 - Upper and Lower Bounds from Different Lab Results") +
  theme(axis.text=element_text(color="black", size=12), 
        panel.grid.major=element_line(color="black",
        linewidth = 0.25, linetype = 2), 
        axis.title=element_text(size=14))
p_c1293_1 + 
  annotate("text", x = 0.75, y = 0.15, size=5,
  label = "Given any result \n along this line") +
  annotate("text", x = 0.25, y = 1.5, size=5,
  label = "Results from another \n lab should fit
  between these limits, \n 19 times out of 20") +
  annotate("segment", x = 0.6, xend = 0.5, y = 0.12,
  yend = 0.5,colour = "black", linewidth = 0.8,
  arrow = arrow()) +
  annotate("segment", x = 0.2, xend = 0.25, y = 1.1,
  yend = 0.13,colour = "red", linewidth = 0.8,
  arrow = arrow()) +
  annotate("segment", x = 0.3, xend = 0.4, y = 1.14,
  yend = 0.79,colour = "red", linewidth = 0.8,
  arrow = arrow())

Fig. 3 - ASTM C1293 multi-lab upper and lower bounds

Part 2: for Joggers

“I don’t believe it.”

– Mike Thomas, said in an ASTM sub-committee meeting

Let’s Take an Empirical Look at These Bounds

To test these bounds calculated from the \(d2s\%\) values, let’s draw pairs of points from a normal distribution, divide the absolute difference within each pair by the pair’s average, and count the occurrences where this percentage falls below the \(d2s\%\) value. To get a value approaching the theoretical limit, we’ll take, say, 50,000 sample pairs.

(Why take pairs of points? Remember, these graphs are displaying the range in results one might get when one sample is tested by two different laboratories, 19 times out of 20.)

A normal distribution is defined by only two parameters: the mean and the standard deviation. In this case we have a constant CV%, so the standard deviation is simply proportional to the mean. We can therefore pick any mean value we like, and the percent difference from the mean of randomly sampled pairs will fall within the \(d2s\%\) bounds, 19 times out of 20.
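That we can pick any mean follows from the scale invariance of the percent difference: multiplying both results by any \(c > 0\) (which is what changing the mean does here, since the standard deviation scales proportionally) leaves the statistic unchanged:

```latex
\[
\frac{|cY_2 - cY_1|}{(cY_1 + cY_2)/2} \times 100
= \frac{c\,|Y_2 - Y_1|}{c\,(Y_1 + Y_2)/2} \times 100
= \frac{|Y_2 - Y_1|}{(Y_1 + Y_2)/2} \times 100
\]
```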

Code
ggplot(data.frame(x = c(0, 2)), aes(x = x)) +
stat_function(fun = dnorm, args = c(mean = 1, sd = 0.23)) +
stat_function(fun = dnorm, args = c(mean = .5, sd = 0.115)) +
stat_function(fun = dnorm, args = c(mean = 1, sd = 0.05),
              color="blue", linetype = 2) +
stat_function(fun = dnorm, args = c(mean = .5, sd = 0.025),
              color="blue", linetype = 2) +
  theme_classic() + labs(x="Mean", y="Density",
  title="Normal Distributions for C1293 and C39 (with Means of 0.5 and 1)") +
  annotate("text", x = 1.65, y = 4, size=5, 
  label = "C1293, CV% (1s%) = 23%") +
  annotate("text", x = 1.65, y = 12, size=5, 
  label = "C39, CV% (1s%) = 5.0%", color="blue") +
  annotate("text", x = 1.65, y = 11, size=4.4, 
  label = "(for comparison)", color="blue") +
  annotate("segment", x = 1.4, xend = 1.2, y = 3,
  yend = 1.3,colour = "black", linewidth = 0.8,
  arrow = arrow()) +
  annotate("segment", x = 1.2, xend = 0.6, y = 3.8,
  yend = 2.5,colour = "black", linewidth = 0.8,
  arrow = arrow()) +
  annotate("segment", x = 1.25, xend = 1.03, y = 11.3,
  yend = 7.4,colour = "blue", linewidth = 0.8,
  arrow = arrow()) +
  annotate("segment", x = 1.25, xend = 0.53, y = 11.7,
  yend = 10,colour = "blue", linewidth = 0.8,
  arrow = arrow()) +
  theme(axis.text=element_text(color="black", size=12), 
        axis.title=element_text(size=14))

Fig. 4 - Effect of CV% on distribution spread

To demonstrate this, we’ll take our 50,000 sample-pairs from a normal distribution with a mean of 1, then repeat this using a normal distribution with a mean of 0.5, both of which will have standard deviations corresponding to our CV% of 23% (\(1s\%\)) for ASTM C1293.

For C1293 distributions with means of 1 and 0.5

Code
set.seed(3) #for repeatability's sake
dum_y_one <- numeric(length = 2) # for mean=1
dum_y_half <- numeric(length = 2) # for mean=0.5
number_of_pairs <- 50000
y1_one <- numeric(length = number_of_pairs)
y2_one <- numeric(length = number_of_pairs)
y1_half <- numeric(length = number_of_pairs)
y2_half <- numeric(length = number_of_pairs)
for(i in seq_along(y1_one)){
  dum_y_one <- rnorm(2, mean = 1, sd = 0.23)
  y1_one[i] <- dum_y_one[1]
  y2_one[i] <- dum_y_one[2]
  dum_y_half <- rnorm(2, mean = 0.5, sd = 0.115)
  y1_half[i] <- dum_y_half[1]
  y2_half[i] <- dum_y_half[2]
}


# Okay, we have the two pairs of vectors, now let's get the difference vector
y_diff_one <- abs(y2_one-y1_one)
y_diff_half <- abs(y2_half-y1_half)

y_pair_mean_one <- (y2_one+y1_one)/2 # take the average of the pairs
y_pair_mean_half <- (y2_half+y1_half)/2
# Now calculate the %difference from the mean of the pairs
pc_diff_one <- y_diff_one/y_pair_mean_one*100
pc_diff_half <- y_diff_half/y_pair_mean_half*100


# Now let's count the number <= 65, and calculate the percentage
count_under_65_one <- 0
count_under_65_half <- 0
for(i in seq_along(pc_diff_one)){
  if(pc_diff_one[i]<=65) count_under_65_one=count_under_65_one+1
  if(pc_diff_half[i]<=65) count_under_65_half=count_under_65_half+1
}
pc_under_one <- count_under_65_one/length(pc_diff_one)*100
pc_under_half <- count_under_65_half/length(pc_diff_half)*100

# Let's create a dataframe for plotting
main_df <- data.frame(y1_one=y1_one,y2_one=y2_one,
        y1_half=y1_half,y2_half=y2_half,
  y_diff_one=y_diff_one, y_diff_half=y_diff_half,
  y_pair_mean_one=y_pair_mean_one,
  y_pair_mean_half=y_pair_mean_half,
  pc_diff_one=pc_diff_one, pc_diff_half=pc_diff_half)

# Let's plot

p_y1 <- ggplot(main_df) + geom_density(aes(x=y1_one)) +
  geom_density(aes(x=y1_half), color="blue") +
  theme_classic() + labs(x=bquote(Y[1]), y="Density",
  title="Y1 (First Result)") +
  theme(axis.text=element_text(color="black", size=12), 
        axis.title=element_text(size=14))
p_y1 + annotate("text", x = 0.76, y = 3, size=5,
  label = "Mean = 0.5", color="blue") +
  annotate("text", x = 1.3, y = 1.5, size=5,
  label = "Mean = 1")

Fig. 5 - Distributions of random samples of the first result

Code
p_y2 <- ggplot(main_df) + geom_density(aes(x=y2_one)) +
  geom_density(aes(x=y2_half), color="blue") +
  theme_classic() + labs(x=bquote(Y[2]), y="Density",
  title="Y2 (Second Result)") +
  theme(axis.text=element_text(color="black", size=12),
        axis.title=element_text(size=14))
p_y2 + annotate("text", x = 0.78, y = 3, size=5,
  label = "Mean = 0.5", color="blue") +
  annotate("text", x = 1.38, y = 1.3, size=5,
  label = "Mean = 1")

Fig. 6 - Distributions of random samples of the second result

Code
p_y_diff <- ggplot(main_df) + 
  geom_density(aes(x=y_diff_one)) +
  geom_density(aes(x=y_diff_half), color="blue") +
  theme_classic()+ labs(x="Difference |Y2-Y1| (absolute value)", y="Density",
  title="Difference Between the Pairs of Values") +
  theme(axis.text=element_text(color="black", size=12),
        axis.title=element_text(size=14))
p_y_diff + annotate("text", x = 0.32, y = 3, size=5,
  label = "Mean = 0.5", color="blue") +
  annotate("text", x = 0.6, y = 1, size=5,
  label = "Mean = 1")

Fig. 7 - Differences between the sample pairs

Code
p_y_pair_mean <- ggplot(main_df) +
  geom_density(aes(x=y_pair_mean_one)) +
  geom_density(aes(x=y_pair_mean_half), color="blue") +
  theme_classic()+ labs(x="Averages of the Pairs",
                        y="Density",
  title="Averages of the Pairs of Values") +
  theme(axis.text=element_text(color="black", size=12),
        axis.title=element_text(size=14))
p_y_pair_mean + annotate("text", x = 0.72, y = 4, size=5,
  label = "Mean = 0.5", color="blue") +
  annotate("text", x = 1.37, y = 1, size=5,
  label = "Mean = 1")

Fig. 8 - Averages of the sample pairs

Code
q1_one <- quantile(main_df$pc_diff_one,probs=c(0.95))
q1_half <- quantile(main_df$pc_diff_half,probs=c(0.95))
q1_text_one <- sprintf("%.1f%% (Mean=1)", q1_one)
q1_text_half <- sprintf("%.1f%% (Mean=0.5)", q1_half)

p_pc_diff <- ggplot(main_df) +
  geom_density(aes(x=pc_diff_one)) +
  geom_density(aes(x=pc_diff_half), color="blue") +
  theme_classic() + labs(x="% Difference", y="Density",
  title="Difference of the Pairs Divided by their Means * 100") +
  theme(axis.text=element_text(color="black", size=12),
        axis.title=element_text(size=14))
p_pc_diff + geom_vline(xintercept=q1_one,
             linetype = "longdash") +
  geom_vline(xintercept=q1_half, color="blue",
             linetype = "dotted") +
  annotate("text", x = 150, y = 0.015, size=5,
  label = "95% Quantile") +
  annotate("text", x = 150, y = 0.006, size=5,
  label = q1_text_one) +
  annotate("text", x = 150, y = 0.004, size=5,
  label = q1_text_half, color="blue") +
  annotate("segment", x = 110, xend = 67, y = 0.015,
  yend = 0.015,colour = "black", linewidth = 0.8,
  arrow = arrow()) +
  annotate("segment", x = 115, xend = 67, y = 0.005,
  yend = 0,colour = "black", linewidth = 0.8,
  arrow = arrow())

Fig. 9 - Percent difference from the mean for the sample pairs

(Reminder: the five preceding density plots were generated by sampling 50,000 pairs each, at random, from normal distributions with means of 1 and 0.5, both with a CV% (\(1s\%\)) of 23%.)

When we count how many pairs had percentage differences from their means of less than 65% and divide by 50,000 (the total number of sample pairs), we obtain, for these particular random samples, 94.3% and 94.2% (for means of 1 and 0.5, respectively), which are reasonably close to the 95% predicted in the Precision and Bias Statement. (Stated another way, the areas under the last curves up to values of 67.2% and 67.1% are 95% of the total areas for the samplings from the distributions with means of 1 and 0.5, respectively; both values are reasonably close to 65% (\(d2s\%\)).)

It’s not exactly 95% (or 65%) because these are finite samples, and we only have estimates of \(1s\%\) and \(d2s\%\) to two significant figures. (Also, if the random sampling were performed with a different seed value (i.e., different random samples of 50,000 sample pairs), the results shown here would vary a little. A fixed seed is used solely for reproducibility.)
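A rough analytic cross-check of the ~95% figure is also possible (an approximation that treats the pair mean as fixed at the distribution mean \(\mu\), rather than random): the difference of two independent normal results, each with standard deviation \(0.23\mu\), has standard deviation \(0.23\sqrt{2}\,\mu\), so the probability that the percent difference stays under 65% is approximately

```r
# P(|Y2 - Y1| <= 0.65*mu), with Y2 - Y1 ~ N(0, (0.23*sqrt(2)*mu)^2);
# mu cancels out of the ratio, leaving:
2 * pnorm(0.65 / (0.23 * sqrt(2))) - 1   # ~0.954
```

The approximation ignores the randomness of the denominator, which is why it is close to, but not exactly, the simulated values.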

Basically, this demonstrates, to a very reasonable degree, the validity of the upper and lower bounds on results from two separate labs that were predicted from the Precision and Bias Statement of ASTM C1293, and ultimately from the Inter-Laboratory Study that produced the \(1s\%\) estimate.

Part 3: for Runners

“If he doesn’t get it this time, I’m not working with him anymore.”

– Mike Thomas, said in a bar in Washington D.C., talking about an inebriated colleague after pranking him for the fourth time with the same prank.

Let’s Put the Icing on this Visualization

The aim of this section is to take the basic plot showing the upper and lower bounds around an initial result from C1293, and fill it in with dots representing random samples of single results from another laboratory.

The problem is that the distribution to sample from is no longer a normal distribution. The distribution is obviously skewed as shown by the asymmetric upper and lower bounds.

In a normal distribution, the mean (the average value of the distribution), the median (if you arrange the individual elements in the distribution in order, the median is the value in the middle) and the mode (the most frequent value) are all the same value.

In a skewed normal distribution, this is no longer the case. The mode is still located at the peak of the probability density, but the median and mean shift in the direction of the longer tail, with the mean shifting farther than the median.
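This ordering of mode, median, and mean in a right-skewed distribution is easy to confirm empirically. A minimal base-R sketch using a lognormal sample (a different right-skewed distribution, chosen here only because it needs no extra packages):

```r
set.seed(1)                           # for reproducibility
x <- rlnorm(100000)                   # a strongly right-skewed sample
d <- density(x)
mode_est <- d$x[which.max(d$y)]       # smoothed density peak, well below the median
c(mode = mode_est, median = median(x), mean = mean(x))
mode_est < median(x) && median(x) < mean(x)   # TRUE: mode < median < mean
```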

We will construct the simplest deviation from the normal distribution, a skew normal distribution, fitting its 0.025 and 0.975 quantiles to the lower and upper bounds while keeping the mean and standard deviation as close as possible to those of the original normal distribution. (Why 0.025 and 0.975? Together they cover 95% of the area under the curve, leaving 2.5% in each tail.)

We’ll sample from these skewed distributions, with adjusted means ranging from approximately 0.04 to 1, which covers just about any reasonable C1293 result (and then some) for a reactive aggregate. The amount of ‘skew’, and the proportional adjustments to the mean and standard deviation of the C1293 normal distribution with a CV% of 23% (\(1s\%\)), will be maintained throughout.

The fine tuning of the fit was done manually. As before, plots of these skewed normal distributions will be displayed for a mean of 0.5 and 1, to emphasize the universality of the process for the range of C1293 values being plotted.

Why do all this? To put the icing on the visualization!

Code
# Using the sn package, create the skew normal for mean of 1 (and adjustments made elsewhere)
adj_mean_one <- 0.135
adj_sd_one <- 0.141
cp_one <- c(mean=1+adj_mean_one, s.d.=.23+adj_sd_one, gamma1=0.53)
dp_one <- cp2dp(cp_one, family="SN")

skew_one <- makeSECdistr(dp_one, family="SN", name="Skew_1")

# Let's plot
# Need data frame
dum_count <- seq(1,100000,1) # 100,000 samples
samp_one <- rsn(n=100000, dp=dp_one)
df_skew_one <- data.frame(x=dum_count, y=samp_one)

# Let's get some quantiles
q1_lower <- quantile(df_skew_one$y,probs=c(0.025))
q1_upper <- quantile(df_skew_one$y,probs=c(0.975))
# Let's turn into usable text
q1_text_lower <- sprintf("%.3f%% (at 2.5%%)", q1_lower)
q1_text_upper <- sprintf("%.3f%% (at 97.5%%)", q1_upper)

# Make the plot
sp_one <- ggplot(df_skew_one, aes(x=y)) + geom_density(adjust = 3)
sp_one + xlim(0,3) + theme_classic() + 
  labs(x=TeX(r"(%Expansion of Second Result ($Y_2$); First Result ($Y_1$) is 1%)"),
                                            y="Density",
  title="Fitted Skew Normal Distribution \nwith nominal Mean=1 and CV%=23%") +
  theme(axis.text=element_text(color="black", size=12),
        axis.title=element_text(size=14)) +
  geom_vline(xintercept=q1_lower, color="black",
             linetype = "dotted") +
  geom_vline(xintercept=q1_upper, color="black",
             linetype = "dotted") +
  annotate("text", x = 0.3, y = 0.85, size=5,
  label = q1_text_lower) +
  annotate("text", x = 2.1, y = 0.85, size=5,
  label = q1_text_upper)

Fig. 10 - Fitted skew normal distribution around Mean = 1

Code
# Now for the case where the mean is 0.5
cp_half <- c(mean=0.5+adj_mean_one*0.5, s.d.=.115+adj_sd_one*0.5, gamma1=0.53)
dp_half <- cp2dp(cp_half, family="SN")

skew_half <- makeSECdistr(dp_half, family="SN", name="Skew_half")

# Now let's use ggplot (again)
# Need (another) data frame
dum_count <- seq(1,100000,1) # 100,000 samples
samp_half <- rsn(n=100000, dp=dp_half)
df_skew_half <- data.frame(x=dum_count, y=samp_half)

# Let's get some quantiles
q1_lower_half <- quantile(df_skew_half$y,probs=c(0.025))
q1_upper_half <- quantile(df_skew_half$y,probs=c(0.975))
# Let's turn into usable text
q1_text_lower_half <- sprintf("%.3f%% (at 2.5%%)", q1_lower_half)
q1_text_upper_half <- sprintf("%.3f%% (at 97.5%%)", q1_upper_half)

# Make the plot
sp_half <- ggplot(df_skew_half, aes(x=y)) + geom_density(adjust = 3)
sp_half + xlim(0,1.5) + theme_classic() + 
  labs(x=TeX(r"(%Expansion of Second Result ($Y_2$); First Result ($Y_1$) is 0.5%)"),
                                            y="Density",
  title="Fitted Skew Normal Distribution \nwith nominal Mean=0.5 and CV%=23%") +
  theme(axis.text=element_text(color="black", size=12),
        axis.title=element_text(size=14)) +
  geom_vline(xintercept=q1_lower_half, color="black",
             linetype = "dotted") +
  geom_vline(xintercept=q1_upper_half, color="black",
             linetype = "dotted") +
  annotate("text", x = 0.15, y = 1.8, size=5,
  label = q1_text_lower_half) +
  annotate("text", x = 1.01, y = 1.8, size=5,
  label = q1_text_upper_half)

Fig. 11 - Fitted skew normal distribution around Mean = 0.5

As can be seen from these two plots, this empirically derived skew normal distribution does a very good job of matching the upper and lower limits at the 95% confidence level while keeping the mode close to the value of the first result (\(Y_1\)). Compare the values shown in Figs. 10 and 11 to the graph in Part 1 (Fig. 3), which shows the upper and lower bounds for C1293 constructed directly from the Precision and Bias Statement.

So now we have a skew normal distribution matching our test method, from which we can sample to enhance our visualization with an overlaid scatter plot. The final touch will be to color the points according to their distance from the original value.

Here it is:

Code
# Let's create the vector of Y1 values
yf_1 <- seq(0.04, 1, 0.000192)
yf_2 <- numeric(length = length(yf_1))
upper_lim <- numeric(length = length(yf_1))
lower_lim <- numeric(length = length(yf_1))
dist_y_s <- numeric(length = length(yf_1))
# length(yf_1)
for (i in 1:length(yf_1)) {
  cp_f <- c(mean=yf_1[i]+adj_mean_one*yf_1[i], s.d.=.23*yf_1[i]+adj_sd_one*yf_1[i], gamma1=0.53)
  dp_f <- cp2dp(cp_f, family="SN")
  yf_2[i] <- rsn(1,dp=dp_f)
  upper_lim[i] <- yf_1[i]*(2+d2s_c1293/100)/(2-d2s_c1293/100)
  lower_lim[i] <- yf_1[i]*(2-d2s_c1293/100)/(2+d2s_c1293/100)
  if(yf_2[i]>yf_1[i]) {
    dist_y_s[i] <- (yf_2[i]-yf_1[i])/(upper_lim[i]-yf_1[i])
  } else if (yf_2[i]<yf_1[i]){
    dist_y_s[i] <- (yf_1[i]-yf_2[i])/(yf_1[i]-lower_lim[i])
  } else{
    dist_y_s[i] <- 0
  }
}
 # Now make the data frame
df_scat <- data.frame(yf_1=yf_1, yf_2=yf_2, dist_y_s=dist_y_s,
                      lower_lim=lower_lim, upper_lim=upper_lim)
p_scat <- ggplot(data=df_scat) + geom_point(aes(x=yf_1, y=yf_2, color=dist_y_s), alpha=0.95, size=0.6, show.legend = FALSE) +
  scale_color_gradient(low = "black", high = "red")
p_scat + theme_classic() +
  geom_line(aes(x=yf_1, y=yf_1), linewidth=1.15) +
  geom_line(aes(x=yf_1, y=upper_lim), color="red") +
  geom_line(aes(x=yf_1, y=lower_lim), color="red") +
  theme_classic() + ylim(0, 2) + 
  labs(x="%Expansion (First Result)", y="%Expansion (Second Result)",
  title="ASTM C1293 - Upper and Lower Bounds from Different Lab Results
  (2nd Result sampled 5000 times - shown to better visualize the spread)") +
  theme(axis.text=element_text(color="black", size=12), 
        panel.grid.major=element_line(color="black",
        linewidth = 0.25, linetype = 2), 
        axis.title=element_text(size=14)) +
  annotate("text", x = 0.75, y = 0.15, size=5,
  label = "Given any result \n along this line") +
  annotate("text", x = 0.25, y = 1.5, size=5,
  label = "Results from another \n lab should fit
  between these limits, \n 19 times out of 20") +
  annotate("segment", x = 0.77, xend = 0.80, y = 0.3,
  yend = 0.80,colour = "black", linewidth = 1.4,
  arrow = arrow()) +
  annotate("segment", x = 0.2, xend = 0.25, y = 1.1,
  yend = 0.13,colour = "red", linewidth = 1.5,
  arrow = arrow()) +
  annotate("segment", x = 0.3, xend = 0.4, y = 1.14,
  yend = 0.79,colour = "red", linewidth = 1.5,
  arrow = arrow())

Fig. 12 - C1293 spread in data from single results from 2 different labs

(This html document was created using the Flatly Bootswatch theme in Quarto with the R language in the RStudio environment. The packages used are listed in the beginning section “Libraries”.)