--- title: "PPQ Power Assessment Theoretical Results" author: "Yalin Zhu (yalin.zhu@merck.com)" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{PPQ Power Assessment Theoretical Results} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Preliminaries Without loss of generality, suppose $n$ outcomes of the Critical Quality Attribute (CQA) are normally distributed, which is denoted by $X_i \stackrel{\text{i.i.d}}{\sim} \mathcal{N}(\mu, \sigma^2)$, where $i=1,\dots, n$, then the distributions of sample mean and standard deviation are as known: \begin{equation} \bar{X} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n}) \end{equation} and \begin{equation} \label{diststd} \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2 (n-1). \end{equation} Moreover, sample mean and sample standard deviation are independent under normal distribution assumption. Denote the lower and upper specification limits as $L$ and $U$, respectively. The prediction or tolerance interval can be expressed by \begin{equation} \left[ Y_1, Y_2 \right] = \left[ \bar{X} - kS, \ \bar{X} + kS \right] , \end{equation} where $k$ is a specific multiplier for the interval. For example, for prediction interval, $k=t_{1-\alpha/2,n-1}\sqrt{1+\frac{1}{n}}$. # Specification test for one release batch The outcome at release can be any one of the sample, so $X_{rl} \sim \mathcal{N}(\mu, \sigma^2)$, then the probability of passing PPQ at release should be \begin{equation} \begin{split} \Pr(\text{Passing Specification for Release}) & = \Pr(L \le X_{rl} \le U) \\ & = \Phi (U) - \Phi(L) \end{split} \end{equation} This probability is very easy to calculate using software, such as `pnorm()` in R. # Test for PPQ Batches \begin{equation} \label{probpass} \begin{split} \Pr(\text{Passing a Single PPQ Batch}) & = \Pr(L \le Y_1 \le Y_2 \le U) \\ & = \int_{L}^{U} \int_{L}^{y_2}f_{Y_1,Y_2}(y_1, y_2) dy_1 dy_2 \end{split} \end{equation} Now it is essential to obtain the bivariate joint distribution of the lower and upper prediction/tolerance interval, that is, find joint probability density function (PDF) $f_{Y_1,Y_2}(y_1,y_2)$. Since $Y_1=\bar{X} - k S$ and $Y_2=\bar{X} + k S$, we can use another bivariate PDF $f_{\bar{X},S}(x,s)$ to calculate $f_{Y_1,Y_2}(y_1,y_2)$ by using Jacobian transformation. Solve $\bar{X}$ and $S$ as $x=\dfrac{y_1+y_2}{2}$ and $s=\dfrac{y_2-y_1}{2k}$, then Jacobian of the transformation is \begin{equation} |J|= \left| \begin{array}{cc} \dfrac{\partial x}{\partial y_1} & \dfrac{\partial x}{\partial y_2}\\ \\ \dfrac{\partial s}{\partial y_1} & \dfrac{\partial s}{\partial y_2} \end{array} \right| = \left| \begin{array}{cc} \dfrac{1}{2} & \dfrac{1}{2}\\ \\ -\dfrac{1}{2k} & \dfrac{1}{2k} \end{array} \right| = \dfrac{1}{2k}. \end{equation} Thus, (\ref{probpass}) can be calculated as \begin{equation} \label{extend} \begin{split} \int_{L}^{U} \int_{L}^{y_2}f_{Y_1,Y_2}(y_1, y_2) dy_1 dy_2 & = \int_{L}^{U} \int_{L}^{y_2}f_{\bar{X},S}\left(\dfrac{y_1+y_2}{2}, \dfrac{y_2-y_1}{2k}\right) |J| dy_1 dy_2 \\ & = \dfrac{1}{2k} \int_{L}^{U} \int_{L}^{y_2}f_{\bar{X}}\left(\dfrac{y_1+y_2}{2}\right) f_{S}\left( \dfrac{y_2-y_1}{2k}\right) dy_1 dy_2. \end{split} \end{equation} The second equation follows from normal sample mean and standard deviation being independent. Similarly, we can obtain the PDF of sample standard deviation $f_S(s)$. By (\ref{diststd}), let $v= \dfrac{(n-1)s^2}{\sigma^2}$, then Jacobian of the transformation is \begin{equation} |J| = \left|\dfrac{dv}{ds}\right| = \dfrac{2(n-1)s}{\sigma^2}. \end{equation} Thus, \begin{equation} \label{Spdf} \begin{split} f_S(s) & = f_V\left(\dfrac{(n-1)s^2}{\sigma^2}\right) |J| \\ & = \dfrac{2(n-1)s}{\sigma^2}f_V\left(\dfrac{(n-1)s^2}{\sigma^2}\right) \end{split} \end{equation} Plug (\ref{Spdf}) in (\ref{extend}), we can get the final results. \begin{equation} \begin{split} & \Pr(\text{Passing a Single PPQ Batch}) \\ = & \dfrac{1}{2k} \int_{L}^{U} \int_{L}^{y_2}f_{\bar{X}}\left(\dfrac{y_1+y_2}{2}\right) \dfrac{2(n-1)\dfrac{y_2-y_1}{2k}}{\sigma^2}f_V\left\{\dfrac{(n-1)\left[\dfrac{y_2-y_1}{2k}\right]^2}{\sigma^2}\right\} dy_1 dy_2 \\ = & \dfrac{n-1}{2k^2 \sigma^2}\int_{L}^{U} \int_{L}^{y_2}f_{\bar{X}}\left(\dfrac{y_1+y_2}{2}\right) f_V\left\{\dfrac{(n-1)(y_2-y_1)^2}{4k^2\sigma^2}\right\} (y_2-y_1) dy_1 dy_2, \end{split} \end{equation} where $\bar{X} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$ and $V \sim \chi^2 (n-1)$. Then this quantity can be easily calculated by software, such as functions `dnorm()`, `dchisq()` and `integrate()` in R. We can also calculate the probability of passing $m$ PPQ batches, then under the assumption of independence and similar expected performance across batches, the probability will be $$ \Pr(\text{Passing } m \text{ batches }) = \{\Pr(\text{Passing a Single PPQ Batch}) \}^m$$