Analysis of approximations of GUM supplement 2 based non-Gaussian PDFs of measurement models with Rosenblatt Gaussian transformation mappings

Vishal Ramnath

doi:10.1051/ijmqe/2019018

All issues

Volume 11 (2020)

Int. J. Metrol. Qual. Eng., 11 (2020) 2

Full HTML

Open Access

Issue		Int. J. Metrol. Qual. Eng. Volume 11, 2020


Article Number		2
Number of page(s)		16
DOI		https://doi.org/10.1051/ijmqe/2019018
Published online		29 January 2020

Int. J. Metrol. Qual. Eng. 11, 2 (2020)

Research Article

Analysis of approximations of GUM supplement 2 based non-Gaussian PDFs of measurement models with Rosenblatt Gaussian transformation mappings

Vishal Ramnath^*

Department of Mechanical and Industrial Engineering, University of South Africa, Private Bag X6, Florida 1710, South Africa

^* Corresponding author: ramnav@unisa.ac.za

Received: 12 November 2019
Accepted: 10 December 2019

Abstract

In scientific metrology practise the application of Monte Carlo simulations with the aid of the GUM Supplement 2 (GS2) technique for performing multivariate uncertainty analyses is now more prevalent, however a key remaining challenge for metrologists in many laboratories is the implicit assumption of Gaussian characteristics for summarizing and analysing measurement model results. Whilst non-Gaussian probability density functions (PDFs) may result from Monte Carlo simulations when the GS2 is applied for more complex non-linear measurement models, in practice results are typically only reported in terms of multivariate expected and covariance values. Due to this limitation the measurement model PDF summary is implicitly restricted to a multivariate Gaussian PDF in the absence of additional higher order statistics (HOS) information. In this paper an earlier classical theoretical result by Rosenblatt that allows for an arbitrary multivariate joint distribution function to be transformed into an equivalent system of Gaussian distributions with mapped variables is revisited. Numerical simulations are performed in order to analyse and compare the accuracy of the equivalent Gaussian system of mapped random variables for approximating a measurement model’s PDF with that of an exact non-Gaussian PDF that is obtained with a GS2 Monte Carlo statistical simulation. Results obtained from the investigation indicate that a Rosenblatt transformation offers a convenient mechanism to utilize just the joint PDF obtained from the GS2 data in order to both sample points from a non-Gaussian distribution, and also in addition which allows for a simple two-dimensional approach to estimate coupled uncertainties of random variables residing in higher dimensions using conditional densities without the need for determining parametric based copulas.

Key words: metrology uncertainty / GUM Supplement 2 / Monte Carlo / non-Gaussian PDF / Rosenblatt transformation / pressure balance

© V. Ramnath, published by EDP Sciences, 2020

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

1.1 Research motivation

The original Guide to the Uncertainty of Measurement (GUM) [1] utilizing an algebraic sensitivity coefficient and mixed frequency and Bayesian statistics approach has found widespread utilization in metrology practise for uncertainty analysis, but is however only strictly valid for linearised measurement models. An uncertainty analysis of non-linear models without any limiting approximations may instead be achieved with the application of the Monte Carlo simulation (MCS) technique using the GUM Supplement 1 (GS1) [2] and GUM Supplement 2 (GS2) [3] approaches for univariate and multivariate models which supersede the original GUM technique. When the GS1 and GS2 are utilized a fully Bayesian statistics framework is utilized for modelling consistency, whilst newer mathematical refinements to modernise and update the original GUM are being further developed as discussed by Bich et al. [4].

As a result both the GS1 and GS2 techniques are considered suitable for application to generalized and non-linear measurement models which are problematic for the original GUM in its earlier form, where a key limitation in the application of both the GS1 and the GS2 is in the reporting of the summaries of the MCS simulation results for a measurement model’s uncertainty if this deviates from a univariate or multivariate Gaussian distribution.

In the case of the GS1 the use of extended lambda distributions (ELDs) as developed by Harris et al. [5] has achieved good success, and has subsequently been applied to non-Gaussian univariate models such as that for a gas pressure balance as discussed by Ramnath [6]. The case for non-Gaussian multivariate models is more complex but is in principle directly amenable with the use of multivariate copulas as originally developed by Possolo [7] for metrology uncertainty analysis problems. This approach has subsequently been implemented using parametrized vine copulas for an oil pressure balance problem as discussed by Ramnath [8] in order to demonstrate the utility of this approach for summarizing a non-Gaussian joint PDF in mechanical metrology applications.

Consequently, although the use of copulas thus presents an attractive theoretical tool for directly summarizing arbitrary joint PDFs, either with parametrized vine copulas as outlined by Nagler and Czado [9] or with non-parametrized empirical beta copulas as outlined by Segers et al. [10], the construction and utilization of copulas is still a mathematically complex and challenging application in the absence of readily available software routines and technical guidelines for many practising metrologists.

As a result the formulation of a higher dimensional joint PDF into an equivalent readily reported system of mapped variables which follow a Gaussian distribution with expected values and variances using a Rosenblatt based transformation, thus presents a potentially simpler and more convenient approach for many practising metrologists for analysing GS2 data which may potentially exhibit non-Gaussian characteristics if a functional form of the joint PDF is available.

1.2 Research contributions

This paper focuses on using numerical simulations in order to investigate and compare the accuracy of a mathematically equivalent sequential system of mapped Gaussian PDFs with that of an exact non-Gaussian trivariate joint PDF, that is obtained with representative synthesized GS2 MCS statistical data of a pressure balance.

Techniques for conveniently constructing the equivalent system which is composed of univariate marginals and conditional multivariatedistributions that are generated with a Rosenblatt transformation of mapped random variables from the GS2 MCS data are demonstrated using proposed developed numerical schemes. Mechanisms for sampling from non-Gaussian distributions are developed and validated by working out the Kullback-Leibler divergence between the actual non-Gaussian joint PDF and the equivalent system of Gaussian PDFs.

Finally, alternative techniques for estimating the uncertainties of the model results directly in terms of the conditional densities from the mapped measurement model variables without the utilization of copulas are demonstrated for practical metrology problems that may be encountered in high accuracy calibrations and laboratory inter-comparisons.

2 Literature review

2.1 Non-Gaussian PDFs in metrology studies

The conventional practise in reliability and quality engineering studies as an overlapping field of metrology uncertainty analysis is to approximate a multivariate PDF in terms of statistical moments.

One approach by Acar et al. [11] uses the additive decomposition approach by Rahman and Xu [12] to approximate a univariate function Y = f(X) for an input $X = {[X_{1}, \dots, X_{N}]}^{T}$ $\boldsymbol{X}=[X_1,\ldots,X_N]^{\mathsf{T}}$ as a summation of one-dimensional functions such that the approximation Ŷ (X) is $\hat{Y} (X) = \sum_{j = 1}^{N} Y (μ_{1}, \dots, μ_{(j - 1)}, X_{j}, μ_{(j + 1)}, \dots, μ_{N}) - (N - 1) y (μ_{1}, \dots, μ_{N})$ $\hat{Y}(\boldsymbol{X}) = \sum_{j=1}^N Y(\mu_1,\ldots,\mu_{(j-1)},X_j,\mu_{(j+1)},\ldots,\mu_N) - (N - 1) y(\mu_1,\ldots,\mu_N)$ where μ_j, j = 1, …, N are the mean values of the random variables X_j. For this approach each of the independent random variables follows an underlying probability distribution g_j (X_j) and the problem is equivalent to the conventional metrology problem of determining the uncertainty of a measurand model specified as Y = f(X) using the GS1, where the principle difference is that Higher Order Statistics (HOS) information may be incorporated.

The application of HOS moments may also be extended to approximating the joint PDF for multivariate functions Y (X) i.e. for a measurement model of the form Y = f(X) where $Y = {[Y_{1}, \dots, Y_{m}]}^{T}$ $\boldsymbol{Y}=[Y_1,\ldots,Y_m]^{\mathsf{T}}$ is a vector of outputs as discussed by Bretthorst [13] using the maximum entropy method of moments (MEM). The application of the MEM has however until very recently been constrained by the following five key issues in metrology studies, namely:

The data samples themselves are not used but rather the statistical moments which is not consistent with a Bayesian approach
The number of statistical moments and associated Lagrange multipliers are unknown
There is no consistent optimization search algorithm
There isno consistent approach to assign uncertainties to the Lagrange multipliers in accordance with a Bayesian framework
There is no consistent approach to assign uncertainties to the PDF in accordance with a Bayesian framework

The above combination of issues have subsequently recently been resolved for metrology uncertainty analysis of measurement systems using a combined maximum entropy and Bayesian statistics approach developed by Armstrong et al. [14]. This new MaxEnt/Bayesian approach can therefore now in principle be applied to the MCS method either for problems in which there is a heavy computational cost, or alternately for highly non-linear problems for which non-Gaussian PDFs may be present. Investigations performed by Armstrong et al. [14] indicate that the MaxEnt/Bayesian method is able to correctly recover the measurement model’s PDF for Gaussian, non-Gaussian and lognormal distributions for sample sizes larger than M > 60 simulation events when using sixth order HOS moments.

This more recent work may thus be contrasted with many earlier metrology studies using the GS2 method that were specifically restricted tosecond order statistical moments i.e. a first order expected value and a second order covariance. In the conventional GS2 approach although HOS information is implicitly encoded and readily available within the MCS data typically denoted as Ω, a multivariateGaussian PDF for a measurement model is nevertheless always implicit based on maximum statistical entropy reasons if only second order statistical moments are available. This joint PDF summary approach in the GS2 is therefore common both to models with a low computational intensity such as algebraic based models that may generate a high number of Monte Carlo simulation events, as well as to measurement systems with an intrinsic high computational intensity such as finite element method (FEM) and finite volume method (FVM) based models that may only generate a small number of Monte Carlo simulation events due to limited computational resources, since only second order statistical moments are reported for summarizing the joint PDF.

Computationally intensive models such as those governed by FEM and/or FVM based partial differential equation solutions in mechanical and electrical metrology systems generally make an implicit working approximation to justify a Gaussian distribution when only the expected value $\tilde{y}$ $\widetilde{\boldsymbol{y}}$ vector as a first order statistical moment andthe covariance $U_{\tilde{y}}$ $\mathbf{U}_{\widetilde{\boldsymbol{y}}}$ matrix as asecond order statistical moment for a measurement model y = f(x) are available, due to the finite number of simulation events that are available due to limited computational resources. An additional implicit assumption of a Gaussian PDF is also utilized for many low computational intensity models that could generate a larger number of Monte Carlo simulation events since both GUM supplements are presently explicitly restricted to only utilizing first and second order statistical moments. Due to this limitation the GS2 whilst theoretically allowing for the possibility of non-Gaussian PDFs within a Bayesian statistics modelling framework as encoded within Ω, and which may in principle be used to generate HOS information, is therefore unable to offer any specific technical guidance on how to utilize HOS information for modelling, summarizing and analysing non-Gaussian characteristics in uncertainty analysis work.

If a metrologist thus allows for the possibility of a measurement model that may exhibit a non-Gaussian PDF that cannot adequately be approximated with a Gaussian PDF, it then logically becomes necessary to either model and summarize non-Gaussian PDFs either directly with multivariate copulas, or indirectly with models that utilize HOS information such as third, fourth and higher order statistical moments, or alternately to completely avoid approximations altogether but to simply utilize the actual joint PDF obtained from the MCS GS2 data Ω.

In this latter approach, an equivalent system that can model and summarize the non-Gaussian characteristics using mapped random variables of transformed Gaussian PDFs as statistical ‘building blocks’ from the actual joint PDF, thus presents an appealing theoretical alternative for summarizing and analysing a non-Gaussian joint PDF.

2.2 Review of Rosenblatt transformation method

The development of the transformation that maps random variables from a general and possibly non-Gaussian distribution with coupling effects between the random variables, to an equivalent system of new mapped random variables that follow Gaussian distributions was originally discovered by Rosenblatt [15].

In the classical paper by Rosenblatt [15] $X = {[X_{1}, \dots, X_{n}]}^{T} \in ℝ^{n}$ $\boldsymbol{X} = [X_1,\ldots,X_n]^{\mathsf{T}} \in \mathbb{R}^n$ was a random vector with a cumulative distribution function F(X₁, …, X_n) such that a new random vector $Z = {[Z_{1}, \dots, Z_{n}]}^{T}$ $\boldsymbol{Z} = [Z_1, \ldots, Z_n]^{\mathsf{T}}$ was generated using a transformation T such that Z = T(X). This transformation was specified such that the corresponding random variables z₁, …, z_n were $\begin{array}{l} z_{1} & = & P (X_{1} ⩽ x_{1}) \\ z_{2} & = & P (X_{2} ⩽ x_{2} | X_{1} = x_{1}) \\ ⋮ \\ z_{n} & = & P (X_{n} ⩽ x_{n} | X_{1} = x_{1}, \dots, X_{n - 1} = x_{(n - 1)}) \end{array}$ $\begin{eqnarray*} z_1 & = & P(X_1 \leqslant x_1) \nonumber \\ z_2 & = & P(X_2 \leqslant x_2|X_1 = x_1) \nonumber \\ & \vdots & \nonumber \\ z_n & = & P(X_n \leqslant x_n|X_1=x_1,\ldots,X_{n-1}=x_{(n-1)}) \end{eqnarray*}$ (1)

Subsequent studies by Lebrun and Dotfoy [16] using results from more recent copula developments established the equivalence between the original Rosenblatt transformation and later refinements such as the generalized Nataf transformation. As a result in this paper we will restrict the investigation to the classical Rosenblatt transformation which maps from the original distribution to a system of Gaussian distributions.

The Rosenblatt transformation using a more modern approach by Lebrun and Dotfoy [16] is defined as a composition of two transformations such that if $X \in ℝ^{n}$ $\boldsymbol{X} \in \mathbb{R}^n$ is a continuous random variable defined by univariate marginal cumulative distribution functions and its corresponding copula C, then the Rosenblatt transformation denoted as T^R is defined by a composition of two sequential transformations such that $U = T^{R} (X) = T_{2}^{R} ° T_{1}^{R} (X)$ $\boldsymbol{U} = \mathsf{T}^R(\boldsymbol{X}) = \mathsf{T}_2^R \circ \mathsf{T}_1^R(\boldsymbol{X})$ .

The first transformation $T_{1}^{R} : (X \subseteq ℝ^{n}) \to (Y \subseteq ℝ^{n})$ $\mathsf{T}_1^R : (\boldsymbol{X} \subseteq \mathbb{R}^n) \rightarrow (\boldsymbol{Y} \subseteq \mathbb{R}^n)$ maps the original variate points x ∈ X into a corresponding space of points y ∈ Y where the transformation $T_{1}^{R} (X) = Y$ $\mathsf{T}_1^R(\boldsymbol{X})=\boldsymbol{Y}$ is defined as $Y = {[F_{X} (x_{1}), \dots, F_{X} (x_{n} | x_{1}, \dots, x_{(n - 1)})]}^{T}$ $\boldsymbol{Y} = [F_{\boldsymbol{X}}(x_1),\ldots, F_{\boldsymbol{X}}(x_n|x_1,\ldots,x_{(n-1)})]^{\mathsf{T}}$ . In this approach F_X (x_j|x₁, …, x_j−1) is the conditional cumulative distribution function (CDF) for a random variable x_j for j = 1, …, n and it follows from standard statistical theory that any conditional CDF automatically follows a rectangular PDF i.e. F_X(x_j|x₁, …, x_j−1) ~ R[0, 1] for j = 1, …, n.

The second transformation is $T_{2}^{R} : (Y \subseteq ℝ^{n}) \to (U \subseteq ℝ^{n})$ $\mathsf{T}_2^R : (\boldsymbol{Y} \subseteq \mathbb{R}^n) \rightarrow (\boldsymbol{U} \subseteq \mathbb{R}^n)$ where $U = T_{2}^{R} (Y)$ $\boldsymbol{U} = \mathsf{T}_2^R(\boldsymbol{Y})$ and is defined as $U = {[Φ^{- 1} (Y_{1}), \dots, Φ^{- 1} (Y_{n})]}^{T}$ $\boldsymbol{U} = [\Phi^{-1}(Y_1),\ldots, \Phi^{-1}(Y_n)]^{\mathsf{T}}$ where Φ⁻¹ denotes the inverse normal transformation, formally defined as $Φ^{- 1} (x) = \sqrt{2} {erf}^{- 1} (2 x - 1)$ $\Phi^{-1}(x) = \sqrt{2} \textrm{erf}^{-1} (2x - 1) \nonumber$ where erf(x) denotes the error function as discussed by van Albada and Robinson [17]. In this second transformation it logically follows that the final transformed variable follows a Gaussian PDF such that U_j = Φ⁻¹(Y_j) ~ N(μ = 0, σ² = 1) for j = 1, …, n, due to the explicit choice of a mapping function.

The essential utility of the Rosenblatt transformation is therefore that it allows for a random variable v that has some arbitrary distribution P_v(v) to be mapped to the cumulative distribution which is always uniformly i.e. rectangularly distributed on the interval R[0, 1], and then the uniform distribution is in turn mapped to a standard normal distribution such that $T_{1}^{R} : (v ~ P_{v} (v)) \to (F ~ R [0, 1])$ $\mathsf{T}_1^R : (v \sim P_v(v)) \rightarrow (F \sim R[0, 1])$ and then $T_{2}^{R} : (F ~ R [0, 1]) \to (u ~ N (μ = 0, σ^{2} = 1))$ $\mathsf{T}_2^R: (F \sim R[0, 1]) \rightarrow (u \sim N(\mu = 0, \sigma^2 = 1))$ .

Since it is clear that the distributions always refer to X the subscript in the conditional distribution F_X(x_j|x₁, …, x_(j−1)) may be dropped for notational brevity in the rest of the paper so that the transformed variables may then be combined and the Rosenblatt transformation $T_{2}^{R} ° T_{1}^{R} (X)$ $\mathsf{T}_2^R \circ \mathsf{T}_1^R (\boldsymbol{X})$ in the present context may then simply be written as $\begin{matrix} u_{1} & = & Φ^{- 1} (F (x_{1})) \\ u_{2} & = & Φ^{- 1} (F (x_{2} | x_{1})) \\ ⋮ \\ u_{n} & = & Φ^{- 1} (F (x_{n} | x_{1}, x_{2}, \dots, x_{(n - 1)})) \end{matrix}}$ $\begin{eqnarray*} \left. \begin{array}{ccl} u_1 & = & \Phi^{-1}\left( F(x_1)\right) \\ u_2 & = & \Phi^{-1}\left( F(x_2|x_1)\right) \\ & \vdots & \\ u_n & = & \Phi^{-1}\left( F(x_n|x_1,x_2,\ldots,x_{(n-1)})\right) \end{array} \right\}\end{eqnarray*}$ (2)

If the normal distribution Φ is applied to all of the mapped Gaussian variables i.e. Φ(u_j) = Φ(Φ⁻¹F(x_j|x₁, …, x_j−1) for j = 1, …, n an equivalent system of mapped variables corresponding to $T_{1}^{R} (X)$ $\mathsf{T}_1^R (\boldsymbol{X})$ that follow a rectangular distribution may be written as $\begin{matrix} r_{1} & = & F (x_{1}) \\ r_{2} & = & F (x_{2} | x_{1}) \\ ⋮ \\ r_{n} & = & F (x_{n} | x_{1}, x_{2}, \dots, x_{(n - 1)}) \end{matrix}}$ $\begin{eqnarray*} \left. \begin{array}{ccl} r_1 & = & F(x_1) \\ r_2 & = & F(x_2|x_1) \\ & \vdots & \\ r_n & = & F(x_n|x_1,x_2,\ldots,x_{(n-1)}) \end{array} \right\}\end{eqnarray*}$ (3)

Noting that $u_{j} = Φ^{- 1} (F (x_{j} | x_{1}, x_{2}, \dots, x_{(j - 1)}))$ $u_j = \Phi^{-1}\left( F(x_j|x_1,x_2,\ldots, x_{(j-1)})\right)$ with equation (2) and r_j = F(x_j|x₁, x₂, …, x_(j−1)) with equation (3) are mathematically interchangeable since a known mapping and inverse mapping function exist between these random variables, it may be concluded that either system may be used to specify a Rosenblatt transformation. This is due to the fact that the normal distribution Φ and inverse normal distribution Φ⁻¹ may be used to conveniently map between the random variables r_j and u_j such that u_j = Φ⁻¹(r_j) and r_j = Φ(u_j) for j = 1, …, n.

The above system of conditional CDFs that underpin the Rosenblatt transformation are connected to the equivalent decomposition of an arbitrary joint PDF in terms of conditional PDFs such that $\begin{array}{l} f (x_{1}, \dots, x_{n}) & = & f (x_{1}) \times f (x_{2} | x_{1}) \times f (x_{3} | x_{1}, x_{2}) \\ \times \dots \times f (x_{n} | x_{1}, \dots, x_{n - 1}) . \end{array}$ $\begin{eqnarray*} f(x_1,\ldots,x_n) & = & f(x_1) \times f(x_2|x_1) \times \: f(x_3|x_1, x_2) \nonumber \\ & & \times \cdots \times \: f(x_n|x_1,\ldots,x_{n-1}). \end{eqnarray*}$ (4)

In the above equation the order of the decomposition is arbitrary and the equation may equivalently be written in the case of a trivariate joint PDF f(x₁, x₂, x₃) as f(x₁, x₂, x₃) = f(x₃)f(x₂|x₃)f(x₁|x₃, x₂) with similar generalizations to higher dimensional joint PDFs.

Referring to the above system it may be seen that if a random sample for u_j, j = 1, …, n is generated from a standard normal distribution N(μ = 0, σ² = 1) then the corresponding random variable for v may be obtained from solving the equation Φ(u) = F(v). As a result a joint PDF f(x₁, …, x_n) with an arbitrary and potentially non-Gaussian distribution may with the aid of the Rosenblatt transformation be modelled as an equivalent system of Gaussian distributions using the underlying conditional distributions.

Extensions for a generalized Rosenblatt transformation that allow for mappings of arbitrary distributions where u_j does not necessarily follow an underlying Gaussian distribution with N(μ = 0, σ² = 1) are also theoretically possible as discussed by Chang [18], but are not considered further in this paper since this is conceptually straightforward if the first step r_j = F(x_j|x₁, …, x_j−1) ~ R[0, 1] has been obtained and another mapping function Ψ : r → u is specified.

From a literature search it may be observed that the inverse cumulative distribution function for a normal distribution Φ⁻¹ is readily available in both commercial as well as open source software packages. As an example in the commercial package Matlab (19) Φ⁻¹ is computed as u = icdf(’Normal’, p, A) where p is a probability and A = [mu, sigma] is an input of thedistributions parameters which in the case of a normal distribution is simply the mean μ and standard deviation σ, whilst in the open source package GNU Octave [20] Φ⁻¹ is computedas u = norminv(p, mu, sigma).

Since Φ⁻¹ is relatively straightforward to calculate the practical application of utilizing a Rosenblatt transformation as specified by equation (3) for a metrology uncertainty analysis of non-Gaussian PDFs obtained with a GS2 simulation, then reduces to the following two conceptual and functional steps, namely:

Selecting a convenient technique/software for calculating the numerical values of the conditional distributions F(x_j |x₁, …, x_n) from theMCS data
Selectinga convenient technique/software for algebraically summarizing the conditional distributions F(x_j|x₁, …, x_n) from theprevious numerical data

3 Mathematical models

3.1 Generating MCS GS2 data

In a GS2 approach multivariate Monte Carlo data are generated for a model that may have an explicit form Y = f(X) where $X = {[X_{1}, \dots, X_{N}]}^{T}$ $\boldsymbol{X}=[X_1,\ldots,X_N]^{\mathsf{T}}$ is a model input and $Y = {[Y_{1}, \dots, Y_{m}]}^{T}$ $\boldsymbol{Y}=[Y_1,\ldots,Y_m]^{\mathsf{T}}$ a model output, or an equivalent implicit form h(Y, X) = 0 for generality as discussed by Harris et al. [5] if the model cannot be simplified to an explicit form. The input X has a corresponding joint PDF g_X(ξ) where $ξ = {[ξ_{1}, \dots, ξ_{N}]}^{T}$ $\boldsymbol{\xi}=[\xi_1,\ldots,\xi_N]^{\mathsf{T}}$ is a random variable, whilst the output Y has a corresponding joint PDF g_Y(η) where $η = {[η_{1}, \dots, η_{m}]}^{T}$ $\boldsymbol{\eta}=[\eta_1,\ldots,\eta_m]^{\mathsf{T}}$ is a random variable.

Regardless of the form of the model if a GS2 technique is implemented for M Monte Carlo simulation events by solving y_r = f(x_r) or h (y_r, x_r) = 0 for r = 1, …, M for sampled random inputs x₁, …, x_M then corresponding random outputs y₁, …, y_M will be generated from the solution of the model. The outputs y₁, …, y_M may then be conveniently grouped as a m × M dimensional matrix Ω to represent the MCS model output data. Changing symbols and writing the matrix as $Ω = {[x_{1}, \dots, x_{M}]}^{T}$ $\mathbf{\Omega}=[\boldsymbol{x}_1,\ldots,\boldsymbol{x}_M]^{\mathsf{T}}$ for M Monte Carlo simulation events where each random variable has a dimension of m × 1 without loss of generality for notational convenience we consider the MCS data to take the general form $\begin{array}{l} Ω & = & [x_{1}, \dots, x_{M}] . \end{array}$ $\begin{eqnarray*} \mathbf{\Omega} & = & \left[ \boldsymbol{x}_1, \ldots, \boldsymbol{x}_M \right]. \end{eqnarray*}$ (5)

In a conventional GS2 approach the matrix Ω contains all the encoded information of the simulation and the expected value $\tilde{x}$ $\widetilde{\boldsymbol{x}}$ and covariance matrix $U_{\tilde{x}}$ $\mathbf{U}_{\widetilde{\boldsymbol{x}}}$ are calculated. If the only known information is $\tilde{x}$ $\widetilde{\boldsymbol{x}}$ and $U_{\tilde{x}}$ $\mathbf{U}_{\widetilde{\boldsymbol{x}}}$ then according to maximum statistical entropy theory the PDF will be a multivariate Gaussian distribution in the absence of HOS information such as statistical moments, although the HOS information is theoretically already encoded in Ω for finite order moments.

The information encoded in the MCS data represented by Ω, may therefore after suitable statistical post-processing be represented as a joint PDF f(x₁, …, x_n) if Ω is processed either using scaled histograms as discussed by Oliphant [21] or alternately with kernel density estimate (KDE) approaches as discussed by Scott [22].

A mathematically equivalent representation of the joint PDF is the CDF F(x₁, …, x_n) which is related to the joint PDF by the equation $f (x_{1}, \dots, x_{n}) = \frac{\partial^{n} F (x_{1}, \dots, x_{n})}{\partial x_{1} \dots x_{n}}$ $f(x_1,\ldots,x_n) = \frac{\partial^n F(x_1,\ldots,x_n)}{\partial x_1 \cdots x_n}$ . The CDF may in turn also be defined in terms of the joint PDF by the equation $F (x_{1}, \dots, x_{n}) = \int_{- \infty}^{x_{1}} \dots \int_{- \infty}^{x_{n}} f (t_{1}, \dots, t_{n}) d t_{1} \dots d t_{n}$ $F(x_1,\ldots,x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1,\ldots,t_n) \; \mathrm{d}t_1 \cdots \mathrm{d}t_n$ .

As a result either the joint PDF f(x₁, …, x_n) or equivalently the CDF F(x₁, …, x_n) are both automatically known if the MCS GS2 data in Ω is appropriately processed using standard statistical techniques, however the key remaining challenge is in summarizing and analysing a non-Gaussian joint PDF for metrology problems which is the focus of this paper.

3.2 Calculating marginal distributions

In order to implement the Rosenblatt transformation in equation (3) it is first necessary to calculate the marginal distributions F(x₁), …, F(x_n) and then the conditional distributions F(x_j|x₁, …, x_j−1) for j = 2, …, n which in turn rely on the earlier marginal distribution calculations.

Noting that the terms F(x_j), j = 1, …, n are univariate distributions it is observed that these functions may simply be calculated using earlier work by Willink [23] with the aid of extended lambda distributions (ELDs) if the MCS data Ω is available. The use of such ELDs which are based on fourth order statistical moments was later adapted and extended by Harris et al. (5) using parametrized families of quantile functions and monotonic splines for greater accuracy if higher statistical moments are necessary. Other possible lambda distributions apart from ELDs include a combined generalized lambda distribution (GLD) and a generalized beta distribution (GBD) to account for problematic skewness issues in a normal GLD approach, where the combination of these approaches is known as an EGLD which was originally developed by Karian et al. [24]. An EGLD may determined using four parameters λ₁, λ₂, λ₃, λ₄ that are specified in terms of optimizations using statistical moments if the univariate x-data points are supplied as an input. Since EGLDs may take slightly different forms many researchers opt to utilize the Freimer-Mudholkar-Kollia-Lin generalized lambda distribution (FMKL-GLD) which is the most common EGLD function which takes the general form $F^{- 1} (u; λ_{1}, λ_{2}, λ_{3}, λ_{4}) = λ_{1} + λ_{2}^{- 1} [λ_{3}^{- 1} (u^{λ_{3}} - 1) - λ_{4}^{- 1} ({(1 - u)}^{λ_{4}} - 1)]$ $F^{-1}(u;\lambda_1,\lambda_2,\lambda_3,\lambda_4) = \lambda_1 + \lambda_2^{-1} [ \lambda_3^{-1}(u^{\lambda_3} - 1) - \lambda_4^{-1}((1 - u)^{\lambda_4} - 1) ]$ .

In the FMKL-GLD univariate model scheme all of the parameters are free where the only restriction is that λ₂ > 0. An additional constraint that may be posed to have finite k^th statistical moments is to impose the additional restriction that min(λ₃, λ₄) > −1∕k. From a review of the literature a maximum likelihood method as discussed by Corlu and Meterelliyoz [25] is generally preferred and may conveniently be obtained using the readily available open source software package fitgbd that has recently been developed by Wang [26].

A possible disadvantage of a more complicated curve fit such as a FMKL-GLD is that the optimization to determine the optimal values for λ₁, λ₂, λ₃, λ₄ may be more difficult to obtain than using simpler approaches such as a least squares regression, and hence require more advanced approaches such as maximum likelihood estimation (MLE) schemes that are not necessarily available to practicing metrologists.

For this reason in many earlier metrology studies univariate curve fits for the marginal distributions F(x₁), …, F(x_n) typically utilized the more standard ELD form of Willink [23] which takes the simpler form for the quantile function as $\begin{array}{l} Q (ρ) & = & {\begin{array}{l} d + (c / b) [a ρ^{b} - {(1 - ρ)}^{b} + 1 - a], b \neq 0 \\ d + c [a \log (ρ) - \log (1 - ρ)], b = 0 \end{array} . \end{array}$ $\begin{eqnarray*} Q(\rho) & = & \left\{ \begin{array}{l} d + (c/b)[a\rho^b - (1 - \rho)^b + 1 - a], \: b \neq 0\\ d + c [a \log(\rho) - \log(1 - \rho)], \: b = 0 \end{array}. \right.\end{eqnarray*}$ (6)

The above ELD equation is based on fourth order statistical moments and may be used to fit the data for the marginal distribution F(x₁) by sampling a random point ρ where 0 ≤ ρ ≤ 1 corresponding to ρ = Φ(u₁) if u₁ is a sampled point from U ~ N(0, 1) and then working out the corresponding random variable x₁ from the equation x₁ = Q(ρ). A similar approach may also be used for all of the remaining marginal distributions F(x₂), …, F(x_n) so that all of the marginal distributions F(x_j) are modelled using parameters a_j, b_j, c_j, d_j for j = 1, …, n. In all of the above lambda distribution schemes it is generally assumed that a unimodal behaviour is present i.e. there is a ‘single peak’ of the random variable with an element of skewness and kurtosis. In the event that a more complicated behaviour is present with multiple peaks then univariate splines as developed by Harris [5] may instead be utilized.

If the underlying MCS data Ω is not available, or if a parameter based functional fit using fourth order statistical moments is inadequate, and only the joint PDF f(x₁, …, x_n) is known then the marginal distributions, either with single or multiple peaks, may instead be calculated directly from the joint PDF with the HOS information using the fundamental equation such that $\begin{array}{l} f (x_{j}) & = & \int_{- \infty}^{\infty} \dots \int_{- \infty}^{\infty} f (x_{1}, \dots, x_{n}) \\ \times d u_{1} \dots d u_{j - 1} d u_{j + 1} \dots d u_{n} . \end{array}$ $\begin{eqnarray*} f(x_j) & = & \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1,\ldots,x_n) \nonumber \\ & & \times \: \mathrm{d}u_1 \cdots \mathrm{d}u_{j-1} \mathrm{d}u_{j+1} \cdots \mathrm{d}u_n. \end{eqnarray*}$ (7)

3.3 Calculating conditional distributions

Since f(x₁, …, x_n) or F(x₁, …, x_n) are assumed to be known for a MCS data set Ω which may be Gaussian or non-Gaussian, the only remaining requirement to convert the problem into the standard form specified by equation (3) is to calculate the conditional distributions F(x_j|x₁, …, x_j−1) for j = 2, …, n since the marginal distribution corresponding to the case for j = 1 has already been specified in the previous section. Two approaches are possible, namely an indirect approach that uses copulas, and a direct approach which uses numerical and symbolic integrations of the joint PDF.

The indirect approach which utilizes copulas to calculate the conditional distribution makes use of a result by Nagler [27] and just the MCS data Ω. In this approach the conditional univariate/multivariate distribution is specified using the copula C from F(x₁, …, x_n) = C(F(x₁), …, F(x_n)) such that $\begin{array}{l} F (x | v) & = & \frac{\partial C (F (x | v_{- j}), F (v_{j} | v_{- j}))}{\partial F (v_{j} | v_{- j})} . \end{array}$ $\begin{eqnarray*} F(x|\boldsymbol{v}) & = & \frac{\partial C(F(x|\boldsymbol{v}_{-j}), F(v_j|\boldsymbol{v}_{-j}))}{\partial F(v_j|\boldsymbol{v}_{-j})}. \end{eqnarray*}$ (8)

In the above formula $v = {[v_{1}, \dots, v_{k}]}^{T}$ $\boldsymbol{v} = [v_1,\ldots,v_k]^{\mathsf{T}}$ , v_j is any convenient arbitrary element of v e.g. v_j = v₁ or v_j = v_k, v_−j= v\{v_j} i.e. the vector v with the element v_j removed, and the copula is constructed with F(x|v_−j) = z₁ as a first argument and F(v_j|v_−j) = z₂ as a second argument i.e. the copula C(z₁, z₂).

For conditional univariate/univariate distributions the above formula simplifies to $\begin{array}{l} F (x, v) & = & \frac{\partial C (F_{x} (x), F_{v} (v))}{\partial v} . \end{array}$ $\begin{eqnarray*} F(x,v) & = & \frac{\partial C(F_x(x),F_v(v))}{\partial v}. \end{eqnarray*}$ (9)

When these formulae are applied to equation (3) it follows that the conditional distributions may be calculated as $\begin{array}{l} F (x_{2} | x_{1}) & = & \frac{\partial C (F (x_{2}), F (x_{1}))}{\partial F (x_{1})} \end{array}$ $\begin{eqnarray*} F(x_2|x_1) & = & \frac{\partial C(F(x_2), F(x_1))}{\partial F(x_1)}\end{eqnarray*}$ (10) $\begin{array}{l} F (x_{j} | x_{1}, \dots, x_{j - 1}) \\ = \frac{\partial C (F (x_{j} | x_{1}, \dots, x_{j - 2}), F (x_{j - 1} | x_{1}, \dots, x_{j - 2}))}{\partial F (x_{j - 1} | x_{1}, \dots, x_{j - 2})}, \\ j = 3, \dots, n \end{array}$ $\begin{eqnarray*} &&F(x_j|x_1,\ldots,x_{j-1}) \nonumber \\ &&\quad = \frac{\partial C(F(x_j|x_1,\ldots,x_{j-2}), F(x_{j-1}|x_1,\ldots,x_{j-2}))}{\partial F(x_{j-1}|x_1,\ldots,x_{j-2})}, \nonumber \\ &&\qquad\: j=3,\ldots,n\end{eqnarray*}$ (11)

Noting that the conditional in equation (11) for an index j is defined in terms of the previous conditional for j − 1 and that equation (10) has an explicit expression for j = 2 it may be observed that equation (10) and equation (11) then allow for all the conditionals for j = 2, …, n to be calculated. As a result the calculation of conditional distributions reduces to fitting bivariate copulas for C(z₁, z₂) for specified marginals z₁ and z₂, and conditional variate data.

Different approaches to fitting bivariate copulas are possible, all of which essentially reduce to parameter optimizations for choices of bivariate copula families, however the simplest approach is to use the open source R software package VineCopula by Nagler et al. [28] which performs the optimization with the following R code:

C12 <- BiCopSelect(u1, u2, familyset = NA)
# c(1, 2, 3, 4, 5, 6, 7)

At the present time of writing, although some other routines in Matlab and Python for fitting copulas are available, the R-based VineCopula package is considered the most advanced and practical software package for fitting and estimating copulas for multivariate data.

In the above code fragment u1 and u2 are the marginals for random variables x₁ and x₂ calculated as u₁ = F(x₁) and u₂ = F(x₂) and C12 is an object that contains the properties of the bivariate copula which may be used specify its mathematical equation by looking up the properties of the object and in the software package’s reference documentation.

Although forty different choices of parameter based bivariate copula may be fitted with the VineCopula package not all of these copula families are fully supported. As a result from a practical implementation perspective it is sometimes necessary to restrict the best copula fit to the first seven standard copulas which are fully supported viz. the independence Archimedean, Gaussian, Student-t, Clayton, Gumbel, Frank, Joe, and BB1 copulas by specifying the selection as familyset = c(1,2,3,4,5,6,7). The effect of this restriction is that the most accurate choice of copula may not be fitted due to the combination of the mathematical complexity of the copula’s functional form and the associated numerical challenges with performing the parameter optimization fit.

In the event that a supported standard bivariate copula is fitted and its parameters determined, then the partial derivatives $\frac{\partial C (u_{1}, u_{2})}{\partial u_{1}}$ $\frac{\partial C(u_1, u_2)}{\partial u_1}$ and $\frac{\partial C (u_{1}, u_{2})}{\partial u_{2}}$ $\frac{\partial C(u_1, u_2)}{\partial u_2}$ may be calculated for specified values of u₁ and u₂ using the following code:

dC12du1 <- BiCopHfunc1(u1val, u2val, C12)
dC12du2 <- BiCopHfunc2(u1val, u2val, C12)

Whilst the above indirect approach is conceptually simple it is critically reliant on the use of the VineCopula software library, and may therefore not be accessible to metrologists who are not familiar with and proficient in programming with the R-language for statistical computing. As a result to cater for a variety of metrologists who may utilize other software languages and packages it is thus beneficial to utilize a direct approach for generality.

The direct approach utilizes numerical/symbolic integrations of the joint PDF f(x₁, …, x_n) which is assumed to be known in a form that is amenable to calculations, where f(x₁, …, x_n) may be obtained from results using scaled histograms or KDE schemes from the MCS data Ω as previously discussed.

In order to calculate the conditional distributions F(x_j|x₁, …, x_j−1) it is first necessary to calculate the conditional density f(x_j|x₁, …, x_j−1). This may be achieved by first assuming that f(x₁, …, x_n) is known so that $\begin{array}{l} f (x_{1}, \dots, x_{j}) \\ = \int_{- \infty}^{\infty} \dots \int_{- \infty}^{\infty} f (x_{1}, \dots, x_{j}, u_{j + 1}, \dots, u_{n}) \\ \times d u_{j + 1} \times \dots \times d u_{n}, 1 ⩽ j ⩽ n . \end{array}$ $\begin{eqnarray*} &&f(x_1,\ldots,x_j) \nonumber \\ &&\quad = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1,\ldots,x_j,u_{j+1},\ldots,u_n) \nonumber \\ &&\qquad \times \: \mathrm{d}u_{j+1} \times \cdots \times \mathrm{d}u_n, \: 1 \leqslant j \leqslant n.\end{eqnarray*}$ (12)

The above result may then be used to calculate the conditional density such that $f (x_{j} | x_{1}, \dots, x_{j - 1}) = \frac{f (x_{1}, \dots, x_{j})}{f (x_{1}) \times \dots \times f (x_{j - 1})} .$ $\begin{eqnarray*} f(x_j|x_1,\ldots,x_{j-1}) = \frac{f(x_1,\ldots,x_j)}{f(x_1) \times \cdots \times f(x_{j-1})}.\end{eqnarray*}$ (13)

The conditional distribution is then finally calculated from the conditional density such that $F (x_{j} | x_{1}, \dots, x_{j - 1}) = \int_{- \infty}^{x_{j}} f (u_{j} | x_{1}, \dots, x_{j - 1}) d u_{j} .$ $\begin{eqnarray*} F(x_j|x_1,\ldots,x_{j-1}) = \int_{-\infty}^{x_j} f(u_j|x_1,\ldots,x_{j-1}) \; \mathrm{d}u_j .\end{eqnarray*}$ (14)

In the above formula for F(x_j|x₁, …, x_j−1) no simplifying assumptions are utilized, and the conditional distributions implicitly utilizes all of the HOS information inclusive of potential non-Gaussian characteristics if the joint PDF f(x₁, …, x_n) has been correctly determined from the GS2 data Ω obtained from a MCS.

As a result, the conditional distribution may be conveniently calculated either symbolically or numerically using any convenient software package such as Matlab, Octave, Python or R if the joint PDF f(x₁, …, x_n) is known without the necessity of utilizing specialist statistical packages.

3.4 Comparing GS2 and Rosenblatt PDFs

In order to compare two different PDFs, f_X(x) corresponding to the GS2 based exact non-Gaussian joint PDF and f_Y (x) corresponding to the Rosenblatt equivalent joint PDF, the Kullback-Leibler divergence defined as $D_{K L} (X, Y) = \int_{ℝ^{n}} f_{X} (x) \log {\frac{f_{X} (x)}{f_{Y} (x)}} d x$ $\begin{eqnarray*} D_{KL}(\boldsymbol{X},\boldsymbol{Y}) = \int_{\mathbb{R}^n} f_{\boldsymbol{X}}(\boldsymbol{x})\log\left\{ \frac{f_{\boldsymbol{X}}(\boldsymbol{x})}{f_{\boldsymbol{Y}}(\boldsymbol{x})}\right\}\mathrm{d}\boldsymbol{x}\end{eqnarray*}$ (15)

is used as a measure to estimate the ‘distance’ or deviation between the two PDFs.

If the joint PDFs are equal every such that f_X(x) = f_Y(x) for all points x ∈Ω it then follows that log (f_X(x)∕f_Y(x)) = log(1) = 0. As a result when this function is integratedfor all points x ∈Ω it follows that D_KL(X, Y) = 0. If on the other hand the functional evaluations are not equal every where then D_KL(X, Y)≠0. As a result the closer D_KL(X, Y) is to zero thenthe better the ‘fit’ is between the two different joint PDFs.

4 Numerical simulations

4.1 Generating a MCS GS2 non-Gaussian dataset

Earlier work by Ramnath [8] studied a bivariate pressure balance model of the form A = A₀(1 + λP) where A is the effective area of the pressure balance, λ is a distortion coefficient and P is an independently varied applied pressure in the range 50 ≤ P∕[MPa] ≤ 500. In this model the parameters A₀ and λ are coupled to each other and a bivariate PDF $g_{A_{0}, λ} (η_{A_{0}}, η_{λ})$ $g_{A_0,\lambda}(\eta_{A_0}, \eta_{\lambda})$ is used to model the characteristics of the pressure balance. This physical pressure balance model may be extended to a three parameter model of the form $\begin{array}{l} A & = & A_{0} (1 + λ_{1} P + λ_{2} P^{2}) \end{array}$ $\begin{eqnarray*} A & = & A_0(1 + \lambda_1 P + \lambda_2 P^2) \end{eqnarray*}$ (16)

as discussed by Yadav et al. [29] where the parameters A₀, λ₁ and λ₂ are coupled in a trivariate joint PDF $g_{A_{0}, λ_{1}, λ_{2}} (η_{A_{0}}, η_{λ_{1}}, η_{λ_{2}})$ $g_{A_0,\lambda_1,\lambda_2}(\eta_{A_0}, \eta_{\lambda_1}, \eta_{\lambda_2})$ , however relatively few experimental data sets are available of pressure balances which exhibit strongly non-Gaussian characteristics. In the earlier work by Ramnath [8] the experimental data-set exhibited weakly non-Gaussian characteristics i.e. the non-Gaussian joint PDF could be roughly approximated with a Gaussian joint PDF for a reduced level of accuracy, and as a result there exists a general research gap in the area of metrology uncertainty analysis for summarizing and analysing non-Gaussian joint PDFs that cannot be adequately approximated with Gaussian joint PDFs.

In order to avoid the unnecessary complexity of solving for a full measurand model which may potentially produce a multivariate weak non-Gaussian distribution in this paper we therefore opt to instead directly model a joint PDF using an Azzalini multivariateskew-normal (AMSN) distribution. The AMSN which was developed by Azzalini and Capitanio [30] is denoted as Y ~ SN_k(ξ, Ω, α) and is utilized in this paper in order to specifically ensure a strongly non-Gaussian distribution that cannot be adequately summarized and approximated with a GS2 multivariate Gaussian joint PDF. Under these circumstances the actual non-Gaussian joint PDF for a random variable $y \in ℝ^{k}$ $\boldsymbol{y} \in \mathbb{R}^k$ is defined as $\begin{array}{l} g_{Y} (y) & = & 2 ϕ_{k} (y - ξ; Ω) Φ (α^{T} ω^{- 1} (y - ξ)) . \end{array}$ $\begin{eqnarray*} g_{\boldsymbol{Y}}(\boldsymbol{y}) & = & 2\phi_k (\boldsymbol{y} - \boldsymbol{\xi}; \mathbf{\Omega}) \Phi(\boldsymbol{\alpha}^{\mathsf{T}} \mathbf{\omega}^{-1}(\boldsymbol{y} - \boldsymbol{\xi})). \end{eqnarray*}$ (17)

In the above equation ξ, Ω and α are parameters for the AMSN where ξ is a center for the distribution, Ω is a covariance matrix, and α is a shape parameter.

Open source software routines have subsequently been developed by Azzalini [31] to easily and conveniently sample from the AMSN distribution if the equivalent quantities for the mean μ, covariance Ω and shape parameters α are provided. The use of these open source routines thus allows for a convenient approach to generate synthetic non-Gaussian Monte Carlo data in order to build up Ω for subsequent analysis in order to construct either f(x₁, …, x_n) or F(x₁, …, x_n).

Simulations in this paper are performed for a skew-normal SN₃ distribution in order to artificially emulate a pressure balance model that has a non-Gaussian joint PDF equation where the three-dimensional measurement model has the form A = A₀(1 + λ₁P + λ₂P²), where the random variables are $x_{1} = η_{A_{0}}$ $x_1=\eta_{A_0}$ , $x_{2} = η_{λ_{1}}$ $x_2=\eta_{\lambda_1}$ and $x_{3} = η_{λ_{2}}$ $x_3 = \eta_{\lambda_2}$ so that the joint PDF $g_{A_{0}, λ_{1}, λ_{2}} (η_{A_{0}}, η_{λ_{1}}, η_{λ_{2}})$ $g_{A_0,\lambda_1,\lambda_2}(\eta_{A_0},\eta_{\lambda_1},\eta_{\lambda_2})$ is represented as f(x₁, x₂, x₃) for notational brevity and simplicity. The parameters for the corresponding SN₃ function are reported in Table 1 where the covariance matrix is built up in terms of the correlation coefficients such that Σ_ij = r_ijσ_iσ_j.

Using the data in Table 1 then allows for the construction of the mean $μ = {[μ_{1}, μ_{2}, μ_{3}]}^{T}$ $\boldsymbol{\mu} = [\mu_1, \mu_2, \mu_3]^{\mathsf{T}}$ , covariance Σ, and shape parameter ${[δ_{1}, δ_{2}, δ_{3}]}^{T}$ $[\delta_1, \delta_2, \delta_3]^{\mathsf{T}}$ so that the Monte Carlo data dataOmega = Ω and corresponding joint PDF dataf = f(x) data may be generated by running the following computer code in a RStudio version 1.2.5001 developer environment using a R version 3.6.1 code base running in Microsoft Windows 10 as shown below:

library(EMMIXskew)
n <- 15000
dataOmega <- rdmsn(n, p, mean, cov, del)
dataf <- ddmsn(y, n, p, mean, cov, del)
write.table(dataOmega,
     file = paste(getwd(),’/Omega.txt’, sep = ""),
     append = FALSE, sep = " ", dec = ".",
     row.names = FALSE, col.names = FALSE)
write.table(dataf,
     file = paste(getwd(),’/Density.txt’, sep = ""),
     append = FALSE, sep = " ", dec = ".",
     row.names = FALSE, col.names = FALSE)

In the above code fragment EMMIXskew is the library developed by Azzalini [31] that must first be loaded in order to call the functions rdmsn that samples random points from the distribution and ddmsn that conveniently calculates the probability density function values for the corresponding sampled points.

Considering the synthetic MCS non-Gaussian data set, the variable x₁ emulates a scaled random variable of zero-pressure area $η_{A_{0}}$ $\eta_{A_0}$ , the variable x₂ emulates a scaled random variable of the first distortion coefficient $η_{λ_{1}}$ $\eta_{\lambda_1}$ , and the variable x₃ emulates a scaled random variable of the second distortion coefficient $η_{λ_{2}}$ $\eta_{\lambda_2}$ . Typical physical units for these physical parameters are $A_{0} = O (10^{- 6} m^{2})$ $A_0 = \mathcal{O}(10^{-6} \mathrm{m^2})$ , $λ_{1} = O (10^{- 12} {Pa}^{-1})$ $\lambda_1 = \mathcal{O}(10^{-12} \mathrm{Pa^{-1}})$ and $λ_{2} = O (10^{- 24} {Pa}^{-2})$ $\lambda_2 = \mathcal{O}(10^{-24} \mathrm{Pa^{-2}})$ if the applied pressure is in the range 50 ≤ P∕[MPa] ≤ 500 as reported by Ramnath [32] in later studies. As a result the physical meaning of the synthetic data that is generated with the AMSN is that the Monte Carlo data corresponds to random variables which have scaled units of x₁ ∕[mm²], x₂ ∕[ppm∕MPa] and $x_{3} / [{(ppm/MPa)}^{2}]$ $x_3/[(\mathrm{ppm/MPa})^2]$ respectively for a pressure balance measurement model of the form A = x₁(1 + x₂P + x₃P²).

For the example shown n = 15000 random multivariate points are sampled from the AMSN distribution and the corresponding Ω and f(x) data are saved to text files omega1.txt and density1.txt. The actual non-Gaussian joint PDF is visualized in Figure 1 using an isosurface volumetric perspective with 8 isodensity levels from the minimum to maximum joint PDF values. In this figure, the skew and asymmetric behaviour due to the non-Gaussian distribution behaviour may be qualitatively but subjectively observed.

Referring to the non-Gaussian joint PDF plot it is observed that the points are clustered near the ‘bottom left’ as seen by the darker colors and are then “stretched” towards the ‘upper right’. This non-Gaussian behaviour may be visually contrasted with the corresponding Gaussian approximation as shown in Figure 2 also for 8 isodensity levels from the minimum to maximum joint PDF values, from which it is observed that the points are clustered in the ‘middle region’, and are symmetrically “stretched” in an equal manner towards the ‘bottom left’ and ‘upper right’ respectively as a symmetric ellipsoid. An additional observation is that the Gaussian approximation “cuts-off” part of the joint PDF for the same range of x₁, x₂ and x₃ points as the regions are ‘equally stretched’ out in both directions, whereas the original non-Gaussian exhibits an ‘unequal stretching’. Consequently when comparing the two visualizations of the actual non-Gaussian and approximate Gaussian joint PDFs, it may be observed and concluded that the Gaussian joint PDF with low order second order statistical moments i.e. an expected value $\tilde{y}$ $\widetilde{\boldsymbol{y}}$ vector and covariance $U_{\tilde{y}}$ $\mathbf{U}_{\widetilde{\boldsymbol{y}}}$ matrix cannot adequately approximate a non-Gaussian joint PDF.

The equivalent Rosenblatt transformation system constructed with N = 15000 sampled points from the marginal and conditonal density equations outlined later in the paper is shown in Figure 3. In this plot it is clearly observed that whilst there are minor fluctuations in the isodensity values in different regions of points x ∈ Ω due to the stochastic nature in which both the GS2 and Rosenblatt systems were generated that the overall characteristics of the GS2 system in Figure 1 is broadly equivalent to that of the Rosenblatt system in Figure 3 as both systems exhibit strong non-Gaussian characteristics. Both the exact and equivalent non-Gaussian characteristics of the GS2 and Rosenblatt systems when contrasted with the GS2 Gaussian approximation constructed with the expected value and covariance in Figure 2 are found to exhibit differences. As a result it may be qualitatively observed that the traditional GS2 approach for summarizing a multivariate distribution with second order statistical moments using a multivariateGaussian, is problematic and inaccurate if there is a strong non-Gaussian behaviour in the actual data.

These numerical results are suggestive that it may be advantageous for a metrologist to first test for multivariate normality in order to decide whether a multivariate Gaussian distribution is adequate instead of automatically assuming this condition by only specifying the first order statistical moment in $\tilde{y}$ $\widetilde{\boldsymbol{y}}$ and the second order statistical moment in $U_{\tilde{y}}$ $\mathbf{U}_{\widetilde{\boldsymbol{y}}}$ . In the event that such tests determine that a Gaussian distribution is problematic then the alternative would be to construct the Rosenblatt transformation system so this could instead be utilized to appropriately model and analyse a joint PDF where second order statistical moments are inaccurate.

Noting that in a practical metrology uncertainty analysis such as a laboratory primary standard calibration or a laboratory inter-comparision, where additional quantitative statistical tests are necessary to make an objective assessment as to whether the distribution is sufficiently strongly non-Gaussian, a choice of quantitative test to objectively check for multivariate normality then becomes necessary in order to decide whether the non-Gaussian joint PDF may be approximated with a Gaussian joint PDF.

Different quantitative types of statistical tests to check for normality of univariate data are available and include the Shapiro-Wilk, D’Agostino K-squared, and Anderson-Darling tests amongst others as discussed by Ghasemi and Zahediasl [33], with related extensions for multivariatedata. In this paper we opt to test for multivariate normality using the approach originally proposed by Mardia [34] for simplicity. The Mardia normality approach tests for multivariate skewness and kurtosis, and is chosen for its simplicity and ready availability in the open source R-package MVN package by Korkmaz et al. [35].

Quantitative results when this normality test is applied to the GS2 data-set from Figure 1 are summarized in Figure 4 and similarly for the Rosenblatt transfomation system in Figure 5. In this statistical summary it may be observed that both distributions do not satisfy the Mardia skewness and kurtosis normality tests i.e. neither the actual GS2 nor the equivalent Rosenblatt data can be approximated with a three-dimensional Gaussian distribution. This information objectively and quantitatively demonstrates that the GS2 data fails both the normality criteria checks as the p-values are significantly below the cut-off thresholds that requires p > 0.05 in order to satisfy the normality criteria. Referring to these results an additional benefit that may be noted when using the standard Mardia test with the MVN package, is that this statistical test also in addition reports on the normality of the marginal distributions of f(x₁), f(x₂) and f(x₃) in the case of a joint PDF f(x₁, x₂, x₃) using the Shapiro–Wilk test. A consequence of this is that caution should also be exercised in attempting to model the marginal distributions with extended lambda distributions (ELDs) as ELDS are generally utilized to model weakly non-Gaussian distributions that do not deviate too far from univariate Gaussian distributions. This result is suggestive that the distributions may need to be modelled either with parametrized high order statistical moments or alternately with non-parametrized kernel density estimates if there is a strong non-Gaussian nature in the distribution.

Referring to these additional results it is observed that all of the marginal distributions also do not pass the normality tests since all of p-values are again significantly smaller than 0.001, where a value of p > 0.05 is similarly required in order for the marginal distribution to be considered sufficiently Gaussian in nature. As a result, it may be concluded that the GS2 non-Gaussian joint PDF data under consideration cannot be approximated with a multivariate Gaussian distribution, and hence a measurement uncertainty analysis must either directly utilize the non-Gaussian joint PDF or alternately indirectly utilize the same data in an equivalent and more convenient form for further analysis.

Due to the complexity of the joint PDF under consideration for generality a functional form for f(x₁, x₂, x₃) must be specified in order to determine the marginal distribution F(x₁), and the associated conditional distributions F(x₂|x₁) and F(x₃|x₁, x₂) for mathematical completeness. Since the data x₁,x₂,x₃, and f(x₁, x₂, x₃) are known for discrete points from the MCS data Ω and exhibit strong non-Gaussian characteristics, this may be conveniently achieved using a KDE approach which avoids low order statistical moment inaccuracies with any appropriate software choice such as Matlab, Octave, Python or R for which KDE software routines and best practises are readily available.

Table 1

Parameters for a generating a non-Gaussian pressure balance joint PDF model using a multivariate skew-normal distribution.

Fig. 1

Volumetric isosurfaces visualization of the actual GUM supplement 2 non-Gaussian joint PDF obtained from synthetic sampled Monte Carlo simulation data-points generated from a three-variable Azzalini multivariate skew-normal joint PDF.

Fig. 2

Volumetric isosurfaces density visualization of the approximate GUM supplement 2 Gaussian joint PDF obtained from synthetic Monte Carlo data sampled from a Gaussian distribution using second order statistical moments based expected value and covariance information.

Fig. 3

Volumetric isosurfaces density visualization of the equivalent Rosenblatt transformation system.

Fig. 4

Statistical summary of multivariate non-Gaussian characteristics of the GUM supplement 2 Monte Carlo data using the Mardia normality test.

Fig. 5

Statistical summary of multivariate non-Gaussian characteristics of the Rosenblatt transformation system Monte Carlo data using the Mardia normality test.

4.2 Calculating a Rosenblatt transformation system

The GS2 non-Gaussian data Ω as visualized in terms of its joint PDF f(x₁, x₂, x₃) in Figure 1 has a corresponding GS2 Gaussian joint PDF approximation as shown in Figure 2. This approximation which only utilizes the expected value $\tilde{y}$ $\widetilde{\boldsymbol{y}}$ and covariance $U_{\tilde{y}}$ $\mathbf{U}_{\widetilde{\boldsymbol{y}}}$ obtained from Ω may be utilized to observe the differences in density variations over the domain. When these quantitative differences are obtained with the aid of the Marsia multivariate normality tests from the previous section, it may be concluded that the variations are too severe for a Gaussian approximation to be utilized as the GS2 data exhibits strong non-Gaussian characteristics. As a result only the actual non-Gaussian joint PDF data will be considered, contrasted and compared with the equivalent Rosenblatt transformation based joint PDF data.

In order to proceed with the analysis the information in Ω must first be post-processed in order to calculate the marginal distributions for F(x_j) and the conditional distributions for F(x_j|x₁, …, x_j−1) for j = 1, 2, 3 in order to work out the corresponding values of the Rosenblatt transformations. Expanding out it follows that $\begin{array}{l} r_{1}^{⋆} & = & F (x_{1}^{⋆}) \\ r_{2}^{⋆} & = & F (x_{2}^{⋆} | x_{1}^{⋆}) \\ r_{3}^{⋆} & = & F (x_{3}^{⋆} | x_{1}^{⋆}, x_{2}^{⋆}) . \end{array}$ $\begin{eqnarray}r_1^{\star} & = & F(x_1^{\star})\\ r_2^{\star} & = & F(x_2^{\star}|x_1^{\star})\\ r_3^{\star} & = & F(x_3^{\star}|x_1^{\star},x_2^{\star}). \end{eqnarray}$

The application of the Rosenblatt transformation method then reduces to finding appropriate values of $x_{1}^{⋆}, x_{2}^{⋆}, x_{3}^{⋆}$ $x_1^{\star}, x_2^{\star}, x_3^{\star}$ that solve the above system of conditional distribution equations for specified random variables $r_{1}^{⋆}, r_{2}^{⋆}, r_{3}^{⋆} ~ R [0, 1]$ $r_1^{\star},r_2^{\star},r_3^{\star} \sim R[0,1]$ using equation (14), for which explicit forms of these equations are derived in the Appendix.

This can be sequentially achieved by first solving equation (18) which produces the $x_{1}^{⋆}$ $x_1^{\star}$ value, then using the known values of $x_{1}^{⋆}$ $x_1^{\star}$ in equation (19) which produces the $x_{2}^{⋆}$ $x_2^{\star}$ value, and then finally using the known values of $x_{1}^{⋆}$ $x_1^{\star}$ and $x_{2}^{⋆}$ $x_2^{\star}$ in equation (20) to produce the $x_{3}^{⋆}$ $x_3^{\star}$ value so that ${[x_{1}^{⋆}, x_{2}^{⋆}, x_{3}^{⋆}]}^{T}$ $[x_1^{\star},x_2^{\star}, x_3^{\star}]^{\mathsf{T}}$ corresponds to a sampled point from the f(x₁, x₂, x₃) distribution. This process may be extended in a straightforward manner for higher-dimensional systems of random variables $x \in ℝ^{d}$ $\boldsymbol{x} \in \mathbb{R}^d$ in measurement models for GS2 Monte Carlo simulation studies where a joint PDF f(x₁, …, x_d) for $3 ⩽ d \in ℕ$ $3 \leqslant d \in \mathbb{N}$ is known.

4.3 Calculation of the Kullback-Leibler divergence

Once all of the solved values of $x_{1}^{⋆}, x_{2}^{⋆}, x_{3}^{⋆}$ $x_1^{\star},x_2^{\star},x_3^{\star}$ have been obtained for a sufficiently large number of points then a new joint PDF f_Y (x) may be constructed as visualized in Figure 3 in order to compare it with the original joint PDF f_X (x) as shown in Figure 1. This information may be processed by calculating the Kullback-Leibler divergence D_KL(X, Y) specified in equation (15) in order to obtain a quantitative indication of the accuracy of the equivalent Rosenblatt transformation system of the original non-Gaussian joint PDF. If the scalar value of D_KL(X, Y) which is obtained by integrating over the hyper-volume that envelopes all of the generated variables that were used in the construction is sufficiently close to zero, then the equivalence of the original non-Gaussian f_X (x) and the equivalent Rosenblatt system of mapped variables will have been established.

In order to evaluate the Kullback-Leibler divergence it is necessary to perform a multivariate integral of the form ∫_ξ∈Ωϕ(ξ) dξ over the space of the random variables. Due to the complexity of the underlying integrand function ϕ this may be more conveniently and simply achieved through the use of a Monte Carlo integration scheme such that $\begin{array}{l} \int_{V} ϕ (ξ) d ξ & = & V \times \frac{\sum_{i = 1}^{N} f (x_{i})}{N} . \end{array}$ $\begin{eqnarray*} \int_V \phi(\boldsymbol{\xi}) \; \mathrm{d}\boldsymbol{\xi} & = & V \times \frac{\sum_{i=1}^N f(\boldsymbol{x}_i)}{N}. \end{eqnarray*}$ (21)

In the above equation N is a number of points within the hyper-volume V that is defined by the collection of all the points x ∈Ω such that V is calculated as $\begin{array}{l} V & = & \prod_{i = 1}^{d} [\max (x_{i}) - \min (x_{i})] \end{array}$ $\begin{eqnarray*} V & = & \prod_{i=1}^d \left[ \max(x_i) - \min(x_i) \right] \end{eqnarray*}$ (22)

where x_i is the random variable for a dimension obtained from $x = {[x_{1}, \dots, x_{d}]}^{T}$ $\boldsymbol{x} = [x_1, \ldots, x_d]^{\mathsf{T}}$ and d is the number of dimensions. When this is implemented a final value of D_KL(X, Y) = 0.045199 results where X denotes the actual GS2 distribution and Y denotes the equivalent Rosenblatt transformation system distribution.

Since D_KL ≈ 0 even for a relatively small number of n = N = 15000 Monte Carlo sampled points from the underlying distributions it may reasonably be concluded that the mapped Gaussian variables obtained from the Rosenblatt transformation is indeed equivalent to the original non-Gaussian GS2 distribution. This observation thus provides quantitative proof of the validity of the numerical scheme used to construct the conditional densities using equation (12), equation (13) and equation (14), and also in addition provides a convenient mechanism to sample from non-Gaussian multivariate distributions.

4.4 Analysis of Rosenblatt transformations

Although a mechanism of sampling from a non-Gaussian joint PDF f(x) is a convenient result by itself, a further additional benefit of the Rosenblatt transformation is that it allows for the study of uncertainties of random variables based on the coupling effect of other random variables without the need for a complicated determination of the underlying copula. This effect may be investigated by noting that the original joint PDF may be decomposed as f(x₁, x₂, x₃) = f(x₁)f(x₂|x₁)f(x₃|x₁, x₂) and as a result when the conditional densities f(x₂|x₁) and f(x₃|x₁, x₂) are calculated they may then be used to obtain estimates for the corresponding variations in x₂ and x₃ respectively.

Although the variation in x₁ is immediately determined by the marginal distribution f(x₁), it is not immediately obvious which values of x₁ to utilize in f(x₂|x₁) for the variation of x₂, and similarly which combination of values of x₁ and x₂ to utilize in f(x₃|x₁, x₂) for the variation of x₃. Motivated by the earlier discussion of fitting extended lambda distributions for univariate distributions it is noted that quartiles and quantiles can offer a convenient initial choice of limits in the random variables in order to study their coupling effect on the uncertainties of the other random variables.

By standard statistical convention, there are three quartiles in any univariate distribution, namely Q₁ which splits the lowest 25% of the data from the highest 75%, Q₂ which simply gives the median value that splits the data in half by seperating the lower 50% and upper 50% of the data, and finally Q₃ which splits the highest 25% of the data from the lowest 75% of the data. Quantiles function in a similar manner and split the data into five equal portions, and as a result this suggests that in general percentiles may offer a convenient mechanism to test the “spread” in uncertainty for the previously calculated conditional density distributions f(x₂|x₁) and f(x₃|x₁, x₂). Due to the variety of choices of possible percentiles to consider we restrict the selection to the 25%, 50% and 75% percentiles for brevity, noting that any choice of percentile may be utilized in a conditional density function. Referring to the earlier Mardia normality tests in Figure 5 it follows that $x_{1}^{(25 %)} = 2.079300$ $x_1^{(25\%)} = 2.079300$ , $x_{1}^{(50 %)} = 2.235216$ $x_1^{(50\%)} = 2.235216$ , $x_{1}^{(75 %)} = 2.433417$ $x_1^{(75\%)} = 2.433417$ , $x_{2}^{(25 %)} = 2.926692$ $x_2^{(25\%)} = 2.926692$ , $x_{2}^{(50 %)} = 3.290420$ $x_2^{(50\%)} = 3.290420$ , and $x_{2}^{(75 %)} = 3.784753$ $x_2^{(75\%)} = 3.784753$ respectively.

When the above choices of limits are incorporated into the Rosenblatt transformed system a set of a marginal density curve for f(x₁) in Figure 6, and conditional density curves for f(x₂|x₁) in Figure 7 and f(x₃|x₁, x₂) in Figure 8 respectively, may be generated in order to determine the “spread” in uncertainties for u(x₁), u(x₂) and u(x₃) under different operating conditions. This combination of marginal and conditional densities in terms of simple xy-graphs may then be used to estimate the “spread” of likely values of the random variables for different regions within Ω without the attendent difficulties of constructing the copulas and attempting to visualize the coupling effects in higher dimensional spaces.

One particular example of how the Rosenblatt system presents advantages is for the representative pressure balance problem data-set that was studied when the zero-pressure area $η_{A_{0}} = x_{1}$ $\eta_{A_0} = x_1$ is known with a greater confidence so that this influences the likely uncertainty of the distortion coefficent η_λ = x₂. In this particular case for a trivariate joint PDF where $η_{A_{0}} = x_{1}$ $\eta_{A_0} = x_1$ , $η_{λ_{1}} = x_{2}$ $\eta_{\lambda_1} = x_2$ and $η_{λ_{2}} = x_{3}$ $\eta_{\lambda_2} = x_3$ , the conditional densities such as $f (x_{2} | x_{1}^{(50 %)})$ $f(x_2|x_1^{(50\%)})$ and $f (x_{3} | x_{1}^{(50 %)}, x_{2}^{(50 %)})$ $f(x_3|x_1^{(50\%)}, x_2^{(50\%)})$ may be used to study the likely effect on the uncertainties of u(x₂) = u(λ₁) and u(x₃) = u(λ₂). Similarly, the effect of prior known information from for example earlier instrument calibrations and/or inter-comparison reports may be used as inputs into conditional densities in order to analyse variations in uncertainties.

This functionality thus presents potential new advantages to metrologists for performing advanced uncertainty analyses of laboratory measurement standards and intercomparisons which was not previously possible for GUM Supplement 2 based Monte Carlo simulations which exhibited non-Gaussian characteristics, and for which it was previously difficult to investigate conditional probabilities for metrology based uncertainty calculations. A potential additional benefit of the new conditional density approach that is implicit within the Rosenblatt transformation system is that it becomes easier to determine possible “double peaks” that may affect uncertainties by inspecting the xy-graphs, where this information was previously “hidden” in the higher dimensional space.

Fig. 6

Visualization of the Rosenblatt marginal density f(x₁).

Fig. 7

Visualization of the Rosenblatt conditional density f(x₂|x₁).

Fig. 8

Visualization of the Rosenblatt conditional density f(x₃|x₁, x₂)

5 Discussion

This paper reported on research work that performed a mathematical analysis with corresponding numerical simulations in order to investigate the practicality of utilizing a mathematically equivalent sequential system of mapped Gaussian PDFs for modelling, summarizing and analysing a non-Gaussian joint PDF in metrology uncertainty analysis studies.

Techniques for conveniently constructing the Rosenblatt transformation system based univariate marginal density and multivariate conditional densities for non-Gaussian distributions with a direct integration approach were therefore considered and investigated in this paper. Results with this approach where the joint PDF was constructed with a kernel density estimate (KDE) modelling scheme of the original GUM supplement 2 data were found to offer a convenient, practical and accurate approach. When this approach was extended it was further determined that it also offered a convenient mechanism for sampling from non-Gaussian distributions.

An additional benefit of a KDE approach over lambda distribitions for fitting functions was observed when the variables exhibited multiple peaks and strongly deviated from Gaussian distributions, for which conventional low order statistical moment based lambda distributions have difficulties addressing but which are straightforward for KDE schemes.

Based on the approach developed an alternative technique for estimating the uncertainties of the model results directly in terms of the conditional densities without the utilization of constructed copulas was considered using representative quartiles and expected values of the random variables. This approach presents a new novel method for metrologists for analysing non-Gaussian PDFs in practical measurement uncertainty problems that may be encountered in high accuracy calibrations and laboratory inter-comparisons.

6 Conclusions

Based on the investigation reported in this paper the following conclusions were determined:

The Rosenblatt transformation offers a convenient mechanism for decomposing a multivariate distribution into an equivalent system of marginal/conditional distributions that can be used to conveniently sample points from higher-dimensional non-Gaussian joint PDFs
A direct numerical integration of the joint PDF that is approximated with a kernel density estimate (KDE) is recommended for calculating both the marginal and conditional densities
The marginal and conditional densities may conveniently be used to estimate uncertainties in higher-dimensional multivariate non-Gaussian distributions using sequences of two-dimensional xy-graphs without the need for constructing parameter and/or empirical based copulas.

7 Influences and implications

Based on the research reported in this paper the following influences and implications were determined:

Metrologists are no longer constrained in summarizing and approximating GUM Supplement 2 measurement uncertainties as multivariate Gaussian distributions if non-Gaussian characteristics are present
More accurate and non-linear/multi-modal measurement uncertainty effects for metrology applications may now be directly considered using kernel density estimate based schemes without prior linear/ uni-modal behavior approximations that was previously necessary with low order statistical moment models such as lambda distributions.

Acknowledgments

This work was performed with funds provided by the Department of Higher Education, Science and Technology (DHEST) on behalf of the South African government for research by public universities.

Appendix A Derivation of conditional joint PDFs

Assume that the joint PDF f(x₁, x₂, x₃) is known and can be computed using either interpolations from scaled histograms or a kernel density estimate function formulation. It immediately follows that the marginal distributions are $\begin{array}{l} f (x_{1}) & = & \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} f (x_{1}, u_{2}, u_{3}) d u_{2} d u_{3} \\ f (x_{2}) & = & \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} f (u_{1}, x_{2}, u_{3}) d u_{1} d u_{3} \\ f (x_{3}) & = & \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} f (u_{1}, u_{2}, x_{3}) d u_{1} d u_{2} . \end{array}$ $\begin{eqnarray}f(x_1) & = & \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, u_2, u_3) \; \mathrm{d}u_2 \; \mathrm{d}u_3 \\ f(x_2) & = & \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u_1, x_2, u_3) \; \mathrm{d}u_1 \; \mathrm{d}u_3 \\ f(x_3) & = & \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u_1, u_2, x_3) \; \mathrm{d}u_1 \; \mathrm{d}u_2. \end{eqnarray}$

The bivariate joint PDF f(x₁, x₂) may be obtained by integrating out the effect of x₃ such that $\begin{array}{l} f (x_{1}, x_{2}) & = & \int_{- \infty}^{\infty} f (x_{1}, x_{2}, u_{3}) d u_{3} . \end{array}$ $\begin{eqnarray*} f(x_1, x_2) & = & \int_{-\infty}^{\infty} f(x_1, x_2, u_3) \; \mathrm{d}u_3. \end{eqnarray*}$ (A.4)

The conditional joint PDF f(x₂|x₁) is then $\begin{array}{l} f (x_{2} | x_{1}) & = & \frac{f (x_{1}, x_{2})}{f (x_{1})} . \end{array}$ $\begin{eqnarray*} f(x_2|x_1) & = & \frac{f(x_1, x_2)}{f(x_1)}. \end{eqnarray*}$ (A.5)

The conditional joint PDF f(x₃|x₁, x₂) is then $\begin{array}{l} f (x_{3} | x_{1}, x_{2}) & = & \frac{f (x_{1}, x_{2}, x_{3})}{f (x_{1}) f (x_{2})} . \end{array}$ $\begin{eqnarray*} f(x_3|x_1, x_2) & = & \frac{f(x_1, x_2, x_3)}{f(x_1)f(x_2)}. \end{eqnarray*}$ (A.6)

In the above formulae when these formulae are numerically approximated all of the densities should be normalized such that $\int_{- \infty}^{\infty} f (x_{1}) d x_{1} = 1$ $\int_{-\infty}^{\infty}f(x_1)\;\mathrm{d}x_1 = 1$ , $\int_{- \infty}^{\infty} f (x_{2} | x_{1}) d x_{2} = 1$ $\int_{-\infty}^{\infty}f(x_2|x_1)\;\mathrm{d}x_2 = 1$ and $\int_{- \infty}^{\infty} f (x_{3} | x_{1}, x_{2}) d x_{3} = 1$ $\int_{-\infty}^{\infty}f(x_3|x_1,x_2)\;\mathrm{d}x_3 = 1$ in order to ensure mathematical consistency so that $0 ⩽ F (x_{1}), F (x_{2} | x_{1}), F (x_{3} | x_{1}, x_{2}) ⩽ 1 \forall {[x_{1}, x_{2}, x_{3}]}^{T} \in Ω$ $0 \leqslant F(x_1), F(x_2|x_1), F(x_3|x_1,x_2) \leqslant 1 \: \forall [x_1, x_2, x_3]^{\mathsf{T}} \in \mathbf{\Omega}$ .

As a result the conditional joint distributions are then calculated in terms of the previous conditional densities as $\begin{array}{l} F (x_{1}) & = & \int_{- \infty}^{x_{1}} f (u_{1}) d u_{1} \\ F (x_{2} | x_{1}) & = & \int_{- \infty}^{x_{2}} f (u_{2} | x_{1}) d u_{2} \\ F (x_{3} | x_{1}, x_{2}) & = & \int_{- \infty}^{x_{3}} f (u_{3} | x_{1}, x_{2}) d u_{3} . \end{array}$ $\begin{eqnarray}F(x_1) & = & \int_{-\infty}^{x_1} f(u_1) \; \mathrm{d}u_1 \\ F(x_2|x_1) & = & \int_{-\infty}^{x_2} f(u_2|x_1) \; \mathrm{d}u_2 \\ F(x_3|x_1,x_2) & = & \int_{-\infty}^{x_3} f(u_3|x_1,x_2) \; \mathrm{d}u_3. \end{eqnarray}$

The above mathematical equations are sufficiently general to be utilized in any appropriate numerical integration technique using either commercial or open source software packages.

One particular choice using open source software routines is to implement the above formulae using the Python scipy based KDE function gaussian_kde which may be used to calculate the kernel density estimates for one, two and three dimensions depending on the context. If the GS2 data has previously been saved in a text file Omega.txt from another simulation, then the marginal density f(x₁) and the joint PDF f(x₁, x₂, x₃) may then be conveniently constructed using the following Python computer code:

data = np.loadtxt(’Omega.txt’)
Omega = data
X1 = Omega[:, 0]
X2 = Omega[:, 1]
X3 = Omega[:, 2]
d1 = stats.gaussian_kde(X1)
def f1(x1):
     q = d1(x1)
     return q[0]
d123 = stats.gaussian_kde(Omega.T)
def f123(x1, x2, x3):
     q = d123([x1, x2, x3])
     return q[0]

The above formulation when produced using the stats.gaussian_kde function from the Python statistics package is useful as it simplifies the programming effort required and allows for a convenient function call to calculate the density f(x₁, x₂, x₃). Whilst a simplification in the programming code implementation for producing a Rosenblatt transformation is certainly beneficial, a downside is that this functional approach will tend to slow down a computer code implementation due to the overhead when calling the functions which is disadvantageous when sampling large numbers of random variables.

A alternative approach is to use the existing statistical functionality in R in order to calculate discrete values for an underling mesh as shown below.

library(misc3d)
library(R.matlab)
n1 <- nclass.FD(omega[, 1])
n2 <- nclass.FD(omega[, 2])
n3 <- nclass.FD(omega[, 3])
dens <- kde3d(omega[, 1], omega[, 2],
     omega[, 3], n = c(n1, n2, n3),
     lims = c(min(omega[, 1]),
     max(omega[, 1]),
     min(omega[, 2]), max(omega[, 2]),
     min(omega[, 3]), max(omega[, 3])))
v1 <- dens$x
v2 <- dens$y
v3 <- dens$z
g123 <- dens$d
f1 <- density(omega[, 1])
f2 <- density(omega[, 2])
f3 <- density(omega[, 3])
g1 <- approx(f1$x, f1$y,
     v1, method = ’linear’, 0, 0, rule = 2)$y
g2 <- approx(f2$x, f2$y,
     v2, method = ’linear’, 0, 0, rule = 2)$y
g3 <- approx(f3$x, f3$y,
     v3, method = ’linear’, 0, 0, rule = 2)$y
writeMat(paste(getwd(),
          ’/datav1.mat’, sep = ""),
          v1 = v1)

In the code fragment the Monte Carlo data after being generated and loaded into the work space is processed using the kde3d function in R in order to generate the f(x) discrete points and saved to a mat file format. The mat file may then be saved to file and loaded in GNU Octave or Matlab in order to implement a pure numerical implementation which depending on how the computer code is generated may be significantly faster.

Once expressions have been obtained to formulate the functional behaviour of the densities f(x₁), f(x₁, x₂) and f(x₁, x₂, x₃), then the final Rosenblatt transformation may be implemented using standard numerical analysis techniques.

References

BIPM, IEC, IFCC, ILAC, ISO, IUPAP, OIML, Tech. Rep., JCGM/WG1 GUM (2008). Revised 1st edition – https://www.bipm.org/en/publications/guides/ [Google Scholar]
BIPM, IEC, IFCC, ILAC, ISO, IUPAP, OIML, Tech. Rep., JCGM/WG1 GUM Supplement 1 (2008). 1st edition – https://www.bipm.org/en/publications/guides/ [Google Scholar]
BIPM, IEC, IFCC, ILAC, ISO, IUPAP, OIML, Tech. Rep., JCGM/WG1 GUM Supplement 2 (2011). 1st edition – https://www.bipm.org/en/publications/guides/ [Google Scholar]
W. Bich, M.G. Cox, C. Michotte, Metrologia 53 (5), S149–S159 (2016) DOI: 10.1088/0026-1394/53/5/S149 [Google Scholar]
P.H. Harris, C.E. Matthews, M.G. Cox, Metrologia 51 (3), 243–252 (2014) DOI: 10.1088/0026-1394/51/3/243 [Google Scholar]
V. Ramnath, Int. J. Metrol. Qual. Eng. 8 (4), 1–18 (2017) DOI: 10.1051/ijmqe/2016020 [CrossRef] [EDP Sciences] [Google Scholar]
A. Possolo, Metrologia 47 (3), 262–271 (2010) DOI: 10.1088/0026-1394/47/3/017 [Google Scholar]
V. Ramnath, Int. J. Metrol. Qual. Eng. 8 (29), 1–29 (2017) DOI: 10.1051/ijmqe/2017018 [CrossRef] [EDP Sciences] [Google Scholar]
T. Nagler, C. Czado, J. Multivar. Anal. 151, 69–89 (2016) DOI: 10.1016/j.jmva.2016.07.003 [Google Scholar]
J. Segers, M. Sibuya, H. Tsukahara, J. Multivar, Anal. 155, 35–51 (2017) DOI: 10.1016/j.jmva.2016.11.010 [Google Scholar]
E. Acar, M. Rais-Rohani, C.D. Eamon, Int. J. Reliab. Saf. 4 (2/3), 166–187 (2010) DOI: 10.1504/IJRS.2010.032444 [CrossRef] [Google Scholar]
S. Rahman, H. Xu, Prob. Eng. Mech. 19 (4), 393–408 (2004) DOI: 10.1016/j.probengmech.2004.04.003 [CrossRef] [Google Scholar]
G.L. Bretthorst, AIP Conf. Proc. 1553 (3), 3–15 (2013) DOI: 10.1063/1.4819977 [Google Scholar]
N. Armstrong, G.J. Sutton, D.B. Hibbert, Metrologia 56 (1), 015019 (15pp) DOI: 10.1088/1681-7575/aaf7d1 [Google Scholar]
M. Rosenblatt, Ann. Math. Stat. 23 (3), 470–472 (1952) DOI: 10.1214/aoms/1177729394 [Google Scholar]
R. Lebrun, A. Dutfoy, Prob. Eng. Mech. 24 (4), 577–584 (2009) DOI: 10.1016/j.probengmech.2009.04.006 [CrossRef] [Google Scholar]
S.J. van Albada, P.A. Robinson, J. Neurosci. Methods 161 (2), 205–11 (2007) DOI: 10.1016/j.jneumeth.2006.11.004 [Google Scholar]
K.H. Chang, Reliability analysis, in e-Design Computer-Aided Engineering Design, edited by K.H. Chang (Academic Press, 2015), chap. 10, pp. 523–595 [Google Scholar]
Mathworks, Tech. Rep., Mathworks (2019), https://www.mathworks.com/help/stats/prob.normaldistribution.icdf.html [Google Scholar]
J.W. Eaton, Tech. Rep., Sourceforge (2019), https://octave.sourceforge.io/nan/function/norminv.html [Google Scholar]
T.E. Oliphant, Tech. Rep., SciPy.org (2019), https://docs.scipy.org/doc/numpy/user/index.html [Google Scholar]
D.W. Scott, S.R. Sain, in Handbook of statistics - Data mining and data visualization, edited by C.R. Rao, E.J. Wegman, J.L. Solka (Elsevier, Oxford, 2005), Chap. 9, pp. 229–261 [Google Scholar]
R. Willink, Metrologia 46 (3), 154–166 (2009) DOI: 10.1088/0026-1394/46/3/002 [Google Scholar]
Z.A. Karian, E.J. Dudewicz, P. Mcdonald, Commun. Stat. Simul. Comput. 25 (3), 611–642 (1996) DOI: 10.1080/03610919608813333 [Google Scholar]
C.G. Corlu, M. Meterelliyoz, Commun. Stat. Simul. Comput. 45 (7), 2276–96 (2015) DOI: 10.1080/03610918.2014.901355 [Google Scholar]
B. Wang, Tech. Rep., CRAN (2019), https://cran.r-project.org/web/packages/gb/index.html [Google Scholar]
T. Nagler, C. Schellhase, C. Czado, Depend. Models 5 (1), 99–120 (2017) DOI: 10.1515/demo-2017-0007 [Google Scholar]
T. Nagler, U. Schepsmeier, J. Stoeber, E.C. Brechmann, B. Graeler, T. Erhardt, C. Almeida, A. Min, C. Czado, M. Hofmann, M. Lilliches, H. Joe, T. Vatter, Tech. Rep., CRAN (2019) https://cran.r-project.org/web/packages/VineCopula/index.html [Google Scholar]
S. Yadav, V.K. Gupta, A.K. Bandyopadhyay, MAPAN 26 (2), 133–151 (2011) DOI: 10.1007/s12647-011-0014-5 [CrossRef] [Google Scholar]
A. Azzalini, A. Capitanio, J. Roy. Stat. Soc. Ser. B 61 (3), 579–602 (1999) DOI: 10.1111/1467-9868.00194 [CrossRef] [Google Scholar]
A. Azzalini, Tech. Rep., CRAN (2019) https://cran.r-project.org/web/packages/sn/index.html [Google Scholar]
V. Ramnath, MAPAN 34 (3), 387–402 (2019) DOI: 10.1007/s12647-019-00324-w [CrossRef] [Google Scholar]
A. Ghasemi, S. Zahediasl, Int. J. Endocrinol. Metab. 10 (2), 486–9 (2012) DOI: 10.5815/ijem.3505 [PubMed] [Google Scholar]
K.V. Mardia, Biometrika 57 (3), 519–30 (1970) DOI: 10.1093/biomet/57.3.519 [Google Scholar]
S. Korkmaz, D. Goksuluk, G. Zararsiz, Tech. Rep., CRAN (2019) https://cran.r-project.org/web/packages/MVN/index.html [Google Scholar]

Cite this article as: Vishal Ramnath, Analysis of approximations of GUM supplement 2 based non-Gaussian PDFs of measurement models with Rosenblatt Gaussian transformation mappings, Int. J. Metrol. Qual. Eng. 11, 2 (2020)

All Tables

Table 1

Parameters for a generating a non-Gaussian pressure balance joint PDF model using a multivariate skew-normal distribution.

In the text

All Figures

	Fig. 1 Volumetric isosurfaces visualization of the actual GUM supplement 2 non-Gaussian joint PDF obtained from synthetic sampled Monte Carlo simulation data-points generated from a three-variable Azzalini multivariate skew-normal joint PDF.
In the text

	Fig. 2 Volumetric isosurfaces density visualization of the approximate GUM supplement 2 Gaussian joint PDF obtained from synthetic Monte Carlo data sampled from a Gaussian distribution using second order statistical moments based expected value and covariance information.
In the text

	Fig. 3 Volumetric isosurfaces density visualization of the equivalent Rosenblatt transformation system.
In the text

	Fig. 4 Statistical summary of multivariate non-Gaussian characteristics of the GUM supplement 2 Monte Carlo data using the Mardia normality test.
In the text

	Fig. 5 Statistical summary of multivariate non-Gaussian characteristics of the Rosenblatt transformation system Monte Carlo data using the Mardia normality test.
In the text

	Fig. 6 Visualization of the Rosenblatt marginal density f(x₁).
In the text

	Fig. 7 Visualization of the Rosenblatt conditional density f(x₂\|x₁).
In the text

	Fig. 8 Visualization of the Rosenblatt conditional density f(x₃\|x₁, x₂)
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] BIPM, IEC, IFCC, ILAC, ISO, IUPAP, OIML, Tech. Rep., JCGM/WG1 GUM (2008). Revised 1st edition – https://www.bipm.org/en/publications/guides/ [Google Scholar]

[2] BIPM, IEC, IFCC, ILAC, ISO, IUPAP, OIML, Tech. Rep., JCGM/WG1 GUM Supplement 1 (2008). 1st edition – https://www.bipm.org/en/publications/guides/ [Google Scholar]

[3] BIPM, IEC, IFCC, ILAC, ISO, IUPAP, OIML, Tech. Rep., JCGM/WG1 GUM Supplement 2 (2011). 1st edition – https://www.bipm.org/en/publications/guides/ [Google Scholar]

[4] W. Bich, M.G. Cox, C. Michotte, Metrologia 53 (5), S149–S159 (2016) DOI: 10.1088/0026-1394/53/5/S149 [Google Scholar]

[5] P.H. Harris, C.E. Matthews, M.G. Cox, Metrologia 51 (3), 243–252 (2014) DOI: 10.1088/0026-1394/51/3/243 [Google Scholar]

[6] V. Ramnath, Int. J. Metrol. Qual. Eng. 8 (4), 1–18 (2017) DOI: 10.1051/ijmqe/2016020 [CrossRef] [EDP Sciences] [Google Scholar]

[7] A. Possolo, Metrologia 47 (3), 262–271 (2010) DOI: 10.1088/0026-1394/47/3/017 [Google Scholar]

[8] V. Ramnath, Int. J. Metrol. Qual. Eng. 8 (29), 1–29 (2017) DOI: 10.1051/ijmqe/2017018 [CrossRef] [EDP Sciences] [Google Scholar]

[9] T. Nagler, C. Czado, J. Multivar. Anal. 151, 69–89 (2016) DOI: 10.1016/j.jmva.2016.07.003 [Google Scholar]

[10] J. Segers, M. Sibuya, H. Tsukahara, J. Multivar, Anal. 155, 35–51 (2017) DOI: 10.1016/j.jmva.2016.11.010 [Google Scholar]

[11] E. Acar, M. Rais-Rohani, C.D. Eamon, Int. J. Reliab. Saf. 4 (2/3), 166–187 (2010) DOI: 10.1504/IJRS.2010.032444 [CrossRef] [Google Scholar]

[12] S. Rahman, H. Xu, Prob. Eng. Mech. 19 (4), 393–408 (2004) DOI: 10.1016/j.probengmech.2004.04.003 [CrossRef] [Google Scholar]

[13] G.L. Bretthorst, AIP Conf. Proc. 1553 (3), 3–15 (2013) DOI: 10.1063/1.4819977 [Google Scholar]

[14] N. Armstrong, G.J. Sutton, D.B. Hibbert, Metrologia 56 (1), 015019 (15pp) DOI: 10.1088/1681-7575/aaf7d1 [Google Scholar]

[15] M. Rosenblatt, Ann. Math. Stat. 23 (3), 470–472 (1952) DOI: 10.1214/aoms/1177729394 [Google Scholar]

[16] R. Lebrun, A. Dutfoy, Prob. Eng. Mech. 24 (4), 577–584 (2009) DOI: 10.1016/j.probengmech.2009.04.006 [CrossRef] [Google Scholar]

[17] S.J. van Albada, P.A. Robinson, J. Neurosci. Methods 161 (2), 205–11 (2007) DOI: 10.1016/j.jneumeth.2006.11.004 [Google Scholar]

[18] K.H. Chang, Reliability analysis, in e-Design Computer-Aided Engineering Design, edited by K.H. Chang (Academic Press, 2015), chap. 10, pp. 523–595 [Google Scholar]

[19] Mathworks, Tech. Rep., Mathworks (2019), https://www.mathworks.com/help/stats/prob.normaldistribution.icdf.html [Google Scholar]

[20] J.W. Eaton, Tech. Rep., Sourceforge (2019), https://octave.sourceforge.io/nan/function/norminv.html [Google Scholar]

[21] T.E. Oliphant, Tech. Rep., SciPy.org (2019), https://docs.scipy.org/doc/numpy/user/index.html [Google Scholar]

[22] D.W. Scott, S.R. Sain, in Handbook of statistics - Data mining and data visualization, edited by C.R. Rao, E.J. Wegman, J.L. Solka (Elsevier, Oxford, 2005), Chap. 9, pp. 229–261 [Google Scholar]

[23] R. Willink, Metrologia 46 (3), 154–166 (2009) DOI: 10.1088/0026-1394/46/3/002 [Google Scholar]

[24] Z.A. Karian, E.J. Dudewicz, P. Mcdonald, Commun. Stat. Simul. Comput. 25 (3), 611–642 (1996) DOI: 10.1080/03610919608813333 [Google Scholar]

[25] C.G. Corlu, M. Meterelliyoz, Commun. Stat. Simul. Comput. 45 (7), 2276–96 (2015) DOI: 10.1080/03610918.2014.901355 [Google Scholar]

[26] B. Wang, Tech. Rep., CRAN (2019), https://cran.r-project.org/web/packages/gb/index.html [Google Scholar]

[27] T. Nagler, C. Schellhase, C. Czado, Depend. Models 5 (1), 99–120 (2017) DOI: 10.1515/demo-2017-0007 [Google Scholar]

[28] T. Nagler, U. Schepsmeier, J. Stoeber, E.C. Brechmann, B. Graeler, T. Erhardt, C. Almeida, A. Min, C. Czado, M. Hofmann, M. Lilliches, H. Joe, T. Vatter, Tech. Rep., CRAN (2019) https://cran.r-project.org/web/packages/VineCopula/index.html [Google Scholar]

[29] S. Yadav, V.K. Gupta, A.K. Bandyopadhyay, MAPAN 26 (2), 133–151 (2011) DOI: 10.1007/s12647-011-0014-5 [CrossRef] [Google Scholar]

[30] A. Azzalini, A. Capitanio, J. Roy. Stat. Soc. Ser. B 61 (3), 579–602 (1999) DOI: 10.1111/1467-9868.00194 [CrossRef] [Google Scholar]

[31] A. Azzalini, Tech. Rep., CRAN (2019) https://cran.r-project.org/web/packages/sn/index.html [Google Scholar]

[32] V. Ramnath, MAPAN 34 (3), 387–402 (2019) DOI: 10.1007/s12647-019-00324-w [CrossRef] [Google Scholar]

[33] A. Ghasemi, S. Zahediasl, Int. J. Endocrinol. Metab. 10 (2), 486–9 (2012) DOI: 10.5815/ijem.3505 [PubMed] [Google Scholar]

[34] K.V. Mardia, Biometrika 57 (3), 519–30 (1970) DOI: 10.1093/biomet/57.3.519 [Google Scholar]

[35] S. Korkmaz, D. Goksuluk, G. Zararsiz, Tech. Rep., CRAN (2019) https://cran.r-project.org/web/packages/MVN/index.html [Google Scholar]