Issue 
Int. J. Metrol. Qual. Eng.
Volume 8, 2017



Article Number  29  
Number of page(s)  29  
DOI  https://doi.org/10.1051/ijmqe/2017018  
Published online  27 November 2017 
Research Article
Numerical analysis of the accuracy of bivariate quantile distributions utilizing copulas compared to the GUM supplement 2 for oil pressure balance uncertainties
University of South Africa, Department of Mechanical and Industrial Engineering,
Private Bag X6,
Florida
1710, South Africa
^{*} ramnav@unisa.ac.za
Received:
12
June
2017
Accepted:
12
September
2017
In the field of pressure metrology the effective area is A_{e} = A_{0} (1 + λP) where A_{0} is the zeropressure area and λ is the distortion coefficient and the conventional practise is to construct univariate probability density functions (PDFs) for A_{0} and λ. As a result analytical generalized nonGaussian bivariate joint PDFs has not featured prominently in pressure metrology. Recently extended lambda distribution based quantile functions have been successfully utilized for summarizing univariate arbitrary PDF distributions of gas pressure balances. Motivated by this development we investigate the feasibility and utility of extending and applying quantile functions to systems which naturally exhibit bivariate PDFs. Our approach is to utilize the GUM Supplement 1 methodology to solve and generate Monte Carlo based multivariate uncertainty data for an oil based pressure balance laboratory standard that is used to generate known high pressures, and which are in turn crossfloated against another pressure balance transfer standard in order to deduce the transfer standard's respective area. We then numerically analyse the uncertainty data by formulating and constructing an approximate bivariate quantile distribution that directly couples A_{0} and λ in order to compare and contrast its accuracy to an exact GUM Supplement 2 based uncertainty quantification analysis.
Key words: pressure balance / uncertainty / copula / bivariate quantile function / GUM Supplement 2
© V. Ramnath, published by EDP Sciences, 2017
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
In the field of pressure metrology when pistoncylinder operated pressure balances are used to measure applied pressures defined as P = F/A_{e} where F is a generalized force and A_{e} is the effective area, the most widespread model to characterize a pressure balance is that of a two parameter model such that A_{e} = A_{0} (1 + λP) where A_{0} is the zeropressure area and λ is the distortion coefficient. As a result knowledge of both the expected values as well as the associated covariance of a pressure balance's effective area model parameters A_{0} and λ are intrinsically necessary in order to fully quantify the uncertainty in A_{e} for subsequent application in pressure generation and measurement tasks. The conventional practise by many metrologists when performing an uncertainty quantification (UQ) analysis of a pistoncylinder operated pressure balance is to characterize the model parameters A_{0} and λ based on an analysis of the combination of the available experimental data such as crossfloating measurements and theoretical data such as finite element simulations. This analysis is usually performed with the application of the classical sensitivity coefficient based formulation of the GUM [1], or alternatively with the GUM Supplement 1 (GS1) Monte Carlo based UQ technique [2] in cases where the measurand model is either too complex or too nonlinear.
When these UQ techniques are applied the resulting multivariate UQ data for the pressure balance model, either implicitly in terms of a multivariate Gaussian distribution if generated with the GUM or potentially in terms of a multivariate nonGaussian distribution if generated with the GS1, is then typically approximated with univariate probability density function (PDF) distributions for A_{0} and λ due to the complexity of the models, whilst the covariance information through the available experimental and theoretical available data in terms of the coupling between A_{0} and λ is then usually either neglected, estimated or occasionally indirectly approximated.
Univariate PDFs of measurands have traditionally been analytically specified either as Gaussian or Student tdistributions in the case of a GUM based analysis, or alternately through discrete representations of either the underlying distribution function or equivalently the PDF when the uncertainty analysis was performed using the GS1 approach. In GS1 based uncertainty analysis approaches generated Monte Carlo data has in a majority of cases been utilized for subsequent analysis by retrieving previously generated and stored numerical data from electronic files. Recent work by Harris et al. [3] considered the problem of how to analytically represent GS1 numerical data by investigating and studying different families of quantile functions and approaches in order to estimate the respective quantile function parameters. Their investigations demonstrated the utility of extended lambda distribution (ELD) based quantile function representations to accurately summarize arbitrary PDFs generated using Monte Carlo simulation data. This methodology has recently been applied by Ramnath [4] in the field of gas pressure metrology through the use of an ELD based quantile function for the characterization of a pressure balance's effective area in the low pressure range 5 kPa ≤ p ≤ 7 MPa where the distortion coefficient λ is considered negligible and the pressure balance characteristics is fully specified in terms of just a univariate PDF of the zeropressure area A_{0}. In this particular investigation a noncontinuum gas flow was present and as a result the GUM was not appropriate due to the presence of a highly nonlinear measurand model. The measurand model of the gas pressure balance's effective area which was numerically solved with a corresponding uncertainty PDF determined through a GS1 Monte Carlo methodology demonstrated the feasibility of utilizing quantile functions for characterizing univariate PDFs which may exhibit deviations from more traditional Gaussian shaped deviations such as asymmetry and/or skewness in more complex fluid based primary scientific measurement systems.
Unfortunately for pressure balances operated at higher pressures where the effect of the distortion coefficient is significant due to fluid structural interaction effects the uncertainties and correlation effects between A_{0} and λ for hydraulic oil operated gauge pressure balances in the medium pressure range 7 MPa ≤ P ≤ 30 MPa must be considered. These effects are considered particularly critical and essential in the high pressure range 30 MPa ≤ P ≤ 500 MPa when calculating the uncertainty of a pressure balance's effective area. As a result based on this observation from an UQ perspective it is considered desirable and beneficial if a pressure balance model can be characterized directly in terms of a joint bivariate PDF in order to preserve intrinsic statistical uncertainty analysis information that may utilized be in order to accurately determine and quantify the magnitudes of the correlation effects that are present in physical pressure balances operated at high hydraulic pressures where there is strong coupling between A_{0} and λ.
Based on these motivating factors our approach in this paper is to firstly utilize an implicit multivariate GUM methodology which is used to solve for and characterize generated applied pressure data values and uncertainties for an oil pressure balance laboratory standard by sampling from appropriate PDF and joint PDF distributions where available. We then utilize the known generated pressure values and uncertainties as inputs into a full GS1 Monte Carlo simulation which is used to numerically generate multivariate data for the crossfloating of a transfer standard. Once the multivariate data for the transfer standard has been generated we then postprocess it using the GS2 [5] methodology for further analysis in order to determine the actual ellipsoidal and smallest coverage regions for the coupling of the transfer standard's zeropressure area A_{0} and distortion coefficient λ in a joint PDF. The exact coverage regions obtained with the GS2 methodology are then compared with results obtained with approximate bivariate quantile distributions that we formulate in this paper in order to investigate and determine the potential accuracy of bivariate quantile distributions for modelling pressure balance characteristics in terms of joint PDFs.
2 Mathematical models
The mathematical model for a pressure balance using a working fluid in a liquid state may be formulated as a pressure measurand equation through the analysis of a freebody diagram such that the applied pressure P = (p − p_{a}) is of the form $$p{p}_{a}=\frac{1}{S}\left(\left\{\sum _{i=1}^{n}{m}_{i}\left(1\frac{{\rho}_{a}}{{\rho}_{i}}\right){V}_{s}({\rho}_{f}{\rho}_{a})+H({\rho}_{f}{\rho}_{a})S\right\}g+\sigma C\right)$$(1) where p is the working fluid pressure, and p_{a} is the surrounding ambient pressure as per the original formulation by Dadson et al. [6]. In this model m_{i} and ρ_{i} are the actual mass and density values for the system of weights used to generate the respective pressures if there are n total mass pieces, ρ_{a} is the ambient density, V_{s} is the submerged volume of the piston into the working fluid, ρ_{f} is the density of the working fluid at an actual pressure of p, g is the local gravitational acceleration constant, σ is the surface tension of the working fluid where C is the circumference of the wetted perimeter of the piston in contact with the working fluid, H is a hydrostatic height term to account for any potential differences in elevation between the reference datums of generated pressues and measured pressures, and S is the effective area of the pressure balance. In the absence of additional information a useful approximation that may be used to estimate the circumference is $C=\sqrt{4\pi S}$.
Due to the fact that physical measurements may occur in which the temperature of the pressure balance may vary through a combination of ambient conditions and operating conditions the effective area is usually adjusted to correspond to a known reference temperature such that S = A_{0} [1 + λP] ϕ (t, t_{ref}) and ϕ (t, t_{ref}) =1 + α (t − t_{ref}). As per the discussion by Dadson et al. α is the sum of the thermal expansion coefficients of the piston and cylinder so that α = (α_{p} + α_{c}), and the reference temperature is fixed to a constant value which is usually just set to t_{ref} = 20 ° C in many commonwealth countries, and to 23 ° C in other countries. Following the general approach by many pressure metrologists the ambient air density may be specified using the CIPM2007 formula as ${\rho}_{a}=\frac{p{M}_{a}}{ZRT}\left[1{x}_{v}\left(1\frac{{M}_{v}}{{M}_{a}}\right)\right]$ as discussed by Picard et al. [7] where M_{a} = 28.96546 × 10^{−3} kg mol^{−1} is the molar mass of dry air, M_{v} = 18.01528 × 10^{−3} kg mol^{−1} is the molar mass of water, Z is the air compressibility, R = 8.314472 J mol^{−1} K^{−1} is the CODATA2006 recommended value for the universal gas constant, and x_{v} is the mole fraction of water vapour for the corresponding pressure, temperature and relative humidity conditions for the surrounding ambient air. The previous formulae may now be combined and rearranged as a nonlinear function ${f}^{\star}(p,{q}_{1}^{\star},\dots ,{q}_{m}^{\star})$ where p is the unknown working fluid pressure p which is to be determined and ${q}_{j}^{\star},\text{\hspace{0.17em}}j=1,\dots ,m$ are laboratory standard parameters such that $${f}^{\star}(p,{q}_{1}^{\star},\dots ,{q}_{m}^{\star})=\frac{1}{{S}^{\star}}\left[\sum _{i=1}^{n}{m}_{i}^{\star}g\left(1\frac{{\rho}_{a}}{{\rho}_{i}^{\star}}\right)+g[{H}^{\star}{S}^{\star}{V}_{s}^{\star}][{\rho}_{f}^{\star}{\rho}_{a}]+{\sigma}^{\star}{C}^{\star}\right](p{p}_{a})$$(2) The generated pressure p of the working fluid from a laboratory standard may then be obtained by solving ${f}^{\star}(p,{q}_{1}^{\star},\dots ,{q}_{m}^{\star})=0$ with any suitable solver by noting that ρ_{f}, C and S formally depend on the pressure p whilst all the other terms are independent of the working fluid's pressure. The above calculations are performed if we assume that the air density ρ_{a} = f_{a} (p_{a}, t_{a}, h_{a}) and fluid density ρ_{f} = f_{f} (p, t) are calculated by known functions in terms of the independent variables, and where ${f}^{\star}(p,{q}_{1}^{\star},\dots ,{q}_{m}^{\star})$ formally has units of applied pressure which allows for the numerical solution to be obtained for any specified applied pressure accuracy level.
A simple approach to solve for the unknown generated pressure p is to assume that p_{min} ≤ p ≤ p_{max} and search within this pressure range where p_{min} and p_{max} are rough estimates of the nominal pressure. As an example for an applied pressure of 250 MPa with a nominal atmospheric pressure of p_{a} = 101.325 kPa first calculate the approximate pressure as p_{0} = P + p_{a} and then for example set p_{min} = 0.95p_{0} and p_{max} = 1.05p_{0} as specifications to search within this interval or alternately to solve the full nonlinear solver of ${f}^{\star}(p,{q}_{1}^{\star},\dots ,{q}_{m}^{\star})=0$ using p_{0} as a starting solution. The next issue which naturally arises is how to solve this particular nonlinear equation noting that for the generating pressure equation both the density ρ_{f} and the wetted circumference $C=\mathrm{}\sqrt{4\pi S}$ depend on the pressure p. Ideally we would use for example Newton's method x_{n+1} = x_{n} − [f(x_{n})]/[f′(x_{n})], n = 0, 1, 2, ... for successive approximations to solve f (x) =0 where f′(x_{n}) denotes the derivative of the function where the benefits of using an iterative solver is that the generating pressure may be calculated to any desired numerical accuracy level. In our particular case it is however cumbersome to analytically calculate the derivative due to the choice of the fluid's equation of state. Fortunately there are derivative free iterative formulae as reported by Dehghan and Hajarian [8] to solve a nonlinear equation f (x) =0 and we may to use a forward difference formula originally developed by Steffensen such that x_{n+1} = x_{n} − [f (x_{n})] ^{2}/[f (x_{n} + f (x_{n}))] − f (x_{n})]. As a result through the use of the Steffensen formula we may control the numerical precision of the solved for applied pressures P_{k}, k = 1, … , n_{p} where n_{p} are the number of generated pressures by specifying the number of iterations.
In pressure metrology practise there is generally only have a limited indirect level of covariance information available and it is therefore common to disregard covariance terms in the ${f}^{\star}(p,{q}_{1}^{\star},\dots ,{q}_{m}^{\star})=0$ model by setting cov (q_{i}, q_{j}) =0, i ≠ j in the generated pressure calculation. The only potential exception where cov (q_{i}, q_{j}) ≉0 i ≠ j is in terms of the correlation between A_{0} and λ i.e. for cov (A_{0}, λ), and in the correlations between the weights and densities used i.e. cov (m_{i}, m_{j}), cov (ρ_{i}, ρ_{j}) and cov (m_{i}, ρ_{j}) respectively for i ≠ j unless otherwise specified.
The issue of how to approximate the correlation for weights was previously investigated by Bich [9] who utilized the multivariate GUM approach of Cox and Harris [10] in a slightly modified equation for a mass measurement as a general implicit equation. Later investigations by Palencar et al. [11] followed a similar multivariate GUM framework but refined the analysis slightly by using a least squares formulation to deduce the optimal values of the parameters in a calibration of mass pieces. Their investigation concluded that covariance terms can actually contribute nonnegligible terms in the uncertainty analysis of mass measurements and that ideally covariance information should be included in mass measurement calibration certificates.
Whilst the approach of Palencar et al. [11] does allow for the covariance information to be directly estimated under certain restricting assumptions for mass pieces traceable through a sequence of measurements to a particular country's national kilogram prototype standard or equivalent national metrology institure primary mass standard, in this paper we will disregard the correlation effects between mass and density terms used by the set of weights in the laboratory standard pressure generation model for simplicity. This modelling simplification is due to the fact that in most practical cases calibrated mass certificates do not provide any covariance information for the weights used by a pressure balance. In addition for simplicity we will also disregard the covariance between the laboratory standard zeropressure area and distortion coefficient cov (A_{0}, λ) in the pressure balance that is used to generate the known applied pressure due to that fact that many existing pressure calibration certificates do not generally provide any specific explicit covariance information.
Our approach of disregarding the covariance term cov (A_{0}, λ) for the generated pressure in the absence of specific known covariance information is consistent with experimental recommendations for national metrology institutes for pressure balances operated in free deformation mode as documented in for example EURAMET CG3 [12], and results from theoretical finite element based studies for pressure balances operated in both free deformation and controlled clearance modes such as the EUROMET Project 463 by Sabuga et al. [13] and in controlled clearance modes only by Dogra et al. [14] respectively. As a result of our choice to disregard covariance effects in the generated pressure calculation when implementing a GS1 Monte Carlo approach as discussed by Cox and Siebert [15] for the pressure generation calculations, it will then usually suffice to sample with appropriate numerical techniques for the inputs q_{i} from univariate PDFs g_{i} (ξ_{i}) associated with the corresponding input parameters q_{i} where ξ_{i} is a random variable of q_{i} unless information of a joint PDF is specifically available. Unless otherwise specified the PDFs for the input parameters will usually be Gaussian distributions however following the more recent work by Harris et al. it is now in principle possible to model the inputs as arbitrary PDFs using a univariate ELD quantile distribution as originally investigated by Willink [16].
We comment that even though we disregard covariance terms in the inputs of the generated pressure balance model in the absence of available information that when these generated pressures are used as inputs in the crossfloated pressure balance model to deduce the corresponding effective areas that due to the complexity of the mathematical model the covariance information in the parameters used to model the effective areas will still be present in an implicit form of the associated Monte Carlo data of the crossfloated effective areas.
Once the generated pressures from the laboratory standard (LS) pressure balance model are obtained and the GS1 information appropriately processed, the generated pressure PDFs ${g}_{i}^{\star}(\mathrm{}{\xi}_{i}^{\star})\mathrm{}$ where ${\xi}_{i}^{\star}$ is a random variable for generated pressure p_{i} may then used as inputs for when the LS is crossfloated against another pressure balance such as a unitundertest or transfer standard (TS) in order to determine the other pressure balance's effective area. When two pressure balances are crossfloated against each other the point of equilibrium occurs when the applied pressure P = (p − p_{a}) in both pressure balances are equal in which case the measurand equation for a crossfloated TS is $$f(S,{q}_{1},\dots ,{q}_{m})=\frac{1}{(p{p}_{a})}\left[\sum _{i=1}^{n}{m}_{i}g\left(1\frac{{\rho}_{a}}{{\rho}_{i}}\right)+g[HS{V}_{s}][{\rho}_{f}{\rho}_{a}]+\sigma C\right]S$$(3) In the above equation S is the unknown TS effective area which must be determined, and q_{j}, j = 1, … , m are corresponding parameters associated with the crossfloated TS pressure balance measurand equation. The benefit of this particular equation formulation is that f (S, q_{1}, … , q_{m}) formally has units of crossfloated area which then allows for the numerical solution accuracy to be controlled by specifying the number of iterations. Whilst a simple estimate to solve for the generated pressure is to assume a range of possible generated pressure values using for example 0.95p_{0} ≤ p ≤ 1.05p_{0} where p_{0} is the approximate nominal pressure p_{0} = P + p_{a} as previously discussed such a straightforward estimate of the range of possible crossfloated areas of the TS is not possible since we have no prior information in order to determine a possible range of values S_{min} ≤ S_{0} ≤ S_{max} to search within.
A simple initial approximation to overcome this difficulty is to ignore the effect of the oil surface tension and TS submerged volume on the TS crossfloated area so that a rough crossfloated area estimate is ${S}_{0}\approx \left[{\sum}_{i=1}^{n}{m}_{i}g\left(1\frac{{\rho}_{a}}{{\rho}_{i}}\right)\right]/\left[(p{p}_{a})Hg({\rho}_{f}{\rho}_{a})\right]$ since the hydrostatic pressure head term would not be negligible and to then use a search range of say 0.95S_{0} ≤ S ≤ 1.05S_{0} in order to solve the full nonlinear equation f (S, q_{1}, … , q_{m}) =0 or alternately by using for example the Steffensen formula as previously discussed.
In this paper the ⋆ superscripts signifies terms specific to the LS pressure balance which is used to generate the pressure whilst the unstarred terms represent quantities that are specific to the TS pressure balance which is crossfloated in order to determine corresponding effective areas. The model f (S, q_{1}, … , q_{m}) =0 using the previous generated pressure PDF ${g}_{{P}_{k}}({\xi}_{{P}_{k}})$ inputs may then be solved in another GS1 simulation to obtain the PDF distribution ${g}_{{S}_{k}}({\eta}_{{S}_{k}})$ for the crossfloated effective areas S_{k} for each pressure P_{k} where ${\eta}_{{S}_{k}}$ is a random variable of the crossfloated effective area S_{k}.
Since the crossfloated effective areas S_{k} are univariate data it follows that an ELDQF distribution may also be used to conveniently summarize S_{k} in terms of parameters {a_{k}, b_{k}, c_{k}, d_{k}} for k ∈ [1, … , n_{P}] for each of the n_{P} crossfloat effective area measurements corresponding to the n_{P} generated pressures where as previously discussed the crossfloated effective area is modelled in terms of a two parameter model such that S = A_{0} (1 + λP) where P is the applied pressure.
For our particular pressure balance problem a GS1 based uncertainty analysis is utilized for conceptual simplicity for the LS generated pressures in order to generate data for further analysis when the LS is crossfloated against the TS pressure balance as this avoids some of the subjectivity in certain aspects of the correlation effects modelling in the GUM based matrix analysis.
The GS1 methodology is focused on multivariate input models with an input $X={[{X}_{1},\dots ,{X}_{N}]}^{\text{T}}$ and a single output $Y=[{Y}_{1}]$ defined in terms of an implicit equation $h(y,x)=0$ such that the expected value $\tilde{y}$, variance ${u}^{2}(\tilde{y})$ and distribution function ${\tilde{G}}_{Y}(\eta )$ are approximated using M Monte Carlo simulations such that $h({y}_{r},{x}_{r})=0$ where r = 1, 2, … , M, $\tilde{y}=\frac{1}{M}{\sum}_{r=1}^{M}{y}_{r}$, ${u}^{2}(\tilde{y})=\frac{1}{M1}{\sum}_{r=1}^{M}{\left({y}_{r}\tilde{y}\right)}^{2}$, ${\tilde{G}}_{Y}(\eta )=\frac{r\frac{1}{2}}{M}+\frac{\eta {y}_{r}}{M({y}_{r+1}{y}_{r})}$ where y_{r} ≤ η ≤ y_{r+1} for r = 1, 2, … , M − 1. When the GS1 methodology is implemented for the LS generated pressure P_{k}, k = 1, … , n_{P} there will be a set of Monte Carlo univariate simulation data ${P}_{k}=\{{P}_{k}^{(1)},\mathrm{...},{P}_{k}^{(M)}\}$ for each generated pressure which may then be used as an input for the crossfloat of the TS.
A GS2 methodology on the other hand when contrasted to the GS1 method also considers the case for an input X but where there are now multiple outputs such that $Y={[{Y}_{1},\mathrm{...},{Y}_{m}]}^{\text{T}}$ defined in terms of an implicit vector equation $h(y,x)=0$. This methodology is also implemented by generating Monte Carlo data by first sampling from the input vector PDF for ${x}_{r}\sim {g}_{X}(\xi )$ or relevant associated joint PDFs if this information is available such that $h({y}_{r},{x}_{r})=0$ for r = 1, 2, … , M, $\tilde{y}=\frac{1}{M}\left[{y}_{1}+\cdots +{y}_{M}\right]$, ${\text{U}}_{{\displaystyle \tilde{y}}}=\frac{1}{M1}\left[({y}_{1}{\displaystyle \tilde{y}})\cdot {({y}_{1}{\displaystyle \tilde{y}})}^{\text{T}}\text{+}\mathrm{...}\text{+}({y}_{M}{\displaystyle \tilde{y}})\cdot {({y}_{M}{\displaystyle \tilde{y}})}^{\text{T}}\right]$ and $G=\left[{y}_{1},\dots ,{y}_{M}\right]$ where M is the number of Monte Carlo simulation events in order to produce a dataset y_{r}. In our particular case when the GS2 methodology is implemented for the TS crossfloated area S_{k}, k = 1, … , n_{P} there will be a set of Monte Carlo bivariate simulation data ${S}_{k}=\left\{{\left[{({A}_{0})}_{k}^{(1)},{\lambda}_{k}^{(1)}\right]}^{\text{T}},\dots ,{\left[{({A}_{0})}_{k}^{(M)},{\lambda}_{k}^{(M)}\right]}^{\text{T}}\right\}$.
When the GS2 uncertainty analysis method is implemented and the Monte Carlo simulation data postprocessed an implicit multivariate model may be used to construct the average vector value $\tilde{y}$ and corresponding covariance matrix ${\text{U}}_{{\displaystyle \tilde{y}}}$. As a result when using a GS2 approach the method as per the offic$y\sim \mathcal{N}(\mu ,\text{V})$mplicit assumption that the output follows a multivariate Gaussian distribution such that $y\sim \mathcal{N}(\mu ,\text{V})$ where the expected value is $\mu \approx {\displaystyle \tilde{y}}$ and the covariance matrix is $\sum \approx {\text{U}}_{\tilde{y}}$ unless otherwise specified. This feature is common to both the GS1 and GS2 due to the extreme mathematical complexity and numerical challenges in directly solving the Markov integral which provides for the exact PDF or joint PDF without any assumptions for both univariate as well as multivariate models. For the case of the GS2 the joint PDF for a multivariate Gaussian may then be written in matrix notation such that $${g}_{X}(\xi )=\frac{\mathrm{exp}\left(\frac{1}{2}{(\xi \mu )}^{T}{\text{V}}^{1}(\xi \mu )\right)}{\sqrt{{(2\pi )}^{N}\left\text{V}\right}}$$(4) where V^{−1} is the inverse of the matrix V, and $\leftV\right=\mathrm{det}(\text{V})$ denotes the determinant of the matrix V. Using this notation in the case of a bivariate distribution with an input $X={[{X}_{1},{X}_{2}]}^{\text{T}}$ and a corresponding random variable $\xi ={[{\xi}_{1},{\xi}_{2}]}^{\text{T}}$ the expected vector μ and covariance matrix V may then be specified as $$\mu =\left[\begin{array}{c}\hfill {\mu}_{{X}_{1}}\hfill \\ \hfill {\mu}_{{X}_{2}}\hfill \end{array}\right]$$(5a) $$\text{V}=\left[\begin{array}{cc}\hfill {\sigma}_{{X}_{1}}^{2}\hfill & \hfill \rho {\sigma}_{{X}_{1}}{\sigma}_{{X}_{2}}\hfill \\ \hfill \rho {\sigma}_{{X}_{1}}{\sigma}_{{X}_{2}}\hfill & \hfill {\sigma}_{{X}_{2}}^{2}\hfill \end{array}\right]$$(5b)where ${\sigma}_{{X}_{1}}^{2}$ and ${\sigma}_{{X}_{2}}^{2}$ are the variances for X_{1} and X_{2} respectively, and ρ is an indication of the covariance between the inputs X_{1} and X_{2}. The use of a Pearson r instead of ρ i.e. Spearman's rho as an indication of the correlation is usually the more common approach in mechanical metrology problems however other approaches such as Kendall's tau are also possible. The main advantage of these approaches are that the quantification of the extent of the correlation between the random variables x_{1} and x_{2} do not depend of the choice of parametrization for the model. Some additional benefits of using either a Kendall's tau or Spearman's rho instead of the Pearson's correlation coefficient are that these options are usually a more accurate indicator of correlation when the random variables X_{1} and X_{2} do not follow a Gaussian joint PDF since a Pearson's correlation can sometimes give misleading results if the random variables do not follow multivariate normal distributions as discussed in more detail by Goda [17]. If (x_{1}, y_{1}) , … , (x_{n}, y_{n}) are a set of n observations then Kendall's tau τ_{K} may be calculated as $${\tau}_{K}=\frac{CD}{n(n1)/2}$$(6)where C is the number of concordant pairs and D is the number of discordant pairs for the set of observations where a particular pair (x_{i}, y_{i}) and (x_{j}, y_{j}) for i ≠ j is concordant if x_{i} > x_{j} and y_{i} > y_{j} or alternately if x_{i} < x_{j} and y_{i} < y_{j}, whilst the same pair is discordant if x_{i} > x_{j} and y_{i} < y_{j} or if x_{i} < x_{j} and y_{i} > y_{j}. In the special case if x_{i} = x_{j} and y_{i} = y_{j} then the pair is neither concordant nor discordant and does not contribute in the calculation of τ_{K}. The use of Kendall's tau τ_{K} provides a useful simplification later in the paper for constructing particular formulations of copulas.
The benefit of using a GS2 approach to directly calculate the crossfloated effective areas is that any covariance information is immediately present and available from the covariance matrix ${\text{U}}_{{\displaystyle \tilde{y}}}$, and as a result this approach avoids any additional modelling assumptions for covariance terms. We will utilize the above ${S}_{k}=\left\{{\left[{({A}_{0})}_{k}^{(1)},{\lambda}_{k}^{(1)}\right]}^{\text{T}},\dots ,{\left[{({A}_{0})}_{k}^{(M)},{\lambda}_{k}^{(M)}\right]}^{\text{T}}\right\}$ multivariate dataset in order to investigate the accuracy of a bivariate quantile distribution in approximating a joint PDF for A_{0} and λ.
Once the univariate generated pressure Monte Carlo data ℘_{k} has been generated it is then necessary to summarize them. In the case of a univariate PDF g (η) with a random variable η and output Y one particular definition of defining a quantile function Q (ρ) following the approach of Harris et al. is by setting the distribution function as G (η) = ρ where 0 ≤ ρ ≤ 1 such that the quantile function Q (ρ) is defined such that η = Q (ρ) ≡ G^{−1} (ρ) from which it may be demonstrated that g (η) =1/[Q^{′} (ρ)]. The practical consequences of this definition is that if ρ is sampled from the rectangular distribution R (0, 1) then Q (ρ) corresponds to a sampling from g (η), [Q (p_{1}) , Q (p_{2})] is a probabilistically symmetric coverage interval for p if ${p}_{1}=\frac{1}{2}(1p)$ and ${p}_{2}=\frac{1}{2}(1+p)$, and that the expectation and variance for Y may be calculated as $E(Y)={\int}_{0}^{1}Q(\rho )\text{\hspace{0.17em}}d\rho $ and $V\mathrm{}(Y)\mathrm{}{=\int}_{0}^{1}{\mathrm{}[Q(\mathrm{}\rho )\mathrm{}E\mathrm{}(Y\mathrm{})]}^{2}\text{d}\rho $ respectively. For the univariate case where a quantile function distribution following the approach of Willink [16] utilizing an ELDs is implemented it has been demonstrated by Willink that the corresponding QF and ELDQF for the PDF may be simply represented in terms of real parameters a, b, c and d. Technical details to determine the values of a, b, c and d for an ELD were discussed in full and complete detail by Willink who compared the method of using four moments, four quantiles, and a modified method using the mean, variance and two quantiles. For pressure metrology purposes the method of using four quantiles is considered relatively simple and straightforward for pressure balances with univariate effective area models such as gas operated pressure balances at low pressure as discussed by Ramnath [4]. The approach of using quantile functions such as ELDs to summarize Monte Carlo uncertainty analysis data was further developed by Harris et al. [3] where they extended the earlier work of Willink for univariate distributions of a quantity Y to summarizing an arbitrary parent probability distributions denoted as P in terms of a generalized quantile function. This approach involved the utilization of the distribution function G_{P} (ζ) and PDF G_{P} (ζ) where ζ ∈ Z ∼ P is a random variable for the parent probability distribution and η ∈ Y is a random variable for the output Y. According to this approach where η = F (ζ) the random variable is calculated as $\eta =F({G}_{P}^{1}(\rho ))$ where ρ ∈ U ∼ R (0, 1) is a random variable following a rectangular distribution and the PDF of Y is calculated as g (η) = [g_{P} (ζ)]/[F^{′} (ζ)]. The key simplification of this approach is that the generalized quantile function defined in terms of F (ζ) simply requires knowledge of F (ζ) and a mechanism of sampling random variables from the parent distribution ζ ∈ Z ∼ P. In this paper we will opt for this general approach of Harris et al. since the ELD approach of Willink using four parameters to model the PDFs of the generated pressures and crossfloated areas may not necessarily be adequate. For simplicity we will use a random variable ρ ∈ R (0, 1) following a rectangular distribution as this is convenient with readily available random number generators. Although it is possible to use an increasing number of parameters over and above the four parameters in for example an ELD based QF through the calculation of higher order moments by working out for example the coefficients of Bspline approximations for F (ζ) with open source software routines [18] our approach will simply directly utilize the actual GS1 distribution functions for each of the generated pressures and crossfloated areas when constructing the joint PDF.
Earlier approaches such as that by Ramnath [19] utilized statistical linear regression analysis techniques developed by Krystek and Anton [20] for straight line data with correlation in plots of applied pressure [P ± u (P)] data versus crossfloated effective area [S ± u (S)] data with associated error bars. In this approach the generated pressure and crossfloated areas uncertainties were calculated with the classical sensitivity coefficient based GUM and the straight line fit for the curve S = A_{0} (1 + λP) allowed for A_{0} and A_{0}λ to be estimated, and which were then in turn used to estimate the distortion λ and indirectly estimate the correlation between A_{0} and λ. In this paper our approach is different since our objective is to instead construct an approximation to the actual joint PDF ${g}_{X}(\xi )$ with $X={[{A}_{0},\lambda ]}^{\text{T}}$ so that we may directly sample random values ${\xi}_{{A}_{0}}$ and ξ_{λ} from the underlying joint bivariate PDF similar to how univariate random variables ξ_{P} may be sampled from the generated pressure PDF.
The GS2 documentation specifically focuses on multivariate Gaussian distributions $x\sim \mathcal{N}(\mu ,\text{V})$ for a model with an output x, an expected value μ and an uncertainty matrix V. Explicit guidelines for sampling for such a joint PDF are provided such that a sampled draw may be estimated as $\xi =x+{\text{R}}^{\text{T}}z$ where x = [μ_{1}, … , μ_{N}] ^{T}, and ${z}_{i}\sim \mathcal{N}(0,1)$ if $X={[{X}_{1},\dots ,{X}_{N}]}^{\text{T}}$ and R is an upper triangular matrix formed from a Cholesky decomposition such that ${\text{U}}_{x}={\text{R}}^{\text{T}}\text{R}$. This focus is achieved through the assumptions that $\mu \approx \tilde{y}$ and $\text{V}\approx {\text{U}}_{x}$, however in general it may not necessarily be the case that a multivariate Gaussian is the best choice of PDF.
Although there may be some merits in particular cases of modelling the multivariate Monte Carlo data in terms of nonGaussian distributions the benefits of doing so may be outweighed by the necessity of selecting, deciding on and choosing alternative probability distribution information formulations as discussed by Tang et al. [21] who studied these effects in certain applications using the equivalent probability distribution information contained in appropriate choices of copulas. The theoretical approach of constructing bivariate distributions as discussed by Tang et al. for constructing the joint cumulative distribution function (CDF) and PDF of variables X and Y is possible if the marginal distributions of X say F (x) and that of Y say G (y) where x and y are corresponding random variables are both known. If a particular choice of copula function is specified then the application of Sklar's theorem as discussed by Tang et al. allows for the joint PDF f (x, y) to be constructed as $$f(x,y)=F(x)G(y)c(F(x),G(y);\theta )$$(7a) $$c(F(x),G(y);\theta )=\frac{{\partial}^{2}C}{\partial u\partial v}$$(7b) where u = F (x), v = G (y) and the copula is defined as C : I × I → I where c (F (x) , G (y)) is known as the copula density and I = [0, 1]. The copula C occasionally written as C_{θ} (u, v) where θ is a fixed parameter is formally defined as a mapping from the unit square I^{2} to the unit interval I. In this formulation θ is a parameter associated with the choice of the copula function for the underlying data specific to the particular model data as discussed by Genest and Favre [22]. One particular example of a copula are those belonging to the Farlie–Gumbel–Morgenstern family such that C_{θ} (u, v) = uv + θuv (1 − u) (1 − v) for u, v ∈ [0, 1] where the copula is constrained by the Frechet–Hoeffding bounds such that W (u, v) ≤ C (u, v) ≤ M (u, v) where W (u, v) = max {0, u + v − 1} and M (u, v) = min {u, v}. These bounds are used if X and Y are not independent, whilst a choice of C = Π = uv may be used if for example X and Y are fully independent where the degree of independence between the variables X and Y may be estimated in terms of either the Spearman's rho or Kendall's tau values as previously discussed. In the particular case of bivariate distributions the formal definition of a copula is that of a mapping such that $$C:\mathrm{}{I}^{2}\to I$$(8a) $$C(0,x)=C(x,0)=0$$(8b) $$C(1,x)=C(x,1)=x\text{\hspace{1em}}\forall \text{}x\in I$$(8c) $$0\le [C(b,d)+C(a,c)C(a,d)C(b,c)]\text{\hspace{1em}}\forall \text{}a,b,c,d\in I,\text{}a\le b\text{\hspace{0.17em}and\hspace{0.17em}}c\le d$$(8d)Different types of copulas are considered later in the paper when we construct the exact joint PDF from a GS2 Monte Carlo simulation and utilize various choices of copulas in approximating the actual bivariate joint PDF in order to investigate the suitability of various choices of bivariate quantile constructions of which copulas are one particular choice amongst several for modelling and summarizing a pressure balance's joint PDF.
Regardless of the particular choice of copula C_{θ} (u, v) it is seen that the construction of the joint PDF essentially reduces to that of the choice of a mapping function $\mathrm{\Phi}\mathrm{}{I}^{2}\to I$. This choice of mapping function is necessary in order to relate how variables 0 ≤ p, r ≤ 1 may be used as inputs for the mapping function to generate the corresponding random variables consistent with the joint PDF f (x, y). As a result although the absence of guidance from the GS2 for sampling from nonGaussian multivariate PDFs is potentially problematic this aspect is not an issue since it turns out that sampling from a bivariate quantile function based distribution may still be achieved using sampling from associated univariate PDFs and two dimensional coordinate transformation mappings.
In this paper our main motivation is to summarize the GS2 based joint PDF using a bivariate quantile smoothing spline that was originally developed by He et al. [23] where a response surface Z depends on two variables X and Y where it is assumed that the observations z_{ij} is known at each (x_{i}, y_{j}) for i = 1, … , m and j = 1, … , n such that x_{1} < ⋯ < x_{m} and y_{1} < ⋯ < y_{n} for convenience. In the original paper by He et al. it was demonstrated that the optimal solution for fitting the surface and they considered the special case where the covariates X_{1} and X_{2} were in the domain [0, 1] × [0, 1] which corresponds to our particular problem for random variables 0 ≤ p, r ≤ 1.
One approach to construct the joint PDF is based on the direct use of the Markov formula as discussed by Cox and Siebert [15] where the PDF denoted as ${g}_{Y}(\eta )$ for a model $Y=f(X)$ is formally defined in terms of the joint PDF denoted as ${g}_{X}(\xi $ as ${g}_{Y}(\eta )={\int}_{\infty}^{\infty}{g}_{X}(\xi )\delta (\eta f(\xi ))\text{\hspace{0.17em}d}{\xi}_{1}\cdots \text{d}{\xi}_{N}$. For our model the crossfloated area of the pressure balance is simply S = A_{0} (1 + λP) so this may be formally expressed as S ∼ g_{S} (η) so that $${g}_{S}(\eta )={\int}_{\infty}^{\infty}{\int}_{\infty}^{\infty}{\int}_{\infty}^{\infty}{g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda}){g}_{P}({\xi}_{P})\times \delta (\eta f({\xi}_{{A}_{0}},{\xi}_{\lambda},{\xi}_{P}))\text{\hspace{0.17em}d}{\xi}_{{A}_{0}}\text{\hspace{0.17em}d}{\xi}_{\lambda}\text{\hspace{0.17em}d}{\xi}_{P}$$(9) In the above formulation ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})$ is the unknown joint PDF which is not necessarily a multivariate Gaussian where the applied pressure is independent since in practical terms it can be varied by simply setting the working fluid pressure. As a result the PDF for the pressure can be uncoupled from the joint PDF for the area. It follows that the above continuous multivariate integral may then be approximated as a system of simultaneous equations such that $${g}_{{S}_{k}}({\eta}_{{S}_{k}}^{(\ell )})\approx \sum _{i=1}^{{N}_{{A}_{0}}}\sum _{j=1}^{{N}_{\lambda}}\sum _{r=1}^{{N}_{P}}\left[{g}_{{A}_{0},\lambda}\left({\xi}_{{A}_{0}}^{(i)},{\xi}_{\lambda}^{(j)}\right){g}_{{P}_{k}}\left({\xi}_{{P}_{k}}^{(r)}\right)\delta \left({\eta}_{\ell}{\xi}_{{A}_{0}}^{(i)}\left\{1+{\xi}_{\lambda}^{(j)}{\xi}_{P}^{(r)}\right\}\right)\right],\text{\hspace{1em}}k=1,\dots ,{n}_{P}\text{\hspace{0.17em}and\hspace{0.17em}}\ell \in [1,\dots ,{N}_{\eta}]$$(10)Whilst the Markov formula may in principle be used to calculate the exact joint PDF it is however not generally feasible in practical terms due to finite arithmetic precision resolution errors. These issues are due to the large difference in magnitudes from the crossfloated area PDF g_{S} (η), the joint PDF ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})$ and pressure PDF g_{P} (ξ_{P}) which when utilized to build up an equivalent system to solve the Markov integral equation will result in an illconditioned linear system of the form $\text{A}x=B$ where the coefficient matrix A is built up in terms of the PDF g_{P} (ξ_{P}), the known vector B is built up in terms of the PDF g_{S} (η) and the unknown x is the values of the joint PDF ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})$ at fixed coordinates for a chosen grid of ${\xi}_{{A}_{0}}$ and ξ_{λ} points. Extensions beyond standard IEEE 32bit and 64bit accuracies to variable precision arithmetic (VPA) accuracies using Fortran/C++ based implementations to mitigate against finite arithmetic resolution errors are discussed in more detail by Bailey and Borwein [24].
Our approach in this paper is to instead use the GS2 approach for multivariate measurand models of the form $h(y,x)=0$ for an input x and output y. If $x\sim {g}_{\text{X}}(\xi )$ and $y\sim {g}_{\text{Y}}(\eta )$ where ξ and η are random variables then model must also satisfy $h(\eta ,\xi )=0$. This means that if the model undergoes a Monte Carlo simulation then η can also be postprocessed in order to determine its probability distribution as per the GS2 documentation if η can be recovered from the equation $h(\eta ,\xi )=0$.
The utilization of the GS2 to determine A_{0} and λ where in our case $\eta ={[{A}_{0},\lambda ]}^{\text{T}}$ is slightly more complicated since the pressure balance model for the TS f (S, q_{1}, … , q_{m}) =0 is not in the standard form where there is an explicit system of equations for the parameters A_{0} and λ. If we however consider the entire set of crossfloat measurements as one system we may then construct a new vector equation h (y, x) =0 by employing a unweighted linear least squares which is discussed in more technical detail by White and Saunders [25].
For our purposes we will utilize the well known results for a general linear unweighted least squares problem where the function is constructed as $y(x)={\sum}_{i=1}^{N}{a}_{i}{X}_{i}(x)$ in terms of an independent variable x, and parameters a_{1}, … , a_{N} where X_{i} (x) are specified basis functions as discussed by Press et al. [26]. Following this approach a merit function is constructed as ${\chi}^{2}={\sum}_{j=1}^{M}{\left[{y}_{j}{\sum}_{i=1}^{N}{a}_{i}{X}_{i}(x)\right]}^{2}$ and minimized by calculating the parameter values to satisfy ∂χ^{2}/∂ a_{i} = 0, i = 1, … , N. Implementing this approach to our particular problem where n_{P} is the number of generated pressures and associated crossfloats with S = A_{0} (1 + λP) then yields $${\chi}^{2}=\sum _{j=1}^{{n}_{P}}{[{S}_{j}\{{A}_{0}(1+\lambda {P}_{j})\}]}^{2}$$(11) so that the simultaneous system of equations ∂χ^{2}/∂ A_{0} = 0 and ∂χ^{2}/∂ λ = 0 may then be used to implement the GS2 methodology where $$h(y,x)=\left[\begin{array}{c}\hfill \frac{\partial {\chi}^{2}}{\partial {A}_{0}}\hfill \\ \hfill \frac{\partial {\chi}^{2}}{\partial \lambda}\hfill \end{array}\right]$$(12a) $$\frac{\partial {\chi}^{2}}{\partial {A}_{0}}=\sum _{j=1}^{{n}_{P}}\{2{A}_{0}(1+\lambda {P}_{j})[{S}_{j}{A}_{0}(1+\lambda {P}_{j})]\}$$(12b) $$\frac{\partial {\chi}^{2}}{\partial \lambda}=\sum _{j=1}^{{n}_{P}}\{2{A}_{0}{P}_{j}[{S}_{j}{A}_{0}(1+\lambda {P}_{j})]\}$$(12c)where in our case $y={[{A}_{0},\lambda ]}^{\text{T}}$ and $x={[{q}_{1},\dots ,{q}_{m}]}^{\text{T}}$ as per our earlier mathematical modelling approach. In practical terms the approach used would be to sample random variables ${q}_{1}^{\star},\dots ,{q}_{m}^{\star}$ in order to solve the LS equation f^{⋆} (p, q_{1}, … , q_{m}) =0 for the generated pressure, and to then use this as input to solve the TS equation f (S, q_{1}, … , q_{m}) =0 for the crossfloated area S. As a result there will be a set of crossfloated areas $\mathcal{S}={[{S}_{1},\dots ,{S}_{M}]}^{\text{T}}$ for the M Monte Carlo simulation events where each simulation event will have a known applied pressure. Since for each simulation event the solved for pressures and crossfloat areas are statistically valid it follows that the model would also need to be satisfied for these values.
As a result the GS2 Monte Carlo simulation of unweighted least squares of statistically sampled values in accordance with the underlying probablity distributions then becomes equivalent to a conventional multivariate regression practice as discussed by Press et al. [26] since the sampled values are formally a statistically valid possibility based on the underlying probability distribution. This system may then be used to determine a Hyperellipsoidal coverage region which in our particular problem will correspond to a two dimensional region for the bivariate joint PDF.
In order to have a better conceptual mathematical understanding of the statistical definition of a quantile we first make the geometric observation that in the case of a univariate PDF g (η) that the corresponding univariate quantile Q_{1D} (ρ) function is technically a one dimensional mapping $${Q}_{1D}\mathrm{}:\rho \to \eta \mathrm{}\text{\hspace{0.17em}},\text{s.t.\hspace{0.17em}}\rho \in R(0,1),\text{\hspace{1em}}\eta \in g(\mathrm{}\xi )\mathrm{}$$ that transforms a random variable ρ sampled from a rectangular distribution into a corresponding random variable ξ such that the sampled value is η = g (ξ). We may then using this conceptual tool in this paper then use the approach of Gilchrist [27] to generalize the definition of a bivariate quantile Q_{2D} ([ρ_{1}, ρ_{2}] ^{T}) as a corresponding two dimensional mapping $$f(x,y)=\frac{1}{\parallel \frac{\partial x}{\partial p}\frac{\partial y}{\partial r}\frac{\partial x}{\partial r}\frac{\partial y}{\partial p}\parallel}$$In particular if when implementing the previously discussed geometrical mapping mathematical definition for bivariate quantiles by taking h (p, r) =1 where p and r are random variables such that 0 ≤ p ≤ 1 and 0 ≤ r ≤ 1 a natural consequence which results is that the bivariate quantile may be considered to be the mapping from the unit rectangle to the corresponding surface of the joint PDF f (x, y) formally defined following the approach by Gilchrist as $$f(x,y)=\frac{1}{\Vert \frac{\partial x}{\partial p}\frac{\partial y}{\partial r}\frac{\partial x}{\partial r}\frac{\partial y}{\partial p}\Vert}$$(13a) $$=[h(p,r)]\cdot \parallel J(p,rx,y)\parallel $$(13b)In the above formula J (p, rx, y) =1/[J (x, yp, r)] is the corresponding Jacobian of the transformation from the unit rectangle h (p, r) =1 to the joint PDF surface f (x, y). The random variables x and y as outlined above may be assumed to be equivalent to $x\equiv {\xi}_{{A}_{0}}$ and y ≡ ξ_{λ} for brevity later in the paper where they correspond to random variables associated with the zeropressure area A_{0} and distortion coefficient λ respectively for our particular pressure balance crossfloated effective area model. If an appropriate two dimensional mapping is correctly chosen then the variables p and r may be independently sampled from a rectanguar distribution R (0, 1) and then the choice of mapping will allow for random variables to be sampled from the corresponding bivariate joint PDF. This purely mathematical transformation result is consistent with the earlier statistical based observation of the need for a choice of copula that may be used with the marginal distributions in order to construct the joint PDF.
As a result of this observation we note that a bivariate quantile is not necessarily unique as there are in principle an arbitary number of mappings between the two surfaces however if we fix the type of mapping which is equivalent to a particular choice of copula then we can construct a unique associated bivariate quantile distribution. Our approach in this paper is to use the multivariate GS1 methodology to generate the actual bivariate joint distribution of the pressure balance TS ${g}_{X}(\xi )$ where $X={[{A}_{0},\lambda ]}^{T}$ and $\xi ={[{\xi}_{{A}_{0}},{\xi}_{\lambda}]}^{\text{T}}$ are associated random variables sampled from a PDF corresponding to X. The actual joint PDF $f({\xi}_{{A}_{0}},{\xi}_{\lambda})$ surface constructed out of the GS1 Monte Carlo simulation data is then numerically approximated with a bivariate PDF $\phi ({\xi}_{{A}_{0}},{\xi}_{\lambda})$ using a selection of two dimensional transformation mappings. The accuracy of the bivariate quantile distribution may then be investigated by determining how the corresponding effective area uncertainty calculated in terms of $\phi (\mathrm{}{\xi}_{{A}_{0}},\mathrm{}{\xi}_{\lambda})\mathrm{}$ compares to the actual effective area uncertainties obtained through the actual GS1 Monte Carlo simulation crossfloating data.
The previous definition of bivariate quantiles by Gilchrist was developed in more formal mathematical statistical details in an earlier work by Chaudhuri [28] and may be formally extended to systems other than bivariate distributions with potential application to metrology problems involving higher dimensional nonGaussian multivariate distributions. One particular model discussed by Gilchrist is the generalized circular model of form $${Q}_{x}(p,r)={\lambda}_{x}+{\eta}_{x}\sqrt{Q(p;\beta )}\mathrm{cos}(2\pi r)$$(14a) $${Q}_{y}(p,r)={\lambda}_{y}+{\eta}_{y}\sqrt{Q(p;\beta )}\mathrm{sin}(2\pi r)$$(14b) where 0 ≤ p, r ≤ 1 as previously discussed and β is an appropriate constant parameter that must be determined. The above system of equations is a particular form for the general bivariate quantile transformation where each random variable has a corresponding equation. Another strategy that may be used to construct bivariate quantiles with more generality than the generalized circular model is that of the marginal/conditional form where $$x\mathrm{}{=Q}_{x}(\mathrm{}p)\mathrm{}$$(15a) $$y\mathrm{}{=Q}_{y}(\mathrm{}p\mathrm{},r\mathrm{})$$(15b)In this formulation the first function x = Q_{x} (p) is a univariate quantile function, whilst in the second function y = Q_{y} (p, r) if x is fixed then p is in turn fixed so that the second function is actually also a univariate quantile function for y for a given choice of x. The practical consequence of choosing to model a bivariate quantile in terms of a marginal/conditional formulation is that the previously developed ELD formulation can in principle also be utilized to construct the associated level curves, however a potential drawback of a conventional marginal distribution approximation of a bivariate joint PDF is that in general only a fixed number of contour curves can be analytically constructed and hence interpolation is required in order to calculate random variables for arbitrary choices of specified values of p and r. Whilst marginal distributions for the zeropressure area A_{0} and distortion coefficient λ can be constructed relatively easily using for example extended lambda based univariate quantile distributions a newer approach of fitting the actual bivariate joint PDFs with copulas is now available. In the copula approach one uses known specified marginal distributions which are relatively easy to calculate, and then optimizes parameters $\theta ={[{\theta}_{1},\dots ,{\theta}_{n}]}^{\text{T}}$ associated with the particular choice of copula family ${C}_{\theta}(u,v)$ in order more closely match the copula constructed joint PDF with the actual known joint PDF. As a result fitting copulas to joint PDFs can potentially offer considerable algebraic simplifications where for bivariate distributions the main choices that may be investigated are usually elliptical copulas and generalized Archimedean copula families.
Whilst these particular copula based approaches have their own respective merits we comment that a generalized multivariate quantile function approach originally developed by Chaudhuri [28] may also be implemented for bivariate distributions. The starting point is to generate multivariate data points ${X}_{1},{X}_{2},\dots ,{X}_{n}\in {\mathbb{R}}^{d},\text{\hspace{0.17em}}2\le d\in \mathbb{Z}$ in ℝ^{d} which are assumed to be known through for example Monte Carlo simulations. For our purposes this Monte Carlo data will be generated with the GS2 approach as discussed earlier with $h(y,x)=0$ which will formally provide a sequence of data points ${y}_{1},\dots ,{y}_{M}\in {\mathbb{R}}^{2}$ where M is the number of Monte Carlo simulation events and ${y}_{q}\mathrm{}{=\left[{\xi}_{{A}_{0}}^{\mathrm{}(q\mathrm{})}\mathrm{},{\xi}_{\lambda}^{(\mathrm{}q)\mathrm{}}\right]}^{\text{T}},\text{\hspace{0.17em}}\mathrm{}\text{\hspace{0.17em}}q=1,\dots ,\mathrm{}M$ and where we make the observation that we do not assume that μ ≈ y and ∑ ≈ U_{y} but simply assume that ${y}_{1},\dots ,{y}_{M}$ follows some underlying probability distribution which may not necessarily be Gaussian.
By defining the set as ${B}^{(d)}=\{u:u\in {\mathbb{R}}^{d},\text{\hspace{0.17em}}\parallel u\parallel <1\}$ i.e. B^{ (d)} is an open unit ball, and by defining a functional as $\Phi (u,t)=\parallel t\parallel +\langle u,t\rangle $ where $u\in {B}^{(d)}$, $t\in {\mathbb{R}}^{d}$ where 〈⋅ , ⋅ 〉 represents the Euclidean inner product, a multivariate quantile function may then be formally mathematically defined as $${{\displaystyle \stackrel{\u02c6}{\text{Q}}}}_{n}\mathrm{}(u)\mathrm{}={\displaystyle \underset{Q\text{\hspace{0.17em}}\in \text{\hspace{0.17em}}{\mathbb{R}}^{d}}{{\displaystyle \text{min}}}}{\displaystyle \sum _{i=1}^{n}}\mathrm{\Phi}\mathrm{}(u,\mathrm{}{X}_{i}Q)\mathrm{}$$(16) The above is a multivariate generalization for 2 ≤ d ∈ ℤ of the univariate case corresponding to d = 1 where the univariate quantile maps values of α for 0 ≤ α ≤ 1 with an associated parameter u = (2α − 1) to the one dimensional interval (−1, 1).
According to the above multivariate definition extreme quantiles would correspond to ∥u ∥ ≈ 1 and central quantiles would correspond to ∥u ∥ ≈ 0 respectively. An iteration algorithm in order to construct the quantile function was originally provided by Chaudhuri where for ${X}_{1},\dots ,{X}_{n}$ the algorithm Step #1 is to compute for all of the corresponding data points ${X}_{i}$ for 1 ≤ i ≤ n and check if the degeneracy condition $$\left\sum _{j=1,j\ne i}^{n}\left\{\parallel {X}_{j}{X}_{i}{\parallel}^{1}({X}_{j}{X}_{i})\right\}+(n1)u\right\le (1+\parallel u\parallel )$$(17) is valid for some 1 ≤ i ≤ n. If the degeneracy condition is satisfied for some i then just set ${\stackrel{\u02c6}{\text{Q}}}_{n}(u)={X}_{i}$. Alternately if the degeneracy condition cannot be satisfied for any 1 ≤ i ≤ n then move to Step #2 by constructing ${\stackrel{\u02c6}{\text{Q}}}_{n}(u)$ by solving the equation $$\sum _{i=1}^{n}\left\{{\left{X}_{i}{\stackrel{\u02c6}{\text{Q}}}_{n}(u)\right}^{1}\left[{X}_{i}{\stackrel{\u02c6}{\text{Q}}}_{n}\right]\right\}+nu=0$$(18)The above equation must be solved by using for example the method of successive approximations of which one possible approach is to use a starting solution composed of a vector of means such that $${\stackrel{\u02c6}{\text{Q}}}_{n}^{(1)}\approx {[{\mu}_{1},\dots ,{\mu}_{n}]}^{\text{T}}$$(19)where each mean is simply the mean of the respective set of components. Once this starting value has been constructed successive approximations for m = 2, 3, … may then be generated using iterations so that $${\stackrel{\u02c6}{\text{Q}}}_{n}^{(m+1)}(u)={\stackrel{\u02c6}{\text{Q}}}_{n}^{(m)}(u)+{\Phi}^{1}\Delta $$(20)where $$\mathrm{\Phi}=\sum _{i=1}^{n}\left({\left{X}_{i}{\stackrel{\u02c6}{\text{Q}}}_{n}^{(m)}\right}^{1}\left[{I}_{d}{\left{X}_{i}{\stackrel{\u02c6}{\text{Q}}}_{n}^{(m)}\right}^{2}\left\{{X}_{i}{\stackrel{\u02c6}{\text{Q}}}_{n}^{(m)}\right\}{\left\{{X}_{i}{\stackrel{\u02c6}{\text{Q}}}_{n}^{(m)}\right\}}^{\text{T}}\right]\right)$$(21) $$\mathrm{\Delta}\mathrm{}{\displaystyle \underset{i=1}{\overset{n}{=\sum}}}{\left{X}_{i}{{\displaystyle \stackrel{\u02c6}{Q}}}_{n}^{\mathrm{}(m)\mathrm{}}\right}^{1}\left\{{X}_{i}{{\displaystyle \stackrel{\u02c6}{Q}}}_{n}^{\mathrm{}(m\mathrm{})}\right\}+n\mathit{u}$$(22)and ${I}_{d}$ is the d × d identity matrix. Further aspects of how this definition of quantile function may be used to quantify and describe skewness and kurtosis in higher dimensional multivariate distributions is discussed in more detail in the original paper by Chaudhuri [28].
This geometric generalization of a quantile function in higher dimensional spaces may thus have some further technical potential for analysing metrology systems than cannot be modelled in terms of either univariate and bivariate probability distributions as a future topic of research study by metrologists particularly when nonGaussian multivariate distributions may occur. In the context of pressure metrology such potential future research problems might include higher dimensional multivariate joint PDF's such as for pressure balances in free deformation mode or controlled clearance modes. Examples of higher dimensional multivariate probability distributions for pressure balances in free deformation mode operation include situations when nonlinear elasticity theory is utilized and two distortion coefficient parameters are then necessary so that the effective area is modelled as A = A_{0} (1 + λ_{1}p + λ_{2}p^{2}) and a joint PDF ${g}_{{A}_{0},{\lambda}_{1},{\lambda}_{2}}({\xi}_{{A}_{0}},{\xi}_{{\lambda}_{1}},{\xi}_{{\lambda}_{2}})$ must be constructed. On the other hand for pressure balances operated in controlled clearance mode such as when the Heydemann–Welch method is implemented for a primary standard scale realization for hydraulic pressures and the pressure balance effective area is modelled as A = A_{0} (1 + α (t − t_{ref}) (1 + λP) [1 + d_{j} (p_{j0} − p_{j})] as discussed by Kajikawa et al. [29,30] additional parameters such as the jacket pressure coefficient d_{j} and zeroclearance jacket pressure p_{j0} are then present in which case a higher dimensional joint probability distribution ${g}_{{A}_{0},\lambda ,{d}_{j},{p}_{j0}}({\xi}_{{A}_{0}},{\xi}_{\lambda},{\xi}_{{d}_{j}}{\xi}_{{p}_{j0}})$ must be approximated to fully characterize a pressure balance.
3 Numerical simulations
In this paper we perform an analysis by utilizing the experimental data set previously reported by Ramnath [19] as indicated in Appendix A which provides full technical details and supporting data where the working fluid is assumed to be Di(2)ethylhexylsebacate defined in terms of an oil equation of state reported by Kocas et al [31]. Numerical experiments were performed on a Toshiba laptop with an Intel Pentium B950 CPU operating at 2.10 GHz with 2 GB of RAM running on a Microsoft Windows 7 64bit operating system using Gnu Octave 4.2.0. Simulations were undertaken with M = 10 000 Monte Carlo simulation events for each generated pressure and crossfloated area data point. The full GS1 simulation for ten generated pressures and crossfloats from 50 MPa to 500 MPa therefore consisted of 10 000 × 10 = 100 000 total simulation events and took approximately 115 minutes to solve. When the GS2 was implemented by postprocessing the GS1 data by using the crossfloat data to fit a curve through the data points [P_{1}, … , P_{10}] ^{T} and [S_{1}, … , S_{10}] ^{T} in order to construct the χ^{2}optimization for the best fit curve S = A_{0} (1 + λP) the equivalent number of M = 10 000 simulation events took less than a minute to solve when using the builtin polyfit routine of Gnu Octave. As a result Monte Carlo simulations for 10 × 10^{3} ≤ M ≤ 25 × 10^{3} simulation events may be is considered a feasible option with singlecore laptops/workstations as these calculations would typically be able to performed in a single working day. On the other hand if higher counts of simulation events in the range 25 × 10^{3} ≤ M ≤ 100 × 10^{3} are required we comment that this would in most practical circumstances require access to either multicore computers or alternatively high performance computing parallel computing computers or clusters to avoid simulation times of a few weeks.
Our approach in this paper as outlined earlier in the paper first solves the underlying equations for the generated pressure ${f}^{\star}(p,{q}_{1}^{\star},\dots ,{q}_{m}^{\star})=0$ and crossfloated area f (S, q_{1}, … , q_{m}) =0 using sampled random values q^{⋆} and q which are drawn from the respective input PDFs which produce Monte Carlo GS1 data for the generated pressure MCP = [P_{1}, … , P_{10}] and crossfloated area MCS = [S_{1}, … , S_{10}] which are both M × 10 matrices since each generated pressure is used to crossfloat the TS. The Monte Carlo data of these matrices are built up in terms of column vectors of the form ${P}_{k}={\left[{\xi}_{{P}_{k}}^{(1)},\dots ,{\xi}_{{P}_{k}}^{(M)}\right]}^{\text{T}}$ and ${S}_{k}={\left[{\eta}_{{S}_{k}}^{(1)},\dots ,{\eta}_{{S}_{k}}^{(M)}\right]}^{\text{T}}$where each column contains the Monte Carlo data for that particular generated pressure or crossfloated area so that the matrix data is then stored in data files for convenience. The distribution functions ${G}_{{P}_{k}}({\xi}_{{P}_{k}})$ and ${G}_{{S}_{k}}({\eta}_{{S}_{k}})$may if necessary be calculated and summarized in terms of ELDs built up in terms of fours parameters a, b, c, and d such that η = Q (ρ) where $$Q(\rho )=\{\begin{array}{l}d+\frac{c}{b}\left[a{\rho}^{b}{(1\rho )}^{b}+1a\right],\text{\hspace{1em}}b\ne 0\hfill \\ d+c\left[a\mathrm{ln}(\rho )\mathrm{ln}(1\rho )\right],\text{\hspace{1em}}b=0\hfill \end{array}$$(23a) $$g(\eta )={\left\{c\left[a{\rho}^{b1}+{(1\rho )}^{b1}\right]\right\}}^{1}$$(23b) and 0 ≤ ρ ≤ 1 is a random variable to generate the PDFs as originally discussed by Willink [16]. We comment that the generated pressures and crossfloated areas may if necessary also be calculated directly as ${\mu}_{{P}_{k}}={\int}_{0}^{1}{Q}_{{P}_{k}}(\rho )\text{\hspace{0.17em}d}\rho $ and ${\sigma}_{{P}_{k}}^{2}={\int}_{0}^{1}{({Q}_{{P}_{k}}{\mu}_{{P}_{k}})}^{2}\text{\hspace{0.17em}d}\rho $ where ${Q}_{{P}_{k}}$ is the quantile function for pressure P_{k} as summarized in Table 1 with similar expressions for the crossfloated areas as summarized in Table 2.
Once the expected values ${\mu}_{{P}_{k}}$ and ${\mu}_{{S}_{k}}$ are calculated this set of GS1 data which is consistent with the associated underlying PDFs may then be used to estimate the approximate nominal zeropressure area ${\mu}_{{A}_{0}}$ and distortion coefficient μ_{λ} from the pressure versus area graph as illustrated in Figure 1a which is based on a GS1 simulation with M = 10 000 total simulation events as previously discussed. When this original GS1 data is then further processed in the GS2 simulation using the original GS1 data in the χ^{2} optimization to extract the values for A_{0} and λ it then results in the scatter plot as illustrated in Figure 1b which may then be further postprocessed in order to construct the bivariate joint PDF.
An approximate visualization of the bivariate joint PDF using a kernal density estimate (KDE) approach along with visualizations of the associated marginal distributions for A_{0} and λ are illustrated in Figure 1c using a Gaussian kernal.
Our approach to construct the bivariate PDF uses the builtin histogram2d function from the Python numpy and scipy scientific computing libraries for convenience since we wish to fit the actual joint PDF f (x, y) where x and y are random variables. This choice of using histogram2d to explicitly build up a discrete approximation of the actual PDF with explicit bin counts that are appropriately weighted for a two dimensional normalization of the joint PDF is used over a KDE approach since the implementation of a KDE implicitly requires a choice of kernal function as discussed by Diwekar and David [32] where the PDF for univariate data ${X}_{1},\dots ,{X}_{n}$ is constructed such that $f(x)=\frac{1}{nh}{\sum}_{i=1}^{n}K\left(\frac{x{X}_{i}}{h}\right)$ where h is the bandwidth and K is the kernel function. The use of a kernal function also occurs in the case of multivariate data ${X}_{i}\in {\mathbb{R}}^{d}$ with data points ${X}_{1},\dots ,{X}_{n}$ where the multivariate PDF is constructed such that $f(x)=\frac{1}{n{h}^{d}}{\sum}_{i=1}^{n}K\left(\frac{x{X}_{i}}{h}\right)$ and as a result a similar issue will arise if we opt to use a KDE formulation to construct the bivariate PDF since the choice of kernal used for the KDE construction will introduce a level of subjectivity in the bivariate joint PDF. Typically when implementing a KDE to construct a univariate PDF a Gaussian kernal function is usually used however other particular choices of kernals such as an Epanechnikov and biweight kernal can also be used. As a result different PDF estimates may occur when the underlying Monte Carlo data is postprocessed using a KDE approach.
Since we do not have any a priori knowledge of the most appropriate underlying kernal our approach is to simply construct the actual discrete approximation of the joint PDF using the Python histogram2d function which correctly normalizes the two dimensional PDF such that ${\int}_{\infty}^{\infty}{\int}_{\infty}^{\infty}f(x,y)\text{\hspace{0.17em}d}x\text{\hspace{0.17em}d}y=1$ where in our particular problem the random variables are chosen such that $x\equiv {\xi}_{{A}_{0}}/[{m}^{2}]$ and y ≡ ξ_{λ}/[Pa^{−1}]. This is performed by using the well known Freedman–Diaconis statistical rule to estimate the bandwidth as $${h}_{x}=2\left(IQR(x)\right){N}^{1/3}$$(24) where IQR (x) = Q_{3} (x) − Q_{1} (x) is the interquartile range for the the random variable x where Q_{3} (x) and Q_{1} (x) are the third and first quartiles. The number of bins h_{x} associated with x using this choice of bandwidth is then calculated using the ceiling function as $${N}_{x}=\lceil \frac{\text{max}(x)\text{min}(x)}{{h}_{x}}\rceil $$(25)with similar associated expressions for the other random variable y. The final actual bivariate joint PDF from the GS2 simulation is illustrated in Figure 2 in natural physical units. Referring to the two dimensional discrete histogram we observe that the limits for a joint PDF ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})$ are approximately $\text{min}({\xi}_{{A}_{0}})=1.9612\text{\hspace{0.17em}m}{\text{m}}^{2}$, $\text{max}({\xi}_{{A}_{0}})=1.9617\text{\hspace{0.17em}m}{\text{m}}^{2}$, min(ξ_{λ}) =0.30 ppm/MPa and max(ξ_{λ}) =1.10 ppm/M. In this joint PDF the random variable for area ${\xi}_{{A}_{0}}$ is in units of m^{2} so that when the pressure is in units of Pa the random variable for the distortion coefficient ξ_{λ} is in units of Pa^{−1} for dimensional consistency in order to satisfy the normalization condition ${\int}_{\infty}^{\infty}{\int}_{\infty}^{\infty}{g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})\text{\hspace{0.17em}d}{\xi}_{{A}_{0}}\text{d}{\xi}_{\lambda}=1$.
For brevity let the random variable ${\xi}_{{A}_{0}}$ be represented by x and the random variable ξ_{λ} be represented by y so that the joint PDF $f({\xi}_{{A}_{0}},{\xi}_{\lambda})$ is just f (x, y) for convenience. If we construct variables such that $$p=\frac{{\xi}_{{A}_{0}}\text{min}({\xi}_{{A}_{0}})}{\text{max}({\xi}_{{A}_{0}})\text{min}({\xi}_{{A}_{0}})}$$(26a) $$r=\frac{{\xi}_{\lambda}\text{min}({\xi}_{\lambda})}{\text{max}({\xi}_{\lambda})\text{min}({\xi}_{\lambda})}$$(26b) $$\phi (p,r)=\frac{{\xi}_{{A}_{0},\text{}}({\xi}_{{A}_{0}},{\xi}_{\text{}})}{\text{max}({\xi}_{{A}_{0},\text{}}({\xi}_{{A}_{0}},{\xi}_{\text{}}))}$$(26c) the original random variables are mapped to equivalent scaled variables such that 0 ≤ p ≤ 1 and 0 ≤ r ≤ 1. As a result the original joint PDF ${g}_{{A}_{0},\text{}}({\xi}_{{A}_{0}},{\xi}_{\text{}})$ may now be defined in terms of transformed random variables p and r.
The normalization of variables is utilized due to scaling effects since it will be generally be more convenient to fit the normalized joint PDF such that φ (p, r) = [f (x, y)]/[max(f (x, y)]. As a result by fitting the normalized parameters 0 ≤ φ (p, r) ≤1 with 0 ≤ p ≤ 1, 0 ≤ r ≤ 1 and f_{max} = max {f (x, y)} the scaled values can then be recovered such that $$x=p({x}_{\text{max}}{x}_{\text{min}})+{x}_{\text{min}}$$(27a) $$y=r({y}_{\text{max}}{y}_{\text{min}})+{y}_{\text{min}}$$(27b) $$f(x,y)={f}_{\text{max}}\phi (p,r)$$(27c) This fit may be implemented using the marginal distributions form of the generalized bivariate quantile distributions for the zeropressure area designated as ${g}_{{A}_{0}}^{\star}({\xi}_{{A}_{0}})$ in Figure 3a and for the distortion coefficient designated as ${g}_{\lambda}^{\star}({\xi}_{\lambda})$ in Figure 3b respectively. We comment that due to the shape of the PDFs for these marginal distributions as observed from the histogram for A_{0} in Figure 3c and the histogram for λ in Figure 3d respectively which are constructed with a Freedman–Diaconis choice of bin size that it would generally be convenient and reasonably accurate to approximate these PDFs in terms of either univariate ELDs as summarized in Table 3 or alternatively in terms of univariate splines where the degree of interpolation can be adjusted to refine the desired accuracy level.
The approach of sampling points x and y from a marginal distribution approximation of a joint PDF f (x, y) is to first generate a point p using a rectangular random number generator distribution which can then be used to calculate x from the corresponding marginal distribution Q_{x} (p). Next by using this known value of x we solve for y from Q_{y} (p, r) using the previously determined value of p however a practical implementation scheme is needed to construct the marginal distribution for Q_{y} (p, r). In this paper for simplicity we opt to perform this implementation by constructing a set of functions for a set of contour curves for specified known values of x so that any arbitrary values can be estimated using for example bilinear or bicubic interpolation for simplicity.
A conventional classical marginal distribution approximation of the joint PDF may therefore be specified by the following three steps where a convenient functional form would in most practical cases be that of a spline fit such as for example a Bspline. The general algorithm would then implement the following general steps such that:
Step 1. Fit the marginal distribution Q_{p} (p) for the scaled variate p with any convenient practical functional form such that ${Q}_{p}(p)=M(p,\alpha )$ where $M(p,\alpha )$ is a marginal distribution function and $\alpha =[{\alpha}_{1},\dots ,{\alpha}_{m}]$ is a corresponding fitted parameter
Step 2. Construct a set X = {p_{1}, … , p_{m}} for some finite integer m ∈ ℤ where 0 = p_{1} < p_{2} < ⋯ < p_{m−1} < p_{m} = 1
Step 3. For each p_{i} ∈ X extract the corresponding contour curve C_{i} = φ (p_{i}, r) from the scaled joint PDF φ (p, r) and fit any convenient practical functional form for each curve C_{i} such that ${C}_{i}={N}_{i}(r,{\beta}_{i})$ where β_{i} is a corresponding fitted parameter
As a result the quantification and specification of the parameter α for the marginal distribution ${Q}_{x}(p;\alpha )$ and the set of parameters ${\beta}_{1}\mathrm{}\dots \mathrm{}{\beta}_{m}$ for the contour curves for the marginal distribution ${Q}_{y}({p}_{i},r;{\beta}_{i})$ would actually be sufficient to fully characterize the TS pressure balance's behaviour for test calibrations and metrology intercomparisons. The challenge with this traditional marginal distribution approximation of a bivariate distribution approach is that whilst we could easily explicitly directly generate values for the x random variable corresponding to A_{0} we would have to implicitly indirectly generate values for the y random variable corresponding to λ by choosing a fixed number of contour curves and then employing an interpolation scheme.
In order to clarify how to sample random variables ${\xi}_{{A}_{0}}$ and ξ_{λ} from the marginal distribution approximation of the joint PDF ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})$ suppose for illustration purposes that we wish to generate a random variable of the zeropressure area ${\xi}_{{A}_{0}}$ as x_{i} for p_{i} = 0.7 and a random variable for the distortion coefficient λ as y_{i} for r_{i} = 0.82. We can immediately calculate x as x_{i} = Q_{x} (0.7) but it will be more difficult to calculate y_{i} since in general we would have a set of contour curves for fixed values of p, say p ∈ [0.0, 0.25, 0.5, 0.75, 1.0]. Selecting the nearest p_{ℓ} that is smaller than p_{i} i.e. p_{ℓ} = 0.5 and the nearest p_{u} that is larger than p_{i} i.e. p_{u} = 0.75 allows us to calculate the corresponding y for p_{ℓ} as y_{ℓ} = Q_{y} (p_{ℓ}, r_{i}), and that for p_{u} as y_{u} = Q_{y} (p_{u}, r_{i}). Then these data points can be used to interpolate for the required y_{i} corresponding to r_{i} = 0.82 using linear interpolation so that y_{i} = y_{ℓ} + [(y_{u} − y_{ℓ}) (p_{i} − p_{ℓ})]/[p_{u} − p_{ℓ}].
This particular illustrative example for how to perform a two dimensional interpolation to calculate samples of random variables from the underlying joint PDF is mathematically equivalent to a bilinear interpolation, although in principle any particular interpolation scheme such as for example bicubic interpolation may also be used to obtain smoother interpolated fit values. The marginal distribution approach is illustrated by comparing the actual bivariate surface in Figure 4a with the surface approximation as shown in Figure 4b which demonstrates how a set of contour curves can be used to approximate the two dimensional joint PDF surface. In this figure there are a total of N_{x} = 60 curves for evenly spaced values of the random variable $x={\xi}_{{A}_{0}}$ from 1.9612 mm^{2} to 1.9617 mm^{2} and the $y={\xi}_{\mathrm{\lambda}}$ varies from 0.3145 ppm/MPa to 1.1581 ppm/MPa in evenly spaced values using a total of N_{y} = 75 points. Whilst the marginal distribution approach for constructing a bivariate PDF is theoretically possible it does not unfortunately offer a convenient and practical means of modelling and summarizing a bivariate PDF for reporting purposes such as in a pressure balance calibration certificate or intercomparison report.
An alternative to the marginal distribution approach for arbitrary specifications of p_{i} and r_{i} without any need for additional interpolations in order to immediately directly sample random variables from the joint PDF is to formulate the surface fitting of the joint PDF as a scattered data interpolation problem. This problem which is common in the field of statistics is formally specified as that given a set of discrete sampled data points $({x}_{j},{y}_{j})$ for j = 1, 2, … , N with ${x}_{j}\in {\mathbb{R}}^{s}$ and y_{j} ∈ ℝ where 0 < s ∈ ℤ is the dimension of the data, then find a continuous function ${P}_{f}$ such that ${P}_{f}({x}_{j})={y}_{j}\text{\hspace{1em}}\forall j\in [1,2,\dots ,N]$. A particular wellposed solution to the scattered data interpolation problem is to use the distance matrix formulation such that $${P}_{f}(x)=\sum _{k=1}^{N}{c}_{k}{\parallel x{x}_{k}\parallel}_{2},\text{\hspace{1em}}x\in {[0,1]}^{2}$$(28) where ∥ ⋅ ∥ _{2} is the ℓ^{2}norm defined as $\parallel x\parallel =\sqrt{{x}_{1}^{2}+{x}_{2}^{2}}$ for $x={[{x}_{1},{x}_{2}]}^{\text{T}}$ in the two dimensional case when s = 2. The coefficients c_{k} for k = 1, 2, … , N where N is the total number of known discrete points $\left\{{x}_{j},{y}_{j}\right\}$ is obtained by solving the linear system specified as $$\text{D}c=b$$(29a) $${D}_{ij}={\parallel {x}_{i}{x}_{j}\parallel}_{2}$$(29b) $$b={\left[{y}_{1},\dots ,{y}_{N}\right]}^{\text{T}}$$(29c) $$c={[{c}_{1},\dots ,{c}_{N}]}^{\text{T}}$$(29d)We comment that the scattered data interpolation solution using the distance matrix formulation can be refined for an arbitrarily large number of supplied known data points. A useful benefit of this approach is that it is not strictly necessary to first normalize and scale the random variables to fit $x={[{p}_{j},{r}_{j}]}^{\text{T}}$ and y_{j} = φ (p_{j}, r_{j}) for j = 1, … , N since the scattered data interpolation scheme can be applied directly with ${x}_{j}={[{({\xi}_{{A}_{0}})}_{j},{({\xi}_{\lambda})}_{j}]}^{\text{T}}$ and ${y}_{j}=f({({\xi}_{{A}_{0}})}_{j},{({\xi}_{\lambda})}_{j})$. Nevertheless although a normalization is not theoretically necessary in our simulations we opt to apply the fit to the normalized variables 0 ≤ p ≤ 1, 0 ≤ r ≤ 1 and 0 ≤ φ (p, r) ≤1 to avoid illconditioning issues when solving the matrix equation $\text{D}c=b$. Results for a scattered data interpolation scheme using the same underlying data is illustrated by comparing the actual bivariate surface as previously shown in Figure 4a with the scattered data surface approximation as shown in Figure 4c.
Whilst this construction of the conventional scattered data two dimensional joint PDF is better than that of a marginal distribution approach which is restricted by the fixed number of discrete contour curves chosen, the main issue is that the number of constants used to construct the surface is still very large since in this case there are N = 3534 constants c_{1}, … , c_{N} for the underlying discrete data points $\left\{{x}_{j}={[p,r]}^{\text{T}},{y}_{j}=\phi (p,r)\right\},\text{\hspace{1em}}\mathrm{j}=1,\dots ,\mathrm{N}$. As a result the main issue with a scattered data interpolation scheme apart from the large number of constants to approximate the surface in our particular physical pressure balance metrological system, is that although it is guaranteed to adequately model the underlying data that the associated system of equations is usually illconditioned.
Alternative meshless techniques that avoid illconditioning encompass the radial basis function (RBF) approach as discussed by Larsson and Fornberg [33] which may be considered a generalization of the scattered data interpolation scheme ${P}_{f}(x)={\sum}_{k=1}^{N}{c}_{k}{B}_{k}(x)$. In this approach the constants are calculated by solving the matrix equation $\text{A}c=y$ where A_{ij} = ϕ (ϵ, r) where ϕ = r^{2}log(r) in the case of a thin plate spline or where ϕ = exp(− (ϵr) ^{2}) in the case of a Gaussian RBF where $r=\parallel {x}_{j}{x}_{i}\parallel $ and $c={[{c}_{1},\dots ,{c}_{N}]}^{\text{T}}$ as previously discussed, however the same issue of a large number of constants is still present.
This problem again occurs when low order spline approximations of bivariate functions are considered where the bivariate function is traditionally approximated as a tensor product spline surface of the form $$f(x,y)\approx \sum _{k=1}^{K}{\sigma}_{k}{u}_{k}(x){v}_{k}(y)$$(30) where K is an integer corresponding to the order of the approximation, σ_{k} are constants, and u_{k} (x) and v_{k} (y) are univariate functions each with their own individual corresponding parameters particular to the characteristics of the respective basis functions as recently developed by Georgieva and Hofreither [34]. The main issue with such a traditional low rank approximation in our particular problem is that although it can very accurately model and generate a reconstructed surface for suitable choices of σ_{k}, u_{k} (x) , v_{k} (y) and K, is that in general a very large number of coefficients are generally still necessary in order to adequately reconstruct the basis functions u_{k} (x) and v_{k} (y). Whilst a large number of coefficients is not an issue in a computer implementation the reporting of a large number ranging from a few hundred to a few thousand coefficient values is nevertheless however not considered practical for a calibration paper certificate or written intercomparison report.
Due to these practical implementation issues we therefore consider a copula approach C_{θ} (u, v) as previously discussed in order to construct the bivariate distribution for our particular physical metrological measurand system since for many practical copula families only a small number of parameters are required in the term $\theta ={[{\theta}_{1},\dots ,{\theta}_{n}]}^{\text{T}}$. As a result due to generally small number of parameters to adequately characterize bivariate PDF distributions this renders the reporting of the physically characterized pressure balance joint PDF in calibration certificates and intercomparison reports a practical and feasible alternative for oil based pressure balances.
In this paper for brevity we restrict our numerical investigation to the normal, Studentt, Gumbel, Frank and Clayton bivariate copulas C_{θ} (u, v) as summarized in Table 4 where the respective copula parameter θ is obtained from the Kendall tau parameter τ_{K} as previously discussed. When constructing the copula we use x as the random variable for the zeropressure area A_{0} and y as the random variable for the distortion coefficient λ in order to construct the joint PDF f (x, y) such that $$f(x,y)=uv\frac{{\partial}^{2}C(u,v)}{\partial u\partial v}$$(31) $$u={\int}_{\infty}^{x}{g}_{x}({\xi}_{x})\text{\hspace{0.17em}d}{\xi}_{x}$$(32) $$v={\int}_{\infty}^{y}{g}_{y}({\xi}_{y})\text{\hspace{0.17em}d}{\xi}_{y}$$(33) In the above formulae g_{x} (ξ_{x}) is the marginal PDF for x as previously approximated with an ELD as illustrated in Figure 3a, whilst g_{y} (ξ_{y}) is the marginal PDF for y as illustrated in Figure 3b both of which are fully parametrized in terms of respective univariate quantile parameters a, b, c, d. By exploiting the properties of ELDs as previously discussed by Willink [16] the distribution function formally defined as $F(x)={\int}_{\infty}^{x}f(u)\text{\hspace{0.17em}d}u$ for the random variable x with a similar expression for the random variable y may in the special case of an ELD approximation be simplified as $$Q(\rho )=\{\begin{array}{l}d+\frac{c}{b}\left[a{\rho}^{b}{(1\rho )}^{b}+1a\right]\text{\hspace{1em}}if\text{\hspace{0.17em}}b\ne 0\hfill \\ d+c\left\{a\mathrm{ln}\rho \mathrm{ln}(1\rho )\right\}\text{\hspace{1em}}if\text{\hspace{0.17em}}b=0\hfill \end{array}$$(34)As a result if the marginal distribution for the zeropresssure area A_{0} random variable x has ELD parameters a_{x}, b_{x}, c_{x}, d_{x}, and the marginal distribution for the distortion coefficient λ random variable y has ELD parameters a_{y}, b_{y}, c_{y}, d_{y} it follows that we can immediately utilize the previous analytical expressions in the copula construction for the bivariate joint PDF in terms of the ELD parameters. In order to construct a copula function C_{θ} (u, v) for the formulae listed in for example Table 4 we must in the case of the Gaussian copula use Φ_{ρ} (•) which denotes the bivariate standard normal distribution with correlation ρ whilst Φ^{−1} (•) denotes the inverse standard normal distribution. On the other hand for the bivariate Studentt copula t_{ρ,ν} denotes the bivariate tdistribution with parameters ρ and ν where the degrees of freedom ν must typically be calculated using the Akaike Information Criterion (AIC) where the smallest AIC value (which depends on ν) is considered to be the bestfit copula choice, and ${t}_{\nu}^{1}(\u2022)$ is the corresponding inverse tdistribution value for the associated degrees of freedom ν parameter for the copula. Although the conventional tdistribution copula usually has one degree of freedom parameter ν more modern alternatives can incorporate multiple degrees of freedom parameters ν_{1}, … , ν_{n} for bivariate models as discussed in Luo and Shevchenko [35].
As a result whilst it is certainly possible to estimate bivariate copula parameters $\theta ={[{\theta}_{1},\dots ,{\theta}_{n}]}^{\text{T}}$ directly with custom written computer code through various parameter optimization approaches for a range of copula families of which Table 4 is only a small selection of the very extensive range of possible copula families this is no longer strictly essential. This is due to the fact that Yan [36] developed a R based package copula which is now readily available to researchers worldwide, and which has subsequently been expanded by Kojadinovic and Yan [37] for multivariate distribution modelling using copulas. As a result the use of R based open source statistical software from the Comprehensive R Archive Network is now commonly available to researchers worldwide and is accepted as a standard statistical tool within the statistics community. In this paper we will therefore utilize the copula [38], copBasic [39], and VineCopula [40]. R packages to simplify the analysis in order to construct the best choice of a copula function for our underlying pressure balance Monte Carlo bivariate data.
The use of the abovementioned R packages allows us to simply load the bivariate data from our postprocessed GS2 Monte Carlo simulations and to then implement the following computer code using the VineCopula library which automatically selects the most appropriate bivariate copula function drawing on a selection of copula families more extensive that the most common families previously outlined in Table 4. When performing the copula selection to avoid numerical illconditioning due to scaling effects since $\mathcal{O}({\xi}_{{A}_{0}})={10}^{6}$ and $\mathcal{O}({\xi}_{\lambda})={10}^{12}$ in SI units it may depending on the available computer systems be advantageous when fitting the copula function to utilize the variate data ${\xi}_{{A}_{0}}$ for A_{0} and ξ_{λ} for λ such that they are first converted so that x is in mm and y is in ppm/MPa. We comment that the choice of units is not problematic since our main objective is to model the bivariate joint PDF and the form of the fit should simply allow us to adequately sample values for the A_{0} and λ variates x and y. Further technical implementation details for the VineCopula library are available in the official documentation however we briefly comment that the user would in most practical situations select from ml for maximum likelihood, mpl for maximum pseudolikelihood, itau for inversion of Kendall's tau, or irho for inversion of Spearman's rho as choices for the parameter optimization choice for the software code to determine the optimal copula parameter fit.
The process to fit the bivariate data starts with the GS2 Monte Carlo data GUMSupplement2.txt that is obtained with any suitable computer code simulation environment that is saved in a neutral ASCII txt file format. Once the data is saved it can then be processed to firstly extract the univariate ELD quantile parameters in for example GNU Octave for the marginal distributions, and then secondly processed in RStudio using the VineCopula library to determine the optimal bivariate copula and associated copula parameters as shown in Figure 5.
Results for the univariate marginal distribution parameter fits with GNU Octave v4.2.0 by running computer program CalcMarginalParameters.m with the mcode shown in Figure 6 are shown below in Listing 1.
Results for the bivariate copula parameter fits with RStudio by running computer program CalcCopulaParameters.r are shown below in Listing 2.
The above computer code console outputs are actually sufficient to fully characterize the ELD marginal distribution and copula bivariate distribution parametrizations. In order to construct the joint PDF the second order mixed partial derivative of the copula C (u, v ; θ) must be calculated in order to calculate the copula density c (u, v ; θ). Further technical implementation details to directly numerically construct the joint PDF from a general copula that is not amenable to closed form analytical expressions of the copula density are outlined in Appendix B.
For the particular case of a Gaussian copula Meyer [41] provides explicit analytical closed form expressions such that for the univariate case the cumulative distribution function Φ^{−1} (•) is defined as $$\phi (x)=\frac{1}{\sqrt{2\pi}}\mathrm{exp}\left(\frac{{x}^{2}}{2}\right)$$(35) $$\Phi (h)={\int}_{\infty}^{h}\phi (x)\text{\hspace{0.17em}d}x$$(36) The univariate standard normal CDF Φ (h) is then used to construct the bivariate standard normal CDF Φ_{2} (a, b) such that $$C(u,v;\rho )=\frac{1}{2\pi \sqrt{1{\rho}^{2}}}\times {\int}_{\infty}^{{\Phi}^{1}(u)}{\int}_{\infty}^{{\varphi}^{1}(v)}\mathrm{exp}\left(\frac{({s}^{2}2\rho st+{t}^{2})}{2(1{\rho}^{2})}\right)\text{\hspace{0.17em}d}s\text{\hspace{0.17em}d}t$$(37) $$c(u,v;\rho )=\frac{1}{\sqrt{1{\rho}^{2}}}\times \mathrm{exp}\left(\frac{2\rho {\Phi}^{1}(u){\Phi}^{1}(v){\rho}^{2}\left({\Phi}^{1}{(u)}^{2}+{\Phi}^{1}{(v)}^{2}\right)}{2(1{\rho}^{2})}\right)$$(38)When the above expressions are combined we then obtain the final expression for a Gaussian copula formulation of the bivariate joint PDF as $$f(x,y)=F(x)G(y)c(u,v;\rho )$$(39)In our particular case since the optimal copula is a Gaussian we simply specify the copula as $$C(u,v)={\Phi}_{2}({\Phi}^{1}(u),{\Phi}^{1}(v);\rho )$$(40a) $$\rho =0.88312747$$(40b)The above copula function informations is then completed by also specifying the associated quantile function parameters for F (x) = u and G (y) = v which using our ELD approximation is of the form $$u=\{\begin{array}{l}{d}_{x}+\frac{{c}_{x}}{{b}_{x}}\left[{a}_{x}{\rho}^{{b}_{x}}{(1\rho )}^{{b}_{x}}+1{a}_{x}\right],\text{\hspace{1em}}{b}_{x}\ne 0\hfill \\ {d}_{x}+{c}_{x}\left\{{a}_{x}\mathrm{ln}\rho \mathrm{ln}(1\rho )\right\},\text{\hspace{1em}}{b}_{x}=0\hfill \end{array}$$(41a) $$v=\{\begin{array}{l}{d}_{y}+\frac{{c}_{y}}{{b}_{y}}\left[{a}_{y}{\rho}^{{b}_{y}}{(1\rho )}^{{b}_{y}}+1{a}_{y}\right],\text{\hspace{1em}}{b}_{y}\ne 0\hfill \\ {d}_{y}+{c}_{y}\left\{{a}_{y}\mathrm{ln}\rho \mathrm{ln}(1\rho )\right\},\text{\hspace{1em}}{b}_{y}=0\hfill \end{array}$$(41b)where $${a}_{x}=1.1623996764$$(42a) $${b}_{x}=1.6461122078\times {10}^{1}$$(42b) $${c}_{x}=4.5270706830\times {10}^{5}$$(42c) $${d}_{x}=1.9614580454$$(42d)while the quantile function parameters for G (y) are $${a}_{y}=0.9339041282$$(43a) $${b}_{y}=0.1757535247$$(43b) $${c}_{y}=0.0872674857$$(43c) $${d}_{y}=0.7461688986$$(43d)The above set of four equations for the copula function C (u, v ; θ), marginal distributions F (x) and G (y), and parameters associated with the respective marginal distributions {a_{x}, b_{x}, c_{x}, d_{x}} and {a_{y}, b_{y}, c_{y}, d_{y}} are then actually sufficient to model the bivariate joint PDF for the TS pressure balance, by noting the ELD parameter limits are by definition $$\begin{array}{cc}\hfill d\frac{ac}{b}\le \eta \le d+\frac{c}{b}\hfill & \hfill if\text{\hspace{0.17em}}b>0\hfill \end{array}$$(44a) $$\begin{array}{cc}\hfill \infty <\eta >\infty \hfill & \hfill if\text{\hspace{0.17em}}b\le 0\text{\hspace{0.17em}and\hspace{0.17em}}a\ne 0\hfill \end{array}$$(44b) $$\begin{array}{ll}0\le \eta <\infty \hfill & if\text{\hspace{0.17em}}b\le 0\text{\hspace{0.17em}and\hspace{0.17em}}a=0\hfill \end{array}$$(44c)Once the joint PDF has been summarized using f (x, y) = F (x) G (y) c (u, v) with u = F (x) and v = G (y) as indicated above, it can then be used in any further uncertainty analysis numerical simulations for the TS pressure balance. As an example we would in a practical physical pressure balance calibration like to sample random variables ${\xi}_{{A}_{0}}$ and ξ_{λ} from ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})$ in order to calculate the uncertainty in generated pressure. The general numerical procedure we propose in this paper adapted from the univariate sampling procedure for arbitrary distributions is as follows for bivariate distributions:
Step 1. Sample a random variable ρ_{x} ∼ R from the rectangular distribution so that 0 ≤ ρ_{x} ≤ 1
Step 2. Solve the equation F (x^{⋆}) = ρ_{x} in order to deduce a value for x^{⋆} and then simply set ${\xi}_{{A}_{0}}={x}^{\star}$
Step 3. Construct the contour curve $\mathcal{C}(y)={f}_{X}({x}^{\star},y)$ by fixing x = x^{⋆} in the joint PDF f (x, y) and normalize as appropriate
Step 4. Build the CDF such that $H(y)={\int}_{\infty}^{y}\mathcal{C}(\eta )\text{\hspace{0.17em}d}\eta $ and normalize as appropriate
Step 5. Sample a random variable ρ_{y} ∼ R from the rectangular distribution so that 0 ≤ ρ_{y} ≤ 1
Step 6. Solve the equation H (y^{⋆}) = ρ_{y} in order to deduce a value for y^{⋆} and then simply set ξ_{λ} = y^{⋆}
If the sampled variates are x^{⋆} and y^{⋆}, and ρ_{x} and ρ_{y} are sampled rectangular random numbers then the above procedure can be summarized as solving the following system of equations such that $${x}^{\star}\leftarrow F({x}^{\star})={\rho}_{x}$$(45a) $${y}^{\star}\leftarrow \frac{{\int}_{\infty}^{{y}^{\star}}F({x}^{\star})G(y)c({\rho}_{x},G(y);\rho )\text{\hspace{0.17em}d}y}{{\int}_{\infty}^{\text{max}(y)}F({x}^{\star})G(y)c({\rho}_{x},G(y);\rho )\text{\hspace{0.17em}d}y}={\rho}_{y}$$(45b) We comment that the above bivariate sampling scheme is more general than a traditional bivariate Gaussian sampling scheme since it is not restricted to a conventional Gaussian bivariate PDF, and can sample from arbitrary bivariate joint PDFs since the approach will work for arbitrary marginal distributions and arbitrary bivariate copulas. As a result the benefit of using a copula to model a bivariate distribution over a traditional GS2 based Gaussian joint PDF is that one is no longer specifically restricted to bivariate Gaussian distributions as is presently the case according to the official GS2 guidelines. With the application of copulas as a particular modelling approach for bivariate quantile distributions one may therefore now in principle utilize any other alternative physical or mathematically plausible choice of probability distributions modelled in terms of copulas to summarize bivariate Monte Carlo uncertainty analysis simulation data as outlined in Appendix B.
We can now test the extent to which our bivariate joint PDF for the TS pressure balance can estimate the generated pressures. This is performed by implementing a GS1 simulation of a generating pressure model for the TS pressure balance as $$f(p)=\frac{1}{S}\left[\sum _{j=1}^{n}{m}_{j}g\left(1\frac{{\rho}_{a}}{{\rho}_{j}}\right)+g(HS{V}_{s})({\rho}_{f}{\rho}_{a})+\sigma C\right](p{p}_{a})$$(46a) $$S={A}_{0}(1+\lambda P)\varphi (t,{t}_{ref})$$(46b) $$P=(p{p}_{a})$$(46c) where ${[{\xi}_{{A}_{0},{\xi}_{\mathrm{\lambda}}}]}^{\text{T}}\sim {g}_{{A}_{0},\mathrm{\lambda}}({\xi}_{{A}_{0}},{\xi}_{\mathrm{\lambda}})$ is sampled using our bivariate joint PDF modelling approach using the temperature compensation function ϕ (t, t_{ref}) =1 + α (t − t_{ref}).
When this Monte Carlo simulation for the TS pressure balance is performed using the data in Appendix A and the Monte Carlo generated pressure data is obtained, then this generated pressure data for each of the crossfloated pressure points 50 MPa, … , 500 MPa may be compared to the actual generated pressures from the LS pressure balance. Numerical results were performed using a direct simulation single language implementation in GNU Octave with the above generalized sampling scheme for x^{⋆} and y^{⋆} as shown in Table 5. These results were obtained on a computer workstation using an Intel Xeon E51650 v3 CPU running at 3.50 GHz with 32 GB of RAM for M = 500 Monte Carlo simulation events by directly sampling from the bivariate joint PDF by solving the respective equations for x^{⋆} and y^{⋆}.
Alternatively in the special case where the analytical expression for the copula density c (u, v) is explicitly known beforehand then a mixed language approach using R and Octave can offer considerable computational time savings. This approach involves first using R to sample from the Gaussian copula, or any other suitable copula family available from the RStudio available copula library that adequately approximates the bivariate data Kendall tau value, with the specified parameter value θ and then saving the marginal distribution values u and v to a text file as indicated below in Listing 3.
The R computer code shown in Listing 3 writes the u and v values that are consistent with the specified copula density function c (u, v) to a text file CopulaDataUV.txt which is an array of dimension 500 × 2 where the first column is the u data and the second column is the v data, whilst the corresponding copula density c (u, v) is saved to the text file CopulaDataC.txt. Afterwards the corresponding variate data x and y can then be simply recovered by using one dimensional interpolations from the previously generated ELD based univariate marginal distributions u = F (x) and v = G (y). This mixed language process for n = 10 000 samples using RStudio and GNU Octave takes approximately 85.36 s to generate, postprocess and then recover the variate ${\xi}_{{A}_{0}}$ and ξ_{λ} sampled data on the previously mentioned Toshiba laptop. As a result the numerical simulations to test the bivariate statistical sampling approach demonstrates that a copula based bivariate quantile joint distribution sampling scheme is computationally feasible for both single computer and well as mixed computer language implementations when used as inputs in subsequent Monte Carlo uncertainty simulations.
In order to verify and validate (V&V) our proposed bivariate joint PDF modelling approach in terms of the standard quality engineering V&V methodology we must compare and contrast these numerical results which utilize the bivariate statistical sampling of the pressure balance',s zeropressure area and distortion coefficient values with the exact Monte Carlo numerical results previously summarized in Table 1. We can implement this comparison by determining the normalized errors E_{n} defined as $${E}_{n}=\frac{{x}_{lab}{x}_{ref}}{\sqrt{{U}_{lab}^{2}+{U}_{ref}^{2}}}$$(47) for all our generated/crossfloated data points between the LS which generates the known applied pressures and the TS which is crossfloated against the LS. According to this approach x_{ref} would correspond to the LS generated applied pressure ${P}_{k}^{(LS)},\text{\hspace{0.17em}}k=1,\dots ,10$ and U_{lab} would correspond to the expanded uncertainty of the generated pressure ${U}_{k}^{(LS)}$, whilst x_{lab} would correspond to the TS equivalent generated pressure ${P}_{k}^{(TS)}$ using the copula based crossfloated area parameters where ${U}_{k}^{(TS)}$ are the corresponding TS expanded uncertainties for the respective applied pressure.
In order to calculate the normalized errors knowledge of the respective expanded uncertainties ${U}_{k}^{(LS)}$ and ${U}_{k}^{(TS)}$ are necessary noting that we have used ELDs to summarize the respective Monte Carlo data for both the LS and TS. We recall that the expected value μ and standard uncertainty σ for an ELD are specified as $$\mu =d+\frac{c(1a)}{b+1}$$(48a) $${\sigma}^{2}=\{\begin{array}{l}{\left(\frac{c}{b}\right)}^{2}\left[\frac{{a}^{2}+1}{2b+1}\frac{2a\Gamma {(b+1)}^{2}}{\Gamma (2b+2)}{\left(\frac{a1}{b+1}\right)}^{2}\right]\text{\hspace{1em}}if\text{\hspace{0.17em}}b\ne 0\hfill \\ {c}^{2}[{a}^{2}+\frac{{\pi}^{2}a}{3}2a+1]\text{\hspace{1em}}if\text{\hspace{0.17em}}b=1\hfill \end{array}$$(48b) As a result knowledge of the ELD parameters immediately gives us the values of the expected values μ_{k} for the LS and TS but we do not have direct knowledge of the respective expanded uncertainties which must be calculated from the ELD based distribution function. In general for a specified confidence level p, say p = 0.95 corresponding to a 95% confidence level, for a possibly nonsymmetric and/or skew PDF g (η) a value of α is first calculated by minimizing [G^{−1} (p + α) − G^{−1} (α)], however if symmetry with an absence of skewness is assumed as a simplification then $\alpha =\frac{(1p)}{2}$.
Under these circumstances for a measurand y with a distribution function G (η) the confidence interval for the specified confidence level is just Y_{min} = G^{−1} (α) and Y_{max} = G^{−1} (p + α). Consequently we may then just approximate the expanded uncertainty as $$U\approx \frac{1}{2}\left({Y}_{\text{max}}{Y}_{\text{min}}\right)$$(49) where $${Y}_{\text{min}}=Q(\alpha )$$(50a) $${Y}_{\text{max}}=Q(p+\alpha )$$(50b) $$Q(\rho )=\{\begin{array}{ll}d+\frac{c}{b}\left[a{\rho}^{b}{(1\rho )}^{b}+1a\right]\hfill & if\text{\hspace{0.17em}}b\ne 0\hfill \\ a+c\left[a\mathrm{ln}(\rho )\mathrm{ln}(1\rho )\right]\hfill & if\text{\hspace{0.17em}}b=0\hfill \end{array}$$(50c)When the above V&V formulation is implemented it results in the data summarized in Figure 7 using M = 500 Monte Carlo simulation events by sampling from the bivariate joint PDF. Referring to the normalized errors E_{n} it is observed that −1 ≤ E_{n} ≤ 1 for all the applied pressures 50 MPa, … , 500 MPa.
As a result of these numerical simulations we therefore conclude that the proposed method of using bivariate quantiles with ELDs for the marginals with parameter based optimized copulas is mathematically and statistically consistent for pressure balance calibrations and intercomparisons at a primary scientific metrology standards level.
Summary of ELD parameters for generated applied pressures P_{k}/[Pa] for k ∈ [1, … , 10]. The index k indicates the pressures P_{k} from P_{1} = 50 MPa through to P_{10} = 500 MPa so that the extended lambda distribution parameter a_{k}, b_{k}, c_{k} and d_{k} can be used to summarize the probability density function distribution for P_{k}.
Summary of ELD parameters for generated areas S_{k}/[m^{2}] for k ∈ [1, … , 10]. The index k indicates the area S_{k} corresponding to the crossfloated pressures from P_{1} = 50 MPa through to P_{10} = 500 MPa so that the extended lambda distribution parameter a_{k}, b_{k}, c_{k} and d_{k} can be used to summarize the probability density function distribution for S_{k}.
Fig. 1 Illustration of how the Monte Carlo data from a GUM Supplement uncertainty analysis is utilized in order to determine the joint probability density function for the transfer standard pressure balance. In (a) the univariate GUM Supplement 1 (GS1) Monte Carlo data of the crossfloated pressures P_{k} and crossfloated areas S_{k} is first preprocessed by averaging and plotted in order to get a rough qualitative estimate of the model parameters. In (b) the actual GS1 data points are then processed with the GUM Supplement 2 (GS2) using a suitable χ^{2} optimization to extract the actual model parameters and the discrete model parameter values plotted. In (c) the discrete model parameters are then postprocessed using a kernal density estimation scheme in order visualize the mathematically continuously variable joint bivariate probability density function behaviour of the underlying model. 
Fig. 2 Illustration of Monte Carlo GUM Supplement 2 processed data showing a visualization of the transfer standard pressure balance bivariate joint PDF where the random variables are plotted in physical units such that 10^{6}A_{0}/[m^{2}] = A_{0}/[mm^{2}] and 10^{12}λ/[Pa^{−1}] = λ/[ppm/MPa] for easy visualization so that the joint PDF has the correct magnitude for a two dimensional normalization such that ${\int}_{\infty}^{\infty}{\int}_{\infty}^{\infty}{g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})\text{\hspace{0.17em}d}{\xi}_{{A}_{0}}\text{d}{\xi}_{\lambda}=1$ when constructed from the two dimensional histogram data using ${N}_{{A}_{0}}=\mathrm{}{N}_{x}=63$ and N_{λ} = N_{y} = 58 bins respectively with M = 10 000 Monte Carlo simulation events. In (a) the actual discrete data of the bivariate joint probability density function is first plotted for a qualitative visualization. In (b) the discrete data is postprocessed to produce the mathematically continuous joint probability density function in a three dimensional space where each space coordinate corresponds to a random variable. In (c) the joint probability density function is normalized in order to more clearly view the behaviour of the underlying model. 
Fig. 3 Monte Carlo GUM Supplement 2 marginal distributions. In (a) the probability density function distribution for the continuous random variable ${\xi}_{{A}_{0}}$ of the zeropressure area is visualized. In (b) the probability density function distribution for the continuous random variable ξ_{λ} of the distortion coefficient is visualized. In (c) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the zeropressure area is shown. In (d) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the distortion coefficient is shown 
Marginal distribution ELD parameters for zeropressure area A_{0} and distortion coefficient λ variates. The underling probability density function distribution for the zeropressure area random variable ${\xi}_{{A}_{0}}$ and the distortion coefficient random variable ξ_{λ} is modelled and approximated with an extended lambda distribution based on the discrete data obtained from the GUM Supplement 2 uncertainty analysis.
Fig. 4 Illustration of comparison between the actual bivariate joint PDF surface with that of a conventional classical marginal distribution approach approximation showing the discrete set of contour curves visualized by three dimensional lines that are used to construct joint PDF surface, and that of a scattered data approximation of the corresponding normalized bivariate surface. In (a) the actual continuous probability density function distribution is shown which must be modelled and approximated through a visualization in a three dimensional space where the x coordinate corresponds to the random variable ${\xi}_{{A}_{0}}$, the y coordinate corresponds to the random variable ξ_{λ} and the z coordinate corresponds to the joint PDF ${g}_{{A}_{0}\mathrm{}\lambda}\mathrm{}({\xi}_{{A}_{0}}\mathrm{},{\xi}_{\lambda}\mathrm{})$. In (b) the modelling and approximation is shown where the three dimensional surface is approximated by a series of three dimensional lines within the three dimensional space. In (c) the modelling and approximation is visualized for the equivalent surface that is built up from discrete three dimensional scattered points. 
Specification of common parameter based bivariate copula families (Adapted from Goda [17]). In our particular problem the variable u corresponds to the random variable ${\xi}_{{A}_{0}}$ and the variable v corresponds to the random variable ξ_{λ} so that the bivariate copula C (u, v) can be used to construct the joint probability density function as ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})=\frac{{\partial}^{2}C}{\partial u\partial v}$ for convenience.
Fig. 5 Illustration of how a GUM Supplement 2 (GS2) Monte Carlo uncertainty analysis would be implemented in practice for constructing the marginal distribution and copula distribution parameters. In (a) an extract of the actual M = 10000 data points is shown from the GS2 which were determined with a suitable χ^{2} optimization. In (b) the computer code implementation of how to process the GS2 data to fit the marginal distributions in terms of an extended lambda distribution is shown. In (c) the computer code implementation of how to determine the copula parameters using open source easily accessible standard statistical software is shown. 
Fig. 6 Illustration of mcode for estimating an ELD parametrization for univariate Monte Carlo data. The computer program works by loading the Monte Carlo univariate data as an input ydata = [y_{1}, … , y_{M}] ^{T} where M = 10 000 is the number of Monte Carlo simulation events and it then uses this data to determine the extended lambda distribution parameters a, b, c and d so that the probability density function can be analytically specified. 
Listing 1 Computer output used to calculate the univariate ELD parameters for A0 and . 
Listing 2 Computer output used to calculate the copula parameter with open source standard statistical software. 
Extended lambda distribution (ELD) parameters for transfer standard pressure balance generated applied pressures P_{k}/[Pa] using bivariate copula joint PDF. The index k corresponds to the pressures P_{k} for 50 MPa through to 500 MPa and the parameters a_{k}, b_{k}, c_{k} and d_{k} can then be used to construct analytical expressions for the probability density function distribution for each of the P_{k} pressures.
Listing 3 Computer output used to calculate sampled values from the copula with open source statistical software. 
Fig. 7 Verification and validation of proposed method using normalized errors of applied pressures demonstrating that ∥E_{n} ∥ ≤1 ∀ n ∈ [1, … , 10] laboratory standard pressure balance generated pressures and transfer standard pressure balance crossfloated area measurements. The pressures P_{1} through to P_{10} correspond to pressures from 50 MPa through to 500 MPa at 50 MPa steps and the E_{n} values for each of these pressures indicates the corresponding normalized error. When calculating the E_{n} value for each pressure the known reference pressure P_{ref} is determined from the laboratory standard pressure balance whilst the calibrated pressure P_{cal} is determined from the transfer standard pressure balance using the bivariate copula that models and summarizes the information of the calibrated pressure balance's zeropressure area A_{0} and distortion coefficient λ. Due to the fact that all of the E_{n} values are smaller than unity this then means that the results and uncertainties that were calculated for the calibrated pressure balance when tested against the reference pressure balance are statistically consistent and hence provides proof that the proposed method of using copulas to model and summarize bivariate probability density functions has been verified and validated. 
4 Discussion
In this paper we have investigated the feasibility and utility of extending and applying quantile functions to systems which naturally exhibit bivariate PDF's and considered the particular case of oil pressure balances where the area takes the form A = A_{0} (1 + λP) and knowledge of the bivariate joint PDF ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})$ in terms of the zeropressure area ${\xi}_{{A}_{0}}$ and distortion coefficient ξ_{λ} random variables is necessary for elevated pressures. Our research approach involved the implementation of a GS2 multivariate uncertainty analysis of a TS pressure balance where we mathematically formulated an approach to postprocess the original Monte Carlo data in order to recover the underlying ${\xi}_{{A}_{0}}$ and ξ_{λ} bivariate random variable statistical data for further analysis.
Numerical simulations were then considered and performed for a variety of mathematical modelling approaches in order to study how to adequately model, summarize and reconstruct the underlying bivariate statistical uncertainty analysis data. Based on these numerical simulations we sought to investigate the extent to which a combination of univariate ELD quantile functions for the marginal distributions u = F (x) and v = G (y) of $x={\xi}_{{A}_{0}}$ and y = ξ_{λ} respectively when coupled with a suitable bivariate copula family C_{θ} (u, v) selection with an optimized copula parameter θ would be sufficient for pressure calibrations and intercomparisons.
Results of generated pressures for a variety of conditions were then considered and analysed in order to perform benchmark studies using conventional metrology statistical tests. When these results were analysed to assess the level of necessary verification and validation measures for the method proposed in this paper it was concluded that a bivariate quantile distribution of a pressure balance PDF of the form ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})=uvc(u,v)$ is indeed sufficient to accurately model and summarize a pressure balance's metrological characteristics at a primary standards scientific metrology level. As a result we conclude that the measurement modelling technique proposed compares very favourably to an exact GS2 UQ analysis and may thus offer benefits to pressure metrologists at national metrology institutes involved in high accuracy client calibration work and in participation in intercomparisons at a primary scientific standards metrology level.
A potential future topic of metrology research in the area of uncertainty analysis for pressure balances is therefore the possible application of vine copulas for the modelling of multivariate uncertainty analysis data for summarizing GS2 Monte Carlo for higher dimensions. Alternatives to vine copulas are empirical copulas which are copulas note defined in terms of a parameter $\theta ={[{\theta}_{1},\dots ,{\theta}_{n}]}^{\text{T}}$ but which are defined directly in terms of the underlying data. In the case of two dimensional variates [x_{i}, y_{j}] ^{T} the bivariate empirical copula when there are n variates is defined as $${C}_{n}\left(\frac{i}{n},\frac{j}{n}\right)\stackrel{\mathrm{d}\mathrm{e}\mathrm{f}}{=}\frac{\mathrm{\#}\text{pairs\hspace{0.17em}}(x,y)\text{\hspace{0.17em}s.t}.\text{}x\le {x}_{i}\text{\hspace{0.17em}and\hspace{0.17em}}y\le {y}_{j}}{n}$$(51) The above working definition is usually expressed mathematically as $${C}_{n}\left(\frac{i}{n},\frac{j}{n}\right)=\frac{1}{n}\sum _{i=1}^{n}1\left(\frac{{R}_{i}}{n}\le {u}_{i},\frac{{S}_{i}}{n}\le {v}_{i}\right)$$(52a) $${R}_{i}=\text{rank of point\hspace{0.17em}}{x}_{i}$$(52b) $${S}_{i}\mathrm{}=\text{rank of point\hspace{0.17em}}{y}_{i}$$(52c) $$1\left(\frac{{R}_{i}}{n}\le {u}_{i},\frac{{S}_{i}}{n}\le {v}_{i}\right)=\{\begin{array}{l}1\text{}\mathrm{i}\mathrm{f}\text{\hspace{0.17em}}\mathrm{c}\mathrm{o}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\text{\hspace{0.17em}}\mathrm{t}\mathrm{r}\mathrm{u}\mathrm{e}\hfill \\ 0\text{\hspace{1em}}\mathrm{o}\mathrm{t}\mathrm{h}\mathrm{e}\mathrm{r}\mathrm{w}\mathrm{i}\mathrm{s}\mathrm{e}\hfill \end{array}$$(52d)
Different types of implementations for an empirical copula are possible as discussed in the copBasic package [39] and include the Hazen, Weibull and Berstein empirical copulas. More recent recent research in the area of empirical copulas by Segers et al [42] using Berstein polynomials concluded that beta empirical copulas can fully meet the formal mathematical specifications for a copula and that these offer very good performance in terms of estimating bias and variance.
As a result these types of copulas are potentially attractive choices for metrologists for future UQ research in terms of modelling and summarizing higher dimensional PDFs where multivariate Gaussian PDFs as per the GS2 are problematic due to higher dimensional probability distribution asymmetries and skewness characteristics.
5 Implications and influences
The main implication of this paper is that we have demonstrated that quantile functions may be used to accurately and completely model the bivariate joint probability density distribution function for a pressure balance's effective area in terms of its zeropressure area A_{0} and distortion coefficient λ. As a result the specification of nine numerical parameters, four for each of the marginal distributions plus one for the bivariate copula parameter, now enables pressure metrologists to simply summarize pressure balance bivariate PDF information in calibration certificates and intercomparison reports, and to have an increased level of confidence in the behaviour and uncertainty of their oil pressure balance laboratory primary standards at elevated pressures which was previously limited due to the complexity of incorporating the uncertainty in the distortion coefficient.
Based on the results reported in this paper the wider influences which are now possible is an increased awareness of the utility of multivariate higher dimensional uncertainty analysis with the GS2, and how copulas may now be used to simply and conveniently summarize higher dimensional uncertainty analysis results in high accuracy calibrations and scientific metrology intercomparisons in other metrology areas and fields of work.
Acknowledgments
This work was performed with funds provided by the Department of Higher Education and Training (DHET) on behalf of the South African government for research by public universities.
Appendices
Appendix A Experimental pressure balance data
We use the representative experimental data set previously reported by Ramnath [19] as indicated in Table A.1 for the laboratory standard (LS) and TS respective mass and density values, Table A.2 for the temperature and ambient conditions, Table A.3 for the utilized mass pieces of the LS, and Table A.4 for the utilized mass pieces of the TS during the crossfloating experiments where points 1 through to 10 correspond to nominal applied pressures such that P_{i}/[MPa] ≈50i for i ∈ [1,10].
Pressure balance laboratory standard and transfer standard mass and density data.
Pressure balance laboratory standard and transfer standard temperatures and ambient data.
Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for pressure generation).
Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for cross floating).
In this data set the working fluid is assumed to be Di(2)ethylhexylsebacate i.e. DHS oil for which the fluid density is modelled as $${\rho}_{f}=[912.8+0.752(p\times {10}^{6})(1.65\times {10}^{3}){(p\times {10}^{6})}^{2}+(1.5\times {10}^{6}){(p\times {10}^{6})}^{3}][1(7.8\times {10}^{4})(t20)]$$ as discussed by Kocas et al. [31].The parameters and inputs for the pressure balance model that generates the known pressures are A_{0} = 1.96151 × 10^{−6} m^{2}, u (A_{0}) = ±9.89581 × 10^{−11} m^{2}, λ = 7.25 × 10^{−13} Pa^{−1}, u (λ) = ±4.5 × 10^{−14} Pa^{−1}, α = 9.10 × 10^{−6} K^{−1}, u (α) = ±0.45 × 10^{−6} K^{−1}, V_{s} = 0 m^{3}, u (V_{s}) = ± 10^{−10} m^{3}, σ = 31.2 × 10^{−3} N m, u (σ) = ±0.001 × 10^{−3} N m^{−1}, H = 173.1 × 10^{−3} m, u (H) = ±0.2 × 10^{−3} m, g = 9.7860994 m s^{−2}, and u (g) = 10^{−7} m s^{−2} respectively where the gravitational acceleration is common to both the pressure balance that generates the pressure as well as the pressure balance which undergoes the crossfloating. For the particular data we assume that the ambient conditions are measured such that u (t_{a}) = ±0.5 ^{∘}C, u (p_{a}) = ±15 Pa and u (h_{a}) = ±0.05 respectively, and that the temperature of the respective pressure balances can be measured such that t = ±0.015 ° C.
The associated parameters and inputs for the pressure balance model that undergoes the crossfloating are V_{s} = 0 m^{3}, u (V_{s}) = ± 10^{−10} m^{3}, H = 0 m, u (H) = ±0.2 × 10^{−3} m and α = 1.45 × 10^{−5} K^{−1} respectively where the surface tension for the oil is the same as for the previously specified pressure balance since the working fluid is common to both the LS and TS pressure balances.
Appendix B Parameter fits of specific copulas
To fit specific types of copulas consider for example explicitly using a Clayton distribution. The corresponding R computer code to fit a Clayton copula and extract the associated parameter values would be:
The parameter value θ associated with a user's particular choice of copula family may then be obtained by simply printing out the value as
When the respective fits are completed for a particular choice of copula family we could then compare the Kendall's tau values for a particular fit with that of the optimal copula fit in order to determine how closely the chosen fit matches the actual data correlation by using the following code:
In our particular case using Kendall's tau as an indication of the variate correlation from the above VineCopula library computer results we observe that the Clayton copula approximation diverges from the optimal Gaussian copula and therefore it would not be beneficial to use a Clayton's copula to model the pressure balance joint distribution. If simulations are performed on a 64bit computer using double floats then the practical numerical precision is usually 16 digits unless variable precision arithmetic (VPA) libraries are used as previously discussed.
In the event that the copula function determined either from an optimal fit using the VineCopula optimal fit routine or a custom copula fit based on physical or mathematical requirements is too complicated to algebraically differentiate then a numerical differentiation of a function u (x, y) may be performed so that $$\frac{{\partial}^{2}u({x}_{i},{y}_{j})}{\partial x\partial y}\approx \frac{1}{4(\Delta x)(\Delta y)}\times [{u}_{i+1,j+1}{u}_{i+1,j1}{u}_{i1,j+1}+{u}_{i1,j1}]+\mathcal{O}\left({(\Delta x)}^{2},{(\Delta y)}^{2}\right)$$ The copula density may then be explicitly calculated directly from the analytical copula function C(u, v ; θ) by using the above numerical approximation so that $$c(u,v;\theta )=\frac{{\partial}^{2}C(u,v;\theta )}{\partial u\partial v}$$As a result as long as an analytical algebraic expression for the copula function C (u, v ; θ) is specified along with the respective value for the parameter θ in for example a calibration certificate or regional/international intercomparison report, then the joint PDF distribution information for the measurement instrument is completely specified through the reported copula function and appropriate parametrizations for the associated marginal distributions.
Extensions to nonparametric copulas in terms of empirical copulas are possible using the R package copBasic [39]. More recent research in the area of empirical copulas tends to favour an empirical beta copula as discussed by Segers et al [42] due to certain statistical technical reasons, and the use of empirical beta copulas may therefore in principle be considered for utilization by metrologists for modelling higher dimensional joint probability density distributions where multivariate Gaussian probability density functions are problematic.
References
 BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OILM, Guide to the expression of uncertainty in measurement. Technical report, Bureau International de Poids et Mesures (BIPM), 2008, www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf [Google Scholar]
 BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OILM, Evaluation of Measurement data − Supplement 1 to the “Guide to the expression of uncertainty in measurement” − propagation of distributions using a Monte Carlo method. Technical report, Bureau International de Poids et Mesures (BIPM), 2008, http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf [Google Scholar]
 P.M. Harris, C.E. Matthews, M.G. Cox, A.B. Forbes, Summarizing the output of a Monte Carlo Method for uncertainty evaluation, Metrologia 51, 243–252 (2014) [Google Scholar]
 V. Ramnath, Application of quantile functions for the analysis and comparison of gas pressure balance uncertainties, Int. J. Metrol. Qual. Eng. 8, 1–18 (2017) [CrossRef] [EDP Sciences] [Google Scholar]
 BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OILM, Evaluation of measurement data − supplement 2 to the “Guide to the expression of uncertainty in measurement” − extension to any number of output quantities. Technical report, Bureau International de Poids et Mesures (BIPM), 2011, http://www.bipm.org/utils/common/documents/jcgm/JCGM_102_2011_E.pdf [Google Scholar]
 R.S. Dadson, S.L. Lewis, G.N. Peggs, The Pressure Balance: Theory and Practice ( HMSO, London, 1982) [Google Scholar]
 A. Picard, R.S. Davis, M. Glaser, K. Fujii, Revised formula for the density of moist air (CIPM2007), Metrologia 45, 149–159 (2008) [Google Scholar]
 M. Dehghan, M. Hajarian, Some iterative free quadratic and cubic convergence iterative formulas for solving nonlinear equations, Comput. Appl. Math. 29, 19–30 (2010) [CrossRef] [Google Scholar]
 W. Bich, Interdependence between measurement uncertainty and metrological traceability, Accredit. Qual. Assur. 14, 581–586 (2009) [CrossRef] [Google Scholar]
 M.G. Cox, P.M. Harris, Software support for metrology best practice guide no. 6–uncertainty evaluation. Technical report, National Physical Laboratory (United Kingdom), 2006. NPL Report DEMES011 (ISSN 17440475), http://publications.npl.co.uk/npl_web/pdf/dem_es11.pdf [Google Scholar]
 R. Palencar, G. Wimmer, M. Halaj, Determination of the uncertainties and covariances in the calibration of the set of weights, Meas. Sci. Review 2, 9–20 (2002) [Google Scholar]
 EURAMET, Calibration of Pressure Balances. Technical report, European Association of National Metrology Istitutes, 2011. Guide 3 Version 1.0, ISBN 9783942992022 [Google Scholar]
 W. Sabuga, G. Molinar, G. Buonanno, T. Esward, J.C. Legras, Finite element method used for calculation of the distortion coefficient and associated uncertainty of a PTB 1 GPa pressure balance − EUROMET project 463, Metrologia 43, 311–625 (2006) [Google Scholar]
 S. Dogra, S. Yadav, A.K. Bandyopadhyay, Computer simulation of a 1.0 GPa pistoncylinder assembly using finite element analysis (FEA), Measurement 43, 1345–1354 (2010) [CrossRef] [Google Scholar]
 M.G. Cox, B.R.L. Siebert, The use of a Monte Carlo method for evaluating uncertainty and expanded uncertainty, Metrologia 43, S178–S188 (2006) [Google Scholar]
 R. Willink, Representing Monte Carlo output distributions for transferability in uncertainty analysis: modelling with quantile functions, Metrologia 46, 154–166 (2009) [Google Scholar]
 K. Goda, Statistical modeling of joint probability distribution using copula: application to peak and permanent displacement seismic demands, Struct Saf. 32, 112–123 (2010) [CrossRef] [Google Scholar]
 Scipy v0.19.0 reference guide − scipy.interpolate.bspline, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BSpline.html#scipy.interpolate.BSpline (accessed: 2017 /02/04). [Google Scholar]
 V. Ramnath, Determination of pressure balance distortion coefficient and zeropressure effective area uncertainties, Intern. J. Metrology Quality Eng. 2, 101–119 (2011) [Google Scholar]
 M. Krystek, M. Anton, A leastsquares algorithm for fitting data points with mutually correlated coordinates to a straight line, Meas. Sci. Technol. 22, 035101 (2011) [Google Scholar]
 X.S. Tang, D.Q. Li, C.B. Zhou, K.K. Phoon, L.M. Zhang, Impact of copulas for modelling bivariate distributions on system reliability, Struct. Saf. 44, 80–90 (2013) [CrossRef] [Google Scholar]
 C. Genest, A.C. Favre, Everything you always wanted to know about copula modeling but were afraid to ask, J. Hydrol. Eng. 12, 347–368 (2007) [Google Scholar]
 X. He, P. Ng, S. Portnoy, Bivariate quantile smoothing splines, J. R. Stat. Soc. Ser. B: Stat. Methodol. 60, 537–550 (1998) [CrossRef] [Google Scholar]
 D.H. Bailey, J.M. Borwein, Hightprecision arithmetic in mathematical physics, Mathematics 3, 337–367 (2015) [CrossRef] [Google Scholar]
 D.R. White, P. Saunders, The propogation of uncertainty with calibration equations, Meas. Sci. Technol. 18, 2157–2169 (2007) [Google Scholar]
 W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, in Numerical Recipes: The Art of Scientific Computing, 3rd edn. (Cambridge University Press, 2007) [Google Scholar]
 W.G. Gilchrist, in Statistical Modelling with Quantile Functions, edited by Chapman and Hall, 1st edn. (2000) ISBN 158488 174–7 [CrossRef] [Google Scholar]
 P. Chaudhuri, On a geometric notion of quantiles for multivariate data, J. Am. Stat. Assoc. 91, 862–872 (1996), http://www.jstor.org/stable/2291681 [Google Scholar]
 H. Kajikawa, T. Kobata, K. Ide, A. Ooiwa. Precise determination of the jacket pressure coefficient of controlled clearance pressure balances, in 3rd IMEKO TC16 International Conference on Pressure Measurement (Curran Associates, Inc., 2007), p. 125–134, ISBN 9781615676354 [Google Scholar]
 H. Kajikawa, K. Ide, T. Kobata, Methods of precisely estimating the jacket pressure coefficient of controlled clearance piston cylinders at pressures up to 1 GPa, Metrologia 48, 352–358 (2011) [Google Scholar]
 I. Kocas, W. Sabuga, M. Bergoglio, A. Eltaweel, C. Korasie, P. Farar, J. Setina, B. Waller, Y. Durgut, Final report on key comparison euramet.m.pk13 in the range 50 mpa to 500 mpa of hydraulic gauge pressure, Metrologia 52, 07008 (2015) [Google Scholar]
 U. Diwekar, A. David, in BONUS Algorithm for Large Scale Stochastic Nonlinear Programming Problems, edited by U. Diwekar, A. David (Springer, 2015), pp. 27–34, ISBN 9781493922819, http://link.springer.com/content/pdf/10.1007%2F9781493922826_3.pdf [Google Scholar]
 E. Larsson, B. Fornberg, A numerical study of some radial basis function based solution methods for elliptic PDEs, Comput. Math. Appl. 46, 891–902 (2003) [Google Scholar]
 I. Georgieva, C. Hofreither, An algorithm for lowrank approximation of bivariate functions using splines, J. Comput. Appl. Math. (2017) 310, 80–91, DOI:10.1016/j.cam.2016.03.023 [Google Scholar]
 X. Luo, P.V. Shevchenko, The t copula with multiple parameters of degrees of freedom: bivariate charactersistics and application to risk management, Quant. Finance 10, 1039–1054 (2010) [Google Scholar]
 J. Yan, Enjoy the joy of copulas: with a package copula, J. Stat. Softw. 21, 1–21 (2007) [EDP Sciences] [Google Scholar]
 I. Kojadinovic, J. Yan, Modelling multivariate distributions with continous margins using the copula R package, J. Stat.Softw. 34, 1–20 (2010) [CrossRef] [Google Scholar]
 M. Hofert, I. Kojadinovic, M. Maechler, J. Yan, The Comprehensive R Archive Network − Multivariate Dependence with Copulas Version 0. 99915, 2017, https://cran.rproject.org (accessed: 05/29/ 2017) [Google Scholar]
 W. Asquith, The Comprehensive R Archive network − General Bivariate Copula Theory and Many Utility Functions Version 2.0.4, 2017, https://cran.rproject.org (accessed: 05/29/2017) [Google Scholar]
 U. Schepsmeier, E. C. Brechmann, B. Graeler, T. Nagler, T. Erhardt, C. Almeida, A. Min, C. Czado, M. Hofmann, M. Killiches, H. Joe, T. Vatter, The Comprehensive R Archive Network − Statistical Inference of Vine Copulas Version 2.1.2, 2017, https://cran.rproject.org (accessed: 05/29/2017) [Google Scholar]
 C. Meyer, The bivariate normal copula, Commun. Stat.: Theory Methods 42, 2402–2422 (2013) [CrossRef] [Google Scholar]
 J. Segers, M. Sibuya, H. Tsukahara, The empirical beta copula, J. Mult ivar. Anal. 155, 35–51 (2017) [CrossRef] [Google Scholar]
Cite this article as: Vishal Ramnath, Numerical analysis of the accuracy of bivariate quantile distributions utilizing copulas compared to the GUM supplement 2 for oil pressure balance uncertainties, Int. J. Metrol. Qual. Eng. 8, 29 (2017)
All Tables
Summary of ELD parameters for generated applied pressures P_{k}/[Pa] for k ∈ [1, … , 10]. The index k indicates the pressures P_{k} from P_{1} = 50 MPa through to P_{10} = 500 MPa so that the extended lambda distribution parameter a_{k}, b_{k}, c_{k} and d_{k} can be used to summarize the probability density function distribution for P_{k}.
Summary of ELD parameters for generated areas S_{k}/[m^{2}] for k ∈ [1, … , 10]. The index k indicates the area S_{k} corresponding to the crossfloated pressures from P_{1} = 50 MPa through to P_{10} = 500 MPa so that the extended lambda distribution parameter a_{k}, b_{k}, c_{k} and d_{k} can be used to summarize the probability density function distribution for S_{k}.
Marginal distribution ELD parameters for zeropressure area A_{0} and distortion coefficient λ variates. The underling probability density function distribution for the zeropressure area random variable ${\xi}_{{A}_{0}}$ and the distortion coefficient random variable ξ_{λ} is modelled and approximated with an extended lambda distribution based on the discrete data obtained from the GUM Supplement 2 uncertainty analysis.
Specification of common parameter based bivariate copula families (Adapted from Goda [17]). In our particular problem the variable u corresponds to the random variable ${\xi}_{{A}_{0}}$ and the variable v corresponds to the random variable ξ_{λ} so that the bivariate copula C (u, v) can be used to construct the joint probability density function as ${g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})=\frac{{\partial}^{2}C}{\partial u\partial v}$ for convenience.
Extended lambda distribution (ELD) parameters for transfer standard pressure balance generated applied pressures P_{k}/[Pa] using bivariate copula joint PDF. The index k corresponds to the pressures P_{k} for 50 MPa through to 500 MPa and the parameters a_{k}, b_{k}, c_{k} and d_{k} can then be used to construct analytical expressions for the probability density function distribution for each of the P_{k} pressures.
Pressure balance laboratory standard and transfer standard mass and density data.
Pressure balance laboratory standard and transfer standard temperatures and ambient data.
Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for pressure generation).
Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for cross floating).
All Figures
Fig. 1 Illustration of how the Monte Carlo data from a GUM Supplement uncertainty analysis is utilized in order to determine the joint probability density function for the transfer standard pressure balance. In (a) the univariate GUM Supplement 1 (GS1) Monte Carlo data of the crossfloated pressures P_{k} and crossfloated areas S_{k} is first preprocessed by averaging and plotted in order to get a rough qualitative estimate of the model parameters. In (b) the actual GS1 data points are then processed with the GUM Supplement 2 (GS2) using a suitable χ^{2} optimization to extract the actual model parameters and the discrete model parameter values plotted. In (c) the discrete model parameters are then postprocessed using a kernal density estimation scheme in order visualize the mathematically continuously variable joint bivariate probability density function behaviour of the underlying model. 

In the text 
Fig. 2 Illustration of Monte Carlo GUM Supplement 2 processed data showing a visualization of the transfer standard pressure balance bivariate joint PDF where the random variables are plotted in physical units such that 10^{6}A_{0}/[m^{2}] = A_{0}/[mm^{2}] and 10^{12}λ/[Pa^{−1}] = λ/[ppm/MPa] for easy visualization so that the joint PDF has the correct magnitude for a two dimensional normalization such that ${\int}_{\infty}^{\infty}{\int}_{\infty}^{\infty}{g}_{{A}_{0},\lambda}({\xi}_{{A}_{0}},{\xi}_{\lambda})\text{\hspace{0.17em}d}{\xi}_{{A}_{0}}\text{d}{\xi}_{\lambda}=1$ when constructed from the two dimensional histogram data using ${N}_{{A}_{0}}=\mathrm{}{N}_{x}=63$ and N_{λ} = N_{y} = 58 bins respectively with M = 10 000 Monte Carlo simulation events. In (a) the actual discrete data of the bivariate joint probability density function is first plotted for a qualitative visualization. In (b) the discrete data is postprocessed to produce the mathematically continuous joint probability density function in a three dimensional space where each space coordinate corresponds to a random variable. In (c) the joint probability density function is normalized in order to more clearly view the behaviour of the underlying model. 

In the text 
Fig. 3 Monte Carlo GUM Supplement 2 marginal distributions. In (a) the probability density function distribution for the continuous random variable ${\xi}_{{A}_{0}}$ of the zeropressure area is visualized. In (b) the probability density function distribution for the continuous random variable ξ_{λ} of the distortion coefficient is visualized. In (c) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the zeropressure area is shown. In (d) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the distortion coefficient is shown 

In the text 
Fig. 4 Illustration of comparison between the actual bivariate joint PDF surface with that of a conventional classical marginal distribution approach approximation showing the discrete set of contour curves visualized by three dimensional lines that are used to construct joint PDF surface, and that of a scattered data approximation of the corresponding normalized bivariate surface. In (a) the actual continuous probability density function distribution is shown which must be modelled and approximated through a visualization in a three dimensional space where the x coordinate corresponds to the random variable ${\xi}_{{A}_{0}}$, the y coordinate corresponds to the random variable ξ_{λ} and the z coordinate corresponds to the joint PDF ${g}_{{A}_{0}\mathrm{}\lambda}\mathrm{}({\xi}_{{A}_{0}}\mathrm{},{\xi}_{\lambda}\mathrm{})$. In (b) the modelling and approximation is shown where the three dimensional surface is approximated by a series of three dimensional lines within the three dimensional space. In (c) the modelling and approximation is visualized for the equivalent surface that is built up from discrete three dimensional scattered points. 

In the text 
Fig. 5 Illustration of how a GUM Supplement 2 (GS2) Monte Carlo uncertainty analysis would be implemented in practice for constructing the marginal distribution and copula distribution parameters. In (a) an extract of the actual M = 10000 data points is shown from the GS2 which were determined with a suitable χ^{2} optimization. In (b) the computer code implementation of how to process the GS2 data to fit the marginal distributions in terms of an extended lambda distribution is shown. In (c) the computer code implementation of how to determine the copula parameters using open source easily accessible standard statistical software is shown. 

In the text 
Fig. 6 Illustration of mcode for estimating an ELD parametrization for univariate Monte Carlo data. The computer program works by loading the Monte Carlo univariate data as an input ydata = [y_{1}, … , y_{M}] ^{T} where M = 10 000 is the number of Monte Carlo simulation events and it then uses this data to determine the extended lambda distribution parameters a, b, c and d so that the probability density function can be analytically specified. 

In the text 
Listing 1 Computer output used to calculate the univariate ELD parameters for A0 and . 

In the text 
Listing 2 Computer output used to calculate the copula parameter with open source standard statistical software. 

In the text 
Listing 3 Computer output used to calculate sampled values from the copula with open source statistical software. 

In the text 
Fig. 7 Verification and validation of proposed method using normalized errors of applied pressures demonstrating that ∥E_{n} ∥ ≤1 ∀ n ∈ [1, … , 10] laboratory standard pressure balance generated pressures and transfer standard pressure balance crossfloated area measurements. The pressures P_{1} through to P_{10} correspond to pressures from 50 MPa through to 500 MPa at 50 MPa steps and the E_{n} values for each of these pressures indicates the corresponding normalized error. When calculating the E_{n} value for each pressure the known reference pressure P_{ref} is determined from the laboratory standard pressure balance whilst the calibrated pressure P_{cal} is determined from the transfer standard pressure balance using the bivariate copula that models and summarizes the information of the calibrated pressure balance's zeropressure area A_{0} and distortion coefficient λ. Due to the fact that all of the E_{n} values are smaller than unity this then means that the results and uncertainties that were calculated for the calibrated pressure balance when tested against the reference pressure balance are statistically consistent and hence provides proof that the proposed method of using copulas to model and summarize bivariate probability density functions has been verified and validated. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.