Open Access
Issue
Int. J. Metrol. Qual. Eng.
Volume 8, 2017
Article Number 29
Number of page(s) 29
DOI https://doi.org/10.1051/ijmqe/2017018
Published online 27 November 2017

© V. Ramnath, published by EDP Sciences, 2017

Licence Creative Commons
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

In the field of pressure metrology when piston-cylinder operated pressure balances are used to measure applied pressures defined as P = F/Ae where F is a generalized force and Ae is the effective area, the most widespread model to characterize a pressure balance is that of a two parameter model such that Ae = A0 (1 + λP) where A0 is the zero-pressure area and λ is the distortion coefficient. As a result knowledge of both the expected values as well as the associated covariance of a pressure balance's effective area model parameters A0 and λ are intrinsically necessary in order to fully quantify the uncertainty in Ae for subsequent application in pressure generation and measurement tasks. The conventional practise by many metrologists when performing an uncertainty quantification (UQ) analysis of a piston-cylinder operated pressure balance is to characterize the model parameters A0 and λ based on an analysis of the combination of the available experimental data such as cross-floating measurements and theoretical data such as finite element simulations. This analysis is usually performed with the application of the classical sensitivity coefficient based formulation of the GUM [1], or alternatively with the GUM Supplement 1 (GS1) Monte Carlo based UQ technique [2] in cases where the measurand model is either too complex or too nonlinear.

When these UQ techniques are applied the resulting multivariate UQ data for the pressure balance model, either implicitly in terms of a multivariate Gaussian distribution if generated with the GUM or potentially in terms of a multivariate non-Gaussian distribution if generated with the GS1, is then typically approximated with univariate probability density function (PDF) distributions for A0 and λ due to the complexity of the models, whilst the covariance information through the available experimental and theoretical available data in terms of the coupling between A0 and λ is then usually either neglected, estimated or occasionally indirectly approximated.

Univariate PDFs of measurands have traditionally been analytically specified either as Gaussian or Student t-distributions in the case of a GUM based analysis, or alternately through discrete representations of either the underlying distribution function or equivalently the PDF when the uncertainty analysis was performed using the GS1 approach. In GS1 based uncertainty analysis approaches generated Monte Carlo data has in a majority of cases been utilized for subsequent analysis by retrieving previously generated and stored numerical data from electronic files. Recent work by Harris et al. [3] considered the problem of how to analytically represent GS1 numerical data by investigating and studying different families of quantile functions and approaches in order to estimate the respective quantile function parameters. Their investigations demonstrated the utility of extended lambda distribution (ELD) based quantile function representations to accurately summarize arbitrary PDFs generated using Monte Carlo simulation data. This methodology has recently been applied by Ramnath [4] in the field of gas pressure metrology through the use of an ELD based quantile function for the characterization of a pressure balance's effective area in the low pressure range 5 kPa ≤ p ≤ 7 MPa where the distortion coefficient λ is considered negligible and the pressure balance characteristics is fully specified in terms of just a univariate PDF of the zero-pressure area A0. In this particular investigation a non-continuum gas flow was present and as a result the GUM was not appropriate due to the presence of a highly non-linear measurand model. The measurand model of the gas pressure balance's effective area which was numerically solved with a corresponding uncertainty PDF determined through a GS1 Monte Carlo methodology demonstrated the feasibility of utilizing quantile functions for characterizing univariate PDFs which may exhibit deviations from more traditional Gaussian shaped deviations such as asymmetry and/or skewness in more complex fluid based primary scientific measurement systems.

Unfortunately for pressure balances operated at higher pressures where the effect of the distortion coefficient is significant due to fluid structural interaction effects the uncertainties and correlation effects between A0 and λ for hydraulic oil operated gauge pressure balances in the medium pressure range 7 MPa ≤ P ≤ 30 MPa must be considered. These effects are considered particularly critical and essential in the high pressure range 30 MPa ≤ P ≤ 500 MPa when calculating the uncertainty of a pressure balance's effective area. As a result based on this observation from an UQ perspective it is considered desirable and beneficial if a pressure balance model can be characterized directly in terms of a joint bivariate PDF in order to preserve intrinsic statistical uncertainty analysis information that may utilized be in order to accurately determine and quantify the magnitudes of the correlation effects that are present in physical pressure balances operated at high hydraulic pressures where there is strong coupling between A0 and λ.

Based on these motivating factors our approach in this paper is to firstly utilize an implicit multivariate GUM methodology which is used to solve for and characterize generated applied pressure data values and uncertainties for an oil pressure balance laboratory standard by sampling from appropriate PDF and joint PDF distributions where available. We then utilize the known generated pressure values and uncertainties as inputs into a full GS1 Monte Carlo simulation which is used to numerically generate multivariate data for the cross-floating of a transfer standard. Once the multivariate data for the transfer standard has been generated we then post-process it using the GS2 [5] methodology for further analysis in order to determine the actual ellipsoidal and smallest coverage regions for the coupling of the transfer standard's zero-pressure area A0 and distortion coefficient λ in a joint PDF. The exact coverage regions obtained with the GS2 methodology are then compared with results obtained with approximate bivariate quantile distributions that we formulate in this paper in order to investigate and determine the potential accuracy of bivariate quantile distributions for modelling pressure balance characteristics in terms of joint PDFs.

2 Mathematical models

The mathematical model for a pressure balance using a working fluid in a liquid state may be formulated as a pressure measurand equation through the analysis of a free-body diagram such that the applied pressure P = (p − pa) is of the form (1) where p is the working fluid pressure, and pa is the surrounding ambient pressure as per the original formulation by Dadson et al. [6]. In this model mi and ρi are the actual mass and density values for the system of weights used to generate the respective pressures if there are n total mass pieces, ρa is the ambient density, Vs is the submerged volume of the piston into the working fluid, ρf is the density of the working fluid at an actual pressure of p, g is the local gravitational acceleration constant, σ is the surface tension of the working fluid where C is the circumference of the wetted perimeter of the piston in contact with the working fluid, H is a hydrostatic height term to account for any potential differences in elevation between the reference datums of generated pressues and measured pressures, and S is the effective area of the pressure balance. In the absence of additional information a useful approximation that may be used to estimate the circumference is .

Due to the fact that physical measurements may occur in which the temperature of the pressure balance may vary through a combination of ambient conditions and operating conditions the effective area is usually adjusted to correspond to a known reference temperature such that S = A0 [1 + λP] ϕ (t, tref) and ϕ (t, tref) =1 + α (t − tref). As per the discussion by Dadson et al. α is the sum of the thermal expansion coefficients of the piston and cylinder so that α = (αp + αc), and the reference temperature is fixed to a constant value which is usually just set to tref = 20 ° C in many commonwealth countries, and to 23  ° C in other countries. Following the general approach by many pressure metrologists the ambient air density may be specified using the CIPM-2007 formula as  as discussed by Picard et al. [7] where Ma = 28.96546 × 10−3 kg mol−1 is the molar mass of dry air, Mv = 18.01528 × 10−3 kg mol−1 is the molar mass of water, Z is the air compressibility, R = 8.314472 J mol−1 K−1 is the CODATA-2006 recommended value for the universal gas constant, and xv is the mole fraction of water vapour for the corresponding pressure, temperature and relative humidity conditions for the surrounding ambient air. The previous formulae may now be combined and rearranged as a nonlinear function  where p is the unknown working fluid pressure p which is to be determined and  are laboratory standard parameters such that (2) The generated pressure p of the working fluid from a laboratory standard may then be obtained by solving  with any suitable solver by noting that ρf, C and S formally depend on the pressure p whilst all the other terms are independent of the working fluid's pressure. The above calculations are performed if we assume that the air density ρa = fa (pa, ta, ha) and fluid density ρf = ff (p, t) are calculated by known functions in terms of the independent variables, and where  formally has units of applied pressure which allows for the numerical solution to be obtained for any specified applied pressure accuracy level.

A simple approach to solve for the unknown generated pressure p is to assume that pmin ≤ p ≤ pmax and search within this pressure range where pmin and pmax are rough estimates of the nominal pressure. As an example for an applied pressure of 250 MPa with a nominal atmospheric pressure of pa = 101.325 kPa first calculate the approximate pressure as p0 = P + pa and then for example set pmin = 0.95p0 and pmax = 1.05p0 as specifications to search within this interval or alternately to solve the full non-linear solver of  using p0 as a starting solution. The next issue which naturally arises is how to solve this particular non-linear equation noting that for the generating pressure equation both the density ρf and the wetted circumference  depend on the pressure p. Ideally we would use for example Newton's method xn+1 = xn − [f(xn)]/[f′(xn)], n = 0, 1, 2, ... for successive approximations to solve f (x) =0 where f′(xn) denotes the derivative of the function where the benefits of using an iterative solver is that the generating pressure may be calculated to any desired numerical accuracy level. In our particular case it is however cumbersome to analytically calculate the derivative due to the choice of the fluid's equation of state. Fortunately there are derivative free iterative formulae as reported by Dehghan and Hajarian [8] to solve a non-linear equation f (x) =0 and we may to use a forward difference formula originally developed by Steffensen such that xn+1 = xn − [f (xn)] 2/[f (xn + f (xn))] − f (xn)]. As a result through the use of the Steffensen formula we may control the numerical precision of the solved for applied pressures Pk, k = 1, … , np where np are the number of generated pressures by specifying the number of iterations.

In pressure metrology practise there is generally only have a limited indirect level of covariance information available and it is therefore common to disregard covariance terms in the  model by setting cov (qi, qj) =0,  i ≠ j in the generated pressure calculation. The only potential exception where cov (qi, qj) ≉0 i ≠ j is in terms of the correlation between A0 and λ i.e. for cov (A0, λ), and in the correlations between the weights and densities used i.e. cov (mi, mj), cov (ρi, ρj) and cov (mi, ρj) respectively for i ≠ j unless otherwise specified.

The issue of how to approximate the correlation for weights was previously investigated by Bich [9] who utilized the multivariate GUM approach of Cox and Harris [10] in a slightly modified equation for a mass measurement as a general implicit equation. Later investigations by Palencar et al. [11] followed a similar multivariate GUM framework but refined the analysis slightly by using a least squares formulation to deduce the optimal values of the parameters in a calibration of mass pieces. Their investigation concluded that covariance terms can actually contribute non-negligible terms in the uncertainty analysis of mass measurements and that ideally covariance information should be included in mass measurement calibration certificates.

Whilst the approach of Palencar et al. [11] does allow for the covariance information to be directly estimated under certain restricting assumptions for mass pieces traceable through a sequence of measurements to a particular country's national kilogram prototype standard or equivalent national metrology institure primary mass standard, in this paper we will disregard the correlation effects between mass and density terms used by the set of weights in the laboratory standard pressure generation model for simplicity. This modelling simplification is due to the fact that in most practical cases calibrated mass certificates do not provide any covariance information for the weights used by a pressure balance. In addition for simplicity we will also disregard the covariance between the laboratory standard zero-pressure area and distortion coefficient cov (A0, λ) in the pressure balance that is used to generate the known applied pressure due to that fact that many existing pressure calibration certificates do not generally provide any specific explicit covariance information.

Our approach of disregarding the covariance term cov (A0, λ) for the generated pressure in the absence of specific known covariance information is consistent with experimental recommendations for national metrology institutes for pressure balances operated in free deformation mode as documented in for example EURAMET CG3 [12], and results from theoretical finite element based studies for pressure balances operated in both free deformation and controlled clearance modes such as the EUROMET Project 463 by Sabuga et al. [13] and in controlled clearance modes only by Dogra et al. [14] respectively. As a result of our choice to disregard covariance effects in the generated pressure calculation when implementing a GS1 Monte Carlo approach as discussed by Cox and Siebert [15] for the pressure generation calculations, it will then usually suffice to sample with appropriate numerical techniques for the inputs qi from univariate PDFs gi (ξi) associated with the corresponding input parameters qi where ξi is a random variable of qi unless information of a joint PDF is specifically available. Unless otherwise specified the PDFs for the input parameters will usually be Gaussian distributions however following the more recent work by Harris et al. it is now in principle possible to model the inputs as arbitrary PDFs using a univariate ELD quantile distribution as originally investigated by Willink [16].

We comment that even though we disregard covariance terms in the inputs of the generated pressure balance model in the absence of available information that when these generated pressures are used as inputs in the cross-floated pressure balance model to deduce the corresponding effective areas that due to the complexity of the mathematical model the covariance information in the parameters used to model the effective areas will still be present in an implicit form of the associated Monte Carlo data of the cross-floated effective areas.

Once the generated pressures from the laboratory standard (LS) pressure balance model are obtained and the GS1 information appropriately processed, the generated pressure PDFs  where is a random variable for generated pressure pi may then used as inputs for when the LS is cross-floated against another pressure balance such as a unit-under-test or transfer standard (TS) in order to determine the other pressure balance's effective area. When two pressure balances are cross-floated against each other the point of equilibrium occurs when the applied pressure P = (p − pa) in both pressure balances are equal in which case the measurand equation for a cross-floated TS is (3) In the above equation S is the unknown TS effective area which must be determined, and qj, j = 1, … , m are corresponding parameters associated with the cross-floated TS pressure balance measurand equation. The benefit of this particular equation formulation is that f (S, q1, … , qm) formally has units of cross-floated area which then allows for the numerical solution accuracy to be controlled by specifying the number of iterations. Whilst a simple estimate to solve for the generated pressure is to assume a range of possible generated pressure values using for example 0.95p0 ≤ p ≤ 1.05p0 where p0 is the approximate nominal pressure p0 = P + pa as previously discussed such a straightforward estimate of the range of possible cross-floated areas of the TS is not possible since we have no prior information in order to determine a possible range of values Smin ≤ S0 ≤ Smax to search within.

A simple initial approximation to overcome this difficulty is to ignore the effect of the oil surface tension and TS submerged volume on the TS cross-floated area so that a rough cross-floated area estimate is  since the hydrostatic pressure head term would not be negligible and to then use a search range of say 0.95S0 ≤ S ≤ 1.05S0 in order to solve the full non-linear equation f (S, q1, … , qm) =0 or alternately by using for example the Steffensen formula as previously discussed.

In this paper the ⋆ super-scripts signifies terms specific to the LS pressure balance which is used to generate the pressure whilst the unstarred terms represent quantities that are specific to the TS pressure balance which is cross-floated in order to determine corresponding effective areas. The model f (S, q1, … , qm) =0 using the previous generated pressure PDF  inputs may then be solved in another GS1 simulation to obtain the PDF distribution  for the cross-floated effective areas Sk for each pressure Pk where is a random variable of the cross-floated effective area Sk.

Since the cross-floated effective areas Sk are univariate data it follows that an ELD-QF distribution may also be used to conveniently summarize Sk in terms of parameters {ak, bk, ck, dk} for k ∈ [1, … , nP] for each of the nP cross-float effective area measurements corresponding to the nP generated pressures where as previously discussed the cross-floated effective area is modelled in terms of a two parameter model such that S = A0 (1 + λP) where P is the applied pressure.

For our particular pressure balance problem a GS1 based uncertainty analysis is utilized for conceptual simplicity for the LS generated pressures in order to generate data for further analysis when the LS is cross-floated against the TS pressure balance as this avoids some of the subjectivity in certain aspects of the correlation effects modelling in the GUM based matrix analysis.

The GS1 methodology is focused on multivariate input models with an input  and a single output  defined in terms of an implicit equation  such that the expected value , variance  and distribution function  are approximated using M Monte Carlo simulations such that  where r = 1, 2, … , M, , where yr ≤ η ≤ yr+1 for r = 1, 2, … , M − 1. When the GS1 methodology is implemented for the LS generated pressure Pk,  k = 1, … , nP there will be a set of Monte Carlo univariate simulation data  for each generated pressure which may then be used as an input for the cross-float of the TS.

A GS2 methodology on the other hand when contrasted to the GS1 method also considers the case for an input X but where there are now multiple outputs such that  defined in terms of an implicit vector equation . This methodology is also implemented by generating Monte Carlo data by first sampling from the input vector PDF for  or relevant associated joint PDFs if this information is available such that  for r = 1, 2, … , M, , and  where M is the number of Monte Carlo simulation events in order to produce a dataset yr. In our particular case when the GS2 methodology is implemented for the TS cross-floated area Sk, k = 1, … , nP there will be a set of Monte Carlo bivariate simulation data .

When the GS2 uncertainty analysis method is implemented and the Monte Carlo simulation data post-processed an implicit multivariate model may be used to construct the average vector value and corresponding covariance matrix . As a result when using a GS2 approach the method as per the officmplicit assumption that the output follows a multivariate Gaussian distribution such that  where the expected value is and the covariance matrix is  unless otherwise specified. This feature is common to both the GS1 and GS2 due to the extreme mathematical complexity and numerical challenges in directly solving the Markov integral which provides for the exact PDF or joint PDF without any assumptions for both univariate as well as multivariate models. For the case of the GS2 the joint PDF for a multivariate Gaussian may then be written in matrix notation such that (4) where V−1 is the inverse of the matrix V, and  denotes the determinant of the matrix V. Using this notation in the case of a bivariate distribution with an input  and a corresponding random variable  the expected vector μ and covariance matrix V may then be specified as (5a) (5b)where and are the variances for X1 and X2 respectively, and ρ is an indication of the covariance between the inputs X1 and X2. The use of a Pearson r instead of ρ i.e. Spearman's rho as an indication of the correlation is usually the more common approach in mechanical metrology problems however other approaches such as Kendall's tau are also possible. The main advantage of these approaches are that the quantification of the extent of the correlation between the random variables x1 and x2 do not depend of the choice of parametrization for the model. Some additional benefits of using either a Kendall's tau or Spearman's rho instead of the Pearson's correlation coefficient are that these options are usually a more accurate indicator of correlation when the random variables X1 and X2 do not follow a Gaussian joint PDF since a Pearson's correlation can sometimes give misleading results if the random variables do not follow multivariate normal distributions as discussed in more detail by Goda [17]. If (x1, y1) , … , (xn, yn) are a set of n observations then Kendall's tau τK may be calculated as (6)where C is the number of concordant pairs and D is the number of discordant pairs for the set of observations where a particular pair (xi, yi) and (xj, yj) for i ≠ j is concordant if xi > xj and yi > yj or alternately if xi < xj and yi < yj, whilst the same pair is discordant if xi > xj and yi < yj or if xi < xj and yi > yj. In the special case if xi = xj and yi = yj then the pair is neither concordant nor discordant and does not contribute in the calculation of τK. The use of Kendall's tau τK provides a useful simplification later in the paper for constructing particular formulations of copulas.

The benefit of using a GS2 approach to directly calculate the cross-floated effective areas is that any covariance information is immediately present and available from the covariance matrix , and as a result this approach avoids any additional modelling assumptions for covariance terms. We will utilize the above  multivariate dataset in order to investigate the accuracy of a bivariate quantile distribution in approximating a joint PDF for A0 and λ.

Once the univariate generated pressure Monte Carlo data k has been generated it is then necessary to summarize them. In the case of a univariate PDF g (η) with a random variable η and output Y one particular definition of defining a quantile function Q (ρ) following the approach of Harris et al. is by setting the distribution function as G (η) = ρ where 0 ≤ ρ ≤ 1 such that the quantile function Q (ρ) is defined such that η = Q (ρ) ≡ G−1 (ρ) from which it may be demonstrated that g (η) =1/[Q (ρ)]. The practical consequences of this definition is that if ρ is sampled from the rectangular distribution R (0, 1) then Q (ρ) corresponds to a sampling from g (η), [Q (p1) , Q (p2)] is a probabilistically symmetric coverage interval for p if  and , and that the expectation and variance for Y may be calculated as  and  respectively. For the univariate case where a quantile function distribution following the approach of Willink [16] utilizing an ELDs is implemented it has been demonstrated by Willink that the corresponding QF and ELD-QF for the PDF may be simply represented in terms of real parameters a, b, c and d. Technical details to determine the values of a, b, c and d for an ELD were discussed in full and complete detail by Willink who compared the method of using four moments, four quantiles, and a modified method using the mean, variance and two quantiles. For pressure metrology purposes the method of using four quantiles is considered relatively simple and straightforward for pressure balances with univariate effective area models such as gas operated pressure balances at low pressure as discussed by Ramnath [4]. The approach of using quantile functions such as ELDs to summarize Monte Carlo uncertainty analysis data was further developed by Harris et al. [3] where they extended the earlier work of Willink for univariate distributions of a quantity Y to summarizing an arbitrary parent probability distributions denoted as P in terms of a generalized quantile function. This approach involved the utilization of the distribution function GP (ζ) and PDF GP (ζ) where ζ ∈ Z ∼ P is a random variable for the parent probability distribution and η ∈ Y is a random variable for the output Y. According to this approach where η = F (ζ) the random variable is calculated as  where ρ ∈ U ∼ R (0, 1) is a random variable following a rectangular distribution and the PDF of Y is calculated as g (η) = [gP (ζ)]/[F (ζ)]. The key simplification of this approach is that the generalized quantile function defined in terms of F (ζ) simply requires knowledge of F (ζ) and a mechanism of sampling random variables from the parent distribution ζ ∈ Z ∼ P. In this paper we will opt for this general approach of Harris et al. since the ELD approach of Willink using four parameters to model the PDFs of the generated pressures and cross-floated areas may not necessarily be adequate. For simplicity we will use a random variable ρ ∈ R (0, 1) following a rectangular distribution as this is convenient with readily available random number generators. Although it is possible to use an increasing number of parameters over and above the four parameters in for example an ELD based QF through the calculation of higher order moments by working out for example the coefficients of B-spline approximations for F (ζ) with open source software routines [18] our approach will simply directly utilize the actual GS1 distribution functions for each of the generated pressures and cross-floated areas when constructing the joint PDF.

Earlier approaches such as that by Ramnath [19] utilized statistical linear regression analysis techniques developed by Krystek and Anton [20] for straight line data with correlation in plots of applied pressure [P ± u (P)] data versus cross-floated effective area [S ± u (S)] data with associated error bars. In this approach the generated pressure and cross-floated areas uncertainties were calculated with the classical sensitivity coefficient based GUM and the straight line fit for the curve S = A0 (1 + λP) allowed for A0 and A0λ to be estimated, and which were then in turn used to estimate the distortion λ and indirectly estimate the correlation between A0 and λ. In this paper our approach is different since our objective is to instead construct an approximation to the actual joint PDF  with  so that we may directly sample random values  and ξλ from the underlying joint bivariate PDF similar to how univariate random variables ξP may be sampled from the generated pressure PDF.

The GS2 documentation specifically focuses on multivariate Gaussian distributions  for a model with an output x, an expected value μ and an uncertainty matrix V. Explicit guidelines for sampling for such a joint PDF are provided such that a sampled draw may be estimated as  where x = [μ1, … , μN] T, and  if  and R is an upper triangular matrix formed from a Cholesky decomposition such that . This focus is achieved through the assumptions that  and , however in general it may not necessarily be the case that a multivariate Gaussian is the best choice of PDF.

Although there may be some merits in particular cases of modelling the multivariate Monte Carlo data in terms of non-Gaussian distributions the benefits of doing so may be outweighed by the necessity of selecting, deciding on and choosing alternative probability distribution information formulations as discussed by Tang et al. [21] who studied these effects in certain applications using the equivalent probability distribution information contained in appropriate choices of copulas. The theoretical approach of constructing bivariate distributions as discussed by Tang et al. for constructing the joint cumulative distribution function (CDF) and PDF of variables X and Y is possible if the marginal distributions of X say F (x) and that of Y say G (y) where x and y are corresponding random variables are both known. If a particular choice of copula function is specified then the application of Sklar's theorem as discussed by Tang et al. allows for the joint PDF f (x, y) to be constructed as (7a) (7b) where u = F (x), v = G (y) and the copula is defined as C : I × I → I where c (F (x) , G (y)) is known as the copula density and I = [0, 1]. The copula C occasionally written as Cθ (u, v) where θ is a fixed parameter is formally defined as a mapping from the unit square I2 to the unit interval I. In this formulation θ is a parameter associated with the choice of the copula function for the underlying data specific to the particular model data as discussed by Genest and Favre [22]. One particular example of a copula are those belonging to the Farlie–Gumbel–Morgenstern family such that Cθ (u, v) = uv + θuv (1 − u) (1 − v) for u, v ∈ [0, 1] where the copula is constrained by the Frechet–Hoeffding bounds such that W (u, v) ≤ C (u, v) ≤ M (u, v) where W (u, v) = max {0, u + v − 1} and M (u, v) = min {u, v}. These bounds are used if X and Y are not independent, whilst a choice of C = Π = uv may be used if for example X and Y are fully independent where the degree of independence between the variables X and Y may be estimated in terms of either the Spearman's rho or Kendall's tau values as previously discussed. In the particular case of bivariate distributions the formal definition of a copula is that of a mapping such that (8a) (8b) (8c) (8d)Different types of copulas are considered later in the paper when we construct the exact joint PDF from a GS2 Monte Carlo simulation and utilize various choices of copulas in approximating the actual bivariate joint PDF in order to investigate the suitability of various choices of bivariate quantile constructions of which copulas are one particular choice amongst several for modelling and summarizing a pressure balance's joint PDF.

Regardless of the particular choice of copula Cθ (u, v) it is seen that the construction of the joint PDF essentially reduces to that of the choice of a mapping function . This choice of mapping function is necessary in order to relate how variables 0 ≤ p, r ≤ 1 may be used as inputs for the mapping function to generate the corresponding random variables consistent with the joint PDF f (x, y). As a result although the absence of guidance from the GS2 for sampling from non-Gaussian multivariate PDFs is potentially problematic this aspect is not an issue since it turns out that sampling from a bivariate quantile function based distribution may still be achieved using sampling from associated univariate PDFs and two dimensional coordinate transformation mappings.

In this paper our main motivation is to summarize the GS2 based joint PDF using a bivariate quantile smoothing spline that was originally developed by He et al. [23] where a response surface Z depends on two variables X and Y where it is assumed that the observations zij is known at each (xi, yj) for i = 1, … , m and j = 1, … , n such that x1 < ⋯ < xm and y1 < ⋯ < yn for convenience. In the original paper by He et al. it was demonstrated that the optimal solution for fitting the surface and they considered the special case where the covariates X1 and X2 were in the domain [0, 1] × [0, 1] which corresponds to our particular problem for random variables 0 ≤ p, r ≤ 1.

One approach to construct the joint PDF is based on the direct use of the Markov formula as discussed by Cox and Siebert [15] where the PDF denoted as  for a model  is formally defined in terms of the joint PDF denoted as  as . For our model the cross-floated area of the pressure balance is simply S = A0 (1 + λP) so this may be formally expressed as S ∼ gS (η) so that (9) In the above formulation  is the unknown joint PDF which is not necessarily a multivariate Gaussian where the applied pressure is independent since in practical terms it can be varied by simply setting the working fluid pressure. As a result the PDF for the pressure can be uncoupled from the joint PDF for the area. It follows that the above continuous multivariate integral may then be approximated as a system of simultaneous equations such that (10)Whilst the Markov formula may in principle be used to calculate the exact joint PDF it is however not generally feasible in practical terms due to finite arithmetic precision resolution errors. These issues are due to the large difference in magnitudes from the cross-floated area PDF gS (η), the joint PDF  and pressure PDF gP (ξP) which when utilized to build up an equivalent system to solve the Markov integral equation will result in an ill-conditioned linear system of the form  where the coefficient matrix A is built up in terms of the PDF gP (ξP), the known vector B is built up in terms of the PDF gS (η) and the unknown x is the values of the joint PDF  at fixed coordinates for a chosen grid of and ξλ points. Extensions beyond standard IEEE 32-bit and 64-bit accuracies to variable precision arithmetic (VPA) accuracies using Fortran/C++ based implementations to mitigate against finite arithmetic resolution errors are discussed in more detail by Bailey and Borwein [24].

Our approach in this paper is to instead use the GS2 approach for multivariate measurand models of the form  for an input x and output y. If  and  where ξ and η are random variables then model must also satisfy . This means that if the model undergoes a Monte Carlo simulation then η can also be post-processed in order to determine its probability distribution as per the GS2 documentation if η can be recovered from the equation .

The utilization of the GS2 to determine A0 and λ where in our case  is slightly more complicated since the pressure balance model for the TS f (S, q1, … , qm) =0 is not in the standard form where there is an explicit system of equations for the parameters A0 and λ. If we however consider the entire set of cross-float measurements as one system we may then construct a new vector equation h (y, x) =0 by employing a unweighted linear least squares which is discussed in more technical detail by White and Saunders [25].

For our purposes we will utilize the well known results for a general linear unweighted least squares problem where the function is constructed as  in terms of an independent variable x, and parameters a1, … , aN where Xi (x) are specified basis functions as discussed by Press et al. [26]. Following this approach a merit function is constructed as  and minimized by calculating the parameter values to satisfy ∂χ2/∂ ai = 0, i = 1, … , N. Implementing this approach to our particular problem where nP is the number of generated pressures and associated cross-floats with S = A0 (1 + λP) then yields (11) so that the simultaneous system of equations ∂χ2/∂ A0 = 0 and ∂χ2/∂ λ = 0 may then be used to implement the GS2 methodology where (12a) (12b) (12c)where in our case  and  as per our earlier mathematical modelling approach. In practical terms the approach used would be to sample random variables  in order to solve the LS equation f (p, q1, … , qm) =0 for the generated pressure, and to then use this as input to solve the TS equation f (S, q1, … , qm) =0 for the cross-floated area S. As a result there will be a set of cross-floated areas  for the M Monte Carlo simulation events where each simulation event will have a known applied pressure. Since for each simulation event the solved for pressures and cross-float areas are statistically valid it follows that the model would also need to be satisfied for these values.

As a result the GS2 Monte Carlo simulation of unweighted least squares of statistically sampled values in accordance with the underlying probablity distributions then becomes equivalent to a conventional multivariate regression practice as discussed by Press et al. [26] since the sampled values are formally a statistically valid possibility based on the underlying probability distribution. This system may then be used to determine a Hyper-ellipsoidal coverage region which in our particular problem will correspond to a two dimensional region for the bivariate joint PDF.

In order to have a better conceptual mathematical understanding of the statistical definition of a quantile we first make the geometric observation that in the case of a univariate PDF g (η) that the corresponding univariate quantile Q1D (ρ) function is technically a one dimensional mapping that transforms a random variable ρ sampled from a rectangular distribution into a corresponding random variable ξ such that the sampled value is η = g (ξ). We may then using this conceptual tool in this paper then use the approach of Gilchrist [27] to generalize the definition of a bivariate quantile Q2D ([ρ1, ρ2] T) as a corresponding two dimensional mapping In particular if when implementing the previously discussed geometrical mapping mathematical definition for bivariate quantiles by taking h (p, r) =1 where p and r are random variables such that 0 ≤ p ≤ 1 and 0 ≤ r ≤ 1 a natural consequence which results is that the bivariate quantile may be considered to be the mapping from the unit rectangle to the corresponding surface of the joint PDF f (x, y) formally defined following the approach by Gilchrist as (13a) (13b)In the above formula J (p, r|x, y) =1/[J (x, y|p, r)] is the corresponding Jacobian of the transformation from the unit rectangle h (p, r) =1 to the joint PDF surface f (x, y). The random variables x and y as outlined above may be assumed to be equivalent to  and y ≡ ξλ for brevity later in the paper where they correspond to random variables associated with the zero-pressure area A0 and distortion coefficient λ respectively for our particular pressure balance cross-floated effective area model. If an appropriate two dimensional mapping is correctly chosen then the variables p and r may be independently sampled from a rectanguar distribution R (0, 1) and then the choice of mapping will allow for random variables to be sampled from the corresponding bivariate joint PDF. This purely mathematical transformation result is consistent with the earlier statistical based observation of the need for a choice of copula that may be used with the marginal distributions in order to construct the joint PDF.

As a result of this observation we note that a bivariate quantile is not necessarily unique as there are in principle an arbitary number of mappings between the two surfaces however if we fix the type of mapping which is equivalent to a particular choice of copula then we can construct a unique associated bivariate quantile distribution. Our approach in this paper is to use the multivariate GS1 methodology to generate the actual bivariate joint distribution of the pressure balance TS  where  and  are associated random variables sampled from a PDF corresponding to X. The actual joint PDF  surface constructed out of the GS1 Monte Carlo simulation data is then numerically approximated with a bivariate PDF  using a selection of two dimensional transformation mappings. The accuracy of the bivariate quantile distribution may then be investigated by determining how the corresponding effective area uncertainty calculated in terms of  compares to the actual effective area uncertainties obtained through the actual GS1 Monte Carlo simulation cross-floating data.

The previous definition of bivariate quantiles by Gilchrist was developed in more formal mathematical statistical details in an earlier work by Chaudhuri [28] and may be formally extended to systems other than bivariate distributions with potential application to metrology problems involving higher dimensional non-Gaussian multivariate distributions. One particular model discussed by Gilchrist is the generalized circular model of form (14a) (14b) where 0 ≤ p, r ≤ 1 as previously discussed and β is an appropriate constant parameter that must be determined. The above system of equations is a particular form for the general bivariate quantile transformation where each random variable has a corresponding equation. Another strategy that may be used to construct bivariate quantiles with more generality than the generalized circular model is that of the marginal/conditional form where (15a) (15b)In this formulation the first function x = Qx (p) is a univariate quantile function, whilst in the second function y = Qy (p, r) if x is fixed then p is in turn fixed so that the second function is actually also a univariate quantile function for y for a given choice of x. The practical consequence of choosing to model a bivariate quantile in terms of a marginal/conditional formulation is that the previously developed ELD formulation can in principle also be utilized to construct the associated level curves, however a potential drawback of a conventional marginal distribution approximation of a bivariate joint PDF is that in general only a fixed number of contour curves can be analytically constructed and hence interpolation is required in order to calculate random variables for arbitrary choices of specified values of p and r. Whilst marginal distributions for the zero-pressure area A0 and distortion coefficient λ can be constructed relatively easily using for example extended lambda based univariate quantile distributions a newer approach of fitting the actual bivariate joint PDFs with copulas is now available. In the copula approach one uses known specified marginal distributions which are relatively easy to calculate, and then optimizes parameters  associated with the particular choice of copula family  in order more closely match the copula constructed joint PDF with the actual known joint PDF. As a result fitting copulas to joint PDFs can potentially offer considerable algebraic simplifications where for bivariate distributions the main choices that may be investigated are usually elliptical copulas and generalized Archimedean copula families.

Whilst these particular copula based approaches have their own respective merits we comment that a generalized multivariate quantile function approach originally developed by Chaudhuri [28] may also be implemented for bivariate distributions. The starting point is to generate multivariate data points  in d which are assumed to be known through for example Monte Carlo simulations. For our purposes this Monte Carlo data will be generated with the GS2 approach as discussed earlier with  which will formally provide a sequence of data points  where M is the number of Monte Carlo simulation events and and where we make the observation that we do not assume that μ ≈ y and ∑ ≈ Uy but simply assume that  follows some underlying probability distribution which may not necessarily be Gaussian.

By defining the set as  i.e. B (d) is an open unit ball, and by defining a functional as  where where 〈⋅ , ⋅ 〉 represents the Euclidean inner product, a multivariate quantile function may then be formally mathematically defined as (16) The above is a multivariate generalization for 2 ≤ d ∈  of the univariate case corresponding to d = 1 where the univariate quantile maps values of α for 0 ≤ α ≤ 1 with an associated parameter u = (2α − 1) to the one dimensional interval (−1, 1).

According to the above multivariate definition extreme quantiles would correspond to ∥u ∥ ≈ 1 and central quantiles would correspond to ∥u ∥ ≈ 0 respectively. An iteration algorithm in order to construct the quantile function was originally provided by Chaudhuri where for  the algorithm Step #1 is to compute for all of the corresponding data points  for 1 ≤ i ≤ n and check if the degeneracy condition (17) is valid for some 1 ≤ i ≤ n. If the degeneracy condition is satisfied for some i then just set . Alternately if the degeneracy condition cannot be satisfied for any 1 ≤ i ≤ n then move to Step #2 by constructing  by solving the equation (18)The above equation must be solved by using for example the method of successive approximations of which one possible approach is to use a starting solution composed of a vector of means such that (19)where each mean is simply the mean of the respective set of components. Once this starting value has been constructed successive approximations for m = 2, 3, … may then be generated using iterations so that (20)where (21) (22)and is the d × d identity matrix. Further aspects of how this definition of quantile function may be used to quantify and describe skewness and kurtosis in higher dimensional multivariate distributions is discussed in more detail in the original paper by Chaudhuri [28].

This geometric generalization of a quantile function in higher dimensional spaces may thus have some further technical potential for analysing metrology systems than cannot be modelled in terms of either univariate and bivariate probability distributions as a future topic of research study by metrologists particularly when non-Gaussian multivariate distributions may occur. In the context of pressure metrology such potential future research problems might include higher dimensional multivariate joint PDF's such as for pressure balances in free deformation mode or controlled clearance modes. Examples of higher dimensional multivariate probability distributions for pressure balances in free deformation mode operation include situations when non-linear elasticity theory is utilized and two distortion coefficient parameters are then necessary so that the effective area is modelled as A = A0 (1 + λ1p + λ2p2) and a joint PDF  must be constructed. On the other hand for pressure balances operated in controlled clearance mode such as when the Heydemann–Welch method is implemented for a primary standard scale realization for hydraulic pressures and the pressure balance effective area is modelled as A = A0 (1 + α (t − tref) (1 + λP) [1 + dj (pj0 − pj)] as discussed by Kajikawa et al. [29,30] additional parameters such as the jacket pressure coefficient dj and zero-clearance jacket pressure pj0 are then present in which case a higher dimensional joint probability distribution  must be approximated to fully characterize a pressure balance.

3 Numerical simulations

In this paper we perform an analysis by utilizing the experimental data set previously reported by Ramnath [19] as indicated in Appendix A which provides full technical details and supporting data where the working fluid is assumed to be Di(2)-ethyl-hexylsebacate defined in terms of an oil equation of state reported by Kocas et al [31]. Numerical experiments were performed on a Toshiba laptop with an Intel Pentium B950 CPU operating at 2.10 GHz with 2 GB of RAM running on a Microsoft Windows 7 64-bit operating system using Gnu Octave 4.2.0. Simulations were undertaken with M = 10 000 Monte Carlo simulation events for each generated pressure and cross-floated area data point. The full GS1 simulation for ten generated pressures and cross-floats from 50 MPa to 500 MPa therefore consisted of 10 000 × 10 = 100 000 total simulation events and took approximately 115 minutes to solve. When the GS2 was implemented by post-processing the GS1 data by using the cross-float data to fit a curve through the data points [P1, … , P10] T and [S1, … , S10] T in order to construct the χ2-optimization for the best fit curve S = A0 (1 + λP) the equivalent number of M = 10 000 simulation events took less than a minute to solve when using the built-in polyfit routine of Gnu Octave. As a result Monte Carlo simulations for 10 × 103 ≤ M ≤ 25 × 103 simulation events may be is considered a feasible option with single-core laptops/workstations as these calculations would typically be able to performed in a single working day. On the other hand if higher counts of simulation events in the range 25 × 103 ≤ M ≤ 100 × 103 are required we comment that this would in most practical circumstances require access to either multi-core computers or alternatively high performance computing parallel computing computers or clusters to avoid simulation times of a few weeks.

Our approach in this paper as outlined earlier in the paper first solves the underlying equations for the generated pressure  and cross-floated area f (S, q1, … , qm) =0 using sampled random values q and q which are drawn from the respective input PDFs which produce Monte Carlo GS1 data for the generated pressure MCP = [P1, … , P10] and cross-floated area MCS = [S1, … , S10] which are both M × 10 matrices since each generated pressure is used to cross-float the TS. The Monte Carlo data of these matrices are built up in terms of column vectors of the form  and where each column contains the Monte Carlo data for that particular generated pressure or cross-floated area so that the matrix data is then stored in data files for convenience. The distribution functions  and may if necessary be calculated and summarized in terms of ELDs built up in terms of fours parameters a, b, c, and d such that η = Q (ρ) where (23a) (23b) and 0 ≤ ρ ≤ 1 is a random variable to generate the PDFs as originally discussed by Willink [16]. We comment that the generated pressures and cross-floated areas may if necessary also be calculated directly as  and  where is the quantile function for pressure Pk as summarized in Table 1 with similar expressions for the cross-floated areas as summarized in Table 2.

Once the expected values and are calculated this set of GS1 data which is consistent with the associated underlying PDFs may then be used to estimate the approximate nominal zero-pressure area and distortion coefficient μλ from the pressure versus area graph as illustrated in Figure 1a which is based on a GS1 simulation with M = 10 000 total simulation events as previously discussed. When this original GS1 data is then further processed in the GS2 simulation using the original GS1 data in the χ2 optimization to extract the values for A0 and λ it then results in the scatter plot as illustrated in Figure 1b which may then be further post-processed in order to construct the bivariate joint PDF.

An approximate visualization of the bivariate joint PDF using a kernal density estimate (KDE) approach along with visualizations of the associated marginal distributions for A0 and λ are illustrated in Figure 1c using a Gaussian kernal.

Our approach to construct the bivariate PDF uses the built-in histogram2d function from the Python numpy and scipy scientific computing libraries for convenience since we wish to fit the actual joint PDF f (x, y) where x and y are random variables. This choice of using histogram2d to explicitly build up a discrete approximation of the actual PDF with explicit bin counts that are appropriately weighted for a two dimensional normalization of the joint PDF is used over a KDE approach since the implementation of a KDE implicitly requires a choice of kernal function as discussed by Diwekar and David [32] where the PDF for univariate data  is constructed such that  where h is the bandwidth and K is the kernel function. The use of a kernal function also occurs in the case of multivariate data  with data points  where the multivariate PDF is constructed such that  and as a result a similar issue will arise if we opt to use a KDE formulation to construct the bivariate PDF since the choice of kernal used for the KDE construction will introduce a level of subjectivity in the bivariate joint PDF. Typically when implementing a KDE to construct a univariate PDF a Gaussian kernal function is usually used however other particular choices of kernals such as an Epanechnikov and biweight kernal can also be used. As a result different PDF estimates may occur when the underlying Monte Carlo data is post-processed using a KDE approach.

Since we do not have any a priori knowledge of the most appropriate underlying kernal our approach is to simply construct the actual discrete approximation of the joint PDF using the Python histogram2d function which correctly normalizes the two dimensional PDF such that  where in our particular problem the random variables are chosen such that  and y ≡ ξλ/[Pa−1]. This is performed by using the well known Freedman–Diaconis statistical rule to estimate the bandwidth as (24) where IQR (x) = Q3 (x) − Q1 (x) is the interquartile range for the the random variable x where Q3 (x) and Q1 (x) are the third and first quartiles. The number of bins hx associated with x using this choice of bandwidth is then calculated using the ceiling function as (25)with similar associated expressions for the other random variable y. The final actual bivariate joint PDF from the GS2 simulation is illustrated in Figure 2 in natural physical units. Referring to the two dimensional discrete histogram we observe that the limits for a joint PDF  are approximately , , min(ξλ) =0.30 ppm/MPa and max(ξλ) =1.10 ppm/M. In this joint PDF the random variable for area is in units of m2 so that when the pressure is in units of Pa the random variable for the distortion coefficient ξλ is in units of Pa−1 for dimensional consistency in order to satisfy the normalization condition .

For brevity let the random variable be represented by x and the random variable ξλ be represented by y so that the joint PDF  is just f (x, y) for convenience. If we construct variables such that (26a) (26b) (26c) the original random variables are mapped to equivalent scaled variables such that 0 ≤ p ≤ 1 and 0 ≤ r ≤ 1. As a result the original joint PDF  may now be defined in terms of transformed random variables p and r.

The normalization of variables is utilized due to scaling effects since it will be generally be more convenient to fit the normalized joint PDF such that φ (p, r) = [f (x, y)]/[max(f (x, y)]. As a result by fitting the normalized parameters 0 ≤ φ (p, r) ≤1 with 0 ≤ p ≤ 1, 0 ≤ r ≤ 1 and fmax = max {f (x, y)} the scaled values can then be recovered such that (27a) (27b) (27c) This fit may be implemented using the marginal distributions form of the generalized bivariate quantile distributions for the zero-pressure area designated as  in Figure 3a and for the distortion coefficient designated as  in Figure 3b respectively. We comment that due to the shape of the PDFs for these marginal distributions as observed from the histogram for A0 in Figure 3c and the histogram for λ in Figure 3d respectively which are constructed with a Freedman–Diaconis choice of bin size that it would generally be convenient and reasonably accurate to approximate these PDFs in terms of either univariate ELDs as summarized in Table 3 or alternatively in terms of univariate splines where the degree of interpolation can be adjusted to refine the desired accuracy level.

The approach of sampling points x and y from a marginal distribution approximation of a joint PDF f (x, y) is to first generate a point p using a rectangular random number generator distribution which can then be used to calculate x from the corresponding marginal distribution Qx (p). Next by using this known value of x we solve for y from Qy (p, r) using the previously determined value of p however a practical implementation scheme is needed to construct the marginal distribution for Qy (p, r). In this paper for simplicity we opt to perform this implementation by constructing a set of functions for a set of contour curves for specified known values of x so that any arbitrary values can be estimated using for example bilinear or bicubic interpolation for simplicity.

A conventional classical marginal distribution approximation of the joint PDF may therefore be specified by the following three steps where a convenient functional form would in most practical cases be that of a spline fit such as for example a B-spline. The general algorithm would then implement the following general steps such that:

Step 1. Fit the marginal distribution Qp (p) for the scaled variate p with any convenient practical functional form such that  where  is a marginal distribution function and  is a corresponding fitted parameter

Step 2. Construct a set X = {p1, … , pm} for some finite integer m ∈  where 0 = p1 < p2 < ⋯ < pm−1 < pm = 1

Step 3. For each pi ∈ X extract the corresponding contour curve Ci = φ (pi, r) from the scaled joint PDF φ (p, r) and fit any convenient practical functional form for each curve Ci such that  where βi is a corresponding fitted parameter

As a result the quantification and specification of the parameter α for the marginal distribution  and the set of parameters  for the contour curves for the marginal distribution  would actually be sufficient to fully characterize the TS pressure balance's behaviour for test calibrations and metrology inter-comparisons. The challenge with this traditional marginal distribution approximation of a bivariate distribution approach is that whilst we could easily explicitly directly generate values for the x random variable corresponding to A0 we would have to implicitly indirectly generate values for the y random variable corresponding to λ by choosing a fixed number of contour curves and then employing an interpolation scheme.

In order to clarify how to sample random variables and ξλ from the marginal distribution approximation of the joint PDF  suppose for illustration purposes that we wish to generate a random variable of the zero-pressure area as xi for pi = 0.7 and a random variable for the distortion coefficient λ as yi for ri = 0.82. We can immediately calculate x as xi = Qx (0.7) but it will be more difficult to calculate yi since in general we would have a set of contour curves for fixed values of p, say p ∈ [0.0, 0.25, 0.5, 0.75, 1.0]. Selecting the nearest p that is smaller than pi i.e. p = 0.5 and the nearest pu that is larger than pi i.e. pu = 0.75 allows us to calculate the corresponding y for p as y = Qy (p, ri), and that for pu as yu = Qy (pu, ri). Then these data points can be used to interpolate for the required yi corresponding to ri = 0.82 using linear interpolation so that yi = y + [(yu − y) (pi − p)]/[pu − p].

This particular illustrative example for how to perform a two dimensional interpolation to calculate samples of random variables from the underlying joint PDF is mathematically equivalent to a bilinear interpolation, although in principle any particular interpolation scheme such as for example bicubic interpolation may also be used to obtain smoother interpolated fit values. The marginal distribution approach is illustrated by comparing the actual bivariate surface in Figure 4a with the surface approximation as shown in Figure 4b which demonstrates how a set of contour curves can be used to approximate the two dimensional joint PDF surface. In this figure there are a total of Nx = 60 curves for evenly spaced values of the random variable  from 1.9612 mm2 to 1.9617 mm2 and the  varies from 0.3145 ppm/MPa to 1.1581 ppm/MPa in evenly spaced values using a total of Ny = 75 points. Whilst the marginal distribution approach for constructing a bivariate PDF is theoretically possible it does not unfortunately offer a convenient and practical means of modelling and summarizing a bivariate PDF for reporting purposes such as in a pressure balance calibration certificate or inter-comparison report.

An alternative to the marginal distribution approach for arbitrary specifications of pi and ri without any need for additional interpolations in order to immediately directly sample random variables from the joint PDF is to formulate the surface fitting of the joint PDF as a scattered data interpolation problem. This problem which is common in the field of statistics is formally specified as that given a set of discrete sampled data points  for j = 1, 2, … , N with  and yj ∈  where 0 < s ∈  is the dimension of the data, then find a continuous function  such that . A particular well-posed solution to the scattered data interpolation problem is to use the distance matrix formulation such that (28) where ∥ ⋅ ∥ 2 is the ℓ2-norm defined as  for  in the two dimensional case when s = 2. The coefficients ck for k = 1, 2, … , N where N is the total number of known discrete points  is obtained by solving the linear system specified as (29a) (29b) (29c) (29d)We comment that the scattered data interpolation solution using the distance matrix formulation can be refined for an arbitrarily large number of supplied known data points. A useful benefit of this approach is that it is not strictly necessary to first normalize and scale the random variables to fit  and yj = φ (pj, rj) for j = 1, … , N since the scattered data interpolation scheme can be applied directly with  and . Nevertheless although a normalization is not theoretically necessary in our simulations we opt to apply the fit to the normalized variables 0 ≤ p ≤ 1, 0 ≤ r ≤ 1 and 0 ≤ φ (p, r) ≤1 to avoid ill-conditioning issues when solving the matrix equation . Results for a scattered data interpolation scheme using the same underlying data is illustrated by comparing the actual bivariate surface as previously shown in Figure 4a with the scattered data surface approximation as shown in Figure 4c.

Whilst this construction of the conventional scattered data two dimensional joint PDF is better than that of a marginal distribution approach which is restricted by the fixed number of discrete contour curves chosen, the main issue is that the number of constants used to construct the surface is still very large since in this case there are N = 3534 constants c1, … , cN for the underlying discrete data points . As a result the main issue with a scattered data interpolation scheme apart from the large number of constants to approximate the surface in our particular physical pressure balance metrological system, is that although it is guaranteed to adequately model the underlying data that the associated system of equations is usually ill-conditioned.

Alternative meshless techniques that avoid ill-conditioning encompass the radial basis function (RBF) approach as discussed by Larsson and Fornberg [33] which may be considered a generalization of the scattered data interpolation scheme . In this approach the constants are calculated by solving the matrix equation  where Aij = ϕ (ϵ, r) where ϕ = r2log(r) in the case of a thin plate spline or where ϕ = exp(− (ϵr) 2) in the case of a Gaussian RBF where  and  as previously discussed, however the same issue of a large number of constants is still present.

This problem again occurs when low order spline approximations of bivariate functions are considered where the bivariate function is traditionally approximated as a tensor product spline surface of the form (30) where K is an integer corresponding to the order of the approximation, σk are constants, and uk (x) and vk (y) are univariate functions each with their own individual corresponding parameters particular to the characteristics of the respective basis functions as recently developed by Georgieva and Hofreither [34]. The main issue with such a traditional low rank approximation in our particular problem is that although it can very accurately model and generate a reconstructed surface for suitable choices of σk, uk (x) , vk (y) and K, is that in general a very large number of coefficients are generally still necessary in order to adequately reconstruct the basis functions uk (x) and vk (y). Whilst a large number of coefficients is not an issue in a computer implementation the reporting of a large number ranging from a few hundred to a few thousand coefficient values is nevertheless however not considered practical for a calibration paper certificate or written inter-comparison report.

Due to these practical implementation issues we therefore consider a copula approach Cθ (u, v) as previously discussed in order to construct the bivariate distribution for our particular physical metrological measurand system since for many practical copula families only a small number of parameters are required in the term . As a result due to generally small number of parameters to adequately characterize bivariate PDF distributions this renders the reporting of the physically characterized pressure balance joint PDF in calibration certificates and inter-comparison reports a practical and feasible alternative for oil based pressure balances.

In this paper for brevity we restrict our numerical investigation to the normal, Student-t, Gumbel, Frank and Clayton bivariate copulas Cθ (u, v) as summarized in Table 4 where the respective copula parameter θ is obtained from the Kendall tau parameter τK as previously discussed. When constructing the copula we use x as the random variable for the zero-pressure area A0 and y as the random variable for the distortion coefficient λ in order to construct the joint PDF f (x, y) such that (31) (32) (33) In the above formulae gx (ξx) is the marginal PDF for x as previously approximated with an ELD as illustrated in Figure 3a, whilst gy (ξy) is the marginal PDF for y as illustrated in Figure 3b both of which are fully parametrized in terms of respective univariate quantile parameters a, b, c, d. By exploiting the properties of ELDs as previously discussed by Willink [16] the distribution function formally defined as  for the random variable x with a similar expression for the random variable y may in the special case of an ELD approximation be simplified as (34)As a result if the marginal distribution for the zero-presssure area A0 random variable x has ELD parameters ax, bx, cx, dx, and the marginal distribution for the distortion coefficient λ random variable y has ELD parameters ay, by, cy, dy it follows that we can immediately utilize the previous analytical expressions in the copula construction for the bivariate joint PDF in terms of the ELD parameters. In order to construct a copula function Cθ (u, v) for the formulae listed in for example Table 4 we must in the case of the Gaussian copula use Φρ (•) which denotes the bivariate standard normal distribution with correlation ρ whilst Φ−1 (•) denotes the inverse standard normal distribution. On the other hand for the bivariate Student-t copula tρ,ν denotes the bivariate t-distribution with parameters ρ and ν where the degrees of freedom ν must typically be calculated using the Akaike Information Criterion (AIC) where the smallest AIC value (which depends on ν) is considered to be the best-fit copula choice, and  is the corresponding inverse t-distribution value for the associated degrees of freedom ν parameter for the copula. Although the conventional t-distribution copula usually has one degree of freedom parameter ν more modern alternatives can incorporate multiple degrees of freedom parameters ν1, … , νn for bivariate models as discussed in Luo and Shevchenko [35].

As a result whilst it is certainly possible to estimate bivariate copula parameters  directly with custom written computer code through various parameter optimization approaches for a range of copula families of which Table 4 is only a small selection of the very extensive range of possible copula families this is no longer strictly essential. This is due to the fact that Yan [36] developed a R based package copula which is now readily available to researchers worldwide, and which has subsequently been expanded by Kojadinovic and Yan [37] for multivariate distribution modelling using copulas. As a result the use of R based open source statistical software from the Comprehensive R Archive Network is now commonly available to researchers worldwide and is accepted as a standard statistical tool within the statistics community. In this paper we will therefore utilize the copula [38], copBasic [39], and VineCopula [40]. R packages to simplify the analysis in order to construct the best choice of a copula function for our underlying pressure balance Monte Carlo bivariate data.

The use of the above-mentioned R packages allows us to simply load the bivariate data from our post-processed GS2 Monte Carlo simulations and to then implement the following computer code using the VineCopula library which automatically selects the most appropriate bivariate copula function drawing on a selection of copula families more extensive that the most common families previously outlined in Table 4. When performing the copula selection to avoid numerical ill-conditioning due to scaling effects since  and  in SI units it may depending on the available computer systems be advantageous when fitting the copula function to utilize the variate data for A0 and ξλ for λ such that they are first converted so that x is in mm and y is in ppm/MPa. We comment that the choice of units is not problematic since our main objective is to model the bivariate joint PDF and the form of the fit should simply allow us to adequately sample values for the A0 and λ variates x and y. Further technical implementation details for the VineCopula library are available in the official documentation however we briefly comment that the user would in most practical situations select from ml for maximum likelihood, mpl for maximum pseudo-likelihood, itau for inversion of Kendall's tau, or irho for inversion of Spearman's rho as choices for the parameter optimization choice for the software code to determine the optimal copula parameter fit.

The process to fit the bivariate data starts with the GS2 Monte Carlo data GUMSupplement2.txt that is obtained with any suitable computer code simulation environment that is saved in a neutral ASCII txt file format. Once the data is saved it can then be processed to firstly extract the univariate ELD quantile parameters in for example GNU Octave for the marginal distributions, and then secondly processed in RStudio using the VineCopula library to determine the optimal bivariate copula and associated copula parameters as shown in Figure 5.

Results for the univariate marginal distribution parameter fits with GNU Octave v4.2.0 by running computer program CalcMarginalParameters.m with the m-code shown in Figure 6 are shown below in Listing 1.

Results for the bivariate copula parameter fits with RStudio by running computer program CalcCopulaParameters.r are shown below in Listing 2.

The above computer code console outputs are actually sufficient to fully characterize the ELD marginal distribution and copula bivariate distribution parametrizations. In order to construct the joint PDF the second order mixed partial derivative of the copula C (u, v ; θ) must be calculated in order to calculate the copula density c (u, v ; θ). Further technical implementation details to directly numerically construct the joint PDF from a general copula that is not amenable to closed form analytical expressions of the copula density are outlined in Appendix B.

For the particular case of a Gaussian copula Meyer [41] provides explicit analytical closed form expressions such that for the univariate case the cumulative distribution function Φ−1 (•) is defined as (35) (36) The univariate standard normal CDF Φ (h) is then used to construct the bivariate standard normal CDF Φ2 (a, b) such that (37) (38)When the above expressions are combined we then obtain the final expression for a Gaussian copula formulation of the bivariate joint PDF as (39)In our particular case since the optimal copula is a Gaussian we simply specify the copula as (40a) (40b)The above copula function informations is then completed by also specifying the associated quantile function parameters for F (x) = u and G (y) = v which using our ELD approximation is of the form (41a) (41b)where (42a) (42b) (42c) (42d)while the quantile function parameters for G (y) are (43a) (43b) (43c) (43d)The above set of four equations for the copula function C (u, v ; θ), marginal distributions F (x) and G (y), and parameters associated with the respective marginal distributions {ax, bx, cx, dx} and {ay, by, cy, dy} are then actually sufficient to model the bivariate joint PDF for the TS pressure balance, by noting the ELD parameter limits are by definition (44a) (44b) (44c)Once the joint PDF has been summarized using f (x, y) = F (x) G (y) c (u, v) with u = F (x) and v = G (y) as indicated above, it can then be used in any further uncertainty analysis numerical simulations for the TS pressure balance. As an example we would in a practical physical pressure balance calibration like to sample random variables and ξλ from  in order to calculate the uncertainty in generated pressure. The general numerical procedure we propose in this paper adapted from the univariate sampling procedure for arbitrary distributions is as follows for bivariate distributions:

Step 1. Sample a random variable ρx ∼ R from the rectangular distribution so that 0 ≤ ρx ≤ 1

Step 2. Solve the equation F (x) = ρx in order to deduce a value for x and then simply set 

Step 3. Construct the contour curve  by fixing x = x in the joint PDF f (x, y) and normalize as appropriate

Step 4. Build the CDF such that  and normalize as appropriate

Step 5. Sample a random variable ρy ∼ R from the rectangular distribution so that 0 ≤ ρy ≤ 1

Step 6. Solve the equation H (y) = ρy in order to deduce a value for y and then simply set ξλ = y

If the sampled variates are x and y, and ρx and ρy are sampled rectangular random numbers then the above procedure can be summarized as solving the following system of equations such that (45a) (45b) We comment that the above bivariate sampling scheme is more general than a traditional bivariate Gaussian sampling scheme since it is not restricted to a conventional Gaussian bivariate PDF, and can sample from arbitrary bivariate joint PDFs since the approach will work for arbitrary marginal distributions and arbitrary bivariate copulas. As a result the benefit of using a copula to model a bivariate distribution over a traditional GS2 based Gaussian joint PDF is that one is no longer specifically restricted to bivariate Gaussian distributions as is presently the case according to the official GS2 guidelines. With the application of copulas as a particular modelling approach for bivariate quantile distributions one may therefore now in principle utilize any other alternative physical or mathematically plausible choice of probability distributions modelled in terms of copulas to summarize bivariate Monte Carlo uncertainty analysis simulation data as outlined in Appendix B.

We can now test the extent to which our bivariate joint PDF for the TS pressure balance can estimate the generated pressures. This is performed by implementing a GS1 simulation of a generating pressure model for the TS pressure balance as (46a) (46b) (46c) where  is sampled using our bivariate joint PDF modelling approach using the temperature compensation function ϕ (t, tref) =1 + α (t − tref).

When this Monte Carlo simulation for the TS pressure balance is performed using the data in Appendix A and the Monte Carlo generated pressure data is obtained, then this generated pressure data for each of the cross-floated pressure points 50 MPa, … , 500 MPa may be compared to the actual generated pressures from the LS pressure balance. Numerical results were performed using a direct simulation single language implementation in GNU Octave with the above generalized sampling scheme for x and y as shown in Table 5. These results were obtained on a computer workstation using an Intel Xeon E5-1650 v3 CPU running at 3.50 GHz with 32 GB of RAM for M = 500 Monte Carlo simulation events by directly sampling from the bivariate joint PDF by solving the respective equations for x and y.

Alternatively in the special case where the analytical expression for the copula density c (u, v) is explicitly known beforehand then a mixed language approach using R and Octave can offer considerable computational time savings. This approach involves first using R to sample from the Gaussian copula, or any other suitable copula family available from the RStudio available copula library that adequately approximates the bivariate data Kendall tau value, with the specified parameter value θ and then saving the marginal distribution values u and v to a text file as indicated below in Listing 3.

The R computer code shown in Listing 3 writes the u and v values that are consistent with the specified copula density function c (u, v) to a text file CopulaDataUV.txt which is an array of dimension 500 × 2 where the first column is the u data and the second column is the v data, whilst the corresponding copula density c (u, v) is saved to the text file CopulaDataC.txt. Afterwards the corresponding variate data x and y can then be simply recovered by using one dimensional interpolations from the previously generated ELD based univariate marginal distributions u = F (x) and v = G (y). This mixed language process for n = 10 000 samples using RStudio and GNU Octave takes approximately 85.36 s to generate, post-process and then recover the variate and ξλ sampled data on the previously mentioned Toshiba laptop. As a result the numerical simulations to test the bivariate statistical sampling approach demonstrates that a copula based bivariate quantile joint distribution sampling scheme is computationally feasible for both single computer and well as mixed computer language implementations when used as inputs in subsequent Monte Carlo uncertainty simulations.

In order to verify and validate (V&V) our proposed bivariate joint PDF modelling approach in terms of the standard quality engineering V&V methodology we must compare and contrast these numerical results which utilize the bivariate statistical sampling of the pressure balance',s zero-pressure area and distortion coefficient values with the exact Monte Carlo numerical results previously summarized in Table 1. We can implement this comparison by determining the normalized errors En defined as (47) for all our generated/cross-floated data points between the LS which generates the known applied pressures and the TS which is cross-floated against the LS. According to this approach xref would correspond to the LS generated applied pressure  and Ulab would correspond to the expanded uncertainty of the generated pressure , whilst xlab would correspond to the TS equivalent generated pressure  using the copula based cross-floated area parameters where  are the corresponding TS expanded uncertainties for the respective applied pressure.

In order to calculate the normalized errors knowledge of the respective expanded uncertainties  and  are necessary noting that we have used ELDs to summarize the respective Monte Carlo data for both the LS and TS. We recall that the expected value μ and standard uncertainty σ for an ELD are specified as (48a) (48b) As a result knowledge of the ELD parameters immediately gives us the values of the expected values μk for the LS and TS but we do not have direct knowledge of the respective expanded uncertainties which must be calculated from the ELD based distribution function. In general for a specified confidence level p, say p = 0.95 corresponding to a 95% confidence level, for a possibly non-symmetric and/or skew PDF g (η) a value of α is first calculated by minimizing [G−1 (p + α) − G−1 (α)], however if symmetry with an absence of skewness is assumed as a simplification then .

Under these circumstances for a measurand y with a distribution function G (η) the confidence interval for the specified confidence level is just Ymin = G−1 (α) and Ymax = G−1 (p + α). Consequently we may then just approximate the expanded uncertainty as (49) where (50a) (50b) (50c)When the above V&V formulation is implemented it results in the data summarized in Figure 7 using M = 500 Monte Carlo simulation events by sampling from the bivariate joint PDF. Referring to the normalized errors En it is observed that −1 ≤ En ≤ 1 for all the applied pressures 50 MPa, … , 500 MPa.

As a result of these numerical simulations we therefore conclude that the proposed method of using bivariate quantiles with ELDs for the marginals with parameter based optimized copulas is mathematically and statistically consistent for pressure balance calibrations and inter-comparisons at a primary scientific metrology standards level.

Table 1

Summary of ELD parameters for generated applied pressures Pk/[Pa] for k ∈ [1, … , 10]. The index k indicates the pressures Pk from P1 = 50 MPa through to P10 = 500 MPa so that the extended lambda distribution parameter ak, bk, ck and dk can be used to summarize the probability density function distribution for Pk.

Table 2

Summary of ELD parameters for generated areas Sk/[m2] for k ∈ [1, … , 10]. The index k indicates the area Sk corresponding to the cross-floated pressures from P1 = 50 MPa through to P10 = 500 MPa so that the extended lambda distribution parameter ak, bk, ck and dk can be used to summarize the probability density function distribution for Sk.

thumbnail Fig. 1

Illustration of how the Monte Carlo data from a GUM Supplement uncertainty analysis is utilized in order to determine the joint probability density function for the transfer standard pressure balance. In (a) the univariate GUM Supplement 1 (GS1) Monte Carlo data of the cross-floated pressures Pk and cross-floated areas Sk is first pre-processed by averaging and plotted in order to get a rough qualitative estimate of the model parameters. In (b) the actual GS1 data points are then processed with the GUM Supplement 2 (GS2) using a suitable χ2 optimization to extract the actual model parameters and the discrete model parameter values plotted. In (c) the discrete model parameters are then post-processed using a kernal density estimation scheme in order visualize the mathematically continuously variable joint bivariate probability density function behaviour of the underlying model.

thumbnail Fig. 2

Illustration of Monte Carlo GUM Supplement 2 processed data showing a visualization of the transfer standard pressure balance bivariate joint PDF where the random variables are plotted in physical units such that 106A0/[m2] = A0/[mm2] and 1012λ/[Pa−1] = λ/[ppm/MPa] for easy visualization so that the joint PDF has the correct magnitude for a two dimensional normalization such that  when constructed from the two dimensional histogram data using  and Nλ = Ny = 58 bins respectively with M = 10 000 Monte Carlo simulation events. In (a) the actual discrete data of the bivariate joint probability density function is first plotted for a qualitative visualization. In (b) the discrete data is post-processed to produce the mathematically continuous joint probability density function in a three dimensional space where each space coordinate corresponds to a random variable. In (c) the joint probability density function is normalized in order to more clearly view the behaviour of the underlying model.

thumbnail Fig. 3

Monte Carlo GUM Supplement 2 marginal distributions. In (a) the probability density function distribution for the continuous random variable of the zero-pressure area is visualized. In (b) the probability density function distribution for the continuous random variable ξλ of the distortion coefficient is visualized. In (c) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the zero-pressure area is shown. In (d) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the distortion coefficient is shown

Table 3

Marginal distribution ELD parameters for zero-pressure area A0 and distortion coefficient λ variates. The underling probability density function distribution for the zero-pressure area random variable and the distortion coefficient random variable ξλ is modelled and approximated with an extended lambda distribution based on the discrete data obtained from the GUM Supplement 2 uncertainty analysis.

thumbnail Fig. 4

Illustration of comparison between the actual bivariate joint PDF surface with that of a conventional classical marginal distribution approach approximation showing the discrete set of contour curves visualized by three dimensional lines that are used to construct joint PDF surface, and that of a scattered data approximation of the corresponding normalized bivariate surface. In (a) the actual continuous probability density function distribution is shown which must be modelled and approximated through a visualization in a three dimensional space where the x coordinate corresponds to the random variable , the y coordinate corresponds to the random variable ξλ and the z coordinate corresponds to the joint PDF . In (b) the modelling and approximation is shown where the three dimensional surface is approximated by a series of three dimensional lines within the three dimensional space. In (c) the modelling and approximation is visualized for the equivalent surface that is built up from discrete three dimensional scattered points.

Table 4

Specification of common parameter based bivariate copula families (Adapted from Goda [17]). In our particular problem the variable u corresponds to the random variable and the variable v corresponds to the random variable ξλ so that the bivariate copula C (u, v) can be used to construct the joint probability density function as  for convenience.

thumbnail Fig. 5

Illustration of how a GUM Supplement 2 (GS2) Monte Carlo uncertainty analysis would be implemented in practice for constructing the marginal distribution and copula distribution parameters. In (a) an extract of the actual M = 10000 data points is shown from the GS2 which were determined with a suitable χ2 optimization. In (b) the computer code implementation of how to process the GS2 data to fit the marginal distributions in terms of an extended lambda distribution is shown. In (c) the computer code implementation of how to determine the copula parameters using open source easily accessible standard statistical software is shown.

thumbnail Fig. 6

Illustration of m-code for estimating an ELD parametrization for univariate Monte Carlo data. The computer program works by loading the Monte Carlo univariate data as an input ydata = [y1, … , yM] T where M = 10 000 is the number of Monte Carlo simulation events and it then uses this data to determine the extended lambda distribution parameters a, b, c and d so that the probability density function can be analytically specified.

thumbnail Listing 1

Computer output used to calculate the univariate ELD parameters for A0 and .

thumbnail Listing 2

Computer output used to calculate the copula parameter with open source standard statistical software.

Table 5

Extended lambda distribution (ELD) parameters for transfer standard pressure balance generated applied pressures Pk/[Pa] using bivariate copula joint PDF. The index k corresponds to the pressures Pk for 50 MPa through to 500 MPa and the parameters ak, bk, ck and dk can then be used to construct analytical expressions for the probability density function distribution for each of the Pk pressures.

thumbnail Listing 3

Computer output used to calculate sampled values from the copula with open source statistical software.

thumbnail Fig. 7

Verification and validation of proposed method using normalized errors of applied pressures demonstrating that ∥En ∥ ≤1 ∀ n ∈ [1, … , 10] laboratory standard pressure balance generated pressures and transfer standard pressure balance cross-floated area measurements. The pressures P1 through to P10 correspond to pressures from 50 MPa through to 500 MPa at 50 MPa steps and the En values for each of these pressures indicates the corresponding normalized error. When calculating the En value for each pressure the known reference pressure Pref is determined from the laboratory standard pressure balance whilst the calibrated pressure Pcal is determined from the transfer standard pressure balance using the bivariate copula that models and summarizes the information of the calibrated pressure balance's zero-pressure area A0 and distortion coefficient λ. Due to the fact that all of the En values are smaller than unity this then means that the results and uncertainties that were calculated for the calibrated pressure balance when tested against the reference pressure balance are statistically consistent and hence provides proof that the proposed method of using copulas to model and summarize bivariate probability density functions has been verified and validated.

4 Discussion

In this paper we have investigated the feasibility and utility of extending and applying quantile functions to systems which naturally exhibit bivariate PDF's and considered the particular case of oil pressure balances where the area takes the form A = A0 (1 + λP) and knowledge of the bivariate joint PDF  in terms of the zero-pressure area and distortion coefficient ξλ random variables is necessary for elevated pressures. Our research approach involved the implementation of a GS2 multivariate uncertainty analysis of a TS pressure balance where we mathematically formulated an approach to post-process the original Monte Carlo data in order to recover the underlying and ξλ bivariate random variable statistical data for further analysis.

Numerical simulations were then considered and performed for a variety of mathematical modelling approaches in order to study how to adequately model, summarize and reconstruct the underlying bivariate statistical uncertainty analysis data. Based on these numerical simulations we sought to investigate the extent to which a combination of univariate ELD quantile functions for the marginal distributions u = F (x) and v = G (y) of  and y = ξλ respectively when coupled with a suitable bivariate copula family Cθ (u, v) selection with an optimized copula parameter θ would be sufficient for pressure calibrations and inter-comparisons.

Results of generated pressures for a variety of conditions were then considered and analysed in order to perform bench-mark studies using conventional metrology statistical tests. When these results were analysed to assess the level of necessary verification and validation measures for the method proposed in this paper it was concluded that a bivariate quantile distribution of a pressure balance PDF of the form  is indeed sufficient to accurately model and summarize a pressure balance's metrological characteristics at a primary standards scientific metrology level. As a result we conclude that the measurement modelling technique proposed compares very favourably to an exact GS2 UQ analysis and may thus offer benefits to pressure metrologists at national metrology institutes involved in high accuracy client calibration work and in participation in inter-comparisons at a primary scientific standards metrology level.

A potential future topic of metrology research in the area of uncertainty analysis for pressure balances is therefore the possible application of vine copulas for the modelling of multivariate uncertainty analysis data for summarizing GS2 Monte Carlo for higher dimensions. Alternatives to vine copulas are empirical copulas which are copulas note defined in terms of a parameter  but which are defined directly in terms of the underlying data. In the case of two dimensional variates [xi, yj] T the bivariate empirical copula when there are n variates is defined as (51) The above working definition is usually expressed mathematically as (52a) (52b) (52c) (52d)

Different types of implementations for an empirical copula are possible as discussed in the copBasic package [39] and include the Hazen, Weibull and Berstein empirical copulas. More recent recent research in the area of empirical copulas by Segers et al [42] using Berstein polynomials concluded that beta empirical copulas can fully meet the formal mathematical specifications for a copula and that these offer very good performance in terms of estimating bias and variance.

As a result these types of copulas are potentially attractive choices for metrologists for future UQ research in terms of modelling and summarizing higher dimensional PDFs where multivariate Gaussian PDFs as per the GS2 are problematic due to higher dimensional probability distribution asymmetries and skewness characteristics.

5 Implications and influences

The main implication of this paper is that we have demonstrated that quantile functions may be used to accurately and completely model the bivariate joint probability density distribution function for a pressure balance's effective area in terms of its zero-pressure area A0 and distortion coefficient λ. As a result the specification of nine numerical parameters, four for each of the marginal distributions plus one for the bivariate copula parameter, now enables pressure metrologists to simply summarize pressure balance bivariate PDF information in calibration certificates and inter-comparison reports, and to have an increased level of confidence in the behaviour and uncertainty of their oil pressure balance laboratory primary standards at elevated pressures which was previously limited due to the complexity of incorporating the uncertainty in the distortion coefficient.

Based on the results reported in this paper the wider influences which are now possible is an increased awareness of the utility of multivariate higher dimensional uncertainty analysis with the GS2, and how copulas may now be used to simply and conveniently summarize higher dimensional uncertainty analysis results in high accuracy calibrations and scientific metrology inter-comparisons in other metrology areas and fields of work.

Acknowledgments

This work was performed with funds provided by the Department of Higher Education and Training (DHET) on behalf of the South African government for research by public universities.

Appendices

Appendix A Experimental pressure balance data

We use the representative experimental data set previously reported by Ramnath [19] as indicated in Table A.1 for the laboratory standard (LS) and TS respective mass and density values, Table A.2 for the temperature and ambient conditions, Table A.3 for the utilized mass pieces of the LS, and Table A.4 for the utilized mass pieces of the TS during the cross-floating experiments where points 1 through to 10 correspond to nominal applied pressures such that Pi/[MPa] ≈50i for i ∈ [1,10].

Table A.1

Pressure balance laboratory standard and transfer standard mass and density data.

Table A.2

Pressure balance laboratory standard and transfer standard temperatures and ambient data.

Table A.3

Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for pressure generation).

Table A.4

Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for cross floating).

In this data set the working fluid is assumed to be Di(2)-ethyl-hexylsebacate i.e. DHS oil for which the fluid density is modelled as as discussed by Kocas et al. [31].The parameters and inputs for the pressure balance model that generates the known pressures are A0 = 1.96151 × 10−6 m2, u (A0) = ±9.89581 × 10−11 m2, λ = 7.25 × 10−13 Pa−1, u (λ) = ±4.5 × 10−14 Pa−1, α = 9.10 × 10−6 K−1, u (α) = ±0.45 × 10−6 K−1, Vs = 0 m3, u (Vs) = ± 10−10 m3, σ = 31.2 × 10−3 N m, u (σ) = ±0.001 × 10−3 N m−1, H = 173.1 × 10−3 m, u (H) = ±0.2 × 10−3 m, g = 9.7860994 m s−2, and u (g) = 10−7 m s−2 respectively where the gravitational acceleration is common to both the pressure balance that generates the pressure as well as the pressure balance which undergoes the cross-floating. For the particular data we assume that the ambient conditions are measured such that u (ta) = ±0.5 Cu (pa) = ±15 Pa and u (ha) = ±0.05 respectively, and that the temperature of the respective pressure balances can be measured such that t = ±0.015 ° C.

The associated parameters and inputs for the pressure balance model that undergoes the cross-floating are Vs = 0 m3, u (Vs) = ± 10−10 m3, H = 0 m, u (H) = ±0.2 × 10−3 m and α = 1.45 × 10−5 K−1 respectively where the surface tension for the oil is the same as for the previously specified pressure balance since the working fluid is common to both the LS and TS pressure balances.

Appendix B Parameter fits of specific copulas

To fit specific types of copulas consider for example explicitly using a Clayton distribution. The corresponding R computer code to fit a Clayton copula and extract the associated parameter values would be:

The parameter value θ associated with a user's particular choice of copula family may then be obtained by simply printing out the value as

When the respective fits are completed for a particular choice of copula family we could then compare the Kendall's tau values for a particular fit with that of the optimal copula fit in order to determine how closely the chosen fit matches the actual data correlation by using the following code:

In our particular case using Kendall's tau as an indication of the variate correlation from the above VineCopula library computer results we observe that the Clayton copula approximation diverges from the optimal Gaussian copula and therefore it would not be beneficial to use a Clayton's copula to model the pressure balance joint distribution. If simulations are performed on a 64-bit computer using double floats then the practical numerical precision is usually 16 digits unless variable precision arithmetic (VPA) libraries are used as previously discussed.

In the event that the copula function determined either from an optimal fit using the VineCopula optimal fit routine or a custom copula fit based on physical or mathematical requirements is too complicated to algebraically differentiate then a numerical differentiation of a function u (x, y) may be performed so that The copula density may then be explicitly calculated directly from the analytical copula function C(u, v ; θ) by using the above numerical approximation so that As a result as long as an analytical algebraic expression for the copula function C (u, v ; θ) is specified along with the respective value for the parameter θ in for example a calibration certificate or regional/international inter-comparison report, then the joint PDF distribution information for the measurement instrument is completely specified through the reported copula function and appropriate parametrizations for the associated marginal distributions.

Extensions to non-parametric copulas in terms of empirical copulas are possible using the R package copBasic [39]. More recent research in the area of empirical copulas tends to favour an empirical beta copula as discussed by Segers et al [42] due to certain statistical technical reasons, and the use of empirical beta copulas may therefore in principle be considered for utilization by metrologists for modelling higher dimensional joint probability density distributions where multivariate Gaussian probability density functions are problematic.

References

  1. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OILM, Guide to the expression of uncertainty in measurement. Technical report, Bureau International de Poids et Mesures (BIPM), 2008, www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf (In the text)
  2. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OILM, Evaluation of Measurement data − Supplement 1 to the “Guide to the expression of uncertainty in measurement” − propagation of distributions using a Monte Carlo method. Technical report, Bureau International de Poids et Mesures (BIPM), 2008, http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf (In the text)
  3. P.M. Harris, C.E. Matthews, M.G. Cox, A.B. Forbes, Summarizing the output of a Monte Carlo Method for uncertainty evaluation, Metrologia 51, 243–252 (2014) [CrossRef] (In the text)
  4. V. Ramnath, Application of quantile functions for the analysis and comparison of gas pressure balance uncertainties, Int. J. Metrol. Qual. Eng. 8, 1–18 (2017) [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
  5. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OILM, Evaluation of measurement data − supplement 2 to the “Guide to the expression of uncertainty in measurement” − extension to any number of output quantities. Technical report, Bureau International de Poids et Mesures (BIPM), 2011, http://www.bipm.org/utils/common/documents/jcgm/JCGM_102_2011_E.pdf (In the text)
  6. R.S. Dadson, S.L. Lewis, G.N. Peggs, The Pressure Balance: Theory and Practice ( HMSO, London, 1982) (In the text)
  7. A. Picard, R.S. Davis, M. Glaser, K. Fujii, Revised formula for the density of moist air (CIPM-2007), Metrologia 45, 149–159 (2008) [CrossRef] (In the text)
  8. M. Dehghan, M. Hajarian, Some iterative free quadratic and cubic convergence iterative formulas for solving nonlinear equations, Comput. Appl. Math. 29, 19–30 (2010) [CrossRef] (In the text)
  9. W. Bich, Interdependence between measurement uncertainty and metrological traceability, Accredit. Qual. Assur. 14, 581–586 (2009) [CrossRef] (In the text)
  10. M.G. Cox, P.M. Harris, Software support for metrology best practice guide no. 6–uncertainty evaluation. Technical report, National Physical Laboratory (United Kingdom), 2006. NPL Report DEM-ES-011 (ISSN 1744-0475), http://publications.npl.co.uk/npl_web/pdf/dem_es11.pdf (In the text)
  11. R. Palencar, G. Wimmer, M. Halaj, Determination of the uncertainties and covariances in the calibration of the set of weights, Meas. Sci. Review 2, 9–20 (2002) (In the text)
  12. EURAMET, Calibration of Pressure Balances. Technical report, European Association of National Metrology Istitutes, 2011. Guide 3 Version 1.0, ISBN 978-3-942992-02-2 (In the text)
  13. W. Sabuga, G. Molinar, G. Buonanno, T. Esward, J.C. Legras, Finite element method used for calculation of the distortion coefficient and associated uncertainty of a PTB 1 GPa pressure balance − EUROMET project 463, Metrologia 43, 311–625 (2006) [CrossRef] (In the text)
  14. S. Dogra, S. Yadav, A.K. Bandyopadhyay, Computer simulation of a 1.0 GPa piston-cylinder assembly using finite element analysis (FEA), Measurement 43, 1345–1354 (2010) [CrossRef] (In the text)
  15. M.G. Cox, B.R.L. Siebert, The use of a Monte Carlo method for evaluating uncertainty and expanded uncertainty, Metrologia 43, S178–S188 (2006) [CrossRef] (In the text)
  16. R. Willink, Representing Monte Carlo output distributions for transferability in uncertainty analysis: modelling with quantile functions, Metrologia 46, 154–166 (2009) [CrossRef] (In the text)
  17. K. Goda, Statistical modeling of joint probability distribution using copula: application to peak and permanent displacement seismic demands, Struct Saf. 32, 112–123 (2010) [CrossRef] (In the text)
  18. Scipy v0.19.0 reference guide − scipy.interpolate.bspline, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BSpline.html#scipy.interpolate.BSpline (accessed: 2017 /02/04). (In the text)
  19. V. Ramnath, Determination of pressure balance distortion coefficient and zero-pressure effective area uncertainties, Intern. J. Metrology Quality Eng. 2, 101–119 (2011) [CrossRef] [EDP Sciences] (In the text)
  20. M. Krystek, M. Anton, A least-squares algorithm for fitting data points with mutually correlated coordinates to a straight line, Meas. Sci. Technol. 22, 035101 (2011) [CrossRef] (In the text)
  21. X.-S. Tang, D.-Q. Li, C.-B. Zhou, K.-K. Phoon, L.-M. Zhang, Impact of copulas for modelling bivariate distributions on system reliability, Struct. Saf. 44, 80–90 (2013) [CrossRef] (In the text)
  22. C. Genest, A.-C. Favre, Everything you always wanted to know about copula modeling but were afraid to ask, J. Hydrol. Eng. 12, 347–368 (2007) [CrossRef] (In the text)
  23. X. He, P. Ng, S. Portnoy, Bivariate quantile smoothing splines, J. R. Stat. Soc. Ser. B: Stat. Methodol. 60, 537–550 (1998) [CrossRef] (In the text)
  24. D.H. Bailey, J.M. Borwein, Hight-precision arithmetic in mathematical physics, Mathematics 3, 337–367 (2015) [CrossRef] (In the text)
  25. D.R. White, P. Saunders, The propogation of uncertainty with calibration equations, Meas. Sci. Technol. 18, 2157–2169 (2007) [CrossRef] (In the text)
  26. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, in Numerical Recipes: The Art of Scientific Computing, 3rd edn. (Cambridge University Press, 2007) (In the text)
  27. W.G. Gilchrist, in Statistical Modelling with Quantile Functions, edited by Chapman and Hall, 1st edn. (2000) ISBN 1-58488- 174–7 [CrossRef] (In the text)
  28. P. Chaudhuri, On a geometric notion of quantiles for multivariate data, J. Am. Stat. Assoc. 91, 862–872 (1996), http://www.jstor.org/stable/2291681 [CrossRef] (In the text)
  29. H. Kajikawa, T. Kobata, K. Ide, A. Ooiwa. Precise determination of the jacket pressure coefficient of controlled clearance pressure balances, in 3rd IMEKO TC16 International Conference on Pressure Measurement (Curran Associates, Inc., 2007), p. 125–134, ISBN 978-1-61567-635-4 (In the text)
  30. H. Kajikawa, K. Ide, T. Kobata, Methods of precisely estimating the jacket pressure coefficient of controlled clearance piston cylinders at pressures up to 1 GPa, Metrologia 48, 352–358 (2011) [CrossRef] (In the text)
  31. I. Kocas, W. Sabuga, M. Bergoglio, A. Eltaweel, C. Korasie, P. Farar, J. Setina, B. Waller, Y. Durgut, Final report on key comparison euramet.m.p-k13 in the range 50 mpa to 500 mpa of hydraulic gauge pressure, Metrologia 52, 07008 (2015) [CrossRef] (In the text)
  32. U. Diwekar, A. David, in BONUS Algorithm for Large Scale Stochastic Nonlinear Programming Problems, edited by U. Diwekar, A. David (Springer, 2015), pp. 27–34, ISBN 978-1-4939-2281-9, http://link.springer.com/content/pdf/10.1007%2F978-1-4939-2282-6_3.pdf (In the text)
  33. E. Larsson, B. Fornberg, A numerical study of some radial basis function based solution methods for elliptic PDEs, Comput. Math. Appl. 46, 891–902 (2003) [CrossRef] (In the text)
  34. I. Georgieva, C. Hofreither, An algorithm for low-rank approximation of bivariate functions using splines, J. Comput. Appl. Math. (2017) 310, 80–91, DOI:10.1016/j.cam.2016.03.023 [CrossRef] (In the text)
  35. X. Luo, P.V. Shevchenko, The t copula with multiple parameters of degrees of freedom: bivariate charactersistics and application to risk management, Quant. Finance 10, 1039–1054 (2010) [CrossRef] [MathSciNet] (In the text)
  36. J. Yan, Enjoy the joy of copulas: with a package copula, J. Stat. Softw. 21, 1–21 (2007) [EDP Sciences] (In the text)
  37. I. Kojadinovic, J. Yan, Modelling multivariate distributions with continous margins using the copula R package, J. Stat.Softw. 34, 1–20 (2010) [CrossRef] (In the text)
  38. M. Hofert, I. Kojadinovic, M. Maechler, J. Yan, The Comprehensive R Archive Network − Multivariate Dependence with Copulas Version 0. 999-15, 2017, https://cran.r-project.org (accessed: 05/29/ 2017) (In the text)
  39. W. Asquith, The Comprehensive R Archive network − General Bivariate Copula Theory and Many Utility Functions Version 2.0.4, 2017, https://cran.r-project.org (accessed: 05/29/2017) (In the text)
  40. U. Schepsmeier, E. C. Brechmann, B. Graeler, T. Nagler, T. Erhardt, C. Almeida, A. Min, C. Czado, M. Hofmann, M. Killiches, H. Joe, T. Vatter, The Comprehensive R Archive Network − Statistical Inference of Vine Copulas Version 2.1.2, 2017, https://cran.r-project.org (accessed: 05/29/2017) (In the text)
  41. C. Meyer, The bivariate normal copula, Commun. Stat.: Theory Methods 42, 2402–2422 (2013) [CrossRef] (In the text)
  42. J. Segers, M. Sibuya, H. Tsukahara, The empirical beta copula, J. Mult ivar. Anal. 155, 35–51 (2017) [CrossRef] (In the text)

Cite this article as: Vishal Ramnath, Numerical analysis of the accuracy of bivariate quantile distributions utilizing copulas compared to the GUM supplement 2 for oil pressure balance uncertainties, Int. J. Metrol. Qual. Eng. 8, 29 (2017)

All Tables

Table 1

Summary of ELD parameters for generated applied pressures Pk/[Pa] for k ∈ [1, … , 10]. The index k indicates the pressures Pk from P1 = 50 MPa through to P10 = 500 MPa so that the extended lambda distribution parameter ak, bk, ck and dk can be used to summarize the probability density function distribution for Pk.

Table 2

Summary of ELD parameters for generated areas Sk/[m2] for k ∈ [1, … , 10]. The index k indicates the area Sk corresponding to the cross-floated pressures from P1 = 50 MPa through to P10 = 500 MPa so that the extended lambda distribution parameter ak, bk, ck and dk can be used to summarize the probability density function distribution for Sk.

Table 3

Marginal distribution ELD parameters for zero-pressure area A0 and distortion coefficient λ variates. The underling probability density function distribution for the zero-pressure area random variable and the distortion coefficient random variable ξλ is modelled and approximated with an extended lambda distribution based on the discrete data obtained from the GUM Supplement 2 uncertainty analysis.

Table 4

Specification of common parameter based bivariate copula families (Adapted from Goda [17]). In our particular problem the variable u corresponds to the random variable and the variable v corresponds to the random variable ξλ so that the bivariate copula C (u, v) can be used to construct the joint probability density function as  for convenience.

Table 5

Extended lambda distribution (ELD) parameters for transfer standard pressure balance generated applied pressures Pk/[Pa] using bivariate copula joint PDF. The index k corresponds to the pressures Pk for 50 MPa through to 500 MPa and the parameters ak, bk, ck and dk can then be used to construct analytical expressions for the probability density function distribution for each of the Pk pressures.

Table A.1

Pressure balance laboratory standard and transfer standard mass and density data.

Table A.2

Pressure balance laboratory standard and transfer standard temperatures and ambient data.

Table A.3

Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for pressure generation).

Table A.4

Boolean values to determine pressure balance laboratory standard and transfer standard operating conditions (weights for cross floating).

All Figures

thumbnail Fig. 1

Illustration of how the Monte Carlo data from a GUM Supplement uncertainty analysis is utilized in order to determine the joint probability density function for the transfer standard pressure balance. In (a) the univariate GUM Supplement 1 (GS1) Monte Carlo data of the cross-floated pressures Pk and cross-floated areas Sk is first pre-processed by averaging and plotted in order to get a rough qualitative estimate of the model parameters. In (b) the actual GS1 data points are then processed with the GUM Supplement 2 (GS2) using a suitable χ2 optimization to extract the actual model parameters and the discrete model parameter values plotted. In (c) the discrete model parameters are then post-processed using a kernal density estimation scheme in order visualize the mathematically continuously variable joint bivariate probability density function behaviour of the underlying model.

In the text
thumbnail Fig. 2

Illustration of Monte Carlo GUM Supplement 2 processed data showing a visualization of the transfer standard pressure balance bivariate joint PDF where the random variables are plotted in physical units such that 106A0/[m2] = A0/[mm2] and 1012λ/[Pa−1] = λ/[ppm/MPa] for easy visualization so that the joint PDF has the correct magnitude for a two dimensional normalization such that  when constructed from the two dimensional histogram data using  and Nλ = Ny = 58 bins respectively with M = 10 000 Monte Carlo simulation events. In (a) the actual discrete data of the bivariate joint probability density function is first plotted for a qualitative visualization. In (b) the discrete data is post-processed to produce the mathematically continuous joint probability density function in a three dimensional space where each space coordinate corresponds to a random variable. In (c) the joint probability density function is normalized in order to more clearly view the behaviour of the underlying model.

In the text
thumbnail Fig. 3

Monte Carlo GUM Supplement 2 marginal distributions. In (a) the probability density function distribution for the continuous random variable of the zero-pressure area is visualized. In (b) the probability density function distribution for the continuous random variable ξλ of the distortion coefficient is visualized. In (c) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the zero-pressure area is shown. In (d) the underlying discrete data from the GUM Supplement 2 analysis that is used to build the PDF for the distortion coefficient is shown

In the text
thumbnail Fig. 4

Illustration of comparison between the actual bivariate joint PDF surface with that of a conventional classical marginal distribution approach approximation showing the discrete set of contour curves visualized by three dimensional lines that are used to construct joint PDF surface, and that of a scattered data approximation of the corresponding normalized bivariate surface. In (a) the actual continuous probability density function distribution is shown which must be modelled and approximated through a visualization in a three dimensional space where the x coordinate corresponds to the random variable , the y coordinate corresponds to the random variable ξλ and the z coordinate corresponds to the joint PDF . In (b) the modelling and approximation is shown where the three dimensional surface is approximated by a series of three dimensional lines within the three dimensional space. In (c) the modelling and approximation is visualized for the equivalent surface that is built up from discrete three dimensional scattered points.

In the text
thumbnail Fig. 5

Illustration of how a GUM Supplement 2 (GS2) Monte Carlo uncertainty analysis would be implemented in practice for constructing the marginal distribution and copula distribution parameters. In (a) an extract of the actual M = 10000 data points is shown from the GS2 which were determined with a suitable χ2 optimization. In (b) the computer code implementation of how to process the GS2 data to fit the marginal distributions in terms of an extended lambda distribution is shown. In (c) the computer code implementation of how to determine the copula parameters using open source easily accessible standard statistical software is shown.

In the text
thumbnail Fig. 6

Illustration of m-code for estimating an ELD parametrization for univariate Monte Carlo data. The computer program works by loading the Monte Carlo univariate data as an input ydata = [y1, … , yM] T where M = 10 000 is the number of Monte Carlo simulation events and it then uses this data to determine the extended lambda distribution parameters a, b, c and d so that the probability density function can be analytically specified.

In the text
thumbnail Listing 1

Computer output used to calculate the univariate ELD parameters for A0 and .

In the text
thumbnail Listing 2

Computer output used to calculate the copula parameter with open source standard statistical software.

In the text
thumbnail Listing 3

Computer output used to calculate sampled values from the copula with open source statistical software.

In the text
thumbnail Fig. 7

Verification and validation of proposed method using normalized errors of applied pressures demonstrating that ∥En ∥ ≤1 ∀ n ∈ [1, … , 10] laboratory standard pressure balance generated pressures and transfer standard pressure balance cross-floated area measurements. The pressures P1 through to P10 correspond to pressures from 50 MPa through to 500 MPa at 50 MPa steps and the En values for each of these pressures indicates the corresponding normalized error. When calculating the En value for each pressure the known reference pressure Pref is determined from the laboratory standard pressure balance whilst the calibrated pressure Pcal is determined from the transfer standard pressure balance using the bivariate copula that models and summarizes the information of the calibrated pressure balance's zero-pressure area A0 and distortion coefficient λ. Due to the fact that all of the En values are smaller than unity this then means that the results and uncertainties that were calculated for the calibrated pressure balance when tested against the reference pressure balance are statistically consistent and hence provides proof that the proposed method of using copulas to model and summarize bivariate probability density functions has been verified and validated.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.