Statistical tools and approaches to validate analytical methods: methodology and practical examples

The analyst needs to know whether the result of a measurement can be accepted with confidence or, on the contrary, rejected because it is wrong. It is also essential to know whether an analytical method is suitable for its intended use. Likewise, it is important for the researcher to know whether a newly developed method can be trusted and which criteria must be met to ensure its validity. Statistical tools allow us to address all these points. The experimental protocol applied in this work is based on a common methodology, inspired by regulatory guidelines on statistical data analysis in analytical method validation, to optimize the number of assays and satisfy the study of validation criteria. To better illustrate the statistical analysis of raw data, practical examples are treated for quantifying an active ingredient in pharmaceutical drugs, a heavy metal in fishery products, and a drug in seizures.


Introduction
The principle of validation of quantitative analytical procedures is widespread today in all domains of activity where measurements are made. The objective of validation of an analytical method is to demonstrate that the method is suitable for the intended use, such as evaluation of a known product for potency and impurities. The intent of method validation is to provide scientific evidence that the analytical method is reliable and consistent before it can be used in routine analysis of a product. Analytical method validation is governed by the International Conference on Harmonization (ICH) [1,2].
The key criteria for evaluation of an analytical method are: specificity, accuracy, precision, detection limit, quantitation limit, sensitivity, working range and linearity, robustness, and recovery [3,4]. The ICH definitions for the validation characteristics are indexed in Table 1.
The type of method and the analytical technique used will determine the nature and extent of the validation studies required. The most common methods for validation are identification, assay and impurities determination [5][6][7][8] (Tab. 2). Results for each applicable validation characteristic are compared against the selected acceptance criteria.
Besides ICH, other guidance documents can be consulted for detailed information on the international requirements, such as the US Food and Drug Administration (FDA) guidances [9][10][11] and the United States Pharmacopoeia (USP) [12].
The aim of this work is to provide an overview of regulatory expectations related to statistical analysis and a survey of the common statistical techniques used to analyze analytical method validation data, with some examples.
The following two points are covered:
- Statistics used to analyze data for analytical method validation, such as the mean, standard deviation, confidence intervals, and linear regression.
- Regulatory guidelines regarding statistical data analysis in analytical method validation.

Statistical tools in analytical method validation
Statistical analysis of data obtained during a method validation should be performed to demonstrate the validity of the analytical method. Table 1 lists the ICH performance criteria for analytical method validation.

Validation characteristics ICH definitions
Specificity

Specificity is the ability to assess unequivocally the analyte in the presence of components which may be expected to be present. Typically these might include impurities, degradants, matrix, etc.

Accuracy
The accuracy of an analytical procedure expresses the closeness of agreement between the value that is accepted either as a conventional true value or an accepted reference value and the value found. This is sometimes termed trueness.

Precision
The precision of an analytical procedure expresses the closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions. Precision may be considered at three levels: repeatability, intermediate precision, and reproducibility.

Repeatability
Repeatability expresses the precision under the same operating conditions over a short interval of time. Repeatability is also termed intra-assay precision.

Intermediate precision

Intermediate precision expresses within-laboratory variations: different days, different analysts, different equipment, etc.

Reproducibility
Reproducibility expresses the precision between laboratories (collaborative studies usually applied to standardization of methodology).

Detection limit
The detection limit of an individual analytical procedure is the lowest amount of analyte in a sample that can be detected but not necessarily quantitated as an exact value.

Quantitation limit
The quantitation limit of an individual analytical procedure is the lowest amount of analyte in a sample that can be quantitatively determined with suitable precision and accuracy. The quantitation limit is a parameter of quantitative assays for low levels of compounds in sample matrices and is used particularly for the determination of impurities and/or degradation products.

Linearity
The linearity of an analytical procedure is its ability (within a given range) to obtain test results that are directly proportional to the concentration (amount) of analyte in the sample.

Range
The range of an analytical procedure is the interval between the upper and lower concentration (amounts) of analyte in the sample (including these concentrations) for which it has been demonstrated that the analytical procedure has a suitable level of precision, accuracy, and linearity.

Robustness
The robustness of an analytical procedure is a measure of its capacity to remain unaffected by small, but deliberate, variations in method parameters, and also provides an indication of its reliability during normal usage.

The primary parameters used for the interpretation of analytical method validation results are the mean (or average), standard deviation, relative standard deviation, confidence intervals, and regression analysis, and the main tools are the F-test, t-test, and regression and correlation analysis. These calculations are typically performed using statistical software packages such as SPSS, R, and Minitab. The goal of statistical analysis is to summarize a collection of data in a way that provides an understanding of the examined method characteristic. The statistical tools are described in the following.

Mean
The mean or average of a data set is the most basic and the most common statistic used. The mean is calculated by adding all data points and dividing the sum by the number of samples. It is typically denoted by X̄ (X bar) and is computed using the following formula:

X̄ = (Σ Xi) / n

where the Xi are individual values and n is the number of individual data points.

Standard deviation
The standard deviation of a data set is the measure of the spread of the values in the sample set, computed by measuring the difference between the mean and the individual values in the set. It is computed using the following formula:

s = √[ Σ(Xi − X̄)² / (n − 1) ]

where Xi is an individual value, X̄ is the sample mean, and n is the number of individual data points.

Relative standard deviation
The relative standard deviation is computed by dividing the standard deviation of the sample set by the sample mean and multiplying by 100%. The relative standard deviation is thus expressed as a percent. Generally, the acceptance criteria for accuracy, precision, and repeatability of data are expressed in %RSD:

%RSD = 100 × s / X̄
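These three statistics can be sketched in a few lines of Python; the replicate values below are hypothetical, purely for illustration:

```python
import statistics

# Hypothetical replicate assay results (% of label claim)
replicates = [99.8, 100.2, 99.5, 100.7, 99.9, 100.3]

mean = statistics.mean(replicates)   # sum of values divided by n
sd = statistics.stdev(replicates)    # sample standard deviation (n - 1 in the denominator)
rsd = 100 * sd / mean                # relative standard deviation, in percent

print(f"mean = {mean:.2f}, SD = {sd:.3f}, %RSD = {rsd:.2f}")
```

In a validation report, the %RSD would then be compared against the chosen acceptance criterion (e.g. not more than 2%).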

Note
When needed (e.g. for the F-test, F = s1²/s2²), the variance can, of course, be calculated by squaring the standard deviation:

s² = Σ(Xi − X̄)² / (n − 1)

Confidence interval
Confidence intervals are used to indicate the reliability of an estimate. Confidence intervals provide limits around the sample mean to predict the range of the true population mean. The prediction is usually based on a probability of 95%. The confidence interval depends on the sample standard deviation and the sample mean.

Confidence interval for the mean:

X̄ ± z·s/√n

where s is the sample standard deviation, X̄ is the sample mean, n is the number of individual data points, and z is a constant obtained from statistical tables for z.
The value of z depends on the chosen confidence level and is listed in statistical tables for z. For 95%, z is 1.96 [13]. For small samples, z can be replaced by the t-value obtained from Student's t-distribution tables [14]. The value of t corresponds to n − 1 degrees of freedom.
For example, for one sample data set, the calculated confidence interval indicates that the range of the true population mean is between 101.33 and 102.31.
In brief, confidence intervals can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single determinations in routine work, certain calibrations, determinations for which no previous data exist, etc.
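A minimal Python sketch of this calculation follows; the data set is hypothetical, and since n is small the z-value is replaced by the Student t-value, as recommended above:

```python
import math
import statistics

# Hypothetical replicate results
data = [101.9, 101.6, 102.1, 101.7, 102.0, 101.6]
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)

t = 2.571  # Student t-value for 95% confidence and n - 1 = 5 degrees of freedom
half_width = t * s / math.sqrt(n)

lower, upper = mean - half_width, mean + half_width
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

For large n, t is replaced by z = 1.96 and the interval narrows toward X̄ ± 1.96 s/√n.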

Note
Student's t-distribution becomes closer and closer to the standard normal distribution as the degrees of freedom get larger. For large degrees of freedom, the t-distribution is practically indistinguishable from the standard normal distribution, as can be seen in Figure 1.
In that case, we can use the table for the standard normal distribution instead of the table for the t-distribution (Fig. 2).
Figure 2 shows that 68% of the data fall in the range X̄ ± s, 95% in the range X̄ ± 2s, and 99.7% in the range X̄ ± 3s.
When the distribution is skewed, the statistical treatment is more complicated.

Regression analysis
Regression analysis is used to evaluate a linear relationship between test results. A linear relationship is, in general, evaluated over the range of the analytical procedure. The data obtained from analysis of solutions prepared at a range of different concentration levels are habitually investigated by plotting on a graph.
Linear regression evaluates the relationship between two variables by fitting a linear equation to the observed data. A linear regression line has an equation of the form Y = b0 + b1·X, where X is the independent variable and Y is the dependent variable. The slope of the line is b1, and b0 is the intercept (the value of Y when X = 0). The statistical procedure for finding the best-fitting straight line is to obtain the line through the points that minimizes the deviations of the points from the prospective line. This best-fit criterion of goodness of fit is known as the principle of least squares. In mathematical terms, the best-fitting line is the line that minimizes the sum of squares of the deviations of the observed values of Y from those predicted.
In Figure 3, the data clearly shows a linear relationship.
Calculations used to compute the y-intercept and the line slope are as follows:

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

b0 = Ȳ − b1·X̄

So, the equation of the line for the data plotted in Figure 3 is Y = −0.2147 + 0.0225 X.
Once a regression model has been fit to a group of data, examination of the residuals (the deviations from the fitted line to the observed values) allows investigation of the validity of the assumption that a linear relationship exists.Plotting the residuals on the y-axis against the independent variable on the x-axis reveals any possible non-linear relationship among the variables or might alert to investigate outliers.
The other important quantities that are ordinarily reported are the coefficient of determination (r²) and the linear correlation coefficient (r). The coefficient of determination (r²) measures the proportion of variation that is explained by the model. Ideally, r² should be equal to one, which would indicate zero error. The correlation coefficient (r) is the correlation between the predicted and observed values. It will have a value between 0 and 1; the closer the value is to 1, the better the correlation. Any data that form a straight line will give a high correlation coefficient; therefore, extra caution should be taken when interpreting correlation coefficients. Additional statistical analysis is recommended to provide estimates of systematic errors, not just the correlation of results. For example, in method comparison studies, if one method gives consistently higher results than the other, the results would show linear correlation and have a high correlation coefficient even though there is a difference between the two methods.
Equations used to determine the coefficient of determination (r²) and the correlation coefficient (r) are listed in Table 3.
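The least-squares calculations above can be sketched as follows; the concentration/response pairs are hypothetical calibration data:

```python
import math

# Hypothetical calibration data: concentration (x) vs. instrument response (y)
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = [0.041, 0.089, 0.135, 0.180, 0.228]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx                  # slope of the best-fit line
b0 = y_bar - b1 * x_bar         # y-intercept
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
r2 = r ** 2                     # coefficient of determination

print(f"Y = {b0:.4f} + {b1:.4f} X, r = {r:.4f}, r^2 = {r2:.4f}")
```

Plotting the residuals yi − (b0 + b1·xi) against xi, as recommended above, would then reveal any curvature or outliers that a high r alone can mask.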

The hypothesis tests
Hypothesis tests are intended to verify whether the experimental data are consistent with certain theoretical hypotheses.
- The first step is to state the null hypothesis, symbolized by H0, which considers that the two elements or series of elements are equal.
- The second step consists in measuring the deviation between the different characteristics.
- The third step is to calculate the probability P of observing this deviation if H0 is true.
- The fourth step is to draw the required conclusions: if P is large, we admit that H0 is plausible; on the other hand, if P is small, the deviation is incompatible with H0. The limit value of P that is fixed to determine whether P is large or small is the confidence level or significance level (usually P = 0.95 is chosen as the confidence level, i.e. α = 0.05 as the significance level).
Four situations are possible:
- Acceptance of H0 when H0 is true (correct decision).
- Rejection of H0 when H0 is true (type I error).
- Acceptance of H0 when H0 is false (type II error).
- Rejection of H0 when H0 is false (correct decision).

Other statistical tools
Other statistical tools used in method validation include comparative studies using Student's t-test, Fisher's test, analysis of variance (ANOVA), design of experiments, and assessment of outliers. Information on these statistical tools can be obtained from the references on statistics suggested in the reference section.
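As an illustration of two of these tools, the F-test and the two-sample t-test can be computed directly; the two data sets below are hypothetical results from two analysts, and the resulting statistics would be compared against tabulated critical values:

```python
import math
import statistics

# Hypothetical results from two analysts assaying the same sample
a = [99.6, 100.1, 99.8, 100.4, 99.9, 100.2]
b = [100.0, 100.5, 99.7, 100.3, 100.1, 100.6]

# F-test: ratio of the larger variance to the smaller (F = s1^2 / s2^2)
s1_sq, s2_sq = statistics.variance(a), statistics.variance(b)
F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

# Two-sample t-test assuming equal variances (pooled standard deviation)
n1, n2 = len(a), len(b)
sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
t = (statistics.mean(a) - statistics.mean(b)) / (sp * math.sqrt(1 / n1 + 1 / n2))

# |t| and F are then compared against tabulated critical values at the
# chosen significance level (e.g. alpha = 0.05) to accept or reject H0
print(f"F = {F:.3f}, t = {t:.3f}")
```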

Validation characteristics
Statistics recommended by ICH, USP, and FDA to evaluate method suitability are presented below.

Specificity/selectivity
Specificity is a quantitative indication of the extent to which a method can distinguish between the analyte of interest and interfering substances on the basis of signals produced under actual experimental conditions. Random interferences should be determined using representative blank samples. Table 4 presents an example of the specificity of a developed method for determination of ornidazole in pharmaceutical formulations [16].
The % recovery was found to be in the range of 99.4-100.4%; hence there was no interference from the excipients and additives, which indicates the selectivity of the developed method.

Accuracy
Accuracy refers to the closeness of agreement between the true value of the analyte concentration and the mean result obtained by applying the experimental procedure to a large number of homogeneous samples. It is related to systematic error and analyte recovery. Systematic errors can be established by the use of appropriate certified reference materials (matrix-matched) or by applying alternative analytical techniques. Table 5 provides an example of accuracy data assessment of an analytical method for atomic absorption spectrometry analysis of Pb in bivalve molluscs [17].

Table 3. Regression statistics.

Slope: the slope of the line relates the change in response to the change in concentration.
Standard error of the intercept: can be used to calculate the required confidence interval.
Standard error of the slope: can be used to calculate the required confidence interval.
Coefficient of determination (r²): the square of the correlation coefficient.
Correlation coefficient (r = √r²): the correlation between the predicted and observed values; this will have a value between 0 and 1, and the closer the value is to 1, the better the correlation.
Regression SS: the regression sum of squares is the variability in the response that is accounted for by the regression line.
Residual SS (the error sum of squares): the variability about the regression line (the amount of uncertainty that remains); SS residual = SS total − SS regression.
Total SS: the total sum of squares is the total amount of variability in the response.
In this example, the calculated accuracy error is less than 2, so it is considered insignificant [18]. Therefore, the uncertainty associated with the accuracy of the method is equal to the uncertainty of the reference material used for the accuracy study.

Precision
Precision is evaluated by comparison of results obtained from samples prepared to test the following conditions:
- Repeatability expresses the precision under the same operating conditions over a short interval of time. Repeatability is also termed intra-assay precision.
- Intermediate precision expresses within-laboratory variations: different days, different analysts, different equipment, etc.
- Reproducibility expresses the precision between laboratories (collaborative studies, usually applied to standardization of methodology).
Both repeatability and reproducibility are expressed in terms of standard deviation and are generally dependent on analyte concentration. It is thus recommended that repeatability and within-laboratory reproducibility be determined at different concentrations across the working range, by carrying out 10 repeated determinations at each concentration level. As stipulated by Horwitz and Albert [19], the variability between laboratories is the dominating error component in the world of practical ultratrace analysis. They conclude that a single laboratory cannot determine its own error structure, except in the context of certified reference materials or consensus results from other laboratories. Table 6 provides an example of a typical data analysis summary for the evaluation of a precision study of an analytical method for the quantitation of Ecstasy in seizures by UV-Vis spectrophotometer [15]. In this example, the method was tested in two different laboratories by two different analysts on two different instruments.
The standard deviations and the percentage recoveries (not more than 2%) indicate good precision of the method [20].
In the example provided in Table 6, precision is determined at a number of different levels during validation, which include system precision, repeatability, intermediate precision, and reproducibility. The system precision is evaluated by comparing the means and relative standard deviations. Reproducibility is assessed by means of an inter-laboratory assay. The intermediate precision is established by comparing the analytical results obtained when using different analysts and instruments and performing the analysis on different days. The repeatability is assessed by measuring the variability in the results obtained when using the analytical method in a single determination. In each case, the mean and %RSD are calculated and compared to the established acceptance criteria.

Detection limit
The ICH guideline mentions several approaches for determining the detection limit: visual inspection, signal-to-noise ratio, and the use of the standard deviation of the response and the slope. The detection limit and the method used for determining it should be presented. If visual evaluation is used, the detection limit is determined by the analysis of samples with known concentrations of analyte and by establishing the minimum level at which the analyte can be reliably detected. The signal-to-noise approach is performed by comparing measured signals from samples with known low concentrations of analyte with those of blank samples. When the detection limit is based on the standard deviation of the response and the slope, it is calculated using the following equation:
DL = 3.3 s / S

where s is the standard deviation of the response and S is the slope of the calibration curve. The limit of detection is usually expressed as the analyte concentration corresponding to the sample blank mean plus three sample standard deviations, based on 10 independent analyses of sample blanks.

Quantitation limit
The ICH guideline mentions several approaches for determining the quantitation limit: an approach based on visual evaluation, an approach based on signal-to-noise, and an approach based on the standard deviation of the response and the slope. The quantitation limit and the method used for determining it should be presented. When the quantitation limit is based on the standard deviation of the response and the slope, it is calculated using the equation below:

QL = 10 s / S

where s is the standard deviation of the response and S is the slope of the calibration curve. The limit of quantitation is set by various conventions to be five, six or ten standard deviations of the blank mean. It is also sometimes known as the limit of determination.
The LDM, defined as the lowest detectable concentration on the calibration curve where both accuracy and precision are within the maximum tolerable CV of 5.32%, was deemed to be 0.406 μg/mL (Tab. 7). This LDM is adequate for the analysis of forensic samples, as this value falls within the concentration range of MDMA in many of the ecstasy tablets analyzed. Furthermore, the ratio of conformity (6.26) is between 4 and 10 [21], so the LDM is validated.
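As a sketch, the two ICH formulas based on the standard deviation of the response and the slope (DL = 3.3 s/S, QL = 10 s/S) can be applied as follows; the values chosen for s and S are hypothetical:

```python
# Detection and quantitation limits from the ICH formulas
# DL = 3.3 * s / S and QL = 10 * s / S (hypothetical s and S values)
s = 0.0021  # standard deviation of the response (e.g. of blank readings)
S = 0.0225  # slope of the calibration curve (response per ug/mL)

DL = 3.3 * s / S  # detection limit, ug/mL
QL = 10 * s / S   # quantitation limit, ug/mL

print(f"DL = {DL:.3f} ug/mL, QL = {QL:.3f} ug/mL")
```

QL obtained this way should then be confirmed experimentally by checking that precision and accuracy at that concentration meet the acceptance criteria.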

Working and linear ranges
For any quantitative method, there is a range of analyte concentrations over which the method may be applied (example in Tab. 8). At the lower end of the concentration range, the limiting factor is the value of the limit of detection and/or limit of quantification. At the upper end of the concentration range, limitations will be imposed by various effects depending on the detection mechanism.
Within this working range there may exist a linear range, within which the detection response has a sufficiently linear relation to analyte concentration. The working and linear ranges may differ in different sample types according to the effect of interferences arising from the sample matrix. It is recommended that, in the first instance, the response relationship be examined over the working range by carrying out a single assessment of the response levels at no fewer than six concentration levels. To determine the response relationship within the linear range, it is recommended that three replicates (examples in Tab. 9) be carried out at each of at least six concentration levels.
If there is a linear relationship, test results should be evaluated by linear regression analysis. The correlation coefficient, y-intercept, slope of the regression line, and residual sum of squares should be submitted with a plot of the data.

Sensitivity
Sensitivity is the measure of the change in instrument response which corresponds to a change in analyte concentration. Where the response has been established as linear with respect to concentration, sensitivity corresponds to the gradient of the response curve. The recovery study reported in Table 10 shows that the calculated percentage recovery varies between 79.38% and 131.62%. These percentages were validated by Student's t-test [24].

Robustness
This is a measure of how effectively the performance of the analytical method stands up to less-than-perfect implementation. In any method there will be certain parts which will severely affect the method performance unless they are carried out with sufficient care. These aspects should be identified and, if possible, their influence on the method performance should be evaluated using robustness tests. These tests provide important information for the evaluation of the measurement uncertainty. The methodology for evaluating uncertainty given in the ISO Guide [25] relies on identifying all parameters that may affect the result and on quantifying the uncertainty contribution from each source. This is very similar to the procedure used in robustness tests, which identify all the parameters likely to influence the result and determine the acceptability of their influence through control. If carried out with this in mind, robustness tests can provide information on the contribution to the overall uncertainty from each of the parameters studied. Means and %RSDs are compared against the acceptance criteria to evaluate the impact of changing experimental parameters.
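For example, a robustness factor allowed to vary within ±a around its nominal value is commonly converted into a standard uncertainty contribution using the rectangular-distribution rule u = c·a/√3 of the GUM; a minimal sketch with hypothetical numbers:

```python
import math

# Hypothetical robustness factor: temperature allowed to vary by +/- 2 degrees C,
# with a sensitivity coefficient of 0.15 response units per degree C
a = 2.0   # half-width of the permitted variation of the factor
c = 0.15  # sensitivity coefficient (response change per unit change of the factor)

# Standard uncertainty contribution, assuming a rectangular distribution
u = c * a / math.sqrt(3)
print(f"u = {u:.4f}")
```

Contributions of this kind from each studied parameter would then be combined with the other uncertainty components of the method.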
The ICH guidelines suggest detailed validation schemes relative to the purpose of the methods. They list recommended data to report for each validation parameter. Acceptance criteria for validation must be based on the previous performance of the method, the product specifications, and the phase of development.
The path to validation forms a continuum. It begins in the early phases of development as a set of informal experiments that establishes the soundness of the method for its intended purpose. It is expanded throughout the regulatory submission process into a fully documented report that is required for commercial production. It is repeated whenever there is a significant change in instrumentation, method, specifications or process [1][2][3][4][5][6][7][8][9][10][11][12].

Uncertainty components
Not all of the features that are used to describe the performance of a method of analysis directly impact the evaluation of the uncertainty of the analytical result. Table 11 summarizes the impact of the different characteristics on the evaluation of uncertainty.

Challenges
To succeed in the development and validation of a method, a holistic approach is recommended. A common challenge encountered during method development and validation is that methods are generally developed by the research and development department, while validation is generally the responsibility of quality assurance and quality control. It is important that all the groups work as one team. The transfer of analytical methods from one group to another thus becomes an important step for ensuring that the proper validation is in place to justify the intended use. Because the method will be run by several groups during its progress from development to validation, it must be robust. A shared weakness in the development and validation of methods is that the methods are not quite robust. If robustness is not built into methods early in development, the results are likely to lack efficiency in quality testing and to encounter a lengthy and complicated validation process. The design and execution of the studies require thorough knowledge of the product being tested as well as a good understanding of the analysis technique. New regulatory guidelines are being published [26][27][28][29][30] that govern the expectations of regulatory agencies throughout the world for method development and validation, and there is a need to meet current regulatory standards. Anything from a simple method improvement to a complete redevelopment and subsequent implementation is a tedious task. For this reason, one must stay alert to current trends in regulatory guidelines and adopt a proactive approach to changes that may affect development and validation programs. Finally, one of the key requirements for method validation is that only well-characterized reference materials with properly documented purities should be used during method validation.

Conclusion
This work discusses applications of statistics in analytical method validation. Method validation helps to validate the analytical method over a range of concentrations, so that a change in formulation or concentration does not require additional validation. Once the methods have been developed, qualified and validated, the impact they have on out-of-specification rates and process capability needs to be quantified and evaluated to determine their effectiveness for future use. The statistical methods used during analytical method validation involve basic knowledge of statistics. Even though there are statistical packages available to perform the calculations, it is important to understand the mathematical basis behind them. It is essential for analysts to be familiar with the basic statistical elements. The statistics used for interpreting validation data should be incorporated into the company's standard procedures and specified in the validation protocol and report.

Characteristics and their impacts on uncertainty

Specificity/selectivity
Effects related to selectivity and specificity are measured overall during the evaluation of the uncertainty component associated with accuracy.

Precision
These variances are among the most significant elements for the assessment of uncertainty.

Linearity, sensitivity
Case 1: if the statistical test shows that the model is linear, there is no effect on the uncertainty. Case 2: if the test shows that the model is not linear, then we may either reduce the measurement range or add a (non-linearity) component to the evaluation of uncertainty. This component can be the maximum deviation from the model.

Limit of detection
This characteristic has no effect on the uncertainty assessment; it serves to define the field of use of the method.

Robustness
For a factor whose variation range is limited to ±a, the corresponding standard uncertainty is u = c·a/√3 (assuming a rectangular distribution), where c is the sensitivity coefficient of the response to variations of the factor a.

Accuracy
In chemical analysis, corrections for accuracy are not applied. The goal during the development of the method is to check that there is no significant systematic error.

Fig. 3. Linear regression analysis of the calibration curve of Ecstasy in a seizure by UV-Vis spectrophotometer. Each point is the mean ± SD of three experiments [15].
This work is dedicated to the memory of Professor M'Hamed CHARROUF, who died on November 29, 2014.

Table 2 .
Validation characteristics according to the nature of the test.
−: signifies that this characteristic is not normally evaluated; +: signifies that this characteristic is normally evaluated; (1) in cases where reproducibility (see glossary) has been performed, intermediate precision is not needed [8]; (2) lack of specificity of one analytical procedure could be compensated by other supporting analytical procedure(s); (3) may be needed in some cases.

Table 7 .
Limit of detection and limit of quantification data of a method for the quantitation of Ecstasy in seizures by UV-Vis spectrophotometer [15].

Table 8 .
Example of linear regression analysis of standard solutions for cocaine by an HPLC/MS/MS method [22].

Table 9 .
Example of linear regression analysis of standard solutions for a 1-100 ng/mL concentration range, performed on 3 different days, for cocaine by an HPLC/MS/MS method [22].

Table 10 .
Result of a recovery study of cadmium quantification in bivalve molluscs using an atomic absorption spectrometry with graphite furnace method [23].
C: concentration before standard additions; Cf: concentration after standard additions; Ca: concentration of standard additions.

Table 11 .
Impacts of characteristics on uncertainty.