
Faculty of Engineering and Natural Sciences, Bahçeşehir University, İstanbul.
Statistical analysis
Contents:
1. Introduction

During the development and final stages of your project you will need to verify that your product performs according to the required specifications, for example for accuracy, precision, efficiency, or rate. This is an objective process where appropriate experimental data (observations) are collected and analysed. In many cases a single measurement of a parameter (e.g. the diameter of a wheel) may be sufficient given the correct use of an appropriate instrument (e.g. a Vernier caliper).

More complex systems often exhibit variability due to random processes within the system (e.g. electronic noise), within the measurement instrument/procedure (e.g. variable alignment of rulers, reading errors), or in the working environment (e.g. ambient temperature changes, random arrival of objects on a conveyor belt). In the presence of variability, we need to make repeated measurements, apply statistical analysis, and finally interpret the results.

This page outlines some common basic statistical tools that you may find useful for the verification process of your product. The focus here is not only on obtaining a measurement of the performance parameter, but also an estimate of the uncertainty in that measurement. Since many other tools and procedures exist, please discuss with your supervisor which procedures and tools are appropriate for your project.
The sample mean of the values, v̄, is an estimate of the true mean (center of mass) μ of the population.
The difference between μ and the design target for the population is sometimes called accuracy.
The sample standard deviation, s, is an estimate of the standard deviation, σ, which is
a standard measure of the variability of the process; this is sometimes called precision.
These definitions of accuracy and precision are illustrated in Figure 4.
The estimates v̄ and s are calculated as follows:

v̄ = (1/n) Σ v_{i}        s = √[ Σ (v_{i} - v̄)^{2} / (n-1) ]

where the sums run over the n observations v_{i}.
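As a concrete illustration, here is a Python sketch of the two estimates (the data values are made up for the example; the page's own snippets use Matlab/Octave, where mean() and std() do the same job):

```python
import math

# Hypothetical sample: ten diameter measurements in mm (made-up values)
v = [9.98, 10.02, 10.05, 9.97, 10.01, 9.99, 10.03, 10.00, 9.96, 10.04]

n = len(v)
vbar = sum(v) / n  # sample mean, an estimate of the true mean mu

# sample standard deviation (n-1 denominator), an estimate of sigma
s = math.sqrt(sum((x - vbar) ** 2 for x in v) / (n - 1))

print(f"n = {n}, vbar = {vbar:.4f} mm, s = {s:.4f} mm")
```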
For these values to be meaningful, we need an estimate of their uncertainty. This is commonly achieved by calculating a confidence interval or applying a hypothesis test.
It is important to note that in both methods, uncertainty is proportional to 1/√n; design your experiment to collect enough data for your needs.
Example: The acceleration due to gravity is determined to be g_{0} = 9.81 ± 0.02 m/s^{2} (95% c.i.). This can also be written as: 9.79 < g_{0} < 9.83 m/s^{2} (95% c.i.).
A 95% confidence interval means that if we were to take 100 different samples of the same sample size
and compute a 95% confidence interval for each sample, then approximately 95 of the 100 confidence intervals would contain the true value of the parameter.
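This coverage interpretation can be checked with a quick simulation (a Python sketch with made-up parameters; t = 2.262 is the tabulated t_{0.025} for n-1 = 9 degrees of freedom):

```python
import math
import random

random.seed(1)
mu, sigma = 10.0, 0.5  # assumed true population parameters (made up)
n, t025 = 10, 2.262    # sample size; tabulated t_{0.025} for 9 dof

covered = 0
trials = 1000
for _ in range(trials):
    v = [random.gauss(mu, sigma) for _ in range(n)]
    vbar = sum(v) / n
    s = math.sqrt(sum((x - vbar) ** 2 for x in v) / (n - 1))
    half = t025 * s / math.sqrt(n)  # half-width of the 95% c.i.
    if vbar - half < mu < vbar + half:
        covered += 1

# roughly 950 of the 1000 intervals should contain the true mean
print(f"{covered} of {trials} intervals contain the true mean")
```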
The value of t_{0.025} depends on the number of degrees of freedom, n-1; these values are tabulated above. For other values of n-1 you can obtain the t-value by solving the Matlab/Octave equation: tcdf(t_{0.025}, n-1) = 0.975. Alternatively, use this cdf calculator.
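If Matlab/Octave is not at hand, the same t-value can be found numerically; this Python sketch integrates the Student-t density with Simpson's rule and bisects for tcdf(t, n-1) = 0.975 (function names here are my own, not from the page):

```python
import math

def t_pdf(x, df):
    """Student-t probability density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF at x > 0 via composite Simpson's rule, using symmetry about zero."""
    h = x / steps
    area = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += t_pdf(i * h, df) * (4 if i % 2 else 2)
    return 0.5 + area * h / 3

def t_quantile(p, df):
    """Solve t_cdf(t, df) = p by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_quantile(0.975, 9), 3))  # t_{0.025} for n-1 = 9, ≈ 2.262
```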
[open the image in a new tab to get a larger version]

The value of χ^{2}_{0.95} depends on the number of degrees of freedom, n-1, and can be calculated by solving the Matlab/Octave equation: chi2cdf(χ^{2}_{0.95}, n-1) = 0.95. Alternatively, use the scale formula and take the scale from the table (open the image in a new tab to see the full size), or use this cdf calculator.
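As a sketch of how the chi-square quantiles are used (the page's exact formula is in the image above, so this assumes the common two-sided interval √[(n-1)s²/χ²_{0.975}] < σ < √[(n-1)s²/χ²_{0.025}]; the sample figures are made up, and the quantiles are tabulated values for 9 degrees of freedom):

```python
import math

# Hypothetical sample statistics (made up): n = 10 measurements, s = 0.030 mm
n, s = 10, 0.030

# Tabulated chi-square quantiles for n-1 = 9 degrees of freedom
chi2_lo, chi2_hi = 2.700, 19.023  # 2.5% and 97.5% points

sigma_lo = math.sqrt((n - 1) * s * s / chi2_hi)
sigma_hi = math.sqrt((n - 1) * s * s / chi2_lo)
print(f"{sigma_lo:.4f} < sigma < {sigma_hi:.4f} mm (95% c.i.)")
```

Note that the interval is not symmetric about s, unlike the t-based interval for the mean.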
A confidence interval for a proportion p, estimated as p' = m/n from m successes in n trials, is p' ± 2√(p'(1-p')/n) (≈ 95% c.i.). This simple formulation does not work well for values of p close to 0 or 1; for such cases a proper treatment is required.
The exact Clopper-Pearson interval (based on the pmf leaving an equal probability on both sides) can be constructed as follows:
Example:
>> sum(binopdf(m:n, n, 0.70816))
ans = 0.025
>> sum(binopdf(0:m, n, 0.87334))
ans = 0.025
and so 0.708 < p < 0.873 with 95% confidence. The same result can be obtained more easily from this online calculator :-). Proof that this actually works (reasonably) well can be seen in this c++ simulation. [The simulation suggests that the confidence level is a bit higher, more like 97%, so the c.i. is a bit conservative. In some cases 0:m+1 works better, but my simulation might be a bit off(?).]
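The same tail-sum construction can be automated; this Python sketch mirrors the Octave sums with a pure-Python binopdf and finds the two bounds by bisection (m and n here are made-up example numbers, not the ones behind the interval above):

```python
from math import comb

def binopdf(k, n, p):
    """Binomial pmf, the Python equivalent of Octave's binopdf(k, n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def tail(ks, n, p):
    """Equivalent of sum(binopdf(ks, n, p)) in Octave."""
    return sum(binopdf(k, n, p) for k in ks)

# Hypothetical observation (made-up numbers): m successes out of n trials
m, n = 16, 20

def bisect(f, target, increasing):
    """Solve f(p) = target for p in [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# upper tail sum(binopdf(m:n, n, p)) grows with p -> gives the lower bound
p_lo = bisect(lambda p: tail(range(m, n + 1), n, p), 0.025, increasing=True)
# lower tail sum(binopdf(0:m, n, p)) shrinks with p -> gives the upper bound
p_hi = bisect(lambda p: tail(range(0, m + 1), n, p), 0.025, increasing=False)

print(f"{p_lo:.3f} < p < {p_hi:.3f} (95% c.i.)")
```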
Compare this to the simpler form:
For a Poisson process in which n events are counted in a given time, the approximate confidence interval on the count is n_{0} = n ± 2√n (≈ 95% c.i.). There are more rigorous treatments, but the above form should be a reasonable approximation as long as you record enough events, that is, much more than 10. Simply increase the time period until you have collected plenty of observations of the event.
Example: As usual with sampling, the uncertainty reduces ∝ 1/√n. For example, if 450 events were observed over 600 seconds (ten times the observation period) then n_{0} = 450 ± 2√450 ≈ 450 ± 42. Proof that n_{0} = n ± 2√n provides a 95% c.i. for large n can be seen in this c++ simulation.
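Using the figures from the example, a Python sketch of the counting interval and the corresponding rate:

```python
import math

# 450 events observed over 600 seconds (the example above)
n, T = 450, 600.0

half = 2 * math.sqrt(n)  # ~95% half-width on the count, n0 = n ± 2*sqrt(n)
rate = n / T             # estimated event rate
rate_half = half / T     # its ~95% half-width

print(f"n0 = {n} ± {half:.0f} events")
print(f"lambda = {rate:.3f} ± {rate_half:.3f} events/second")
```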
The following sections present examples of testing a hypothesis for μ, σ, p, and λ.
This can be solved using Matlab/Octave, or by using this cdf calculator. Here, "small" would be a few percent or less for a significant result, and 1% or less for a very significant result (smaller is better). If the probability is large then the hypothesis cannot be rejected and we are not confident that the accuracy requirement has been met.
Example: The probability that we would observe a sample mean of 5.2473 mm when the true mean is 10 mm is just 1.7%. This is a significant (small) P-value and so we can reject the hypothesis μ = 10 mm in favor of μ < 10 mm, and we conclude that the accuracy requirement is satisfied.
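An equivalent way to run this test, sketched in Python with made-up sample statistics (instead of computing the P-value with tcdf, the t statistic is compared against the tabulated one-sided 5% critical value for 9 degrees of freedom, which gives the same accept/reject decision at that level):

```python
import math

# Hypothetical sample (made up): n measurements with these statistics
n, vbar, s = 10, 9.90, 0.12
mu0 = 10.0  # required (hypothesised) mean

t_obs = (vbar - mu0) / (s / math.sqrt(n))  # Student-t statistic, n-1 dof

t_crit = 1.833  # one-sided 5% critical value for 9 dof (from tables)
print(f"t_obs = {t_obs:.3f}")
if t_obs < -t_crit:
    print("reject mu = mu0 in favor of mu < mu0 (significant at 5%)")
else:
    print("cannot reject mu = mu0")
```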
This can be solved using Matlab/Octave, or by using this cdf calculator. Here, "small" would be a few percent or less for a significant result, and 1% or less for a very significant result (smaller is better). If the probability is large then the hypothesis cannot be rejected and we are not confident that the precision requirement has been met.
Example: Since we have a high probability (35%) that we would observe a sample standard deviation of 14.260 mm (or less), given a true standard deviation of 15 mm, we cannot reject the hypothesis σ = 15 mm in favor of σ < 15 mm. We cannot say that the precision requirement is satisfied. In this case we would increase the sample size and/or improve the robot precision.
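The same decision can be sketched in Python with a critical-value comparison (the sample size here is made up, since the example above does not state it; the statistic is the usual (n-1)s²/σ₀², compared against the tabulated 5% point of chi-square with 9 degrees of freedom):

```python
# Hypothetical sample (made up): n measurements, sample std s, requirement sigma0
n, s, sigma0 = 10, 14.26, 15.0

chi2_obs = (n - 1) * s * s / (sigma0 ** 2)  # chi-square statistic, n-1 dof

chi2_crit = 3.325  # one-sided 5% point for 9 dof (from tables)
print(f"chi2_obs = {chi2_obs:.3f}")
if chi2_obs < chi2_crit:
    print("reject sigma = sigma0 in favor of sigma < sigma0")
else:
    print("cannot reject sigma = sigma0 (precision not demonstrated)")
```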
We measure the proportion against a required value p_{0}, with the hope of observing a value p' = m/n that is statistically significantly greater than p_{0}, to rule out the chance of a statistical fluctuation. In a hypothesis test, we can reject the hypothesis p = p_{0} in favor of p > p_{0} if the following computed probability (P-value) is "small", where m is the observed number of successes out of n trials. This can be computed with Matlab/Octave as follows: sum(binopdf(m:n, n, p_{0})). Here, "small" would be a few percent or less for a significant result, and 1% or less for a very significant result (smaller is better). If the probability is large then the hypothesis cannot be rejected and we are not confident that the requirement for p has been met.
Example: The P-value is small (significant), so we can reject the hypothesis that p = 0.7 in favor of p > 0.7, and so the requirement is met. Remember that the 95% confidence interval was 0.708 < p < 0.873, which also excludes p = 0.7.
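A Python sketch of the binomial P-value (m, n, and p_{0} here are made-up illustration numbers, not those behind the example above):

```python
from math import comb

def binopdf(k, n, p):
    """Binomial pmf, the Python equivalent of Octave's binopdf."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical observation (made up): m successes in n trials, requirement p0
m, n, p0 = 16, 20, 0.7

# P-value: probability of m or more successes if p were really p0,
# the Python equivalent of sum(binopdf(m:n, n, p0))
P = sum(binopdf(k, n, p0) for k in range(m, n + 1))
print(f"P-value = {P:.4f}")
```

With these made-up numbers the P-value comes out around 24%, so p > 0.7 would not be demonstrated; more trials would be needed.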
For a Poisson process, in a given time interval we can reject the hypothesis n = n_{0} in favor of n > n_{0} if the following computed probability (P-value) is "small". Here, "small" would be a few percent or less for a significant result, and 1% or less for a very significant result (smaller is better). If the probability is large then the hypothesis cannot be rejected and we are not confident that the requirement has been met.
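A Python sketch of the Poisson P-value, the probability of observing n or more events if the rate were really the required λ_{0} (the observation numbers here are assumed for illustration):

```python
import math

# Hypothetical observation (made up): n events in T seconds, requirement lambda0
n, T, lam0 = 45, 60.0, 0.5

mu = lam0 * T  # expected count if the rate were really lambda0

# P-value: P(N >= n) = 1 - P(N <= n-1) from the Poisson pmf
P = 1.0 - sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(n))
print(f"P-value = {P:.4f}")
```

With these assumed numbers the P-value comes out well under the few-percent threshold, so the observed count would be significantly above the requirement.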
Example: The P-value is very significant (very small), less than 1%, and so we reject the hypothesis and can conclude that the requirement for the mean rate, λ > 0.5, is satisfied. This is consistent with the above confidence interval result of 0.53 < λ < 0.97 events/second (95% c.i.), which excludes λ = 0.5.

Comments/corrections to andrewjohn.beddall@eng.bau.edu.tr.

Last modified: Thu, 11 Mar 2021 21:10:52 +0300