By Issa Bass
Sampling is one of the most important functions of quality control. In a large-scale production environment, testing every single product from the production lines is not cost-effective because it would require a great deal of manpower, time, and space.
Consider a company that produces a hundred thousand tires a day. If the company operates 16 hours a day (two 8-hour shifts) and it takes an employee 10 minutes to test a tire, it would need at least 2,084 employees in the quality control department to test every single tire that comes out of production, as well as a tremendous amount of space for the QA department and the inventory.
For a normally distributed production output, taking a sample of the output and testing it can help determine the quality level of the whole production. Sampling consists of testing a subset of the population in order to draw a conclusion about the whole population.
The sample statistics may not always be exactly the same as their corresponding population parameters. The difference is known as the sampling error.
Suppose a population of 10 bolts has diameter measures of 9, 11, 12, 12, 14, 10, 9, 8, 7, 9. The mean for that population would be \(\mu = 10.1\). If a sample of the following three measures, 9, 14 and 10, is taken from the population, the mean of the sample would be \(\bar{x} = (9 + 14 + 10)/3 = 11\) and the sampling error would be \(E = \bar{x} - \mu = 11 - 10.1 = 0.9\).
Let's take another sample of three measures: 7, 12 and 11. This time the mean will be \(\bar{x} = 10\) and the sampling error \(E = 10 - 10.1 = -0.1\).
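The two sampling errors can be reproduced with Python's standard `statistics` module. A minimal sketch; `sampling_error` is a hypothetical helper that returns \(E = \bar{x} - \mu\).

```python
from statistics import mean

# Diameters of the 10 bolts that make up the whole population
population = [9, 11, 12, 12, 14, 10, 9, 8, 7, 9]


def sampling_error(sample, population):
    """Sampling error E: sample mean minus population mean."""
    return mean(sample) - mean(population)


for sample in ([9, 14, 10], [7, 12, 11]):
    print(sample, mean(sample), round(sampling_error(sample, population), 2))
```

The loop prints sampling errors of 0.9 and -0.1 for the two samples.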
If another sample is taken and estimated, its sampling error might be different. These differences are said to be due to chance.
So if it is possible to make mistakes while estimating the population's parameters from a sample, how can we be sure that sampling can help get a good estimate? Why use sampling as a means of estimating the population parameters?
The central limit theorem can help us answer these questions.
3 -1 Central Limit Theorem
The central limit theorem states that for sufficiently large sample sizes (\(n > 30\)), regardless of the shape of the population distribution, if samples of size \(n\) are randomly drawn from a population that has a mean \(\mu\) and a standard deviation \(\sigma\), the sample means \(\bar{x}\) are approximately normally distributed. If the population is normally distributed, the sample means are normally distributed regardless of the sample size.
\[ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
where \(\mu_{\bar{x}}\) is the mean of the sample means and \(\sigma_{\bar{x}}\) is the standard deviation of the sample means.
The implication of this theorem is that, for sufficiently large samples, the normal distribution can be used to analyze samples drawn from populations that are not normally distributed or whose shapes are unknown.
When sample means are used as estimators to make inferences about a population parameter \(\mu\), and \(n > 30\), the estimator \(\bar{x}\) will be approximately normally distributed in repeated sampling.
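The theorem is easy to see in a small simulation. The sketch below is an illustration, not part of the original method: it assumes an exponential population with \(\mu = \sigma = 1\) (strongly right-skewed, so clearly not normal), draws 5,000 samples of size \(n = 36\), and checks that the sample means center on \(\mu\), spread by about \(\sigma/\sqrt{n} = 1/6\), and fall within \(\pm 1.96\,\sigma_{\bar{x}}\) of \(\mu\) roughly 95% of the time, as a normal distribution would.

```python
import random
from statistics import fmean, pstdev

MU = SIGMA = 1.0          # an exponential(rate=1) population has mean 1 and sd 1
N, REPETITIONS = 36, 5000


def draw_sample_means(rng: random.Random) -> list[float]:
    """Mean of each of REPETITIONS random samples of size N from the skewed population."""
    return [fmean(rng.expovariate(1.0) for _ in range(N)) for _ in range(REPETITIONS)]


rng = random.Random(42)   # fixed seed so the run is reproducible
means = draw_sample_means(rng)
se = SIGMA / N ** 0.5     # sigma / sqrt(n) from the central limit theorem

print(f"mean of sample means: {fmean(means):.3f} (theory {MU})")
print(f"sd of sample means:   {pstdev(means):.3f} (theory {se:.3f})")
share = sum(abs(m - MU) <= 1.96 * se for m in means) / REPETITIONS
print(f"share within 1.96 standard errors: {share:.3f} (normal: 0.950)")
```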
3 -2 Sampling distribution of \(\bar{x}\)
We have seen in the example of the bolt diameters that the mean of the first sample was 11 and the mean of the second was 10. If the means of all possible samples are obtained and organized, we can derive the sampling distribution of the means.
In that example, we had 10 bolts; if all possible samples of 3 were taken, there would be \(\binom{10}{3} = 120\) samples and as many means.
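Because the bolt population is tiny, every one of the 120 samples can be enumerated with `itertools.combinations`, which treats the bolts as distinct items even when their diameters repeat. A minimal sketch: the mean of the 120 sample means equals the population mean exactly. Since these samples are drawn without replacement from only 10 bolts, their standard deviation matches \(\sigma/\sqrt{n}\) only after multiplying by the finite-population factor \(\sqrt{(N-n)/(N-1)}\), with \(\sigma\) computed over the whole population (denominator \(N\)).

```python
from itertools import combinations
from math import comb, sqrt
from statistics import fmean, pstdev

population = [9, 11, 12, 12, 14, 10, 9, 8, 7, 9]   # bolt diameters
N, n = len(population), 3

# Every possible sample of 3 bolts (bolts with equal diameters are still distinct bolts)
sample_means = [fmean(s) for s in combinations(population, n)]
print(len(sample_means), comb(N, n))                 # 120 120
print(round(fmean(sample_means), 2))                 # 10.1, the population mean

sigma = pstdev(population)                           # population sd (divides by N)
fpc = sqrt((N - n) / (N - 1))                        # finite correction factor
print(round(pstdev(sample_means), 4), round(sigma / sqrt(n) * fpc, 4))
```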

The mean and standard deviation of that sampling distribution are given as:
\[ \mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Example 1:
Gajaga-electronics is a company that manufactures circuit boards. The average number of imperfections on a board is \(\mu\), with a standard deviation \(\sigma\),
when the production process is under control.
A random sample of \(n\) circuit boards has been taken for inspection, and a mean of \(\bar{x}\) defects per board was found. What is the probability of getting a value of \(\bar{x}\) or less if the process is under control?
Solution:
Since the sample size is greater than 30, the central limit theorem can be used in this case even though the number of defects per board follows a Poisson distribution. Therefore, the distribution of the sample mean is approximately normal with the standard deviation
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
and the corresponding z value is
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
For this sample, \(z = 2.56\), which corresponds to 0.4948 on the table of normal curve areas.
The probability of getting a value of \(\bar{x}\) or less is 0.5 + 0.4948 = 0.9948.
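The table lookup can be reproduced with `statistics.NormalDist` from Python's standard library (3.8+). A minimal sketch: `prob_mean_at_most` is a hypothetical helper that applies the normal approximation for a sample mean, and the usage line works directly from the example's z value of 2.56.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_at_most(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean <= x_bar) under the CLT normal approximation."""
    z = (x_bar - mu) / (sigma / sqrt(n))
    return NormalDist().cdf(z)


# z = 2.56 gives the same answer as 0.5 + 0.4948
print(round(NormalDist().cdf(2.56), 4))  # 0.9948
```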

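The previous example is valid for an extremely large population. Sampling from a finite population of size \(N\) requires an adjustment called the finite correction factor:
\[ \sqrt{\frac{N - n}{N - 1}} \]
Z will therefore become equal to
\[ z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}} \]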
Example 2:
A city's 450 restaurant employees average $35 in tips a day with a standard deviation of $9. If a sample of 50 employees is taken, what is the probability that the sample will have an average of less than $37 in tips a day?
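Solution:
\[ z = \frac{37 - 35}{\frac{9}{\sqrt{50}}\sqrt{\frac{450 - 50}{450 - 1}}} = \frac{2}{1.2728 \times 0.9439} = \frac{2}{1.2014} = 1.66 \]
On the Z-score table, 1.66 corresponds to .4515; therefore, the probability of getting an average daily tip of less than $37 will be .4515 + .5 = .9515.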
If the finite correction factor had not been taken into account, z would have been 1.57, which corresponds to .4418 on the Z-score table, and the probability of having an average daily tip of less than $37 would have been .9418.
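A minimal sketch of the restaurant calculation with and without the finite correction factor, using only the standard library; `z_sample_mean` is a hypothetical helper. The correction factor enters the denominator under a square root, \(\sqrt{(N-n)/(N-1)}\).

```python
from math import sqrt
from statistics import NormalDist


def z_sample_mean(x_bar, mu, sigma, n, N=None):
    """z score of a sample mean; pass N to apply the finite correction factor."""
    standard_error = sigma / sqrt(n)
    if N is not None:
        standard_error *= sqrt((N - n) / (N - 1))
    return (x_bar - mu) / standard_error


z_finite = z_sample_mean(37, 35, 9, 50, N=450)   # 450 employees in the city
z_infinite = z_sample_mean(37, 35, 9, 50)        # correction factor ignored
for label, z in (("finite", z_finite), ("infinite", z_infinite)):
    print(label, round(z, 2), round(NormalDist().cdf(z), 4))
```

With the correction, z comes out at about 1.66 and the probability at about 0.952; without it, z is about 1.57 and the probability about 0.942. The small gaps from the table answers come from reading the table at two decimal places of z.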
3 -3 Sampling distribution of \(\hat{p}\)
When the data being analyzed are measurable, as in the case of the two previous examples or in the case of distance or income, the sample mean is often preferred. However, when the data are countable, as in the case of people in a group or defective items on a production line, the sample proportion is the statistic of choice.
The sample proportion applies to situations that would have required a binomial distribution, where \(p\) is the probability of a success and \(q\) the probability of a failure, with \(p + q = 1\).
When a random sample of \(n\) trials is selected from a binomial population (an experiment with \(n\) identical trials, each having only two possible outcomes considered as success or failure) with parameter \(p\), the sample proportion will be
\[ \hat{p} = \frac{x}{n} \]
where \(x\) is the number of successes.
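The mean and standard deviation of the sampling distribution of \(\hat{p}\) will be
\[ \mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \]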
If \(np > 5\) and \(nq > 5\), the sampling distribution of \(\hat{p}\) can be approximated using the normal distribution.
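Example 1: In a sample of 100 workers, 25 might be coming late once a week.
Here \(x = 25\) and \(n = 100\), so the sample proportion of latecomers will be \(\hat{p} = 25/100 = 0.25\). In that example,
\[ n\hat{p} = 100 \times 0.25 = 25, \qquad n\hat{q} = 100 \times 0.75 = 75 \]
If \(np > 5\) and \(nq > 5\), the central limit theorem applies to the sample proportion; both products here are well above 5.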
The Z formula for the sample proportion is given as:
\[ z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \]
Where:
\(\hat{p}\) = Sample proportion
\(p\) = Population proportion
\(n\) = Sample size
\(q = 1 - p\)
Example 2:
40% of the parts that come off a production line are defective. What is the probability of taking a random sample of size 75 from the line and finding that .70 or less are defective?
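Solution:
\[ z = \frac{0.70 - 0.40}{\sqrt{\frac{(0.40)(0.60)}{75}}} = \frac{0.30}{0.0566} = 5.30 \]
A z value of 5.30 lies beyond the range of most standard normal distribution tables; to four decimals, the area between the mean and z = 5.30 is 0.5000.
So the probability of finding 70% or less defective parts is 0.5 + 0.5000 = 1.0000 to four decimals, which makes it practically certain.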
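A minimal sketch of the z formula for a sample proportion, including a check of the \(np > 5\) and \(nq > 5\) condition; `z_proportion` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_proportion(p_hat: float, p: float, n: int) -> float:
    """z score of a sample proportion p_hat when the population proportion is p."""
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("normal approximation needs np > 5 and nq > 5")
    return (p_hat - p) / sqrt(p * q / n)


z = z_proportion(0.70, 0.40, 75)
print(round(z, 2))                    # 5.3
print(NormalDist().cdf(z) > 0.9999)   # True: the probability is practically 1
```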
Example 3:
40% of all the employees have signed up for the stock option plan. An HR specialist believes that this ratio is too high. She takes a sample of 450 employees and finds that 200 have signed up. What is the probability of getting a sample proportion larger than this if the population proportion is really 0.4?
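Solution:
\[ \hat{p} = \frac{200}{450} = 0.4444 \]
\[ z = \frac{0.4444 - 0.40}{\sqrt{\frac{(0.40)(0.60)}{450}}} = \frac{0.0444}{0.0231} = 1.92 \]
which corresponds to 0.4726 on the standard normal distribution table.
The probability of getting a sample proportion larger than 0.4444 will be
0.5 - 0.4726 = 0.0274.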
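The same calculation for the stock option example, as a self-contained sketch (the hypothetical helper is redefined so the block runs on its own). Keeping four decimals in \(\hat{p} = 200/450\) matters here: rounding it to 0.44 before computing z shrinks the numerator from 0.0444 to 0.04 and noticeably changes the answer.

```python
from math import sqrt
from statistics import NormalDist


def z_proportion(p_hat: float, p: float, n: int) -> float:
    """z score of a sample proportion p_hat when the population proportion is p."""
    return (p_hat - p) / sqrt(p * (1 - p) / n)


p_hat = 200 / 450                        # 200 of the 450 sampled employees signed up
z = z_proportion(p_hat, 0.40, 450)
p_larger = 1 - NormalDist().cdf(z)       # right tail: a proportion larger than p_hat
print(round(p_hat, 4), round(z, 2))      # 0.4444 1.92
print(round(p_larger, 4))
```

The exact tail area is about 0.027, in line with the table-based 0.0274.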
3 -4 Estimating the population mean with large sample sizes
Suppose a company has just developed a new process for prolonging the life of a light bulb. The engineers want to know the longevity of the bulbs, but it is not possible to test each bulb in a production process that generates hundreds of thousands of bulbs a day. They can, however, take a random sample, determine its average longevity, and from there estimate the longevity of the whole population.
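Using the central limit theorem, we have determined that the z value for sample means can be used for large samples:
\[ z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} \]
By rearranging this formula, we can derive the value of \(\mu\):
\[ \mu = \bar{x} - z\frac{\sigma}{\sqrt{n}} \]
Since z can be positive or negative, the next formula is more accurate:
\[ \mu = \bar{x} \pm z\frac{\sigma}{\sqrt{n}} \]
In other words, \(\mu\) will be within the following confidence interval:
\[ \bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}} \]
where
\[ \bar{x} - z\frac{\sigma}{\sqrt{n}} \]
is the lower confidence limit (LCL) and
\[ \bar{x} + z\frac{\sigma}{\sqrt{n}} \]
is the upper confidence limit (UCL).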
But a confidence interval presented as such does not take into account the area under the normal curve that is outside the confidence interval.
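We estimate with some confidence that the mean is within the interval
\[ \bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}} \]
but we cannot be absolutely certain that it is unless the confidence level is 100%.
For a two-tailed normal curve, if we want to be 95% sure that \(\mu\) is within that interval, then the confidence level will be equal to .95 (\(1 - \alpha\), or 1 - .05) and each tail will hold an area of \(\alpha/2 = .025\), which corresponds to \(z_{\alpha/2} = 1.96\) on the Z-table.
The confidence interval should be rewritten as
\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
or:
\[ \mu = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]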
The table below shows the most commonly used confidence coefficients and their Z-score values.
| Confidence level \(1 - \alpha\) | \(\alpha\) | \(z_{\alpha/2}\) |
|---|---|---|
| 0.90 | 0.10 | 1.645 |
| 0.95 | 0.05 | 1.96 |
| 0.99 | 0.01 | 2.58 |
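The z values in the table can be regenerated with `NormalDist.inv_cdf`, which returns the z that leaves \(\alpha/2\) in the upper tail. A minimal sketch; `z_half_alpha` is a hypothetical name. The 99% value is 2.576 before rounding to 2.58.

```python
from statistics import NormalDist


def z_half_alpha(confidence: float) -> float:
    """Two-tailed critical value z_{alpha/2} for a given confidence level 1 - alpha."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(1 - confidence, 2), round(z_half_alpha(confidence), 3))
```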
Example 1:
A survey of companies that use solar panels as a primary source of electricity was conducted. The question asked was: How much of the electricity used in your company comes from the solar panels? A random sample of 55 responses produced a mean of 45 megawatts. Suppose the population standard deviation for this question is 15.5 megawatts. Find the 95% confidence interval for the mean.
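Solution:
\[ \mu = 45 \pm 1.96\frac{15.5}{\sqrt{55}} = 45 \pm 4.1 \]
\[ 40.9 \le \mu \le 49.1 \]
We can be 95% confident that the mean is between 40.9 and 49.1 megawatts; in other words, if this sampling were repeated many times, about 95% of the intervals constructed this way would contain the true mean.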
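A minimal sketch of the interval \(\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}\) applied to the solar panel survey; `z_interval` is a hypothetical helper, and it uses the exact \(z_{0.025} \approx 1.95996\) rather than the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for mu when sigma is known (or n is large)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_interval(45, 15.5, 55)
print(round(low, 1), round(high, 1))  # 40.9 49.1
```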

When the sample size is large (\(n > 30\)), the sample's standard deviation \(s\) can be used as an estimate of the population standard deviation \(\sigma\).
Example 2:
A sample of 200 circuit boards was taken from a production line, and it showed the average number of defects to be 7 with a standard deviation of 2. What is the 95% confidence interval for the population average \(\mu\)?
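Solution:
\[ \mu = 7 \pm 1.96\frac{2}{\sqrt{200}} = 7 \pm 0.28 \]
\[ 6.72 \le \mu \le 7.28 \]
In repeated sampling, 95% of the confidence intervals will enclose the average number of defects per circuit board for the whole population, \(\mu\).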
Example 3:
What would the interval be if the confidence level were 90%?
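Solution:
\[ \mu = 7 \pm 1.645\frac{2}{\sqrt{200}} = 7 \pm 0.23 \]
\[ 6.77 \le \mu \le 7.23 \]
In repeated sampling, 90% of the confidence intervals will enclose the average number of defects per circuit board for the whole population, \(\mu\).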
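Both circuit-board intervals follow from the same formula, with the sample standard deviation \(s = 2\) standing in for \(\sigma\) as the text allows for large samples. A self-contained sketch with a hypothetical `z_interval` helper:

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, s: float, n: int, confidence: float):
    """Large-sample interval for mu, using s as an estimate of sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin


for confidence in (0.95, 0.90):
    low, high = z_interval(7, 2, 200, confidence)
    print(confidence, round(low, 2), round(high, 2))
```

The lower confidence level gives the narrower interval: 6.77 to 7.23 instead of 6.72 to 7.28.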
3 -5 Estimating the population mean with small sample sizes and unknown \(\sigma\)
3 -5.1 t-Distribution
We have seen that when the population is normally distributed and the standard deviation is known, \(\mu\) can be estimated to be within the interval \(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\). But, as in the case of the above example, \(\sigma\) is often not known. In these cases, it can be replaced by \(s\), the sample's standard deviation, and \(\mu\) is found within the interval \(\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}\). Replacing \(\sigma\) with \(s\) is only a good approximation if the sample sizes are large, i.e. \(n > 30\).
In fact, once \(\sigma\) is replaced by \(s\), the resulting statistic does not follow a normal distribution for small sample sizes, even if the population is normally distributed.
So, in the case of small samples and when \(\sigma\) is not known, the t-distribution is used instead.
The formula for that distribution is given as:
\[ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} \]
The right side of this equation has the same form as the Z formula (with \(s\) in place of \(\sigma\)), but the tables used to determine the t values are different from the ones used for the z values.
Just as in the case of the z formula, the t formula can be rearranged to estimate \(\mu\). Because the sample sizes are small, the t value is read from the table using the degrees of freedom, \(df = n - 1\).
So the mean will be found within the interval
\[ \bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \]
or
\[ \mu = \bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \]
Example:
A manager of a car rental company wants to know the number of times luxury cars are rented per month. She takes a random sample of 19 cars, which produces the following results:
3 7 12 5 9 13 2 8 6 14 6 1 2 3 2 5 11 13 5
She wants to use these data to construct a 95% confidence interval to estimate the average.
Solution:
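\[ \sum x = 3 + 7 + 12 + 5 + 9 + 13 + 2 + 8 + 6 + 14 + 6 + 1 + 2 + 3 + 2 + 5 + 11 + 13 + 5 = 127 \]
\[ \bar{x} = \frac{127}{19} = 6.68 \]
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{322.11}{18}} = 4.23 \]
With \(df = 19 - 1 = 18\) and \(\alpha/2 = 0.025\), the t-table gives \(t_{0.025,\,18} = 2.101\).
\[ \mu = 6.68 \pm 2.101\frac{4.23}{\sqrt{19}} = 6.68 \pm 2.04 \]
\[ 4.64 \le \mu \le 8.72 \]
We can be 95% confident that \(\mu\) is between 4.64 and 8.72.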
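The t interval for the rental data can be checked in Python. The standard library has no t-distribution, so this sketch hard-codes the table value \(t_{0.025,\,18} = 2.101\) (with SciPy available, `scipy.stats.t.ppf(0.975, 18)` would compute it instead). Carrying full precision gives 4.645 and 8.723, which agree with 4.64 and 8.72 up to rounding of the intermediate values.

```python
from math import sqrt
from statistics import mean, stdev

rentals = [3, 7, 12, 5, 9, 13, 2, 8, 6, 14, 6, 1, 2, 3, 2, 5, 11, 13, 5]
n = len(rentals)                 # 19 cars
x_bar = mean(rentals)            # 127 / 19
s = stdev(rentals)               # sample standard deviation (divides by n - 1)
T_CRIT = 2.101                   # t_{0.025, df = n - 1 = 18}, read from a t table

margin = T_CRIT * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(round(x_bar, 2), round(s, 2))     # 6.68 4.23
print(round(low, 3), round(high, 3))    # 4.645 8.723
```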
3 -5.2 \(\chi^2\) Distribution
In quality control, the objective of the auditor is usually not to find the mean of a population but rather to determine the level of variation of the output. For instance, the auditor would want to know how much variation the production process exhibits about the target in order to see what adjustments are needed to reach a defect-free process.
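We have already seen that the sample variance is determined as:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
The \(\chi^2\) formula for a single variance is given as:
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} \]
The shape of the \(\chi^2\) distribution resembles the normal curve, but it is not symmetrical (it is skewed to the right) and its shape depends on the degrees of freedom.
The formula can be rearranged to find \(\sigma^2\), which will be within the interval
\[ \frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}} \]
with \(n - 1\) degrees of freedom.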
Example:
A sample of 9 screws was taken out of a production line, and the values are as follows:
13.00mm
13.00mm
12.00mm
12.55mm
12.99mm
12.89mm
12.88mm
12.97mm
12.99mm
We are trying to estimate the population variance with 95% confidence.
Solution:
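We need to determine the point estimate, which is the sample's variance:
\[ \bar{x} = \frac{115.27}{9} = 12.8078, \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{0.8976}{8} = 0.1122 \]
with \(df = n - 1 = 8\) degrees of freedom.
Since we want to estimate \(\sigma^2\) with a confidence level of 95%,
\[ \alpha = 0.05, \qquad \frac{\alpha}{2} = 0.025, \qquad 1 - \frac{\alpha}{2} = 0.975 \]
So \(\sigma^2\) will be within the following interval:
\[ \frac{(n - 1)s^2}{\chi^2_{0.025}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{0.975}} \]
From the table, the values of \(\chi^2_{0.025}\) and \(\chi^2_{0.975}\) for 8 degrees of freedom are respectively 17.5346 and 2.17973.
So the confidence interval becomes:
\[ \frac{8 \times 0.1122}{17.5346} \le \sigma^2 \le \frac{8 \times 0.1122}{2.17973} \]
and
\[ 0.0512 \le \sigma^2 \le 0.412 \]
We can be 95% confident that \(\sigma^2\) is between 0.0512 and 0.412.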
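A minimal sketch of the variance interval for the screw data. As with the t example, the two chi-square critical values are hard-coded from the table (\(df = 8\)); with SciPy installed, `scipy.stats.chi2.ppf(0.975, 8)` and `scipy.stats.chi2.ppf(0.025, 8)` would return them.

```python
from statistics import variance

diameters_mm = [13.00, 13.00, 12.00, 12.55, 12.99, 12.89, 12.88, 12.97, 12.99]
n = len(diameters_mm)
s2 = variance(diameters_mm)      # sample variance, n - 1 denominator

CHI2_RIGHT = 17.5346             # chi-square with 8 df, 0.025 area in the right tail
CHI2_LEFT = 2.17973              # chi-square with 8 df, 0.975 area in the right tail

low = (n - 1) * s2 / CHI2_RIGHT
high = (n - 1) * s2 / CHI2_LEFT
print(round(s2, 4))                      # 0.1122
print(round(low, 4), round(high, 3))     # 0.0512 0.412
```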
3 -6 Estimating sample sizes
In most cases, sampling is used in quality control to make an inference about a whole population because of the cost associated with studying every individual part of that population. But then the question of the sample size arises. What sample size best reflects the condition of the whole population being estimated? Should we consider a sample of 150 or of 1,000 products from a production line to determine the quality level of the output?
3 -6.1 Sample size when estimating the mean
At the beginning of this chapter, we defined the sampling error \(E\) as the difference between the sample mean \(\bar{x}\) and the population mean \(\mu\):
\[ E = \bar{x} - \mu \]
We have also seen, when studying the sampling distribution of \(\bar{x}\), that when \(\mu\) is being estimated, we can use the Z formula for sample means:
\[ z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} \]
The numerator is nothing but the sampling error \(E\). We can therefore replace \(\bar{x} - \mu\) by \(E\) in the Z formula and obtain:
\[ z_{\alpha/2} = \frac{E}{\frac{\sigma}{\sqrt{n}}} \]
We can determine \(n\) from this equation:
\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} \]
Example:
A production manager at a call center wants to know how much time an employee should spend on the phone with a customer on average. She wants to be within 2 minutes of the actual length of time, and the standard deviation of the time spent is known to be 3 minutes. What sample size of calls should she consider if she wants to be 95% confident of her result?
Solution
\[ n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{1.96^2 \times 3^2}{2^2} = \frac{3.8416 \times 9}{4} = 8.6436 \]
Since we cannot have 8.6436 calls, we round the result up to 9 calls.
The manager can be 95% confident that with a sample of 9 calls she can estimate the average length of time an employee spends on the phone with a customer to within 2 minutes.
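The sample-size formula \(n = z_{\alpha/2}^2\sigma^2/E^2\), rounded up, as a minimal sketch; `sample_size_mean` is a hypothetical helper. Using the exact \(z_{0.025}\) instead of 1.96 gives about 8.643 rather than 8.6436, which still rounds up to 9.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma: float, error: float, confidence: float = 0.95) -> int:
    """Smallest n that estimates mu to within +/- error at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / error) ** 2)


print(sample_size_mean(sigma=3, error=2))  # 9
```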
3 -6.2 Sample size when estimating the population proportion
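To determine the sample size needed when estimating \(p\), we can use the same procedure as the one we used when determining the sample size for \(\mu\).
We have already seen that the Z formula for the sample proportion is given as:
\[ z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \]
The error of estimation (or sampling error) in this case will be \(E = \hat{p} - p\).
We can replace \(\hat{p} - p\) by \(E\) in the Z formula and obtain:
\[ z_{\alpha/2} = \frac{E}{\sqrt{\frac{pq}{n}}} \]
We can derive \(n\) from this equation:
\[ n = \frac{z_{\alpha/2}^2\,pq}{E^2} \]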
Example:
A study is being conducted to determine the extent to which companies promote Open Book Management. The question asked of employees is: Do your managers provide you with enough information about the company? It was previously estimated that only 30% of the companies actually provided the information needed to their employees. If the researcher wants to be 95% confident in the results and be within 0.05 of the true population proportion, what size of sample should she take?
\[ n = \frac{z_{\alpha/2}^2\,pq}{E^2} = \frac{1.96^2 \times 0.30 \times 0.70}{0.05^2} = \frac{0.8067}{0.0025} = 322.7 \]
Rounding up, she must take a sample of 323 companies.
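The proportion version, \(n = z_{\alpha/2}^2\,pq/E^2\), sketched the same way with a hypothetical `sample_size_proportion` helper; the prior estimate \(p = 0.30\) plays the role of the population proportion.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p: float, error: float, confidence: float = 0.95) -> int:
    """Smallest n that estimates a proportion to within +/- error at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)


print(sample_size_proportion(p=0.30, error=0.05))  # 323
```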
About the author
Issa Bass is the managing editor of SixSigmaFirst. He can be reached at issa@sixsigmafirst.com
www.manorhouseassociates.com