Sample size
Overview
The sample size of a statistical sample is the number of repeated measurements that constitute it. It is typically denoted n, and is a nonnegative integer (natural number).
Typically, different sample sizes lead to different accuracies of measurement. This can be seen in such statistical rules as the law of large numbers and the central limit theorem. All else being equal, a larger sample size n leads to increased precision in estimates of various properties of the population.
A typical example would be when a statistician wishes to estimate the arithmetic mean of a continuous random variable (for example, the height of a person). Assuming a random sample with independent observations, and a known population variability (as measured by the standard deviation σ), the standard error of the sample mean is given by the formula:
 <math>\sigma/\sqrt{n}.</math>
It is easy to show that as n becomes large, this variability becomes very small. This yields more sensitive hypothesis tests with greater statistical power and narrower confidence intervals.
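As a minimal illustration of the formula above, the following Python sketch evaluates σ/√n for a few sample sizes. The function name `standard_error` and the value σ = 10 are assumptions chosen for the example, not part of the original text.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for n independent observations."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation (e.g. height in cm).
sigma = 10.0
for n in (25, 100, 400, 10_000):
    print(f"n = {n:>6}: standard error = {standard_error(sigma, n):.2f}")
```

Quadrupling n halves the standard error, which is why gains in precision become progressively more expensive as the sample grows.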
With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into subsamples. Typically, if there are k such subsamples (from k different strata) then each of them will have a sample size n_{i}, i = 1, 2, ..., k. These n_{i} must satisfy n_{1} + n_{2} + ... + n_{k} = n (i.e. the total sample size is the sum of the subsample sizes). Selecting these n_{i} optimally can be done in various ways, using (for example) Neyman's optimal allocation.
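One way to picture such an allocation is the sketch below, which assumes the usual statement of Neyman allocation: each stratum receives a share of n proportional to N_{i}σ_{i}, where N_{i} is the stratum's population size and σ_{i} its standard deviation. The function name, the stratum figures, and the largest-remainder rounding step (used only so that the integer n_{i} still sum to n) are assumptions made for illustration.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split a total sample size n across strata in proportion to N_i * sigma_i."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one stratum needs positive size and spread")

    # Exact (possibly fractional) allocation for each stratum.
    exact = [n * w / total for w in weights]

    # Round down, then hand the leftover units to the strata with the
    # largest fractional parts so that the n_i add up to n exactly.
    alloc = [int(x) for x in exact]
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - alloc[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical strata: population sizes and standard deviations.
sizes = [5000, 3000, 2000]
sds = [4.0, 10.0, 20.0]
print(neyman_allocation(90, sizes, sds))   # [20, 30, 40]
```

The smallest stratum receives the largest subsample here because its variability is highest; proportional allocation by size alone would have done the opposite.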
Further examples
Central limit theorem
The central limit theorem is a significant result which depends on sample size.
Estimating proportions
A typical statistical aim is to demonstrate with 95% certainty that the true value of a parameter is within a distance B of the estimate: B is an error range that decreases with increasing sample size (n). The value B is the margin of error, i.e. the half-width of the 95% confidence interval.
For example, a simple situation is estimating a proportion in a population. To do so, a statistician will estimate the bounds of a 95% confidence interval for an unknown proportion.
The rule of thumb for (a maximum or 'conservative') B for a proportion derives from the fact that the estimator of a proportion, <math> \hat p = X/n</math> (where X is the number of 'positive' observations), has a (scaled) binomial distribution and is also a form of sample mean (from a Bernoulli distribution on {0, 1}, which has a maximum variance of 0.25, attained at parameter p = 0.5). So, the sample mean X/n has maximum variance 0.25/n. For sufficiently large n (usually this means that we need to have observed at least 10 positive and 10 negative responses), this distribution will be closely approximated by a normal distribution with the same mean and variance.
Using this approximation, it can be shown that ~95% of this distribution's probability lies within 2 standard deviations of the mean. Because of this, an interval of the form
 <math>(\hat p - 2\sqrt{0.25/n}, \hat p + 2\sqrt{0.25/n})=(\hat p - B, \hat p + B)</math>
will form a 95% confidence interval for the true proportion.
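The interval can be computed directly from the counts. The sketch below is a minimal illustration; the function name `conservative_interval` and the survey figures are hypothetical.

```python
import math


def conservative_interval(positives: int, n: int) -> tuple[float, float]:
    """Approximate 95% interval p_hat +/- 2*sqrt(0.25/n) for a proportion."""
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need 0 <= positives <= n and n > 0")
    p_hat = positives / n
    b = 2 * math.sqrt(0.25 / n)   # worst-case margin, equal to 1/sqrt(n)
    return p_hat - b, p_hat + b


# Hypothetical survey: 220 'positive' responses out of 400.
low, high = conservative_interval(220, 400)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

Because the margin uses the worst-case variance 0.25/n, the interval is somewhat wider than necessary when the true proportion is far from 0.5.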
If we require the sampling error ε to be no larger than some bound B, we can solve the equation
 <math>\varepsilon \approx B=2\sqrt{0.25/n}=1/\sqrt{n}</math>
to give us
 <math>1/\varepsilon^2 \approx 1/B^2=n</math>
So, n = 100 ⇔ B = 10%, n = 400 ⇔ B = 5%, n = 1000 ⇔ B ≈ 3%, and n = 10000 ⇔ B = 1%. One sees these numbers quoted often in news reports of opinion polls and other sample surveys.
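These figures follow directly from B = 1/√n and can be reproduced with a short Python sketch. The function names are assumptions made for the example.

```python
import math


def conservative_margin(n: int) -> float:
    """Worst-case 95% margin of error for a proportion: B = 1/sqrt(n)."""
    return 1.0 / math.sqrt(n)


def sample_size_for_margin(b: float) -> int:
    """Smallest n with 1/sqrt(n) <= b, i.e. n >= 1/b**2.

    Floating-point rounding can push an exact case up by one unit,
    so treat the result as a planning figure rather than a sharp bound.
    """
    if b <= 0:
        raise ValueError("margin must be positive")
    return math.ceil(1.0 / b ** 2)


for n in (100, 400, 1000, 10_000):
    print(f"n = {n:>6}: B = {conservative_margin(n):.1%}")

# Going the other way: sample size needed for a 3% margin.
print(sample_size_for_margin(0.03))
```

The inverse function shows why a poll of about 1000 respondents is so common: pushing the margin below 3% requires well over a thousand responses.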
Extension to other cases
In general, if a population mean is estimated using the sample mean from n observations from a distribution with variance σ², then if n is large enough (typically >30) the central limit theorem can be applied to obtain an approximate 95% confidence interval of the form
 <math>(\bar x - B,\bar x + B), \quad B=2\sigma/\sqrt{n}</math>
If the sampling error ε is required to be no larger than bound B, as above, then
 <math>4\sigma^2/\varepsilon^2 \approx 4\sigma^2/B^2=n</math>
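As a rough illustration of n = 4σ²/B², the sketch below rounds the result up to the next whole observation. The function name and the numeric inputs are hypothetical, and σ is assumed to be known in advance (in practice it is often taken from a pilot study).

```python
import math


def sample_size_for_mean(sigma: float, b: float) -> int:
    """Smallest n for which 2*sigma/sqrt(n) <= b, i.e. n >= 4*sigma**2 / b**2."""
    if sigma <= 0 or b <= 0:
        raise ValueError("sigma and b must be positive")
    return math.ceil(4 * sigma ** 2 / b ** 2)


# Hypothetical: standard deviation 10, desired margin of error 2 (same units).
print(sample_size_for_mean(10.0, 2.0))   # 100
```

Halving the target margin B quadruples the required sample size, mirroring the proportion case.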
Note, if the mean is to be estimated using P parameters that must first be estimated themselves from the same sample, then to preserve sufficient "degrees of freedom," the sample size should be at least n + P.
Required sample sizes for hypothesis tests
A common problem facing statisticians is calculating the sample size required to yield a certain power for a test, given a predetermined Type I error rate α. A typical example for this is as follows:
Let X_{i}, i = 1, 2, ..., n be independent observations taken from a normal distribution with mean μ and variance σ^{2}. Let us consider two hypotheses, a null hypothesis:
 <math> H_0:\mu=0 </math>
and an alternative hypothesis:
 <math> H_a:\mu=\mu^* </math>
for some 'smallest significant difference' μ^{*} > 0. This is the smallest value for which we care about observing a difference. Now, if we wish to (1) reject H_{0} with a probability of at least 1 − β when H_{a} is true (i.e. a power of 1 − β), and (2) reject H_{0} with probability α when H_{0} is true, then we need the following:
If z_{α} is the upper α percentage point of the standard normal distribution, then
 <math> \Pr(\bar x > z_{\alpha}\sigma/\sqrt{n} \mid H_0 \text{ true})=\alpha </math>
and so
 'Reject H_{0} if our sample average (<math>\bar x</math>) is more than <math>z_{\alpha}\sigma/\sqrt{n}</math>'
is a decision rule which satisfies (2). (Note that this is a one-tailed test, since H_{0} is rejected only for large values of the sample average.)
Now we wish for this to happen with a probability of at least 1 − β when H_{a} is true. In this case, our sample average will come from a normal distribution with mean μ^{*}. Therefore we require
 <math> \Pr(\bar x > z_{\alpha}\sigma/\sqrt{n} \mid H_a \text{ true})\geq 1-\beta </math>
Through careful manipulation, this can be shown to happen when
 <math> n \geq \left(\frac{\Phi^{-1}(1-\beta)+z_{\alpha}}{\mu^{*}/\sigma}\right)^2 </math>
where <math>\Phi</math> is the normal cumulative distribution function.
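A numerical sketch may help here. Under H_{a} the sample average is normal with mean μ^{*} and standard deviation σ/√n, so the power of the rule above is Φ(μ^{*}√n/σ − z_{α}); requiring this to be at least 1 − β and solving for n gives √n ≥ (Φ^{−1}(1 − β) + z_{α})/(μ^{*}/σ). The Python code below evaluates that bound using `statistics.NormalDist` from the standard library (Python 3.8+). The function names and the example inputs are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def required_sample_size(mu_star: float, sigma: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest n giving the one-tailed test at level alpha the requested power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)   # upper alpha point of N(0, 1)
    z_power = STD_NORMAL.inv_cdf(power)       # Phi^{-1}(1 - beta)
    return math.ceil(((z_power + z_alpha) / (mu_star / sigma)) ** 2)


def achieved_power(n: int, mu_star: float, sigma: float,
                   alpha: float = 0.05) -> float:
    """Probability of rejecting H0 when the true mean is mu_star."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(mu_star * math.sqrt(n) / sigma - z_alpha)


# Hypothetical: detect a shift of half a standard deviation with 80% power.
n = required_sample_size(mu_star=0.5, sigma=1.0)
print(n, round(achieved_power(n, 0.5, 1.0), 3))
```

Checking the result with `achieved_power` for n and n − 1 is a simple way to confirm that the rounded-up sample size is the smallest one meeting the power target.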