Z-test
The Z-test is a statistical test used in inference to determine whether the difference between a sample mean and the population mean is large enough to be statistically significant.
Notation and mathematics
For the Z-test to be reliable, certain conditions must be met. Most importantly, because the Z-test uses the population mean and population standard deviation, both must be known. The sample must be a simple random sample of the population; if it was drawn by a different sampling method, a different formula must be used. The population must also be known to be normally distributed. If that is not known, a sufficiently large sample (generally agreed to be at least 30 or 40) suffices, because by the central limit theorem the sampling distribution of the mean is then approximately normal.
In practice, knowing the true σ of a population is unrealistic except in cases such as standardized testing, where the entire population is known. When it is impossible to measure every member of a population, it is more realistic to use a t-test, which estimates the standard error from the sample standard deviation and uses the t-distribution.
The test requires the following to be known:
- σ (the standard deviation of the population)
- μ (the mean of the population)
- x̄ (the mean of the sample)
- n (the size of the sample)
First, calculate the standard error (SE) of the mean:
$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}}$$
The z score for the Z-test is then calculated as:
$$z = \frac{\bar{x} - \mu}{\mathrm{SE}}$$
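The two formulas above can be expressed as a small Python function. This is a minimal sketch that assumes σ is known and the sample is a simple random sample; the function name `z_test_statistic` is hypothetical. For illustration, the call uses the reading-test figures from the Example section.

```python
import math


def z_test_statistic(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> tuple[float, float]:
    """Return (standard error, z score) for a one-sample Z-test.

    Assumes the population standard deviation pop_sd is known.
    """
    if n <= 0 or pop_sd <= 0:
        raise ValueError("n and pop_sd must be positive")
    se = pop_sd / math.sqrt(n)          # standard error of the mean
    z = (sample_mean - pop_mean) / se   # distance from mu in standard-error units
    return se, z


se, z = z_test_statistic(sample_mean=96, pop_mean=100, pop_sd=12, n=55)
print(f"SE = {se:.2f}, z = {z:.2f}")  # SE = 1.62, z = -2.47
```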
Finally, the z score is compared against a Z table, which lists the percentage of the area under the standard normal curve lying between the mean and a given z score. The table indicates whether the calculated z score is within the range expected by chance, or whether it lies so far from the mean that the sample mean is unlikely to have arisen by chance.
The Z-test is used primarily with standardized testing to determine whether the test scores of a particular sample of test takers are within or outside the standard performance of test takers.
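The "mean to z" areas listed in a Z table can also be computed directly from the error function, since the area under the standard normal curve between 0 and z equals one half of erf(z/√2). This is a sketch rather than a replacement for a published table; the function name `area_mean_to_z` is hypothetical, and it applies the symmetry rule by using the absolute value of z.

```python
import math


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and |z|.

    This is the quantity tabulated in a "mean to z" Z table; by symmetry
    a negative z gives the same area as its absolute value.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


for z in (1.0, 1.96, 2.47, -2.47):
    print(f"z = {z:+.2f}: area = {area_mean_to_z(z):.4f}")
```

The loop prints areas of about 0.3413, 0.4750, 0.4932 and 0.4932, matching the usual table entries for those z scores.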
Example
Let's take a look at using the Z-test with standardized testing.
In a U.S. school district, a standardized reading test is used to compare the performance of fifth-grade students at an elementary school against the national norm for fifth-grade students. Fifty-five fifth-grade students at this school take the test.
The national norm test score (the population mean) for this standardized test is 100 points, and the population standard deviation for the year under study is 12.
The scores of the fifth-grade students at this school are a sample of the total population of fifth-grade students in the U.S. who have also taken the test.
The school district is told that the mean for their school is 96, which is lower than the national mean. Parents become upset when they learn that their school is below the national norm for the reading test. The school district administration points out that, although lower, the test scores are actually fairly close to the population mean.
The real question is this: is the school's mean test score sufficiently lower than the national norm to indicate a problem, or is it within acceptable parameters? We will use the Z-test to find out.
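First, calculate the standard error of the mean:
$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{55}} \approx \frac{12}{7.416} \approx 1.62$$
Next, calculate the z score:
$$z = \frac{\bar{x} - \mu}{\mathrm{SE}} = \frac{96 - 100}{1.62} \approx -2.47$$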
Remember that in a Z-test the z score is the distance of the sample mean from the population mean, measured in units of the standard error of the mean. In our example, then, a mean score of 96 lies 2.47 standard errors below the population mean; the negative sign means that the sample mean is less than the population mean. Because the normal curve is symmetric, Z tables are usually given only for non-negative z scores, so a negative z score is looked up as if it were positive.
Next, we look up the z score in a Z table and find that a z score of −2.47 corresponds to an area of 49.32%. This means that 49.32% of the area under the normal curve lies between the population mean and our sample mean.
This tells us that 49.32% + 50% = 99.32% of all possible samples of students of the same size, drawn from the national population, would have a higher mean test score than our sample of fifth-grade students. Because our z score is negative, our sample mean is below the population mean, so we add the area between our sample mean and the population mean to the 50% of the area under the normal curve that lies above the population mean.
If our sample mean had been 104 rather than 96, our z score would have been +2.47, indicating that our sample mean was above the population mean. In that case, our school's mean would have been higher than about 99.3% of the means of all possible same-size samples, placing it in the top 0.7%.
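The one-tailed percentages above can be checked with Python's standard-library `statistics.NormalDist`, whose `cdf` gives the area to the left of a z score. This sketch assumes, as the text does, that the school's sample mean is drawn from the national population (the null hypothesis); the z value of +2.47 corresponds to the hypothetical mean of 104.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Observed case: sample mean 96, z = -2.47
z = -2.47
share_higher = 1 - std_normal.cdf(z)  # fraction of same-size samples with a higher mean
print(f"{share_higher:.2%} of possible samples would have a higher mean")

# Hypothetical case: sample mean 104, z = +2.47
z_flipped = 2.47
share_lower = std_normal.cdf(z_flipped)  # fraction of same-size samples with a lower mean
print(f"sample mean would be in the top {1 - share_lower:.2%}")
```

The first line reports about 99.32% and the second about 0.68%, which rounds to the 0.7% quoted above.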
But let's get back to our original question. Is there a problem with the reading program at our elementary school? The question can be reformulated as follows: is the mean from our elementary school, a sample from the general population of fifth-grade students, far enough outside the norm that we need to take corrective action to improve the reading program?
Let's put this in the form of a hypothesis that we will test with our statistical analysis. Our alternative hypothesis is that our sample mean is significantly different from the population mean and that corrective action is necessary. Our null hypothesis is that the difference is purely attributable to chance and that no action is necessary.
To answer this question, we need to choose a significance level (α). Typically α = 0.05 is used, which corresponds to a 95% confidence level: if the null hypothesis is true, we stand only a 5% chance of rejecting it anyway.
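Instead of comparing table areas, the 0.05 significance level can be translated into a critical z value. This sketch assumes a two-tailed test, since the hypothesis is that the school mean differs from the norm (in either direction); `NormalDist().inv_cdf` is the inverse of the standard normal cumulative distribution in Python's `statistics` module.

```python
from statistics import NormalDist

alpha = 0.05                                       # significance level
z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff, about 1.96
print(f"reject H0 if |z| > {z_critical:.2f}")

z = -2.47  # z score from the example
print("reject H0" if abs(z) > z_critical else "fail to reject H0")
```

Because |−2.47| is larger than about 1.96, this critical-value approach reaches the same decision as the area-based reasoning that follows.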
In the case of our sample mean, the z score of −2.47, with its table value of 49.32%, means that 49.32% + 49.32% = 98.64% of all possible sample means of the same size would lie closer to the population mean than ours does. Only 1.36% would lie as far away or farther, in either direction.
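Because 98.64% exceeds 95% (equivalently, the remaining 100% − 98.64% = 1.36% is less than 5%), we reject the null hypothesis at the 0.05 significance level. We conclude that the test performance of the students in our sample was not within normal variation and that we do need to take corrective action to improve the test scores.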
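The full procedure, from standard error to decision, can be sketched as one function using a p-value instead of a table lookup. This is an illustrative sketch, not a library routine: `two_tailed_z_test` is a hypothetical name, and it assumes a known population standard deviation and a two-tailed alternative hypothesis.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """One-sample, two-tailed Z-test with a known population SD.

    Returns (z, p_value, reject_null).
    """
    se = pop_sd / math.sqrt(n)           # standard error of the mean
    z = (sample_mean - pop_mean) / se    # z score
    # Probability under H0 of a sample mean at least this far from mu, either direction
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


z, p, reject = two_tailed_z_test(96, 100, 12, 55)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

For the school data the p-value comes out at about 0.013, well below 0.05, so the null hypothesis is rejected. The small difference from the 1.36% figure comes from using the unrounded z score rather than the table entry for 2.47.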
External links
- Code/pseudo-code for Z-test at Google Groups
- http://espse.ed.psu.edu/statistics/Chapters/Chapter6/Chap6.html
Acknowledgement and Attribution Regarding Sources of Content
Some of the initial content on this page may be incorporated in part from copyleft sources in the public domain including wikis such as Wikipedia and AskDrWiki.

