P-value

Overview

In statistical hypothesis testing, the p-value is the probability of obtaining a result at least as extreme as the one actually observed, assuming that the null hypothesis is true (that is, that the result arose by chance alone). The fact that p-values are based on this assumption is crucial to their correct interpretation. The p-value is usually reported as a decimal: a p-value < 0.05 means that, if chance alone were at work, a result at least as extreme as the one observed would occur less than 5% of the time. The lower the p-value, the less compatible the observed result is with chance alone.^[1]
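The definition above can be written compactly. The notation here is an assumption made for illustration: T stands for the test statistic, t_obs for its observed value, and H_0 for the null hypothesis. The second line is one common convention for a two-tailed test when the distribution of T under H_0 is symmetric.

```latex
% One-tailed p-value (large values of T count as extreme)
p = \Pr\left(T \ge t_{\mathrm{obs}} \mid H_0\right)

% Two-tailed p-value for a symmetric null distribution (doubling convention)
p = 2 \min\left\{ \Pr\left(T \ge t_{\mathrm{obs}} \mid H_0\right),\ \Pr\left(T \le t_{\mathrm{obs}} \mid H_0\right) \right\}
```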

Coin flipping example

For example, suppose an experiment is performed to determine whether a coin is fair (50% chance of landing heads) or biased, either toward heads (> 50% chance of landing heads) or toward tails (< 50% chance of landing heads). Since both biased alternatives are considered, a two-tailed test is performed. The null hypothesis is that the coin is fair, and that any deviation from the 50% rate can be ascribed to chance alone. Suppose that the coin turns up heads 14 times out of 20 flips. The p-value of this result is the chance of a fair coin landing on heads at least 14 times out of 20 flips (larger counts are even less favorable to the null hypothesis of a fair coin) or at most 6 times out of 20 flips (the equally extreme result in the other direction). Under the null hypothesis, the number of heads T has a binomial distribution with 20 trials and success probability 0.5. The probability that 20 flips of a fair coin would result in 14 or more heads is 0.0577. Since this is a two-tailed test and the distribution is symmetric, the probability that 20 flips would result in 14 or more heads or 6 or fewer heads is 0.0577 × 2 = 0.115.
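The arithmetic in this example can be checked with a short Python sketch that uses only the standard library. The function names are illustrative and do not come from any statistics package, and doubling the smaller tail is assumed to be an acceptable two-tailed convention because the fair-coin distribution is symmetric.

```python
from math import comb


def binomial_tail_upper(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 1/2), i.e. the number of heads of a fair coin."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def two_tailed_p_value(n: int, heads: int) -> float:
    """Two-tailed p-value for a fair coin, obtained by doubling the smaller tail.

    Doubling is only justified here because the null distribution is symmetric.
    """
    upper = binomial_tail_upper(n, heads)          # P(X >= heads)
    lower = 1 - binomial_tail_upper(n, heads + 1)  # P(X <= heads)
    return min(1.0, 2 * min(upper, lower))


one_tailed = binomial_tail_upper(20, 14)
two_tailed = two_tailed_p_value(20, 14)
print(f"{one_tailed:.4f}")  # 0.0577
print(f"{two_tailed:.4f}")  # 0.1153
```

Because the test is two-tailed, observing 6 heads instead of 14 gives the same p-value.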

Generally, the smaller the p-value, the stronger the evidence against the null hypothesis of a fair coin, and the more readily one would conclude that the results came from a biased coin.

Interpretation

Generally, one rejects the null hypothesis if the p-value is smaller than or equal to the significance level, often represented by the Greek letter α (alpha). If the level is 0.05, the null hypothesis is rejected only when a result at least as extreme as the one observed would occur no more than 5% of the time if the null hypothesis were true.

In the above example, the calculated p-value exceeds 0.05, and thus the null hypothesis - that the observed result of 14 heads out of 20 flips can be ascribed to chance alone - is not rejected. Such a finding is often stated as being "not statistically significant at the 5% level".

However, had a single extra head been obtained, the resulting two-tailed p-value would be 0.041 (0.0207 × 2). This time the null hypothesis - that the observed result of 15 heads out of 20 flips can be ascribed to chance alone - is rejected. Such a finding would be described as being "statistically significant at the 5% level".
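The decision rule for the coin example can be sketched in Python as follows. The helper names and the constant ALPHA are illustrative assumptions, and the two-tailed p-value is computed by summing the probabilities of every outcome at least as far from 10 heads as the observed count.

```python
from math import comb

ALPHA = 0.05  # significance level, chosen before looking at the data


def two_tailed_p_value(n: int, heads: int) -> float:
    """Probability, for a fair coin, of a head count at least as far from n/2 as observed."""
    distance = abs(heads - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2 ** n


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Reject the null hypothesis when the p-value is at most the significance level."""
    return "reject" if p_value <= alpha else "do not reject"


results = {heads: two_tailed_p_value(20, heads) for heads in (14, 15)}
for heads, p in results.items():
    print(f"{heads} heads: p = {p:.4f} -> {decide(p)} the null hypothesis")
# 14 heads: p = 0.1153 -> do not reject the null hypothesis
# 15 heads: p = 0.0414 -> reject the null hypothesis
```

One additional head moves the result across the 0.05 threshold, which illustrates how sharply a fixed cut-off divides otherwise similar outcomes.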

Critics of p-values point out that the criterion used to decide "statistical significance" is based on the somewhat arbitrary choice of level (often set at 0.05). A proposed replacement for the p-value is p-rep.

Frequent misunderstandings

There are several common misunderstandings about p-values.^[2]

1. The p-value is not the probability that the null hypothesis is true, although this reading is sometimes used to justify the "rule" of treating p-values close to zero as significant. In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null is very close to unity. This is the Jeffreys-Lindley paradox.
2. The p-value is not the probability that a finding is "merely a fluke", although this reading is likewise used to justify the "rule" of treating small p-values as "significant". As the calculation of a p-value is based on the assumption that a finding is the product of chance alone, it patently cannot simultaneously be used to gauge the probability of that assumption being true.
3. The p-value is not the probability of falsely rejecting the null hypothesis. This error is a version of the so-called prosecutor's fallacy.
4. The p-value is not the probability that a replicating experiment would not yield the same conclusion.
5. 1 − (p-value) is not the probability of the alternative hypothesis being true (see item 1).
6. The significance level of the test is not determined by the p-value. The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.
7. The p-value does not indicate the size or importance of the observed effect (compare with effect size).
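The point that a p-value does not measure the size of an effect can be illustrated with the coin example. In the following Python sketch (the sample sizes and helper name are assumptions chosen for illustration), the observed proportion of heads is 60% in both cases, yet the p-values differ enormously because the number of flips differs.

```python
from math import comb


def two_tailed_p_value(n: int, heads: int) -> float:
    """Probability, for a fair coin, of a head count at least as far from n/2 as observed."""
    distance = abs(heads - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2 ** n  # exact integer arithmetic until this final division


small_sample = two_tailed_p_value(20, 12)      # 60% heads in 20 flips
large_sample = two_tailed_p_value(2000, 1200)  # 60% heads in 2000 flips

print(f"n = 20:   p = {small_sample:.4f}")  # p = 0.5034
print(f"n = 2000: p = {large_sample:.3e}")  # many orders of magnitude smaller
```

The observed effect (a 10 percentage point departure from 50%) is identical in both runs, so the difference in p-values reflects only the amount of data, not a larger or more important bias.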

External links

Free p-Value Calculator for the Chi-Square test from Daniel Soper's Free Statistics Calculators website. Computes the one-tailed probability value of a chi-square test (i.e., the area under the chi-square distribution from the chi-square value to infinity), given the chi-square value and the degrees of freedom.
Free p-Value Calculator for the Fisher F-test from Daniel Soper's Free Statistics Calculators website. Computes the probability value of an F-test, given the F-value, numerator degrees of freedom, and denominator degrees of freedom.
Free p-Value Calculator for the Student t-test from Daniel Soper's Free Statistics Calculators website. Computes the one-tailed and two-tailed probability values of a t-test, given the t-value and the degrees of freedom.
Understanding P-values, Jim Berger's page with links to various websites about p-values, and a Java applet that illustrates how the numerical values of p-values can give quite misleading impressions about the truth or falsity of the hypothesis under test.

Additional reading

Dallal GE (2007) Historical background to the origins of p-values and the choice of 0.05 as the cut-off for significance
Hubbard R, Armstrong JS (2005) Historical background on the widespread confusion of the p-value (PDF)
Fisher's method for combining independent tests of significance using their p-values

References

[1] Duffy ME, Munroe BH, Jacobsen BS. Sifting the evidence — what's wrong with significance tests?
[2] Sterne JAC, Smith GD (2001). "Sifting the evidence — what's wrong with significance tests?". BMJ. 322 (7280): 226–231.
