Prediction interval


In statistics, a prediction interval bears the same relationship to a future observation that a confidence interval bears to an unobservable population parameter. A prediction interval gives a range within which an individual future observation is expected to fall with a stated probability, whereas a confidence interval estimates the true population mean or another quantity of interest that cannot be observed.

In other words, an interval estimate of a parameter, such as a population mean, is usually called a confidence interval. An interval estimate of a random variable that has not yet been observed is called a prediction interval.

A common example given in statistics classes is the prediction interval for a response variable when fitting a least squares regression line. If the data comprise the entire population, the regression line is known exactly and nothing about it has to be estimated. However, if the data are a sample, the true regression line is unknown, and the predicted value of the response variable y, found using the equation of the regression line fitted to the sample, has a margin of error. The predicted y value is a statistic, not a parameter: it is a point estimate, and a prediction interval can be built around it for the response actually observed at that x. The width of the interval is governed by the standard error of prediction, which combines the residual standard deviation about the fitted line with the uncertainty in the estimated intercept and slope; the standard error of the slope alone is not enough.
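As a rough illustration of the regression case, the sketch below computes a prediction interval for a new response at a point x0 under simple linear regression with normally distributed errors. It uses the standard textbook form, fitted value ± t · s · sqrt(1 + 1/n + (x0 − x̄)²/Sxx) with n − 2 degrees of freedom, which is not spelled out in this article. The function name, the sample data, and the choice to pass the t critical value in by hand are all assumptions made for the example.

```python
import math
import statistics


def regression_prediction_interval(xs, ys, x0, t_crit):
    """Prediction interval for a new response at x0 (simple linear regression).

    t_crit is the Student's t critical value with n - 2 degrees of freedom,
    supplied by the caller (hypothetical design choice for this sketch).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")

    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Least squares estimates of slope and intercept.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation about the fitted line (n - 2 in the divisor).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # Standard error of prediction: residual scatter plus uncertainty in the line.
    y_hat = intercept + slope * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width


# Made-up illustrative data; 3.182 is the approximate table value of the
# 97.5th percentile of Student's t with 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
low_mid, high_mid = regression_prediction_interval(xs, ys, 3.0, 3.182)
low_far, high_far = regression_prediction_interval(xs, ys, 6.0, 3.182)
print(low_mid, high_mid)
print(low_far, high_far)
```

The interval is narrowest at the mean of the x values and widens as x0 moves away from it, because the (x0 − x̄)² term grows.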

Example

Suppose one has drawn a sample from a normally distributed population. The mean and standard deviation of the population are unknown except insofar as they can be estimated based on the sample. It is desired to predict the next observation, drawn independently from the same population. Let n be the sample size; let μ and σ be respectively the unobservable mean and standard deviation of the population. Let X1, ..., Xn be the sample; let Xn+1 be the future observation to be predicted. Let

\overline{X}_n=(X_1+\cdots+X_n)/n

and

S_n^2={1 \over n-1}\sum_{i=1}^n (X_i-\overline{X}_n)^2.
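Two standard facts about normal samples underlie the next step. They are stated here as a sketch, assuming that X1, ..., Xn+1 are independent draws from the same normal population. First, the prediction error is normal with mean zero,

X_{n+1}-\overline{X}_n \sim N\left(0,\ \sigma^2(1+1/n)\right),

because Xn+1 has variance σ², the sample mean has variance σ²/n, and the two are independent. Second,

(n-1)S_n^2/\sigma^2 \sim \chi^2_{n-1},

and S_n^2 is independent of both the sample mean and Xn+1. Dividing the standardized prediction error by the square root of S_n^2/σ² cancels the unknown σ:

{(X_{n+1}-\overline{X}_n)/(\sigma\sqrt{1+1/n}) \over \sqrt{S_n^2/\sigma^2}}={X_{n+1}-\overline{X}_n \over S_n\sqrt{1+1/n}}.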

Then it is fairly routine to show that

{X_{n+1}-\overline{X}_n \over \sqrt{S_n^2+S_n^2/n}}={X_{n+1}-\overline{X}_n \over S_n\sqrt{1+1/n}}

has a Student's t-distribution with n − 1 degrees of freedom. Consequently we have

\Pr\left(\overline{X}_n-T_a S_n\sqrt{1+(1/n)}\leq X_{n+1}   \leq\overline{X}_n+T_a S_n\sqrt{1+(1/n)}\,\right)=p

where Ta is the 100((1 + p)/2)th percentile of Student's t-distribution with n − 1 degrees of freedom. Therefore the numbers

\overline{X}_n\pm T_a {S}_n\sqrt{1+(1/n)}

are the endpoints of a 100p% prediction interval for Xn+1.
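The endpoints above can be computed directly. The following is a minimal sketch using only the Python standard library. The function name and the sample values are made up for illustration, and the critical value Ta is passed in by the caller, since the standard library has no Student's t quantile function.

```python
import math
import statistics


def prediction_interval(sample, t_crit):
    """Endpoints X-bar_n -/+ T_a * S_n * sqrt(1 + 1/n) for the next observation.

    t_crit plays the role of T_a: the 100((1 + p)/2)th percentile of
    Student's t with n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate S_n")

    x_bar = statistics.fmean(sample)   # sample mean, X-bar_n
    s_n = statistics.stdev(sample)     # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s_n * math.sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width


# Illustrative data only. 2.776 is the approximate table value of the
# 97.5th percentile of Student's t with 4 degrees of freedom (p = 0.95, n = 5).
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
T_CRIT_95_DF4 = 2.776
lower, upper = prediction_interval(sample, T_CRIT_95_DF4)
print(f"95% prediction interval: ({lower:.3f}, {upper:.3f})")
```

If SciPy is available, the critical value can instead be obtained as scipy.stats.t.ppf((1 + p) / 2, n - 1) rather than read from a table.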



