Box-Cox transformation: Difference between revisions

{{SI}}
#redirect[[Power transform]]
{{EH}}
 
In [[statistics]], the '''power transform''' is a family of transformations that map [[data]] from one space to another using power functions. It is a data (pre)[[processing]] technique used to stabilize variance, make the data more [[normal distribution]]-like, and improve the correlation between variables. The '''Box–Cox transformation''', by statisticians [[George E. P. Box]] and [[David Cox (statistician)|David Cox]], is one particular way of parameterising a power transform that has advantageous properties.
 
==Definition==
The power transformation is defined as a function that varies continuously with the power parameter ''&lambda;''. It is written piecewise so that it remains continuous at the point of singularity (''&lambda;''&nbsp;=&nbsp;0). For a data vector (''y''<sub>1</sub>,&nbsp;...,&nbsp;''y''<sub>''n''</sub>) in which each ''y''<sub>''i''</sub>&nbsp;>&nbsp;0, the power transform is
 
: <math>y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^\lambda-1}{\lambda(\operatorname{GM}(y))^{\lambda -1}} , &\mbox{ if } \lambda \neq 0 \\  \\
\operatorname{GM}(y)\log{y_i} , &\mbox{ if } \lambda = 0
\end{cases}
</math>
 
where
 
: <math> \operatorname{GM}(y) = (y_1\cdots y_n)^{1/n} \, </math>
 
is the [[geometric mean]] of the observations ''y''<sub>1</sub>,&nbsp;...,&nbsp;''y''<sub>''n''</sub>.
 
The inclusion of the (''&lambda;''&nbsp;&minus;&nbsp;1)th power of the geometric mean in the denominator implies that the units of measurement do not change as ''&lambda;'' changes.  That makes it possible to compare sums of squares of [[errors and residuals in statistics|residuals]] and choose the value of ''&lambda;'' that minimizes that sum.
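
The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names <code>geometric_mean</code> and <code>boxcox_normalized</code> and the sample values are assumptions made for this example. It evaluates both branches of the definition and checks numerically that the ''&lambda;''&nbsp;&ne;&nbsp;0 branch approaches the logarithmic branch as ''&lambda;'' approaches 0.

```python
import math

def geometric_mean(y):
    """Geometric mean of strictly positive observations, computed via logs."""
    return math.exp(sum(math.log(v) for v in y) / len(y))

def boxcox_normalized(y, lam):
    """Geometric-mean-normalized power transform of a list of positive values."""
    if any(v <= 0 for v in y):
        raise ValueError("all observations must be strictly positive")
    gm = geometric_mean(y)
    if lam == 0:
        # Logarithmic branch used at the singular point lam = 0.
        return [gm * math.log(v) for v in y]
    scale = lam * gm ** (lam - 1)
    return [(v ** lam - 1) / scale for v in y]

# Made-up sample, used only for illustration.
y = [0.5, 1.0, 2.0, 4.0]

near_zero = boxcox_normalized(y, 1e-6)
at_zero = boxcox_normalized(y, 0)
# Largest difference between the two branches close to lam = 0.
max_gap = max(abs(a - b) for a, b in zip(near_zero, at_zero))

# With lam = 1 the transform reduces to y_i - 1.
identity_like = boxcox_normalized(y, 1)
```

Because every value of ''&lambda;'' yields output in the same units, residual sums of squares computed from <code>boxcox_normalized</code> at different values of ''&lambda;'' can be compared directly.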
 
The value at ''Y'' = 1 is 0 for any ''λ''. For the unnormalized form (''Y''<sup>''λ''</sup>&nbsp;&minus;&nbsp;1)/''λ'', the [[derivative]] with respect to ''Y'' there is 1 for any ''λ''. Sometimes ''Y'' is a version of some other variable scaled to give ''Y'' = 1 at some sort of average value.
 
The transformation is a [[power (mathematics)|power]] transformation, but done in such a way as to make it [[continuous function|continuous]] with the parameter ''λ'' at ''λ'' = 0. It has proved popular in [[regression analysis]], including [[econometrics]].
 
Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter ''&alpha;'':
 
:<math>\tau(y_i;\lambda, \alpha) = \begin{cases} \dfrac{(y_i + \alpha)^\lambda - 1}{\lambda (\operatorname{GM}(y))^{\lambda - 1}} & \mathrm{if}\ \lambda\neq 0, \\  \\
\operatorname{GM}(y)\ln(y_i + \alpha)& \mathrm{if}\ \lambda=0.\end{cases}</math>
 
If &tau;(''Y'', &lambda;, &alpha;) follows a [[truncated normal distribution]], then ''Y'' is said to follow a [[Box&ndash;Cox distribution]].
 
==Use of the power transform==
* Power transforms are used in many fields, including [http://portal.acm.org/citation.cfm?id=1172964.1173292&coll=&dl=acm&CFID=15151515&CFTOKEN=6184618 multi-resolution and wavelet analysis], [[statistical data analysis]], [http://www.andrologyjournal.org/cgi/reprint/23/5/629.pdf medical research], [http://www.springerlink.com/content/y25q020x24602701/ modeling of physical processes], [http://www.springerlink.com/content/mt81u60813077641/ geochemical data analysis], [http://www.blackwell-synergy.com/doi/abs/10.1111/j.1467-9876.2005.00476.x epidemiology] and many other clinical, environmental and social research areas.
 
 
== Example ==
The BUPA liver data set contains data on liver enzymes [[Alanine transaminase|ALT]] and [[Gamma-glutamyl transpeptidase|&gamma;GT]]. The data can be found via the [[classic data sets]] page. Suppose we are interested in using log(&gamma;GT) to predict ALT. A plot of the data appears in panel (a) of the figure. There appears to be non-constant variance, and a Box&ndash;Cox transformation might help.
 
[[image:BUPA_BoxCox.JPG]]
 
The log-likelihood of the power parameter appears in panel (b). The horizontal reference line lies below the maximum by half the 0.95 quantile of the &chi;<sup>2</sup> distribution with one degree of freedom (about 1.92), and the values of &lambda; whose log-likelihood lies above the line form an approximate 95% confidence interval. A value close to zero appears suitable, so we take logs.
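
The procedure for reading off the interval can be sketched as follows. This is a simplified illustration under stated assumptions: it treats the data as a single sample with unknown mean and variance rather than the regression of the example, it uses a synthetic deterministic sample instead of the BUPA data, and the names <code>profile_loglik</code> and <code>boxcox_interval</code> are hypothetical. The profile log-likelihood is evaluated on a grid of ''&lambda;'' values, and every grid value within half the 0.95 quantile of &chi;<sub>1</sub><sup>2</sup> of the maximum is retained.

```python
import math

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1_95 = 3.841459

def profile_loglik(y, lam):
    """Box-Cox profile log-likelihood (up to an additive constant) for a
    single sample with unknown mean and variance."""
    n = len(y)
    if lam == 0:
        z = [math.log(v) for v in y]
    else:
        z = [(v ** lam - 1) / lam for v in y]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance
    log_jacobian = (lam - 1) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(var) + log_jacobian

def boxcox_interval(y, grid):
    """Return the grid maximizer of the profile log-likelihood and the
    smallest and largest grid values within CHI2_1_95 / 2 of the maximum."""
    ll = [profile_loglik(y, lam) for lam in grid]
    best = max(range(len(grid)), key=lambda i: ll[i])
    cutoff = ll[best] - CHI2_1_95 / 2
    inside = [lam for lam, v in zip(grid, ll) if v >= cutoff]
    return grid[best], min(inside), max(inside)

# Synthetic sample (not the BUPA data): logs evenly spaced and symmetric about 0.
y = [math.exp((i - 15) / 10) for i in range(31)]
grid = [i / 100 for i in range(-200, 201)]
lam_hat, lo, hi = boxcox_interval(y, grid)
```

For this symmetric sample the grid maximizer is ''&lambda;''&nbsp;=&nbsp;0 and the interval straddles zero, which corresponds to the situation in which taking logs is a reasonable choice.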
 
The transformation could possibly be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood of the shift parameter. In this case the likelihood is maximized at a shift value close to zero, suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.
 
Note that although Box&ndash;Cox transformations can make big improvements in model fit, there are some issues that the transformation cannot help with. In the current example, the data are rather heavy-tailed so that the assumption of normality is not realistic and a [[robust regression]] approach leads to a more precise model.
 
== Econometric application ==
 
Economists often characterize production relationships by some variant of the Box&ndash;Cox transformation.
 
Consider a common representation of production ''Q'' as dependent on services provided by a capital stock ''K'' and by labor hours ''N'':
 
:<math>\tau(Q)=\alpha \tau(K)+ (1-\alpha)\tau(N).\,</math>
 
Solving for ''Q'' by inverting the Box&ndash;Cox transformation we find
 
:<math>Q=\big(\alpha K^\lambda + (1-\alpha) N^\lambda\big)^{1/\lambda},\,</math>
 
which is known as the ''constant elasticity of substitution (CES)'' production function.
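
The inversion step can be written out explicitly. The derivation below assumes the unnormalized transform &tau;(''x'')&nbsp;=&nbsp;(''x''<sup>''&lambda;''</sup>&nbsp;&minus;&nbsp;1)/''&lambda;'' with ''&lambda;''&nbsp;&ne;&nbsp;0, that is, with the geometric-mean factor dropped; this choice is an assumption made for the illustration.

:<math>\frac{Q^\lambda - 1}{\lambda} = \alpha\,\frac{K^\lambda - 1}{\lambda} + (1-\alpha)\,\frac{N^\lambda - 1}{\lambda} \;\Longrightarrow\; Q^\lambda - 1 = \alpha K^\lambda + (1-\alpha) N^\lambda - 1 \;\Longrightarrow\; Q^\lambda = \alpha K^\lambda + (1-\alpha) N^\lambda.\,</math>

Raising both sides to the power 1/''&lambda;'' gives the expression for ''Q''.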
 
The CES production function is a [[homogeneous function]] of degree one.
 
When ''&lambda;'' = 1, this produces the linear production function:
 
: <math>Q=\alpha K + (1-\alpha)N.\,</math>
 
As ''λ'' → 0, this approaches the [[Cobb-Douglas]] production function:
 
: <math>Q=K^\alpha N^{1-\alpha}.\,</math>
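
The special cases and the degree-one homogeneity can be checked numerically. The sketch below is illustrative only: the function name <code>ces</code> and the input values are assumptions, and the ''&lambda;''&nbsp;=&nbsp;0 case is defined here as the Cobb-Douglas limit.

```python
import math

def ces(K, N, alpha, lam):
    """CES output; the lam == 0 case is taken to be the Cobb-Douglas limit."""
    if lam == 0:
        return K ** alpha * N ** (1 - alpha)
    return (alpha * K ** lam + (1 - alpha) * N ** lam) ** (1 / lam)

# Arbitrary illustrative inputs.
K, N, alpha = 4.0, 9.0, 0.5

linear = ces(K, N, alpha, 1)         # linear case: alpha*K + (1 - alpha)*N
cobb_douglas = ces(K, N, alpha, 0)   # limiting case: K**alpha * N**(1 - alpha)
near_limit = ces(K, N, alpha, 1e-6)  # small lam, close to the limiting case

# Degree-one homogeneity: doubling both inputs doubles output.
base = ces(K, N, alpha, 0.3)
scaled = ces(2 * K, 2 * N, alpha, 0.3)
```

With these inputs <code>linear</code> equals 6.5, <code>cobb_douglas</code> equals 6 up to rounding, and <code>near_limit</code> differs from <code>cobb_douglas</code> only slightly.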
 
==Activities and demonstrations==
The [[SOCR]] resource pages contain a number of [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_PowerTransformFamily_Graphs hands-on interactive activities] demonstrating the Box&ndash;Cox (power) transformation using Java applets and charts. These directly illustrate the effects of this transform on [[Q-Q plot]]s, X-Y [[scatterplot]]s, [[time-series]] plots and [[histogram]]s.
 
==References==
* {{cite journal | last = Box | first = George E. P. | authorlink = George E. P. Box | coauthors = [[David Cox (statistician)|Cox, D. R.]] | title = An analysis of transformations | journal = Journal of the Royal Statistical Society, Series B | volume = 26 | pages = 211–246 | date = 1964 | url=http://www.jstor.org/stable/2984418}}
* Carroll, RJ and Ruppert, D. (1981) [http://wiki.stat.ucla.edu/socr/uploads/b/b8/PowerTransformFamily_Biometrica609.pdf On prediction and the power transformation family]. Biometrika 68, 609&ndash;615.
* {{cite journal | last = DeGroot| first = M. H.| title = A Conversation with George Box | journal = Statistical Science | volume = 2 | pages = 239–258 | date = 1987| doi = 10.1214/ss/1177013223}}
* Handelsman, DJ. (2002) Optimal power transformations for analysis of sperm concentration and other semen variables. Journal of Andrology 23 (5), September/October 2002.
* Gluzman, S and Yukalov, VI. (2006) Self-similar power transforms in extrapolation problems. Journal of Mathematical Chemistry 39 (1), 47&ndash;56, doi:10.1007/s10910-005-9003-7
* Howarth, RJ and Earle, SAM. (1979) Application of a generalized power transformation to geochemical data. Mathematical Geology 11 (1), 45&ndash;62, doi:10.1007/BF01043245
* Peters, JL, Rushton, L, Sutton, AJ, Jones, DR, Abrams, KR, Mugglestone, MA. (2005) Bayesian methods for the cross-design synthesis of epidemiological and toxicological evidence. [[Journal of the Royal Statistical Society]]: Series C (Applied Statistics) 54 (1), 159&ndash;172, doi:10.1111/j.1467-9876.2005.00476.x
 
==External links==
* [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_PowerTransformFamily_Graphs SOCR Power Transform Activities and Applets]
* [http://www.stat.uconn.edu/~studentjournal/index_files/pengfi_s05.pdf Box&ndash;Cox Transformation: An Overview, Pengfei Li]
 
 
[[Category:Statistics]]
 
 
{{SIB}}
 
[[de:Box-Cox-Transformation]]
[[eu:Box-Cox aldakuntza]]
[[pl:Przekształcenie Boxa-Coxa]]
 
{{WH}}
{{WS}}

Latest revision as of 18:14, 20 May 2009
