Hotelling's T-square distribution

In statistics, Hotelling's T-square statistic,[1] named for Harold Hotelling, is a generalization of Student's t statistic that is used in multivariate hypothesis testing.

Hotelling's T-square statistic is defined as

$t^2=n({\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}({\mathbf x}-{\mathbf\mu})$ where $n$ is the number of observations (see below), ${\mathbf x}$ and ${\mathbf\mu}$ are column vectors of $p$ elements and ${\mathbf W}$ is a $p\times p$ matrix.

If ${\mathbf x}\sim N_p({\mathbf\mu},{\mathbf V}/n)$ is a random vector with a multivariate Gaussian distribution and $m{\mathbf W}\sim W_p({\mathbf V},m)$ (independent of ${\mathbf x}$) has a Wishart distribution with the same non-singular variance matrix ${\mathbf V}$ and with $m=n-1$ degrees of freedom, then the distribution of $t^2$ is $T^2(p,m)$, Hotelling's T-square distribution with parameters $p$ and $m$. It can be shown that

$\frac{m-p+1}{pm} t^2\sim F_{p,m-p+1}$ where $F_{p,m-p+1}$ is the F-distribution with $p$ and $m-p+1$ degrees of freedom.
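The rescaling above is simple arithmetic, and a minimal Python sketch may make it concrete. The function name `t2_to_f` and the example values are illustrative assumptions, not part of any library; looking up a p-value from the resulting F statistic would still require an F-distribution table or a statistics package.

```python
def t2_to_f(t2, p, m):
    """Rescale a value of a T^2(p, m) statistic to an F statistic.

    Uses (m - p + 1) / (p * m) * t2 ~ F(p, m - p + 1) and returns the
    rescaled value together with the two F degrees of freedom.
    """
    if m - p + 1 <= 0:
        raise ValueError("the F relation needs m >= p")
    f_value = (m - p + 1) / (p * m) * t2
    return f_value, (p, m - p + 1)


# Hypothetical example: t2 = 10 with p = 2 variables and m = 9.
f_value, dfs = t2_to_f(10.0, p=2, m=9)
print(f_value, dfs)
```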

Now suppose that

${\mathbf x}_1,\dots,{\mathbf x}_n$

are p×1 column vectors whose entries are real numbers. Let

$\overline{\mathbf x}=(\mathbf{x}_1+\cdots+\mathbf{x}_n)/n$

be their mean. Let the p×p positive-definite matrix

${\mathbf W}=\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})'/(n-1)$

be their "sample variance" matrix. (The transpose of any matrix M is denoted above by M′). Let μ be some known p×1 column vector (in applications a hypothesized value of a population mean). Then Hotelling's T-square statistic is

$t^2=n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}).$

Note that $t^2$ is $n$ times the squared Mahalanobis distance between $\overline{\mathbf x}$ and ${\mathbf\mu}$, computed with the sample variance matrix ${\mathbf W}$.
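The one-sample statistic can be computed directly from these definitions. The following is a minimal pure-Python sketch, not a reference implementation: the helper names `solve` and `hotelling_t2_one_sample` and the small data set are assumptions made for illustration, and the quadratic form is evaluated with a linear solve rather than an explicit inverse of ${\mathbf W}$.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def hotelling_t2_one_sample(xs, mu):
    """Return t^2 = n (xbar - mu)' W^{-1} (xbar - mu) for a list of p-vectors xs."""
    n, p = len(xs), len(mu)
    mean = [sum(x[j] for x in xs) / n for j in range(p)]
    # Sample variance matrix W with divisor n - 1.
    w = [[sum((x[j] - mean[j]) * (x[k] - mean[k]) for x in xs) / (n - 1)
          for k in range(p)] for j in range(p)]
    d = [mean[j] - mu[j] for j in range(p)]
    z = solve(w, d)  # z = W^{-1} d
    return n * sum(dj * zj for dj, zj in zip(d, z))


# Hypothetical data: n = 4 observations of p = 2 variables.
xs = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
t2 = hotelling_t2_one_sample(xs, mu=[2.0, 2.0])
print(t2)  # approximately 0.75
```

For this data set the mean is (2.5, 2.5) and ${\mathbf W}$ has 5/3 on the diagonal and 1 off the diagonal, which gives $t^2 = 0.75$ by hand as well.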

In particular, it can be shown [2] that if ${\mathbf x}_1,\dots,{\mathbf x}_n\sim N_p({\mathbf\mu},{\mathbf V})$ are independent, and $\overline{\mathbf x}$ and ${\mathbf W}$ are as defined above, then $(n-1){\mathbf W}$ has a Wishart distribution with n − 1 degrees of freedom,

$(n-1)\mathbf{W} \sim W_p({\mathbf V},n-1),$

and is independent of $\overline{\mathbf x}$, and

$\overline{\mathbf x}\sim N_p({\mathbf\mu},{\mathbf V}/n).$

This implies that:

$t^2 = n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}) \sim T^2(p, n-1).$
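As a consistency check on the claim that this generalizes Student's t statistic, consider the case p = 1. Writing $s^2$ for the scalar sample variance (a symbol introduced here only for this illustration), ${\mathbf W}$ reduces to $s^2$ and the statistic becomes

$t^2 = n(\overline{x}-\mu)\,\frac{1}{s^2}\,(\overline{x}-\mu) = \left(\frac{\overline{x}-\mu}{s/\sqrt{n}}\right)^2.$

With $p=1$ and $m=n-1$ the factor $\frac{m-p+1}{pm}$ equals 1, so $t^2\sim F_{1,n-1}$, which is the distribution of the square of a Student's t variable with n − 1 degrees of freedom.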

Hotelling's two-sample T-square statistic

If ${\mathbf x}_1,\dots,{\mathbf x}_{n_x}\sim N_p(\mu,{\mathbf V})$ and ${\mathbf y}_1,\dots,{\mathbf y}_{n_y}\sim N_p(\mu,{\mathbf V})$ are two independent samples drawn from multivariate normal distributions with the same mean and covariance, and we define

$\overline{\mathbf x}=\frac{1}{n_x}\sum_{i=1}^{n_x} \mathbf{x}_i \qquad \overline{\mathbf y}=\frac{1}{n_y}\sum_{i=1}^{n_y} \mathbf{y}_i$

as the sample means, and

${\mathbf W}= \frac{\sum_{i=1}^{n_x}(\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})' +\sum_{i=1}^{n_y}(\mathbf{y}_i-\overline{\mathbf y})(\mathbf{y}_i-\overline{\mathbf y})'}{n_x+n_y-2}$ as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-square statistic is

$t^2 = \frac{n_x n_y}{n_x+n_y}(\overline{\mathbf x}-\overline{\mathbf y})'{\mathbf W}^{-1}(\overline{\mathbf x}-\overline{\mathbf y}) \sim T^2(p, n_x+n_y-2)$

and it can be related to the F-distribution by

$\frac{n_x+n_y-p-1}{(n_x+n_y-2)p}t^2 \sim F(p,n_x+n_y-1-p).$[2]
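The two-sample computation can be sketched in the same way. The example below is restricted to p = 2 so that the inverse of the pooled matrix can be written in closed form; this restriction, the function names and the data are assumptions made for illustration only.

```python
def mean_and_scatter(sample):
    """Return the mean and the scatter matrix entries (s00, s01, s11) of 2-vectors."""
    n = len(sample)
    m0 = sum(v[0] for v in sample) / n
    m1 = sum(v[1] for v in sample) / n
    s00 = sum((v[0] - m0) ** 2 for v in sample)
    s01 = sum((v[0] - m0) * (v[1] - m1) for v in sample)
    s11 = sum((v[1] - m1) ** 2 for v in sample)
    return (m0, m1), (s00, s01, s11)


def hotelling_t2_two_sample_2d(xs, ys):
    """Return (t2, f_value, (df1, df2)) for two samples of 2-vectors."""
    nx, ny, p = len(xs), len(ys), 2
    (mx0, mx1), (a00, a01, a11) = mean_and_scatter(xs)
    (my0, my1), (b00, b01, b11) = mean_and_scatter(ys)
    dof = nx + ny - 2
    # Unbiased pooled covariance matrix W (symmetric, so three entries suffice).
    w00, w01, w11 = (a00 + b00) / dof, (a01 + b01) / dof, (a11 + b11) / dof
    det = w00 * w11 - w01 ** 2
    if det <= 0:
        raise ValueError("pooled covariance matrix is not positive definite")
    d0, d1 = mx0 - my0, mx1 - my1
    # Quadratic form d' W^{-1} d using the closed-form 2x2 inverse.
    quad = (w11 * d0 * d0 - 2 * w01 * d0 * d1 + w00 * d1 * d1) / det
    t2 = nx * ny / (nx + ny) * quad
    # Rescale to the F statistic with (p, nx + ny - 1 - p) degrees of freedom.
    f_value = (nx + ny - p - 1) / (dof * p) * t2
    return t2, f_value, (p, nx + ny - 1 - p)


# Hypothetical data: two samples of four bivariate observations each.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
ys = [(3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
t2, f_value, dfs = hotelling_t2_two_sample_2d(xs, ys)
print(t2, f_value, dfs)  # approximately 6.0 and 2.5, with dfs (2, 5)
```

For general p the closed-form inverse would be replaced by a linear solve against the pooled matrix.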

References

1. H. Hotelling (1931) The generalization of Student's ratio, Ann. Math. Statist., Vol. 2, pp. 360-378.
2. K.V. Mardia, J.T. Kent, and J.M. Bibby (1979) Multivariate Analysis, Academic Press.