# Yule-Simon distribution

Figure: Yule-Simon PMF on a log-log scale. (The function is defined only at integer values of $k$; the connecting lines do not indicate continuity.)

Figure: Yule-Simon CDF. (The function is defined only at integer values of $k$; the connecting lines do not indicate continuity.)

• Parameters: $\rho > 0$, shape (real)
• Support: $k \in \{1,2,\dots\}$
• PMF: $\rho\,\mathrm{B}(k, \rho+1)$
• CDF: $1 - k\,\mathrm{B}(k, \rho+1)$
• Mean: $\frac{\rho}{\rho-1}$ for $\rho > 1$
• Mode: $1$
• Variance: $\frac{\rho^2}{(\rho-1)^2\,(\rho-2)}$ for $\rho > 2$
• Skewness: $\frac{(\rho+1)^2\,\sqrt{\rho-2}}{(\rho-3)\,\rho}$ for $\rho > 3$
• Excess kurtosis: $\rho+3+\frac{11\rho^3-49\rho-22}{(\rho-4)\,(\rho-3)\,\rho}$ for $\rho > 4$
• MGF: $\frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^t)\,e^t$
• CF: $\frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^{i\,t})\,e^{i\,t}$

In probability and statistics, the Yule-Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.

The probability mass function of the Yule-Simon(ρ) distribution is

$f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1), \,$

for integer $k \geq 1$ and real $\rho > 0$, where $\mathrm{B}$ is the beta function. Equivalently, the pmf can be written in terms of the falling factorial $x^{\underline{n}} = \Gamma(x+1)/\Gamma(x-n+1)$ as

$f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}} ,  \,$

where $\Gamma$ is the gamma function. Thus, if $\rho$ is an integer,

$f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!} .  \,$
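As a numerical cross-check of these equivalent forms, the following Python sketch (an illustration using only the standard library, not a reference implementation) evaluates the pmf through log-gamma, which is the Gamma-function form of the falling-factorial expression, and compares it with the factorial form for integer $\rho$. It also takes the CDF to be $F(k;\rho) = 1 - k\,\mathrm{B}(k,\rho+1)$. The function names `yule_simon_pmf`, `yule_simon_pmf_integer_rho`, and `yule_simon_cdf` are hypothetical.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """f(k; rho) = rho * B(k, rho + 1).

    B(k, rho + 1) = Gamma(k) Gamma(rho + 1) / Gamma(k + rho + 1), the
    Gamma-function form of the falling-factorial expression. It is evaluated
    with log-gamma so that large k does not overflow.
    """
    if k < 1 or rho <= 0:
        raise ValueError("requires integer k >= 1 and real rho > 0")
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Factorial form rho * rho! * (k - 1)! / (k + rho)!, valid for integer rho only."""
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)


def yule_simon_cdf(k: int, rho: float) -> float:
    """F(k; rho) = 1 - k * B(k, rho + 1), using B(k, rho + 1) = f(k; rho) / rho."""
    return 1.0 - k * yule_simon_pmf(k, rho) / rho


if __name__ == "__main__":
    rho = 3
    for k in (1, 2, 5, 20):
        # The two columns should agree up to floating-point rounding.
        print(k, yule_simon_pmf(k, rho), yule_simon_pmf_integer_rho(k, rho))
```

Working in log space is a design choice: `math.gamma` on its own overflows once its argument exceeds roughly 171, whereas the difference of `math.lgamma` values stays finite for very large $k$.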

For large $k$, the probability mass function satisfies

$f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}} .  \,$

This means that the tail of the Yule-Simon distribution is a realization of Zipf's law: $f(k;\rho)$ can be used to model, for example, the relative frequency of the $k$th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of $k$.
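The power-law tail can be checked numerically. The sketch below (standard-library Python, hypothetical helper names, intended for moderate $\rho$ so that $k^{\rho+1}$ does not overflow) compares the exact pmf with the approximation $\rho\,\Gamma(\rho+1)/k^{\rho+1}$ and estimates the slope of $\log f$ against $\log k$, which should approach $-(\rho+1)$.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """f(k; rho) = rho * B(k, rho + 1), computed with log-gamma."""
    return rho * math.exp(math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1))


def zipf_tail(k: int, rho: float) -> float:
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1) / k ** (rho + 1)


def tail_ratio(k: int, rho: float) -> float:
    """f(k; rho) divided by its power-law approximation; tends to 1 as k grows."""
    return yule_simon_pmf(k, rho) / zipf_tail(k, rho)


def loglog_slope(k: int, rho: float) -> float:
    """Slope of log f between k and 2k on a log-log scale; tends to -(rho + 1)."""
    rise = math.log(yule_simon_pmf(2 * k, rho)) - math.log(yule_simon_pmf(k, rho))
    return rise / math.log(2)


if __name__ == "__main__":
    for k in (10, 100, 1000, 10000):
        print(k, tail_ratio(k, 2.0), loglog_slope(k, 2.0))
```

For $\rho = 2$ the ratio has the closed form $k^2/((k+1)(k+2))$, so it approaches 1 from below and only at rate $1/k$; the approximation is a statement about the tail, not about small $k$.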

## Occurrence

The Yule-Simon distribution arises as a continuous mixture of geometric distributions. Specifically, assume that $W$ follows an exponential distribution with scale $1/\rho$ or rate $\rho$:

$W \sim \mathrm{Exponential}(\rho),\,$

with probability density

$h(w;\rho) = \rho \, \exp(-\rho\,w), \quad w \geq 0 .\,$

If, conditional on $W$, the variable $K$ is geometrically distributed with success probability $\exp(-W)$,

$K \mid W \sim \mathrm{Geometric}(\exp(-W)),\,$

then $K$ follows the Yule-Simon(ρ) distribution.

The pmf of a geometric distribution is

$g(k; p) = p \, (1-p)^{k-1}\,$

for $k\in\{1,2,\dots\}$. The Yule-Simon pmf is then the following exponential-geometric mixture distribution:

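$f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\,h(w;\rho)\,dw = \rho \int_0^{1} u^{\rho}\,(1-u)^{k-1}\,du = \rho\,\mathrm{B}(k, \rho+1) , \,$

where the middle step substitutes $u = e^{-w}$.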
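The exponential-geometric mixture can also be simulated directly. This is a minimal sketch assuming only Python's `random` module: `random.Random.expovariate` draws $W$ with rate $\rho$, and the geometric stage uses inverse-transform sampling on $\{1, 2, \dots\}$. The helper names are hypothetical, and the sampler assumes $\rho$ is not so small that $\exp(-W)$ underflows to zero.

```python
import math
import random
from collections import Counter


def sample_yule_simon(rho: float, rng: random.Random) -> int:
    """Draw one K via W ~ Exponential(rho), then K | W ~ Geometric(exp(-W))."""
    w = rng.expovariate(rho)      # W ~ Exponential with rate rho (mean 1/rho)
    p = math.exp(-w)              # geometric success probability exp(-W)
    if p >= 1.0:                  # W is (numerically) 0: the first trial succeeds
        return 1
    if p == 0.0:                  # assumption violated: exp(-W) underflowed
        raise OverflowError("exp(-W) underflowed; rho is too small for this sketch")
    u = 1.0 - rng.random()        # uniform on (0, 1], so log(u) is finite
    # Inverse transform for Geometric(p) on {1, 2, ...}: P(K > k) = (1 - p)**k.
    return 1 + int(math.log(u) / math.log1p(-p))


def yule_simon_pmf(k: int, rho: float) -> float:
    """Closed-form pmf rho * B(k, rho + 1), for comparison."""
    return rho * math.exp(math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1))


if __name__ == "__main__":
    rho, n = 2.5, 200_000
    rng = random.Random(12345)
    counts = Counter(sample_yule_simon(rho, rng) for _ in range(n))
    for k in range(1, 6):
        print(k, counts[k] / n, yule_simon_pmf(k, rho))
```

The empirical frequencies of small $k$ should match $\rho\,\mathrm{B}(k,\rho+1)$ to within Monte Carlo error, which is on the order of $10^{-3}$ for $2 \times 10^5$ draws.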

## Generalizations

The two-parameter generalization of the original Yule distribution replaces the beta function with the incomplete beta function $\mathrm{B}_x(a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$. The probability mass function of the generalized Yule-Simon(ρ, α) distribution is defined as

$f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}} \; \mathrm{B}_{1-\alpha}(k, \rho+1) , \,$


with $0 \leq \alpha < 1$. For $\alpha = 0$ the ordinary Yule-Simon(ρ) distribution is recovered as a special case, since $\mathrm{B}_1 = \mathrm{B}$ and $\alpha^{\rho} = 0$. The incomplete beta function introduces an exponential cutoff in the upper tail.
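A short worked check of the normalizing constant and the cutoff follows from the integral definition $\mathrm{B}_x(a,b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$. Summing the geometric series $\sum_{k \geq 1} t^{k-1} = 1/(1-t)$ under the integral (all terms are nonnegative, so the exchange is justified) gives

$\sum_{k=1}^{\infty} \rho\,\mathrm{B}_{1-\alpha}(k, \rho+1) = \rho \int_0^{1-\alpha} (1-t)^{\rho-1}\,dt = 1 - \alpha^{\rho} , \,$

so dividing by $1-\alpha^{\rho}$ makes the probabilities sum to one. For the tail, $t^{k-1} \leq (1-\alpha)^{k-1}$ on $[0, 1-\alpha]$, hence

$f(k;\rho,\alpha) \leq \frac{\rho\,(1-\alpha)^{k-1}}{1-\alpha^{\rho}} \int_0^{1-\alpha} (1-t)^{\rho}\,dt \leq \frac{\rho\,(1-\alpha)^{k-1}}{(\rho+1)\,(1-\alpha^{\rho})} , \,$

which decays geometrically in $k$ whenever $\alpha > 0$.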

Figure: Plot of the Yule-Simon(1) distribution (red) and its asymptotic Zipf law (blue).

## References

• Herbert A. Simon, "On a Class of Skew Distribution Functions", Biometrika 42(3/4): 425–440, December 1955.
• Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica, New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)