In probability theory, Poisson–Dirichlet distributions are probability distributions on the set of nonnegative, non-increasing sequences with sum 1, depending on two parameters α ∈ [0, 1) and θ ∈ (−α, ∞). They can be defined as follows. One considers independent random variables (Y_n)_{n ≥ 1} such that Y_n follows the beta distribution with parameters 1 − α and θ + nα. The Poisson–Dirichlet distribution PD(α, θ) of parameters α and θ is then the law of the decreasing rearrangement of the sequence formed by Y_1 and the products Y_n ∏_{k=1}^{n−1} (1 − Y_k) for n ≥ 2. This definition is due to Jim Pitman and Marc Yor. It generalizes Kingman's law, which corresponds to the particular case α = 0.
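The stick-breaking construction just described can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name is ours, and truncating after a fixed number of sticks is an approximation (the untouched remainder of the stick is discarded).

```python
import random

def sample_pd(alpha, theta, n_terms=200, rng=random):
    """Approximately sample a PD(alpha, theta) sequence by stick-breaking.

    Draws Y_n ~ Beta(1 - alpha, theta + n*alpha) and forms the weights
    Y_n * prod_{k<n} (1 - Y_k), returned in decreasing order.  Truncation
    at n_terms leaves a small unassigned remainder of the unit stick.
    """
    weights = []
    remaining = 1.0  # mass of the stick not yet broken off
    for n in range(1, n_terms + 1):
        y = rng.betavariate(1 - alpha, theta + n * alpha)
        weights.append(y * remaining)
        remaining *= 1 - y
    return sorted(weights, reverse=True)
```

For α = 0 and θ = 1 (Kingman's case below), each Y_n is uniform on [0, 1], so the leftover mass decays geometrically and a few hundred sticks already carry essentially all of the unit mass.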

Number theory

Patrick Billingsley has proven the following result: if n is a uniform random integer in {2, 3, …, N}, if k ≥ 1 is a fixed integer, and if p_1 ≥ p_2 ≥ ⋯ ≥ p_k are the k largest prime divisors of n (with p_j arbitrarily defined if n has fewer than j prime factors), then the joint distribution of (log p_1 / log n, log p_2 / log n, …, log p_k / log n) converges to the law of the first k elements of a PD(0, 1)-distributed random sequence, when N goes to infinity.
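Billingsley's theorem can be checked empirically. The sketch below is our own illustration (helper name and trial-division factoring are ours, and we list prime factors with multiplicity): for k = 1, the sample mean of log p_1 / log n should be close to the mean of the first coordinate of a PD(0, 1) sequence, which is the Golomb–Dickman constant ≈ 0.624.

```python
import math
import random

def prime_factors_desc(n):
    """Prime factors of n, with multiplicity, in decreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return sorted(factors, reverse=True)

# Empirical check for k = 1: average log(p_1)/log(n) over random n <= N.
random.seed(0)
ratios = []
for _ in range(2000):
    n = random.randint(2, 10**6)
    p1 = prime_factors_desc(n)[0]
    ratios.append(math.log(p1) / math.log(n))
print(sum(ratios) / len(ratios))  # should be near 0.624 for large N
```

The convergence is slow in N, so with N = 10^6 the sample mean only lands in the rough vicinity of the limit.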

Random permutations and Ewens's sampling formula

The Poisson–Dirichlet distribution with parameters α = 0 and θ = 1 is also the limiting distribution, for N going to infinity, of the sequence (ℓ_1/N, ℓ_2/N, ℓ_3/N, …), where ℓ_j is the length of the j-th largest cycle of a uniformly distributed permutation of order N. If, for θ > 0, one replaces the uniform distribution by the distribution P_{N,θ} on the symmetric group S_N defined by P_{N,θ}(σ) = θ^{n(σ)} / (θ(θ + 1)⋯(θ + N − 1)), where n(σ) is the number of cycles of the permutation σ, then one obtains the Poisson–Dirichlet distribution with parameters α = 0 and θ. The probability distribution P_{N,θ} is called Ewens's distribution, and comes from Ewens's sampling formula, first introduced by Warren Ewens in population genetics in order to describe the probabilities associated with counts of how many different alleles are observed a given number of times in a sample.
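The cycle lengths of an Ewens-distributed permutation can be sampled without constructing the permutation itself, via the Chinese restaurant process: element i starts a new cycle with probability θ/(θ + i − 1), and otherwise joins an existing cycle with probability proportional to that cycle's current size. The sketch below (function name ours) implements this; the sorted cycle lengths divided by N then approximate a PD(0, θ) sequence for large N.

```python
import random

def ewens_cycle_lengths(N, theta, rng=random):
    """Sample the cycle lengths of an Ewens(theta) permutation of
    {1, ..., N} using the Chinese restaurant process."""
    cycles = []
    for i in range(1, N + 1):
        # New cycle with probability theta / (theta + i - 1);
        # for i = 1 this probability is 1, starting the first cycle.
        if rng.random() < theta / (theta + i - 1):
            cycles.append(1)
        else:
            # Join an existing cycle, chosen proportionally to its size
            # (the existing sizes sum to i - 1).
            r = rng.random() * (i - 1)
            acc = 0.0
            for j, size in enumerate(cycles):
                acc += size
                if r < acc:
                    cycles[j] += 1
                    break
    return sorted(cycles, reverse=True)
```

With θ = 1 this reduces to the uniform case: the number of cycles grows like log N, matching the classical result for uniform random permutations.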