In population genetics, Ewens's sampling formula describes the probabilities associated with counts of how many different alleles are observed a given number of times in the sample.

Definition

Ewens's sampling formula, introduced by Warren Ewens, states that under certain conditions (specified below), if a random sample of n gametes is taken from a population and classified according to the gene at a particular locus, then the probability that there are a_1 alleles represented once in the sample, a_2 alleles represented twice, and so on, is

$$\operatorname{Pr}(a_1,\dots,a_n;\theta) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\,a_j!},$$

for some positive number θ representing the population mutation rate, whenever $a_1, \ldots, a_n$ is a sequence of nonnegative integers such that

$$a_1 + 2a_2 + 3a_3 + \cdots + na_n = \sum_{i=1}^{n} i\,a_i = n.$$
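The formula can be evaluated directly. The following is a minimal Python sketch, not a reference implementation; the function name `ewens_probability` and the use of exact rational arithmetic are choices made here for illustration.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(a, theta):
    """Probability of the allele-count vector a = (a_1, ..., a_n) for theta > 0."""
    n = len(a)
    # The counts must be nonnegative and satisfy a_1 + 2*a_2 + ... + n*a_n = n.
    if any(a_j < 0 for a_j in a) or sum(j * a_j for j, a_j in enumerate(a, start=1)) != n:
        raise ValueError("a must be nonnegative with a_1 + 2*a_2 + ... + n*a_n == n")
    theta = Fraction(theta)
    if theta <= 0:
        raise ValueError("theta must be positive")

    # Denominator theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for k in range(n):
        rising *= theta + k

    prob = Fraction(factorial(n)) / rising
    # Product over j of theta^a_j / (j^a_j * a_j!).
    for j, a_j in enumerate(a, start=1):
        prob *= theta**a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


# A sample of n = 3 gametes with theta = 2.
print(ewens_probability((3, 0, 0), 2))  # three distinct alleles: 1/3
print(ewens_probability((1, 1, 0), 2))  # one singleton and one pair: 1/2
print(ewens_probability((0, 0, 1), 2))  # all three gametes identical: 1/6
```

Exact fractions are used so that small cases can be checked by hand; for large n a floating-point or log-space version would likely be preferable.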

The phrase "under certain conditions" used above is made precise by the following assumptions:

  • The sample size n is small by comparison to the size of the whole population; and
  • The population is in statistical equilibrium under mutation and genetic drift and the role of selection at the locus in question is negligible; and
  • Every mutant allele is novel.

This is a probability distribution on the set of all partitions of the integer n. Among probabilists and statisticians it is often called the multivariate Ewens distribution.
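That the probabilities sum to 1 over all partitions of n can be checked numerically for small n. The sketch below is illustrative only; the helper names `esf` and `count_vectors` are introduced here and do not come from the literature.

```python
from fractions import Fraction
from math import factorial


def esf(a, theta):
    """Ewens probability of a = (a_1, ..., a_n); assumes sum(j * a_j) == n."""
    n, theta = len(a), Fraction(theta)
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k
    for j, a_j in enumerate(a, start=1):
        prob *= theta**a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def count_vectors(n):
    """Yield every tuple (a_1, ..., a_n) of nonnegative integers with sum(j * a_j) == n."""
    def rec(remaining, j):
        # Choose a_j, then recurse on part sizes j - 1, ..., 1.
        if j == 0:
            if remaining == 0:
                yield ()
            return
        for a_j in range(remaining // j + 1):
            for rest in rec(remaining - j * a_j, j - 1):
                yield rest + (a_j,)
    yield from rec(n, n)


n, theta = 6, Fraction(3, 2)
vectors = list(count_vectors(n))
total = sum(esf(a, theta) for a in vectors)
print(len(vectors), total)  # 11 1
```

Each count vector corresponds to exactly one partition of n, so the 11 vectors for n = 6 are the 11 partitions of 6.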

Mathematical properties

In the limit θ → 0 (the formula itself requires θ > 0), the probability that all n genes are the same approaches 1. When θ = 1, the distribution is precisely that of the integer partition induced by the cycle lengths of a uniformly distributed random permutation. As θ → ∞, the probability that no two of the n genes are the same approaches 1.
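These special cases can be checked for a small sample. The sketch below, with helper names chosen here for illustration, compares the θ = 1 distribution with the cycle types of all permutations of n = 4 elements and evaluates the two extreme configurations for very small and very large θ.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def esf(a, theta):
    """Ewens probability of a = (a_1, ..., a_n); assumes sum(j * a_j) == n."""
    n, theta = len(a), Fraction(theta)
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k
    for j, a_j in enumerate(a, start=1):
        prob *= theta**a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def cycle_type(perm):
    """Return (a_1, ..., a_n), where a_j is the number of cycles of length j."""
    n = len(perm)
    seen = [False] * n
    counts = [0] * n
    for start in range(n):
        length, k = 0, start
        while not seen[k]:      # walk the cycle containing `start`
            seen[k] = True
            k = perm[k]
            length += 1
        if length:
            counts[length - 1] += 1
    return tuple(counts)


n = 4
tally = Counter(cycle_type(p) for p in permutations(range(n)))
uniform = {a: Fraction(c, factorial(n)) for a, c in tally.items()}
ewens = {a: esf(a, 1) for a in tally}
print(uniform == ewens)  # True

# Near theta = 0 almost all mass is on "all n genes identical" (a_n = 1).
print(float(esf((0, 0, 0, 1), Fraction(1, 10**6))))
# For large theta almost all mass is on "all n genes distinct" (a_1 = n).
print(float(esf((4, 0, 0, 0), 10**6)))
```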

This family of probability distributions is consistent under subsampling: if, after the sample of n is taken, m of the n gametes are chosen without replacement, then the resulting probability distribution on the set of all partitions of the smaller integer m is exactly what the formula above would give if m were put in place of n.
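The subsampling property can be verified exactly for small n. The sketch below checks the single step from n to n − 1; it relies on the observation that discarding one uniformly chosen gamete at a time, repeated n − m times, is equivalent to choosing m gametes without replacement. All helper names are introduced here for illustration.

```python
from fractions import Fraction
from math import factorial


def esf(a, theta):
    """Ewens probability of a = (a_1, ..., a_n); assumes sum(j * a_j) == n."""
    n, theta = len(a), Fraction(theta)
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k
    for j, a_j in enumerate(a, start=1):
        prob *= theta**a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def count_vectors(n):
    """Yield every tuple (a_1, ..., a_n) of nonnegative integers with sum(j * a_j) == n."""
    def rec(remaining, j):
        if j == 0:
            if remaining == 0:
                yield ()
            return
        for a_j in range(remaining // j + 1):
            for rest in rec(remaining - j * a_j, j - 1):
                yield rest + (a_j,)
    yield from rec(n, n)


def delete_one(a):
    """Yield (b, weight): the count vector after removing one uniformly chosen gamete."""
    n = len(a)
    for j, a_j in enumerate(a, start=1):
        if a_j == 0:
            continue
        b = list(a)
        b[j - 1] -= 1        # an allele seen j times loses one copy ...
        if j > 1:
            b[j - 2] += 1    # ... and is now seen j - 1 times
        # The removed gamete carries such an allele with probability j * a_j / n.
        yield tuple(b[: n - 1]), Fraction(j * a_j, n)


n, theta = 6, Fraction(5, 2)
induced = {}
for a in count_vectors(n):
    for b, weight in delete_one(a):
        induced[b] = induced.get(b, 0) + esf(a, theta) * weight

direct = {b: esf(b, theta) for b in count_vectors(n - 1)}
print(induced == direct)  # True
```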

The Ewens distribution arises naturally from the Chinese restaurant process.
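The connection can be illustrated by computing the exact distribution of table sizes in a Chinese restaurant process and comparing it with the formula. The sketch assumes the usual one-parameter seating rule: when i customers are already seated, the next customer joins an existing table of size s with probability s/(θ + i) and starts a new table with probability θ/(θ + i). The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def esf(a, theta):
    """Ewens probability of a = (a_1, ..., a_n); assumes sum(j * a_j) == n."""
    n, theta = len(a), Fraction(theta)
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k
    for j, a_j in enumerate(a, start=1):
        prob *= theta**a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def crp_distribution(n, theta):
    """Exact law of (a_1, ..., a_n), a_j = number of tables of size j, after n customers."""
    theta = Fraction(theta)
    dist = {(): Fraction(1)}  # sorted tuple of table sizes -> probability
    for i in range(n):        # i customers are already seated
        denom = theta + i
        step = {}
        for tables, p in dist.items():
            # The new customer opens a new table.
            key = tuple(sorted(tables + (1,)))
            step[key] = step.get(key, 0) + p * theta / denom
            # Or joins an existing table, with weight equal to its size.
            for idx, size in enumerate(tables):
                grown = list(tables)
                grown[idx] += 1
                key = tuple(sorted(grown))
                step[key] = step.get(key, 0) + p * size / denom
        dist = step
    return {
        tuple(tables.count(j) for j in range(1, n + 1)): p
        for tables, p in dist.items()
    }


n, theta = 5, Fraction(3, 2)
crp = crp_distribution(n, theta)
print(all(p == esf(a, theta) for a, p in crp.items()))  # True
```

Here tables play the role of alleles and customers the role of sampled gametes.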

Notes

  • Ewens, Warren (1972). "The sampling theory of selectively neutral alleles". Theoretical Population Biology. 3: 87–112.
  • Crane, H. (2016). Statistical Science. 31 (1). This article introduces a series of seven articles about Ewens sampling in a special issue of the journal.
  • Kingman, J. F. C. (1978). "Random partitions in population genetics". Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. 361 (1704).
  • Tavare, S.; Ewens, W. J. (1997). "The Multivariate Ewens distribution". In Johnson, N. L.; Kotz, S.; Balakrishnan, N. (eds.). Discrete Multivariate Distributions. Wiley. pp. 232–246. ISBN 0-471-12844-9.