Probability-proportional-to-size sampling

In survey methodology, probability-proportional-to-size (pps) sampling is a sampling process in which each element of the population (of size N) has some (independent) chance $p_i$ of being selected into the sample on a single draw. This $p_i$ is proportional to some known quantity $x_i$, so that $p_i = \frac{x_i}{\sum_{j=1}^{N} x_j}$.

One case where this arises, developed by Hansen and Hurwitz in 1943, involves several clusters of units, each with a different number of units that is known in advance. Each cluster can then be selected with a probability proportional to the number of units inside it. For example, with 3 clusters of 10, 20 and 30 units, the chance of selecting the first cluster is 1/6, the second 1/3, and the third 1/2.
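The cluster example can be checked with a few lines of Python. This is a minimal sketch; the helper name `pps_probabilities` is made up for illustration, and exact fractions are used only so the result matches the figures in the text.

```python
from fractions import Fraction


def pps_probabilities(sizes):
    """Return the single-draw selection probability x_i / sum(x) for each unit."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("sizes must have a positive total")
    return [Fraction(x, total) for x in sizes]


# Three clusters with 10, 20 and 30 units, as in the example above.
cluster_sizes = [10, 20, 30]
probs = pps_probabilities(cluster_sizes)
print(probs)  # [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
```

The probabilities sum to 1 by construction, since each one is a share of the same total.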

pps sampling results in a fixed sample size n (as opposed to Poisson sampling, which is similar but results in a random sample size with expected value n). When selecting items with replacement, the procedure is simply to draw one item at a time (like taking n draws from a multinomial distribution with N elements, each with its own selection probability $p_i$). When sampling without replacement, the scheme can become more complex.
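The contrast between the fixed sample size of with-replacement pps sampling and the random sample size of Poisson sampling can be sketched as follows. The function names and the example sizes are illustrative assumptions. The Poisson variant assumes each unit is included independently with probability $n x_i / \sum_j x_j$, which requires that value to be at most 1 for every unit.

```python
import random


def pps_with_replacement(sizes, n, rng):
    """Draw n unit indices with replacement; each draw picks i with probability x_i / sum(x)."""
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def poisson_pps(sizes, n, rng):
    """Include each unit independently with probability n * x_i / sum(x).

    Assumes n * x_i / sum(x) <= 1 for every unit (illustrative restriction).
    """
    total = sum(sizes)
    inclusion = [n * x / total for x in sizes]
    if max(inclusion) > 1:
        raise ValueError("n is too large for these sizes")
    return [i for i, pi in enumerate(inclusion) if rng.random() < pi]


rng = random.Random(0)  # seeded only to make the sketch reproducible
sizes = [10, 20, 30, 40, 50, 50]  # hypothetical size measures
fixed = pps_with_replacement(sizes, 3, rng)
poisson = poisson_pps(sizes, 3, rng)
print(len(fixed))    # always 3; the same index may appear more than once
print(len(poisson))  # varies between runs, with expected value 3
```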

A related method is weighted reservoir sampling ('Weighted random sampling with a reservoir'), which offers an algorithm for drawing a weighted random sample of size m from a population of n weighted items, where m ≤ n, in one pass over a population whose size is not known in advance.
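One way such a one-pass scheme can work is the key-based approach often called A-Res: each item receives the key $u^{1/w}$, where $u$ is uniform on (0, 1) and $w$ is the item's weight, and the m items with the largest keys are kept. The sketch below assumes that approach; the function name and the example items are hypothetical.

```python
import heapq
import random


def weighted_reservoir_sample(stream, m, rng):
    """One-pass weighted sample of up to m items, without replacement.

    `stream` yields (item, weight) pairs with weight > 0. Each item gets the
    key u ** (1 / weight); the m items with the largest keys are kept.
    """
    if m <= 0:
        return []
    heap = []  # min-heap of (key, arrival index, item); smallest key on top
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            raise ValueError("weights must be positive")
        key = rng.random() ** (1.0 / weight)
        if len(heap) < m:
            heapq.heappush(heap, (key, index, item))
        elif key > heap[0][0]:
            # The new item beats the weakest item currently in the reservoir.
            heapq.heapreplace(heap, (key, index, item))
    return [item for _, _, item in heap]


items = [("a", 10), ("b", 20), ("c", 30)]  # hypothetical weighted items
sample = weighted_reservoir_sample(iter(items), 2, random.Random(0))
print(len(sample))  # 2
```

The stream is consumed once and only m entries are held in memory, so the population size never needs to be known. With m = 1 the selected item is chosen with probability proportional to its weight.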

Distribution and properties

If observations from some distribution F, with density f, are sampled with probability proportional to their value, then the values in that sample follow a length-biased distribution, with the following density function:

$g(x) = x f(x) / E[x]$

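Also, the mean of the sampled values is $E_g[x] = E[x^2] / E[x]$, where $E_g$ denotes expectation under the density g and the expectations on the right-hand side are taken under F.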

This result assumes that the pps sampling is done with replacement, or that the sample size is much smaller than the population size.
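The mean of a length-biased sample can be verified exactly on a small finite population. This is a sketch that assumes F is the uniform distribution over the listed values, so that a single pps draw selects value $x_i$ with probability $x_i / \sum_j x_j$. The function name is illustrative.

```python
from fractions import Fraction


def length_biased_mean(values):
    """Expected value of one pps draw: sum(p_i * x_i) with p_i = x_i / sum(x)."""
    total = sum(values)
    return sum(Fraction(x, total) * x for x in values)


values = [1, 2, 3]
mean_x = Fraction(sum(values), len(values))                  # E[x] = 2
mean_x2 = Fraction(sum(x * x for x in values), len(values))  # E[x^2] = 14/3

print(length_biased_mean(values))  # 7/3
print(mean_x2 / mean_x)            # 7/3
```

Both quantities equal 7/3, which is larger than the ordinary mean of 2 because larger values are drawn more often.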
