Ball divergence
Ball Divergence (BD) is a nonparametric two‐sample statistic that quantifies the discrepancy between two probability measures μ {\displaystyle \mu } and ν {\displaystyle \nu } on a metric space ( V , ρ ) {\displaystyle (V,\rho )}. It is defined by integrating the squared difference of the measures over all closed balls in V {\displaystyle V}. Let B ¯ ( u , r ) = { w ∈ V ∣ ρ ( u , w ) ≤ r } {\displaystyle {\overline {B}}(u,r)=\{w\in V\mid \rho (u,w)\leq r\}} be the closed ball of radius r ≥ 0 {\displaystyle r\geq 0} centered at u ∈ V {\displaystyle u\in V}. Equivalently, one may set r = ρ ( u , v ) {\displaystyle r=\rho (u,v)} and write B ¯ ( u , ρ ( u , v ) ) {\displaystyle {\overline {B}}{\bigl (}u,\rho (u,v){\bigr )}}. The Ball divergence is then defined by B D ( μ , ν ) = ∬ V × V [ μ ( B ¯ ( u , ρ ( u , v ) ) ) − ν ( B ¯ ( u , ρ ( u , v ) ) ) ] 2 [ μ ( d u ) μ ( d v ) + ν ( d u ) ν ( d v ) ] . {\displaystyle BD(\mu ,\nu )=\iint _{V\times V}{\bigl [}\mu ({\overline {B}}(u,\rho (u,v)))-\nu ({\overline {B}}(u,\rho (u,v))){\bigr ]}^{2}\;{\bigl [}\mu (du)\,\mu (dv)+\nu (du)\,\nu (dv){\bigr ]}.} This measure can be seen as an integral of Cramér's distance over all possible pairs of points. By summing squared differences of μ {\displaystyle \mu } and ν {\displaystyle \nu } over balls of all scales, BD captures both global and local discrepancies between distributions, yielding a robust, scale-sensitive comparison. Moreover, since BD is defined as the integral of a squared measure difference, it is always non-negative, and B D ( μ , ν ) = 0 {\displaystyle BD(\mu ,\nu )=0} if and only if μ = ν {\displaystyle \mu =\nu }.
Testing for equal distributions
We now construct a sample version of Ball Divergence. For convenience, decompose it into two parts: A = ∬ V × V [ μ − ν ] 2 ( B ¯ ( u , ρ ( u , v ) ) ) μ ( d u ) μ ( d v ) , {\displaystyle A=\iint _{V\times V}[\mu -\nu ]^{2}({\bar {B}}(u,\rho (u,v)))\mu (du)\mu (dv),} and C = ∬ V × V [ μ − ν ] 2 ( B ¯ ( u , ρ ( u , v ) ) ) ν ( d u ) ν ( d v ) . {\displaystyle C=\iint _{V\times V}[\mu -\nu ]^{2}({\bar {B}}(u,\rho (u,v)))\nu (du)\nu (dv).} Thus B D ( μ , ν ) = A + C . {\displaystyle BD(\mu ,\nu )=A+C.}
Let δ ( x , y , z ) = I ( z ∈ B ¯ ( x , ρ ( x , y ) ) ) {\displaystyle \delta (x,y,z)=I(z\in {\bar {B}}(x,\rho (x,y)))} indicate whether the point z {\displaystyle z} lies in the ball B ¯ ( x , ρ ( x , y ) ) {\displaystyle {\bar {B}}(x,\rho (x,y))}. Given two independent samples { X 1 , … , X n } {\displaystyle \{X_{1},\ldots ,X_{n}\}} from μ {\displaystyle \mu } and { Y 1 , … , Y m } {\displaystyle \{Y_{1},\ldots ,Y_{m}\}} from ν {\displaystyle \nu }, define
A i j X = 1 n ∑ u = 1 n δ ( X i , X j , X u ) , A i j Y = 1 m ∑ v = 1 m δ ( X i , X j , Y v ) , C k l X = 1 n ∑ u = 1 n δ ( Y k , Y l , X u ) , C k l Y = 1 m ∑ v = 1 m δ ( Y k , Y l , Y v ) , {\displaystyle {\begin{aligned}A_{ij}^{X}&={\frac {1}{n}}\sum _{u=1}^{n}\delta {\left(X_{i},X_{j},X_{u}\right)},&A_{ij}^{Y}&={\frac {1}{m}}\sum _{v=1}^{m}\delta {\left(X_{i},X_{j},Y_{v}\right)},\\C_{kl}^{X}&={\frac {1}{n}}\sum _{u=1}^{n}\delta {\left(Y_{k},Y_{l},X_{u}\right)},&C_{kl}^{Y}&={\frac {1}{m}}\sum _{v=1}^{m}\delta {\left(Y_{k},Y_{l},Y_{v}\right)},\end{aligned}}} where A i j X {\displaystyle A_{ij}^{X}} is the proportion of samples from the probability measure μ {\displaystyle \mu } located in the ball B ¯ ( X i , ρ ( X i , X j ) ) {\displaystyle {\bar {B}}\left(X_{i},\rho \left(X_{i},X_{j}\right)\right)} and A i j Y {\displaystyle A_{ij}^{Y}} is the proportion of samples from the probability measure ν {\displaystyle \nu } located in the same ball. Similarly, C k l X {\displaystyle C_{kl}^{X}} and C k l Y {\displaystyle C_{kl}^{Y}} are the proportions of samples from μ {\displaystyle \mu } and ν {\displaystyle \nu }, respectively, located in the ball B ¯ ( Y k , ρ ( Y k , Y l ) ) {\displaystyle {\bar {B}}\left(Y_{k},\rho \left(Y_{k},Y_{l}\right)\right)}. The sample versions of A {\displaystyle A} and C {\displaystyle C} are as follows
A n , m = 1 n 2 ∑ i , j = 1 n ( A i j X − A i j Y ) 2 , C n , m = 1 m 2 ∑ k , l = 1 m ( C k l X − C k l Y ) 2 . {\displaystyle A_{n,m}={\frac {1}{n^{2}}}\sum _{i,j=1}^{n}\left(A_{ij}^{X}-A_{ij}^{Y}\right)^{2},\qquad C_{n,m}={\frac {1}{m^{2}}}\sum _{k,l=1}^{m}\left(C_{kl}^{X}-C_{kl}^{Y}\right)^{2}.}
Finally, we can give the sample ball divergence
B D n , m = A n , m + C n , m . {\displaystyle BD_{n,m}=A_{n,m}+C_{n,m}.}
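The estimator above can be computed directly from the definitions. The following is a minimal, unoptimized Python sketch assuming the Euclidean metric on R^d; the function name sample_ball_divergence and the brute-force loop structure are illustrative choices, not a reference implementation.

```python
import numpy as np

def sample_ball_divergence(X, Y):
    """Empirical ball divergence BD_{n,m} between samples X (n x d) and
    Y (m x d), using the Euclidean metric. Direct O(n^2 (n+m)) sketch."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, m = len(X), len(Y)

    def in_ball_fraction(center, radius, pts):
        # proportion of pts lying in the closed ball B(center, radius)
        return np.mean(np.linalg.norm(pts - center, axis=1) <= radius)

    # A_{n,m}: balls centered at X_i with radius rho(X_i, X_j)
    A = 0.0
    for i in range(n):
        for j in range(n):
            r = np.linalg.norm(X[i] - X[j])
            A += (in_ball_fraction(X[i], r, X) - in_ball_fraction(X[i], r, Y)) ** 2
    A /= n ** 2

    # C_{n,m}: balls centered at Y_k with radius rho(Y_k, Y_l)
    C = 0.0
    for k in range(m):
        for l in range(m):
            r = np.linalg.norm(Y[k] - Y[l])
            C += (in_ball_fraction(Y[k], r, X) - in_ball_fraction(Y[k], r, Y)) ** 2
    C /= m ** 2

    return A + C
```

Since the limiting null distribution is a chi-squared mixture without a simple closed form, in practice the p-value is usually approximated by permutation: repeatedly re-split the pooled sample at random and recompute the statistic.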
It can be proved that B D n , m {\displaystyle BD_{n,m}} is a consistent estimator of BD. Moreover, if n n + m → τ {\textstyle {\tfrac {n}{n+m}}\to \tau } for some τ ∈ [ 0 , 1 ] {\displaystyle \tau \in [0,1]}, then under the null hypothesis B D n , m {\displaystyle BD_{n,m}} converges in distribution to a mixture of chi-squared distributions, whereas under the alternative hypothesis it converges to a normal distribution.
Properties
- The square root of Ball Divergence is a symmetric divergence but not a metric, because it does not satisfy the triangle inequality.
- It can be shown that Ball divergence, the energy distance, and the maximum mean discrepancy (MMD) are unified within the variogram framework; for details, see Remark 2.4 of the corresponding reference.
Homogeneity Test
Ball divergence admits a straightforward extension to the K-sample setting. Suppose μ 1 , … , μ K {\displaystyle \mu _{1},\dots ,\mu _{K}} are K ( ≥ 2 ) {\displaystyle K(\geq 2)} probability measures on a Banach space ( V , ‖ ⋅ ‖ ) {\displaystyle (V,\|\cdot \|)}. Define the K-sample BD by
D ( μ 1 , … , μ K ) = ∑ 1 ≤ k < l ≤ K ∬ V × V [ μ k ( B ¯ ( u , ρ ( u , v ) ) ) − μ l ( B ¯ ( u , ρ ( u , v ) ) ) ] 2 [ μ k ( d u ) μ k ( d v ) + μ l ( d u ) μ l ( d v ) ] . {\displaystyle D(\mu _{1},\dots ,\mu _{K})=\sum _{1\leq k<l\leq K}\iint _{V\times V}{\bigl [}\mu _{k}{\bigl (}{\overline {B}}(u,\rho (u,v)){\bigr )}-\mu _{l}{\bigl (}{\overline {B}}(u,\rho (u,v)){\bigr )}{\bigr ]}^{2}\;{\bigl [}\mu _{k}(du)\,\mu _{k}(dv)+\mu _{l}(du)\,\mu _{l}(dv){\bigr ]}.}
It can be shown that D ( μ 1 , … , μ K ) = 0 {\displaystyle D(\mu _{1},\dots ,\mu _{K})=0} if and only if μ 1 = μ 2 = ⋯ = μ K . {\displaystyle \mu _{1}=\mu _{2}=\cdots =\mu _{K}.}
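Because the K-sample statistic is just a sum of pairwise two-sample ball divergences, its empirical version reduces to the two-sample estimator applied to each pair. A self-contained sketch under the Euclidean metric (names two_sample_bd and k_sample_bd are illustrative):

```python
import numpy as np
from itertools import combinations

def two_sample_bd(X, Y):
    """Empirical two-sample ball divergence with the Euclidean metric
    (direct, unoptimized sketch)."""
    def half(centers):
        # average of (p_X - p_Y)^2 over balls B(centers[i], |centers[i]-centers[j]|)
        s = 0.0
        for i in range(len(centers)):
            for j in range(len(centers)):
                r = np.linalg.norm(centers[i] - centers[j])
                p_x = np.mean(np.linalg.norm(X - centers[i], axis=1) <= r)
                p_y = np.mean(np.linalg.norm(Y - centers[i], axis=1) <= r)
                s += (p_x - p_y) ** 2
        return s / len(centers) ** 2
    return half(X) + half(Y)

def k_sample_bd(samples):
    """Empirical K-sample statistic D: sum of two-sample ball divergences
    over all pairs k < l."""
    return sum(two_sample_bd(samples[k], samples[l])
               for k, l in combinations(range(len(samples)), 2))
```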
By employing closed balls to define a metric distribution function, one obtains an alternative homogeneity measure.
Given a probability measure μ ~ {\displaystyle {\tilde {\mu }}} on a metric space ( V , ρ ) {\displaystyle (V,\rho )}, its metric distribution function is defined by
F μ ~ M ( u , v ) = μ ~ ( B ¯ ( u , ρ ( u , v ) ) ) = E [ δ ( u , v , X ) ] , u , v ∈ V , {\displaystyle F_{\tilde {\mu }}^{M}(u,v)={\tilde {\mu }}{\bigl (}{\overline {B}}(u,\rho (u,v)){\bigr )}=\mathbb {E} {\bigl [}\delta (u,v,X){\bigr ]},\quad u,v\in V,}
where B ¯ ( u , r ) = { w ∈ V : d ( u , w ) ≤ r } {\displaystyle {\overline {B}}(u,r)=\{w\in V:d(u,w)\leq r\}} is the closed ball of radius r ≥ 0 {\displaystyle r\geq 0} centered at u {\displaystyle u}. When V {\displaystyle V} is a product of K {\displaystyle K} component metric spaces ( V k , ρ k ) {\displaystyle (V_{k},\rho _{k})}, the indicator takes the product form δ ( u , v , X ) = ∏ k = 1 K 1 { X ( k ) ∈ B ¯ k ( u k , ρ k ( u k , v k ) ) } . {\displaystyle \delta (u,v,X)=\prod _{k=1}^{K}\mathbf {1} \{X^{(k)}\in {\overline {B}}_{k}(u_{k},\rho _{k}(u_{k},v_{k}))\}.}
If ( X 1 , … , X N ) {\displaystyle (X_{1},\dots ,X_{N})} are i.i.d. draws from μ ~ {\displaystyle {\tilde {\mu }}}, the empirical version is
F μ ~ , N M ( u , v ) = 1 N ∑ i = 1 N δ ( u , v , X i ) . {\displaystyle F_{{\tilde {\mu }},N}^{M}(u,v)={\frac {1}{N}}\sum _{i=1}^{N}\delta (u,v,X_{i}).}
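In the plain (non-product) case with the Euclidean metric, the empirical MDF is a one-line computation; empirical_mdf is an illustrative name.

```python
import numpy as np

def empirical_mdf(u, v, sample):
    """Empirical metric distribution function: the fraction of sample points
    lying in the closed ball centered at u with radius rho(u, v)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    r = np.linalg.norm(u - v)  # radius rho(u, v)
    return np.mean(np.linalg.norm(np.asarray(sample, float) - u, axis=1) <= r)
```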
Based on these, the homogeneity measure based on the MDF, also called the metric Cramér–von Mises (MCVM) statistic, is M C V M ( μ k ∥ μ ) = ∫ V × V p k 2 w ( u , v ) [ F μ k M ( u , v ) − F μ M ( u , v ) ] 2 d μ k ( u ) d μ k ( v ) , {\displaystyle \mathrm {MCVM} {\bigl (}\mu _{k}\parallel \mu {\bigr )}=\int _{V\times V}p_{k}^{2}\,w(u,v)\,{\bigl [}F_{\mu _{k}}^{M}(u,v)-F_{\mu }^{M}(u,v){\bigr ]}^{2}\,d\mu _{k}(u)\,d\mu _{k}(v),}
where μ = ∑ k = 1 K p k μ k {\textstyle \mu =\sum _{k=1}^{K}p_{k}\,\mu _{k}} is their mixture with weights p 1 , … , p K {\displaystyle p_{1},\dots ,p_{K}}, and w ( u , v ) = exp ( − d ( u , v ) 2 2 σ 2 ) {\textstyle w(u,v)=\exp \left(-{\tfrac {d(u,v)^{2}}{2\sigma ^{2}}}\right)}. The overall MCVM is then
M C V M ( μ 1 , … , μ K ) = ∑ k = 1 K p k 2 M C V M ( μ k ∥ μ ) . {\displaystyle \mathrm {MCVM} (\mu _{1},\dots ,\mu _{K})=\sum _{k=1}^{K}p_{k}^{2}\,\mathrm {MCVM} {\bigl (}\mu _{k}\parallel \mu {\bigr )}.}
The empirical MCVM is given by
M C V M ^ ( μ k ∥ μ ) = 1 n k 2 ∑ X i ( k ) , X j ( k ) ∈ X k w ( X i ( k ) , X j ( k ) ) [ F μ k , n k M ( X i ( k ) , X j ( k ) ) − F μ , n M ( X i ( k ) , X j ( k ) ) ] 2 . {\displaystyle {\widehat {\mathrm {MCVM} }}{\bigl (}\mu _{k}\parallel \mu {\bigr )}={\frac {1}{n_{k}^{2}}}\sum _{X_{i}^{(k)},X_{j}^{(k)}\in {\mathcal {X}}_{k}}w{\bigl (}X_{i}^{(k)},X_{j}^{(k)}{\bigr )}\,\left[F_{\mu _{k},n_{k}}^{M}{\bigl (}X_{i}^{(k)},X_{j}^{(k)}{\bigr )}-F_{\mu ,n}^{M}{\bigl (}X_{i}^{(k)},X_{j}^{(k)}{\bigr )}\right]^{2}.}
where X k = { X 1 ( k ) , … , X n k ( k ) } {\displaystyle {\mathcal {X}}_{k}=\{X_{1}^{(k)},\dots ,X_{n_{k}}^{(k)}\}} is an i.i.d. sample from μ k {\displaystyle \mu _{k}}, and p ^ k = n k ∑ ℓ = 1 K n ℓ . {\displaystyle {\hat {p}}_{k}={\frac {n_{k}}{\sum _{\ell =1}^{K}n_{\ell }}}.} A practical choice for σ 2 {\displaystyle \sigma ^{2}} is the median of the squared distances { d ( X , X ′ ) 2 : X , X ′ ∈ ⋃ k = 1 K X k } . {\displaystyle \left\{d(X,X')^{2}:X,X'\in \bigcup _{k=1}^{K}{\mathcal {X}}_{k}\right\}.}
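Putting the pieces together, the overall empirical MCVM can be sketched as follows. This assumes the Euclidean metric and the median heuristic for σ²; the function name empirical_mcvm is illustrative, and the per-group sums are weighted by p̂_k² as in the overall-MCVM formula above.

```python
import numpy as np

def empirical_mcvm(samples):
    """Empirical MCVM homogeneity statistic over K samples (Euclidean metric),
    with sigma^2 set to the median squared pairwise distance. A sketch, not a
    reference implementation."""
    pooled = np.vstack(samples)
    n = len(pooled)

    # median heuristic: median of squared pairwise distances in the pooled sample
    diff = pooled[:, None, :] - pooled[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    sigma2 = np.median(d2[np.triu_indices(n, k=1)])

    def mdf(u, v, pts):
        # empirical metric distribution function over pts
        r2 = np.sum((u - v) ** 2)
        return np.mean(np.sum((pts - u) ** 2, axis=1) <= r2)

    total = 0.0
    for Xk in samples:
        nk = len(Xk)
        p_hat = nk / n
        s = 0.0
        for i in range(nk):
            for j in range(nk):
                u, v = Xk[i], Xk[j]
                w = np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma2))
                s += w * (mdf(u, v, Xk) - mdf(u, v, pooled)) ** 2
        total += p_hat ** 2 * s / nk ** 2
    return total
```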