Discrepancy of hypergraphs

Discrepancy of hypergraphs is an area of discrepancy theory that studies the discrepancy of general set systems.

Definitions

In the classical setting, we aim at partitioning the vertices of a hypergraph $\mathcal{H}=(V,\mathcal{E})$ into two classes in such a way that ideally each hyperedge contains the same number of vertices in both classes. A partition into two classes can be represented by a coloring $\chi\colon V\rightarrow\{-1,+1\}$. We call −1 and +1 colors. The color classes $\chi^{-1}(-1)$ and $\chi^{-1}(+1)$ form the corresponding partition. For a hyperedge $E\in\mathcal{E}$, set

$$\chi(E):=\sum_{v\in E}\chi(v).$$

The discrepancy of $\mathcal{H}$ with respect to $\chi$ and the discrepancy of $\mathcal{H}$ are defined by

$$\operatorname{disc}(\mathcal{H},\chi):=\max_{E\in\mathcal{E}}|\chi(E)|,$$

$$\operatorname{disc}(\mathcal{H}):=\min_{\chi\colon V\rightarrow\{-1,+1\}}\operatorname{disc}(\mathcal{H},\chi).$$
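The two definitions translate directly into a brute-force computation. The following is a minimal sketch, not an efficient algorithm: it enumerates all $2^{|V|}$ colorings, so it is only usable for very small vertex sets. The function names and the triangle instance are illustrative assumptions, not taken from the literature.

```python
from itertools import product

def chi_of_edge(coloring, edge):
    """Signed sum chi(E): the sum of chi(v) over all v in E."""
    return sum(coloring[v] for v in edge)

def disc_of_coloring(edges, coloring):
    """disc(H, chi): the largest |chi(E)| over all hyperedges E."""
    return max((abs(chi_of_edge(coloring, e)) for e in edges), default=0)

def disc(vertices, edges):
    """disc(H): minimum of disc(H, chi) over all colorings (brute force)."""
    vertices = list(vertices)
    best = None
    for signs in product((-1, +1), repeat=len(vertices)):
        coloring = dict(zip(vertices, signs))
        value = disc_of_coloring(edges, coloring)
        if best is None or value < best:
            best = value
    return best

# Hypothetical toy instance: a triangle viewed as a 2-uniform hypergraph.
V = [1, 2, 3]
E = [{1, 2}, {2, 3}, {1, 3}]
print(disc_of_coloring(E, {1: +1, 2: -1, 3: +1}))  # edge {1, 3} is monochromatic
print(disc(V, E))  # two of the three vertices always share a color
```

In the triangle, any coloring gives two vertices the same color, and those two vertices form an edge, so both printed values are 2.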

These notions, as well as the term 'discrepancy', seem to have appeared for the first time in a paper of Beck. Earlier results on this problem include the famous lower bound on the discrepancy of arithmetic progressions by Roth, as well as upper bounds for this problem and other results by Erdős and Spencer and by Sárközy. At that time, discrepancy problems were called quasi-Ramsey problems.

Examples

To build some intuition for this concept, consider a few examples.

If all edges of $\mathcal{H}$ are pairwise disjoint, i.e. $E_1\cap E_2=\varnothing$ for any two distinct edges $E_1,E_2\in\mathcal{E}$, then the discrepancy is zero if all edges have even cardinality, and one if there is an edge of odd cardinality.
The other extreme is marked by the complete hypergraph $(V,2^{V})$. In this case the discrepancy is $\lceil\frac{1}{2}|V|\rceil$. Any 2-coloring will have a color class of at least this size, and this set is also an edge. On the other hand, any coloring $\chi$ with color classes of size $\lceil\frac{1}{2}|V|\rceil$ and $\lfloor\frac{1}{2}|V|\rfloor$ proves that the discrepancy is not larger than $\lceil\frac{1}{2}|V|\rceil$. It seems that the discrepancy reflects how chaotically the hyperedges of $\mathcal{H}$ intersect. Things are not that easy, however, as the following example shows.
Set $n=4k$, $k\in\mathbb{N}$, and $\mathcal{H}_n=([n],\{E\subseteq[n]\mid |E\cap[2k]|=|E\setminus[2k]|\})$. In words, $\mathcal{H}_n$ is the hypergraph on the 4k vertices $\{1,\dots,4k\}$ whose edges are all subsets that have the same number of elements in $\{1,\dots,2k\}$ as in $\{2k+1,\dots,4k\}$. Now $\mathcal{H}_n$ has many (more than $\binom{n/2}{n/4}^{2}=\Theta(\frac{1}{n}2^{n})$) complicatedly intersecting edges. However, its discrepancy is zero, since we can color $\{1,\dots,2k\}$ in one color and $\{2k+1,\dots,4k\}$ in the other.
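The three examples can be checked by exhaustive search on tiny instances. The sketch below assumes a brute-force `disc` helper (a hypothetical name) and picks small concrete sizes for illustration: six vertices for the disjoint case, five for the complete hypergraph, and k = 1 (so n = 4) for the last example.

```python
from itertools import combinations, product

def disc(vertices, edges):
    """Brute-force disc(H): minimise max |chi(E)| over all colorings."""
    vertices = list(vertices)
    best = len(vertices)  # no edge can exceed |V| in absolute value
    for signs in product((-1, 1), repeat=len(vertices)):
        chi = dict(zip(vertices, signs))
        worst = max((abs(sum(chi[v] for v in e)) for e in edges), default=0)
        best = min(best, worst)
    return best

def all_subsets(vertices):
    """Every subset of the vertex set, i.e. the edge set 2^V."""
    vs = list(vertices)
    return [set(c) for r in range(len(vs) + 1) for c in combinations(vs, r)]

# Pairwise disjoint edges: even sizes give 0, one odd edge forces 1.
disjoint_even = disc(range(1, 7), [{1, 2}, {3, 4, 5, 6}])
disjoint_odd = disc(range(1, 7), [{1, 2, 3}, {4, 5}])

# Complete hypergraph on 5 vertices: ceil(5 / 2) = 3.
V5 = range(1, 6)
complete = disc(V5, all_subsets(V5))

# H_n for k = 1 (n = 4): edges meet {1, 2} and {3, 4} equally often.
V4 = range(1, 5)
balanced = [e for e in all_subsets(V4) if len(e & {1, 2}) == len(e & {3, 4})]
h4 = disc(V4, balanced)

print(disjoint_even, disjoint_odd, complete, h4)  # prints: 0 1 3 0
```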

The last example shows that we cannot expect to determine the discrepancy by looking at a single parameter like the number of hyperedges. Still, the size of the hypergraph yields first upper bounds.

General hypergraphs

1. For any hypergraph $\mathcal{H}$ with n vertices and m edges:

$$\operatorname{disc}(\mathcal{H})\leq\sqrt{2n\ln(2m)}.$$

The proof is a simple application of the probabilistic method. Let $\chi\colon V\rightarrow\{-1,1\}$ be a random coloring, i.e. we have

$$\Pr(\chi(v)=-1)=\Pr(\chi(v)=1)=\frac{1}{2}$$

independently for all $v\in V$. Since $\chi(E)=\sum_{v\in E}\chi(v)$ is a sum of at most n independent $\pm 1$ random variables, a Chernoff-type tail bound gives $\Pr(|\chi(E)|>\lambda)<2\exp(-\lambda^{2}/(2n))$ for all $E\subseteq V$ and $\lambda\geq 0$. Taking $\lambda=\sqrt{2n\ln(2m)}$ makes each of these probabilities less than $1/m$, so a union bound over the m edges gives

$$\Pr(\operatorname{disc}(\mathcal{H},\chi)>\lambda)\leq\sum_{E\in\mathcal{E}}\Pr(|\chi(E)|>\lambda)<1.$$

Since a random coloring has discrepancy at most $\lambda$ with positive probability, there are in particular colorings that have discrepancy at most $\lambda$. Hence $\operatorname{disc}(\mathcal{H})\leq\lambda.\ \Box$
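The argument can be illustrated by sampling. This is a sketch under stated assumptions: the proof only guarantees that a coloring within the bound exists with positive probability, so the retry loop, the `max_tries` cap, the seed, and the random test instance are all hypothetical choices made for illustration.

```python
import math
import random

def random_coloring(vertices, rng):
    """Color each vertex -1 or +1 independently with probability 1/2."""
    return {v: rng.choice((-1, 1)) for v in vertices}

def disc_of_coloring(edges, chi):
    """disc(H, chi): the largest |chi(E)| over all hyperedges E."""
    return max(abs(sum(chi[v] for v in e)) for e in edges)

def find_good_coloring(vertices, edges, rng, max_tries=1000):
    """Resample until disc(H, chi) <= sqrt(2 n ln(2 m)).

    The union bound shows that each try succeeds with positive probability.
    """
    n, m = len(vertices), len(edges)
    bound = math.sqrt(2 * n * math.log(2 * m))
    for _ in range(max_tries):
        chi = random_coloring(vertices, rng)
        if disc_of_coloring(edges, chi) <= bound:
            return chi, bound
    raise RuntimeError("no coloring within the bound found")

rng = random.Random(0)  # fixed seed, chosen arbitrarily
V = list(range(200))
E = [set(rng.sample(V, 100)) for _ in range(50)]  # hypothetical random instance
chi, bound = find_good_coloring(V, E, rng)
print(disc_of_coloring(E, chi), "<=", round(bound, 2))
```

In practice a uniformly random coloring usually lands far below the bound, because the union bound is pessimistic.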

2. For any hypergraph $\mathcal{H}$ with n vertices and m edges such that $m\geq n$:

$$\operatorname{disc}(\mathcal{H})\in O\left(\sqrt{n\log(2m/n)}\right).$$

To prove this, a much more sophisticated approach using the entropy function was necessary. Of course this is particularly interesting for $m=O(n)$. In the case $m=n$, $\operatorname{disc}(\mathcal{H})\leq 6\sqrt{n}$ can be shown for n large enough. Therefore, this result is usually known as 'Six Standard Deviations Suffice'. It is considered to be one of the milestones of discrepancy theory. The entropy method has seen numerous other applications, e.g. in the proof of the tight upper bound for arithmetic progressions by Matoušek and Spencer, or the upper bound in terms of the primal shatter function due to Matoušek.
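To see how the two general bounds relate in the case $m=n$, one can compare the random-coloring bound of item 1 with $6\sqrt{n}$. This is a back-of-the-envelope comparison that uses only the two bounds as stated:

$$\sqrt{2n\ln(2n)}>6\sqrt{n}\iff 2\ln(2n)>36\iff n>\tfrac{1}{2}e^{18}\approx 3.3\times 10^{7}.$$

So for moderate n the random-coloring bound is numerically the smaller of the two. The strength of the $6\sqrt{n}$ bound is asymptotic: it removes the factor of order $\sqrt{\ln n}$ entirely.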

Hypergraphs of bounded degree

Better discrepancy bounds can be attained when the hypergraph has bounded degree, that is, each vertex of $\mathcal{H}$ is contained in at most t edges, for some small t. In particular:

Beck and Fiala proved that $\operatorname{disc}(\mathcal{H})<2t$; this is known as the Beck–Fiala theorem. They conjectured that $\operatorname{disc}(\mathcal{H})=O(\sqrt{t})$.
Bednarchak and Helm, and Helm alone, improved the Beck–Fiala bound in tiny steps to $\operatorname{disc}(\mathcal{H})\leq 2t-3$ (for a slightly restricted situation, i.e. $t\geq 3$).
Bukh improved this in 2016 to $2t-\log^{*}t$, where $\log^{*}t$ denotes the iterated logarithm.
A corollary of Beck's paper (the first in which the notion of discrepancy explicitly appeared) shows $\operatorname{disc}(\mathcal{H})\leq C\sqrt{t\log m}\,\log n$ for some constant C.
The latest improvement in this direction is due to Banaszczyk: $\operatorname{disc}(\mathcal{H})=O(\sqrt{t\log n})$.
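As a small sanity check of the Beck–Fiala bound $\operatorname{disc}(\mathcal{H})<2t$, the sketch below computes the maximum degree t and the exact discrepancy of the Fano plane by brute force. The Fano plane is chosen here purely as an illustration and the helper names are hypothetical.

```python
from itertools import product

def max_degree(vertices, edges):
    """t: the largest number of edges containing a single vertex."""
    return max(sum(1 for e in edges if v in e) for v in vertices)

def disc(vertices, edges):
    """Brute-force disc(H): minimise max |chi(E)| over all colorings."""
    vertices = list(vertices)
    best = len(vertices)
    for signs in product((-1, 1), repeat=len(vertices)):
        chi = dict(zip(vertices, signs))
        worst = max(abs(sum(chi[v] for v in e)) for e in edges)
        best = min(best, worst)
    return best

# Fano plane: 7 points, 7 lines, every point lies on exactly 3 lines.
FANO_POINTS = range(1, 8)
FANO_LINES = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]

t = max_degree(FANO_POINTS, FANO_LINES)
d = disc(FANO_POINTS, FANO_LINES)
print(t, d, d < 2 * t)  # prints: 3 3 True
```

Every 2-coloring of the Fano plane leaves some line monochromatic, so the discrepancy equals the line size 3. This is below $2t=6$ and also within the $2t-3$ bound.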

Special hypergraphs

Better bounds on the discrepancy are possible for hypergraphs with a special structure, such as:

Discrepancy of permutations - when the vertices are the integers 1,...,n, and the hyperedges are all the intervals of some m given permutations on the integers.
Geometric discrepancy - when the vertices are points in a Euclidean space, and the hyperedges are geometric objects, such as rectangles or half-spaces.

Arithmetic progressions (Roth, Sárközy, Beck, Matoušek & Spencer)
Six Standard Deviations Suffice (Spencer)
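For small n, the hypergraph of arithmetic progressions mentioned in the list above can be built explicitly and its discrepancy found by exhaustive search. The sketch assumes one particular convention: an edge is any set $\{a,a+d,\dots,a+(\ell-1)d\}\subseteq\{1,\dots,n\}$ with $d\geq 1$ and $\ell\geq 1$, so singletons count as progressions. Other conventions in the literature may differ, and the function names are hypothetical.

```python
from itertools import product

def arithmetic_progressions(n):
    """All sets {a, a+d, ..., a+(l-1)d} inside {1..n} with d >= 1, l >= 1."""
    edges = set()
    for a in range(1, n + 1):
        for d in range(1, n + 1):
            members = []
            x = a
            while x <= n:
                members.append(x)
                edges.add(frozenset(members))  # every prefix is a progression
                x += d
    return edges

def disc(vertices, edges):
    """Brute-force disc(H): minimise max |chi(E)| over all colorings."""
    vertices = list(vertices)
    best = len(vertices)
    for signs in product((-1, 1), repeat=len(vertices)):
        chi = dict(zip(vertices, signs))
        worst = max(abs(sum(chi[v] for v in e)) for e in edges)
        best = min(best, worst)
    return best

# Exhaustive search is only feasible for very small n.
for n in range(1, 9):
    print(n, disc(range(1, n + 1), arithmetic_progressions(n)))
```

The search space doubles with every additional vertex, so this approach says nothing about the asymptotic bounds attributed to Roth, Sárközy, Beck, and Matoušek and Spencer.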

Major open problems

Komlós Conjecture

Applications

Numerical Integration: Monte Carlo methods in high dimensions.
Computational Geometry: Divide and conquer algorithms.
Image Processing: Halftoning.

Notes

Beck, József; Chen, William W. L. (2009). Irregularities of Distribution. Cambridge University Press. ISBN 978-0-521-09300-2.
Chazelle, Bernard (2000). . Cambridge University Press. ISBN 0-521-77093-9.
Doerr, Benjamin (2005). (Habilitation thesis). University of Kiel.
Matoušek, Jiří (1999). Geometric Discrepancy: An Illustrated Guide. Springer. ISBN 3-540-65528-X.