Continuous Bernoulli distribution

Parameters: $\lambda = 1/(1+e^{-\theta}) \in (0,1)$; $\theta \in \mathbb{R}$ (natural parameter)

Support: $x \in [0,1]$

PDF: $C(\lambda)\,\lambda^{x}(1-\lambda)^{1-x}$, where
$$C(\lambda) = \begin{cases} 2 & \text{if } \lambda = \tfrac{1}{2} \\ \dfrac{2\tanh^{-1}(1-2\lambda)}{1-2\lambda} & \text{otherwise} \end{cases}$$
In the natural parameterization:
$$f(x \mid \theta) = \begin{cases} 1 & \theta = 0 \\ \exp\left(x\theta - \log\{(e^{\theta}-1)/\theta\}\right) & \theta \neq 0 \end{cases}$$

CDF:
$$F(x \mid \lambda) = \begin{cases} x & \lambda = \tfrac{1}{2} \\ \dfrac{\lambda^{x}(1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1} & \text{otherwise} \end{cases}$$
$$F(x \mid \theta) = \begin{cases} x & \theta = 0 \\ (e^{\theta x}-1)/(e^{\theta}-1) & \theta \neq 0 \end{cases}$$

Mean:
$$\operatorname{E}[X] = \begin{cases} \tfrac{1}{2} & \lambda = \tfrac{1}{2} \\ \dfrac{\lambda}{2\lambda-1} + \dfrac{1}{2\tanh^{-1}(1-2\lambda)} & \text{otherwise} \end{cases}$$
$$\operatorname{E}[X] = \begin{cases} 1/2 & \theta = 0 \\ e^{\theta}/(e^{\theta}-1) - \theta^{-1} & \theta \neq 0 \end{cases}$$

Variance:
$$\operatorname{Var}(X) = \begin{cases} \tfrac{1}{12} & \lambda = \tfrac{1}{2} \\ -\dfrac{\lambda(1-\lambda)}{(1-2\lambda)^{2}} + \dfrac{1}{\left(2\tanh^{-1}(1-2\lambda)\right)^{2}} & \text{otherwise} \end{cases}$$
$$\operatorname{Var}(X) = \begin{cases} 1/12 & \theta = 0 \\ (2 - e^{\theta} - e^{-\theta})^{-1} + \theta^{-2} & \theta \neq 0 \end{cases}$$

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0,1)$, defined on the unit interval $x \in [0,1]$ by:

$$p(x \mid \lambda) \propto \lambda^{x}(1-\lambda)^{1-x}.$$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders, for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0,1]$-valued data. This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0,1\}$-valued data.
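The relationship between the two losses can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: the function names `log_norm_const`, `bce`, and `cb_nll` are hypothetical, and the cutoff `eps` for switching to a Taylor expansion near $\lambda = 1/2$ is an assumed value.

```python
import math


def log_norm_const(lam: float, eps: float = 1e-4) -> float:
    """log C(lam), with C(lam) = 2 * atanh(1 - 2*lam) / (1 - 2*lam)."""
    t = 1.0 - 2.0 * lam
    if abs(lam - 0.5) < eps:
        # atanh(t)/t = 1 + t^2/3 + t^4/5 + ...; avoids 0/0 at lam = 1/2.
        # (eps is an assumed threshold, not taken from the article.)
        return math.log(2.0) + math.log1p(t * t / 3.0 + t**4 / 5.0)
    return math.log(2.0 * math.atanh(t) / t)


def bce(x: float, lam: float) -> float:
    """Binary cross entropy, evaluated at a possibly non-binary x in [0, 1]."""
    return -(x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam))


def cb_nll(x: float, lam: float) -> float:
    """Continuous Bernoulli negative log-likelihood: BCE minus log C(lam)."""
    return bce(x, lam) - log_norm_const(lam)
```

Because $C(\lambda) \geq 2$, the correction term is always positive, and it grows as $\lambda$ moves away from $1/2$; dropping it therefore changes which $\lambda$ minimizes the loss.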

The continuous Bernoulli also defines an exponential family of distributions. Writing $\theta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x \mid \theta) \propto \exp(\theta x)$.
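For reference, the normalizing constant in canonical form follows from a single integral. The worked step below uses only the definitions above; the closing identity for $C(\lambda)$ is a derived restatement in terms of $\theta$.

```latex
\begin{align}
\int_0^1 e^{\theta x}\,dx &= \frac{e^{\theta}-1}{\theta} \qquad (\theta \neq 0), \\
p(x \mid \theta) &= \frac{\theta\, e^{\theta x}}{e^{\theta}-1}
  = \exp\left(\theta x - \log\frac{e^{\theta}-1}{\theta}\right), \\
C(\lambda) &= \frac{\theta}{(1-\lambda)\,(e^{\theta}-1)}
  = \frac{\theta}{\tanh(\theta/2)},
  \qquad \lambda = \frac{1}{1+e^{-\theta}}.
\end{align}
```

The last line uses $\lambda^{x}(1-\lambda)^{1-x} = (1-\lambda)\,e^{\theta x}$ and $1-2\lambda = -\tanh(\theta/2)$.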

Statistical inference

Given an independent sample of $n$ points $x_{1},\dots,x_{n}$, with $x_{i} \in [0,1]$ for all $i$, from a continuous Bernoulli distribution, the log-likelihood of the natural parameter $\theta$ is

$$\mathcal{L}(\theta) = \theta \sum_{i=1}^{n} x_{i} - n \log\{(e^{\theta}-1)/\theta\}$$

and the maximum likelihood estimator of the natural parameter $\theta$ is the solution of $\mathcal{L}'(\theta) = 0$, that is, $\hat{\theta}$ satisfies

$$\frac{e^{\hat{\theta}}}{e^{\hat{\theta}}-1} - \frac{1}{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$$

where the left-hand side $e^{\hat{\theta}}/(e^{\hat{\theta}}-1) - \hat{\theta}^{-1}$ is the expected value of a continuous Bernoulli distribution with parameter $\hat{\theta}$. Although $\hat{\theta}$ does not admit a closed-form expression, it can be computed numerically by inverting this mean function, which is strictly increasing in $\theta$ (its derivative is the variance).
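One way to carry out the numerical inversion is plain bisection on the mean function. The sketch below is illustrative only: `cb_mean` and `cb_mle_theta` are hypothetical names, the small-$\theta$ cutoff and tolerance are assumed values, and any bracketing root finder would serve equally well.

```python
import math


def cb_mean(theta: float) -> float:
    """Mean e^theta / (e^theta - 1) - 1/theta of the continuous Bernoulli."""
    if abs(theta) < 1e-4:
        # First-order series around theta = 0 (assumed cutoff): 1/2 + theta/12.
        return 0.5 + theta / 12.0
    a = abs(theta)
    # For theta > 0 the mean equals 1/(1 - e^{-theta}) - 1/theta.
    m = 1.0 / -math.expm1(-a) - 1.0 / a
    # Symmetry mean(-theta) = 1 - mean(theta) avoids overflow for theta << 0.
    return m if theta > 0 else 1.0 - m


def cb_mle_theta(xs, tol: float = 1e-10, max_iter: int = 200) -> float:
    """Solve cb_mean(theta) = sample mean for theta by bisection."""
    xbar = sum(xs) / len(xs)
    if not 0.0 < xbar < 1.0:
        raise ValueError("no finite MLE when the sample mean is 0 or 1")
    lo, hi = -1.0, 1.0
    # Expand the bracket until it contains the root (the mean is increasing).
    while cb_mean(lo) > xbar:
        lo *= 2.0
    while cb_mean(hi) < xbar:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cb_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

The corresponding estimate of the shape parameter is $\hat{\lambda} = 1/(1+e^{-\hat{\theta}})$. A sample mean below $1/2$ yields a negative $\hat{\theta}$, and a sample mean above $1/2$ a positive one.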

Further properties

The entropy of a continuous Bernoulli distribution is

$$\operatorname{H}[X] = \begin{cases} 0 & \text{if } \lambda = \tfrac{1}{2} \\ \dfrac{\lambda\log(\lambda) - (1-\lambda)\log(1-\lambda)}{1-2\lambda} - \log\left(\dfrac{2\tanh^{-1}(1-2\lambda)}{e\,(1-2\lambda)}\right) & \text{otherwise} \end{cases}$$
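The entropy expression can be recovered from the density and the mean. The derivation below is a sketch for $\lambda \neq 1/2$, writing $\mu = \operatorname{E}[X]$ and $\theta = \log(\lambda/(1-\lambda))$, and using $2\tanh^{-1}(1-2\lambda) = -\theta$.

```latex
\begin{align}
\operatorname{H}[X] &= -\operatorname{E}\left[\log p(X \mid \lambda)\right]
  = -\log C(\lambda) - \mu\log\lambda - (1-\mu)\log(1-\lambda) \\
 &= -\log C(\lambda) - \mu\,\theta - \log(1-\lambda),
  \qquad \mu = \frac{\lambda}{2\lambda-1} - \frac{1}{\theta} \\
 &= 1 - \log C(\lambda) + \frac{\lambda\,\theta}{1-2\lambda} - \log(1-\lambda) \\
 &= \frac{\lambda\log\lambda - (1-\lambda)\log(1-\lambda)}{1-2\lambda}
    - \log\frac{C(\lambda)}{e}.
\end{align}
```

For $\lambda = 1/2$ the distribution is uniform on $[0,1]$, whose differential entropy is $0$.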

Related distributions

Bernoulli distribution

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0,1\}$ by the probability mass function:

$$p(x) = p^{x}(1-p)^{1-x},$$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0,1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.

Uniform distribution

The uniform distribution on the unit interval $[0,1]$ is the special case of the continuous Bernoulli with $\lambda = 1/2$, or equivalently $\theta = 0$.

Exponential distribution

An exponential distribution with rate $\Lambda$ restricted to the unit interval $[0,1]$ corresponds to a continuous Bernoulli distribution with natural parameter $\theta = -\Lambda < 0$.
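This correspondence suggests a simple sampler, since the CDF $(e^{\theta x}-1)/(e^{\theta}-1)$ can be inverted in closed form, just as for a truncated exponential. The sketch below is an illustration under assumptions: `sample_cb` is a hypothetical name, the near-zero cutoff is an assumed value, and $|\theta|$ is assumed moderate (well below about 700) so that `math.expm1` does not overflow.

```python
import math
import random


def sample_cb(theta: float, rng=random) -> float:
    """Draw one continuous Bernoulli sample by inverse-CDF sampling."""
    u = rng.random()  # uniform on [0, 1)
    if abs(theta) < 1e-8:
        # theta = 0 is the uniform distribution (assumed cutoff for "near 0").
        return u
    # Solve u = (e^{theta*x} - 1) / (e^theta - 1) for x.
    x = math.log1p(u * math.expm1(theta)) / theta
    # Guard against rounding slightly outside the support.
    return min(1.0, max(0.0, x))
```

With $\theta = -\Lambda$ for some $\Lambda > 0$, the draws follow an exponential distribution with rate $\Lambda$ conditioned on falling in $[0,1]$.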

Continuous categorical distribution

The multivariate generalization of the continuous Bernoulli is called the continuous categorical distribution.