In probability theory, statistics, and machine learning, the continuous Bernoulli distribution is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0,1)$, defined on the unit interval $x \in [0,1]$, by:

$$p(x \mid \lambda) \propto \lambda^{x}(1-\lambda)^{1-x}.$$
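The proportionality hides the normalizing constant. Integrating the kernel over $[0,1]$ gives the normalized density $p(x \mid \lambda) = C(\lambda)\,\lambda^{x}(1-\lambda)^{1-x}$ with $C(\lambda) = 2\tanh^{-1}(1-2\lambda)/(1-2\lambda)$ for $\lambda \neq 1/2$ and the limiting value $C(1/2) = 2$. The Python sketch below evaluates this density and checks numerically that it integrates to one; the names `cb_log_normalizer` and `cb_pdf` are illustrative, not taken from any library.

```python
import math


def cb_log_normalizer(lam: float) -> float:
    """Return log C(lam) for the continuous Bernoulli density.

    C(lam) = 2 * atanh(1 - 2*lam) / (1 - 2*lam), with limit C(1/2) = 2.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in the open interval (0, 1)")
    z = 1.0 - 2.0 * lam
    if abs(z) < 1e-6:
        # Series atanh(z) / z = 1 + z**2 / 3 + ..., so log C ~ log 2 + z**2 / 3.
        return math.log(2.0) + z * z / 3.0
    return math.log(2.0 * math.atanh(z) / z)


def cb_pdf(x: float, lam: float) -> float:
    """Normalized continuous Bernoulli density p(x | lam); zero outside [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    # log of the unnormalized kernel lam**x * (1 - lam)**(1 - x)
    log_kernel = x * math.log(lam) + (1.0 - x) * math.log1p(-lam)
    return math.exp(cb_log_normalizer(lam) + log_kernel)


# Midpoint-rule check that the density integrates to (approximately) one.
n = 10_000
for lam in (0.2, 0.5, 0.9):
    total = sum(cb_pdf((k + 0.5) / n, lam) for k in range(n)) / n
    print(lam, total)
```

Working on the log scale and using `math.log1p` keeps the kernel accurate when $\lambda$ is close to 0 or 1.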

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in variational autoencoders, as a model for the pixel intensities of natural images. It thus provides a proper probabilistic counterpart to the commonly used binary cross-entropy loss, which is often applied to continuous, $[0,1]$-valued data. That practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross-entropy loss defines a true log-likelihood only for discrete, $\{0,1\}$-valued data.
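To make the relationship concrete: for a pixel value $x \in [0,1]$ and parameter $\lambda$, the binary cross-entropy is $-[x\log\lambda + (1-x)\log(1-\lambda)]$, so the continuous Bernoulli log-density equals the negative binary cross-entropy plus $\log C(\lambda)$, where $C(\lambda) = 2\tanh^{-1}(1-2\lambda)/(1-2\lambda)$ is the normalizing constant. The sketch below (plain Python, hypothetical helper names, a simple grid search rather than any particular training setup) compares the $\lambda$ that minimizes binary cross-entropy for one pixel with the $\lambda$ that maximizes the continuous Bernoulli likelihood.

```python
import math


def bce(x: float, lam: float) -> float:
    """Binary cross-entropy between a target x in [0, 1] and a prediction lam."""
    return -(x * math.log(lam) + (1.0 - x) * math.log1p(-lam))


def cb_log_normalizer(lam: float) -> float:
    """log C(lam), C(lam) = 2*atanh(1 - 2*lam) / (1 - 2*lam), with C(1/2) = 2."""
    z = 1.0 - 2.0 * lam
    if abs(z) < 1e-6:
        return math.log(2.0) + z * z / 3.0  # series near lam = 1/2
    return math.log(2.0 * math.atanh(z) / z)


def cb_log_likelihood(x: float, lam: float) -> float:
    """Continuous Bernoulli log-density: negative BCE plus the log normalizer."""
    return -bce(x, lam) + cb_log_normalizer(lam)


x = 0.7  # a single observed pixel intensity
grid = [k / 1000 for k in range(1, 1000)]  # candidate lam values in (0, 1)
lam_bce = min(grid, key=lambda lam: bce(x, lam))
lam_cb = max(grid, key=lambda lam: cb_log_likelihood(x, lam))
print(f"BCE minimizer: {lam_bce:.3f}, continuous Bernoulli maximizer: {lam_cb:.3f}")
```

The binary cross-entropy is minimized at $\lambda = x = 0.7$, whereas the continuous Bernoulli likelihood is maximized at a noticeably more extreme value (around 0.935). The reason is that $\lambda$ is not the mean of the continuous Bernoulli distribution, so dropping $\log C(\lambda)$ changes which parameter gets fitted.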

The continuous Bernoulli also defines an exponential family of distributions. Writing $\theta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x \mid \theta) \propto \exp(\theta x)$.
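Normalizing the canonical form gives $p(x \mid \theta) = \theta e^{\theta x}/(e^{\theta}-1)$ for $\theta \neq 0$ (and the uniform density at $\theta = 0$), and the inverse of the logit map is the logistic function $\lambda = 1/(1+e^{-\theta})$. A short Python sketch, with illustrative function names, checks that the canonical density is proportional to $\lambda^{x}(1-\lambda)^{1-x}$:

```python
import math


def lam_to_theta(lam: float) -> float:
    """Natural parameter theta = log(lam / (1 - lam)) (the logit of lam)."""
    return math.log(lam) - math.log1p(-lam)


def theta_to_lam(theta: float) -> float:
    """Inverse map: lam = 1 / (1 + exp(-theta)) (the logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-theta))


def cb_pdf_theta(x: float, theta: float) -> float:
    """Canonical-form density theta * exp(theta*x) / (exp(theta) - 1) on [0, 1].

    Intended for moderate |theta|; exp overflows for |theta| above roughly 709.
    """
    if not 0.0 <= x <= 1.0:
        return 0.0
    if abs(theta) < 1e-8:
        return 1.0  # theta -> 0 limit: the uniform density
    return theta * math.exp(theta * x) / math.expm1(theta)


lam = 0.8
theta = lam_to_theta(lam)
# The ratio of the canonical density to the lam-form kernel should not depend on x.
ratios = [cb_pdf_theta(x, theta) / (lam**x * (1 - lam) ** (1 - x)) for x in (0.0, 0.25, 0.5, 1.0)]
print(theta, theta_to_lam(theta), ratios)
```

The constant ratio is the normalizing constant of the $\lambda$ form, so the two parameterizations describe the same density.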

Statistical inference

Given an independent sample of $n$ points $x_1, \dots, x_n$, with $x_i \in [0,1]$ for all $i$, from a continuous Bernoulli distribution, the log-likelihood of the natural parameter $\theta$ is

$$\mathcal{L}(\theta) = \theta \sum_{i=1}^{n} x_i - n \log\left\{\frac{e^{\theta} - 1}{\theta}\right\}$$
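The log-likelihood translates directly into code. The only care needed is in evaluating $\log\{(e^{\theta}-1)/\theta\}$, which is $0/0$ at $\theta = 0$ and overflows for large $\theta$. The sketch below, with a hypothetical helper `log_normalizer` and made-up sample data, uses `math.expm1` and a short series expansion near zero:

```python
import math
from typing import Sequence


def log_normalizer(theta: float) -> float:
    """log((exp(theta) - 1) / theta), the log of the integral of exp(theta*x) over [0, 1]."""
    if abs(theta) < 1e-6:
        return theta / 2.0 + theta * theta / 24.0  # series expansion around theta = 0
    if theta > 0:
        # (e^t - 1) / t = e^t * (1 - e^{-t}) / t, which avoids overflow for large t
        return theta + math.log(-math.expm1(-theta)) - math.log(theta)
    # For theta < 0 both (e^t - 1) and t are negative, so negate both.
    return math.log(-math.expm1(theta)) - math.log(-theta)


def cb_log_likelihood(theta: float, xs: Sequence[float]) -> float:
    """L(theta) = theta * sum(x_i) - n * log((e^theta - 1) / theta)."""
    return theta * sum(xs) - len(xs) * log_normalizer(theta)


sample = [0.12, 0.55, 0.81, 0.97, 0.64]  # made-up data for illustration
for theta in (-1.0, 0.0, 1.0, 2.0):
    print(theta, cb_log_likelihood(theta, sample))
```

At $\theta = 0$ the model is uniform on $[0,1]$, so the log-likelihood is exactly zero for any sample.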

and the maximum likelihood estimator of the natural parameter $\theta$ is the solution of $\mathcal{L}'(\theta) = 0$; that is, $\hat{\theta}$ satisfies

$$\frac{e^{\hat{\theta}}}{e^{\hat{\theta}} - 1} - \frac{1}{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

where the left-hand side, $e^{\hat{\theta}}/(e^{\hat{\theta}}-1) - \hat{\theta}^{-1}$, is the expected value of the continuous Bernoulli distribution with parameter $\hat{\theta}$. Although $\hat{\theta}$ does not admit a closed-form expression, it can easily be computed by numerical inversion, because the mean is a strictly increasing function of $\theta$ (its derivative is the variance of the distribution). A finite solution exists whenever the sample mean lies strictly between 0 and 1.
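A minimal sketch of the numerical inversion, assuming plain bisection (any bracketing root finder would do) and a fixed bracket $\theta \in [-700, 700]$ chosen only to keep `math.expm1` within floating-point range. The names `cb_mean` and `cb_mle_theta` are illustrative:

```python
import math
from typing import Sequence


def cb_mean(theta: float) -> float:
    """Mean e^theta / (e^theta - 1) - 1/theta, with limit 1/2 at theta = 0."""
    if abs(theta) < 1e-4:
        return 0.5 + theta / 12.0  # series expansion around theta = 0
    # e^t / (e^t - 1) rewritten as -1 / expm1(-t) to avoid overflow for large t
    return -1.0 / math.expm1(-theta) - 1.0 / theta


def cb_mle_theta(xs: Sequence[float], lo: float = -700.0, hi: float = 700.0) -> float:
    """Solve cb_mean(theta) = sample mean by bisection.

    Assumes the sample mean lies strictly inside (cb_mean(lo), cb_mean(hi));
    a sample mean of exactly 0 or 1 has no finite maximizer.
    """
    target = sum(xs) / len(xs)
    if not cb_mean(lo) < target < cb_mean(hi):
        raise ValueError("sample mean outside the bracketed range of the mean function")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cb_mean(mid) < target:
            lo = mid  # the mean increases with theta, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_hat = cb_mle_theta([0.6, 0.7, 0.8])
lam_hat = 1.0 / (1.0 + math.exp(-theta_hat))  # back to the lam parameterization
print(theta_hat, cb_mean(theta_hat), lam_hat)
```

Bisection is used here because monotonicity of the mean guarantees convergence without derivatives; Newton's method on the same equation (using the variance as the derivative) would converge faster but needs a safeguard for large $|\theta|$.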

Further properties

The entropy of a continuous Bernoulli distribution is

$$\operatorname{H}[X] = \begin{cases} 0 & \text{if } \lambda = \tfrac{1}{2}, \\[4pt] \dfrac{\lambda \log \lambda - (1-\lambda) \log(1-\lambda)}{1-2\lambda} - \log\left(\dfrac{2\tanh^{-1}(1-2\lambda)}{e\,(1-2\lambda)}\right) & \text{otherwise.} \end{cases}$$
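The closed form can be cross-checked against a direct numerical evaluation of $-\int_0^1 p \log p \, dx$. In the sketch below (illustrative names), the branch near $\lambda = 1/2$ uses the second-order series $-(1-2\lambda)^2/6$, which follows from expanding the formula and is not stated in the text; it avoids the $0/0$ cancellation in the general expression.

```python
import math


def cb_entropy(lam: float) -> float:
    """Differential entropy (in nats) of the continuous Bernoulli with parameter lam."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in the open interval (0, 1)")
    z = 1.0 - 2.0 * lam
    if abs(z) < 1e-5:
        return -z * z / 6.0  # series around lam = 1/2; exactly 0 at lam = 1/2
    first = (lam * math.log(lam) - (1.0 - lam) * math.log1p(-lam)) / z
    second = math.log(2.0 * math.atanh(z) / (math.e * z))
    return first - second


def cb_entropy_numeric(lam: float, n: int = 20_000) -> float:
    """Midpoint-rule estimate of -integral of p log p, used only as a cross-check."""
    z = 1.0 - 2.0 * lam
    log_c = math.log(2.0) if z == 0.0 else math.log(2.0 * math.atanh(z) / z)
    total = 0.0
    for k in range(n):
        x = (k + 0.5) / n
        log_p = log_c + x * math.log(lam) + (1.0 - x) * math.log1p(-lam)
        total -= math.exp(log_p) * log_p
    return total / n


for lam in (0.1, 0.5, 0.75):
    print(lam, cb_entropy(lam), cb_entropy_numeric(lam))
```

Every value away from $\lambda = 1/2$ is negative, consistent with the uniform distribution having the largest differential entropy (zero) among densities on $[0,1]$.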

Related distributions

Bernoulli distribution

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0,1\}$ by the probability mass function:

$$p(x) = p^{x}(1-p)^{1-x},$$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0,1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.

Uniform distribution

The uniform distribution on the unit interval $[0,1]$ is the special case of the continuous Bernoulli distribution with $\lambda = 1/2$, or equivalently $\theta = 0$.

Exponential distribution

An exponential distribution with rate $\Lambda$ restricted to the unit interval $[0,1]$ corresponds to a continuous Bernoulli distribution with natural parameter $\theta = -\Lambda < 0$.
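The correspondence follows from comparing densities: the exponential density restricted and renormalized to $[0,1]$ is $\Lambda e^{-\Lambda x}/(1-e^{-\Lambda})$, which is exactly $\theta e^{\theta x}/(e^{\theta}-1)$ with $\theta = -\Lambda$. A quick numerical check in Python, with illustrative function names:

```python
import math


def truncated_exponential_pdf(x: float, rate: float) -> float:
    """Exponential(rate) density restricted and renormalized to [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return rate * math.exp(-rate * x) / (-math.expm1(-rate))


def cb_pdf_theta(x: float, theta: float) -> float:
    """Continuous Bernoulli density in canonical form; assumes theta != 0."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return theta * math.exp(theta * x) / math.expm1(theta)


rate = 2.5
for x in (0.0, 0.3, 0.7, 1.0):
    print(x, truncated_exponential_pdf(x, rate), cb_pdf_theta(x, -rate))
```

The two columns agree up to floating-point rounding for any positive rate.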

Continuous categorical distribution

The multivariate generalization of the continuous Bernoulli is called the continuous-categorical.