Prais–Winsten estimation
In econometrics, Prais–Winsten estimation is a procedure for handling serial correlation of type AR(1) in a linear model. Conceived by Sigbert Prais and Christopher Winsten in 1954, it modifies Cochrane–Orcutt estimation so as not to lose the first observation, which yields greater efficiency and makes it a special case of feasible generalized least squares.
Theory
Consider the model
$$y_t = \alpha + X_t \beta + \varepsilon_t,$$
where $y_t$ is the time series of interest at time t, $\beta$ is a vector of coefficients, $X_t$ is a matrix of explanatory variables, and $\varepsilon_t$ is the error term. The error term can be serially correlated over time: $\varepsilon_t = \rho \varepsilon_{t-1} + e_t$, $|\rho| < 1$, where $e_t$ is white noise. In addition to the Cochrane–Orcutt transformation, which is
$$y_t - \rho y_{t-1} = \alpha(1 - \rho) + (X_t - \rho X_{t-1})\beta + e_t,$$
for t = 2, 3, ..., T, the Prais–Winsten procedure makes a reasonable transformation for t = 1 in the following form:
$$\sqrt{1 - \rho^2}\, y_1 = \alpha \sqrt{1 - \rho^2} + \left(\sqrt{1 - \rho^2}\, X_1\right)\beta + \sqrt{1 - \rho^2}\, \varepsilon_1.$$
Then the usual least squares estimation is done.
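As a concrete sketch (assuming $\rho$ is known and using NumPy; the function name is illustrative, not from any library), the scaling of the first observation together with quasi-differencing for t ≥ 2, followed by ordinary least squares, might look like:

```python
import numpy as np

def prais_winsten_transform(y, X, rho):
    """Prais-Winsten transformation for a known AR(1) parameter rho.

    y: (T,) dependent variable; X: (T, k) regressors without an intercept.
    Returns (y*, Z*) where Z* includes an intercept column, so that OLS on
    the transformed data is the Prais-Winsten estimator.
    """
    T = len(y)
    Z = np.column_stack([np.ones(T), X])       # prepend the intercept
    s = np.sqrt(1.0 - rho**2)
    # t = 1: scale by sqrt(1 - rho^2) instead of dropping the observation;
    # t = 2..T: quasi-differencing as in Cochrane-Orcutt
    y_star = np.concatenate([[s * y[0]], y[1:] - rho * y[:-1]])
    Z_star = np.vstack([s * Z[:1], Z[1:] - rho * Z[:-1]])
    return y_star, Z_star

# usage: simulate y_t = 1 + 2 x_t + eps_t with AR(1) errors, rho = 0.6
rng = np.random.default_rng(0)
T, rho_true = 200, 0.6
X = rng.normal(size=(T, 1))
e = rng.normal(size=T)
eps = np.zeros(T)
eps[0] = e[0] / np.sqrt(1 - rho_true**2)   # stationary start
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + e[t]
y = 1.0 + 2.0 * X[:, 0] + eps

y_s, Z_s = prais_winsten_transform(y, X, rho_true)
theta_hat, *_ = np.linalg.lstsq(Z_s, y_s, rcond=None)
print(theta_hat)   # roughly [1.0, 2.0]
```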
Estimation procedure
First notice that
$$\operatorname{var}(\varepsilon_t) = \operatorname{var}(\rho \varepsilon_{t-1} + e_t) = \rho^2 \operatorname{var}(\varepsilon_{t-1}) + \operatorname{var}(e_t).$$
Noting that for a stationary process the variance is constant over time,
$$(1 - \rho^2)\operatorname{var}(\varepsilon_t) = \operatorname{var}(e_t)$$
and thus
$$\operatorname{var}(\varepsilon_t) = \frac{\operatorname{var}(e_t)}{1 - \rho^2}.$$
Without loss of generality, suppose the variance of the white noise is 1. To carry out the estimation compactly, consider the autocovariance function of the error term in the model above:
$$\operatorname{cov}(\varepsilon_t, \varepsilon_{t+h}) = \rho^{|h|}\operatorname{var}(\varepsilon_t) = \frac{\rho^{|h|}}{1 - \rho^2}, \quad \text{for } h = 0, \pm 1, \pm 2, \dots.$$
It is easy to see that the variance–covariance matrix, $\mathbf{\Omega}$, of the model is
$$\mathbf{\Omega} = \frac{1}{1-\rho^2}\begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{bmatrix}.$$
Having $\rho$ (or an estimate of it), we see that
$$\hat{\Theta} = (\mathbf{Z}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{Z})^{-1} (\mathbf{Z}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{Y}),$$
where $\mathbf{Z}$ is the matrix of observations on the explanatory variables ($X_t$, t = 1, 2, ..., T) including a vector of ones, $\mathbf{Y}$ is the vector stacking the observations on the dependent variable ($y_t$, t = 1, 2, ..., T), and $\hat{\Theta}$ collects the model parameters.
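The estimator can be written out directly from this formula. The sketch below (assuming $\rho$ is known; the helper name `gls_estimate` is illustrative) builds $\mathbf{\Omega}$ from the autocovariance function derived above and solves the normal equations:

```python
import numpy as np

def gls_estimate(y, X, rho):
    """Theta_hat = (Z' Omega^{-1} Z)^{-1} (Z' Omega^{-1} Y) for AR(1) errors
    with known rho (illustrative sketch, unit white-noise variance)."""
    T = len(y)
    Z = np.column_stack([np.ones(T), X])          # intercept plus regressors
    idx = np.arange(T)
    # Omega[i, j] = rho^{|i-j|} / (1 - rho^2), as derived above
    Omega = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)
    Oinv = np.linalg.inv(Omega)
    return np.linalg.solve(Z.T @ Oinv @ Z, Z.T @ Oinv @ y)

# usage on simulated data with alpha = 1, beta = 2, rho = 0.5
rng = np.random.default_rng(1)
T, rho = 300, 0.5
X = rng.normal(size=(T, 1))
e = rng.normal(size=T)
eps = np.zeros(T)
eps[0] = e[0] / np.sqrt(1 - rho**2)   # stationary start
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + e[t]
y = 1.0 + 2.0 * X[:, 0] + eps
print(gls_estimate(y, X, rho))   # close to [1.0, 2.0]
```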
Note
To see why the initial-observation assumption stated by Prais and Winsten (1954) is reasonable, it helps to consider the mechanics of the generalized least squares procedure sketched above. The inverse of $\mathbf{\Omega}$ can be decomposed as $\mathbf{\Omega}^{-1} = \mathbf{G}^{\mathsf{T}} \mathbf{G}$ with
$$\mathbf{G} = \begin{bmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}.$$
Pre-multiplying the model, written in matrix notation, by this matrix yields the transformed model of Prais–Winsten.
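The decomposition is easy to check numerically (a small sketch; the values of T and $\rho$ are illustrative):

```python
import numpy as np

T, rho = 5, 0.6
idx = np.arange(T)
# Omega[i, j] = rho^{|i-j|} / (1 - rho^2), unit white-noise variance
Omega = rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)

# Build G: sqrt(1 - rho^2) in the top-left corner, ones on the diagonal,
# -rho on the first subdiagonal
G = np.eye(T)
G[0, 0] = np.sqrt(1.0 - rho**2)
G[np.arange(1, T), np.arange(0, T - 1)] = -rho

# G^T G should equal Omega^{-1}
print(np.allclose(G.T @ G, np.linalg.inv(Omega)))  # True
```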
Restrictions
The error term is still restricted to be of an AR(1) type. If ρ {\displaystyle \rho } is not known, a recursive procedure (Cochrane–Orcutt estimation) or grid-search (Hildreth–Lu estimation) may be used to make the estimation feasible. Alternatively, a full information maximum likelihood procedure that estimates all parameters simultaneously has been suggested by Beach and MacKinnon.
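When $\rho$ is unknown, a feasible version can be sketched as an iteration that re-estimates $\rho$ from the lag-1 autocorrelation of the residuals and re-fits the transformed regression (an illustrative sketch of the recursive approach, not the Beach–MacKinnon maximum likelihood procedure; the function name is made up):

```python
import numpy as np

def iterated_prais_winsten(y, X, n_iter=10):
    """Feasible Prais-Winsten (illustrative sketch): start from OLS, then
    alternate between estimating rho from the residuals and running OLS on
    the Prais-Winsten-transformed data."""
    T = len(y)
    Z = np.column_stack([np.ones(T), X])
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)     # initial plain OLS
    rho = 0.0
    for _ in range(n_iter):
        resid = y - Z @ theta
        # lag-1 autocorrelation of the residuals as the estimate of rho
        rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        s = np.sqrt(1.0 - rho**2)
        y_s = np.concatenate([[s * y[0]], y[1:] - rho * y[:-1]])
        Z_s = np.vstack([s * Z[:1], Z[1:] - rho * Z[:-1]])
        theta, *_ = np.linalg.lstsq(Z_s, y_s, rcond=None)
    return theta, rho

# usage on simulated data with alpha = 0.5, beta = 1.5, rho = 0.6
rng = np.random.default_rng(2)
T = 300
X = rng.normal(size=(T, 1))
e = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + e[t]
y = 0.5 + 1.5 * X[:, 0] + eps
theta, rho_hat = iterated_prais_winsten(y, X)
print(theta, rho_hat)
```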
Further reading
- Judge, George G.; Griffiths, William E.; Hill, R. Carter; Lee, Tsoung-Chao (1980). The Theory and Practice of Econometrics. New York: Wiley. pp. 180–183. ISBN 0-471-05938-2.
- Kmenta, Jan (1986). Elements of Econometrics (Second ed.). New York: Macmillan. ISBN 0-02-365070-2.