

The market return $R^m_{t+1}$ is measured by the Salomon Brothers world bond portfolio. The zero-beta portfolio comprises a portfolio of short-term assets (such that the correlation between $R^m$ and $R^z$ is zero). From equation (19.21), we can define the risk premium $\delta_t$ as $E_t(H^{(n)} - R^z)_{t+1}$.

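For Germany, Bisignano (1987, Table 27) reports the following results for five- and 10-year bonds (t statistics in parentheses):

$$H^{(5)}_{t+1} = \underset{(10.7)}{0.74}\,R^m_{t+1} + \underset{(3.8)}{0.26}\,R^z_{t+1} - \underset{(2.3)}{125.6}\,CC_{t+1}$$

Sample 78(2)-85(11), $R^2 = 0.182$, DW = 1.78

$$H^{(10)}_{t+1} = \underset{(5.75)}{0.64}\,R^m_{t+1} + \underset{(3.2)}{0.367}\,R^z_{t+1} - \underset{(2.1)}{246.5}\,CC_{t+1}$$

Sample 78(2)-85(11), $R^2 = 0.133$, DW = 2.2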
There is therefore some support for this 'two-factor model', namely that both the market return $R^m_{t+1}$ and the consumption covariability term $CC_{t+1}$ help to explain holding period yields. However, the consumption term is relatively less well determined. The CAPM beta of the bond, $\beta^{(n)}$, is statistically significant and rises with term to maturity $n$. For example, $\beta^{(1)} = 0.21$ ($t = 4.2$) and $\beta^{(10)} = 1.28$ ($t = 5.4$) (Table 23). This indicates greater market risk for long bonds than for short bonds. Thus there is some support here for the CAPM as an explanation of time varying excess HPYs on bonds, even when the betas are assumed to be time invariant. Below, Bisignano's model is extended by allowing the conditional covariance between $H^{(n)}$ and $R^m$ (and hence $\beta^{(n)}$) to vary not only with term to maturity but also over time.

ARCH and GARCH Models
In the CAPM, risk is measured either by the asset's beta or by its covariance with the market portfolio. ARCH or GARCH processes can be used to model time varying risk premia. After setting out the model we look at some recent illustrative empirical work using this approach.

Implementing ARCH models often involves estimating a large number of parameters. The estimation procedure is also non-linear, which can create additional (convergence) problems. Because of these difficulties, particularly when working with a finite data set, researchers have simplified the estimation problem in various ways:

(i) they usually limit the number of asset returns considered (e.g. just returns on domestic bonds rather than domestic and foreign bonds);
(ii) they usually restrict the parameters of the GARCH process in some way (e.g. using low order rather than high order lags).
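The theoretical model of the bond market outlined below incorporates bonds of different maturities and also allows for time varying risk premia. The basic model used is the CAPM, so time varying premia are modelled via time varying covariances. These covariances are in turn modelled by an ARCH or GARCH process. For a bond of maturity $n$, the CAPM together with RE implies that the excess holding period yield $y^{(n)}_{t+1} = H^{(n)}_{t+1} - r_t$ is given by

$$y^{(n)}_{t+1} = \lambda \operatorname{cov}_t\left(y^{(n)}_{t+1}, y^{m}_{t+1}\right) + u^{(n)}_{t+1} \tag{19.24}$$

where $\lambda$ is the market price of risk. For the market portfolio, the excess HPY is proportional to the market price of risk and to the variance of the excess yield on the market:

$$y^{m}_{t+1} = \lambda \operatorname{var}_t\left(y^{m}_{t+1}\right) + u^{m}_{t+1} \tag{19.25}$$

where we assume $u^m_{t+1} \sim N(0, \sigma^2_{m,t+1})$. From (19.25), the conditional variance of the market excess HPY, $\operatorname{var}(y^m_{t+1} \mid \Omega_t)$, is equal to $E_t[(u^m_{t+1})^2] = \sigma^2_{m,t+1}$. Hence (19.25) can be written

$$y^{m}_{t+1} = \lambda \sigma^2_{m,t+1} + u^{m}_{t+1} \tag{19.26}$$

Also from (19.25), $y^m_{t+1} - E_t y^m_{t+1} = u^m_{t+1}$. From (19.24), $y^{(n)}_{t+1} - E_t y^{(n)}_{t+1} = u^{(n)}_{t+1}$. Hence the conditional covariance of $H^{(n)}_{t+1}$ and $R^m_{t+1}$ is given by

$$\operatorname{cov}_t\left(H^{(n)}_{t+1}, R^{m}_{t+1}\right) = E_t\left[u^{(n)}_{t+1} u^{m}_{t+1}\right] = \sigma_{nm,t+1}$$

We can therefore rewrite (19.24) as:

$$y^{(n)}_{t+1} = \lambda \sigma_{nm,t+1} + u^{(n)}_{t+1} \tag{19.28}$$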
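Estimating (19.26) and (19.28) requires expressions for the variances and covariances of the error terms. A GARCH model implies that agents forecast future variances and covariances (i.e. 'risk') on the basis of past variances and covariances. For an $n$-period bond, its own variance $\sigma^2_n$ and its covariance with the market $\sigma_{nm}$ may be represented by the following two GARCH equations:

$$\sigma^2_{n,t+1} = \alpha_0 + \alpha_1 \sigma^2_{nt} + \alpha_2 \left(u^{(n)}_t\right)^2$$
$$\sigma_{nm,t+1} = \delta_0 + \delta_1 \sigma_{nm,t} + \delta_2\, u^{(n)}_t u^m_t$$

The GARCH variance for the market forecast errors, $\sigma^2_m$, is

$$\sigma^2_{m,t+1} = \gamma_0 + \gamma_1 \sigma^2_{mt} + \gamma_2 \left(u^m_t\right)^2$$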
The above equations simply replicate the GARCH equations examined earlier. Each equation makes the conditional variance (or covariance) equal to a constant plus a weighted sum of two terms: the previous period's conditional variance (covariance), and last period's squared forecast error (or the product of the bond and market forecast errors). The size of $\alpha_1 + \alpha_2$ (or $\delta_1 + \delta_2$, etc.) measures how far investors believe that a period of turbulence will persist into the future:

(i) If $\alpha_1 = \alpha_2 = 0$, then $\sigma^2_{n,t+1} = \alpha_0$ and the variance is not time varying.
(ii) If $\alpha_1 + \alpha_2 = 1$, then a forecast error at time $t$, $u_t$, influences the investor's perception of the amount of risk in all future periods.
(iii) If $0 < \alpha_1 + \alpha_2 < 1$, then a shock's influence on perceptions of future risk dies away as the forecast horizon lengthens.
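To see how the size of $\alpha_1 + \alpha_2$ governs persistence, the sketch below iterates the GARCH(1,1) variance forecast. Since $E_t[u^2_{t+k}] = E_t\sigma^2_{t+k}$ for $k \geq 1$, each multi-step forecast satisfies $E_t\sigma^2_{t+k+1} = \alpha_0 + (\alpha_1 + \alpha_2)E_t\sigma^2_{t+k}$. This is a minimal illustration with made-up parameter values, not an estimation routine, and the function name is hypothetical.

```python
def garch_variance_forecasts(a0, a1, a2, sigma2_next, horizon):
    """Forecasts E_t[sigma^2_{t+k}] for k = 1..horizon in the GARCH(1,1) model
    sigma^2_{t+1} = a0 + a1 * sigma^2_t + a2 * u_t^2.

    Because E_t[u_{t+k}^2] = E_t[sigma^2_{t+k}] for k >= 1, each forecast obeys
    f_{k+1} = a0 + (a1 + a2) * f_k, starting from the known f_1 = sigma2_next.
    """
    forecasts = [sigma2_next]
    for _ in range(horizon - 1):
        forecasts.append(a0 + (a1 + a2) * forecasts[-1])
    return forecasts


if __name__ == "__main__":
    # Hypothetical values with a1 + a2 = 0.9: long-run variance a0 / (1 - 0.9) = 1.0.
    decaying = garch_variance_forecasts(0.1, 0.8, 0.1, sigma2_next=3.0, horizon=6)
    # Hypothetical values with a1 + a2 = 1 and a0 = 0: the shock never dies away.
    persistent = garch_variance_forecasts(0.0, 0.5, 0.5, sigma2_next=3.0, horizon=6)
    print([round(f, 3) for f in decaying])
    print(persistent)
```

In the first case the forecasts fall from 3.0 towards the long-run value 1.0, with the gap shrinking by a factor of 0.9 each period. In the second case they stay at 3.0 at every horizon.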
An illustrative empirical application of the above model is provided by Hall et al (1992). They estimate the model for bonds of several maturities in several countries. For each country, they assume that the market portfolio consists only of domestic bonds (i.e. it contains no foreign bonds), and they measure HPYs over one month.

For the market portfolio of bonds there is evidence of a GARCH process, since $\gamma_1$ and $\gamma_2$ are non-zero. However, statistically at least, the conditional variance $\sigma^2_{m,t+1}$ does not appear to explain much of the variation in the expected excess market yield $E_t y^m_{t+1}$. In equation (19.26), $\lambda$ is statistically different from zero for Japan, but we can accept $\lambda = 0$ for the UK, Canada, the USA and Germany. In general, they also find that the covariance terms do not influence HPYs in a number of countries (e.g. the UK, Japan, the USA and Germany). This holds for maturity bands of $n$ = 1-3, 3-5, 5-7, 7-9, 10-15 and more than 15 years.
Hall et al also test whether information at time $t$ influences the excess yield $E_t y^{(n)}_{t+1}$ in (19.28). The information considered is the lagged excess HPYs $(H - r)_{t-j}$ and the yield spread $[R^{(n)}_t - r_t]$. This tests whether the CAPM with time varying risk premia, together with RE, provides a complete explanation of excess yields. In addition, Hall et al include the conditional (own) variance of the $n$-period bond, $\sigma^2_n$, alongside the covariance term in equation (19.28). If the CAPM is correct, the own variance $\sigma^2_n$ should not influence the HPY.
In the majority of cases they reject the hypothesis that the lagged HPY and the spread are statistically significant. Hence Mankiw's (1986) results where the […] rejecting the CAPM.
There are many potential explanations for the rather mixed results across maturities and countries in the Hall et al study. Candidates include:

(i) an approximation to HPYs is used;
(ii) holding periods other than one month are not investigated, and the one-period 'static' CAPM is not invariant to the choice of holding period;
(iii) the market portfolio is taken to be all domestic bonds (e.g. no foreign bonds or equities are included);
(iv) there are many parameters to estimate from a relatively limited data set. Hall et al therefore assume that the parameters in the different GARCH equations for variances and covariances are equal (i.e. $\alpha_i = \delta_i = \gamma_i$ for each $i$). This saves degrees of freedom and mitigates possible difficulties in maximising a highly non-linear likelihood function, but it may impose invalid restrictions. Moreover, even with these restrictions imposed, the likelihood function may be poorly determined (i.e. rather 'flat').

Despite some restrictive assumptions, the Hall et al study does suggest that time varying risk premia (which exhibit persistence) exist in bond markets. However, they are rather difficult to pin down empirically, and the effects are not uniform across maturities and countries. The study also highlights the difficulty of using the ARCH procedure to obtain a tractable model of the CAPM with time varying variances and covariances. There are many parameters to estimate, and the maximisation procedure is non-linear and runs on a limited (and perhaps somewhat poor) data set. Any results are therefore likely to involve wide margins of error and may not be robust to slight specification changes.

The CAPM determines the expected return on every asset required for agents to hold it in their 'equilibrium' portfolio. In the CAPM, all agents hold the market portfolio of all assets in proportion to market value weights. However, we know that most of the gains from diversification may be obtained by holding around 20 assets. Holding a subset of the market portfolio may therefore be optimal for some individuals and institutions, for three reasons:

(i) transactions costs;
(ii) the costs of collecting and monitoring information;
(iii) the need to hedge projected outflows of cash against maturing assets.

An investor might therefore focus on the choice between blocks of assets and be relatively unconcerned about the composition of assets within a particular 'block'. For example, the investor might base his decision on the returns from holding a 'block' of six-month bills, a 'block' of long bonds and a 'block' of stocks. Over a three-month holding period the returns on these assets are denoted $H^{(i)}_{t+1}$ ($i = 1, 2, 3$).
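Bollerslev et al (1988) use this simplification to model the HPYs on these three broad classes of asset simultaneously using the CAPM. They also use the fact that the market return is a value-weighted sum of the three asset returns. For asset 1 (and similarly for assets 2 and 3):

$$\operatorname{cov}_t\left[H^{(1)}_{t+1}, H^{(m)}_{t+1}\right] = w_1\operatorname{cov}_t\left[H^{(1)}_{t+1}, H^{(1)}_{t+1}\right] + w_2\operatorname{cov}_t\left[H^{(1)}_{t+1}, H^{(2)}_{t+1}\right] + w_3\operatorname{cov}_t\left[H^{(1)}_{t+1}, H^{(3)}_{t+1}\right]$$
$$= w_1\sigma_{11,t+1} + w_2\sigma_{12,t+1} + w_3\sigma_{13,t+1}$$

where $w_i$ is the value weight of asset $i$ in the market portfolio. Thus we can decompose the 'single' CAPM covariance term on the LHS of the above expression into the three covariance terms on the RHS.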
The estimating equations for the excess HPYs on the three assets, $y_{i,t+1}$ ($i = 1, 2, 3$), take the form:

$$y_{i,t+1} = \lambda \sum_{j=1}^{3} w_j \sigma_{ij,t+1} + \varepsilon_{i,t+1} \tag{19.32}$$

where the $w_j$ are known and $\lambda$ is the market price of risk. According to the CAPM, $\lambda$ should be the same across all assets. Bollerslev et al estimate three excess HPY equations of the form (19.32): one for six-month bills, one for 20-year bonds and one for a stock market index. They model the time varying variances and covariances using a GARCH(1,1) model, with no restrictions on the parameters of the GARCH procedure, of the form:

$$\sigma_{ij,t+1} = \alpha_{ij} + \beta_{ij}\sigma_{ijt} + \gamma_{ij}\varepsilon_{it}\varepsilon_{jt}$$
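The two ingredients of (19.32) can be sketched in a few lines of NumPy. The first is an element-by-element GARCH(1,1) update of the $3 \times 3$ conditional covariance matrix. The second is the CAPM prediction $\lambda\sum_j w_j\sigma_{ij,t+1}$. This is an illustrative sketch under assumed inputs, not Bollerslev et al's estimation code. The weights, parameter matrices and shocks are hypothetical, and an unrestricted element-by-element update does not by itself guarantee a positive semi-definite covariance matrix.

```python
import numpy as np


def garch_cov_update(alpha, beta, gamma, sigma, eps):
    """sigma_{ij,t+1} = alpha_ij + beta_ij * sigma_ijt + gamma_ij * eps_it * eps_jt,
    applied element by element to 3x3 arrays (eps is the vector of time-t shocks)."""
    return alpha + beta * sigma + gamma * np.outer(eps, eps)


def capm_expected_excess(lam, sigma_next, w):
    """CAPM prediction for each asset i: lam * sum_j w_j * sigma_{ij,t+1}."""
    return lam * sigma_next @ w


if __name__ == "__main__":
    w = np.array([0.2, 0.3, 0.5])          # hypothetical market value weights (bills, bonds, stocks)
    alpha = np.full((3, 3), 0.01)          # hypothetical GARCH parameters
    beta = np.full((3, 3), 0.8)
    gamma = np.full((3, 3), 0.1)
    sigma_t = np.diag([0.1, 0.5, 2.0])     # hypothetical current covariance matrix
    eps_t = np.array([0.1, -0.4, 1.5])     # hypothetical time-t forecast errors
    sigma_next = garch_cov_update(alpha, beta, gamma, sigma_t, eps_t)
    print(capm_expected_excess(0.499, sigma_next, w))  # lambda as reported in the text
```

Weighting the vector of predictions by $w$ gives $\lambda w'\Sigma w$, the price of risk times the variance of the market return. This provides a simple check on the value-weighted covariance decomposition used in the text.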

The broad thrust of the results is:
(i) The excess holding period yield on the three assets does depend on the time varying conditional covariances, since $\lambda = 0.499$ (se = 0.16).
(ii) Conditional variances and covariances are time varying and are adequately modelled by the GARCH(1,1) model. Persistence in conditional variance is greatest for bills, then for bonds, and lowest for stocks: $\beta_{ii} + \gamma_{ii}$ equals 0.91 for bills and 0.62 for bonds, and is lower still for stocks.
(iii) The CAPM does not imply that conditional covariances should explain all of the movement in ex-post excess HPYs, because the arrival of 'news' causes a divergence between $y_{t+1}$ and $E_t y_{t+1}$. Even so, actual HPYs move much more than the expected HPY predicted by (19.32). Figure 19.1 shows this for bills; the graphs for bonds and stocks are qualitatively similar.

The CAPM with GARCH time varying volatilities does reasonably well empirically in explaining HPYs. When the own variance from the GARCH equations is added to (19.32) it is statistically insignificant, as one would expect under the CAPM. However, the CAPM does not provide a 'complete' explanation of excess HPYs. For example, the lagged excess HPY for asset $i$ is statistically highly significant when added to the CAPM equation for asset $i$, which rejects the simple static CAPM formulation. A 'surprise consumption' variable added to the CAPM equations is also usually statistically significant. This rejects the simple one-factor CAPM and suggests that the consumption CAPM might have additional explanatory power.
Figure 19.1 Risk Premia for Bills. Source: Bollerslev et al (1988), 'A Capital Asset Pricing Model with Time Varying Covariances', Journal of Political Economy, 96(1), Fig. 1. Reproduced by permission of University of Chicago Press.

The Bollerslev et al model of expected returns in equation (19.32) is very similar to that of Thomas and Wickens (1993), described in Chapter 18. The two approaches differ in three ways:

(i) Thomas and Wickens consider a wider set of assets (e.g. foreign bonds and stocks);
(ii) they allow the shares $w_i$ to vary;
(iii) they test the additional restriction that the $\sigma_{ij}$ equal the elements of the variance-covariance matrix of the error terms.

However, the less restrictive model of Bollerslev et al does appear to perform better empirically.

There appears to be only very weak evidence of a time varying term premium for short-term zero coupon bonds (bills). As the variability in the price of bills is usually much smaller than that of long-term bonds (or equities), this result is perhaps not too surprising. When the short-term bill markets experience severe volatility (e.g. the USA from 1979), the evidence is much stronger, both of persistence in time varying term premia and of an impact of the conditional variance on expected returns.
Holding period yields on long-term bonds do seem to be influenced by time varying conditional variances and covariances, but the stability of such relationships is open to question. ARCH and GARCH models provide a useful statistical framework for modelling time varying second moments. However, some of the parameterisations are so complex that empirical work often fails to obtain precise parameter estimates. In many studies the conditional second moments appear to be highly persistent. If these moments also affect required returns, then shocks to variances, and hence to risk premia, can have a strong impact on bond prices. However, such results are sometimes sensitive to specification changes in the ARCH process.
It was noted in Chapter 17 that, generally speaking, the above points also apply to empirical work on stock returns. However, there is perhaps somewhat stronger […]
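This is most easily demonstrated assuming continuously compounded rates. For a fixed maturity value $M$ on a zero coupon bond we have:

$$\ln P^{(3)}_t = \ln M - (1/4)\,r_t \tag{1}$$
$$\ln P^{(6)}_t = \ln M - (1/2)\,R_t \tag{2}$$

where $r_t$ and $R_t$ are the (annual) three-month and six-month rates. The three-month HPY on the six-month bill is $H_{t+1} = \ln P^{(3)}_{t+1} - \ln P^{(6)}_t$, so from (1) and (2):

$$H_{t+1} = (1/2)\,R_t - (1/4)\,r_{t+1} \tag{3}$$

The excess HPY is:

$$H_{t+1} - (1/4)\,r_t \tag{4}$$

Substituting (3) into (4), the expected excess HPY is $(1/4)(2R_t - E_t r_{t+1} - r_t)$. At an annual rate (i.e. multiplied by 4) this is $2R_t - E_t r_{t+1} - r_t$, as in the text.

For continuously compounded rates on zero coupon bonds, $\ln P^{(n)}_t = \ln M - nR^{(n)}_t$. Hence the absolute change in interest rates on an $n$-period bond is proportional to the percentage change in the price of the bond: $\Delta \ln P^{(n)} = -n\,\Delta R^{(n)}$.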
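The arithmetic of this note can be checked numerically. The sketch below assumes annual, continuously compounded rates and uses illustrative rate values. It computes the three-month excess HPY on a six-month bill directly from log prices and compares it with $(1/4)(2R_t - r_{t+1} - r_t)$.

```python
import math


def excess_hpy_quarterly(M, r_t, R_t, r_next):
    """Three-month excess HPY on a six-month zero coupon bill.

    r_t, r_next: annual three-month rates at t and t+1; R_t: annual six-month rate at t.
    Prices use continuous compounding: ln P = ln M - (years to maturity) * rate.
    """
    ln_p6_t = math.log(M) - 0.5 * R_t          # six months to maturity at t
    ln_p3_next = math.log(M) - 0.25 * r_next   # three months to maturity at t+1
    hpy = ln_p3_next - ln_p6_t                 # three-month holding period yield
    return hpy - 0.25 * r_t                    # minus the three-month safe return


if __name__ == "__main__":
    # Illustrative rates: r_t = 5%, R_t = 6%, r_{t+1} = 4%.
    direct = excess_hpy_quarterly(100.0, 0.05, 0.06, 0.04)
    formula = 0.25 * (2 * 0.06 - 0.04 - 0.05)
    print(direct, formula)  # both equal 0.0075 up to rounding
```

The face value $M$ cancels, as the derivation implies, so any positive value gives the same answer.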

The recent literature in this area tends to be very technical. It covers a wide range of ARCH/GARCH-type models as well as non-parametric and stochastic volatility models. Cuthbertson et al (1992) and Mills (1993) provide brief overviews of the use of ARCH and GARCH models in finance. Overviews of the econometric issues are given in […] and Schwert (1990) and in Bollerslev et al (1992). Of these four sources, Bollerslev (1986) and […] are the most accessible.
Econometric Issues in Testing Asset Pricing Models

This section of the book presents a brief overview of the key concepts used in the econometric analysis of time series data on financial variables. These concepts and techniques are widely used in the finance literature dealing with discrete time data. The material is included to make the book as self-contained as possible. The treatment is fairly brief and concentrates on how these techniques can be used rather than on detailed proofs. Nevertheless, it should provide a concise introduction to this complex subject matter.
First, univariate time series are analysed. Topics include autoregressive and moving average representations, stationarity and non-stationarity, conditional and unconditional forecasts, and the distinction between deterministic and stochastic trends. Section 20.2 then extends these ideas to a multivariate framework. It also considers the relationship between structural economic models, VARs and the literature on cointegration and error correction models. Section 20.3 presents the basic ideas behind ARCH and GARCH models and their use in modelling time varying variances and covariances. The focus is on the economic interpretation of these models rather than on detailed estimation issues (for these, see e.g. Bollerslev (1986)). Section 20.4 outlines some basic issues in estimating models which invoke the rational expectations assumption, a key hypothesis in many of the tests reported in the rest of the book. Again, the aim is not a definitive account of these active research areas. Rather, it is an overview that will enable the reader to understand the basis of the empirical work reported elsewhere in the book.
Economic and Statistical Models
An 'economic model' can be defined as one that has some basis in economic theory. Economic theory usually yields 'static equilibrium' or 'long-run' relationships. For example, if purchasing power parity (PPP) holds, the log of the nominal exchange rate $y$ is linked to the relative price of domestic and foreign goods, $x_t = \ln P_t - \ln P^*_t$. If we assume instantaneous adjustment we have:

$$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t \tag{20.1}$$

where $\varepsilon_t$ is a random error term, often taken to be white noise (see below). Under PPP, we expect $\beta_2 = 1$. However, it may be possible to obtain a 'good' representation of the behaviour of a variable $y_t$ without recourse to any economic theory. For example, a purely statistical model of the exchange rate $y_t$ is the univariate autoregressive model of order 1, AR(1):

$$y_t = \alpha + \beta y_{t-1} + \varepsilon_t \tag{20.2}$$

Some economic theory may be consistent with equation (20.2), but a purely statistical or time series modeller would not be concerned with this, and would probably not even be aware of it. Clearly, then, the 'statistical modeller' and the 'economic modeller' might end up with the same statistical representation of the data. However, their motivation, and their criteria for deciding whether the representation is 'adequate', may well differ. Both require that their models adequately characterise the data. The economic modeller will also generally require that his model conforms to some economic theory.
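Whether one is a statistical modeller or an economic modeller, it is useful to examine the data in various ways. To set the ball rolling, consider the following autoregressive model of order 1, AR(1), for $y_t$:

$$y_t = \beta y_{t-1} + \varepsilon_t \tag{20.3}$$

where $\varepsilon_t$ is a zero-mean random variable with constant variance $\sigma^2$ that is uncorrelated with any other variable in the sequence $\{\varepsilon_{t-j}, j = \pm 1, \pm 2, \ldots\}$:

$$E\varepsilon_t = 0 \quad \forall t$$
$$\operatorname{var}(\varepsilon_t) = E(\varepsilon_t^2) = \sigma^2 \quad \forall t$$
$$\operatorname{cov}(\varepsilon_t, \varepsilon_{t-j}) = 0 \quad \forall j \neq 0 \text{ and } t$$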
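We can represent (20.3) as:

$$(1 - \beta L) y_t = \varepsilon_t \tag{20.5}$$

where $L$ is the lag operator, such that $L^m y_t = y_{t-m}$, and $(1 - \beta L)$ is a polynomial (of order 1) in the lag operator. An equivalent representation of (20.5) is:

$$y_t = (1 - \beta L)^{-1}\varepsilon_t = \sum_{j=0}^{\infty} \beta^j \varepsilon_{t-j} \tag{20.6}$$

$y_t$ is therefore an infinite, geometrically weighted sum of current and past values of the error term $\varepsilon_t$. We could also have obtained (20.6) by repeated 'back substitution':

$$y_t = \beta(\beta y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t$$
$$y_t = \beta^2(\beta y_{t-3} + \varepsilon_{t-2}) + \beta\varepsilon_{t-1} + \varepsilon_t$$
$$\vdots$$
$$y_t = \beta^n y_{t-n} + \beta^{n-1}\varepsilon_{t-n+1} + \cdots + \beta\varepsilon_{t-1} + \varepsilon_t \tag{20.7}$$

As long as $|\beta| < 1$, $\beta^n \to 0$ as $n \to \infty$ and the term $\beta^n y_{t-n}$ becomes insignificant. The lag operator is a convenient shorthand and is useful in manipulating lag expressions. For example

$$(1 - \beta L)^{-1}\varepsilon_t = (1 + \beta L + \beta^2 L^2 + \cdots)\varepsilon_t = \varepsilon_t + \beta\varepsilon_{t-1} + \beta^2\varepsilon_{t-2} + \cdots$$

Clearly, $y_t$ in (20.7) depends on the current and all past values of $\varepsilon_t$. However, $y_t$ is uncorrelated with future values $\varepsilon_{t+j}$ ($j \geq 1$).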
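Here is a quick numerical check of the back substitution in the text. Generating an AR(1) recursively, and generating it as $\beta^{t}y_0$ plus a geometrically weighted sum of the errors, gives the same path. The simulated shocks, the starting value and $\beta = 0.8$ are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np


def ar1_recursive(beta, eps, y0=0.0):
    """y_t = beta * y_{t-1} + eps_t, starting from y_0."""
    y = np.empty(len(eps))
    prev = y0
    for t, e in enumerate(eps):
        prev = beta * prev + e
        y[t] = prev
    return y


def ar1_back_substituted(beta, eps, y0=0.0):
    """Same path from the back-substituted form:
    y_t = beta^(t+1) * y0 + sum_{j=0}^{t} beta^j * eps_{t-j} (t indexes eps from 0)."""
    y = np.empty(len(eps))
    for t in range(len(eps)):
        weights = beta ** np.arange(t + 1)                   # 1, beta, beta^2, ...
        y[t] = beta ** (t + 1) * y0 + weights @ eps[t::-1]   # eps_t, eps_{t-1}, ..., eps_0
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(200)
    a = ar1_recursive(0.8, eps, y0=5.0)
    b = ar1_back_substituted(0.8, eps, y0=5.0)
    print(np.max(np.abs(a - b)))   # tiny: the two forms agree
    print(0.8 ** 200 * 5.0)        # weight left on the starting value is negligible
```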

If $|\beta| < 1$ then $y_t$ is a stationary series. Broadly speaking, a stationary series has a constant mean and variance, and the correlation between $y_t$ and $y_{t-j}$ depends only on the time difference $j$. Thus the mean, variance and (auto-)correlation of any stationary series are independent of time. A stationary series tends to return often to its mean value, and its variability does not alter as we move through time (Figure 20.1).
The stationarity condition $|\beta| < 1$ can be seen intuitively by starting from an initial value $y_0$. Subsequent values of $y$ in periods 1, 2, 3, ... are $\beta y_0 + \varepsilon_1$, $\beta^2 y_0 + (\beta\varepsilon_1 + \varepsilon_2)$, $\beta^3 y_0 + (\beta^2\varepsilon_1 + \beta\varepsilon_2 + \varepsilon_3)$, etc. Each value is a deterministic component $\beta^n y_0$ plus a weighted sum of the random error terms $\varepsilon_1, \varepsilon_2, \ldots$. If $|\beta| < 1$, the deterministic part $\beta^n y_0$ approaches zero. The weighted sum of the $\varepsilon_i$'s is also finite, and the $\varepsilon_i$'s tend to cancel out because they are random with mean zero. Hence $y_n$ remains finite.

Figure 20.1 Stationary Series.

More formally, a process is 'weakly' or 'covariance' stationary if:

$$Ey_t = \mu \tag{20.9a}$$
$$\operatorname{var}(y_t) = \sigma_y^2 \tag{20.9b}$$
$$\operatorname{cov}(y_t, y_{t-j}) = \gamma_j \tag{20.9c}$$

where all the RHS population 'moments' are independent of time $t$ and have finite values. If, in addition, $y_t$ is normally distributed, then the process represented by (20.9a)-(20.9c) is strongly stationary. However, the distinction between weak and strong stationarity is not important in what follows, so 'stationarity' is used to mean weak (covariance) stationarity. A white noise error term $\varepsilon_t$ is a very specific type of stationary series, whose mean and autocovariances at all non-zero lags are zero.
All the usual hypothesis testing procedures in statistics assume that the variables used in constructing the tests are stationary. For a non-stationary series, the distribution of 'conventional' test statistics may not be well behaved. Testing hypotheses on non-stationary series generally requires substantial changes to the 'conventional' tests (e.g. special tables of critical values). To illustrate a non-stationary series, consider:

$$y_t = \alpha + \beta y_{t-1} + \varepsilon_t \tag{20.10}$$

where $\beta = 1$ and $\varepsilon_t$ is white noise. This is known as a random walk with drift, where the drift parameter is $\alpha$. The model can be written as:

$$\Delta y_t = \alpha + \varepsilon_t \tag{20.11}$$

The growth in $y_t$ (assuming $y_t$ is in natural logarithms) is a constant ($= \alpha$) plus a white noise error. A realisation of (20.10) is shown in Figure 20.2. Clearly, the mean of $y_t$ increases over time, so the level of $y_t$ is non-stationary.
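If $\alpha = 0$ and $\beta = 1$ we have a random walk without drift. Its realisation is also shown in Figure 20.2. This $y_t$ is non-stationary because its (unconditional) variance grows as $n \to \infty$ and so is not independent of time. To see this, start the series $n$ periods ago with the initial value set to zero:

$$y_t = (1 - L)^{-1}\varepsilon_t = \varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2} + \cdots + \varepsilon_{t-n+1}$$
$$Ey_t = 0$$
$$\operatorname{var}(y_t) = n\sigma^2$$

Hence $\operatorname{var}(y_t) \to \infty$ as $n \to \infty$.

Figure 20.2 Non-Stationary Series. Random Walk with Drift (solid line), Random Walk without Drift (dashed line).

Figure 20.3 MA(1) Series ($y_t = \varepsilon_t - 0.5\varepsilon_{t-1}$).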
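The growth of the random walk's variance with $n$ can be illustrated by simulation. The sketch generates many independent random walks and computes the cross-sectional variance at each date. It assumes Gaussian shocks with $\sigma = 1$ and a zero starting value; the sample sizes and the function name are arbitrary choices.

```python
import numpy as np


def simulate_random_walks(n_paths, n_periods, alpha=0.0, sigma=1.0, seed=0):
    """Each row is one path of y_t = alpha + y_{t-1} + eps_t with y_0 = 0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(n_paths, n_periods))
    return np.cumsum(alpha + eps, axis=1)


if __name__ == "__main__":
    paths = simulate_random_walks(20_000, 100)
    var_by_date = paths.var(axis=0)
    for n in (1, 10, 50, 100):
        print(n, round(var_by_date[n - 1], 2))   # close to n * sigma^2 = n
```

With a non-zero `alpha` the cross-sectional mean also rises as $\alpha t$. This is the deterministic drift component, and it sits on top of the growing variance.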

Moving Average Process
Another simple time series representation of $y_t$ is the moving average process of order 1, MA(1):

$$y_t = \varepsilon_t + \lambda\varepsilon_{t-1} = (1 + \lambda L)\varepsilon_t \tag{20.13}$$

A realisation is shown in Figure 20.3. An equivalent representation is

$$(1 + \lambda L)^{-1} y_t = \varepsilon_t \tag{20.14}$$

which shows that an MA(1) process with $|\lambda| < 1$ may be represented as an infinite autoregressive process. For $|\lambda| < 1$ the MA process is said to be 'invertible'. Similarly, comparing (20.3) and (20.6), an AR(1) process may be represented as an infinite moving average process (for $|\beta| < 1$).
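Invertibility can be seen numerically. The recursion $\varepsilon_t = y_t - \lambda\varepsilon_{t-1}$ is a truncated form of the infinite autoregression in (20.14). It recovers $\varepsilon_t$ from the observed $y_t$ even from a wrong starting value when $|\lambda| < 1$, because the initial error is multiplied by $-\lambda$ each period. When $|\lambda| > 1$ the same error explodes. The function names and parameter values are illustrative.

```python
import numpy as np


def ma1(lam, eps):
    """y_t = eps_t + lam * eps_{t-1}, taking eps_{-1} = 0."""
    eps = np.asarray(eps, dtype=float)
    y = eps.copy()
    y[1:] += lam * eps[:-1]
    return y


def invert_ma1(lam, y, eps_init=0.0):
    """Recover eps_t = y_t - lam * eps_{t-1}, a truncated form of (1 + lam L)^(-1) y_t."""
    eps = np.empty(len(y))
    prev = eps_init
    for t, y_t in enumerate(y):
        prev = y_t - lam * prev
        eps[t] = prev
    return eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_eps = rng.standard_normal(300)
    for lam in (-0.5, 2.0):                              # invertible vs non-invertible
        y = ma1(lam, true_eps)
        recovered = invert_ma1(lam, y, eps_init=1.0)     # deliberately wrong start
        print(lam, abs(recovered[-1] - true_eps[-1]))
```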

ARIMA Process
If a series is non-stationary, differencing often produces a stationary series. For example, if the level of $y_t$ is a random walk with drift, then $\Delta y_t$ is stationary (see equation (20.11)). Any stationary stochastic time series $y_t$ can be approximated by an autoregressive moving average process of order $(p, q)$, ARMA$(p, q)$:

$$\phi(L) y_t = \theta(L)\varepsilon_t$$

where $\phi(L)$ and $\theta(L)$ are polynomials in the lag operator:

$$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3 - \cdots - \phi_p L^p$$
$$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$

The stationarity condition is that the roots of $\phi(L)$ lie outside the unit circle (i.e. all the roots of $\phi(L)$ are greater than one in absolute value). A similar condition on $\theta(L)$ ensures invertibility, i.e. that the MA$(q)$ part can be written as an infinite autoregression on $y$ (see equation (20.14)).
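The root conditions above can be checked with `numpy.roots`. That function expects polynomial coefficients ordered from the highest power down, so the sketch reverses the ascending lag-polynomial coefficients first. The helper names and example coefficients are illustrative.

```python
import numpy as np


def roots_ascending(coeffs):
    """Roots of c_0 + c_1 z + ... + c_p z^p (coefficients given in ascending powers)."""
    return np.roots(np.asarray(coeffs, dtype=float)[::-1])   # np.roots expects highest power first


def is_stationary(phi):
    """AR part phi(z) = 1 - phi_1 z - ... - phi_p z^p: all roots must lie outside the unit circle."""
    roots = roots_ascending(np.r_[1.0, -np.asarray(phi, dtype=float)])
    return bool(np.all(np.abs(roots) > 1.0))


def is_invertible(theta):
    """MA part theta(z) = 1 + theta_1 z + ... + theta_q z^q: same condition on its roots."""
    roots = roots_ascending(np.r_[1.0, np.asarray(theta, dtype=float)])
    return bool(np.all(np.abs(roots) > 1.0))


if __name__ == "__main__":
    print(is_stationary([0.5]), is_stationary([1.0]))            # AR(1) with beta = 0.5 vs random walk
    print(is_stationary([0.5, 0.3]), is_stationary([0.5, 0.6]))  # two AR(2) examples
    print(is_invertible([-0.5]), is_invertible([2.0]))           # two MA(1) examples
```

Each printed pair is `True False`. For the random walk, the root lies exactly on the unit circle ($z = 1$), which is the 'unit root' case.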
Some series need differencing $d$ times to yield a stationary series; for most economic time series $d = 1$ or $2$ is sufficient. In that case $\Delta^d y_t$ can be modelled as an ARMA$(p, q)$ process, or equivalently the level of $y_t$ is an ARIMA$(p, d, q)$ model. For example, […] (1966) demonstrates that many economic time series (e.g. GDP and consumption) may be adequately represented by a purely statistical ARIMA(0,1,1) model:

$$\Delta y_t = (1 + \theta_1 L)\varepsilon_t$$

Autocorrelation Function: Correlogram
The properties of univariate time series models may be summarised by the (population) autocorrelation function and its sample analogue, the correlogram. The (population) autocorrelation between $y_t$ and $y_{t-\tau}$ ($\tau = 0, \pm 1, \pm 2, \ldots$) is

$$\rho_\tau = \frac{\gamma_\tau}{\gamma_0}$$

where $\gamma_\tau$ is the autocovariance function at lag $\tau$:

$$\gamma_\tau = \operatorname{cov}(y_t, y_{t-\tau})$$
$$\gamma_0 = \operatorname{var}(y_t)$$

By definition $\rho_0 = 1$. There is a value of $\rho_\tau$ for each lag $\tau$, and the autocorrelations are dimensionless. The sign of $\rho_\tau$ indicates whether $y_t$ and $y_{t-\tau}$ are positively or negatively correlated. A plot of the sample autocorrelations $\hat{\rho}_\tau$ against $\tau$ is called the correlogram.

The correlogram helps to classify the type of ARIMA model that might best characterise the data. A correlogram that approaches zero as $\tau$ increases indicates a stationary series. A 'flat' correlogram indicates that the series must be differenced at least once to become stationary. For example, if $y_t$ is a random walk with drift (equation (20.10)), then $y_t$ and its first difference $\Delta y_t$ have correlograms like those of Figure 20.4. The stationary AR(1) model with $0 < \beta < 1$ (equation (20.3)) and the MA(1) model (equation (20.13)) have the distinctive correlograms shown in Figure 20.5. A researcher wishing to model $y_t$ as a univariate time series could therefore immediately identify from these correlograms which type of model to estimate. When $y_t$ is generated by a more complex ARMA$(p, q)$ process, the shape of the correlogram is only indicative of the type of ARMA model to estimate. Using the shape of the correlogram to guide the choice of univariate 'time series' model is the basis of Box-Jenkins or ARIMA modelling.
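A sample correlogram can be computed directly from the definition $\hat{\rho}_\tau = \hat{\gamma}_\tau/\hat{\gamma}_0$. The sketch uses the common convention of dividing each sample autocovariance by the full sample size $n$. It then applies the function to simulated AR(1), MA(1) and random-walk-with-drift series; the parameter values and helper names are illustrative assumptions.

```python
import numpy as np


def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_tau = gamma_hat_tau / gamma_hat_0, tau = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    n = len(y)
    gamma0 = d @ d / n
    return np.array([(d[tau:] @ d[:n - tau]) / n / gamma0 for tau in range(max_lag + 1)])


def simulate_ar1(beta, n, seed=0):
    """y_t = beta * y_{t-1} + eps_t with standard normal shocks."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = eps[0]
    for t in range(1, n):
        y[t] = beta * y[t - 1] + eps[t]
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    e = rng.standard_normal(20_000)
    print(np.round(sample_acf(simulate_ar1(0.5, 20_000), 4), 2))   # roughly 1, 0.5, 0.25, 0.125, ...
    print(np.round(sample_acf(e[1:] - 0.5 * e[:-1], 4), 2))        # MA(1): about -0.4 at lag 1, then near 0
    print(np.round(sample_acf(np.cumsum(0.1 + e), 4), 2))          # random walk with drift: stays near 1
```

For an MA(1) with coefficient $\lambda$, the lag-1 autocorrelation is $\lambda/(1+\lambda^2)$, which equals $-0.4$ for $\lambda = -0.5$. All higher-lag autocorrelations are zero.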

Forecasts and the Variance of Forecast Errors
This section uses a simple stationary AR(1) model to show how unconditional and conditional forecasts, and their forecast error variances, are related:

$$y_t = \alpha + \beta y_{t-1} + \varepsilon_t, \qquad |\beta| < 1 \tag{20.19}$$

Figure 20.4 Correlogram for Random Walk with Drift ($y_t$) and for $\Delta y_t$.

Figure 20.5 Correlogram for AR(1) ($y_t = 0.5y_{t-1} + \varepsilon_t$) and MA(1).

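where $\varepsilon_t$ is $N(0, \sigma^2)$ and is uncorrelated with $y_{t-1}$. For $|\beta| < 1$ we have, by back substitution:

$$y_t = \alpha(1 + \beta + \beta^2 + \cdots + \beta^{m-1}) + \beta^m y_{t-m} + \beta^{m-1}\varepsilon_{t-m+1} + \cdots + \beta\varepsilon_{t-1} + \varepsilon_t$$
$$y_t = \frac{\alpha}{1 - \beta} + \varepsilon_t + \beta\varepsilon_{t-1} + \beta^2\varepsilon_{t-2} + \cdots \tag{20.20}$$

where the second line uses the fact that, for $|\beta| < 1$, the term $\beta^m y_{t-m}$ approaches zero as $m \to \infty$. At time $t$ the unconditional mean of $y_t$ from (20.20) is:

$$Ey_t = \mu \tag{20.21}$$

where $\mu = \alpha/(1 - \beta)$ and we have used $E\varepsilon_{t-j} = 0$ ($j \geq 0$). The unconditional (population) mean of $y_t$ can be interpreted as the best forecast of $y_t$ when we have no information about previous values of $y$. Alternatively, it is the 'average value' around which $y_t$ fluctuates in the population. Now let us calculate the unconditional variance of $y_t$. From (20.20) and (20.21) we can easily see that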

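$$(y_t - \mu) = \beta(y_{t-1} - \mu) + \varepsilon_t$$

Squaring and taking expectations:

$$\operatorname{var}(y_t) = \beta^2 \operatorname{var}(y_{t-1}) + \sigma^2 + 2\beta E[(y_{t-1} - \mu)\varepsilon_t]$$

For a stationary series $\operatorname{var}(y_t) = \operatorname{var}(y_{t-1})$, and the last term is zero because $\varepsilon_t$ is uncorrelated with $y_{t-1}$. Hence the unconditional variance of $y_t$ is:

$$\operatorname{var}(y_t) = \frac{\sigma^2}{1 - \beta^2} \tag{20.25}$$

It is also useful to derive this variance from (20.20):

$$\operatorname{var}(y_t) = \sigma^2(1 + \beta^2 + \beta^4 + \cdots) = \frac{\sigma^2}{1 - \beta^2}$$

The unconditional variance is the 'best guess' of the variance without any knowledge of recent past values of $y$. In the non-stationary case $\beta = 1$, and the unconditional mean and variance are infinite (undefined). This is clear from equations (20.21) and (20.25). It is the mathematical equivalent of the statement that 'a random walk series could be anywhere'.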
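The two unconditional moments can be computed directly and compared with the sample moments of a long simulated path. This is a sketch under illustrative assumptions ($\alpha = 1$, $\beta = 0.5$, $\sigma^2 = 1$). The guard clause reflects the point that the moments are undefined when $\beta = 1$.

```python
import numpy as np


def ar1_unconditional_moments(alpha, beta, sigma2):
    """Mean alpha/(1 - beta) and variance sigma2/(1 - beta^2) of a stationary AR(1)."""
    if abs(beta) >= 1:
        raise ValueError("unconditional moments are undefined unless |beta| < 1")
    return alpha / (1 - beta), sigma2 / (1 - beta ** 2)


if __name__ == "__main__":
    alpha, beta = 1.0, 0.5                    # illustrative parameters, sigma^2 = 1
    mu, var = ar1_unconditional_moments(alpha, beta, 1.0)
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(200_000)
    y = np.empty(len(eps))
    y[0] = mu                                 # start at the mean to avoid a burn-in period
    for t in range(1, len(y)):
        y[t] = alpha + beta * y[t - 1] + eps[t]
    print(mu, var)                            # 2.0 and 1.333...
    print(y.mean(), y.var())                  # sample moments close to these values
```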

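Conditional Mean and Variance: Stationary Series
In calculating the conditional mean and variance we have to be precise about the information set $\Omega$. Suppose we have information at $t$ or earlier (i.e. on $y_{t-j}$ or $\varepsilon_{t-j}$, $j \geq 0$). The conditional mean of $y_{t+m}$ is then denoted $E(y_{t+m} \mid \Omega_t)$, or $E_t y_{t+m}$ for short. Repeated substitution in the AR(1) model gives the conditional mean at different forecast horizons:

$$E_t y_{t+1} = \alpha + \beta y_t$$
$$E_t y_{t+2} = \alpha(1 + \beta) + \beta^2 y_t$$
$$E_t y_{t+m} = \alpha(1 + \beta + \cdots + \beta^{m-1}) + \beta^m y_t$$

The conditional variance, given information at time $t$, for different forecast horizons is defined as

$$\operatorname{var}(y_{t+m} \mid \Omega_t) = E\left[y_{t+m} - E(y_{t+m} \mid \Omega_t)\right]^2$$

By successive substitution:

$$y_{t+3} = \alpha + \beta y_{t+2} + \varepsilon_{t+3} = \alpha(1 + \beta + \beta^2) + \beta^3 y_t + (\beta^2\varepsilon_{t+1} + \beta\varepsilon_{t+2} + \varepsilon_{t+3})$$
$$y_{t+m} = \alpha\sum_{i=0}^{m-1}\beta^i + \beta^m y_t + \sum_{i=0}^{m-1}\beta^i\varepsilon_{t+m-i}$$

Using the white noise properties of $\varepsilon_t$, the conditional variances at various forecast horizons are:

$$\operatorname{var}(y_{t+1} \mid \Omega_t) = \sigma^2$$
$$\operatorname{var}(y_{t+2} \mid \Omega_t) = (1 + \beta^2)\sigma^2$$
$$\operatorname{var}(y_{t+3} \mid \Omega_t) = (1 + \beta^2 + \beta^4)\sigma^2$$
$$\operatorname{var}(y_{t+m} \mid \Omega_t) = (1 + \beta^2 + \beta^4 + \cdots + \beta^{2(m-1)})\sigma^2 \tag{20.29}$$

Comparing (20.25) and (20.29), the conditional variance is always less than the unconditional variance at all forecast horizons. As $m \to \infty$, the conditional variance approaches the unconditional variance $\sigma^2/(1 - \beta^2)$.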
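The conditional forecast formulas translate directly into a short function. The sketch below is illustrative; its name and parameter values are assumptions. It uses $\alpha = 1$, $\beta = 0.5$, $\sigma^2 = 1$ and $y_t = 10$, so the forecasts move from the current value towards the unconditional mean 2, and the forecast variance rises towards 4/3. Because the formulas contain no division by $1 - \beta$, the same code also covers the random walk case $\beta = 1$ discussed next.

```python
def ar1_forecast(alpha, beta, sigma2, y_t, m):
    """Conditional mean and variance of y_{t+m} given y_t, for y_t = alpha + beta*y_{t-1} + eps_t.

    mean = alpha*(1 + beta + ... + beta^(m-1)) + beta^m * y_t
    var  = sigma2*(1 + beta^2 + ... + beta^(2(m-1)))
    With beta = 1 (random walk) these reduce to y_t + alpha*m and m*sigma2.
    """
    powers = [beta ** i for i in range(m)]
    mean = alpha * sum(powers) + beta ** m * y_t
    var = sigma2 * sum(p * p for p in powers)
    return mean, var


if __name__ == "__main__":
    for m in (1, 2, 3, 10, 50):
        print(m, ar1_forecast(1.0, 0.5, 1.0, y_t=10.0, m=m))   # approaches (2.0, 1.333...)
    print(ar1_forecast(0.1, 1.0, 1.0, y_t=10.0, m=50))          # random walk: about (15.0, 50.0)
```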
Conditional Mean and Variance: Non-Stationary Series
The above analysis is now repeated with $\beta = 1$. After $m$ periods the expected value of $y_{t+m}$ is

$$E_t y_{t+m} = y_t + \alpha m$$

Since $y_t$ is a fixed starting point (which we can set to zero), the (conditional) expected value of $y_{t+m}$ is a deterministic time trend '$\alpha m$' (compare $y_t = a + bt$). However, the actual behaviour of the RW ($\beta = 1$) with drift ($\alpha \neq 0$) is given by

$$y_{t+m} = y_t + \alpha m + (\varepsilon_{t+1} + \varepsilon_{t+2} + \cdots + \varepsilon_{t+m}) \tag{20.31}$$

This is often referred to as a stochastic trend, because in addition to the deterministic trend element '$\alpha m$' there is a stochastic moving average error. For the random walk without drift ($\alpha = 0$), the best conditional forecast of all future values of $y$ is simply the current value $y_t$.
Unlike the stationary case, the influence of past errors on $y_{t+m}$ does not die away as $m \to \infty$, since the $\varepsilon_{t+i}$ terms are not 'weighted' by $\beta < 1$ (see (20.31)). Setting $\beta = 1$ in (20.29) gives the conditional variance of the forecasts:

$$\operatorname{var}(y_{t+m} \mid \Omega_t) = m\sigma^2$$

As the forecast horizon $m$ increases, the variance increases without limit. The conditional variance of a random walk series is explosive.

Deterministic and Stochastic Trends
A deterministic trend is given by

$$y_t = \alpha + \beta t + \varepsilon_t$$

where $t$ takes the values 1, 2, 3, etc. The dependent variable $y_t$ in this equation is non-stationary, since its mean rises continuously over time. The mean of $y_t$ is $\alpha + \beta t$, which is independent of other economic variables, even when forecasting for the distant future. The conditional forecast error variance is $\sigma^2$ and does not increase with the forecast horizon. The stochastic trend of the RW with drift may be written

$$y_t = y_0 + \alpha t + (\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1)$$

Hence the stochastic trend has an infinite memory: the initial historic value of the series ($y_0$) influences all future values of $y$. Deterministic and stochastic trends have very different implications for forecasting, and test statistics behave very differently on the two types of series. We therefore need statistical tests to discriminate between the two hypotheses. Appropriate 'detrending' of a deterministic trend involves a regression of $y_t$ on 'time'. For a stochastic trend (i.e. a random walk), 'detrending' involves taking first differences of the series (i.e. using $\Delta y_t$). If a random walk is regressed on time, the trend coefficient often appears statistically significant even when there is no deterministic time trend present (this is often referred to as the spurious regression problem). Also, suppose a random walk series is detrended using a deterministic trend, so that the new detrended series is $y_t - \hat{\beta} t$. Its autocorrelation function will then spuriously indicate positive correlation at low lags and a cyclical pattern at high lags.
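The contrast between the two kinds of detrending can be illustrated with a simulated random walk with drift. Regressing on time leaves residuals that remain highly autocorrelated, while first differencing recovers $\alpha + \varepsilon_t$. The drift value, sample size and helper names are illustrative assumptions, and the regression uses `numpy.linalg.lstsq`.

```python
import numpy as np


def detrend_ols(y):
    """Residuals from an OLS regression of y on a constant and a linear time trend t = 1, 2, ..."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def first_difference(y):
    """Delta y_t = y_t - y_{t-1}."""
    return np.diff(np.asarray(y, dtype=float))


def lag1_corr(x):
    """Correlation between x_t and x_{t-1}."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(1_000)
    rw = np.cumsum(0.2 + eps)                   # random walk with drift 0.2 (stochastic trend)
    print(lag1_corr(detrend_ols(rw)))           # high: time-detrending leaves a non-stationary residual
    print(lag1_corr(first_difference(rw)))      # near 0: differencing recovers 0.2 + eps
```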

It may be useful to summarise briefly the main points dealt with so far:

(i) A stationary series is one whose mean, variance and autocovariances in the population (of data) are finite and independent of time. The population mean and variance are constant. The covariance between $y_t$ and $y_{t-m}$ depends only on the lag length $m$ and is constant for any given lag length. Since the population moments (i.e. mean, variance and covariance) are constant, they can usually be estimated consistently by their sample analogues.
(ii) A graph of a stationary series (for the population of data) has no discernible upward or downward trend. The series frequently crosses its mean value, and its variability around the mean is, on average, a finite constant.
(iii) A non-stationary series is one whose population mean, variance or autocovariances vary over time. In some special cases the (unconditional) mean, variance (or even covariance) may approach plus or minus infinity (i.e. be undefined).
(iv) The simplest form of non-stationary series is the random walk with drift (with a Gaussian error):

$$y_t = \alpha + \beta y_{t-1} + \varepsilon_t \quad \text{where } \beta = 1$$
$$(1 - \beta L) y_t = \alpha + \varepsilon_t$$

The non-stationary series $y_t$ is said to have a unit root in the lag polynomial, since $(1 - \beta L)$ has $\beta = 1$. If $\alpha = 0$, the best conditional forecast of $y_{t+m}$ based on information at time $t$, for all horizons $m$, is simply the current value $y_t$. For $\alpha \neq 0$, the best conditional forecast of $y_{t+m}$ is $y_t + \alpha m$.
(v) Stochastic trends and deterministic trends have different time series properties and must be modelled differently.
(vi) It may be shown that any stationary stochastic series $y_t$ may be represented as an infinite moving average of white noise errors (plus a deterministic component, which we can ignore). This is Wold's decomposition theorem. If the moving average polynomial is invertible, $y_t$ may also be represented by an infinite autoregression. A finite lag ARMA$(p, q)$ process may provide a parsimonious approximation to an infinite lag AR or MA process.
(vii) The unconditional mean of a stationary series may be viewed as the long-run value to which the series settles down. The unconditional variance measures the dispersion of the series around this long-run mean.
This section generalises the results for the univariate time series models discussed above to a multivariate framework. In particular, it shows how a multivariate system can be reduced to a univariate system. Since the real world is a multivariate system, we then discuss how to choose an appropriate multivariate system for practical work. We also take a brief look at the relationship between a structural economic model and a multivariate time series representation of the data. Finally, we examine a multivariate system in which some variables may be cointegrated.
It is convenient at this point to summarise the results of the univariate case. Wold's decomposition theorem states that any stationary stochastic series $y_t$ may be represented as a univariate infinite moving average of white noise errors, plus a deterministic component, which we ignore throughout for simplicity of exposition:

$$y_t = \theta(L)\varepsilon_t \tag{20.33}$$

where $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots$. If $\theta(L)$ is invertible, then from (20.33) $y_t$ may also be represented as an infinite univariate autoregression:

$$\theta^{-1}(L) y_t = \varepsilon_t \tag{20.34}$$

A stationary series may also be represented as an ARMA$(p, q)$ model:

$$\phi(L) y_t = \theta(L)\varepsilon_t \tag{20.35}$$

As $y_t$ is stationary, the roots of $\phi(L)$ lie outside the unit circle, and (20.35) can be transformed into an infinite MA model:

$$y_t = \phi^{-1}(L)\theta(L)\varepsilon_t \tag{20.36}$$

If $\theta(L)$ is invertible, (20.35) may instead be transformed into an infinite lag AR model plus a white noise error:

$$\theta^{-1}(L)\phi(L) y_t = \varepsilon_t \tag{20.37}$$

Hence any stationary univariate series $y_t$ has a number of equivalent representations. Using a matrix formulation, we will see that these alternative representations also apply to a multivariate system.
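The conversion of an ARMA$(p, q)$ model into an infinite moving average can be computed by matching coefficients in $\phi(L)\psi(L) = \theta(L)$. This gives $\psi_0 = 1$ and $\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i\psi_{j-i}$, with $\theta_j = 0$ for $j > q$. The sketch below is one straightforward way to implement this recursion; the function name is hypothetical.

```python
def arma_to_ma(phi, theta, n_weights):
    """First n_weights coefficients psi_j of y_t = psi(L) eps_t, where psi(L) = theta(L)/phi(L),
    phi(L) = 1 - phi_1 L - ... - phi_p L^p and theta(L) = 1 + theta_1 L + ... + theta_q L^q."""
    psi = []
    for j in range(n_weights):
        # theta_0 = 1, and theta_j = 0 beyond the MA order q
        value = 1.0 if j == 0 else (theta[j - 1] if j <= len(theta) else 0.0)
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi


if __name__ == "__main__":
    print(arma_to_ma([0.5], [], 5))      # AR(1): 1, 0.5, 0.25, 0.125, 0.0625
    print(arma_to_ma([], [0.4], 4))      # MA(1): 1, 0.4, 0, 0
    print(arma_to_ma([0.5], [0.3], 5))   # ARMA(1,1): approximately 1, 0.8, 0.4, 0.2, 0.1
```

For a stationary model the $\psi_j$ die away geometrically, which is why a truncated MA is a usable approximation.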
consider only three for illustrative purposes) the vector autoregressive mov
model (VARMA) is

- -) Hence ea
where $ 1 1 , 4 2 2 , 4 3 3 are of the form $11 = (1 - &)L - &)L2 -
is of the form

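The above equations can be represented in matrix form as:

$$ \Phi(L)\,Y_t = \Theta(L)\,\varepsilon_t \qquad (20.39) $$

where $Y_t = (y_{1t}, y_{2t}, y_{3t})'$, $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t})'$, and $\Phi(L)$ and $\Theta(L)$ are $(3 \times 3)$ matrices of lag polynomials which depend on the $\phi_{ij}$ and $\theta_{ij}$ parameters, respectively.
The VARMA equation system (20.39) is consistent with Wold's decomposition theorem if all the roots of $|\Phi(L)| = 0$ lie outside the unit circle, since (20.39) implies:

$$ Y_t = \Phi^{-1}(L)\,\Theta(L)\,\varepsilon_t $$

Hence each $y_{it}$ can be represented as an infinite moving average of current and past white noise errors $\varepsilon_t$. Since any linear combination of $(\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t})$ can also be represented in terms of an MA of a single error, say $v_{it}$, each $y_{it}$ can be written:

$$ y_{it} = \psi_i(L)\, v_{it}, \qquad i = 1, 2, 3 $$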
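It is also straightforward to see that if $\Theta(L)$ is invertible then:

$$ \Theta^{-1}(L)\,\Phi(L)\,Y_t = \varepsilon_t $$

and hence any set of variables $y_{it}$ $(i = 1, 2, 3)$ may be represented as an infinite vector autoregression plus a linear combination of white noise errors $\varepsilon_{it}$ $(i = 1, 2, 3)$. Each $y_{it}$, taking $y_{1t}$ as an example, is of the form

$$ y_{1t} = \sum_{j=1}^{\infty} a_{11,j}\,y_{1,t-j} + \sum_{j=1}^{\infty} a_{12,j}\,y_{2,t-j} + \sum_{j=1}^{\infty} a_{13,j}\,y_{3,t-j} + v_{1t} $$

where $v_{1t}$ is a linear combination of white noise errors at time $t$ and hence is itself white noise. The above representation is known as a vector autoregression (VAR) and in matrix notation is:

$$ Y_t = A(L)\,Y_{t-1} + v_t $$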
Can we Reduce the Size of the VARMA System?
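It can now be demonstrated how the size of a VARMA system can be reduced. To simplify the algebra, consider reducing a simple $2 \times 2$ system to a univariate equation. We begin with a VARMA $(2 \times 2)$ model:

$$ y_{1t} = \phi_{11}\,y_{1,t-1} + \phi_{12}\,y_{2,t-1} + \theta_1(L)\,\varepsilon_{1t} \qquad (20.44) $$

$$ y_{2t} = \phi_{21}\,y_{1,t-1} + \phi_{22}\,y_{2,t-1} + \theta_2(L)\,\varepsilon_{2t} \qquad (20.45) $$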

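We will derive only the univariate equation for $y_{1t}$. From (20.45) we can obtain $y_{2t}$ as a function of $y_{1,t-j}$ (and the error terms):

$$ y_{2t} = (1 - \phi_{22}L)^{-1}\left[\phi_{21}\,y_{1,t-1} + \theta_2(L)\,\varepsilon_{2t}\right] = f(y_{1,t-j}, \varepsilon_{2,t-j}) \qquad (20.46) $$

Substituting for $y_{2,t-1}$ from (20.46) in (20.44) and multiplying through by $(1 - \phi_{22}L)$ we have

$$ \left[(1 - \phi_{11}L)(1 - \phi_{22}L) - \phi_{12}\phi_{21}L^2\right] y_{1t} = (1 - \phi_{22}L)\,\theta_1(L)\,\varepsilon_{1t} + \phi_{12}L\,\theta_2(L)\,\varepsilon_{2t} \qquad (20.47) $$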
Equation (20.47) is a univariate ARMA model for $y_{1t}$. As we have seen, this model can be further reduced to an infinite autoregression or moving average (provided stationarity and invertibility apply). The results of this section can be summarised as follows. A set of $k$ stationary stochastic variables $y_{1t}, y_{2t}, \ldots, y_{kt}$ can be represented as:

(i) a $(k \times k)$ VARMA system,
(ii) a smaller $(k-1) \times (k-1)$ VARMA system,
(iii) an infinite vector autoregressive (VAR) series with white noise errors, $B(L)Y_t = v_t$, so that each $y_{it}$ depends only on lags of itself, $y_{i,t-j}$, and lags of all the other variables, $y_{1,t-j}, \ldots, y_{k,t-j}$ (and a white noise error).
(iv) The set of $k$ variables can be reduced to a set of univariate ARMA equations, one for each of the $y_{it}$:
$$ \phi_i(L)\,y_{it} = \theta_i(L)\,\varepsilon_{it} \qquad (i = 1, 2, \ldots, k) $$
(v) The univariate ARMA representation can be transformed into an infinite moving average representation (Wold's decomposition theorem) or an infinite univariate autoregression (assuming invertibility and stationarity).
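As a numerical check on (20.47), the sketch below simulates (20.44) and (20.45) with white-noise errors, that is $\theta_1(L) = \theta_2(L) = 1$ (a simplifying assumption), and verifies that $y_{1t}$ obeys the implied univariate ARMA(2,1): $y_{1t} = (\phi_{11}+\phi_{22})\,y_{1,t-1} + (\phi_{12}\phi_{21} - \phi_{11}\phi_{22})\,y_{1,t-2} + \varepsilon_{1t} - \phi_{22}\varepsilon_{1,t-1} + \phi_{12}\varepsilon_{2,t-1}$. Parameter values, seeds and function names are hypothetical.

```python
import random


def implied_ar_for_y1(phi11, phi12, phi21, phi22):
    """AR coefficients (a1, a2) of the univariate ARMA(2,1) for y1 implied by
    y1_t = phi11*y1_{t-1} + phi12*y2_{t-1} + e1_t
    y2_t = phi21*y1_{t-1} + phi22*y2_{t-1} + e2_t  (white-noise errors assumed)."""
    return phi11 + phi22, phi12 * phi21 - phi11 * phi22


def max_reduction_gap(phi11, phi12, phi21, phi22, n=300, seed=1):
    """Simulate the VAR and return the largest absolute gap between the AR part
    y1_t - a1*y1_{t-1} - a2*y1_{t-2} and the MA(1) error
    e1_t - phi22*e1_{t-1} + phi12*e2_{t-1}; it should be zero up to rounding."""
    rng = random.Random(seed)
    e1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1, y2 = [0.0] * n, [0.0] * n
    for t in range(1, n):
        y1[t] = phi11 * y1[t - 1] + phi12 * y2[t - 1] + e1[t]
        y2[t] = phi21 * y1[t - 1] + phi22 * y2[t - 1] + e2[t]
    a1, a2 = implied_ar_for_y1(phi11, phi12, phi21, phi22)
    gap = 0.0
    for t in range(2, n):
        ar_part = y1[t] - a1 * y1[t - 1] - a2 * y1[t - 2]
        ma_part = e1[t] - phi22 * e1[t - 1] + phi12 * e2[t - 1]
        gap = max(gap, abs(ar_part - ma_part))
    return gap


print(implied_ar_for_y1(0.5, 0.2, 0.3, 0.4), max_reduction_gap(0.5, 0.2, 0.3, 0.4))
```

The AR polynomial is simply the determinant of $I - \Phi L$ for the $2 \times 2$ system, which is why the univariate equation inherits every parameter of (20.45).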

What are the Advantages and Disadvantages of Alternative Time Series
All of the time series representations we have discussed assume that any series can be represented as a linear function of its own lags, lags of other variables and error terms. If the world is 'non-linear' then clearly the linear form can at best be an approximation to the true non-linear system. Some non-linear time series models have been developed. However, this approach has not as yet featured greatly in the economic analysis of asset prices and will not be discussed further.

Temporal Stability
A key issue in using time series models is temporal stability in the parameters. There is a danger when investigating a VARMA or VAR system that it is 'too small' to capture the 'true' constant parameters out there in the real world. Suppose, for the sake of argument, that the VARMA equation (20.44) for $y_{1t}$ has stable parameters when considered as a function of lagged values of $y_{1t}$ and $y_{2t}$. However, suppose that equation (20.45) for $y_{2t}$ has some unstable parameters (even one will do). When we substitute for $y_{2t}$ from (20.45) into (20.44) and reduce the system to a univariate form (either AR or MA), the resulting equation for $y_{1t}$ has 'new' parameters that incorporate the parameters of equation (20.45) for $y_{2t}$. Hence if the latter are unstable, the parameters of all 'smaller systems' (e.g. equation (20.47)) will also be unstable.
To give a concrete example of the above, suppose $y_{1t}$ and $y_{2t}$ are prices and interest rates, respectively. Without loss of generality, let us assume the errors in (20.44) and (20.45) are white noise, that is, they consist of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ only. We interpret equation (20.45), for interest rates, as the monetary authorities' reaction function. If the authorities have a feedback rule whereby an increase in the price level causes them to raise interest rates over successive time periods, then we have:

$$ y_{2t} = \phi_{21}\,y_{1,t-1} + \phi_{22}\,y_{2,t-1} + \varepsilon_{2t} \qquad (20.48) $$

with $\phi_{21}, \phi_{22} > 0$. Suppose that after some time the monetary authorities abandon the feedback rule for the interest rate and simply try to lower interest rates gradually. Equation (20.48) now becomes:

$$ y_{2t} = \phi_{22}\,y_{2,t-1} + \varepsilon_{2t} \qquad (20.49) $$

with $0 < \phi_{22} < 1$. Hence the parameters of the equation for $y_{2t}$, the interest rate, vary across the two monetary regimes. A researcher who estimates a univariate equation for either $y_{1t}$ or $y_{2t}$ over the whole data set would find that the parameters are unstable. If the researcher ignores this temporal instability, the parameter estimates will be biased and will not reveal the true constant parameters of the system in the two distinct regimes. Any tests based on the 'smaller system' (e.g. t tests on the parameters, forecast tests) will be incorrect. On the other hand, equation (20.44) for $y_{1t}$ has stable parameters (although that for $y_{2t}$ does not). The danger in representing the 'real world' by time series models which involve only a small number of variables is that the 'omitted variables' which have been 'substituted out' to yield the smaller system of equations may have non-constant parameters, in which case the smaller system will also have non-constant parameters.
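The regime-change argument can be illustrated numerically with the reduction formula for a bivariate VAR(1). In the hypothetical sketch below, the price equation (20.44) keeps the same parameters throughout, while the interest-rate rule switches from (20.48) to (20.49), so $\phi_{21}$ falls to zero; the change in $\phi_{22}$ between regimes is an illustrative assumption. The implied univariate AR coefficients for $y_{1t}$ shift even though its own equation is stable.

```python
def implied_ar_for_y1(phi11, phi12, phi21, phi22):
    """Univariate AR(2) coefficients (a1, a2) for y1 implied by the VAR(1)
    y1_t = phi11*y1_{t-1} + phi12*y2_{t-1} + e1_t,
    y2_t = phi21*y1_{t-1} + phi22*y2_{t-1} + e2_t."""
    return phi11 + phi22, phi12 * phi21 - phi11 * phi22


# Hypothetical price equation (20.44), identical in both regimes
PHI11, PHI12 = 0.6, 0.1

# Regime 1: feedback rule (20.48) with phi21, phi22 > 0
regime_1 = implied_ar_for_y1(PHI11, PHI12, phi21=0.4, phi22=0.5)
# Regime 2: rule (20.49) with no feedback from prices (phi21 = 0);
# the new value of phi22 is an illustrative assumption
regime_2 = implied_ar_for_y1(PHI11, PHI12, phi21=0.0, phi22=0.8)

print("regime 1 (a1, a2):", regime_1)
print("regime 2 (a1, a2):", regime_2)
```

A univariate model for $y_{1t}$ fitted across both regimes would therefore be averaging two different sets of coefficients.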
To represent a series as either an infinite VAR or AR model, or an infinite MA model, is impossible in a finite data set. One approximates such models by one of a finite order. There are diagnostic tests (e.g. tests for serial correlation in the
Structural and Statistical Models Revisited
This section draws some comparisons between a structural economic model derived from economic theory (or theories) and the purely time series representations discussed above. As we shall see, a structural economic model can be reduced to a purely time series representation. The transformation to a time series representation implies restrictions on the parameters, but a pure time series modeller would ignore these and merely estimate an unrestricted time series representation.
A macroeconomic model usually consists of a set of equations that may represent the behaviour of economic agents, identities, technical relationships or equilibrium conditions. Shown below is a stylised model with three endogenous variables:

$y_{1t}$ = rate of wage inflation (percent per annum)
$y_{2t}$ = rate of domestic price inflation in the UK (percent per annum)
$y_{3t}$ = percentage deviation of output from its long-run trend (i.e. from its 'natural rate', the level consistent with the non-accelerating inflation rate of unemployment (NAIRU))

$y_{3t}$ is called 'cyclical output' from now on. The exogenous variables in the model are:

$x_{1t}$ = percentage change in import prices (in domestic currency)
$x_{2t}$ = trades union power
$x_{3t}$ = government expenditure

The first equation in the model is a wages version of the expectations-augmented Phillips curve. Wage inflation $y_{1t}$ is assumed to depend on price inflation $y_{2t}$, cyclical output $y_{3t}$ and trades union power $x_{2t}$. The second equation models price inflation as a cost mark-up equation: price inflation $y_{2t}$ depends on wage inflation $y_{1t}$ and import price inflation $x_{1t}$. (These two equations are used in the chapter on exchange rates.) The final equation expresses equilibrium in the goods market. Real output $y_{3t}$ increases as real income rises and hence is positively related to wage increases $y_{1t}$, but it is negatively related to domestic price inflation $y_{2t}$. Government expenditure $x_{3t}$ directly adds to demand and hence influences real output. The three-equation economic model is then:

Lagged values of all the basic variables of the system have been included in each equation because it is probable that $y_{1t}$, $y_{2t}$ and $y_{3t}$ react to changes in the RHS variables over time, rather than instantaneously. The above three-equation system (20.50) is known as a structural simultaneous equation system. The simultaneous aspect is due to the fact that one endogenous variable $y_{it}$ depends on another endogenous variable $y_{jt}$ at time $t$. Before it can be estimated, such a system must be 'identified'. Space constraints prevent us from discussing the concept of identification in detail. However, unless the system is identified, estimates of the parameters are meaningless, since any linear combination of the three equations is equally valid statistically (and clearly an arbitrary linear combination would not in general conform to one's theoretical priors). We will assume that the three-equation system in (20.50) is identified, so that this is not a problem here.
Notice that in a three-equation system we can only solve (algebraically) for three unknowns $(y_{1t}, y_{2t}, y_{3t})$ in terms of the 'knowns' $(x_{1t}, x_{2t}, x_{3t})$. The economist decides a priori which of the variables are not determined within the model: these are the exogenous variables. Finally, note that economic theory would usually imply that the $y_{it}$ are explained by the RHS variables except for an additive white noise process $\varepsilon_{it}$. (In fact we can relax this assumption and assume the $\varepsilon_{it}$ are ARMA processes, and what follows would still be valid.) The differences between the 'structural economic model' (20.50) and the VARMA time series model are:
(i) the presence of current dated variables on the RHS of the structural model, both $y_{jt}$ and $x_{it}$,
(ii) the presence of lagged exogenous variables $x_{i,t-j}$ $(i = 1, 2, 3)$,
(iii) some 'exclusion restrictions' on the variables in the structural model (i.e. not all variables appear in each equation of (20.50)),
(iv) the economic model involves a structure that is determined by the economic theory under consideration.
Taking up the last point, we see, for example, that agents alter $y_{1t}$ only in response to changes in $y_{2t}$, $y_{3t}$ or $x_{2t}$ (and their lags) and not directly because of changes in $x_{1t}$ or $x_{3t}$. Economic theory would usually suggest that the behavioural parameters should be constant over time.

From a Structural Model to a VARMA or VAR System
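The structural model in (20.50) can be compactly written in matrix notation as

$$ A_0 Y_t = A_1 X_t + A_2(L)\,Y_{t-1} + A_3(L)\,X_{t-1} + \varepsilon_t $$

Premultiplying by $A_0^{-1}$ gives

$$ Y_t = A_0^{-1}\left[A_1 X_t + A_2(L)\,Y_{t-1} + A_3(L)\,X_{t-1} + \varepsilon_t\right] \qquad (20.51) $$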

Equation (20.51), which is known as the 'final form', expresses $Y_t$ as a function of lagged values of $Y_t$ and current and lagged values of $X_t$. We know that any covariance stationary series may be given a purely statistical representation in the form of a VARMA model. If we apply this to $X_t$ we have

$$ X_t = B(L)\,X_{t-1} + \Theta(L)\,v_t $$

If $X_t$ is stationary, all the roots of $|I - B(L)L| = 0$ lie outside the unit circle and hence

$$ X_t = (I - B(L)L)^{-1}\,\Theta(L)\,v_t $$

Substituting this expression for $X_t$ into (20.51) yields a VARMA model of the form

$$ \left[I - A_0^{-1}A_2(L)L\right] Y_t = A_0^{-1}\left[A_1 + A_3(L)L\right](I - B(L)L)^{-1}\Theta(L)\,v_t + A_0^{-1}\varepsilon_t $$

where the right-hand side is a moving average of the white noise errors $\varepsilon_t$ and $v_t$. Hence, given a VARMA statistical representation of the stationary series $X_t$, any structural simultaneous equations model can be represented as a VARMA model. It follows from the earlier discussion that the structural model can be further reduced to the 'simpler' VAR, VMA and univariate ARMA models outlined in the 'summary' above.
In general, the fact that the $A_i$ matrices of the structural model have restrictions imposed by economic theory (i.e. some elements are zero) often implies some restrictions on the derived VARMA equations. These restrictions can usually be tested. However, if we ignore such restrictions, then any VARMA model may be viewed as an unrestricted representation of a structural economic model. Note that the VARMA equation for $y_{it}$ will usually depend on lags of all the other $y_{jt}$ $(j \neq i)$ variables, even though the structural equation for $y_{it}$ might exclude a particular $y_{jt}$.
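To see why exclusion restrictions in the structural model need not survive in the reduced form, the sketch below solves $A_0 \Pi = A_1$ for the contemporaneous impact matrix $\Pi = A_0^{-1}A_1$ of a hypothetical wage, price and output system. The numerical coefficients are invented for illustration only (they are not estimates of (20.50)), and exact rational arithmetic via Python's `fractions.Fraction` keeps the result free of rounding error.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a @ x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(a)
    m = [[F(v) for v in a[i]] + [F(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


# Hypothetical contemporaneous structure A0 Y_t = A1 X_t + ... with
# Y = (wage inflation, price inflation, cyclical output) and
# X = (import prices, union power, government spending).
A0 = [[1, F(-1, 2), F(-1, 4)],   # y1 = 1/2 y2 + 1/4 y3 + ...
      [F(-1, 2), 1, 0],          # y2 = 1/2 y1 + ...
      [F(-1, 5), F(1, 5), 1]]    # y3 = 1/5 y1 - 1/5 y2 + ...
A1 = [[0, F(1, 2), 0],           # union power enters the wage equation only
      [F(3, 10), 0, 0],          # import prices enter the price equation only
      [0, 0, F(1, 2)]]           # government spending enters the output equation only

PI = solve(A0, A1)               # reduced-form impact matrix A0^{-1} A1
for row in PI:
    print([str(v) for v in row])
```

Although union power $x_{2t}$ is excluded from the structural price equation, the reduced-form entry `PI[1][1]` is non-zero, because union power raises wage inflation, which in turn feeds into prices.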
Some modellers advocate starting with a structural model based on some economic theory, which usually involves some a priori (yet testable) restrictions on the parameters. One can then analyse this model with a variety of statistical tests. Others (e.g. Sims (1980)) feel that the a priori knowledge required to identify the structural model (e.g. exclusion restrictions) is so 'incredible' that one should start with an unrestricted VAR model and simplify it only on the basis of various statistical tests (e.g. exclusion restrictions and Granger causality tests, see below). These 'simplification restrictions' would not be suggested by economic theory; instead one would simply trade off goodness of fit against parsimony on purely statistical grounds (e.g. by using the Akaike information criterion).

Expectations Variables Added
If we add an expectations variable for one particular variable, for example $E_t y_{i,t+q}$ $(q = 1, 2, 3, \ldots)$, to the structural economic model, then we have additional problems of interpretation and identification (the latter are particularly problematic, e.g. … (1987)). However, since expectations are formed with information available at time $t$ or earlier, it must be true that

We can therefore (assuming suitable transversality conditions hold, see Cuthbertson and Taylor (1987) and Pesaran (1987)) substitute out for $E_t y_{i,t+q}$ in terms of current and past observable values of the variables in the system. Hence the structural model with expectations can be reduced to one without expectations and is of the general form (20.51). In fact, as we have noted, the assumption of RE usually implies some restrictions on the $A_i$ matrices of the structural model. In general, therefore, the inclusion of expectations variables still allows us to express any set of time series variables as a multivariate VARMA, VAR or VMA model, or as a univariate ARMA, AR or MA representation.
parameters. If either of these conditions does not hold, then the resulting linear time series model for $y_{jt}$ will be misspecified and have unstable parameters.
On the other hand, suppose the world is linear but the a priori restrictions (e.g. exclusion restrictions) imposed on the structural model by the economic theorist are invalid 'in reality': in this case the VARMA representation may provide a superior representation of the data.
The aim of the above is to point out the relationship between these two approaches to modelling time series. Both have acute potential problems, and an enormous amount of judgement is involved in deciding which approach is 'reasonable' in any given circumstances. It is certainly this author's view that economic theory ought to play some role in this decision process, but there are no clear-cut infallible 'rules' to apply: 'beauty and truth' in applied economics are usually in the eye of the beholder.

Stationarity and Non-Stationarity in Systems: Cointegration
This section deals with the issue of cointegration. It discusses how non-stationary series can yield 'spurious regressions' and how one can test whether individual variables are non-stationary. Having found that a set of variables is non-stationary, it then outlines how the Johansen procedure can be used to determine whether a set of non-stationary series are 'linked together' in the long run, that is, cointegrated. Finally, the relationship between the Johansen procedure, the Box-Jenkins methodology, error correction models and Granger causality is briefly discussed.
We have noted that any single series can be classified as stationary or non-stationary. We now consider possible relationships between a set of non-stationary variables. Two series $x_t$ and $y_t$ might be 'highly trended' because of a deterministic time trend (e.g. $y_t = a_0 + a_1 t + \varepsilon_t$) or because of a 'stochastic trend' (e.g. a random walk with drift, $y_t = a_0 + y_{t-1} + \varepsilon_t$). Cointegration deals with data that have stochastic trends. In general, two series with stochastic trends will not be statistically related. For example, consider

$$ x_t = \beta_0 + x_{t-1} + v_t, \qquad y_t = a_0 + y_{t-1} + \varepsilon_t $$

where $\varepsilon_t$ and $v_t$ are statistically independent white noise errors. In the 'true' model there is no relationship between $y_t$ and $x_t$. However, if we regress $y_t$ on $x_t$ in a sample of data, then standard statistics (e.g. $R^2$, t statistics) will suggest that they are linearly related:

$$ y_t = \delta_0 + \delta_1 x_t + u_t $$

This is usually referred to as the 'spurious regression' or 'nonsense regression' problem (Granger and Newbold, 1974). The $R^2$ and t statistics from such regressions are misleading. The t statistics for $\delta_i$ do not follow Student's t distribution and cannot be used for testing hypotheses on the parameters $\delta_0, \delta_1$. The distribution of $R^2$ is often bimodal, and Granger and Newbold noted that in these 'spurious regressions' $R^2 > DW$ (DW = Durbin-Watson statistic). The DW statistic was 'low', indicating positive serial correlation in the residuals.
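A small Monte Carlo experiment reproduces the Granger and Newbold finding. The sketch assumes two independent driftless Gaussian random walks (the special case $a_0 = \beta_0 = 0$ of the example above), regresses one on the other by OLS, and records how often a conventional 5 percent t test ($|t| > 1.96$) 'finds' a relationship. Function names, sample size, replication count and seed are illustrative choices.

```python
import math
import random


def ols_simple(y, x):
    """OLS of y on a constant and x; returns (slope, t_slope, r2, dw)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    ssr = sum(e * e for e in resid)
    sst = sum((yi - ybar) ** 2 for yi in y)
    t_slope = slope / math.sqrt(ssr / (n - 2) / sxx)
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / ssr
    return slope, t_slope, 1.0 - ssr / sst, dw


def random_walk(n, drift, rng):
    """y_t = drift + y_{t-1} + e_t with standard normal e_t, starting at 0."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def spurious_rejection_rate(n=100, reps=200, seed=42):
    """Share of replications with |t| > 1.96 when y and x are independent
    random walks, so the true slope is zero."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = random_walk(n, 0.0, rng)
        y = random_walk(n, 0.0, rng)
        _, t_slope, _, _ = ols_simple(y, x)
        if abs(t_slope) > 1.96:
            rejections += 1
    return rejections / reps


print(spurious_rejection_rate())  # far above the nominal 0.05
```

Inspecting the $R^2$ and DW values returned by `ols_simple` for individual replications typically shows the 'high $R^2$, low DW' pattern described above.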
Cointegration seeks to provide a correct method of estimating equations involving a set of variables, some of which have stochastic trends. Series with such stochastic trends, which are frequently found in economic time series (e.g. stock prices, interest rates and other series that follow a random walk), are said to be integrated of order 1, that is, I(1). There are several tests available to ascertain whether an individual series is I(1). We will only consider the Dickey-Fuller (DF) and augmented DF tests. The AR(1) model for any time series is

$$ y_t = a_0 + a_1 y_{t-1} + \varepsilon_t \qquad (20.57) $$

where we take $\varepsilon_t$ to be a stationary white noise series. If $a_1 < 1$ (we take $a_1$ to be non-negative), then $y_t$ is a stationary I(0) series. However, if $a_1 = 1$ then $y_t$ is I(1), since it need only be differenced once to yield a stationary series:

$$ \Delta y_t = a_0 + \varepsilon_t $$

Thus $\Delta y_t$ is stationary given that $\varepsilon_t$ is stationary. Rearranging (20.57) we have

$$ \Delta y_t = a_0 + \delta y_{t-1} + \varepsilon_t \qquad (20.58) $$

where $\delta = a_1 - 1$. If $a_1 < 1$ then $\delta < 0$. Hence a test for stationarity is a test of $\delta < 0$. Dickey and Fuller (1979) show that the t statistic on $\delta$ in the OLS regression (20.58) can be used to test for $\delta < 0$. However, under the null $\delta = 0$ this t statistic does not follow Student's t distribution, so special tables of critical values are required. The 5 percent DF critical value for testing $\delta < 0$ is about $-2.85$ for reasonable sample sizes. Hence for $y_t$ to be judged stationary we require $\delta < 0$ and $|t| > 2.85$, where $t$ is the t statistic on $\delta$. If $y_t$ is found to be I(1), the series is differenced once and the DF test applied to the $\Delta y_t$ series to see if it is I(0). The augmented Dickey-Fuller test includes additional lagged difference terms,

$$ \Delta y_t = a_0 + \delta y_{t-1} + \sum_{i=1}^{m} \gamma_i\, \Delta y_{t-i} + \varepsilon_t \qquad (20.59) $$

to remove any serial correlation that may be present in $\varepsilon_t$. (A deterministic time trend may also be included in (20.59).) Having ascertained that a set of variables are all integrated of the same order (we only consider I(1) series), we can proceed to see if they move together in the long run (i.e. have common stochastic trends). We consider the two-variable case first.
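The DF regression (20.58) is a simple regression of $\Delta y_t$ on a constant and $y_{t-1}$, so its t statistic can be computed directly. The sketch below is illustrative only: it compares the statistic with the approximate critical value of $-2.85$ quoted above rather than with full DF tables, and it omits the lagged differences of the augmented test (20.59). Function names, sample size and seeds are hypothetical.

```python
import math
import random


def dickey_fuller_t(y):
    """t statistic on delta in the DF regression
    dy_t = a0 + delta * y_{t-1} + e_t  (constant, no trend, no extra lags)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    n = len(dy)
    lag_mean, dy_mean = sum(ylag) / n, sum(dy) / n
    sxx = sum((v - lag_mean) ** 2 for v in ylag)
    delta = sum((v - lag_mean) * (d - dy_mean) for v, d in zip(ylag, dy)) / sxx
    a0 = dy_mean - delta * lag_mean
    ssr = sum((d - a0 - delta * v) ** 2 for v, d in zip(ylag, dy))
    return delta / math.sqrt(ssr / (n - 2) / sxx)


def simulate_ar1(n, a1, rng, a0=0.0):
    """y_t = a0 + a1 * y_{t-1} + e_t with standard normal e_t, starting at 0."""
    y = [0.0]
    for _ in range(n - 1):
        y.append(a0 + a1 * y[-1] + rng.gauss(0.0, 1.0))
    return y


DF_CRITICAL_5PCT = -2.85  # approximate value quoted in the text
rng = random.Random(7)
for a1 in (0.5, 1.0):
    t_stat = dickey_fuller_t(simulate_ar1(500, a1, rng))
    verdict = "reject unit root" if t_stat < DF_CRITICAL_5PCT else "cannot reject unit root"
    print(f"a1 = {a1}: DF t = {t_stat:.2f} ({verdict})")
```

For the random walk ($a_1 = 1$) the statistic will usually, though not always, lie above the critical value, since the test has a 5 percent size.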
In general, a linear combination of I(1) series, $q_t = y_t - \beta' x_t$, is also I(1) and therefore non-stationary. However, it is possible that the linear combination $q_t$ is stationary, and in this case $y$ and $x$ are said to be cointegrated with cointegrating parameter $\beta$. If $q_t$ is a stationary I(0) variable, then we can say that the stochastic trend in $y_t$ is 'offset by' the stochastic trend in $\beta' x_t$. Hence the two series move together over time: the gap $q_t$ between them is finite and does not grow larger over time.
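The Johansen procedure is beyond a short example, but the idea of cointegration can be sketched with the simpler Engle-Granger two-step approach, which is a different technique from the one used later in the text: estimate $\beta$ by OLS of $y_t$ on $x_t$, then apply a DF-type regression to the residuals $\hat q_t$. The simulated data below are constructed so that $q_t = y_t - 2x_t$ is a stationary AR(1); all names and values are hypothetical. Critical values for a DF test on estimated residuals are more negative than the ordinary DF values, so the $-2.85$ figure does not apply here.

```python
import math
import random


def ols_line(y, x):
    """Intercept and slope from OLS of y on a constant and x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


def df_t_stat(q):
    """DF t statistic on delta in dq_t = c + delta * q_{t-1} + e_t."""
    dq = [q[t] - q[t - 1] for t in range(1, len(q))]
    lag = q[:-1]
    c, delta = ols_line(dq, lag)
    n = len(dq)
    lag_mean = sum(lag) / n
    ssr = sum((d - c - delta * v) ** 2 for d, v in zip(dq, lag))
    se = math.sqrt(ssr / (n - 2) / sum((v - lag_mean) ** 2 for v in lag))
    return delta / se


rng = random.Random(11)
n, beta = 500, 2.0
x, u = [0.0], [0.0]
for _ in range(n - 1):
    x.append(x[-1] + rng.gauss(0.0, 1.0))          # I(1) stochastic trend
    u.append(0.5 * u[-1] + rng.gauss(0.0, 1.0))    # stationary I(0) gap
y = [beta * xi + ui for xi, ui in zip(x, u)]        # so q_t = y_t - 2 x_t is I(0)

alpha_hat, beta_hat = ols_line(y, x)                # step 1: cointegrating regression
q_hat = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
print(f"beta_hat = {beta_hat:.3f}, DF t on residuals = {df_t_stat(q_hat):.2f}")
```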
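If we have two variables which are cointegrated, then the cointegrating vector is unique. However, for any $r + 1$ variables there can be up to $r$ unique cointegrating vectors. For illustrative purposes let us consider a three-variable system, where $y_{1t}$, $y_{2t}$ and $y_{3t}$ are all I(1). Let us suppose that there are two unique cointegrating vectors, which without loss of generality we normalise on $y_{1t}$ and $y_{2t}$. The Engle and Granger (1987) representation theorem states that cointegration implies that there exists a statistical representation in error correction model (ECM) form:

$$ \Delta y_{it} = \alpha_{i1}(y_{1,t-1} - \theta_1 y_{3,t-1}) + \alpha_{i2}(y_{2,t-1} - \theta_2 y_{3,t-1}) + (\text{terms in lagged } \Delta y_{1,t-j}, \Delta y_{2,t-j}, \Delta y_{3,t-j}) + \varepsilon_{it}, \qquad i = 1, 2, 3 \qquad (20.60) $$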

The interesting features of the ECM are:

(i) The two cointegrating vectors may, in principle, appear in all the equations of the $(3 \times 3)$ system. The cointegrating parameters are $\theta_1$ and $\theta_2$.
(ii) All the variables in the error correction system are stationary I(0) variables. The $y_{it}$ $(i = 1, 2, 3)$ are I(1) by assumption, hence $\Delta y_{it}$ must be I(0). The terms $(y_{1,t-1} - \theta_1 y_{3,t-1})$ and $(y_{2,t-1} - \theta_2 y_{3,t-1})$ are stationary because the I(1) variables are cointegrated.

In fact (20.60) is nothing more than a VAR in which the non-stationary I(1) variables have been 'transformed' into stationary series (i.e. into difference terms or cointegrating vectors) so that a VAR representation is permissible. The error terms $\varepsilon_{it}$ $(i = 1, 2, 3)$ are given by:

$$ \varepsilon_{it} = \Delta y_{it} - (\text{linear combination of stationary variables}) $$

and hence are stationary. Since the error terms are stationary, the usual statistical tests can be applied to the $\alpha_{ij}$ parameters of a VAR model in error correction form.
Before cointegration came on the scene, the so-called Box-Jenkins methodology had been used in analysing statistical time series models. This methodology requires that any non-stationary I(1) series be differenced before estimating and testing the VAR (or VARMA) model. Hence a VAR model which contains only the first differences of the variables is used in the Box-Jenkins methodology. This ensured that the error terms in each equation were stationary and hence conformed to standard distribution theory, so that statistical tests using 'standard tables' of critical values could then be used. However, cointegration analysis indicates that a VAR solely in first differences is misspecified if there are some cointegrating vectors present among the I(1) series. Put another way, a VAR solely in first differences omits potentially important stationary variables (i.e. the error correction, or cointegrating vector, terms) and hence parameter estimates may suffer from omitted variables bias. How acute the omitted variables bias might be depends on the correlation between the included 'differenced only' terms of the Box-Jenkins VAR and the omitted cointegration variables $(y_{j,t-1} - \theta_j y_{3,t-1})$. If these correlations are low (high), the omitted variables bias is likely to be low (high).
The parameters of the error correction model (20.60) can be estimated jointly using the Johansen procedure. This procedure also allows one to test for the number of cointegrating vectors in the system, which involves non-standard critical values. (This was done for interest rates in Chapter 14.) One may be able to simplify the error correction system by testing whether any of the weights $\alpha_{ij}$ on the error correction terms are zero in any equation of the system. For any variables that are I(1), one can apply the Johansen procedure to provide a set of unique cointegrating vectors (if any), and these variables can then be entered in the structural model in the form $(y_{j,t-1} - \theta_j z_{t-1})$.
Since the ECM is a VAR involving lags of stationary variables it can, like any VAR, be transformed to give the following alternative representations: first, a 'smaller' error correction system (e.g. reducing a $3 \times 3$ to a $2 \times 2$ system), or a VMA system. It can also be reduced to a 'single-equation' ARMA, AR or MA model in exactly the same way as described above for the VAR with stationary variables. It follows that the 'dangers', particularly in terms of the parameter stability of the resulting representations, will again be of key importance.
In general the choice between a 'large' VAR/ECM and a 'smaller' system involves a trade-off between efficiency and bias. The larger system is less likely to suffer from omitted variables bias, but the parameters may not be very precisely estimated (since we lose 'degrees of freedom' as the number of parameters to be estimated increases relative to the fixed sample of data). On the other hand, excluding some variables, that is, starting with a very 'small' system, may increase the precision of the estimated parameters, but at a cost in terms of potential omitted variables bias and perhaps a loss of forecast accuracy. With any real world finite data set one can apply a wide variety of tests to guide one's choice, but ultimately a great deal of judgement is required.

Granger Causality
There is one widely used and simple test on a VAR which enables a more parsimonious representation of the data. To illustrate this, consider a $(3 \times 3)$ VAR system in the stationary variables $y_{1t}$, $y_{2t}$, $y_{3t}$. The equation for $y_{1t}$ is:

$$ y_{1t} = \theta_1(L)\,y_{1,t-1} + \theta_2(L)\,y_{2,t-1} + \theta_3(L)\,y_{3,t-1} + \varepsilon_{1t} \qquad (20.61) $$

where sufficient lags have been included to ensure that $\varepsilon_{1t}$ is white noise. One can then test the proposition that lags of $y_{2t}$, taken together, have no direct effect on $y_{1t}$. If the restriction that all the coefficients in $\theta_2(L)$ are zero is not rejected, then we can say that $y_2$ does not Granger cause $y_1$, and the terms in $y_2$ can be omitted from the equation which explains $y_1$. A similar Granger causality test can be done for $y_3$ in equation (20.61). We can also apply Granger causality tests for the equations with $y_2$ and $y_3$ as dependent variables. Granger causality tests are often referred to as 'block exogeneity' tests. Note that Granger causality is a purely statistical view of causality: it simply tests whether lags of $y_{jt}$ have incremental explanatory power for $y_{it}$ $(i \neq j)$.
In the VAR/ECM, lags of $y$ also appear in the error correction terms, so one must also include the 'weights' $\alpha_{ij}$ in the Granger causality test. Otherwise the procedure is the same as in the pure VAR case described above.
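A Granger causality test of the kind just described compares restricted and unrestricted OLS regressions with an F statistic. The pure-Python sketch below assumes a bivariate VAR, a fixed lag length and an intercept; `rss` and `granger_f` are hypothetical helper names, and in practice one would use a statistics package. The simulated data are built so that $x$ Granger causes $y$ but not the reverse; the statistic should be compared with an $F(\text{lags}, \text{dof})$ critical value.

```python
import random


def rss(y, regressors):
    """Residual sum of squares from OLS of y on the given regressor columns,
    solving the normal equations by Gaussian elimination with partial pivoting."""
    k = len(regressors)
    aug = [[sum(a * b for a, b in zip(ri, rj)) for rj in regressors]
           + [sum(a * b for a, b in zip(ri, y))] for ri in regressors]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    coef = [0.0] * k
    for i in reversed(range(k)):           # back substitution
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (aug[i][k] - tail) / aug[i][i]
    fitted = [sum(c * column[t] for c, column in zip(coef, regressors))
              for t in range(len(y))]
    return sum((a - b) ** 2 for a, b in zip(y, fitted))


def granger_f(y, x, lags):
    """F statistic for H0: lags 1..lags of x add nothing to a regression of y
    on a constant and its own lags 1..lags."""
    n = len(y)
    target = y[lags:]
    const = [1.0] * len(target)
    own = [y[lags - j:n - j] for j in range(1, lags + 1)]
    other = [x[lags - j:n - j] for j in range(1, lags + 1)]
    rss_restricted = rss(target, [const] + own)
    rss_unrestricted = rss(target, [const] + own + other)
    dof = len(target) - (1 + 2 * lags)
    return ((rss_restricted - rss_unrestricted) / lags) / (rss_unrestricted / dof)


# Hypothetical data: x Granger causes y, but y does not Granger cause x
rng = random.Random(5)
x, y = [0.0], [0.0]
for _ in range(399):
    x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))
    y.append(0.3 * y[-1] + 0.5 * x[-2] + rng.gauss(0.0, 1.0))

print("F (x -> y):", round(granger_f(y, x, lags=2), 2))
print("F (y -> x):", round(granger_f(x, y, lags=2), 2))
```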
A good example of where economic theory suggests a test for Granger causality is the term structure of interest rates. Here the expectations hypothesis of the term structure suggests that the long rate $R_t$ is directly linked to future short-term interest rates $r_{t+j}$ $(j = 1, 2, 3, \ldots)$. Hence the theory would imply that the long rate should Granger cause short rates. Similarly, according to the fundamental valuation equation
The main conclusions of this section are as follows:
(i) A structural economic model can be represented as a purely statistical vector time series model, and the latter can be reduced to a univariate model. In using VAR models the issue of the temporal stability of the parameters is of key importance.
(ii) Cointegration deals with the relationships between non-stationary I(1) variables. If a set of variables are cointegrated, then any cointegrating vectors should be included in the VAR representation. Hence a VAR system purely in first differences of the variables may be misspecified.
(iii) The VAR cointegration framework has been extensively used in testing whether the EMH holds for speculative asset prices.

This section outlines the basis of models of autoregressive conditional heteroscedasticity (ARCH) and the generalised ARCH (or GARCH) approach to modelling time varying risk premia. The emphasis is on the intuitive economic reasons for using these models rather than the details of the estimation procedures and algorithms. We begin with a simple ARCH model and then build up to a generalised ARCH in mean model with time varying conditional variances and covariances.

Simple ARCH Model
The basic idea behind ARCH models is that the second moments of the distribution have an autoregressive structure. Consider an asset return model where we assume that the expected excess return $E_t y_{t+1}$ is constant:

$$ E_t y_{t+1} = \psi $$

Now assume RE:

$$ y_{t+1} = \psi + \varepsilon_{t+1} \qquad (20.63) $$

where $\varepsilon_{t+1} = y_{t+1} - E_t y_{t+1}$ is the RE forecast error. Many asset markets are characterised by periods of 'turbulence and tranquillity', that is to say large (small) forecast errors (of whatever sign) tend to be followed by further large (small) errors. There is therefore persistence in the variance of the forecast errors. The simplest formulation of this is the ARCH(1) model:

$$ \sigma^2_{t+1} = \operatorname{var}(\varepsilon_{t+1} \mid \Omega_t) = \omega + \alpha\,\varepsilon_t^2 \qquad (20.64) $$

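If $\alpha < 1$, the unconditional variance of $\varepsilon_{t+1}$, denoted $\sigma^2$, is given by

$$ \sigma^2 = \omega / (1 - \alpha) $$

and is a constant. However, the conditional variance given by (20.64) varies over time, and $\varepsilon_t^2$ can be used to predict next period's variance $\sigma^2_{t+1}$. In terms of (20.63) and (20.64), and ignoring constants, the log-likelihood may be expressed as:

$$ l = -\frac{1}{2}\sum_t \ln\left[\omega + \alpha(y_{t-1} - \psi)^2\right] - \frac{1}{2}\sum_t \frac{(y_t - \psi)^2}{\omega + \alpha(y_{t-1} - \psi)^2} \qquad (20.66) $$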

The log-likelihood can therefore be expressed as a non-linear function of the unknown parameters $\omega$, $\alpha$ and $\psi$ and the data series $y_t$. Standard optimisation routines can be used to maximise the likelihood. One must, however, ensure that the time varying variance is never negative. In this case this is easily done by replacing $\omega$ and $\alpha$ in (20.66) by $w^2$ and $a^2$, which are never negative, so that $\sigma^2_{t+1}$ in (20.64) can never be negative.
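The ARCH(1) log-likelihood (20.66), with the positivity device of replacing $\omega$ and $\alpha$ by $w^2$ and $a^2$, can be written in a few lines. This sketch conditions on the first observation and, unlike (20.66), keeps the constant $\ln 2\pi$ term so that values are comparable with standard software; the function name, data and grid are illustrative. A coarse grid over $a$ stands in for the numerical optimiser one would use in practice.

```python
import math


def arch1_loglik(params, y):
    """Gaussian log-likelihood for y_t = psi + eps_t with ARCH(1) variance
    sigma2_t = w**2 + a**2 * eps_{t-1}**2, conditioning on the first observation.

    Writing omega = w**2 and alpha = a**2 keeps the variance non-negative
    whatever values an optimiser tries for (w, a).
    """
    psi, w, a = params
    eps = [yt - psi for yt in y]
    loglik = 0.0
    for t in range(1, len(y)):
        sigma2 = w ** 2 + a ** 2 * eps[t - 1] ** 2
        loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                         + eps[t] ** 2 / sigma2)
    return loglik


# Crude illustration of maximisation: a grid over a, with psi and w held fixed
data = [0.2, -1.5, 1.8, -0.3, 0.1, 2.2, -2.0, 0.4, -0.1, 0.3]
best_a = max((i / 10 for i in range(11)),
             key=lambda a: arch1_loglik((0.0, 0.5, a), data))
print("log-likelihood at a = 0.5:", arch1_loglik((0.0, 0.5, 0.5), data))
print("best a on the grid:", best_a)
```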

GARCH Models
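The ARCH process in (20.64) has a memory of only one period. We could generalise the process by adding lags of $\varepsilon^2_{t-j}$:

$$ \sigma^2_{t+1} = \omega + \alpha_1\varepsilon_t^2 + \alpha_2\varepsilon_{t-1}^2 + \cdots $$

but the number of parameters to estimate increases rapidly. A more parsimonious way of introducing a 'long memory' is the GARCH(1,1) process

$$ \sigma^2_{t+1} = \omega + \alpha\,\varepsilon_t^2 + \beta\,\sigma_t^2 \qquad (20.69) $$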

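The GARCH process has the same structure as an AR(1) model for the error term, except that here the AR(1) process applies to the variance. The unconditional variance, denoted $\sigma^2$ (a constant), is:

$$ \sigma^2 = \omega / [1 - (\alpha + \beta)] $$

for $(\alpha + \beta) < 1$. However, (20.69) can be rearranged to give

$$ (1 - \beta L)\,\sigma^2_{t+1} = \omega + \alpha\,\varepsilon_t^2 $$

$$ \sigma^2_{t+1} = \omega(1 + \beta + \beta^2 + \cdots) + \alpha(1 + \beta L + (\beta L)^2 + \cdots)\,\varepsilon_t^2 \qquad (20.71) $$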

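Hence $\sigma^2_{t+1}$ can be viewed as a constant plus an infinite weighted sum of all past squared forecast errors. The weights on $\varepsilon^2_{t-j}$ in (20.71) are constrained to be geometrically declining. After a little manipulation, (20.71) may also be rewritten

$$ (\sigma^2_{t+1} - \sigma^2) = \alpha(\varepsilon_t^2 - \sigma^2) + \beta(\sigma_t^2 - \sigma^2) \qquad (20.72) $$

Using either (20.69) or (20.72), we can take expectations of both sides and, noting that $E_t\varepsilon^2_{t+j} = E_t\sigma^2_{t+j}$ for $j \geq 1$, we have (for equation (20.69))

$$ E_t\sigma^2_{t+j+1} = \omega + (\alpha + \beta)\,E_t\sigma^2_{t+j} $$

Hence expected future values of $\sigma^2_{t+j}$ depend on the size of $\alpha + \beta$. If $\alpha + \beta$ is close to unity, then a shock at time $t$ will persist for many future periods. If $\alpha + \beta = 1$, then any shock will lead to a permanent change in all future values of $\sigma^2_{t+j}$; hence shocks to the conditional variance are 'persistent'. For $\alpha + \beta = 1$ we have what is known as an integrated GARCH process (or IGARCH). For IGARCH the conditional variance is non-stationary and the unconditional variance is unbounded. The statistical and distributional properties of IGARCH processes are currently the focus of much research in this literature.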
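The forecasting implication of the expectations argument above can be computed directly: solving the recursion gives $E_t\sigma^2_{t+k} - \sigma^2 = (\alpha + \beta)^{k-1}(\sigma^2_{t+1} - \sigma^2)$. The sketch below assumes $\alpha + \beta < 1$ (it rejects the IGARCH case, where no finite unconditional variance exists) and also reports the 'half-life' of a variance shock, $\ln 0.5 / \ln(\alpha + \beta)$, a common summary of persistence that is not used in the text itself. Parameter values are hypothetical.

```python
import math


def garch_forecast(omega, alpha, beta, sigma2_next, horizon):
    """E_t sigma^2_{t+k} for k = 1..horizon in a GARCH(1,1) with alpha + beta < 1,
    using E_t sigma^2_{t+k} - s2 = (alpha + beta)**(k - 1) * (sigma^2_{t+1} - s2),
    where s2 is the unconditional variance."""
    persistence = alpha + beta
    if persistence >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    uncond = omega / (1.0 - persistence)
    return [uncond + persistence ** (k - 1) * (sigma2_next - uncond)
            for k in range(1, horizon + 1)]


def shock_half_life(alpha, beta):
    """Number of periods until half of a variance shock has decayed."""
    return math.log(0.5) / math.log(alpha + beta)


# Unconditional variance here is 0.1 / (1 - 0.9) = 1; forecasts decay towards it
print(garch_forecast(0.1, 0.1, 0.8, sigma2_next=2.0, horizon=5))
print(round(shock_half_life(0.1, 0.8), 2))
```

As $\alpha + \beta$ approaches one the half-life grows without bound, which is the IGARCH limit described above.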
One can generalise (20.69) to a GARCH(p, q) process where there are $p$ lags of $\varepsilon^2$ and $q$ lags of $\sigma^2$. This allows the conditional variance to have an infinitely long memory (because of the $\sigma^2_{t-j}$ terms) but does not constrain the response of $\sigma^2_{t+j}$ to a shock to have geometrically declining weights. Given the acute non-linearities of the likelihood function and the loss of degrees of freedom as the number of parameters in the GARCH process increases, researchers have often used fairly low-order GARCH(p, q) models, and these have been found to fit the time varying volatility in the forecast errors of asset prices fairly well.
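The likelihood function for (20.63) plus the GARCH(1,1) model is of the same general form as (20.66), with the conditional variance now generated by (20.69). However, an arbitrary starting value for $\sigma^2_1$ is required to generate the terms in the likelihood function. $\varepsilon_1$ is given by $(y_1 - \psi)$, and this together with a starting value for $\sigma^2_1$ can be used to generate $\sigma^2_2 = \omega + \alpha\varepsilon_1^2 + \beta\sigma_1^2$. Values of $\varepsilon_{t+1} = y_{t+1} - \psi$ for $t = 1, 2, \ldots$ can then be used to generate $\sigma^2_{t+1} = \omega + \alpha\varepsilon_t^2 + \beta\sigma_t^2$ $(t = 2, 3, \ldots)$ by recursive substitution.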
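The recursive construction of the GARCH(1,1) likelihood just described might be sketched as follows. The starting value $\sigma^2_1$ is passed in by the caller (the sample variance of $y_t$ is one common choice, an assumption rather than something the text prescribes), the errors are taken to be normal, and the constant $\ln 2\pi$ term is kept. The function name and data are illustrative.

```python
import math


def garch11_loglik(params, y, sigma2_start):
    """Gaussian log-likelihood for y_t = psi + eps_t with GARCH(1,1) variance.

    sigma2_1 = sigma2_start (an arbitrary starting value); for t >= 2,
    sigma2_t = omega + alpha * eps_{t-1}**2 + beta * sigma2_{t-1}.
    Returns the log-likelihood and the generated path of sigma2_t.
    """
    psi, omega, alpha, beta = params
    loglik, sigma2, eps_prev = 0.0, sigma2_start, 0.0
    path = []
    for t, yt in enumerate(y):
        if t > 0:  # recursive substitution after the first observation
            sigma2 = omega + alpha * eps_prev ** 2 + beta * sigma2
        eps = yt - psi
        path.append(sigma2)
        loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                         + eps ** 2 / sigma2)
        eps_prev = eps
    return loglik, path


returns = [0.5, -1.2, 0.3, 2.0, -0.7]
mean_r = sum(returns) / len(returns)
start = sum((r - mean_r) ** 2 for r in returns) / len(returns)  # sample variance
print(garch11_loglik((0.0, 0.1, 0.1, 0.8), returns, sigma2_start=start))
```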

ARCH-M and GARCH-M Models
Suppose we now extend our economic model of asset pricing so that the expected return depends positively on the perceived riskiness of the asset or portfolio. Suppose also that riskiness can be adequately represented by the (conditional) 'own' variance of the forecast errors of returns, $\sigma^2_{t+1}$. This model of expected returns can be represented by an 'ARCH in mean' equation:

$$ y_{t+1} = \psi_0 + \psi_1\sigma^2_{t+1} + \varepsilon_{t+1} $$

Thus the expected return $E_t y_{t+1}$ is given by $(\psi_0 + \psi_1\sigma^2_{t+1})$, where $\sigma^2_{t+1}$ is the (conditional) variance (strictly we should write this as $E_t(\sigma^2_{t+1})$, but we do not, for notational ease). The forecast error squared is

$$ \varepsilon^2_{t+1} = (y_{t+1} - \psi_0 - \psi_1\sigma^2_{t+1})^2 $$
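We assume that there is persistence in the variance of the conditional forecast errors, which follows either an ARCH(p) or a GARCH(1,1) process:

$$ \text{ARCH}(p): \quad \sigma^2_{t+1} = \omega + \sum_{i=1}^{p} \alpha_i\,\varepsilon^2_{t+1-i} $$

$$ \text{GARCH}(1,1): \quad \sigma^2_{t+1} = \omega + \alpha\,\varepsilon_t^2 + \beta\,\sigma_t^2 $$

The model can be extended to include additional variables $z_{1t}$ in the expected return equation, or additional variables $z_{2t}$ that influence the conditional variance. For a GARCH(1,1) process this would consist of the following two equations:

$$ y_{t+1} = \psi_0 + \psi_1\sigma^2_{t+1} + \psi_2' z_{1t} + \varepsilon_{t+1} \qquad (20.78) $$

$$ \sigma^2_{t+1} = \omega + \alpha\,\varepsilon_t^2 + \beta\,\sigma_t^2 + \gamma' z_{2t} \qquad (20.79) $$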

For example, the APT suggests that the variables in $z_{1t}$ could be macroeconomic variables such as the growth in output or the rate of inflation. Often the dividend-price ratio is found to influence returns, and this might also be included in (20.78). Economic theory is not particularly informative about the variables $z_{2t}$ that might influence investors' perceptions of future volatility, but clearly if volatility in prices is associated with news arriving in the market, then a market turnover variable might be included in $z_{2t}$. A dummy variable for particular days of the week when the market is open (e.g. $z_{2t} = 1$ for Mondays and zero otherwise) can also be incorporated in the GARCH equation. It is also possible that elements of $z_{2t}$ appear in the equation for $y_{t+1}$ (dummy variables capturing Monday returns and the weekend effect are obvious candidates). So far we have assumed that the forecast errors $\varepsilon_{t+1}$ are normally distributed, so that:

$$ y_{t+1} \mid \Omega_t \sim N(\psi' x_t,\ \sigma^2_{t+1}) $$

where the variables that influence $y_{t+1}$ are subsumed in $x_t$. The conditional mean is $\psi' x_t$ and the conditional variance is given by $\sigma^2_{t+1}$. However, we can assume any distribution we like for the conditional errors. For example, for daily data on stock returns or changes in exchange rates (i.e. the $y_{t+1}$ variable), the conditional distribution of the errors is often taken to be Student's t distribution, which has fatter tails than the normal distribution. The likelihood is more complex algebraically than that given in equation (20.66) for the normal distribution, but the principle behind the estimation remains unchanged: the terms in $\sigma^2_t$ and $\varepsilon_t$ in the (new) likelihood are replaced by (non-linear) functions of the 'observables' $y_t$, $x_t$ and the unknown parameters $\psi$ and $(\omega, \alpha, \beta)$ of the GARCH process. The likelihood can then be maximised in the usual fashion (e.g. see the maximum likelihood options in the GAUSS, LIMDEP or RATS programmes), which often involves numerical (rather than analytic) techniques.
Although the unconditional distributions for many financial variables (e.g. changes in stock prices, interest rates or exchange rates) may be leptokurtic, it may be that the conditional distribution based on a model like the GARCH-M model in (20.78) and (20.79) is not leptokurtic.
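A short simulation shows how the GARCH-M model in (20.78) and (20.79), here without the $z$ variables, generates a time varying expected return $\psi_0 + \psi_1\sigma^2_{t+1}$. Parameter values and names are hypothetical, the errors are assumed normal, and the variance recursion is started at its unconditional value.

```python
import random


def simulate_garch_m(n, psi0, psi1, omega, alpha, beta, seed=0):
    """Simulate y_{t+1} = psi0 + psi1*sigma2_{t+1} + eps_{t+1} with
    sigma2_{t+1} = omega + alpha*eps_t**2 + beta*sigma2_t and normal errors.
    Returns (returns, conditional variances, expected returns)."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    eps = 0.0
    ys, variances, expected = [], [], []
    for _ in range(n):
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
        mean = psi0 + psi1 * sigma2          # risk premium moves with sigma2
        eps = rng.gauss(0.0, sigma2 ** 0.5)
        ys.append(mean + eps)
        variances.append(sigma2)
        expected.append(mean)
    return ys, variances, expected


ys, variances, expected = simulate_garch_m(1000, psi0=0.0, psi1=0.5,
                                           omega=0.1, alpha=0.1, beta=0.8, seed=3)
print(f"expected return ranges from {min(expected):.3f} to {max(expected):.3f}")
```

Feeding such simulated returns back into a likelihood like the GARCH(1,1) one above, with the mean now depending on $\sigma^2_{t+1}$, is one way to check an estimation routine before applying it to real data.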

Covariances and ARCH Models
To motivate the application of ARCH models to incorporate covariances c
excess return ylr+l on a particular portfolio (e.g. the return on a portfolio of
in various chemical firms). The CAPM plus RE implies that
Hence the return on the market portfolio is determined by the conditional vari
market portfolio 0 The term should equal zero and is the market pr
(Note that appears in both (20.81) and (20.82) according to portfolio th
shocks that influence the return on the market portfolio are also likely to in
return on asset 1. For example, this is because ˜good news™ about the econom
an effect on the return on the stocks in portfolio 1 as well as all other stocks in
portfolio. Hence E l r and E,, are likely to be correlated (i.e. their covariance is
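If we assume a GARCH(1,1) process for the covariance, we have

$$\sigma_{1m,t+1} = \omega_{1m} + \alpha\,\varepsilon_{1t}\varepsilon_{mt} + \beta\,\sigma_{1m,t} \qquad (20.83)$$
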
There will also be an equation of the form (20.83) for each of the conditional variances $\sigma^2_{1,t+1}$ and $\sigma^2_{m,t+1}$ (see equation (20.79)). The parameters $\alpha$ and $\beta$ in the covariance equation and in the two GARCH variance equations need not be the same. Hence the degree of persistence (i.e. the value of $\alpha + \beta$) can differ across the three equations, and generally speaking one's 'instinct' would be that persistence might well be different in each process. In practice, however, $(\alpha, \beta)$ are often assumed to be the same in each GARCH equation. Otherwise it is difficult to get precise estimates of the parameters from the likelihood function, which is non-linear in the parameters.
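Below is a minimal sketch of the recursions in (20.83) and the two companion variance equations. As is common in practice, it assumes a single pair $(\alpha, \beta)$ for all three equations. Stacking the variances and the covariance into a 2 × 2 matrix $H_t$ turns the three equations into one matrix update. The intercept matrix `Omega`, which holds $\omega_1$, $\omega_m$ and $\omega_{1m}$, and the function name are hypothetical.

```python
import numpy as np


def common_garch_covariances(eps, Omega, alpha, beta, H0):
    """Conditional covariance matrices H[t] for a T x 2 array of shocks eps.

    H[t] = Omega + alpha * eps[t-1] eps[t-1]' + beta * H[t-1]
    Diagonal elements: the two GARCH(1,1) variances (sigma2_1, sigma2_m).
    Off-diagonal element: the covariance sigma_1m of equation (20.83).
    The same (alpha, beta) is used in all three equations.
    """
    eps = np.asarray(eps, dtype=float)
    H = np.empty((eps.shape[0], 2, 2))
    H[0] = H0
    for t in range(1, eps.shape[0]):
        H[t] = Omega + alpha * np.outer(eps[t - 1], eps[t - 1]) + beta * H[t - 1]
    return H


Omega = np.array([[0.05, 0.02], [0.02, 0.04]])  # illustrative omega_1, omega_1m, omega_m
H0 = np.array([[1.0, 0.4], [0.4, 0.8]])         # illustrative starting values
shocks = np.random.default_rng(2).standard_normal((500, 2))
H = common_garch_covariances(shocks, Omega, alpha=0.1, beta=0.85, H0=H0)
print("smallest eigenvalue over all t:", min(np.linalg.eigvalsh(h).min() for h in H))
```

The common-parameter restriction has a side benefit, which is an observation here rather than a claim of the text. If `Omega` and the starting matrix are positive definite and $\alpha, \beta \ge 0$, every $H_t$ is a non-negative combination of positive semi-definite matrices plus `Omega`. It is therefore always a valid covariance matrix.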
Clearly the GARCH-M model with variances and covariances involves a (2 × 1) vector of error terms $(\varepsilon_{1t}, \varepsilon_{mt})$, so the univariate likelihood function in (20.66) no longer applies. However, the principle is the same. Given starting values at $t = 0$ for $\varepsilon_{1t}$, $\varepsilon_{mt}$ and the conditional variances and covariance, the GARCH equations can be used recursively to generate values for these conditional moments for $t = 1, 2, \ldots$ in terms of the unknown parameters. Similarly, equations (20.81) and (20.82) provide a series for $\varepsilon_{1t}$ and $\varepsilon_{mt}$ to input into the (bivariate) likelihood, which is then maximised using numerical methods.
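The recursive evaluation described above can be sketched as a Gaussian log-likelihood for the bivariate GARCH-M system (20.81) to (20.83), with a common $(\alpha, \beta)$. This is a minimal illustration under stated assumptions. The names `garch_m_loglik`, `delta`, `lam`, `Omega` and `H0` are hypothetical. The start-up convention is also an assumption: the shocks at $t = 0$ are set to zero and the starting covariance matrix is supplied by the user. Maximising the result numerically, for instance by minimising its negative with a general-purpose optimiser, would give the estimates.

```python
import numpy as np


def garch_m_loglik(y, delta, lam, Omega, alpha, beta, H0):
    """Bivariate normal log-likelihood of
        y1[t] = delta[0] + lam * cov_1m[t] + e1[t]    (cf. 20.81)
        ym[t] = delta[1] + lam * var_m[t]  + em[t]    (cf. 20.82)
    with H[t] = Omega + alpha * e[t-1] e[t-1]' + beta * H[t-1]  (cf. 20.83).
    y is a T x 2 array with columns (y1, ym).
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    H_prev = np.asarray(H0, dtype=float)
    e_prev = np.zeros(2)  # assumed start-up: zero shocks at t = 0
    ll = 0.0
    for t in range(y.shape[0]):
        H = Omega + alpha * np.outer(e_prev, e_prev) + beta * H_prev
        mean = delta + lam * np.array([H[0, 1], H[1, 1]])  # CAPM risk premia
        e = y[t] - mean
        sign, logdet = np.linalg.slogdet(H)
        if sign <= 0:
            return -np.inf  # H not positive definite: reject these parameters
        ll += -0.5 * (2 * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(H, e))
        H_prev, e_prev = H, e
    return float(ll)


# Illustrative data and parameter values only
rng = np.random.default_rng(3)
y = 0.5 * rng.standard_normal((250, 2))
Omega = np.array([[0.05, 0.02], [0.02, 0.04]])
print(garch_m_loglik(y, [0.0, 0.0], 0.5, Omega, 0.1, 0.85, 0.25 * np.eye(2)))
```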

ARCH and GARCH models provide a fairly flexible method of modelling time-varying conditional variances and covariances. Such models assume that investors' perceptions of risk tomorrow depend on what their perceptions of risk have been in earlier periods. ARCH models are therefore autoregressive in the second moment of the data. An 'ARCH (or GARCH) in mean' model assumes that expected returns depend on investors' perceptions of risk: a higher level of risk requires a higher level of expected returns. ARCH models allow this risk premium to vary over time, and hence equilibrium returns also vary over time.
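A GARCH-in-mean specification lets the risk premium move over time. The sketch below traces the premium $\lambda\sigma^2_{t+1}$ in a univariate GARCH(1,1)-M model around a single large shock. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np


def garch_m_premium(shocks, omega, alpha, beta, lam):
    """Risk premium lam * sigma2[t+1] implied by a GARCH(1,1)-in-mean model."""
    sigma2 = omega / (1.0 - alpha - beta)  # assumed start: unconditional variance
    premia = []
    for e in shocks:
        sigma2 = omega + alpha * e**2 + beta * sigma2  # next period's variance
        premia.append(lam * sigma2)                    # required excess return
    return np.array(premia)


prem = garch_m_premium([0.0, 3.0, 0.0, 0.0], omega=0.1, alpha=0.1, beta=0.8, lam=2.0)
print(prem)
```

The premium jumps after the shock of 3 and then dies away as the conditional variance falls back. Working through the recursion by hand gives 1.8, 3.44, 2.952 and 2.5616.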

Over the last 10 years the role of expectations formation in both theoretical and applied financial economics has been of central importance. At the applied level, under the pure expectations hypothesis of the term structure, the long rate on bonds depends on expectations about future short-term interest rates. The current price of stocks depends on expected future dividends, and under risk neutrality the current forward rate is an unbiased predictor of the future spot rate of exchange.
In general the efficient markets literature is concerned with the proposition that agents use all available information to remove any known profitable opportunities in the market, and this usually involves agents forming expectations about future events. To test the propositions outlined above, we need a framework for modelling these unobservable expectations.

