The lag window is defined as

I(τ/S(T)) = 1 for |τ/S(T)| ≤ 1, and 0 otherwise.
Assuming that k-step-ahead forecast errors are at most (k − 1)-dependent, it is therefore recommended that S(T) = k − 1. It is not likely that f̂_d(0) will be negative, but in the rare event that f̂_d(0) < 0,
Volatility Forecast Evaluation 27

it should be treated as zero and the null hypothesis of equal forecast accuracy rejected automatically.

2.3.2 Diebold and Mariano's sign test
The sign test targets the median, with the null hypothesis that

Med(d) = Med(g(e_it) − g(e_jt)) = 0.

Assuming that d_t ~ iid, the test statistic is

S2 = Σ_{t=1}^{T} I₊(d_t),

I₊(d_t) = 1 if d_t > 0, and 0 otherwise.

For small samples, S2 should be assessed using a table of the cumulative binomial distribution. In large samples, the Studentized version of S2 is asymptotically normal:

S2a = (S2 − 0.5T) / √(0.25T) ~ N(0, 1).
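The sign test above can be sketched in a few lines of numpy. This is a minimal illustration, not the book's code; the function name and the synthetic losses are hypothetical.

```python
import numpy as np

def dm_sign_test(loss_i, loss_j):
    """Diebold-Mariano sign test on the loss differential d_t = g(e_it) - g(e_jt).

    Returns the raw count S2 (number of positive differentials) and its
    Studentized version S2a = (S2 - 0.5*T) / sqrt(0.25*T), which is
    asymptotically N(0, 1) under the null of equal forecast accuracy.
    """
    d = np.asarray(loss_i, dtype=float) - np.asarray(loss_j, dtype=float)
    T = d.size
    s2 = np.sum(d > 0)                        # S2 = sum of I+(d_t)
    s2a = (s2 - 0.5 * T) / np.sqrt(0.25 * T)
    return s2, s2a

# Example: model i has visibly larger losses than model j
rng = np.random.default_rng(0)
li = rng.chisquare(3, size=200) + 0.5
lj = rng.chisquare(3, size=200)
s2, s2a = dm_sign_test(li, lj)
```

A large positive S2a rejects equal accuracy in favour of model j having the smaller losses.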

2.3.3 Diebold and Mariano's Wilcoxon sign-rank test
As the name indicates, this test is based on both the sign and the rank of the loss differential, with test statistic

S3 = Σ_{t=1}^{T} I₊(d_t) rank(|d_t|),

which represents the sum of the ranks of the absolute values of the positive observations. The critical values for S3 have been tabulated for small samples. For large samples, the Studentized version of S3 is again asymptotically normal:

S3a = (S3 − T(T + 1)/4) / √(T(T + 1)(2T + 1)/24) ~ N(0, 1).
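A sketch of the sign-rank statistic follows; the centring T(T+1)/4 and scaling T(T+1)(2T+1)/24 are the standard Wilcoxon moments. The function name is hypothetical, and ties in |d_t| are ignored for brevity.

```python
import numpy as np

def dm_wilcoxon_test(loss_i, loss_j):
    """Diebold-Mariano Wilcoxon sign-rank test on d_t = g(e_it) - g(e_jt).

    S3 sums the ranks of |d_t| over the positive d_t; the Studentized
    S3a = (S3 - T(T+1)/4) / sqrt(T(T+1)(2T+1)/24) is asymptotically N(0, 1).
    """
    d = np.asarray(loss_i, dtype=float) - np.asarray(loss_j, dtype=float)
    T = d.size
    # ranks of |d_t|, 1..T (average ranks for ties would need scipy.stats.rankdata)
    order = np.argsort(np.abs(d))
    ranks = np.empty(T)
    ranks[order] = np.arange(1, T + 1)
    s3 = ranks[d > 0].sum()
    s3a = (s3 - T * (T + 1) / 4) / np.sqrt(T * (T + 1) * (2 * T + 1) / 24)
    return s3, s3a
```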
28 Forecasting Financial Market Volatility

2.3.4 Serially correlated loss differentials
Serial correlation is explicitly taken care of in S1. For S2 and S3 (and their asymptotic counterparts S2a and S3a), the following k sets of loss differentials have to be tested jointly:

d_{ij,1}, d_{ij,1+k}, d_{ij,1+2k}, · · · ,
d_{ij,2}, d_{ij,2+k}, d_{ij,2+2k}, · · · ,
d_{ij,k}, d_{ij,2k}, d_{ij,3k}, · · · .

A test with size bounded by α is then conducted by running k tests, each of size α/k, one on each of the above k loss-differential sequences. The null hypothesis of equal forecast accuracy is rejected if the null is rejected for any of the k samples.
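The Bonferroni-style procedure above can be sketched as follows, using the sign test on each interleaved subsample. This is an illustrative sketch, assuming a two-sided test; the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def bonferroni_sign_tests(d, k, alpha=0.05):
    """Test equal forecast accuracy when loss differentials are (k-1)-dependent:
    split d into k interleaved subsamples d_1, d_{1+k}, d_{1+2k}, ..., run a
    sign test of size alpha/k on each, and reject if any subsample rejects.
    The overall size of the procedure is then bounded by alpha."""
    d = np.asarray(d, dtype=float)
    # two-sided normal critical value at size alpha/k (stdlib, no scipy needed)
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * k))
    reject = False
    for start in range(k):
        sub = d[start::k]                     # d_{start+1}, d_{start+1+k}, ...
        T = sub.size
        s2 = np.sum(sub > 0)
        s2a = (s2 - 0.5 * T) / np.sqrt(0.25 * T)
        if abs(s2a) > z_crit:
            reject = True
    return reject
```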

The regression-based method for examining the informational content of forecasts is by far the most popular method in the volatility forecasting literature. It involves regressing the actual volatility, X_t, on the forecast, X̂_t, as shown below:

X_t = α + β X̂_t + υ_t. (2.3)

Conditional on the forecast, the prediction is unbiased only if α = 0 and β = 1.
Since the error term, υ_t, is heteroscedastic and serially correlated when overlapping forecasts are evaluated, the standard errors of the parameter estimates are often computed on the basis of Hansen and Hodrick (1980). Let Y be the row matrix of regressors including the constant term; in (2.3), Y_t = [1, X̂_t] is a 1 × 2 matrix. Then

Ω = Σ_{t=1}^{T} υ²_t Y′_t Y_t + Σ_{k=1}^{T} Σ_{t=k+1}^{T} Q(k, t) υ_k υ_t (Y′_t Y_k + Y′_k Y_t),

where υ_k and υ_t are the residuals for observations k and t from the regression. The operator Q(k, t) is an indicator function taking the value 1 if there is information overlap between Y_k and Y_t. The adjusted covariance matrix for the regression coefficients is then calculated as

(Y′Y)⁻¹ Ω (Y′Y)⁻¹. (2.4)

Canina and Figlewski (1993) conducted some simulation studies and found that the corrected standard errors in (2.4) are close to the true values, and that the use of overlapping data reduces the standard error to between one-quarter and one-eighth of what would be obtained with only nonoverlapping data.
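The regression (2.3) with the covariance adjustment (2.4) can be sketched with numpy alone. This is a minimal sketch, assuming the overlap indicator Q(k, t) = 1 whenever |t − k| is smaller than the forecast horizon (the usual case for overlapping h-step-ahead forecasts); the function name is hypothetical.

```python
import numpy as np

def mincer_zarnowitz_hh(actual, forecast, horizon):
    """OLS of actual volatility on a constant and the forecast (eq. 2.3),
    with Hansen-Hodrick-style standard errors (eq. 2.4). Overlap is assumed
    for pairs with |t - k| < horizon; horizon=1 gives only the lag-0 term."""
    X = np.asarray(actual, float)
    Xhat = np.asarray(forecast, float)
    T = X.size
    Y = np.column_stack([np.ones(T), Xhat])        # T x 2 regressor matrix
    beta = np.linalg.lstsq(Y, X, rcond=None)[0]    # (alpha, beta)
    u = X - Y @ beta                               # residuals v_t
    # Omega: sum_t u_t^2 Y_t'Y_t plus cross-products for overlapping pairs
    omega = (Y * u[:, None]).T @ (Y * u[:, None])  # lag-0 term
    for lag in range(1, horizon):
        Yk, Yt = Y[:-lag], Y[lag:]
        uk, ut = u[:-lag], u[lag:]
        cross = (Yt * (uk * ut)[:, None]).T @ Yk   # sum u_k u_t Y_t'Y_k
        omega += cross + cross.T
    yy_inv = np.linalg.inv(Y.T @ Y)
    cov = yy_inv @ omega @ yy_inv                  # eq. (2.4)
    return beta, np.sqrt(np.diag(cov))
```

Unbiasedness can then be checked by comparing (alpha, beta) against (0, 1) using the adjusted standard errors.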
In cases where there is more than one forecasting model, additional forecasts are added to the right-hand side of (2.3) to check for incremental explanatory power. Such a forecast encompassing test dates back to Theil (1966). Chong and Hendry (1986) and Fair and Shiller (1989, 1990) provide further theoretical exposition of such methods for testing forecast efficiency. The first forecast is said to subsume information contained in the other forecasts if these additional forecasts do not significantly increase the adjusted regression R². Alternatively, an orthogonality test may be conducted by regressing the residuals from (2.3) on the other forecasts. If these forecasts are orthogonal, i.e. do not contain additional information, then the regression coefficients will not be different from zero.
While it is useful to have an unbiased forecast, it is important to distinguish between bias and predictive power. A biased forecast can have predictive power if the bias can be corrected. An unbiased forecast is useless if all forecast errors are big. For X̂_i to be considered a good forecast, Var(υ_t) should be small and the R² for the regression should tend to 100%. Blair, Poon and Taylor (2001) use the proportion of explained variability, P, to measure explanatory power:

P = 1 − Σ(X_i − X̂_i)² / Σ(X_i − μ_X)². (2.5)

The ratio on the right-hand side of (2.5) compares the sum of squared prediction errors (assuming α = 0 and β = 1 in (2.3)) with the sum of squared variation of X_i. P compares the amount of variation in the forecast errors with that in actual volatility. If prediction errors are small, P is close to 1. Given that the regression model that produces (2.5) is more restrictive than (2.3), P is likely to be smaller than the conventional R². P can even be negative, since the ratio on the right-hand side of (2.5) can be greater than 1. A negative P means that the forecast errors have a greater amount of variation than the actual volatility, which is not a desirable characteristic for a well-behaved forecasting model.
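The P measure in (2.5) is a one-liner; a minimal sketch (function name hypothetical):

```python
import numpy as np

def proportion_explained(actual, forecast):
    """P = 1 - sum (X_i - Xhat_i)^2 / sum (X_i - mean(X))^2, eq. (2.5).
    P can be negative when the forecast errors vary more than X itself."""
    X = np.asarray(actual, float)
    Xhat = np.asarray(forecast, float)
    return 1.0 - np.sum((X - Xhat) ** 2) / np.sum((X - X.mean()) ** 2)
```

A perfect forecast gives P = 1; forecasting the unconditional mean gives P = 0; a forecast noisier than the series itself gives P < 0.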

In all forecast evaluations, it is important to distinguish between in-sample and out-of-sample forecasts. An in-sample forecast, which is based on parameters estimated using all data in the sample, implicitly assumes that parameter estimates are stable across time. In practice, time variation of parameter estimates is a critical issue in forecasting. A good forecasting model should be one that can withstand out-of-sample testing, a test design that is closer to reality.
Instead of striving to make some statistical inference, model performance could be judged on some measure of economic significance. Examples of such an approach include portfolio improvement derived from better volatility forecasts (Fleming, Kirby and Ostdiek, 2000, 2002). Some papers test forecast accuracy by measuring the impact on option pricing errors (Karolyi, 1993). In the latter case, pricing error in the option model will be cancelled out when the option implied volatility is reintroduced into the pricing formula. So it is not surprising that evaluations which involve comparing option pricing errors often prefer the implied volatility method to all other time series methods.
Research in financial market volatility has concentrated on modelling and less on forecasting. Work on combined forecasts is rare, probably because the groups of researchers working on time series models and on option pricing do not seem to mix. What has not yet been done in the literature is to separate the forecasting period into 'normal' and 'exceptional' periods. It is conceivable that different forecasting methods are better suited to different trading environments and economic conditions.
Historical Volatility Models

Compared with the other types of volatility models, the historical volatility models (HIS) are the easiest to manipulate and construct. The well-known RiskMetrics EWMA (exponentially weighted moving average) model from JP Morgan is a form of historical volatility model; so are models that build directly on realized volatility, which have become very popular in the last few years. Historical volatility models have been shown to have good forecasting performance compared with other time series volatility models. Unlike the other two types of time series models (viz. ARCH and stochastic volatility (SV)), conditional volatility is modelled separately from returns in the historical volatility models, and hence they are less restrictive and respond more readily to changes in volatility dynamics. Studies that find historical volatility models forecast better than ARCH and/or SV models include Taylor (1986, 1987), Figlewski (1997), Figlewski and Green (1999), Andersen, Bollerslev, Diebold and Labys (2001) and Taylor, J. (2004). With the increased availability of intraday data, we can expect research on the realized volatility variant of the historical model to intensify in the next few years.

Unlike ARCH and SV models, where returns are the main input, HIS models do not normally use return information so long as the volatility estimates are ready at hand. Take the simplest form of ARCH(1), for example:

r_t = µ + ε_t,  ε_t = z_t σ_t,  z_t ~ N(0, 1), (3.1)
σ²_t = ω + α₁ ε²_{t−1}. (3.2)

The conditional volatility σ²_t in (3.2) is modelled as a 'byproduct' of the return equation (3.1). The estimation is done by maximizing the likelihood of observing {ε_t} using the normal, or other chosen, density. The construction and estimation of SV models are similar to those of ARCH, except that there is now an additional innovation term in (3.2).

In contrast, the HIS model is built directly on conditional volatility, e.g. an AR(1) model:

σ_t = γ + β₁ σ_{t−1} + υ_t. (3.3)

The parameters γ and β₁ are estimated by minimizing the in-sample forecast errors, υ_t, where

υ̂_t = σ_t − γ̂ − β̂₁ σ_{t−1},

and the forecaster has the choice of minimizing mean square errors, mean absolute errors, etc., as in the case of choosing an appropriate forecast error statistic in Section 2.2.
The historical volatility estimates σ_t in (3.3) can be calculated as sample standard deviations if there are sufficient data for each interval t. If there is not sufficient information, then the H-L method of Section 1.3.2 may be used, and in the most extreme case, where only one observation is available for each interval t, one often resorts to using the absolute return as a proxy for volatility at t. In Section 1.3.1 we highlighted the danger of using daily absolute or squared returns to proxy 'actual' daily volatility for the purpose of forecast evaluation, as this could lead to very misleading model rankings. The problem with the use of daily absolute returns in volatility modelling is less severe provided that long distributed lags are included (Nelson, 1992; Nelson and Foster, 1995). With the increased availability of intraday data, historical volatility estimates can be calculated quite accurately as realized volatility, following Section 1.3.3.

There are now two major types of HIS models: the single-state and the regime-switching models. The various HIS models differ in the number of lagged volatility terms included and in the weights assigned to them, reflecting the tradeoff between using a larger amount of information and giving more weight to the most recent information.

3.2.1 Single-state historical volatility models
The simplest historical volatility model is the random walk model, where the difference between consecutive-period volatilities is modelled as random noise:

σ_t = σ_{t−1} + v_t.

So the best forecast for tomorrow's volatility is today's volatility:

σ̂_{t+1} = σ_t,

where σ_t alone is used as the forecast for σ_{t+1}.
In contrast, the historical average method makes a forecast based on the entire history:

σ̂_{t+1} = (σ_t + σ_{t−1} + · · · + σ₁) / t.

The simple moving average method below,

σ̂_{t+1} = (σ_t + σ_{t−1} + · · · + σ_{t−τ+1}) / τ,

is similar to the historical average method, except that older information is discarded. The value of τ (i.e. the lag length of past information used) could be chosen subjectively or based on minimizing the in-sample forecast error, ξ_{t+1} = σ_{t+1} − σ̂_{t+1}. The multi-period forecasts σ̂_{t+τ} for τ > 1 are the same as the one-step-ahead forecast σ̂_{t+1} for all three methods above.
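The three forecasts above can be sketched in a few lines. This is an illustrative sketch (function name hypothetical), taking a history of volatility estimates as input:

```python
import numpy as np

def naive_forecasts(sigma, tau=5):
    """One-step-ahead forecasts of sigma_{t+1} from a history sigma_1..sigma_t:
    random walk, historical average, and a simple moving average over the
    last tau observations (older information discarded)."""
    s = np.asarray(sigma, dtype=float)
    rw = s[-1]                    # random walk: today's volatility
    hist = s.mean()               # historical average: entire history
    ma = s[-tau:].mean()          # moving average: last tau observations
    return rw, hist, ma
```

For τ > 1 all three methods simply repeat the same one-step-ahead value, as noted in the text.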
The exponential smoothing method below,

σ_t = (1 − β) σ_{t−1} + β σ̂_{t−1} + ξ_t,  0 ≤ β ≤ 1,
σ̂_{t+1} = (1 − β) σ_t + β σ̂_t,

is similar to the historical average method, but more weight is given to the recent past and less weight to the distant past. The smoothing parameter β is estimated by minimizing the in-sample forecast errors ξ_t.
The exponentially weighted moving average (EWMA) method below is the moving average method with exponential weights:

σ̂_{t+1} = Σ_{i=1}^{τ} β^{i−1} σ_{t−i+1} / Σ_{i=1}^{τ} β^{i−1}.

Again, the smoothing parameter β is estimated by minimizing the in-sample forecast errors ξ_t. The JP Morgan RiskMetrics™ model is a procedure that uses the EWMA method.
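The EWMA forecast can be sketched as a normalized weighted sum. This is a sketch under one discounting convention (weights β^(i−1) on σ_{t−i+1}); other conventions shift the exponent by one, and the function name is hypothetical.

```python
import numpy as np

def ewma_forecast(sigma, beta, tau):
    """EWMA forecast of sigma_{t+1}: geometric weights beta^(i-1) on
    sigma_{t-i+1}, i = 1..tau, normalized so the weights sum to one."""
    s = np.asarray(sigma, dtype=float)[-tau:][::-1]   # most recent first
    w = beta ** np.arange(tau)                        # 1, beta, beta^2, ...
    return np.sum(w * s) / np.sum(w)
```

With β = 1 the forecast reduces to the simple moving average; smaller β discounts older observations more heavily.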
All the historical volatility models above have a fixed weighting scheme or a weighting scheme that follows some declining pattern. Other types of historical model have weighting schemes that are not prespecified. The simplest of these is the simple regression method,

σ_t = γ + β₁ σ_{t−1} + β₂ σ_{t−2} + · · · + β_n σ_{t−n} + υ_t,
σ̂_{t+1} = γ̂ + β̂₁ σ_t + β̂₂ σ_{t−1} + · · · + β̂_n σ_{t−n+1},
which expresses volatility as a function of its past values and an error term. The simple regression method is principally autoregressive. If past volatility errors are also included, one gets the ARMA model

σ̂_{t+1} = β₁ σ_t + β₂ σ_{t−1} + · · · + γ₁ υ_t + γ₂ υ_{t−1} + · · · .

Introducing a differencing order I(d), we get the ARIMA model when d = 1 and ARFIMA when d < 1.

3.2.2 Regime switching and transition exponential smoothing
In this section, we have the threshold autoregressive model from Cao and Tsay (1992):

σ_t = φ₀^(i) + φ₁^(i) σ_{t−1} + · · · + φ_p^(i) σ_{t−p} + v_t,  i = 1, 2, . . . , k,
σ̂_{t+1} = φ̂₀^(i) + φ̂₁^(i) σ_t + · · · + φ̂_p^(i) σ_{t+1−p},

where the thresholds separate volatility into states, with independent simple regression models and noise processes in each state. The prediction σ̂_{t+1} could be based solely on current-state information i, assuming the future will remain in the current state. Alternatively, it could be based on information from all states weighted by the transition probability of each state. Cao and Tsay (1992) found the threshold autoregressive model outperformed EGARCH and GARCH in forecasting the 1- to 30-month volatility of the S&P value-weighted index. EGARCH provided better forecasts for the S&P equally weighted index, possibly because the equally weighted index gives more weight to small stocks, where the leverage effect could be more important.
The smooth transition exponential smoothing model is from Taylor, J. (2004):

σ̂²_t = α_{t−1} ε²_{t−1} + (1 − α_{t−1}) σ̂²_{t−1},
α_{t−1} = 1 / (1 + exp(β + γ V_{t−1})),

where V_{t−1} = a ε_{t−1} + b |ε_{t−1}| is the transition variable. The smoothing parameter α_{t−1} varies between 0 and 1, and its value depends on the size and the sign of ε_{t−1}. The dependence on ε_{t−1} means that multi-step-ahead forecasts cannot be made except through simulation. (The same applies to many nonlinear ARCH and SV models, as we will show in the next few chapters.)
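The smooth transition recursion can be sketched as a one-pass filter. This is a minimal sketch after Taylor, J. (2004), with the parameterization as given in the text; the function name and argument names are hypothetical, and in practice the parameters are estimated by minimizing in-sample forecast errors.

```python
import numpy as np

def stes_filter(eps, beta, gamma, a, b, h0):
    """Smooth transition exponential smoothing for variance:
    hhat_t = alpha_{t-1} * eps_{t-1}^2 + (1 - alpha_{t-1}) * hhat_{t-1},
    alpha_{t-1} = 1 / (1 + exp(beta + gamma * V_{t-1})),
    V_{t-1} = a * eps_{t-1} + b * |eps_{t-1}| (the transition variable).
    Returns the sequence of variance forecasts starting from h0."""
    eps = np.asarray(eps, dtype=float)
    h = np.empty(eps.size + 1)
    h[0] = h0
    for t in range(eps.size):
        V = a * eps[t] + b * abs(eps[t])
        alpha = 1.0 / (1.0 + np.exp(beta + gamma * V))
        h[t + 1] = alpha * eps[t] ** 2 + (1 - alpha) * h[t]
    return h
```

With γ = 0 the smoothing weight is constant and the filter collapses to ordinary exponential smoothing of squared shocks.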

One-day-ahead forecasting results show that the smooth transition exponential smoothing model performs very well against several ARCH counterparts and even outperformed, on a few occasions, the realized volatility forecast. But these rankings were not tested for statistical significance, so it is difficult to come to a conclusion given the closeness of many of the error statistics reported.

Taylor (1987) was one of the earliest studies to test time-series volatility forecasting models before ARCH/GARCH permeated the volatility literature. Taylor (1987) used extreme-value estimates based on high, low and closing prices to forecast 1- to 20-day DM/$ futures volatility and found that a weighted average composite forecast performed best. Wiggins (1992) also gave support to extreme-value volatility estimators.
In the pre-ARCH era, there were many studies that covered a wide range of issues. Sometimes forecasters would introduce 'learning' by allowing the parameters and weights of combined forecasts to be dynamically updated. These frequent updates did not always lead to better results, however. Dimson and Marsh (1990) found that ex ante time-varying optimized weighting schemes do not always work well in out-of-sample forecasts. Sill (1993) found S&P500 volatility was higher during recessions and that the commercial paper-T-Bill spread helped to predict stock-market volatility.
The random walk and historical average methods seem naive at first, but they seem to work very well for medium- and long-horizon forecasts.
For forecast horizons that are longer than 6 months, low-frequency data over a period at least as long as the forecast horizon work best. To provide equity volatility for investment over a 5-year period, for example, Alford and Boatsman (1995) recommended, after studying a sample of 6879 stocks, that volatility should be estimated from weekly or monthly returns over the previous 5 years, with adjustments made based on industry and company size. Figlewski (1997) analysed the volatility of the S&P500, the long- and short-term US interest rates and the Deutschemark-dollar exchange rate, and found that the use of monthly data over a long period provides the best long-horizon forecast. Alford and Boatsman (1995), Figlewski (1997) and Figlewski and Green (1999) all stressed the importance of having a long enough estimation period to make good volatility forecasts over a long horizon.

Financial market volatility is known to cluster. A volatile period tends to persist for some time before the market returns to normality. The ARCH (AutoRegressive Conditional Heteroscedasticity) model proposed by Engle (1982) was designed to capture volatility persistence in inflation. The ARCH model was later found to fit many financial time series, and its widespread impact on finance has led to the Nobel Committee's recognition of Rob Engle's work in 2003. The ARCH effect has been shown to lead to high kurtosis, which fits in well with the empirically observed tail thickness of many asset return distributions. The leverage effect, a phenomenon related to high volatility brought on by negative returns, is often modelled with a sign-based return variable in the conditional volatility equation.

4.1 ENGLE (1982)
The ARCH model, first introduced by Engle (1982), has been extended by many researchers and extensively surveyed in Bera and Higgins (1993), Bollerslev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994) and Diebold and Lopez (1995). In contrast to the historical volatility models described in the previous chapter, ARCH models do not make use of the past standard deviations, but formulate conditional variance, h_t, of asset returns via maximum likelihood procedures. (We follow the ARCH literature here by writing σ²_t = h_t.) To illustrate this, first write returns, r_t, as

r_t = µ + ε_t,
ε_t = √h_t z_t, (4.1)

where z_t ~ D(0, 1) is a white noise. The distribution D is often taken as normal. The process z_t is scaled by √h_t, where the conditional variance h_t is a function of past squared residual returns. In the ARCH(q) process proposed by Engle (1982),

h_t = ω + Σ_{j=1}^{q} α_j ε²_{t−j} (4.2)

with ω > 0 and α_j ≥ 0 to ensure that h_t is strictly positive. Typically, q is of high order because of the phenomenon of volatility persistence in financial markets. From the way in which volatility is constructed in (4.2), h_t is known at time t − 1, so the one-step-ahead forecast is readily available. Multi-step-ahead forecasts can be formulated by assuming E[ε²_{t+τ}] = h_{t+τ}.

The unconditional variance of r_t is

σ² = ω / (1 − Σ_{j=1}^{q} α_j).

The process is covariance stationary if and only if the sum of the autoregressive parameters is less than one, Σ_{j=1}^{q} α_j < 1.

For a high-order ARCH(q) process, it is more parsimonious to model volatility as a GARCH(p, q) (generalized ARCH, due to Bollerslev (1986) and Taylor (1986)), where additional dependencies are permitted on p lags of past h_t as shown below:

h_t = ω + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{j=1}^{q} α_j ε²_{t−j},

with ω > 0. For GARCH(1, 1), the constraints α₁ ≥ 0 and β₁ ≥ 0 are needed to ensure that h_t is strictly positive. For higher orders of GARCH, the constraints on β_i and α_j are more complex (see Nelson and Cao (1992) for details). The unconditional variance equals

σ² = ω / (1 − Σ_{i=1}^{p} β_i − Σ_{j=1}^{q} α_j).

The GARCH(p, q) model is covariance stationary if and only if Σ_{i=1}^{p} β_i + Σ_{j=1}^{q} α_j < 1.
Volatility forecasts from GARCH(1, 1) can be made by repeated substitution. First, we make use of the relationship (4.1) to provide an estimate of the expected squared residuals:

E[ε²_t] = h_t E[z²_t] = h_t.
Arch 39

The conditional variance h_{t+1}, which is the one-step-ahead forecast, is known at time t:

h_{t+1} = ω + α₁ ε²_t + β₁ h_t. (4.3)

The forecast of h_{t+2} makes use of the fact that E[ε²_{t+1}] = h_{t+1}, and we have

ĥ_{t+2} = ω + α₁ E[ε²_{t+1}] + β₁ h_{t+1}
  = ω + (α₁ + β₁) h_{t+1}.

Similarly,

ĥ_{t+3} = ω + (α₁ + β₁) ĥ_{t+2}
  = ω + ω(α₁ + β₁) + (α₁ + β₁)² h_{t+1}
  = ω + ω(α₁ + β₁) + ω(α₁ + β₁)² + (α₁ + β₁)² (α₁ ε²_t + β₁ h_t).
As the forecast horizon τ lengthens,

ĥ_{t+τ} = ω [1 − (α₁ + β₁)^τ] / [1 − (α₁ + β₁)] + (α₁ + β₁)^{τ−1} (α₁ ε²_t + β₁ h_t). (4.4)

If α₁ + β₁ < 1, the second term on the RHS of (4.4) dies out eventually, and ĥ_{t+τ} converges to ω / [1 − (α₁ + β₁)], the unconditional variance.
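The multi-step forecast can be computed either from the closed form (4.4) or by iterating (4.3) with future squared shocks replaced by their conditional expectations; both are sketched below (function names hypothetical), and agreement between the two serves as a check.

```python
def garch11_forecast(omega, alpha1, beta1, eps_t, h_t, tau):
    """Closed-form GARCH(1,1) forecast, eq. (4.4):
    hhat_{t+tau} = omega*(1 - s**tau)/(1 - s) + s**(tau-1)*(alpha1*eps_t**2 + beta1*h_t),
    with s = alpha1 + beta1 < 1; converges to omega/(1 - s) as tau grows."""
    s = alpha1 + beta1
    return (omega * (1 - s ** tau) / (1 - s)
            + s ** (tau - 1) * (alpha1 * eps_t ** 2 + beta1 * h_t))

def garch11_forecast_recursive(omega, alpha1, beta1, eps_t, h_t, tau):
    """Same forecast by repeated substitution: one step of eq. (4.3),
    then hhat_{t+j} = omega + (alpha1 + beta1) * hhat_{t+j-1}."""
    h = omega + alpha1 * eps_t ** 2 + beta1 * h_t
    for _ in range(tau - 1):
        h = omega + (alpha1 + beta1) * h
    return h
```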
If we write υ_t = ε²_t − h_t and substitute h_t = ε²_t − υ_t into (4.3), we get

ε²_t − υ_t = ω + α₁ ε²_{t−1} + β₁ ε²_{t−1} − β₁ υ_{t−1},
ε²_t = ω + (α₁ + β₁) ε²_{t−1} + υ_t − β₁ υ_{t−1}. (4.5)

Hence the squared residual returns, ε²_t, follow an ARMA process with autoregressive parameter (α₁ + β₁). If α₁ + β₁ is close to 1, the autoregressive process in (4.5) dies out slowly.

For a GARCH(p, q) process, when Σ_{i=1}^{p} β_i + Σ_{j=1}^{q} α_j = 1, the unconditional variance σ² is no longer finite. The series r_t is not covariance stationary, although it remains strictly stationary and ergodic. The conditional variance is then described as an integrated GARCH (denoted IGARCH), and there is no finite fourth moment.¹

¹ This is not the same as, and should not be confused with, the 'integrated volatility' described in Section 1.3.3.

An infinite volatility is a concept rather counterintuitive to real phenomena in economics and finance. Empirical findings suggest that GARCH(1, 1) is the most popular structure for many financial time series. It turns out that the RiskMetrics™ EWMA (exponentially weighted moving average) model is a nonstationary version of GARCH(1, 1), in which the persistence parameters, α₁ and β₁, sum to 1. To see the parallel, we first make repeated substitutions of (4.3) and obtain

h_{t+2} = ω + α ε²_{t+1} + β h_{t+1}
  = ω + ωβ + α ε²_{t+1} + αβ ε²_t + β² h_t,

and in general

h_{t+τ} = ω Σ_{i=1}^{τ} β^{i−1} + α Σ_{i=1}^{τ} β^{i−1} ε²_{t+τ−i} + β^τ h_t.

When τ → ∞, and provided that β < 1, we can infer that

h_t = ω / (1 − β) + α Σ_{i=1}^{∞} β^{i−1} ε²_{t−i}. (4.6)

Next, we have the EWMA model for the sample standard deviations:

σ̂²_t = (σ²_{t−1} + λ σ²_{t−2} + · · · + λ^{n−1} σ²_{t−n}) / (1 + λ + λ² + · · · + λ^{n−1}).

As n → ∞, and provided that λ < 1,

σ̂²_t = (1 − λ) Σ_{i=1}^{∞} λ^{i−1} σ²_{t−i}. (4.7)

If we view ε²_t as a proxy for σ²_t, (4.6) and (4.7) are both autoregressive series with long distributed lags, except that (4.6) has a constant term and (4.7) does not.²
While intuitively unconvincing as a volatility process because of the infinite variance, the EWMA model has nevertheless been shown to be powerful in volatility forecasting, as it is not constrained by a mean level of volatility (unlike, e.g., the GARCH(1, 1) model), and hence it adjusts readily to changes in unconditional volatility.

² EWMA, a sample standard deviation model, is usually estimated by minimizing in-sample forecast errors. There is no volatility error in the GARCH conditional variance. This is why σ̂² in (4.7) has a hat and h_t in (4.6) does not.
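The EWMA-as-IGARCH parallel can be sketched as a one-line recursion, h_t = (1 − λ)ε²_{t−1} + λh_{t−1}, i.e. a GARCH(1,1) with ω = 0 and α₁ + β₁ = (1 − λ) + λ = 1. A minimal sketch follows (function name hypothetical; λ = 0.94 is the usual daily RiskMetrics setting).

```python
import numpy as np

def riskmetrics_variance(returns, lam=0.94, h0=None):
    """EWMA variance recursion h_t = (1 - lam)*eps_{t-1}^2 + lam*h_{t-1}:
    a nonstationary (integrated) GARCH(1,1) with no constant term, so the
    forecast is not pulled back towards any unconditional variance level."""
    eps = np.asarray(returns, dtype=float)
    h = np.empty(eps.size + 1)
    h[0] = np.var(eps) if h0 is None else h0   # seed value for the recursion
    for t in range(eps.size):
        h[t + 1] = (1 - lam) * eps[t] ** 2 + lam * h[t]
    return h
```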

The exponential GARCH (denoted EGARCH) model is due to Nelson (1991). The EGARCH(p, q) model specifies conditional variance in logarithmic form, which means that there is no need to impose an estimation constraint in order to avoid negative variance:

ln h_t = α₀ + Σ_{j=1}^{p} β_j ln h_{t−j} + Σ_{k=1}^{q} [θ_k z_{t−k} + γ_k (|z_{t−k}| − √(2/π))],
z_t = ε_t / √h_t.

Here, h_t depends on both the size and the sign of ε_t. With appropriate conditioning of the parameters, this specification captures the stylized fact that a negative shock leads to a higher conditional variance in the subsequent period than a positive shock. The process is covariance stationary if and only if Σ_{j=1}^{p} β_j < 1.
Forecasting with EGARCH is a bit involved because of the logarithmic transformation. Tsay (2002) showed how forecasts can be formulated for EGARCH(1, 0) and gave the one-step-ahead forecast as

ĥ_{t+1} = h_t^{2α₁} exp[(1 − α₁) α₀] exp[g(z_t)],
g(z_t) = θ z_t + γ (|z_t| − √(2/π)).

For the multi-step forecast,

ĥ_{t+τ} = ĥ_{t+τ−1}^{2α₁} exp(ω) [exp(0.5 (θ + γ)²) Φ(θ + γ) + exp(0.5 (θ − γ)²) Φ(θ − γ)],
ω = (1 − α₁) α₀ − γ √(2/π),

where Φ(·) is the cumulative distribution function of the standard normal distribution.

Models that also allow for nonsymmetrical dependencies include the GJR-GARCH model (Glosten, Jagannathan and Runkle, 1993), as shown below:

h_t = ω + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{j=1}^{q} [α_j ε²_{t−j} + δ_j D_{t−j} ε²_{t−j}],

D_{t−1} = 1 if ε_{t−1} < 0, and 0 if ε_{t−1} ≥ 0.

The conditional volatility is positive when the parameters satisfy ω > 0, α_j ≥ 0, α_j + δ_j ≥ 0 and β_i ≥ 0, for j = 1, · · · , q and i = 1, · · · , p. The process is covariance stationary if and only if

Σ_{i=1}^{p} β_i + Σ_{j=1}^{q} α_j + ½ Σ_{j=1}^{q} δ_j < 1.

Take the GJR-GARCH(1, 1) case as an example. The one-step-ahead forecast is

ĥ_{t+1} = ω + β₁ h_t + α₁ ε²_t + δ₁ ε²_t D_t,

and the multi-step forecast is

ĥ_{t+τ} = ω + (α₁ + ½ δ₁ + β₁) ĥ_{t+τ−1},

using repeated substitution for ĥ_{t+τ−1}.
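The GJR(1,1) forecast recursion can be sketched as follows. This is an illustrative sketch assuming symmetric shocks, so that in multi-step forecasts the sign dummy is replaced by its expectation 1/2, a common convention; the function name is hypothetical.

```python
def gjr11_forecast(omega, alpha1, delta1, beta1, eps_t, h_t, tau):
    """GJR-GARCH(1,1) forecasts: the first step uses the observed sign dummy
    D_t = 1 if eps_t < 0, while further steps use E[D] = 1/2 (symmetric
    shocks assumed), giving
    hhat_{t+tau} = omega + (alpha1 + delta1/2 + beta1) * hhat_{t+tau-1}."""
    D = 1.0 if eps_t < 0 else 0.0
    h = omega + beta1 * h_t + alpha1 * eps_t ** 2 + delta1 * eps_t ** 2 * D
    for _ in range(tau - 1):
        h = omega + (alpha1 + 0.5 * delta1 + beta1) * h
    return h
```

Setting delta1 = 0 recovers the plain GARCH(1,1) forecast recursion.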
The TGARCH (threshold GARCH) model from Zakoïan (1994) is similar to GJR-GARCH but is formulated with absolute returns instead:

σ_t = α₀ + Σ_{i=1}^{p} (α_i |ε_{t−i}| + γ_i D_{t−i} |ε_{t−i}|) + Σ_{j=1}^{q} β_j σ_{t−j}. (4.8)

The conditional volatility is positive when α₀ > 0, α_i ≥ 0, α_i + γ_i ≥ 0 and β_j ≥ 0, for i = 1, · · · , p and j = 1, · · · , q. The process is covariance stationary, in the case p = q = 1, if and only if

β₁² + ½ α₁² + ½ (α₁ + γ₁)² + √(2/π) β₁ (2α₁ + γ₁) < 1.
QGARCH (quadratic GARCH) and various other nonlinear GARCH models are reviewed in Franses and van Dijk (2000). A QGARCH(1, 1) model has the following structure:

h_t = ω + α (ε_{t−1} − γ)² + β h_{t−1}.

Taylor (1986) was one of the earliest studies to test the predictive power of GARCH, but Akgiray (1989) is more commonly cited in many subsequent GARCH studies. In the following decade, there were no fewer than 20 papers testing GARCH predictive power against other time series methods and against option implied volatility forecasts. The majority of these forecast the volatility of major stock indices and exchange rates.
The ARCH class of models, and their variants, have many supporters. Akgiray finds GARCH consistently outperforms EWMA and RW in all subperiods and under all evaluation measures. Pagan and Schwert (1990) find EGARCH is best, especially in contrast to some nonparametric methods. Despite a low R², Cumby, Figlewski and Hasbrouck (1993) conclude that EGARCH is better than RW. Figlewski (1997) finds GARCH superiority confined to the stock market and to forecasting volatility over short horizons only.
In general, models that allow for volatility asymmetry come out well in the forecasting contest because of the strong negative relationship between volatility and shocks. Cao and Tsay (1992), Heynen and Kat (1994), Lee (1991) and Pagan and Schwert (1990) favour the EGARCH model for volatility of stock indices and exchange rates, whereas Brailsford and Faff (1996) and Taylor, J. (2004) find GJR-GARCH outperforms GARCH for stock indices. Bali (2000) finds a range of nonlinear models works well for forecasting one-week-ahead volatility of US T-Bill yields. Cao and Tsay (1992) find the threshold autoregressive model (TAR in the previous chapter) provides the best forecast for large stocks and EGARCH gives the best forecast for small stocks, and they suspect that the latter might be due to a leverage effect.
Other studies find no clear-cut result. These include Lee (1991), West and Cho (1995), Brailsford and Faff (1996), Brooks (1998), and McMillan, Speight and Gwilym (2000). All these studies (and many other volatility forecasting studies) share one or more of the following characteristics: (i) they test a large number of very similar models, all designed to capture volatility persistence; (ii) they use a large number of forecast error statistics, each of which has a very different loss function; (iii) they forecast and calculate error statistics for variance and not standard deviation, which makes the differences between forecasts of different models even smaller; (iv) they use squared daily, weekly or monthly returns to proxy daily, weekly or monthly 'actual' volatility, which results in extremely noisy 'actual' volatility estimates. The noise in the 'actual' volatility estimates makes the small differences between forecasts of similar models indistinguishable.
Unlike the ARCH class of models, the 'simpler' methods, including the EWMA method, do not separate volatility persistence from volatility shocks, and most of them do not incorporate volatility mean reversion. The 'simpler' methods tend to provide larger volatility forecasts most of the time because there is no constraint on stationarity or convergence to the unconditional variance, and this may result in larger forecast errors and less frequent VaR violations. The GJR model allows the volatility persistence to change relatively quickly when the return switches sign from positive to negative and vice versa. If the unconditional volatility of all parametric volatility models is the same, then GJR will have the largest probability of an underforecast.³ This possibly explains why GJR was the worst-performing model in Franses and van Dijk (1996), because they used MedSE (median standard error) as their sole evaluation criterion. In Brailsford and Faff (1996), the GJR(1, 1) model outperforms the other models when MAE, RMSE and MAPE are used.
There is some merit in using 'simpler' methods, and especially models that include long distributed lags. As ARCH class models assume variance stationarity, their forecasting performance suffers when there are changes in volatility level. Parameter estimation becomes unstable when the data period is short or when there is a change in volatility level. This has led to a GARCH convergence problem in several studies (e.g. Tse and Tung (1992) and Walsh and Tsou (1998)). Taylor (1986), Tse (1991), Tse and Tung (1992), Boudoukh, Richardson and Whitelaw (1997), Walsh and Tsou (1998), Ederington and Guan (1999), Ferreira (1999), and Taylor, J. (2004) all favour some form of exponential smoothing method over GARCH for forecasting the volatility of a wide range of assets, from equities and exchange rates to interest rates.
³ This characteristic is clearly evidenced in Table 2 of Brailsford and Faff (1996). The GJR(1, 1) model underforecasts 76 (out of 90) times. The RW model has an equal chance of underforecasting and overforecasting, whereas all the other methods overforecast more than 50 (out of 90) times.
Linear and Nonlinear Long Memory Models

As mentioned before, volatility persistence is a feature that many time series models are designed to capture. A GARCH model features an exponential decay in the autocorrelation of conditional variances. However, it has been noted that squared and absolute returns of financial assets typically have serial correlations that are slow to decay, similar to those of an I(d) process. A shock in the volatility series seems to have very 'long memory' and to impact on future volatility over a long horizon. The integrated GARCH (IGARCH) model of Engle and Bollerslev (1986) captures this effect, but a shock in this model impacts upon future volatility over an infinite horizon, and the unconditional variance does not exist for this model.
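The contrast between the exponential decay implied by GARCH and the slow decay observed in squared and absolute returns can be inspected numerically. The sketch below simulates a GARCH(1,1) series and computes sample autocorrelations of |r_t| and r²_t; the parameter values and function name are hypothetical.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# Simulate a short-memory GARCH(1,1) return series and inspect how the
# autocorrelations of |r_t| and r_t^2 decay with the lag.
rng = np.random.default_rng(1)
omega, a1, b1 = 0.05, 0.05, 0.9
h, eps2, r = 1.0, 1.0, []
for _ in range(5000):
    h = omega + a1 * eps2 + b1 * h          # conditional variance recursion
    ret = np.sqrt(h) * rng.standard_normal()
    eps2 = ret ** 2
    r.append(ret)
r = np.asarray(r)
rho_abs = autocorr(np.abs(r), 50)
rho_sq = autocorr(r ** 2, 50)
```

For real asset returns, the corresponding autocorrelations typically decay much more slowly (hyperbolically), which is the long memory signature discussed below.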

Let ρ_τ denote the correlation between x_t and x_{t−τ}. The time series x_t is said to have a short memory if Σ_{τ=1}^{n} ρ_τ converges to a constant as n becomes large. A long memory series has autocorrelation coefficients that decline slowly, at a hyperbolic rate. Long memory in volatility occurs when the effects of volatility shocks decay slowly, which is often detected in the autocorrelation of measures of volatility, such as absolute or squared returns. A long memory process is covariance stationary if Σ_{τ=1}^{n} ρ_τ / τ^{2d−1}, for some positive d < ½, converges to a constant as n → ∞. When d ≥ ½, the volatility series is not covariance stationary, although it is still strictly stationary. Taylor (1986) was the first to note
that autocorrelation of absolute returns, |rt |, is slow to decay compared
with that of rt2 . The highly popular GARCH model is a short memory
model based on squared returns rt2 . Following the work of Granger and
Joyeux (1980) and Hosking (1981), where fractionally integrated se-
ries was shown to exhibit long memory property described above, Ding,
Granger and Engle (1993) propose a fractionally integrated model based
on |rt |d where d is a fraction. The whole issue of Journal of Economet-
rics, 1996, vol. 73, no. 1, edited by Richard Baillie and Maxwell King
46 Forecasting Financial Market Volatility

was devoted to long memory and, in particular, fractionally integrated models.
There has been a lot of research investigating whether long memory of
volatility can help to make better volatility forecasts and explain anomalies
in option prices. Hitherto, much of this research has used the
fractionally integrated models described in Section 5.3. More recently, several
studies have shown that a number of nonlinear short memory volatility
models are capable of producing spurious long memory characteristics
in volatility as well. Examples of such nonlinear models include the
break model (Granger and Hyung, 2004), the volatility component model
(Engle and Lee, 1999), and the regime-switching model (Hamilton and
Susmel, 1994; Diebold and Inoue, 2001). In these three models, volatility
has short memory between breaks, for each volatility component,
and within each regime. Without controlling for the breaks, the different
components and the changing regimes, volatility will produce spurious
long memory characteristics. Each of these short memory nonlinear
models provides a richer interpretation of the financial market volatility
structure compared with the apparently myopic fractionally integrated
model, which simply requires financial market participants to remember
and react to shocks for a long time. Discussion of these competing
models is provided in Section 5.4.

The long memory characteristic of ¬nancial market volatility is well
known and has important implications for volatility forecasting and op-
tion pricing. Some evidence of long memory has already been presented
in Section 1.3. In Table 5.1, we present some statistics from a wider
range of assets and through simulation that we published in the Finan-
cial Analysts Journal recently. In the table, we report the sum of the ¬rst
1000 autocorrelation coef¬cients for a number of volatility proxies for
a selection of stock indices, stocks, exchange rates, interest rates and
commodities. We have also presented the statistics for GARCH(1, 1)
and GJR-GARCH(1, 1) series, both simulated using high volatility per-
sistence parameters. The statistics for the simulated series are in the
range of 0.478 to 2.308 while the empirical statistics are much higher.
As noted by Taylor (1986), the absolute return has a longer memory
than the square returns. This has been known as the ˜Taylor effect™.
But, taking logs or trimming the data by capping the values in the 0.1%
Long Memory Models 47

tails often lengthens the memory. This phenomenon continues to puzzle
volatility researchers.
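The statistic reported in Table 5.1 is straightforward to reproduce. The sketch below (an illustration only, not the code used to produce the table) simulates the GARCH(1,1) process given in the note to Table 5.1 and sums the first 1000 autocorrelation coefficients of |r| and r²:

```python
import numpy as np

def autocorr_sum(x, max_lag=1000):
    """Sum of the first max_lag autocorrelation coefficients of x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return sum(np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1))

def simulate_garch(n=10_000, omega=0.02, alpha=0.02, beta=0.96, seed=0):
    """Simulate r_t = sqrt(h_t) z_t with h_t = omega + alpha r_{t-1}^2 + beta h_{t-1}."""
    rng = np.random.default_rng(seed)
    h = omega / (1 - alpha - beta)       # start at the unconditional variance
    r = np.empty(n)
    for t in range(n):
        r[t] = np.sqrt(h) * rng.standard_normal()
        h = omega + alpha * r[t] ** 2 + beta * h
    return r

r = simulate_garch()
for label, proxy in [("|r|", np.abs(r)), ("r^2", r ** 2)]:
    print(label, round(autocorr_sum(proxy), 3))
```

Applied to the observed return series instead, the same function yields the empirical columns; long memory shows up as a sum far larger than the simulated short memory benchmark.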
The impact of volatility long memory on option pricing has been
studied in Bollerslev and Mikkelsen (1996, 1999), Taylor (2000) and
Ohanissian, Russell and Tsay (2003). The effect is best understood
analytically from the stochastic volatility option pricing model, which is
based on the stock price following the stochastic process below:

    dS_t = μ S_t dt + √v_t S_t dz_{s,t},
    dv_t = κ [θ − v_t] dt + σ_ν √v_t dz_{v,t},

which, in a risk-neutral option pricing framework, becomes

    dv_t = κ [θ − v_t] dt − λ v_t dt + σ_ν √v_t dz*_{v,t}
         = κ* [θ* − v_t] dt + σ_ν √v_t dz*_{v,t},                (5.1)

where v_t is the instantaneous variance, κ is the speed of mean reversion, θ
is the long-run level of volatility, σ_ν is the 'volatility of volatility', λ is the
market price of (volatility) risk, κ* = κ + λ and θ* = κθ/(κ + λ).
The two Wiener processes, dz_{s,t} and dz_{v,t}, have constant correlation ρ.
Here κ* is the risk-neutral mean-reverting parameter and θ* is the
risk-neutral long-run level of volatility. The parameters σ_ν and ρ implicit in
the risk-neutral process are the same as those in the real volatility process.
In the risk-neutral stochastic volatility process in (5.1), a low κ (or κ*)
corresponds to strong volatility persistence, volatility long memory and
high kurtosis. A fast mean-reverting volatility will reduce the impact of
stochastic volatility. The effect of low κ (or high volatility persistence)
is most pronounced when θ, the long-run level, is low but the initial
'instantaneous' volatility is high, as shown in the table below. The table
reports the kurtosis of the simulated distribution when κ = 0.1 and λ = ρ = 0.
When the correlation coefficient ρ is zero, the distribution is symmetrical
and has zero skewness.

v_t \ θ    0.05     0.1      0.15     0.2      0.25     0.3
0.1         5.90     4.45     3.97     3.73     3.58     3.48
0.2        14.61     8.80     6.87     5.90     5.32     4.94
0.3        29.12    16.06    11.71     9.53     8.22     7.35
0.4        49.44    26.22    18.48    14.61    12.29    10.74
0.5        75.56    39.28    27.19    21.14    17.51    15.09
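The kurtosis figures in the table can be approximated by simulation. The sketch below is a minimal Euler discretization of the square-root volatility process with ρ = λ = 0; the discretization scheme, step sizes and function names are our own illustrative choices, not those behind the table:

```python
import numpy as np

def heston_kurtosis(v0, theta, kappa=0.1, sigma_v=0.6, T=1.0,
                    n_steps=250, n_paths=50_000, seed=0):
    """Kurtosis of simulated log returns from an Euler-discretized
    square-root volatility process with zero correlation and zero risk premium."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    v = np.full(n_paths, v0)
    log_ret = np.zeros(n_paths)
    for _ in range(n_steps):
        z_s = rng.standard_normal(n_paths)
        z_v = rng.standard_normal(n_paths)   # rho = 0: independent shocks
        vp = np.maximum(v, 0.0)              # full truncation keeps v >= 0
        log_ret += -0.5 * vp * dt + np.sqrt(vp * dt) * z_s
        v = v + kappa * (theta - vp) * dt + sigma_v * np.sqrt(vp * dt) * z_v
    x = log_ret - log_ret.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2
```

With a high initial variance and a low long-run level (e.g. v0 = 0.2, θ = 0.05) the simulated return distribution is strongly leptokurtic, consistent with the pattern in the table.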
At low mean reversion κ, however, the option pricing impact crucially
depends on the initial volatility. Figure 5.1 below presents the
Black–Scholes implied volatility inverted from simulated option prices
Table 5.1 Sum of autocorrelation coefficients of the first 1000 lags for selected financial time series and simulated GARCH and GJR processes

                                           No. of obs   ρ(|r|)    ρ(r²)     ρ(ln|r|)   ρ(|Tr|)
Stock Market Indices:
USA S&P500 Composite                         9676        35.687     3.912    27.466     40.838
Germany DAX 30 Industrial                    9634        75.571    37.102    41.890     79.186
Japan NIKKEI 225 Stock Average               8443        89.559    23.405    84.257     95.789
France CAC 40                                8276        43.310    17.467    22.432     46.539
UK FTSE All Share and FTSE100                8714        30.817    12.615    18.394     33.199
Average STOCK INDICES                                    54.989    18.900    38.888     59.110
Stocks:
Cadbury Schweppes                            7418        48.607    19.236    85.288     50.235
Marks & Spencer Group                        7709        40.635    17.541    67.480     42.575
Shell Transport                              8115        38.947    20.078    44.711     40.035
FTSE Small Cap Index                         4437        25.381     3.712    35.152     28.533
Average STOCKS                                           38.392    15.142    58.158     40.344
Exchange Rates:
US $ to UK £                                 7942        56.308    24.652    84.717     57.432
Australian $ to UK £                         7859        32.657     0.052    72.572     48.241
Mexican Peso to UK £                         5394         9.545     1.501    13.760     14.932
Indonesian Rupiah to UK £                    2964        20.819     4.927    31.509     21.753
Average EXCHANGE RATES                                   29.832     7.783    50.640     35.589
Interest Rates:
US 1-month Eurodollar deposits               8491       281.799    20.782   327.770    331.877
UK Interbank 1-month                         7448        12.699     0.080    22.901     25.657
Venezuela PAR Brady Bond                     3279        19.236     9.944    32.985     19.800
South Korea Overnight Call                   2601        54.693    12.200    57.276     56.648
Average INTEREST RATES                                   92.107    10.752   110.233    108.496
Commodities:
Gold, Bullion, $/troy oz (London fixing)     6536       125.309    39.305   140.747    133.880
Silver Fix (LBM), cash cents/troy oz         7780        45.504     8.275    88.706     52.154
Brent Oil (1 month forward) $/barrel         2389        11.532     5.469     9.882     11.81
Average COMMODITIES                                      60.782    17.683    79.778     65.948
Average ALL                                              54.931    14.113    65.495     61.555
1000 simulated GARCH: mean                  10 000        1.045     1.206     0.478      1.033
  standard deviation                                     (1.099)   (1.232)   (0.688)    (1.086)
1000 simulated GJR: mean                    10 000        1.945     2.308     0.870      1.899
  standard deviation                                     (1.709)   (2.048)   (0.908)    (1.660)

Note: 'Tr' denotes trimmed returns, whereby returns in the 0.01% tails take the value of the 0.01% quantile. The simulated GARCH process is

    ε_t = z_t √h_t,  z_t ∼ N(0, 1),
    h_t = (1 − 0.96 − 0.02) + 0.96 h_{t−1} + 0.02 ε_{t−1}²,

and the simulated GJR process is

    ε_t = z_t √h_t,  z_t ∼ N(0, 1),
    h_t = (1 − 0.9 − 0.03 − 0.5 × 0.09) + 0.9 h_{t−1} + 0.03 ε_{t−1}² + 0.09 D_{t−1} ε_{t−1}²,
    D_t = 1 for ε_t < 0, and 0 otherwise.

Copyright 2004, CFA Institute. Reproduced and republished from Financial Analysts Journal with permission from CFA Institute. All Rights Reserved.

[Figure 5.1 Effect of kappa (S = 100, r = 0, T = 1, λ = 0, σ_ν = 0.6, θ = 0.2):
Black–Scholes implied volatility plotted against strike price K from 50 to 150,
for κ = 0.01 and κ = 3, each with initial volatility v_t = 0.7 and v_t = 0.15.]

produced from a stochastic volatility option pricing model. The Black–Scholes
model is used here only to obtain the implied volatility, which gives a
clearer relative pricing relationship. The Black–Scholes implied volatility
(BSIV) is directly proportional to the option price. First we look at the
high volatility state, where v_t = 0.7. The implied volatility for κ = 0.01
is higher than that for κ = 3.0, which means that a long memory volatility
(slow mean reversion and high volatility persistence) will lead to a higher
option price. But, in reverse, long memory volatility will result in lower
option prices, and hence lower implied volatility, in a low volatility state, e.g.
v_t = 0.15. So, unlike the conclusion in previous studies, long memory
in volatility does not always lead to higher option prices; the effect is conditioned
on the current level of volatility vis-à-vis the long-run level of volatility.

Both the historical volatility models and the ARCH models have been
tested for fractional integration. Baillie, Bollerslev and Mikkelsen (1996)
fitted FIGARCH to US dollar–Deutschemark exchange rates. Bollerslev
and Mikkelsen (1996, 1999) used FIEGARCH to study S&P500
volatility and option pricing impact, and so did Taylor (2000). Vilasuso
(2002) tested FIGARCH against GARCH and IGARCH for volatility
prediction for five major currencies. In Andersen, Bollerslev, Diebold
and Labys (2003), a vector autoregressive model with long distributed
lags was built on the realized volatility of three exchange rates, which

they called the VAR-RV model. In Zumbach (2002) the weights applied
to the time series of realized volatility follow a power law, which he
called the LM-ARCH model. Three other papers, viz. Li (2002), Martens
and Zein (2004) and Pong, Shackleton, Taylor and Xu (2004), compared
long memory volatility model forecasts with option implied volatility. Li
(2002) used ARFIMA whereas the other two papers used log-ARFIMA.
Hwang and Satchell (1998) also studied the log-ARFIMA model, but
they forecast the Black–Scholes 'risk-neutral' implied volatility of the
equity option instead of the volatility of the underlying asset.

The FIGARCH(1, d, 1) model below:

    h_t = ω + [1 − β_1 L − (1 − φ_1 L)(1 − L)^d] ε_t^2 + β_1 h_{t−1}

was used in Baillie, Bollerslev and Mikkelsen (1996), and all the
following specifications are equivalent:

    (1 − β_1 L) h_t = ω + [1 − β_1 L − (1 − φ_1 L)(1 − L)^d] ε_t^2,
    h_t = ω(1 − β_1)^{−1} + (1 − β_1 L)^{−1}
          × [(1 − β_1 L) − (1 − φ_1 L)(1 − L)^d] ε_t^2,
    h_t = ω(1 − β_1)^{−1} + [1 − (1 − β_1 L)^{−1}(1 − φ_1 L)(1 − L)^d] ε_t^2.

For the one-step-ahead forecast,

    h_{t+1} = ω(1 − β_1)^{−1} + [1 − (1 − β_1 L)^{−1}(1 − φ_1 L)(1 − L)^d] ε_t^2,

and the multi-step-ahead forecast is

    h_{T+τ} = ω(1 − β_1)^{−1} + [1 − (1 − β_1 L)^{−1}(1 − φ_1 L)(1 − L)^d] ε_{T+τ−1}^2.

The FIGARCH model is estimated by approximate maximum
likelihood using the truncated ARCH representation. We can transform the
FIGARCH model into an ARCH model with infinite lags. The parameters
in the lag polynomial

    λ(L) = 1 − (1 − β_1 L)^{−1}(1 − φ_1 L)(1 − L)^d

may be written as

    λ_1 = φ_1 − β_1 + d,
    λ_k = β_1 λ_{k−1} + (π_k − φ_1 π_{k−1}) for k ≥ 2,

where

    (1 − L)^d = 1 − Σ_{j=1}^∞ π_j L^j, with π_0 = 0 and π_1 = d.

In the literature, a truncation lag at J = 1000 is common.
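The truncated ARCH(∞) representation can be computed directly from these recursions. The following sketch (our own illustration) builds the coefficients π_j of (1 − L)^d, using π_1 = d and the standard recursion π_j = π_{j−1}(j − 1 − d)/j, and then the weights λ_k:

```python
import numpy as np

def figarch_weights(d, phi1, beta1, J=1000):
    """ARCH(inf) weights lambda_k for FIGARCH(1, d, 1), truncated at lag J,
    with (1 - L)^d = 1 - sum_{j>=1} pi_j L^j, pi_1 = d."""
    pi = np.zeros(J + 1)
    pi[1] = d
    for j in range(2, J + 1):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    lam = np.zeros(J + 1)
    lam[1] = phi1 - beta1 + d
    for k in range(2, J + 1):
        lam[k] = beta1 * lam[k - 1] + (pi[k] - phi1 * pi[k - 1])
    return lam[1:]

lam = figarch_weights(d=0.4, phi1=0.2, beta1=0.6)
# for 0 < d < 1 the weights decay hyperbolically rather than
# geometrically at long lags: the long memory signature of FIGARCH
```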

Bollerslev and Mikkelsen (1996) find that fractionally integrated models
provide a better fit to S&P500 returns. Specifically, they find that
fractionally integrated models perform better than GARCH(p, q) and
IGARCH(p, q), and that the FIEGARCH specification is better than
FIGARCH. Bollerslev and Mikkelsen (1999) confirm that FIEGARCH
beats EGARCH and IEGARCH in pricing options on S&P500 LEAPS
(Long-term Equity Anticipation Securities) contracts. Specifically,
Bollerslev and Mikkelsen (1999) fitted an AR(2)-FIEGARCH(1, d, 1)
as shown below:

    r_t = μ + (ρ_1 L + ρ_2 L^2) r_t + σ_t z_t,                    (5.2)
    ln σ_t^2 = ω_t + (1 + ψ_1 L)(1 − φ_1 L)^{−1}(1 − L)^{−d} g(z_{t−1}),
    g(z_{t−1}) = θ z_{t−1} + γ [|z_{t−1}| − E|z_{t−1}|],
    ω_t = ω + ln(1 + δ N_t).

The FIEGARCH model in (5.2) is truly a model for absolute returns.
Since both EGARCH and FIEGARCH provide forecasts for ln σ,
inferring a forecast for σ from ln σ requires an adjustment for Jensen's
inequality, which is not a straightforward task without the assumption of a
normal distribution for ln σ.
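Under the normality assumption for ln σ, the Jensen adjustment is the standard lognormal mean correction E[σ] = exp(m + s²/2), where m and s² are the forecast mean and variance of ln σ. A quick numerical check with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = -1.0, 0.3                      # forecast mean and std of ln(sigma)
log_sigma = rng.normal(m, s, 1_000_000)

naive = np.exp(m)                     # exp of the log forecast: biased low
adjusted = np.exp(m + 0.5 * s**2)     # lognormal mean E[sigma] = exp(m + s^2/2)
print(naive, adjusted, np.exp(log_sigma).mean())
```

Exponentiating the log forecast alone (the `naive` line) understates E[σ] by the factor exp(s²/2).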

5.3.3 The positive drift in fractional integrated series
As Hwang and Satchell (1998) and Granger (2001) pointed out, a positive
I(d) process has a positive drift term, or a time trend, in the volatility level,
which is not observed in practice. This is a major weakness of the
fractionally integrated model as a theoretically sound model for volatility.
All fractionally integrated models of volatility have a nonzero drift.
In practice, the estimation of fractionally integrated models requires an
arbitrary truncation of the infinite lags and, as a result, the mean will
be biased. Zumbach's (2002) LM-ARCH will not have this problem
because of the fixed number of lags and the way in which the weights are

calculated. Hwang and Satchell's (1998) scaled-truncated log-ARFIMA
model is mean-adjusted to control for the bias that is due to this truncation
and the log transformation. The FIGARCH has a positive mean in the
conditional variance equation, whereas FIEGARCH has no such problem
because the lag-dependent terms have zero mean.

5.3.4 Forecasting performance
Vilasuso (2002) finds that FIGARCH produces significantly better 1- and
10-day-ahead volatility forecasts for five major exchange rates than
GARCH and IGARCH. Zumbach (2002) produces only one-day-ahead
forecasts and finds no difference among model performances. Andersen,
Bollerslev, Diebold and Labys (2003) find that the realized-volatility-constructed
VAR model, i.e. VAR-RV, produces the best 1- and 10-day-ahead
volatility forecasts. It is difficult to attribute this superior
performance to the fractionally integrated model alone because the VAR
structure allows a cross-series linkage that is absent in all the other
univariate models, and we also know that more accurate realized volatility
estimates would result in improved forecasting performance, everything
else being equal.
The other three papers that compare forecasts from LM models with
implied volatility forecasts generally find the implied volatility forecast to
produce the highest explanatory power. Martens and Zein (2004) find that the
log-ARFIMA forecast beats implied in S&P500 futures but not in ¥/US$
and crude oil futures. Li (2002) finds that implied produces a better short-horizon
forecast, whereas the ARFIMA provides a better forecast for a 6-month
horizon. However, when the regression coefficients are constrained to be
α = 0 and β = 1, the regression R² becomes negative at the long horizon.
From our discussion in Section 2.4, this suggests that volatility at the
6-month horizon might be better forecast using the unconditional variance
instead of model-based forecasts. Pong, Shackleton, Taylor and Xu
(2004) find implied volatility to outperform time series volatility models,
including the log-ARFIMA model, in forecasting 1- to 3-month-ahead
volatility of the dollar–sterling exchange rate.
Many of the fractional integration papers were written more recently
and used realized volatilities constructed from intraday high-frequency
data. When the comparison is made with option implied volatility, however,
the implied volatility is usually extracted from daily closing option prices.
Despite the lower data frequency, implied appears to outperform
forecasts from LM models that use intraday information.

The fractionally integrated series is the simplest linear model that produces
long memory characteristics. It is also the most commonly used and
tested model in the literature for capturing long memory in volatility.
There are many other nonlinear short memory models that exhibit
spurious long memory in volatility, viz. the break, volatility component and
regime-switching models. These three models, plus the fractionally
integrated model, have very different volatility dynamics and produce very
different volatility forecasts.
The volatility break model permits the mean level of volatility to
change in a step function through time, with some weak constraint on
the number of breaks in the volatility level. It is more general than the
volatility component model and the regime-switching model. In the case
of the volatility component model, the mean level is a slowly evolving
process. For the regime-switching model, the mean level of volatility
differs according to regime, and the total number of regimes is usually
confined to a small number such as two or three.

5.4.1 Breaks
A break process can be written as

    V_t = m_t + u_t,

where u_t is a noise variable and m_t represents occasional level shifts.
The m_t are controlled by q_t (a zero–one indicator for the presence of breaks)
and η_t (the size of the jump) such that

    m_t = m_{t−1} + q_t η_t = m_0 + Σ_{i=1}^t q_i η_i,

    q_t = 1 with probability p,
          0 with probability 1 − p.

The expected number of breaks for a given sample is Tp, where T
is the total number of observations. Provided that p converges to zero
slowly as the sample size increases, i.e. p → 0 as T → ∞ such that
lim_{T→∞} Tp is a nonzero constant, Granger and Hyung (2004) showed
that the integration parameter d of the I(d) process is a function of Tp. While d is
bounded between 0 and 1, the expected value of d is proportional to
the number of breaks in the series.
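A minimal simulation of the break process above (with our own illustrative values for p and the jump size) shows how occasional level shifts alone can generate slowly decaying autocorrelations:

```python
import numpy as np

def simulate_breaks(T=10_000, p=0.001, eta_scale=0.5, seed=0):
    """V_t = m_t + u_t, where m_t jumps by eta_t ~ N(0, eta_scale^2)
    with probability p at each step (a Granger-Hyung style break process)."""
    rng = np.random.default_rng(seed)
    q = rng.random(T) < p                 # break indicators
    eta = rng.normal(0.0, eta_scale, T)
    m = np.cumsum(q * eta)                # occasional level shifts
    u = rng.standard_normal(T)            # short memory noise
    return m + u

def acf(x, k):
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

v = simulate_breaks()
# with enough breaks in-sample, the autocorrelations decay slowly,
# mimicking long memory even though memory is short between breaks
print([round(acf(v, k), 3) for k in (1, 10, 100)])
```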
One interesting empirical finding on the volatility break model comes
from Aggarwal, Inclan and Leal (1999), who use the ICSS (integrated
cumulative sums of squares) algorithm to identify sudden shifts in the
variance of 20 stock market indices and the duration of such shifts.
They find that most volatility shifts are due to local political events. When
dummy variables indicating the location of sudden changes in variance
were fitted to a GARCH(1,1) model, most of the GARCH parameters
became statistically insignificant. The GARCH(1,1) with occasional break
model can be written as follows:
    h_t = ω_1 D_1 + · · · + ω_{R+1} D_{R+1} + α_1 ε_{t−1}^2 + β_1 h_{t−1},

where D_1, · · ·, D_{R+1} are dummy variables taking the value 1 in
each variance regime and zero elsewhere. The one-step-ahead and
multi-step-ahead forecasts are

    h_{t+1} = ω_{R+1} + α_1 ε_t^2 + β_1 h_t,
    h_{t+τ} = ω_{R+1} + (α_1 + β_1) h_{t+τ−1}.
In estimating the break points using the ICSS algorithm, a minimum
length between breaks is needed to reduce the possibility of temporary
shocks in a series being mistaken for breaks.

5.4.2 Components model
Engle and Lee (1999) proposed the component GARCH (CGARCH)
model, whereby the volatility process is modelled as the sum of a permanent
process, m_t, that has memory close to a unit root, and a transitory
mean-reverting process, u_t, that has a more rapid time decay. The model
can be seen as an extension of the GARCH(1,1) model, with the conditional
variance mean-reverting to a long-term trend level, m_t, instead of
a fixed position at σ̄. Specifically, m_t is permitted to evolve slowly in
an autoregressive manner. The CGARCH(1,1) model has the following
specification:

    (h_t − m_t) = α (ε_{t−1}^2 − m_{t−1}) + β (h_{t−1} − m_{t−1}) ≡ u_t,    (5.3)
    m_t = ω + ρ m_{t−1} + φ (ε_{t−1}^2 − h_{t−1}),

where (h_t − m_t) = u_t represents the short-run transitory component and
m_t represents a time-varying trend or permanent component in volatility

which is driven by the volatility prediction error ε_{t−1}^2 − h_{t−1}, and is
integrated if ρ = 1.
For the one-step-ahead forecast,

    h_{t+1} = m_{t+1} + α (ε_t^2 − m_t) + β (h_t − m_t),
    m_{t+1} = ω + ρ m_t + φ (ε_t^2 − h_t),

and for the multi-step-ahead forecast,

    h_{t+τ} = m_{t+τ} − (α + β) m_{t+τ−1} + (α + β) h_{t+τ−1},
    m_{t+τ} = ω + ρ m_{t+τ−1},

where h_{t+τ−1} and m_{t+τ−1} are calculated through repeated substitution.
This model has various interesting properties: (i) both m_t and u_t are
driven by ε_{t−1}^2 − h_{t−1}; (ii) the short-run volatility component mean-reverts
to zero at a geometric rate of (α + β) if 0 < (α + β) < 1; (iii)
the long-run volatility component evolves over time following an AR
process and converges to a constant level defined by ω/(1 − ρ) if 0 <
ρ < 1; (iv) it is assumed that 0 < (α + β) < ρ < 1 so that the long-run
component is more persistent than the short-run component.
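The forecast recursions above can be sketched as follows, with hypothetical parameter values chosen to satisfy 0 < (α + β) < ρ < 1:

```python
import numpy as np

def cgarch_forecast(h_t, m_t, eps_t, alpha, beta, omega, rho, phi, horizon):
    """Multi-step CGARCH(1,1) variance forecasts: the trend m follows an AR(1)
    and the transitory gap (h - m) decays at rate (alpha + beta)."""
    m_next = omega + rho * m_t + phi * (eps_t**2 - h_t)        # one-step trend
    h_next = m_next + alpha * (eps_t**2 - m_t) + beta * (h_t - m_t)
    h_path, m_path = [h_next], [m_next]
    for _ in range(horizon - 1):
        m_new = omega + rho * m_path[-1]
        h_new = m_new + (alpha + beta) * (h_path[-1] - m_path[-1])
        m_path.append(m_new)
        h_path.append(h_new)
    return np.array(h_path), np.array(m_path)

h, m = cgarch_forecast(h_t=1.5, m_t=1.0, eps_t=1.2, alpha=0.05, beta=0.85,
                       omega=0.02, rho=0.99, phi=0.03, horizon=100)
```

The transitory gap h − m dies out at rate (α + β), so distant forecasts ride on the slowly mean-reverting trend alone.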
This model was found to obey several economic and asset pricing
relationships. Many have observed and proposed that the volatility
persistence of large jumps is shorter than that of shocks due to ordinary news
events. The component model allows large shocks to be transitory. Indeed,
Engle and Lee (1999) establish that the impact of the October 1987
crash on stock market volatility was temporary. The expected risk premium,
as measured by the expected amount of returns in excess of the
risk-free interest rate, in the stock market was found to be related to the
long-run component of stock return volatility.¹ The authors suggested,
but did not test, that such a pricing relationship may have fundamental
economic explanations. The well-documented 'leverage effect' (or
volatility asymmetry) in the stock market (see Black, 1976; Christie,
1982; Nelson, 1991) is shown to have a temporary impact; the long-run
volatility component shows no asymmetric response to market changes.
The reduced form of Equation (5.3) can be expressed as the
GARCH(2,2) process below:

    h_t = (1 − α − β) ω + (α + φ) ε_{t−1}^2 + [−φ(α + β) − αρ] ε_{t−2}^2
          + (ρ + β − φ) h_{t−1} + [φ(α + β) − βρ] h_{t−2},

¹ Merton (1980) and French, Schwert and Stambaugh (1987) also studied and measured the relationship
between the risk premium and 'total' volatility.

with all five parameters, α, β, ω, φ and ρ, constrained to be positive and
real, 0 < (α + β) < ρ < 1, and 0 < φ < β.

5.4.3 Regime-switching model
One approach for modelling changing volatility levels and persistence
is to use a Hamilton (1989) type regime-switching (RS) model, which,
like the GARCH model, is strictly stationary and covariance stationary. Both
ARCH and GARCH models have been implemented within a Hamilton
(1989) type regime-switching framework, whereby volatility persistence
can take different values depending on whether it is in a high or low
volatility regime. The most generalized form of regime-switching model is
the RS-GARCH(1, 1) model used in Gray (1996) and Klaassen (1998):

    h_{t, S_{t−1}} = ω_{S_{t−1}} + α_{S_{t−1}} ε_{t−1}^2 + β_{S_{t−1}} h_{t−1, S_{t−1}},

where S_t indicates the regime state at time t.
It has long been argued that the financial market reacts to large and
small shocks differently, and that the rate of mean reversion is faster for large
shocks. Friedman and Laibson (1989), Jones, Lamont and Lumsdaine
(1998) and Ederington and Lee (2001) all provide explanations and
empirical support for the conjecture that volatility adjustment in high
and low volatility states follows a twin-speed process: slower adjustment
and more persistent volatility in the low volatility state, and faster
adjustment and less volatility persistence in the high volatility state.
The earlier RS applications, such as Pagan and Schwert (1990) and
Hamilton and Susmel (1994), are more rigid, where the conditional
variance is state-dependent but not time-dependent. In these studies, only
ARCH class conditional variance is entertained. Recent extensions by
Gray (1996) and Klaassen (1998) allow GARCH-type heteroscedasticity
in each state and the probability of switching between states to
be time-dependent. A more recent advancement is to allow more flexible
switching probability. For example, Peria (2001) allowed the transition
probabilities to vary according to economic conditions with the
RS-GARCH model below:

    r_t | Φ_{t−1} ∼ N(μ_i, h_{it}) w.p. p_{it},
    h_{it} = ω_i + α_i ε_{t−1}^2 + β_i h_{t−1},

where i represents a particular regime, 'w.p.' stands for 'with probability',
p_{it} = Pr(S_t = i | Φ_{t−1}) and Σ_i p_{it} = 1.

The STGARCH (smooth transition GARCH) model below was tested
in Taylor, J. (2004):

    h_t = ω + [1 − F(ε_{t−1})] α ε_{t−1}^2 + F(ε_{t−1}) δ ε_{t−1}^2 + β h_{t−1},

where

    F(ε_{t−1}) = 1 / [1 + exp(−θ ε_{t−1})] for logistic STGARCH,
    F(ε_{t−1}) = 1 − exp(−θ ε_{t−1}^2) for exponential STGARCH.
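The two transition functions can be illustrated with a short variance recursion (hypothetical parameter values): the logistic version responds to the sign of the shock, the exponential version to its size:

```python
import math

def stgarch_variance(eps, omega, alpha, delta, beta, theta, h0,
                     transition="logistic"):
    """Smooth transition GARCH variance recursion:
    h_t = omega + [1 - F(e)]*alpha*e^2 + F(e)*delta*e^2 + beta*h_{t-1}."""
    h = [h0]
    for e in eps:
        if transition == "logistic":
            F = 1.0 / (1.0 + math.exp(-theta * e))   # sign of shock matters
        else:
            F = 1.0 - math.exp(-theta * e * e)       # size of shock matters
        h.append(omega + (1 - F) * alpha * e**2
                 + F * delta * e**2 + beta * h[-1])
    return h

# under the logistic transition, a negative shock loads mostly on alpha
# and a positive shock mostly on delta
path = stgarch_variance([-1.0, 1.0], omega=0.01, alpha=0.10,
                        delta=0.05, beta=0.85, theta=2.0, h0=0.04)
```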

5.4.4 Forecasting performance
The TAR model used in Cao and Tsay (1992) is similar to a SV model
with regime switching, and Cao and Tsay (1992) reported better forecasting
performance from TAR than from EGARCH and GARCH. Hamilton and
Susmel (1994) find that regime-switching ARCH with a leverage effect
produces better volatility forecasts than the asymmetric version of GARCH.
Hamilton and Lin (1996) use a bivariate RS model and find that stock market
returns are more volatile during periods of recession. Gray (1996) fits
a RS-GARCH(1,1) model to US 1-month T-Bill rates, where the rate
of mean level reversion is permitted to differ under different regimes,
and finds substantial improvement in forecasting performance. Klaassen
(1998) also applies RS-GARCH(1,1) to the foreign exchange market and
finds a superior, though less dramatic, performance.
It is worth noting that interest rates are different from the other assets
in that interest rates exhibit a 'level' effect, i.e. volatility depends on the
level of the interest rate. It is plausible that it is this level effect that Gray
(1996) is picking up that results in the superior forecasting performance. This
level effect also appears in some European short rates (Ferreira, 1999).
There is no such level effect in exchange rates, and so it is not surprising
that Klaassen (1998) did not find a similarly dramatic improvement. No
other published forecasting results are available for break and component
volatility models.
Stochastic Volatility

The stochastic volatility (SV) model is, first and foremost, a theoretical
model rather than a practical and direct tool for volatility forecasting. One
should not overlook the developments in the stochastic volatility area,
however, because of the rapid advancement in research, noticeably by
Ole Barndorff-Nielsen and Neil Shephard. As far as implementation is
concerned, SV estimation still poses a challenge to many researchers.
Recent publications indicate a trend towards the MCMC (Markov chain
Monte Carlo) approach. A good source of reference for the MCMC
approach to SV estimation is Tsay (2002). Here we will provide only an
overview. An early survey of SV work is Ghysels, Harvey and Renault
(1996), but the subject is rapidly changing. A more recent SV book is
Shephard (2003). The SV models and the ARCH models are closely
related, and many ARCH models have SV equivalents as continuous time
diffusion limits (see Taylor, 1994; Duan, 1997; Corradi, 2000; Fleming
and Kirby, 2003).

The discrete time SV model is

    r_t = μ + ε_t,
    ε_t = z_t exp(0.5 h_t),
    h_t = ω + β h_{t−1} + v_t,

where v_t may or may not be independent of z_t. We have already seen the
continuous time specification in Section 5.2, and it will appear again in
Chapter 9 when we discuss stochastic volatility option pricing models.
The SV model has an additional innovation term in the volatility
dynamics and, hence, is more flexible than ARCH class models. It has been
found to fit financial market returns better and to have residuals closer to
standard normal. Modelling volatility as a stochastic variable immediately
leads to fat-tailed distributions for returns. The autoregressive term in the
volatility process introduces persistence, and the correlation between

the two innovation terms in the volatility process and the return process
produces volatility asymmetry (Hull and White, 1987, 1988). Long
memory SV models have also been proposed by allowing the volatility
process to have a fractionally integrated order (see Harvey, 1998).
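A minimal simulation of the discrete time SV model (illustrative parameters, with v_t independent of z_t) shows the fat tails that the volatility innovation induces:

```python
import numpy as np

def simulate_sv(n=100_000, mu=0.0, omega=-0.1, beta=0.98, sigma_v=0.2, seed=0):
    """Discrete time SV model: r_t = mu + z_t * exp(0.5 * h_t),
    h_t = omega + beta * h_{t-1} + v_t, with independent z_t and v_t."""
    rng = np.random.default_rng(seed)
    h = omega / (1 - beta)            # start at the unconditional mean of h
    r = np.empty(n)
    for t in range(n):
        h = omega + beta * h + sigma_v * rng.standard_normal()
        r[t] = mu + rng.standard_normal() * np.exp(0.5 * h)
    return r

r = simulate_sv()
x = r - r.mean()
kurt = np.mean(x**4) / np.mean(x**2) ** 2   # should exceed the Gaussian value 3
```

For a stationary Gaussian AR(1) log variance, the population kurtosis is approximately 3 exp(σ_ν²/(1 − β²)), well above 3.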
The volatility noise term makes the SV model a lot more flexible, but
as a result the SV model has no closed form likelihood, and hence cannot be
estimated directly by maximum likelihood. The quasi-maximum likelihood
estimation (QMLE) approach of Harvey, Ruiz and Shephard (1994) is
inefficient if volatility proxies are non-Gaussian (Andersen and Sorensen,
1997). The alternatives are the generalized method of moments (GMM)
approach through simulations (Duffie and Singleton, 1993) or analytical
solutions (Singleton, 2001), and the likelihood approach through
numerical integration (Fridman and Harris, 1998) or Monte Carlo
integration using either importance sampling (Danielsson, 1994; Pitt and
Shephard, 1997; Durbin and Koopman, 2000) or Markov chain methods (e.g.
Jacquier, Polson and Rossi, 1994; Kim, Shephard and Chib, 1998). In
the following section, we will describe the MCMC approach only.

The MCMC approach to modelling stochastic volatility was made popular
by authors such as Jacquier, Polson and Rossi (1994). Tsay (2002)
has a good description of how the algorithm works. Consider here the
simplest case:

    r_t = a_t,
    a_t = √h_t ε_t,
    ln h_t = α_0 + α_1 ln h_{t−1} + v_t,                (6.1)

where ε_t ∼ N(0, 1), v_t ∼ N(0, σ_ν²), and ε_t and v_t are independent.
Let w = (α_0, α_1, σ_ν²)′, let R = (r_1, · · ·, r_n)′ be the collection of n
observed returns, and let H = (h_1, · · ·, h_n)′ be the n-dimensional vector of
unobservable conditional volatilities. Estimation of model (6.1) is made
complicated because the likelihood function is a mixture over the
n-dimensional distribution of H, as follows:

    f(R | w) = ∫ f(R | H) · f(H | w) dH.

The objective is still to maximize the likelihood of {a_t}, but the density
of R is determined by H, which in turn is determined by w.

Assuming that the prior distributions for the mean and the volatility
equations are independent, the Gibbs sampling approach to estimating model
(6.1) involves drawing random samples from the following conditional
posterior distributions:

    f(β | R, X, H, w),   f(H | R, X, β, w),   f(w | R, X, β, H).

This process is repeated with updated information until the likelihood
tolerance or the predetermined maximum number of iterations is reached.

6.2.1 The volatility vector H
First, the volatility vector H is drawn element by element:

    f(h_t | R, H_{−t}, w)
      ∝ f(a_t | h_t, r_t) f(h_t | h_{t−1}, w) f(h_{t+1} | h_t, w)
      ∝ h_t^{−0.5} exp(−r_t² / (2h_t)) · h_t^{−1} exp(−(ln h_t − μ_t)² / (2σ²)),    (6.2)

where

    μ_t = [α_0 (1 − α_1) + α_1 (ln h_{t+1} + ln h_{t−1})] / (1 + α_1²),
    σ² = σ_ν² / (1 + α_1²).
Equation (6.2) can be obtained using results for a missing value in an
AR(1) model. To see how this works, start from the volatility equation

    ln h_t = α_0 + α_1 ln h_{t−1} + v_t,
    α_0 + α_1 ln h_{t−1} = 1 × ln h_t − v_t,
    y_t = x_t ln h_t + b_t,                              (6.3)

with y_t = α_0 + α_1 ln h_{t−1}, x_t = 1 and b_t = −v_t, and for t + 1,

    ln h_{t+1} − α_0 = α_1 ln h_t + v_{t+1},
    y_{t+1} = x_{t+1} ln h_t + b_{t+1},                  (6.4)

with y_{t+1} = ln h_{t+1} − α_0, x_{t+1} = α_1 and b_{t+1} = v_{t+1}.
Given that b_t and b_{t+1} have the same distribution, because −v_t is also
N(0, σ_ν²), ln h_t can be estimated from (6.3) and (6.4) using the least
squares principle:

    ln ĥ_t = (x_t y_t + x_{t+1} y_{t+1}) / (x_t² + x_{t+1}²)
           = [α_0 (1 − α_1) + α_1 (ln h_{t+1} + ln h_{t−1})] / (1 + α_1²).
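The equivalence between the least squares estimate and the conditional mean μ_t in (6.2) is easy to verify numerically (illustrative values):

```python
import numpy as np

def conditional_mean_logh(alpha0, alpha1, logh_prev, logh_next):
    """Conditional mean of ln h_t given its AR(1) neighbours:
    mu_t = [alpha0*(1 - alpha1) + alpha1*(ln h_{t+1} + ln h_{t-1})] / (1 + alpha1^2)."""
    return (alpha0 * (1 - alpha1)
            + alpha1 * (logh_next + logh_prev)) / (1 + alpha1 ** 2)

# cross-check against the stacked regression y = x * ln h_t + b
alpha0, alpha1 = 0.1, 0.9
logh_prev, logh_next = -1.2, -0.8
x = np.array([1.0, alpha1])
y = np.array([alpha0 + alpha1 * logh_prev, logh_next - alpha0])
ls = (x @ y) / (x @ x)                  # least squares estimate of ln h_t
mu = conditional_mean_logh(alpha0, alpha1, logh_prev, logh_next)
```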

