= 1 for |τ/S(T)| ≤ 1, and 0 otherwise.

Assuming that k-step-ahead forecast errors are at most (k − 1)-dependent, it is therefore recommended that S(T) = k − 1. It is not

likely that f_d(0) will be negative, but in the rare event that f_d(0) < 0,

Volatility Forecast Evaluation 27

it should be treated as zero and the null hypothesis of equal forecast

accuracy be rejected automatically.

2.3.2 Diebold and Mariano's sign test

The sign test targets the median, with the null hypothesis that

Med(d) = Med(g(e_it) − g(e_jt)) = 0.

Assuming that d_t ~ iid, the test statistic is

S_2 = Σ_{t=1}^T I_+(d_t),

where

I_+(d_t) = 1 if d_t > 0, and 0 otherwise.

For small samples, S_2 should be assessed using a table of the cumulative binomial distribution. In large samples, the Studentized version of S_2 is asymptotically normal:

S_2a = (S_2 − 0.5T) / √(0.25T)  ~  N(0, 1).
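As a minimal illustration, the sign test and its Studentized version can be computed directly from the loss differentials. The sketch below is our own (the function name `dm_sign_test` is not from the literature) and assumes the differentials are supplied as a numeric array:

```python
import numpy as np

def dm_sign_test(d):
    """Diebold-Mariano sign test: S2 counts the positive loss
    differentials; S2a is the Studentized version, asymptotically
    N(0, 1) under the null Med(d) = 0 when the d_t are iid."""
    d = np.asarray(d, dtype=float)
    T = d.size
    s2 = int(np.sum(d > 0))                   # S2 = sum of I+(d_t)
    s2a = (s2 - 0.5 * T) / np.sqrt(0.25 * T)  # Studentized statistic
    return s2, s2a
```

For small T, the exact binomial table should still be used, as noted above.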

2.3.3 Diebold and Mariano's Wilcoxon sign-rank test

As the name indicates, this test is based on both the sign and the rank of the loss differential. The test statistic

S_3 = Σ_{t=1}^T I_+(d_t) rank(|d_t|)

is the sum of the ranks of the absolute values of the positive observations. The critical values of S_3 have been tabulated for small samples. For large samples, the Studentized version of S_3 is again asymptotically normal:

S_3a = [S_3 − T(T + 1)/4] / √[T(T + 1)(2T + 1)/24]  ~  N(0, 1).

28 Forecasting Financial Market Volatility

2.3.4 Serially correlated loss differentials

Serial correlation is explicitly taken care of in S_1. For S_2 and S_3 (and their asymptotic counterparts S_2a and S_3a), the following k sets of loss differentials have to be tested jointly:

d_{ij,1}, d_{ij,1+k}, d_{ij,1+2k}, · · · ,
d_{ij,2}, d_{ij,2+k}, d_{ij,2+2k}, · · · ,
⋮
d_{ij,k}, d_{ij,2k}, d_{ij,3k}, · · · .

A test with size bounded by α is then performed k times, once on each of the above k loss-differential sequences, each test at size α/k. The null hypothesis of equal forecast accuracy is rejected if the null is rejected for any of the k samples.
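The interleaved splitting above amounts to one line of array indexing. The sketch below uses a hypothetical helper name; each returned subsample is then tested at size α/k:

```python
import numpy as np

def split_loss_differentials(d, k):
    """Split a serially correlated loss-differential series into k
    interleaved subsamples d_1, d_{1+k}, d_{1+2k}, ..., each
    approximately iid when forecast errors are at most (k-1)-dependent."""
    d = np.asarray(d, dtype=float)
    return [d[i::k] for i in range(k)]
```

The null of equal accuracy is rejected if any one of the k subsample tests rejects.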

2.4 REGRESSION-BASED FORECAST EFFICIENCY

AND ORTHOGONALITY TEST

The regression-based method for examining the informational content of forecasts is by far the most popular method in the volatility forecasting literature. It involves regressing the actual volatility, X_t, on the forecast, X̂_t, as shown below:

X_t = α + β X̂_t + υ_t.   (2.3)

Conditional upon the forecast, the prediction is unbiased only if α = 0 and β = 1.

Since the error term, υ_t, is heteroscedastic and serially correlated when overlapping forecasts are evaluated, the standard errors of the parameter estimates are often computed on the basis of Hansen and Hodrick (1980). Let Y be the row matrix of regressors including the constant term; in (2.3), Y_t = (1  X̂_t) is a 1 × 2 matrix. Then

Ω = T⁻¹ Σ_{t=1}^T υ_t² Y_t′Y_t + T⁻¹ Σ_{k=1}^T Σ_{t=k+1}^T Q(k, t) υ_k υ_t (Y_t′Y_k + Y_k′Y_t),


where υ_k and υ_t are the residuals for observations k and t from the regression. The operator Q(k, t) is an indicator function taking the value 1 if there is information overlap between Y_k and Y_t. The adjusted covariance matrix for the regression coefficients is then calculated as

(Y′Y)⁻¹ Ω (Y′Y)⁻¹.   (2.4)

Canina and Figlewski (1993) conducted some simulation studies and found that the corrected standard errors in (2.4) are close to the true values, and that the use of overlapping data reduces the standard error to between one-quarter and one-eighth of what would be obtained with only nonoverlapping data.
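The adjusted covariance matrix can be assembled directly. The sketch below is our own minimal implementation, not Hansen and Hodrick's code: the indicator Q(k, t) is approximated by assuming forecasts k and t overlap whenever |t − k| is less than a user-supplied overlap length, and the T⁻¹ scaling is folded into the sandwich so that `overlap=1` collapses to the familiar White heteroscedasticity-consistent estimator:

```python
import numpy as np

def hansen_hodrick_cov(Y, resid, overlap):
    """Sketch of a Hansen-Hodrick (1980) style covariance adjustment.

    Y       : (T, m) regressor matrix including the constant.
    resid   : (T,) OLS residuals from (2.3).
    overlap : Q(k, t) is taken to be 1 when |t - k| < overlap.
    """
    T, m = Y.shape
    omega = np.zeros((m, m))
    for t in range(T):                       # heteroscedasticity term
        omega += resid[t] ** 2 * np.outer(Y[t], Y[t])
    for k in range(T):                       # overlap cross-products
        for t in range(k + 1, min(k + overlap, T)):
            cross = np.outer(Y[t], Y[k]) + np.outer(Y[k], Y[t])
            omega += resid[k] * resid[t] * cross
    yy_inv = np.linalg.inv(Y.T @ Y)
    return yy_inv @ omega @ yy_inv           # sandwich form of (2.4)
```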

In cases where there is more than one forecasting model, additional forecasts are added to the right-hand side of (2.3) to check for incremental explanatory power. Such a forecast encompassing test dates back to Theil (1966). Chong and Hendry (1986) and Fair and Shiller (1989, 1990) provide further theoretical exposition of such methods for testing forecast efficiency. The first forecast is said to subsume information contained in other forecasts if these additional forecasts do not significantly increase the adjusted regression R². Alternatively, an orthogonality test may be conducted by regressing the residuals from (2.3) on the other forecasts. If these forecasts are orthogonal, i.e. do not contain additional information, then the regression coefficients will not be different from zero.

While it is useful to have an unbiased forecast, it is important to distinguish between bias and predictive power. A biased forecast can have predictive power if the bias can be corrected; an unbiased forecast is useless if all its forecast errors are large. For X̂_i to be considered a good forecast, Var(υ_t) should be small and the R² of the regression should tend to 100%. Blair, Poon and Taylor (2001) use the proportion of explained variability, P, to measure explanatory power:

P = 1 − [Σ (X_i − X̂_i)²] / [Σ (X_i − μ_X)²].   (2.5)

The ratio on the right-hand side of (2.5) compares the sum of squared prediction errors (assuming α = 0 and β = 1 in (2.3)) with the sum of squared variation of X_i; P thus compares the amount of variation in the forecast errors with that in actual volatility. If prediction errors are small,


P is closer to 1. Given that the regression model implicit in (2.5) is more restrictive than (2.3), P is likely to be smaller than the conventional R². P can even be negative, since the ratio on the right-hand side of (2.5) can be greater than 1. A negative P means that the forecast errors have a greater amount of variation than the actual volatility, which is not a desirable characteristic for a well-behaved forecasting model.
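Equation (2.5) translates directly into code. The sketch below uses a hypothetical function name and assumes actual and forecast volatility series of equal length:

```python
import numpy as np

def proportion_explained(actual, forecast):
    """Proportion of explained variability P in (2.5): compares squared
    prediction errors (imposing alpha = 0, beta = 1) with the squared
    variation of actual volatility. P can be negative when forecast
    errors vary more than actual volatility."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    sse = np.sum((actual - forecast) ** 2)
    sst = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - sse / sst
```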

2.5 OTHER ISSUES IN FORECAST EVALUATION

In all forecast evaluations, it is important to distinguish between in-sample and out-of-sample forecasts. An in-sample forecast, which is based on parameters estimated using all data in the sample, implicitly assumes that parameter estimates are stable across time. In practice, time variation of parameter estimates is a critical issue in forecasting. A good forecasting model should be one that can withstand an out-of-sample test, a test design that is closer to reality.

Instead of striving to make some statistical inference, model performance could be judged on measures of economic significance. Examples of such an approach include portfolio improvement derived from better volatility forecasts (Fleming, Kirby and Ostdiek, 2000, 2002). Some papers test forecast accuracy by measuring the impact on option pricing errors (Karolyi, 1993). In the latter case, pricing error in the option model will be cancelled out when the option implied volatility is reintroduced into the pricing formula. So it is not surprising that evaluations which involve comparing option pricing errors often prefer the implied volatility method to all other time series methods.

Research in financial market volatility has concentrated on modelling and less on forecasting. Work on combined forecasts is rare, probably because the groups of researchers working on time series models and on option pricing do not seem to mix. What has not yet been done in the literature is to separate the forecasting period into 'normal' and 'exceptional' periods. It is conceivable that different forecasting methods are better suited to different trading environments and economic conditions.

3

Historical Volatility Models

Compared with the other types of volatility models, historical volatility models (HIS) are the easiest to manipulate and construct. The well-known RiskMetrics EWMA (exponentially weighted moving average) model from JP Morgan is a form of historical volatility model; so are models that build directly on realized volatility, which have become very popular in the last few years. Historical volatility models have been shown to have good forecasting performance compared with other time series volatility models. Unlike the other two types of time series models (viz. ARCH and stochastic volatility (SV)), conditional volatility is modelled separately from returns in historical volatility models, and hence they are less restrictive and readier to respond to changes in volatility dynamics. Studies that find historical volatility models forecast better than ARCH and/or SV models include Taylor (1986, 1987), Figlewski (1997), Figlewski and Green (1999), Andersen, Bollerslev, Diebold and Labys (2001) and Taylor, J. (2004). With the increased availability of intraday data, we can expect research on the realized volatility variant of the historical model to intensify in the next few years.

3.1 MODELLING ISSUES

Unlike ARCH and SV models, where returns are the main input, HIS models do not normally use return information so long as volatility estimates are ready at hand. Take the simplest form, ARCH(1), for example:

r_t = μ + ε_t,  ε_t ~ N(0, σ_t²),   (3.1)
ε_t = z_t σ_t,  z_t ~ N(0, 1),
σ_t² = ω + α_1 ε_{t−1}².   (3.2)

The conditional volatility σ_t² in (3.2) is modelled as a 'byproduct' of the return equation (3.1). The estimation is done by maximizing the likelihood of observing {ε_t} using the normal, or other chosen, density. The construction and estimation of SV models are similar to those of ARCH, except that there is now an additional innovation term in (3.2).


In contrast, the HIS model is built directly on conditional volatility, e.g. an AR(1) model:

σ_t = γ + β_1 σ_{t−1} + υ_t.   (3.3)

The parameters γ and β_1 are estimated by minimizing the in-sample forecast errors

υ̂_t = σ_t − γ̂ − β̂_1 σ_{t−1},

and the forecaster has the choice of minimizing mean squared errors, mean absolute errors, etc., as in the case of choosing an appropriate forecast error statistic in Section 2.2.

The historical volatility estimates σ_t in (3.3) can be calculated as sample standard deviations if there are sufficient data for each t interval. If there is not sufficient information, then the H-L method of Section 1.3.2 may be used, and in the most extreme case, where only one observation is available for each t interval, one often resorts to using the absolute return to proxy for volatility at t. In Section 1.3.1 we highlighted the danger of using daily absolute or squared returns to proxy 'actual' daily volatility for the purpose of forecast evaluation, as this could lead to very misleading model rankings. The problem with the use of daily absolute returns in volatility modelling is less severe provided that long distributed lags are included (Nelson, 1992; Nelson and Foster, 1995). With the increased availability of intraday data, historical volatility estimates can be calculated quite accurately as realized volatility following Section 1.3.3.
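Under a mean-squared-error criterion, estimating the AR(1) model (3.3) reduces to ordinary least squares on lagged volatility. The sketch below is our own helper (the name `fit_ar1_vol` is not from the text) and assumes a precomputed series of volatility estimates:

```python
import numpy as np

def fit_ar1_vol(sigma):
    """Least-squares fit of the AR(1) historical volatility model (3.3),
    sigma_t = gamma + beta1 * sigma_{t-1} + v_t, chosen to minimize
    squared in-sample forecast errors."""
    sigma = np.asarray(sigma, dtype=float)
    X = np.column_stack([np.ones(sigma.size - 1), sigma[:-1]])
    gamma, beta1 = np.linalg.lstsq(X, sigma[1:], rcond=None)[0]
    return gamma, beta1
```

Minimizing mean absolute error instead would require a different (non-closed-form) optimizer, as the text notes.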

3.2 TYPES OF HISTORICAL VOLATILITY MODELS

There are now two major types of HIS models: the single-state and the regime-switching models. The HIS models differ in the number of lagged volatility terms included and the weights assigned to them, reflecting the tradeoff between using a larger amount of information and using more up-to-date information.

3.2.1 Single-state historical volatility models

The simplest historical volatility model is the random walk model, where the difference between consecutive-period volatilities is modelled as random noise:

σ_t = σ_{t−1} + v_t.


The best forecast of tomorrow's volatility is then today's volatility:

σ̂_{t+1} = σ_t,

where σ_t alone is used as the forecast of σ_{t+1}.

In contrast, the historical average method makes a forecast based on the entire history:

σ̂_{t+1} = (1/t)(σ_t + σ_{t−1} + · · · + σ_1).

The simple moving average method,

σ̂_{t+1} = (1/τ)(σ_t + σ_{t−1} + · · · + σ_{t−τ+1}),

is similar to the historical average method, except that older information is discarded. The value of τ (i.e. the lag length of past information used) could be chosen subjectively or by minimizing the in-sample forecast error, σ_{t+1} − σ̂_{t+1}. The multi-period forecasts σ̂_{t+τ} for τ > 1 are the same as the one-step-ahead forecast σ̂_{t+1} for all three methods above.
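The three naive forecasts above can be sketched together; the function name and the default window length are illustrative choices of ours:

```python
import numpy as np

def naive_vol_forecasts(sigma, tau=5):
    """One-step-ahead forecasts from the three naive single-state models:
    random walk, historical average and a tau-period simple moving
    average. For all three, multi-step forecasts coincide with the
    one-step forecast."""
    sigma = np.asarray(sigma, dtype=float)
    return {
        "random_walk": sigma[-1],            # sigma_hat_{t+1} = sigma_t
        "historical_average": sigma.mean(),  # average of the full history
        "moving_average": sigma[-tau:].mean(),
    }
```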

The exponential smoothing method,

σ_t = (1 − β) σ_{t−1} + β σ̂_{t−1} + ξ_t,  0 ≤ β ≤ 1,

and

σ̂_{t+1} = (1 − β) σ_t + β σ̂_t,

is similar to the historical average method, but more weight is given to the recent past and less to the distant past. The smoothing parameter β is estimated by minimizing the in-sample forecast errors ξ_t.

The exponentially weighted moving average (EWMA) method is the moving average method with exponential weights:

σ̂_{t+1} = [Σ_{i=1}^τ β^i σ_{t−i+1}] / [Σ_{i=1}^τ β^i].

Again, the smoothing parameter β is estimated by minimizing the in-sample forecast errors ξ_t. The JP Morgan RiskMetrics™ model is a procedure that uses the EWMA method.
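The normalized exponential weighting can be sketched as follows; the function name is ours, and the indexing assumes the most recent observation receives the largest weight, as in the reconstruction above:

```python
import numpy as np

def ewma_forecast(sigma, beta, tau):
    """EWMA forecast: a tau-period moving average with exponentially
    declining weights beta**i, i = 1..tau, normalized to sum to one."""
    sigma = np.asarray(sigma, dtype=float)
    weights = beta ** np.arange(1, tau + 1)
    recent = sigma[-1:-tau - 1:-1]       # sigma_t, sigma_{t-1}, ...
    return np.sum(weights * recent) / weights.sum()
```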

All the historical volatility models above have a fixed weighting scheme or a weighting scheme that follows some declining pattern. Other types of historical model have weighting schemes that are not prespecified. The simplest of such models is the simple regression method,

σ_t = γ + β_1 σ_{t−1} + β_2 σ_{t−2} + · · · + β_n σ_{t−n} + υ_t,
σ̂_{t+1} = γ̂ + β̂_1 σ_t + β̂_2 σ_{t−1} + · · · + β̂_n σ_{t−n+1},


which expresses volatility as a function of its past values and an error term.

The simple regression method is principally autoregressive. If past volatility errors are also included, one gets the ARMA model

σ̂_{t+1} = β_1 σ_t + β_2 σ_{t−1} + · · · + γ_1 υ_t + γ_2 υ_{t−1} + · · · .

Introducing a differencing order I(d), we get ARIMA when d = 1 and ARFIMA when d < 1.

3.2.2 Regime switching and transition exponential smoothing

In this section we have the threshold autoregressive model from Cao and Tsay (1992):

σ_t = φ_0^(i) + φ_1^(i) σ_{t−1} + · · · + φ_p^(i) σ_{t−p} + v_t,   i = 1, 2, . . . , k,
σ̂_{t+1} = φ_0^(i) + φ_1^(i) σ_t + · · · + φ_p^(i) σ_{t+1−p},

where the thresholds separate volatility into k states, with an independent simple regression model and noise process in each state. The prediction σ̂_{t+1} could be based solely on the current state i, assuming the future will remain in the current state. Alternatively, it could be based on information from all states, weighted by the transition probability of each state. Cao and Tsay (1992) found that the threshold autoregressive model outperformed EGARCH and GARCH in forecasting the 1- to 30-month volatility of the S&P value-weighted index. EGARCH provided better forecasts for the S&P equally weighted index, possibly because the equally weighted index gives more weight to small stocks, for which the leverage effect could be more important.

The smooth transition exponential smoothing model is from Taylor, J. (2004):

σ̂_t² = α_{t−1} ε_{t−1}² + (1 − α_{t−1}) σ̂_{t−1}² + v_t,   (3.4)

where

α_{t−1} = 1 / [1 + exp(β + γ V_{t−1})]

and V_{t−1} = a ε_{t−1} + b |ε_{t−1}| is the transition variable. The smoothing parameter α_{t−1} varies between 0 and 1, and its value depends on the size and the sign of ε_{t−1}. The dependence on ε_{t−1} means that multi-step-ahead forecasts cannot be made except through simulation. (The same applies to many nonlinear ARCH and SV models, as we will show in the next few chapters.)
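The deterministic part of the recursion in (3.4), omitting the noise term v_t, can be sketched as follows; the function name and all parameter values are illustrative inputs of ours:

```python
import math

def ste_smoothing(eps, beta, gamma, a, b, var0):
    """One-step-ahead variance forecasts from the smooth transition
    exponential smoothing recursion (3.4). The logistic weight alpha
    depends on the size and sign of the last shock through the
    transition variable V = a*eps + b*|eps|."""
    var = var0
    out = []
    for e in eps:
        v = a * e + b * abs(e)                     # transition variable
        alpha = 1.0 / (1.0 + math.exp(beta + gamma * v))
        var = alpha * e ** 2 + (1.0 - alpha) * var # smoothed variance
        out.append(var)
    return out
```

With γ = 0 the weight is constant and the scheme collapses to ordinary exponential smoothing of squared shocks.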


One-day-ahead forecasting results show that the smooth transition exponential smoothing model performs very well against several ARCH counterparts and, on a few occasions, even outperformed the realized volatility forecast. But these rankings were not tested for statistical significance, so it is difficult to come to a conclusion given the closeness of many of the reported error statistics.

3.3 FORECASTING PERFORMANCE

Taylor (1987) was one of the earliest studies to test time series volatility forecasting models before ARCH/GARCH permeated the volatility literature. It used extreme-value estimates based on high, low and closing prices to forecast 1- to 20-day DM/$ futures volatility, and found that a weighted average composite forecast performed best. Wiggins (1992) also gave support to extreme-value volatility estimators.

In the pre-ARCH era, many studies covered a wide range of issues. Sometimes forecasters would introduce 'learning' by allowing parameters and the weights of combined forecasts to be dynamically updated. These frequent updates did not always lead to better results, however: Dimson and Marsh (1990) found that ex ante time-varying optimized weighting schemes do not always work well in out-of-sample forecasts. Sill (1993) found that S&P 500 volatility was higher during recessions and that the commercial T-bill spread helped to predict stock market volatility.

The random walk and historical average methods seem naive at first, but they work very well for medium- and long-horizon forecasts. For forecast horizons longer than six months, low-frequency data over a period at least as long as the forecast horizon work best. To provide equity volatility for investment over a 5-year period, for example, Alford and Boatsman (1995) recommended, after studying a sample of 6879 stocks, that volatility be estimated from weekly or monthly returns over the previous 5 years, with adjustments made for industry and company size. Figlewski (1997) analysed the volatility of the S&P 500, long- and short-term US interest rates and the Deutschemark-dollar exchange rate, and found that the use of monthly data over a long period provides the best long-horizon forecasts. Alford and Boatsman (1995), Figlewski (1997) and Figlewski and Green (1999) all stressed the importance of having a long enough estimation period to make good volatility forecasts over long horizons.

4

ARCH

Financial market volatility is known to cluster: a volatile period tends to persist for some time before the market returns to normality. The ARCH (AutoRegressive Conditional Heteroscedasticity) model proposed by Engle (1982) was designed to capture volatility persistence in inflation. The ARCH model was later found to fit many financial time series, and its widespread impact on finance has led to the Nobel Committee's recognition of Rob Engle's work in 2003. The ARCH effect has been shown to lead to high kurtosis, which fits in well with the empirically observed tail thickness of many asset return distributions. The leverage effect, a phenomenon whereby high volatility is brought on by negative returns, is often modelled with a sign-based return variable in the conditional volatility equation.

4.1 ENGLE (1982)

The ARCH model, first introduced by Engle (1982), has been extended by many researchers and extensively surveyed in Bera and Higgins (1993), Bollerslev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994) and Diebold and Lopez (1995). In contrast to the historical volatility models described in the previous chapter, ARCH models do not make use of past standard deviations but formulate the conditional variance, h_t, of asset returns via maximum likelihood procedures. (We follow the ARCH literature here by writing σ_t² = h_t.) To illustrate this, first write returns, r_t, as

r_t = μ + ε_t,
ε_t = √h_t z_t,   (4.1)

where z_t ~ D(0, 1) is a white noise. The distribution D is often taken to be normal. The process z_t is scaled by √h_t, where h_t, the conditional variance, is in turn a function of past squared residual returns. In the ARCH(q) process proposed by Engle (1982),

h_t = ω + Σ_{j=1}^q α_j ε_{t−j}²,   (4.2)


with ω > 0 and α_j ≥ 0 to ensure that h_t is strictly positive. Typically, q is of high order because of the phenomenon of volatility persistence in financial markets. From the way in which volatility is constructed in (4.2), h_t is known at time t − 1, so the one-step-ahead forecast is readily available. Multi-step-ahead forecasts can be formulated by setting E[ε_{t+τ}²] = h_{t+τ}.

The unconditional variance of r_t is

σ² = ω / (1 − Σ_{j=1}^q α_j).

The process is covariance stationary if and only if the sum of the autoregressive parameters is less than one, Σ_{j=1}^q α_j < 1.
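The clustering mechanism in (4.1)-(4.2) is easy to see in simulation. The sketch below is a hypothetical helper that draws an ARCH(1) path with normal innovations, started at its unconditional variance ω/(1 − α_1):

```python
import numpy as np

def simulate_arch1(omega, alpha1, T, seed=0):
    """Simulate an ARCH(1) process: h_t = omega + alpha1 * eps_{t-1}^2,
    eps_t = sqrt(h_t) * z_t with z_t standard normal. Requires omega > 0
    and 0 <= alpha1 < 1 for covariance stationarity, in which case the
    unconditional variance is omega / (1 - alpha1)."""
    rng = np.random.default_rng(seed)
    eps = np.zeros(T)
    h = np.zeros(T)
    h[0] = omega / (1.0 - alpha1)   # start at the unconditional variance
    eps[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, T):
        h[t] = omega + alpha1 * eps[t - 1] ** 2
        eps[t] = np.sqrt(h[t]) * rng.standard_normal()
    return eps, h
```

In a long simulation the sample variance of the returns should be close to ω/(1 − α_1).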

4.2 GENERALIZED ARCH

For a high-order ARCH(q) process, it is more parsimonious to model volatility as a GARCH(p, q) process (generalized ARCH, due to Bollerslev (1986) and Taylor (1986)), in which additional dependencies are permitted on p lags of past h_t:

h_t = ω + Σ_{i=1}^p β_i h_{t−i} + Σ_{j=1}^q α_j ε_{t−j}²,

with ω > 0. For GARCH(1, 1), the constraints α_1 ≥ 0 and β_1 ≥ 0 are needed to ensure that h_t is strictly positive. For higher orders of GARCH, the constraints on β_i and α_j are more complex (see Nelson and Cao (1992) for details). The unconditional variance equals

σ² = ω / (1 − Σ_{i=1}^p β_i − Σ_{j=1}^q α_j).

The GARCH(p, q) model is covariance stationary if and only if Σ_{i=1}^p β_i + Σ_{j=1}^q α_j < 1.

Volatility forecasts from GARCH(1, 1) can be made by repeated substitution. First, we make use of the relationship (4.1) to provide an estimate of the expected squared residuals:

E[ε_t²] = h_t E[z_t²] = h_t.


The conditional variance h_{t+1}, and hence the one-step-ahead forecast, is known at time t:

h_{t+1} = ω + α_1 ε_t² + β_1 h_t.   (4.3)

The forecast of h_{t+2} makes use of the fact that E[ε_{t+1}²] = h_{t+1}, and we get

h_{t+2} = ω + α_1 ε_{t+1}² + β_1 h_{t+1}
        = ω + (α_1 + β_1) h_{t+1}.

Similarly,

h_{t+3} = ω + (α_1 + β_1) h_{t+2}
        = ω + ω(α_1 + β_1) + (α_1 + β_1)² h_{t+1}
        = ω + ω(α_1 + β_1) + ω(α_1 + β_1)² + (α_1 + β_1)² [α_1 ε_t² + β_1 h_t].

As the forecast horizon τ lengthens,

h_{t+τ} = ω / [1 − (α_1 + β_1)] + (α_1 + β_1)^τ [α_1 ε_t² + β_1 h_t].   (4.4)

If α_1 + β_1 < 1, the second term on the RHS of (4.4) dies out eventually and h_{t+τ} converges to ω/[1 − (α_1 + β_1)], the unconditional variance.

If we write υ_t = ε_t² − h_t and substitute h_t = ε_t² − υ_t into (4.3), we get

ε_t² − υ_t = ω + α_1 ε_{t−1}² + β_1 ε_{t−1}² − β_1 υ_{t−1},
ε_t² = ω + (α_1 + β_1) ε_{t−1}² + υ_t − β_1 υ_{t−1}.   (4.5)

Hence the squared residual returns, ε_t², follow an ARMA process with autoregressive parameter (α_1 + β_1). If α_1 + β_1 is close to 1, the autoregressive process in (4.5) dies out slowly.
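The repeated substitution behind (4.3)-(4.4) is a short recursion in code; the function name is ours:

```python
def garch11_forecast(omega, alpha1, beta1, eps_t, h_t, tau):
    """Multi-step GARCH(1,1) forecast by repeated substitution:
    h_{t+1} = omega + alpha1*eps_t**2 + beta1*h_t, and thereafter
    h_{t+s+1} = omega + (alpha1 + beta1) * h_{t+s}."""
    h = omega + alpha1 * eps_t ** 2 + beta1 * h_t   # equation (4.3)
    for _ in range(tau - 1):
        h = omega + (alpha1 + beta1) * h            # one more step ahead
    return h
```

For α_1 + β_1 < 1 the forecast converges to the unconditional variance ω/[1 − (α_1 + β_1)] as τ grows.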

4.3 INTEGRATED GARCH

For a GARCH(p, q) process, when Σ_{i=1}^p β_i + Σ_{j=1}^q α_j = 1, the unconditional variance is no longer finite. The series r_t is not covariance stationary, although it remains strictly stationary and ergodic. The conditional variance is then described as integrated GARCH (denoted IGARCH) and there is no finite fourth moment.¹

¹ This is not the same as, and should not be confused with, the 'integrated volatility' described in Section 1.3.3.


An infinite volatility is a concept rather counterintuitive to real phenomena in economics and finance. Empirical findings suggest that GARCH(1, 1) is the most popular structure for many financial time series. It turns out that RiskMetrics™ EWMA (exponentially weighted moving average) is a nonstationary version of GARCH(1, 1) in which the persistence parameters, α_1 and β_1, sum to 1. To see the parallel, we first make repeated substitution of (4.3) and obtain

h_{t+2} = ω + α ε_{t+1}² + β h_{t+1}
        = ω + ωβ + α ε_{t+1}² + αβ ε_t² + β² h_t,

h_{t+τ} = ω Σ_{i=1}^τ β^{i−1} + α Σ_{i=1}^τ β^{i−1} ε_{t+τ−i}² + β^τ h_t.

As τ → ∞, and provided that β < 1, we can infer that

h_t = ω/(1 − β) + α Σ_{i=1}^∞ β^{i−1} ε_{t−i}².   (4.6)

Next, we have the EWMA model for the sample standard deviations, where

σ̂_t² = (σ_{t−1}² + λ σ_{t−2}² + · · · + λⁿ σ_{t−n−1}²) / (1 + λ + λ² + · · · + λⁿ).

As n → ∞, and provided that λ < 1,

σ̂_t² = (1 − λ) Σ_{i=1}^∞ λ^{i−1} σ_{t−i}².   (4.7)

If we view ε_t² as a proxy for σ_t², then (4.6) and (4.7) are both autoregressive series with long distributed lags, except that (4.6) has a constant term and (4.7) has not.²

While intuitively unconvincing as a volatility process because of the infinite variance, the EWMA model has nevertheless been shown to be powerful in volatility forecasting, as it is not constrained by a mean level of volatility (unlike, e.g., the GARCH(1, 1) model) and hence adjusts readily to changes in unconditional volatility.
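In practice the infinite distributed lag in (4.7) is implemented recursively on squared returns. The sketch below follows the familiar RiskMetrics-style recursion σ̂_t² = (1 − λ) r_{t−1}² + λ σ̂_{t−1}², with λ = 0.94 the conventional choice for daily data; initializing with the first squared return is one common choice of ours, not a prescription from the text:

```python
def riskmetrics_var(returns, lam=0.94):
    """EWMA variance path: the recursive form of the infinite
    distributed lag (4.7), i.e. GARCH(1,1) with omega = 0 and
    alpha1 + beta1 = 1 (alpha1 = 1 - lam, beta1 = lam)."""
    var = returns[0] ** 2          # common initialization choice
    path = [var]
    for r in returns[1:]:
        var = (1.0 - lam) * r ** 2 + lam * var
        path.append(var)
    return path
```

Because there is no constant term, the recursion has no mean level of volatility to revert to, which is exactly the property discussed above.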

² EWMA, a sample standard deviation model, is usually estimated by minimizing in-sample forecast errors. There is no volatility error in the GARCH conditional variance. This is why σ̂_t² in (4.7) has a hat and h_t in (4.6) has not.


4.4 EXPONENTIAL GARCH

The exponential GARCH (denoted EGARCH) model is due to Nelson (1991). The EGARCH(p, q) model specifies the conditional variance in logarithmic form, which means that there is no need to impose an estimation constraint in order to avoid negative variance:

ln h_t = α_0 + Σ_{j=1}^q β_j ln h_{t−j} + Σ_{k=1}^p [θ_k ϵ_{t−k} + γ_k (|ϵ_{t−k}| − √(2/π))],
ϵ_t = ε_t / √h_t.

Here, h_t depends on both the size and the sign of ε_t. With appropriate conditioning of the parameters, this specification captures the stylized fact that a negative shock leads to a higher conditional variance in the subsequent period than a positive shock. The process is covariance stationary if and only if Σ_{j=1}^q β_j < 1.

Forecasting with EGARCH is a bit involved because of the logarithmic transformation. Tsay (2002) showed how forecasts can be formulated for EGARCH(1, 0) and gave the one-step-ahead forecast as

h_{t+1} = h_t^{α_1} exp[(1 − α_1) α_0] exp[g(ϵ_t)],
g(ϵ_t) = θ ϵ_t + γ (|ϵ_t| − √(2/π)).

For the multi-step forecast,

h_{t+τ} = h_{t+τ−1}^{α_1} exp(ω) {exp[0.5 (θ + γ)²] Φ(θ + γ) + exp[0.5 (θ − γ)²] Φ(θ − γ)},

where

ω = (1 − α_1) α_0 − γ √(2/π)

and Φ(·) is the cumulative distribution function of the standard normal distribution.
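The one-step formula can be sketched directly from the reconstruction above; since the exact form of Tsay's (2002) expressions has been reassembled here from a damaged source, this sketch should be checked against the original reference before use:

```python
import math

def egarch10_one_step(h_t, eps_t, alpha0, alpha1, theta, gamma):
    """One-step EGARCH(1,0) forecast as reconstructed above:
    h_{t+1} = h_t**alpha1 * exp((1 - alpha1)*alpha0) * exp(g(z)),
    with z = eps_t / sqrt(h_t) and
    g(z) = theta*z + gamma*(|z| - sqrt(2/pi))."""
    z = eps_t / math.sqrt(h_t)
    g = theta * z + gamma * (abs(z) - math.sqrt(2.0 / math.pi))
    return h_t ** alpha1 * math.exp((1.0 - alpha1) * alpha0) * math.exp(g)
```

With θ = γ = 0 the recursion reduces to a geometric pull of ln h toward α_0.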

4.5 OTHER FORMS OF NONLINEARITY

Models that also allow for nonsymmetric dependencies include the GJR-GARCH model (Glosten, Jagannathan and Runkle, 1993), as shown


below:

h_t = ω + Σ_{i=1}^p β_i h_{t−i} + Σ_{j=1}^q [α_j + δ_j D_{t−j}] ε_{t−j}²,

where

D_{t−1} = 1 if ε_{t−1} < 0, and 0 if ε_{t−1} ≥ 0.

The conditional volatility is positive when the parameters satisfy ω > 0, α_j ≥ 0, α_j + δ_j ≥ 0 and β_i ≥ 0, for i = 1, · · · , p and j = 1, · · · , q. The process is covariance stationary if and only if

Σ_{i=1}^p β_i + Σ_{j=1}^q (α_j + ½ δ_j) < 1.

Take the GJR-GARCH(1, 1) case as an example. The one-step-ahead forecast is

h_{t+1} = ω + β_1 h_t + α_1 ε_t² + δ_1 ε_t² D_t,

and the multi-step forecast is obtained from

h_{t+τ} = ω + (α_1 + ½ δ_1 + β_1) h_{t+τ−1}

by repeated substitution for h_{t+τ−1}.
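These two GJR(1, 1) forecast steps can be sketched together; the function name is ours, and the multi-step recursion uses E[D] = ½ for symmetric shocks, as in the formula above:

```python
def gjr11_forecast(omega, alpha1, delta1, beta1, eps_t, h_t, tau):
    """GJR-GARCH(1,1) forecasts: one step ahead uses the indicator
    D_t = 1 when eps_t < 0; further steps use E[D] = 1/2, giving
    h_{t+s+1} = omega + (alpha1 + delta1/2 + beta1) * h_{t+s}."""
    D = 1.0 if eps_t < 0 else 0.0
    h = omega + beta1 * h_t + alpha1 * eps_t ** 2 + delta1 * eps_t ** 2 * D
    for _ in range(tau - 1):
        h = omega + (alpha1 + 0.5 * delta1 + beta1) * h
    return h
```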

The TGARCH (threshold GARCH) model of Zakoïan (1994) is similar to GJR-GARCH but is formulated with absolute returns instead:

σ_t = α_0 + Σ_{i=1}^p [α_i |ε_{t−i}| + γ_i D_{t−i} |ε_{t−i}|] + Σ_{j=1}^q β_j σ_{t−j}.   (4.8)

The conditional volatility is positive when α_0 > 0, α_i ≥ 0, α_i + γ_i ≥ 0 and β_j ≥ 0, for i = 1, · · · , p and j = 1, · · · , q. The process is covariance stationary, in the case p = q = 1, if and only if

β_1² + ½ α_1² + ½ (α_1 + γ_1)² + √(2/π) β_1 (α_1 + γ_1) < 1.

QGARCH (quadratic GARCH) and various other nonlinear GARCH models are reviewed in Franses and van Dijk (2000). A QGARCH(1, 1) has the following structure:

h_t = ω + α (ε_{t−1} − γ)² + β h_{t−1}.


4.6 FORECASTING PERFORMANCE

Although an early investigation of GARCH forecasting performance had appeared in Taylor (1986), Akgiray (1989) is the study most commonly cited in subsequent GARCH work. In the decade that followed, there were no fewer than 20 papers testing GARCH predictive power against other time series methods and against option implied volatility forecasts. The majority of these forecast the volatility of major stock indices and exchange rates.

The ARCH class models, and their variants, have many supporters. Akgiray (1989) finds that GARCH consistently outperforms EWMA and RW in all subperiods and under all evaluation measures. Pagan and Schwert (1990) find that EGARCH is best, especially in contrast to some nonparametric methods. Despite a low R², Cumby, Figlewski and Hasbrouck (1993) conclude that EGARCH is better than RW. Figlewski (1997) finds GARCH superiority confined to the stock market and to forecasting volatility over short horizons only.

In general, models that allow for volatility asymmetry come out well in the forecasting contest because of the strong negative relationship between volatility and shocks. Cao and Tsay (1992), Heynen and Kat (1994), Lee (1991) and Pagan and Schwert (1990) favour the EGARCH model for the volatility of stock indices and exchange rates, whereas Brailsford and Faff (1996) and Taylor, J. (2004) find that GJR-GARCH outperforms GARCH for stock indices. Bali (2000) finds that a range of nonlinear models works well for forecasting one-week-ahead volatility of US T-bill yields. Cao and Tsay (1992) find that the threshold autoregressive model (TAR in the previous chapter) provides the best forecast for large stocks and EGARCH the best forecast for small stocks, and they suspect that the latter might be due to a leverage effect.

Other studies find no clear-cut result. These include Lee (1991), West and Cho (1995), Brailsford and Faff (1996), Brooks (1998), and McMillan, Speight and Gwilym (2000). All these studies (and many other volatility forecasting studies) share one or more of the following characteristics: (i) they test a large number of very similar models, all designed to capture volatility persistence; (ii) they use a large number of forecast error statistics, each of which has a very different loss function; (iii) they forecast and calculate error statistics for variance and not standard deviation, which makes the difference between the forecasts of different models even smaller; (iv) they use squared daily, weekly or


monthly returns to proxy daily, weekly or monthly 'actual' volatility, which results in extremely noisy 'actual' volatility estimates. The noise in the 'actual' volatility estimates makes the small differences between the forecasts of similar models indistinguishable.

Unlike the ARCH class models, the 'simpler' methods, including the EWMA method, do not separate volatility persistence from volatility shocks, and most of them do not incorporate volatility mean reversion. The 'simpler' methods tend to provide larger volatility forecasts most of the time because there is no constraint on stationarity or convergence to the unconditional variance, which may result in larger forecast errors and less frequent VaR violations. The GJR model allows volatility persistence to change relatively quickly when the return switches sign from positive to negative and vice versa. If the unconditional volatility of all parametric volatility models is the same, then GJR will have the largest probability of an underforecast.³ This possibly explains why GJR was the worst-performing model in Franses and Van Dijk (1996), because they use MedSE (median standard error) as their sole evaluation criterion. In Brailsford and Faff (1996), the GJR(1, 1) model outperforms the other models when MAE, RMSE and MAPE are used.

There is some merit in using 'simpler' methods, and especially models that include long distributed lags. As ARCH class models assume variance stationarity, their forecasting performance suffers when there are changes in the volatility level. Parameter estimation becomes unstable when the data period is short or when there is a change in the volatility level. This has led to GARCH convergence problems in several studies (e.g. Tse and Tung (1992) and Walsh and Tsou (1998)). Taylor (1986), Tse (1991), Tse and Tung (1992), Boudoukh, Richardson and Whitelaw (1997), Walsh and Tsou (1998), Ederington and Guan (1999), Ferreira (1999) and Taylor, J. (2004) all favour some form of exponential smoothing method over GARCH for forecasting the volatility of a wide range of assets, from equities and exchange rates to interest rates.

³ This characteristic is clearly evidenced in Table 2 of Brailsford and Faff (1996). The GJR(1, 1) model underforecasts 76 (out of 90) times. The RW model has an equal chance of underforecasts and overforecasts, whereas all the other methods overforecast more than 50 (out of 90) times.

5

Linear and Nonlinear Long Memory Models

As mentioned before, volatility persistence is a feature that many time

series models are designed to capture. A GARCH model features an

exponential decay in the autocorrelation of conditional variances. However, it has been noted that squared and absolute returns of financial assets typically have serial correlations that are slow to decay, similar to those of an I(d) process. A shock to the volatility series seems to have very 'long memory' and to impact future volatility over a long horizon. The integrated GARCH (IGARCH) model of Engle and Bollerslev (1986) captures this effect, but a shock in this model impacts future volatility over an infinite horizon, and the unconditional variance does not exist for this model.

5.1 WHAT IS LONG MEMORY IN VOLATILITY?

Let ρ_τ denote the correlation between x_t and x_{t−τ}. The time series x_t is said to have short memory if Σ_{τ=1}^n ρ_τ converges to a constant as n becomes large. A long memory series has autocorrelation coefficients that decline slowly at a hyperbolic rate. Long memory in volatility occurs when the effects of volatility shocks decay slowly, and it is often detected by the autocorrelation of measures of volatility, such as absolute or squared returns. A long memory process is covariance stationary if Σ_{τ=1}^n ρ_τ/τ^{2d−1}, for some positive d < 1/2, converges to a constant as n → ∞. When d ≥ 1/2, the volatility series is not covariance stationary, although it is still strictly stationary. Taylor (1986) was the first to note

that the autocorrelation of absolute returns, |r_t|, is slower to decay than that of r_t². The highly popular GARCH model is a short memory model based on squared returns r_t². Following the work of Granger and Joyeux (1980) and Hosking (1981), where fractionally integrated series were shown to exhibit the long memory property described above, Ding, Granger and Engle (1993) propose a fractionally integrated model based on |r_t|^d where d is a fraction. The whole issue of Journal of Econometrics, 1996, vol. 73, no. 1, edited by Richard Baillie and Maxwell King, was devoted to long memory and, in particular, fractionally integrated series.
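The diagnostic described above, the behaviour of the sum of the first n sample autocorrelations, can be sketched in a few lines. This is an illustration only: the function name, the AR(1) example and the sample sizes are my own, not from the text.

```python
import numpy as np

def acf_sum(x, n_lags=1000):
    """Sum of the first n_lags sample autocorrelations of x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return sum(np.dot(x[:-k], x[k:]) / denom for k in range(1, n_lags + 1))

rng = np.random.default_rng(0)
e = rng.standard_normal(20_000)
ar = np.zeros_like(e)
for t in range(1, len(e)):          # short memory: AR(1), rho_tau = 0.5**tau
    ar[t] = 0.5 * ar[t - 1] + e[t]

s_noise = acf_sum(e, n_lags=200)    # white noise: theoretical limit 0
s_ar = acf_sum(ar, n_lags=200)      # AR(1): theoretical limit 0.5/(1-0.5) = 1
```

For a genuinely long memory series the sum keeps growing with the number of lags instead of settling down, which is what the 1000-lag sums reported in Table 5.1 are measuring.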

There has been a lot of research investigating whether long memory in volatility can help to make better volatility forecasts and explain anomalies in option prices. Hitherto, much of this research has used the fractionally integrated models described in Section 5.3. More recently, several studies have shown that a number of nonlinear short memory volatility models are capable of producing spurious long memory characteristics in volatility as well. Examples of such nonlinear models include the break model (Granger and Hyung, 2004), the volatility component model (Engle and Lee, 1999), and the regime-switching model (Hamilton and Susmel, 1994; Diebold and Inoue, 2001). In these three models, volatility has short memory between breaks, for each volatility component, and within each regime. Without controlling for the breaks, the different components and the changing regimes, volatility will exhibit spurious long memory characteristics. Each of these short memory nonlinear models provides a rich interpretation of financial market volatility structure compared with the apparently myopic fractionally integrated model, which simply requires financial market participants to remember and react to shocks for a long time. Discussion of these competing models is provided in Section 5.4.

5.2 EVIDENCE AND IMPACT OF VOLATILITY LONG MEMORY

The long memory characteristic of financial market volatility is well known and has important implications for volatility forecasting and option pricing. Some evidence of long memory has already been presented in Section 1.3. In Table 5.1, we present some statistics, from a wider range of assets and through simulation, that we published in the Financial Analysts Journal recently. In the table, we report the sum of the first 1000 autocorrelation coefficients for a number of volatility proxies for a selection of stock indices, stocks, exchange rates, interest rates and commodities. We have also presented the statistics for GARCH(1, 1) and GJR-GARCH(1, 1) series, both simulated using high volatility persistence parameters. The statistics for the simulated series are in the range of 0.478 to 2.308, while the empirical statistics are much higher. As noted by Taylor (1986), the absolute return has a longer memory than the squared return. This has become known as the 'Taylor effect'. But taking logs, or trimming the data by capping the values in the 0.1%

tails, often lengthens the memory. This phenomenon continues to puzzle volatility researchers.

The impact of volatility long memory on option pricing has been studied in Bollerslev and Mikkelsen (1996, 1999), Taylor (2000) and Ohanissian, Russel and Tsay (2003). The effect is best understood analytically from the stochastic volatility option pricing model, which is based on the stock having the stochastic process below:

dS_t = μS_t dt + √v_t S_t dz_{s,t},
dv_t = κ[θ − v_t] dt + σ_ν √v_t dz_{v,t},

which, in a risk-neutral option pricing framework, becomes

dv_t = κ[θ − v_t] dt − λv_t dt + σ_ν √v_t dz*_{v,t}
     = κ*[θ* − v_t] dt + σ_ν √v_t dz*_{v,t},    (5.1)

where v_t is the instantaneous variance, κ is the speed of mean reversion, θ is the long-run level of volatility, σ_ν is the 'volatility of volatility', λ is the market price of (volatility) risk, κ* = κ + λ and θ* = κθ/(κ + λ). The two Wiener processes, dz_{s,t} and dz_{v,t}, have constant correlation ρ. Here κ* is the risk-neutral mean reversion parameter and θ* is the risk-neutral long-run level of volatility. The parameters σ_ν and ρ implicit in the risk-neutral process are the same as those in the real volatility process.

In the risk-neutral stochastic volatility process in (5.1), a low κ (or κ*) corresponds to strong volatility persistence, volatility long memory and high kurtosis. A fast mean-reverting volatility will reduce the impact of stochastic volatility. The effect of low κ (or high volatility persistence) is most pronounced when θ, the long-run level, is low but the initial 'instantaneous' volatility is high, as shown in the table below. The table reports the kurtosis of the simulated distribution when κ = 0.1 and λ = ρ = 0. When the correlation coefficient ρ is zero, the distribution is symmetrical and has zero skewness.

v_t \ θ    0.05     0.1     0.15     0.2     0.25     0.3
0.1        5.90    4.45     3.97    3.73     3.58    3.48
0.2       14.61    8.80     6.87    5.90     5.32    4.94
0.3       29.12   16.06    11.71    9.53     8.22    7.35
0.4       49.44   26.22    18.48   14.61    12.29   10.74
0.5       75.56   39.28    27.19   21.14    17.51   15.09

At low mean reversion κ, the option pricing impact crucially depends on the initial volatility, however. Figure 5.1 below presents the Black–Scholes implied volatility inverted from simulated option prices

Table 5.1 Sum of autocorrelation coefficients of the first 1000 lags for selected financial time series and simulated GARCH and GJR processes

                                            No. of obs   ρ(|r|)    ρ(r²)   ρ(ln|r|)   ρ(|Tr|)
Stock Market Indices:
USA S&P500 Composite                            9676     35.687    3.912    27.466    40.838
Germany DAX 30 Industrial                       9634     75.571   37.102    41.890    79.186
Japan NIKKEI 225 Stock Average                  8443     89.559   23.405    84.257    95.789
France CAC 40                                   8276     43.310   17.467    22.432    46.539
UK FTSE All Share and FTSE100                   8714     30.817   12.615    18.394    33.199
Average STOCK INDICES                                    54.989   18.900    38.888    59.110
Stocks:
Cadbury Schweppes                               7418     48.607   19.236    85.288    50.235
Marks & Spencer Group                           7709     40.635   17.541    67.480    42.575
Shell Transport                                 8115     38.947   20.078    44.711    40.035
FTSE Small Cap Index                            4437     25.381    3.712    35.152    28.533
Average STOCKS                                           38.392   15.142    58.158    40.344
Exchange Rates:
US $ to UK £                                    7942     56.308   24.652    84.717    57.432
Australian $ to UK £                            7859     32.657    0.052    72.572    48.241
Mexican Peso to UK £                            5394      9.545    1.501    13.760    14.932
Indonesian Rupiah to UK £                       2964     20.819    4.927    31.509    21.753
Average EXCHANGE RATES                                   29.832    7.783    50.640    35.589
Interest Rates:
US 1 month Eurodollar deposits                  8491    281.799   20.782   327.770   331.877
UK Interbank 1-month                            7448     12.699    0.080    22.901    25.657
Venezuela PAR Brady Bond                        3279     19.236    9.944    32.985    19.800
South Korea Overnight Call                      2601     54.693   12.200    57.276    56.648
Average INTEREST RATES                                   92.107   10.752   110.233   108.496
Commodities:
Gold, Bullion, $/troy oz (London fixing) close  6536    125.309   39.305   140.747   133.880
Silver Fix (LBM), cash cents/troy oz            7780     45.504    8.275    88.706    52.154
Brent Oil (1 month forward) $/barrel            2389     11.532    5.469     9.882    11.81
Average COMMODITIES                                      60.782   17.683    79.778    65.948
Average ALL                                              54.931   14.113    65.495    61.555
1000 simulated GARCH: mean                     10 000     1.045    1.206     0.478     1.033
  standard deviation                                     (1.099)  (1.232)   (0.688)   (1.086)
1000 simulated GJR: mean                       10 000     1.945    2.308     0.870     1.899
  standard deviation                                     (1.709)  (2.048)   (0.908)   (1.660)

Note: 'Tr' denotes trimmed returns, whereby returns in the 0.01% tail take the value of the 0.01% quantile. The simulated GARCH process is

ε_t = z_t √h_t,   z_t ∼ N(0, 1),
h_t = (1 − 0.96 − 0.02) + 0.96 h_{t−1} + 0.02 ε²_{t−1}.

The simulated GJR process is

ε_t = z_t √h_t,   z_t ∼ N(0, 1),
h_t = (1 − 0.9 − 0.03 − 0.5 × 0.09) + 0.9 h_{t−1} + 0.03 ε²_{t−1} + 0.09 D_{t−1} ε²_{t−1},
D_t = 1 for ε_t < 0, and 0 otherwise.

Copyright 2004, CFA Institute. Reproduced and republished from Financial Analysts Journal with permission from CFA Institute. All Rights Reserved.
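The simulated series in the note can be reproduced along the following lines (a sketch; the seed and initialization are mine). Both intercepts are chosen so that the unconditional variance is 1, using E[D_t] = 0.5 for the GJR term.

```python
import numpy as np

def simulate_gjr(n=10_000, beta=0.9, alpha=0.03, gamma=0.09, seed=1):
    """h_t = (1 - beta - alpha - 0.5*gamma) + beta*h_{t-1}
             + (alpha + gamma*D_{t-1}) * eps_{t-1}**2,  D_t = 1{eps_t < 0}.
    Setting gamma = 0 and (beta, alpha) = (0.96, 0.02) gives the GARCH case."""
    omega = 1.0 - beta - alpha - 0.5 * gamma
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    h = np.empty(n)
    eps = np.empty(n)
    h[0], eps[0] = 1.0, z[0]
    for t in range(1, n):
        d = 1.0 if eps[t - 1] < 0 else 0.0
        h[t] = omega + beta * h[t - 1] + (alpha + gamma * d) * eps[t - 1] ** 2
        eps[t] = z[t] * np.sqrt(h[t])
    return eps, h

eps, h = simulate_gjr()   # sample variance of eps fluctuates around 1
```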

Figure 5.1 Effect of kappa (S = 100, r = 0, T = 1, λ = 0, σ_v = 0.6, θ = 0.2). [The figure plots Black–Scholes implied volatility (BSIV), roughly between 0.1 and 0.8, against strike price K from 50 to 150, for κ = 0.01 and κ = 3, each with √v_t = 0.7 and √v_t = 0.15.]

produced from a stochastic volatility option pricing model. The Black–Scholes model is used here only to obtain the implied volatility, which gives a clearer relative pricing relationship. The Black–Scholes implied volatility (BSIV) is directly proportional to the option price. First we look at the high volatility state where √v_t = 0.7. The implied volatility for κ = 0.01 is higher than that for κ = 3.0, which means that long memory volatility (slow mean reversion and high volatility persistence) will lead to a higher option price. But, in reverse, long memory volatility will result in lower option prices, and hence lower implied volatility, in the low volatility state, e.g. √v_t = 0.15. So, unlike the conclusion in previous studies, long memory in volatility does not always lead to higher option prices. It is conditioned on the current level of volatility vis-à-vis the long-run level of volatility.

5.3 FRACTIONALLY INTEGRATED MODEL

Both the historical volatility models and the ARCH models have been tested for fractional integration. Baillie, Bollerslev and Mikkelsen (1996) fitted FIGARCH to US dollar–Deutschemark exchange rates. Bollerslev and Mikkelsen (1996, 1999) used FIEGARCH to study S&P500 volatility and the option pricing impact, and so did Taylor (2000). Vilasuso (2002) tested FIGARCH against GARCH and IGARCH for volatility prediction for five major currencies. In Andersen, Bollerslev, Diebold and Labys (2003), a vector autoregressive model with long distributed lags was built on the realized volatility of three exchange rates, which they called the VAR-RV model. In Zumbach (2002) the weights applied to the time series of realized volatility follow a power law, which he called the LM-ARCH model. Three other papers, viz. Li (2002), Martens and Zein (2004) and Pong, Shackleton, Taylor and Xu (2004), compared long memory volatility model forecasts with option implied volatility. Li (2002) used ARFIMA, whereas the other two papers used log-ARFIMA. Hwang and Satchell (1998) also studied the log-ARFIMA model, but they forecast the Black–Scholes 'risk-neutral' implied volatility of the equity option instead of that of the underlying asset.

5.3.1 FIGARCH

The FIGARCH(1, d, 1) model below:

h_t = ω + [1 − β₁L − (1 − φ₁L)(1 − L)^d] ε²_t + β₁ h_{t−1}

was used in Baillie, Bollerslev and Mikkelsen (1996), and all the following specifications are equivalent:

(1 − β₁L) h_t = ω + [1 − β₁L − (1 − φ₁L)(1 − L)^d] ε²_t,
h_t = ω(1 − β₁)⁻¹ + (1 − β₁L)⁻¹ [(1 − β₁L) − (1 − φ₁L)(1 − L)^d] ε²_t,
h_t = ω(1 − β₁)⁻¹ + [1 − (1 − β₁L)⁻¹(1 − φ₁L)(1 − L)^d] ε²_t.

For the one-step-ahead forecast,

h_{t+1} = ω(1 − β₁)⁻¹ + [1 − (1 − β₁L)⁻¹(1 − φ₁L)(1 − L)^d] ε²_t,

and the multi-step-ahead forecast is

h_{T+τ} = ω(1 − β₁)⁻¹ + [1 − (1 − β₁L)⁻¹(1 − φ₁L)(1 − L)^d] ε²_{T+τ−1}.

The FIGARCH model is estimated by approximate maximum likelihood using the truncated ARCH representation. We can transform the FIGARCH model into an ARCH model with infinite lags. The parameters in the lag polynomial

λ(L) = 1 − (1 − β₁L)⁻¹(1 − φ₁L)(1 − L)^d

may be written as

λ₁ = φ₁ − β₁ + d,
λ_k = β₁λ_{k−1} + (π_k − φ₁π_{k−1})   for k ≥ 2,

where the π_j are the coefficients in the expansion

(1 − L)^d = 1 − Σ_{j=1}^∞ π_j L^j,   π₀ = 0.

In the literature, a truncation lag at J = 1000 is common.
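The truncated ARCH(∞) weights λ_k can be computed recursively. The sketch below is mine; it writes (1 − L)^d = 1 − Σ_{j≥1} π_j L^j, so that π₁ = d and π_j = π_{j−1}(j − 1 − d)/j for j ≥ 2.

```python
import numpy as np

def figarch_weights(d, phi1, beta1, J=1000):
    """ARCH(inf) weights lambda_k of FIGARCH(1, d, 1), truncated at lag J.
    Convention: (1 - L)**d = 1 - sum_{j>=1} pi_j * L**j, so pi_1 = d and
    pi_j = pi_{j-1} * (j - 1 - d) / j for j >= 2."""
    pi = np.zeros(J + 1)
    pi[1] = d
    for j in range(2, J + 1):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    lam = np.zeros(J + 1)
    lam[1] = phi1 - beta1 + d
    for k in range(2, J + 1):
        lam[k] = beta1 * lam[k - 1] + (pi[k] - phi1 * pi[k - 1])
    return lam[1:]

lam = figarch_weights(d=0.4, phi1=0.2, beta1=0.5)
# lam[0] is lambda_1 = phi1 - beta1 + d = 0.1; the later weights decay
# hyperbolically and their sum approaches 1 as J grows.
```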

5.3.2 FIEGARCH

Bollerslev and Mikkelsen (1996) find that fractionally integrated models provide a better fit to S&P500 returns. Specifically, they find that fractionally integrated models perform better than GARCH(p, q) and IGARCH(p, q), and that the FIEGARCH specification is better than FIGARCH. Bollerslev and Mikkelsen (1999) confirm that FIEGARCH beats EGARCH and IEGARCH in pricing options on S&P500 LEAPS (Long-term Equity Anticipation Securities) contracts. Specifically, Bollerslev and Mikkelsen (1999) fitted an AR(2)-FIEGARCH(1, d, 1) as shown below:

r_t = μ + (ρ₁L + ρ₂L²) r_t + z_t,    (5.2)
ln σ²_t = ω_t + (1 + ψ₁L)(1 − φ₁L)⁻¹(1 − L)^{−d} g(z_{t−1}),
g(z_{t−1}) = θz_{t−1} + γ [|z_{t−1}| − E|z_{t−1}|],
ω_t = ω + ln(1 + δN_t).

The FIEGARCH model in (5.2) is truly a model for absolute returns. Since both EGARCH and FIEGARCH provide forecasts for ln σ, to infer a forecast for σ from ln σ requires an adjustment for Jensen's inequality, which is not a straightforward task without the assumption of a normal distribution for ln σ.

5.3.3 The positive drift in fractional integrated series

As Hwang and Satchell (1998) and Granger (2001) pointed out, a positive I(d) process has a positive drift term, or a time trend, in the volatility level, which is not observed in practice. This is a major weakness of the fractionally integrated model as a theoretically sound model for volatility.

All fractionally integrated models of volatility have a nonzero drift. In practice, the estimation of fractionally integrated models requires an arbitrary truncation of the infinite lags and, as a result, the mean will be biased. Zumbach's (2002) LM-ARCH will not have this problem because of the fixed number of lags and the way in which the weights are calculated. Hwang and Satchell's (1998) scaled-truncated log-ARFIMA model is mean adjusted to control for the bias that is due to this truncation and the log transformation. The FIGARCH has a positive mean in the conditional variance equation, whereas FIEGARCH has no such problem because the lag-dependent terms have zero mean.

5.3.4 Forecasting performance

Vilasuso (2002) finds that FIGARCH produces significantly better 1- and 10-day-ahead volatility forecasts for five major exchange rates than GARCH and IGARCH. Zumbach (2002) produces only one-day-ahead forecasts and finds no difference among model performances. Andersen, Bollerslev, Diebold and Labys (2003) find that the VAR model constructed from realized volatility, i.e. VAR-RV, produces the best 1- and 10-day-ahead volatility forecasts. It is difficult to attribute this superior performance to the fractionally integrated model alone because the VAR structure allows a cross-series linkage that is absent in all the other univariate models, and we also know that more accurate realized volatility estimates would result in improved forecasting performance, everything else being equal.

The other three papers that compare forecasts from LM models with implied volatility forecasts generally find the implied volatility forecast to produce the highest explanatory power. Martens and Zein (2004) find that the log-ARFIMA forecast beats implied in S&P500 futures but not in ¥/US$ and crude oil futures. Li (2002) finds that implied produces the better short-horizon forecast, whereas ARFIMA provides the better forecast for a 6-month horizon. However, when the regression coefficients are constrained to be α = 0 and β = 1, the regression R² becomes negative at the long horizon. From our discussion in Section 2.4, this suggests that volatility at the 6-month horizon might be better forecast using the unconditional variance instead of model-based forecasts. Pong, Shackleton, Taylor and Xu (2004) find implied volatility to outperform time series volatility models, including the log-ARFIMA model, in forecasting 1- to 3-month-ahead volatility of the dollar–sterling exchange rate.

Many of the fractional integration papers were written more recently and used realized volatilities constructed from intraday high-frequency data. When a comparison is made with option implied volatility, however, the implied volatility is usually extracted from daily closing option prices. Despite the lower data frequency, implied appears to outperform forecasts from LM models that use intraday information.

5.4 COMPETING MODELS FOR VOLATILITY LONG MEMORY

The fractionally integrated series is the simplest linear model that produces long memory characteristics. It is also the most commonly used and tested model in the literature for capturing long memory in volatility. There are many other nonlinear short memory models that exhibit spurious long memory in volatility, viz. the break, volatility component and regime-switching models. These three models, plus the fractionally integrated model, have very different volatility dynamics and produce very different volatility forecasts.

The volatility break model permits the mean level of volatility to change in a step function through time, with some weak constraint on the number of breaks in the volatility level. It is more general than the volatility component model and the regime-switching model. In the case of the volatility component model, the mean level is a slowly evolving process. For the regime-switching model, the mean level of volatility can differ according to regimes, the total number of which is usually confined to a small number such as two or three.

5.4.1 Breaks

A break process can be written as

V_t = m_t + u_t,

where u_t is a noise variable and m_t represents occasional level shifts. The m_t are controlled by q_t (a zero–one indicator for the presence of breaks) and η_t (the size of the jump) such that

m_t = m_{t−1} + q_t η_t = m_0 + Σ_{i=1}^t q_i η_i,
q_t = 0 with probability 1 − p, and 1 with probability p.

The expected number of breaks for a given sample is Tp, where T is the total number of observations. Provided that p converges to zero slowly as the sample size increases, i.e. p → 0 as T → ∞ such that lim_{T→∞} Tp is a nonzero constant, Granger and Hyung (2004) showed that the integrating parameter, d, is a function of Tp. While d is bounded between 0 and 1, the expected value of d is proportionate to the number of breaks in the series.
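The Granger–Hyung mechanism is easy to reproduce: a handful of rare level shifts makes the sample autocorrelations of V_t decay very slowly, mimicking an I(d) series. The parameter choices below are my own illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 20_000
p = 20 / T                        # E[number of breaks] = T*p = 20
q = rng.random(T) < p             # q_t: zero-one break indicator
eta = rng.standard_normal(T)      # eta_t: break sizes
m = np.cumsum(q * eta)            # m_t = m_{t-1} + q_t * eta_t
V = m + rng.standard_normal(T)    # V_t = m_t + u_t

Vc = V - V.mean()
denom = Vc @ Vc
acf = {k: (Vc[:-k] @ Vc[k:]) / denom for k in (1, 100, 500)}
# the autocorrelations stay well above zero even at lag 500,
# the signature of (spurious) long memory
```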

One interesting empirical finding on the volatility break model comes from Aggarwal, Inclan and Leal (1999), who use the ICSS (iterated cumulative sums of squares) algorithm to identify sudden shifts in the variance of 20 stock market indices and the duration of such shifts. They find that most volatility shifts are due to local political events. When dummy variables indicating the location of the sudden changes in variance were fitted to a GARCH(1,1) model, most of the GARCH parameters became statistically insignificant. The GARCH(1,1) with occasional break model can be written as follows:

h_t = ω₁D₁ + · · · + ω_{R+1}D_{R+1} + α₁ε²_{t−1} + β₁h_{t−1},

where D₁, · · ·, D_{R+1} are dummy variables taking the value 1 in each regime of variance and zero elsewhere. The one-step-ahead and multi-step-ahead forecasts are

h_{t+1} = ω_{R+1} + α₁ε²_t + β₁h_t,
h_{t+τ} = ω_{R+1} + (α₁ + β₁)h_{t+τ−1}.

In estimating the break points using the ICSS algorithm, a minimum length between breaks is needed to reduce the possibility of a temporary shock in a series being mistaken for a break.

5.4.2 Components model

Engle and Lee (1999) proposed the component GARCH (CGARCH) model, whereby the volatility process is modelled as the sum of a permanent process, m_t, that has memory close to a unit root, and a transitory mean-reverting process, u_t, that has a more rapid time decay. The model can be seen as an extension of the GARCH(1,1) model, with the conditional variance mean-reverting to a long-term trend level, m_t, instead of a fixed position at σ̄. Specifically, m_t is permitted to evolve slowly in an autoregressive manner. The CGARCH(1,1) model has the following specification:

(h_t − m_t) = α (ε²_{t−1} − m_{t−1}) + β (h_{t−1} − m_{t−1}) ≡ u_t,    (5.3)
m_t = ω + ρ m_{t−1} + φ (ε²_{t−1} − h_{t−1}),

where (h_t − m_t) = u_t represents the short-run transitory component and m_t represents a time-varying trend or permanent component in volatility, which is driven by the volatility prediction error (ε²_{t−1} − h_{t−1}) and is integrated if ρ = 1.

For the one-step-ahead forecast,

h_{t+1} = m_{t+1} + α (ε²_t − m_t) + β (h_t − m_t),
m_{t+1} = ω + ρ m_t + φ (ε²_t − h_t),

and for the multi-step-ahead forecast,

h_{t+τ} = m_{t+τ} + (α + β)(h_{t+τ−1} − m_{t+τ−1}),
m_{t+τ} = ω + ρ m_{t+τ−1},

where h_{t+τ−1} and m_{t+τ−1} are calculated through repeated substitution.
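The forecast recursion can be sketched as follows. The function and parameter values are mine; the multi-step formula uses E_t[ε²_{t+j} − h_{t+j}] = 0 for j ≥ 1.

```python
def cgarch_forecast(h_t, m_t, eps2_t, omega, rho, phi, alpha, beta, tau):
    """tau-step-ahead CGARCH(1,1) forecast of h and of the trend m."""
    m_next = omega + rho * m_t + phi * (eps2_t - h_t)              # one step
    h_next = m_next + alpha * (eps2_t - m_t) + beta * (h_t - m_t)
    for _ in range(tau - 1):                                       # further steps
        m_prev, h_prev = m_next, h_next
        m_next = omega + rho * m_prev
        h_next = m_next + (alpha + beta) * (h_prev - m_prev)
    return h_next, m_next

# For 0 < alpha + beta < rho < 1, the forecast converges to omega / (1 - rho),
# the long-run level of the trend component.
h_far, m_far = cgarch_forecast(1.5, 1.0, 2.0, omega=0.02, rho=0.98,
                               phi=0.02, alpha=0.05, beta=0.85, tau=2000)
```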

This model has various interesting properties: (i) both m_t and u_t are driven by (ε²_{t−1} − h_{t−1}); (ii) the short-run volatility component mean-reverts to zero at a geometric rate of (α + β) if 0 < (α + β) < 1; (iii) the long-run volatility component evolves over time following an AR process and converges to a constant level defined by ω/(1 − ρ) if 0 < ρ < 1; (iv) it is assumed that 0 < (α + β) < ρ < 1, so that the long-run component is more persistent than the short-run component.

This model was found to obey several economic and asset pricing relationships. Many have observed and proposed that the volatility persistence of large jumps is shorter than that of shocks due to ordinary news events. The component model allows large shocks to be transitory. Indeed, Engle and Lee (1999) establish that the impact of the October 1987 crash on stock market volatility was temporary. The expected risk premium, as measured by the expected amount of returns in excess of the risk-free interest rate, in the stock market was found to be related to the long-run component of stock return volatility.¹ The authors suggested, but did not test, that such a pricing relationship may have fundamental economic explanations. The well-documented 'leverage effect' (or volatility asymmetry) in the stock market (see Black, 1976; Christie, 1982; Nelson, 1991) is shown to have a temporary impact; the long-run volatility component shows no asymmetric response to market changes.

The reduced form of Equation (5.3) can be expressed as the GARCH(2,2) process below:

h_t = (1 − α − β) ω + (α + φ) ε²_{t−1} + [−φ(α + β) − αρ] ε²_{t−2}
      + (ρ + β − φ) h_{t−1} + [φ(α + β) − βρ] h_{t−2},

with all five parameters, α, β, ω, φ and ρ, constrained to be positive and real, 0 < (α + β) < ρ < 1, and 0 < φ < β.

¹ Merton (1980) and French, Schwert and Stambaugh (1987) also studied and measured the relationship between the risk premium and 'total' volatility.

5.4.3 Regime-switching model

One approach to modelling changing volatility levels and persistence is to use a Hamilton (1989) type regime-switching (RS) model, which, like the GARCH model, is strictly stationary and covariance stationary. Both ARCH and GARCH models have been implemented within a Hamilton (1989) type regime-switching framework, whereby volatility persistence can take different values depending on whether it is in a high or low volatility regime. The most generalized form of the regime-switching model is the RS-GARCH(1, 1) model used in Gray (1996) and Klaassen (1998):

h_{t,S_{t−1}} = ω_{S_{t−1}} + α_{S_{t−1}} ε²_{t−1} + β_{S_{t−1}} h_{t−1,S_{t−1}},

where S_t indicates the state of the regime at time t.

It has long been argued that the financial market reacts to large and small shocks differently, and that the rate of mean reversion is faster for large shocks. Friedman and Laibson (1989), Jones, Lamont and Lumsdaine (1998) and Ederington and Lee (2001) all provide explanations and empirical support for the conjecture that volatility adjustment in high and low volatility states follows a twin-speed process: slower adjustment and more persistent volatility in the low volatility state, and faster adjustment and less volatility persistence in the high volatility state.

The earlier RS applications, such as Pagan and Schwert (1990) and Hamilton and Susmel (1994), are more rigid, where the conditional variance is state-dependent but not time-dependent. In these studies, only ARCH class conditional variance is entertained. Recent extensions by Gray (1996) and Klaassen (1998) allow GARCH-type heteroscedasticity in each state and the probability of switching between states to be time-dependent. A more recent advancement is to allow a more flexible switching probability. For example, Peria (2001) allowed the transition probabilities to vary according to economic conditions with the RS-GARCH model below:

r_t | Ω_{t−1} ∼ N(μ_i, h_{it}) w.p. p_{it},
h_{it} = ω_i + α_i ε²_{t−1} + β_i h_{t−1},

where i represents a particular regime, 'w.p.' stands for 'with probability', p_{it} = Pr(S_t = i | Ω_{t−1}) and Σ_i p_{it} = 1.

The STGARCH (smooth transition GARCH) model below was tested in Taylor, J. (2004):

h_t = ω + (1 − F(ε_{t−1})) αε²_{t−1} + F(ε_{t−1}) δε²_{t−1} + βh_{t−1},

where

F(ε_{t−1}) = 1 / (1 + exp(−θε_{t−1}))   for logistic STGARCH,
F(ε_{t−1}) = 1 − exp(−θε²_{t−1})        for exponential STGARCH.
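The two transition functions behave quite differently at ε = 0, which a couple of lines make concrete. The θ value is an arbitrary choice of mine, and the exponential transition is written in its usual 1 − exp(·) form.

```python
import math

def logistic_F(e, theta=2.0):
    """Sign-driven transition: F -> 0 for large negative shocks,
    F -> 1 for large positive shocks, F(0) = 0.5."""
    return 1.0 / (1.0 + math.exp(-theta * e))

def exponential_F(e, theta=2.0):
    """Size-driven transition: symmetric in the shock, F(0) = 0,
    F -> 1 for large shocks of either sign."""
    return 1.0 - math.exp(-theta * e * e)

# logistic_F(0.0) == 0.5 and exponential_F(0.0) == 0.0, so the logistic
# version switches between the alpha and delta news terms on the sign of
# the shock, the exponential version on its magnitude.
```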

5.4.4 Forecasting performance

The TAR model used in Cao and Tsay (1992) is similar to an SV model with regime switching, and Cao and Tsay (1992) reported better forecasting performance from TAR than from EGARCH and GARCH. Hamilton and Susmel (1994) find that regime-switching ARCH with a leverage effect produces better volatility forecasts than the asymmetric version of GARCH. Hamilton and Lin (1996) use a bivariate RS model and find that stock market returns are more volatile during periods of recession. Gray (1996) fits an RS-GARCH(1,1) model to US 1-month T-Bill rates, where the rate of mean level reversion is permitted to differ under different regimes, and finds substantial improvement in forecasting performance. Klaassen (1998) also applies RS-GARCH(1,1) to the foreign exchange market and finds a superior, though less dramatic, performance.

It is worth noting that interest rates are different from the other assets in that interest rates exhibit a 'level' effect, i.e. volatility depends on the level of the interest rate. It is plausible that it is this level effect that Gray (1996) is picking up that results in the superior forecasting performance. This level effect also appears in some European short rates (Ferreira, 1999). There is no such level effect in exchange rates, so it is not surprising that Klaassen (1998) did not find a similarly dramatic improvement. No other published forecasting results are available for the break and component volatility models.

6

Stochastic Volatility

The stochastic volatility (SV) model is, first and foremost, a theoretical model rather than a practical and direct tool for volatility forecasting. One should not overlook the developments in the stochastic volatility area, however, because of the rapid advancement in research, noticeably by Ole Barndorff-Nielsen and Neil Shephard. As far as implementation is concerned, SV estimation still poses a challenge to many researchers. Recent publications indicate a trend towards the MCMC (Markov chain Monte Carlo) approach. A good source of reference for the MCMC approach to SV estimation is Tsay (2002). Here we will provide only an overview. An early survey of SV work is Ghysels, Harvey and Renault (1996), but the subject is rapidly changing. A more recent SV book is Shephard (2003). The SV models and the ARCH models are closely related, and many ARCH models have SV equivalents as continuous time diffusion limits (see Taylor, 1994; Duan, 1997; Corradi, 2000; Fleming and Kirby, 2003).

6.1 THE VOLATILITY INNOVATION

The discrete time SV model is

r_t = μ + ε_t,
ε_t = z_t exp(0.5 h_t),
h_t = ω + βh_{t−1} + v_t,

where v_t may or may not be independent of z_t. We have already seen the continuous time specification in Section 5.2, and it will appear again in Chapter 9 when we discuss stochastic volatility option pricing models.

The SV model has an additional innovation term in the volatility dynamics and, hence, is more flexible than ARCH class models. It has been found to fit financial market returns better and to have residuals closer to standard normal. Modelling volatility as a stochastic variable immediately leads to fat-tailed distributions for returns. The autoregressive term in the volatility process introduces persistence, and the correlation between

the two innovation terms in the volatility process and the return process produces volatility asymmetry (Hull and White, 1987, 1988). Long memory SV models have also been proposed by allowing the volatility process to have a fractionally integrated order (see Harvey, 1998).
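A simulation sketch of the discrete-time SV model shows the fat tails directly. The parameter values and seed are mine, and for simplicity v_t and z_t are independent (so there is no volatility asymmetry).

```python
import numpy as np

rng = np.random.default_rng(3)
T = 50_000
omega, beta, sigma_v = -0.02, 0.96, 0.2    # illustrative values
v = rng.normal(0.0, sigma_v, T)            # volatility innovations
z = rng.standard_normal(T)                 # return innovations (independent)
h = np.zeros(T)
for t in range(1, T):
    h[t] = omega + beta * h[t - 1] + v[t]  # log-variance AR(1)
eps = z * np.exp(0.5 * h)                  # returns with mu = 0

x = eps - eps.mean()
kurt = T * np.sum(x**4) / np.sum(x**2)**2
# lognormal mixing of the variance pushes kurt above the Gaussian value 3
```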

The volatility noise term makes the SV model a lot more flexible, but as a result the SV model has no closed-form likelihood and hence cannot be estimated directly by maximum likelihood. The quasi-maximum likelihood estimation (QMLE) approach of Harvey, Ruiz and Shephard (1994) is inefficient if volatility proxies are non-Gaussian (Andersen and Sorensen, 1997). The alternatives are the generalized method of moments (GMM) approach through simulations (Duffie and Singleton, 1993) or analytical solutions (Singleton, 2001), and the likelihood approach through numerical integration (Fridman and Harris, 1998) or Monte Carlo integration using either importance sampling (Danielsson, 1994; Pitt and Shephard, 1997; Durbin and Koopman, 2000) or Markov chain methods (e.g. Jacquier, Polson and Rossi, 1994; Kim, Shephard and Chib, 1998). In the following section, we will describe the MCMC approach only.

6.2 THE MCMC APPROACH

The MCMC approach to modelling stochastic volatility was made popular by authors such as Jacquier, Polson and Rossi (1994). Tsay (2002) has a good description of how the algorithm works. Consider here the simplest case:

r_t = a_t,
a_t = √h_t ε_t,
ln h_t = α₀ + α₁ ln h_{t−1} + v_t,    (6.1)

where ε_t ∼ N(0, 1), v_t ∼ N(0, σ²_ν), and ε_t and v_t are independent. Let w = (α₀, α₁, σ²_ν)′, let R = (r₁, · · ·, r_n)′ be the collection of n observed returns, and let H = (h₁, · · ·, h_n)′ be the n-dimensional vector of unobservable conditional volatilities. Estimation of model (6.1) is made complicated because the likelihood function is a mixture over the n-dimensional distribution of H, as follows:

f(R | w) = ∫ f(R | H) · f(H | w) dH.

The objective is still to maximize the likelihood of {a_t}, but the density of R is determined by H, which in turn is determined by w.

Assuming that the prior distributions for the mean and the volatility equations are independent, the Gibbs sampling approach to estimating model (6.1) involves drawing random samples from the following conditional posterior distributions:

f(β | R, X, H, w),   f(H | R, X, β, w)   and   f(w | R, X, β, H).

This process is repeated with updated information until the likelihood tolerance or the predetermined maximum number of iterations is reached.

6.2.1 The volatility vector H

First, the volatility vector H is drawn element by element:

f(h_t | R, H_{−t}, w)
  ∝ f(a_t | h_t, r_t) f(h_t | h_{t−1}, w) f(h_{t+1} | h_t, w)
  ∝ h_t^{−0.5} exp(−r_t² / (2h_t)) · h_t^{−1} exp(−(ln h_t − μ_t)² / (2σ²)),    (6.2)

where

μ_t = [α₀(1 − α₁) + α₁(ln h_{t+1} + ln h_{t−1})] / (1 + α₁²),
σ² = σ²_ν / (1 + α₁²).

Equation (6.2) can be obtained using results for a missing value in an AR(1) model. To see how this works, start from the volatility equation

ln h_t = α₀ + α₁ ln h_{t−1} + v_t,
α₀ + α₁ ln h_{t−1} = 1 × ln h_t − v_t,

which we write as

y_t = x_t ln h_t + b_t,    (6.3)

with y_t = α₀ + α₁ ln h_{t−1}, x_t = 1 and b_t = −v_t, and for t + 1

ln h_{t+1} − α₀ = α₁ ln h_t + v_{t+1},
y_{t+1} = x_{t+1} ln h_t + b_{t+1},    (6.4)

with y_{t+1} = ln h_{t+1} − α₀, x_{t+1} = α₁ and b_{t+1} = v_{t+1}. Given that b_t and b_{t+1} have the same distribution, because v_t is also N(0, σ²_ν), ln h_t can be estimated from (6.3) and (6.4) using the least squares principle,

ln ĥ_t = (x_t y_t + x_{t+1} y_{t+1}) / (x_t² + x_{t+1}²)
       = [α₀(1 − α₁) + α₁(ln h_{t+1} + ln h_{t−1})] / (1 + α₁²).
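The least squares combination in the derivation above is easy to verify numerically; the neighbouring log-variance values below are arbitrary draws of mine.

```python
import numpy as np

rng = np.random.default_rng(11)
alpha0, alpha1 = 0.1, 0.9
lnh_prev, lnh_next = rng.normal(size=2)   # hypothetical ln h_{t-1}, ln h_{t+1}

# the two 'observations' on ln h_t from (6.3) and (6.4)
x = np.array([1.0, alpha1])
y = np.array([alpha0 + alpha1 * lnh_prev, lnh_next - alpha0])

ls = (x @ y) / (x @ x)                    # least squares estimate of ln h_t
closed = (alpha0 * (1 - alpha1)
          + alpha1 * (lnh_next + lnh_prev)) / (1 + alpha1**2)
# ls equals closed up to rounding: it is the conditional mean mu_t in (6.2)
```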