A Practical Guide to Forecasting
Financial Market Volatility

Ser-Huang Poon

For other titles in the Wiley Finance series
please see www.wiley.com/finance
Copyright © 2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording,
scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988
or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham
Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher.
Requests to the Publisher should be addressed to the Permissions Department, John Wiley &
Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed
to permreq@wiley.co.uk, or faxed to (+44) 1243 770620.
Designations used by companies to distinguish their products are often claimed as trademarks.
All brand names and product names used in this book are trade names, service marks, trademarks
or registered trademarks of their respective owners. The Publisher is not associated with any product
or vendor mentioned in this book.
This publication is designed to provide accurate and authoritative information in regard to
the subject matter covered. It is sold on the understanding that the Publisher is not engaged
in rendering professional services. If professional advice or other expert assistance is
required, the services of a competent professional should be sought.

Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data

Poon, Ser-Huang.
A practical guide for forecasting financial market volatility / Ser-Huang Poon.
p. cm. -- (The Wiley finance series)
Includes bibliographical references and index.
ISBN-13 978-0-470-85613-0 (cloth : alk. paper)
ISBN-10 0-470-85613-0 (cloth : alk. paper)
1. Options (Finance) -- Mathematical models. 2. Securities -- Prices -- Mathematical models.
3. Stock price forecasting -- Mathematical models. I. Title. II. Series.
HG6024.A3P66 2005
332.64'01'5195 -- dc22 2005005768

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN-13 978-0-470-85613-0 (HB)
ISBN-10 0-470-85613-0 (HB)

Typeset in 11/13pt Times by TechBooks, New Delhi, India
Printed and bound in Great Britain by TJ International Ltd, Padstow, Cornwall
This book is printed on acid-free paper responsibly manufactured from sustainable forestry
in which at least two trees are planted for each one used for paper production.
I dedicate this book to my mother
Contents

Foreword by Clive Granger

Preface

1 Volatility Definition and Estimation
  1.1 What is volatility?
  1.2 Financial market stylized facts
  1.3 Volatility estimation
      1.3.1 Using squared return as a proxy for daily volatility
      1.3.2 Using the high-low measure to proxy volatility
      1.3.3 Realized volatility, quadratic variation and jumps
      1.3.4 Scaling and actual volatility
  1.4 The treatment of large numbers

2 Volatility Forecast Evaluation
  2.1 The form of X_t
  2.2 Error statistics and the form of ε_t
  2.3 Comparing forecast errors of different models
      2.3.1 Diebold and Mariano's asymptotic test
      2.3.2 Diebold and Mariano's sign test
      2.3.3 Diebold and Mariano's Wilcoxon sign-rank test
      2.3.4 Serially correlated loss differentials
  2.4 Regression-based forecast efficiency and orthogonality test
  2.5 Other issues in forecast evaluation

3 Historical Volatility Models
  3.1 Modelling issues
  3.2 Types of historical volatility models
      3.2.1 Single-state historical volatility models
      3.2.2 Regime switching and transition exponential smoothing
  3.3 Forecasting performance

4 ARCH
  4.1 Engle (1982)
  4.2 Generalized ARCH
  4.3 Integrated GARCH
  4.4 Exponential GARCH
  4.5 Other forms of nonlinearity
  4.6 Forecasting performance

5 Linear and Nonlinear Long Memory Models
  5.1 What is long memory in volatility?
  5.2 Evidence and impact of volatility long memory
  5.3 Fractionally integrated model
      5.3.1 FIGARCH
      5.3.2 FIEGARCH
      5.3.3 The positive drift in fractional integrated series
      5.3.4 Forecasting performance
  5.4 Competing models for volatility long memory
      5.4.1 Breaks
      5.4.2 Components model
      5.4.3 Regime-switching model
      5.4.4 Forecasting performance

6 Stochastic Volatility
  6.1 The volatility innovation
  6.2 The MCMC approach
      6.2.1 The volatility vector H
      6.2.2 The parameter w
  6.3 Forecasting performance

7 Multivariate Volatility Models
  7.1 Asymmetric dynamic covariance model
  7.2 A bivariate example
  7.3 Applications

8 Black-Scholes
  8.1 The Black-Scholes formula
      8.1.1 The Black-Scholes assumptions
      8.1.2 Black-Scholes implied volatility
      8.1.3 Black-Scholes implied volatility smile
      8.1.4 Explanations for the 'smile'
  8.2 Black-Scholes and no-arbitrage pricing
      8.2.1 The stock price dynamics
      8.2.2 The Black-Scholes partial differential equation
      8.2.3 Solving the partial differential equation
  8.3 Binomial method
      8.3.1 Matching volatility with u and d
      8.3.2 A two-step binomial tree and American-style options
  8.4 Testing option pricing model in practice
  8.5 Dividend and early exercise premium
      8.5.1 Known and finite dividends
      8.5.2 Dividend yield method
      8.5.3 Barone-Adesi and Whaley quadratic approximation
  8.6 Measurement errors and bias
      8.6.1 Investor risk preference
  8.7 Appendix: Implementing Barone-Adesi and Whaley's efficient algorithm

9 Option Pricing with Stochastic Volatility
  9.1 The Heston stochastic volatility option pricing model
  9.2 Heston price and Black-Scholes implied
  9.3 Model assessment
      9.3.1 Zero correlation
      9.3.2 Nonzero correlation
  9.4 Volatility forecast using the Heston model
  9.5 Appendix: The market price of volatility risk
      9.5.1 Ito's lemma for two stochastic variables
      9.5.2 The case of stochastic volatility
      9.5.3 Constructing the risk-free strategy
      9.5.4 Correlated processes
      9.5.5 The market price of risk

10 Option Forecasting Power
  10.1 Using option implied standard deviation to forecast volatility
  10.2 At-the-money or weighted implied?
  10.3 Implied biasedness
  10.4 Volatility risk premium

11 Volatility Forecasting Records
  11.1 Which volatility forecasting model?
  11.2 Getting the right conditional variance and forecast with the 'wrong' models
  11.3 Predictability across different assets
      11.3.1 Individual stocks
      11.3.2 Stock market index
      11.3.3 Exchange rate
      11.3.4 Other assets

12 Volatility Models in Risk Management
  12.1 Basel Committee and Basel Accords I & II
  12.2 VaR and backtest
      12.2.1 VaR
      12.2.2 Backtest
      12.2.3 The three-zone approach to backtest evaluation
  12.3 Extreme value theory and VaR estimation
      12.3.1 The model
      12.3.2 10-day VaR
      12.3.3 Multivariate analysis
  12.4 Evaluation of VaR models

13 VIX and Recent Changes in VIX
  13.1 New definition for VIX
  13.2 What is the VXO?
  13.3 Reason for the change

14 Where Next?

Appendix

References

Index
Foreword

If one invests in a financial asset today, the return received at some pre-specified
point in the future should be considered as a random variable. Such a variable can
only be fully characterized by a distribution function or, more easily, by a density
function. The single most important feature of the density is the expected or mean
value, representing the location of the density. Around the mean is the uncertainty,
or the volatility. If the realized returns are plotted against time, the jagged
oscillating appearance illustrates the volatility. This movement contains both
welcome elements, when surprisingly large returns occur, and also certainly
unwelcome ones, the returns far below the mean. The well-known fact that a poor
return can arise from an investment illustrates the fact that investing can be risky,
and is why volatility is sometimes equated with risk.
Volatility is itself a flow variable, having to be measured over a period of time,
rather than a stock variable, measurable at any instant of time. Similarly, a stock
price is a stock variable but a return is a flow variable. Volatility therefore has to
be observed over stated periods of time, such as hourly, daily or weekly intervals.
Having observed a time series of volatilities, it is obviously interesting to ask
about the properties of the series: is it forecastable from its own past, do other
series improve these forecasts, can the series be modelled conveniently, and are
there useful multivariate generalizations of the results? Financial econometricians
have been very inventive and industrious in considering such questions, and there
is now a substantial and often sophisticated literature in this area.
The present book by Professor Ser-Huang Poon surveys this literature carefully
and provides a very useful summary of the results available.


By so doing, she allows any interested worker to quickly catch up with the field
and also to discover the areas that are still available for further exploration.
Clive W.J. Granger
December 2004
Preface

Volatility forecasting is crucial for option pricing, risk management and portfolio
management. Nowadays, volatility has itself become the subject of trading: there
are now exchange-traded contracts written on volatility. Financial market
volatility also has a wider impact on financial regulation, monetary policy and the
macroeconomy. This book is about financial market volatility forecasting. The
aim is to put in one place models, tools and findings from a large volume of
published and working papers by many experts. The material presented in this
book is extended from two review papers ('Forecasting Financial Market
Volatility: A Review' in the Journal of Economic Literature, 2003, 41(2),
pp. 478-539, and 'Practical Issues in Forecasting Volatility' in the Financial
Analysts Journal, 2005, 61(1), pp. 45-56) jointly published with Clive Granger.
Since the main focus of this book is on volatility forecasting performance, only
volatility models that have been tested for their forecasting performance are
selected for further analysis and discussion. Hence, this book is oriented towards
practical implementations. Volatility models are not pure theoretical constructs.
The practical importance of volatility modelling and forecasting in many finance
applications means that the success or failure of volatility models depends on the
characteristics of the empirical data that they try to capture and predict. Given the
prominent role of option prices as a source of volatility forecasts, I have also
devoted much effort and the space of two chapters to cover the Black-Scholes and
stochastic volatility option pricing models.
This book is intended for first- and second-year finance PhD students and
practitioners who want to implement volatility forecasting models but struggle to
comprehend the huge volume of volatility research. Readers who are interested in
more technical aspects of volatility modelling


could refer to, for example, Gourieroux (1997) on ARCH models, Shephard
(2003) on stochastic volatility, and Fouque, Papanicolaou and Sircar (2000) on
stochastic volatility option pricing. Books that cover specific aspects or variants
of volatility models include Franses and van Dijk (2000) on nonlinear models,
and Beran (1994) and Robinson (2003) on long memory models. Specialist books
that cover financial time series modelling in a more general context include
Alexander (2001), Tsay (2002) and Taylor (2005). There are also a number of
edited series that contain articles on volatility modelling and forecasting, e.g.
Rossi (1996), Knight and Satchell (2002) and Jarrow (1998).
I am very grateful to Clive for his teaching and guidance over the last few years.
Without his encouragement and support, our volatility survey work and this book
would not have got started. I would like to thank all my co-authors on volatility
research, in particular Bevan Blair, Namwon Hyung, Eric Jondeau, Martin
Martens, Michael Rockinger, Jon Tawn, Stephen Taylor and Konstantinos
Vonatsos. Much of the writing here reflects experience gained from joint work
with them.
1 Volatility Definition and Estimation

1.1 WHAT IS VOLATILITY?
It is useful to start with an explanation of what volatility is, at least for the
purpose of clarifying the scope of this book. Volatility refers to the spread of all
likely outcomes of an uncertain variable. In financial markets, we are typically
concerned with the spread of asset returns. Statistically, volatility is often
measured as the sample standard deviation

$$\sigma = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(r_t - \mu\right)^2}, \qquad (1.1)$$

where r_t is the return on day t and μ is the average return over the T-day
period.
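For concreteness, here is a minimal Python sketch of Equation (1.1), applied to a simulated return series and annualized with the 252-trading-day convention used later in this chapter; the data and parameter values are illustrative assumptions only.

```python
import numpy as np

def sample_volatility(returns):
    """Sample standard deviation of returns as in Equation (1.1)."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()                               # average return over the T-day period
    return np.sqrt(((r - mu) ** 2).sum() / (len(r) - 1))

# Example with simulated daily returns (in %); real data would be used in practice.
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.02, scale=1.0, size=1000)

sigma_daily = sample_volatility(daily_returns)
sigma_annual = sigma_daily * np.sqrt(252)       # assumes 252 trading days per year
print(f"daily volatility: {sigma_daily:.3f}%, annualized: {sigma_annual:.2f}%")
```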
Sometimes, the variance, σ², is also used as a volatility measure. Since variance is
simply the square of the standard deviation, it makes no difference which measure
we use when we compare the volatility of two assets. However, variance is much
less stable and less desirable than standard deviation as an object for estimation
and volatility forecast evaluation. Moreover, standard deviation has the same unit
of measure as the mean, i.e. if the mean is in dollars, then the standard deviation is
also expressed in dollars, whereas the variance is expressed in dollars squared.
For this reason, standard deviation is more convenient and intuitive when we
think about volatility.
Volatility is related to, but not exactly the same as, risk. Risk is associated with
undesirable outcomes, whereas volatility, as a measure strictly of uncertainty,
could equally be driven by positive outcomes. This important difference is often
overlooked. Take the Sharpe ratio for example. The Sharpe ratio is used for
measuring the performance of an investment by comparing the mean return with
its 'risk', proxied by its volatility.


The Sharpe ratio is defined as

$$\text{Sharpe ratio} = \frac{\text{Average return, } \mu \; - \; \text{Risk-free interest rate (e.g. T-bill rate)}}{\text{Standard deviation of returns, } \sigma}.$$

The notion is that a larger Sharpe ratio is preferred to a smaller one. An unusually
large positive return, which is a desirable outcome, could lead to a reduction in
the Sharpe ratio because it will have a greater impact on the standard deviation,
σ, in the denominator than on the average return, μ, in the numerator.
More importantly, the reason that volatility is not a good or perfect measure of
risk is that volatility (or standard deviation) is only a measure of the spread of a
distribution and carries no information about its shape. The only exception is the
case of a normal or a lognormal distribution, where the mean, μ, and the standard
deviation, σ, are sufficient statistics for the entire distribution, i.e. with μ and σ
alone, one is able to reproduce the empirical distribution.
This book is about volatility only. Although volatility is not the sole determinant
of the asset return distribution, it is a key input to many important finance
applications such as investment, portfolio construction, option pricing, hedging
and risk management. When Clive Granger and I completed our survey paper on
volatility forecasting research, there were 93 studies on our list, plus several
hundred non-forecasting papers written on volatility modelling. At the time of
writing this book, the number of volatility studies is still rising and there are now
about 120 volatility forecasting papers on the list. Financial market volatility is a
'live' subject and has many facets driven by political events, the macroeconomy
and investors' behaviour. This book will elaborate on some of these complexities
that have kept the whole industry of volatility modelling and forecasting going
over the last three decades. A new trend now emerging is the trading and hedging
of volatility. The Chicago Board Options Exchange (CBOE), for example, has
started futures trading on a volatility index. Options on such futures contracts are
likely to follow. Volatility swap contracts had been traded on the over-the-counter
market well before the CBOE's developments. Previously, volatility was an input
to a model for pricing an asset or an option written on the asset. It is now the
principal subject of the model and valuation. One can only predict that volatility
research will intensify for at least the next decade.


1.2 FINANCIAL MARKET STYLIZED FACTS
To give a brief appreciation of the amount of variation across different financial
assets, Figure 1.1 plots the distributions of daily returns on a normally distributed
random variable and on the US Standard and Poor's market index (S&P100),¹ the
yen-sterling exchange rate, the share of Legal & General (a major insurance
company in the UK), the UK Index for Small Capitalisation Stocks (i.e. small
companies), and silver traded at the commodity exchange. A normal distribution
simulated using the mean and standard deviation of each financial asset return
series is drawn on the same graph to facilitate comparison.

[Figure 1.1 Distribution of daily financial market returns: (a) normal N(0,1);
(b) daily returns on S&P100, Jan 1965 - Jul 2003; (c) £ vs. yen daily exchange
rate returns, Sep 1971 - Jul 2003; (d) daily returns on Legal & General share,
Jan 1969 - Jul 2003; (e) daily returns on UK Small Cap Index, Jan 1986 - Jul
2003; (f) daily returns on silver, Aug 1971 - Jul 2003. The dotted line is the
distribution of a normal random variable simulated using the mean and standard
deviation of the financial asset returns.]
From the small selection of financial asset returns presented in Figure 1.1, we
notice several well-known features. Although the asset returns have different
degrees of variation, most of them have long 'tails' compared with the normally
distributed random variable. Typically, the asset distribution and the normal
distribution cross at least three times, leaving the financial asset returns with a
longer left tail and a higher peak in the middle. The implications are that, for a
large part of the time, financial asset returns fluctuate within a range smaller than
that of a normal distribution, but on some occasions financial asset returns swing
over a much wider range than that permitted by a normal distribution. This
phenomenon is most acute in the case of UK Small Cap and silver. Table 1.1
provides some summary statistics for these financial time series.
The normally distributed variable has a skewness equal to zero and a kurtosis of
3. The annualized standard deviation is simply √252 σ, assuming that there are
252 trading days in a year. The financial asset returns are not adjusted for
dividends. This omission is not likely to have any impact on the summary
statistics because the amount of dividends distributed over the year is very small
compared to the daily fluctuations of asset prices. From Table 1.1, the Small Cap
Index is the most negatively skewed, meaning that it has a longer left tail (extreme
losses) than right tail (extreme gains). Kurtosis is a measure of tail thickness, and
it is astronomical for the S&P100, the Small Cap Index and silver. However, these
skewness and kurtosis statistics are very sensitive to outliers. The skewness
statistics are much closer to zero, and the amount of kurtosis drops by 60% to
80%, when the October 1987 crash and a small number of other outliers are
excluded.
Another characteristic of financial market volatility is the time-varying nature of
returns fluctuations, the discovery of which led to Rob Engle's Nobel Prize for his
achievement in modelling it. Figure 1.2 plots the time series history of returns on
the same set of assets presented in Figure 1.1.

¹ The data for S&P100 prior to 1986 come from S&P500. Adjustments were made when the two series were
grafted together.
Table 1.1 Summary statistics for a selection of financial series

                                N(0,1)   S&P100   Yen/£ rate   Legal & General   UK Small Cap   Silver
Start date                               Jan 65     Sep 71         Jan 69            Jan 86     Aug 71
Number of observations            8000     9675       7338           7684              4432       7771
Daily average(a)                     0    0.024     -0.021          0.043             0.022      0.014
Daily standard deviation             1    0.985      0.715          2.061             0.648      2.347
Annualized average                   0    6.067     -5.188         10.727             5.461      3.543
Annualized standard deviation   15.875   15.632     11.356         32.715            10.286     37.255
Skewness                             0   -1.337     -0.523          0.026            -3.099      0.387
Kurtosis                             3   37.140      7.664          6.386            42.561     45.503
Number of outliers removed                    1                                           5          9
Skewness(b)                              -0.055                                      -0.917     -0.088
Kurtosis(b)                               7.989                                      13.972     15.369

(a) Returns not adjusted for dividends.
(b) These two statistical measures are computed after the removal of outliers.
All series have an end date of 22 July 2003.
[Figure 1.2 Time series of daily returns on a simulated random variable and a
collection of financial assets: (a) normally distributed random variable N(0,1);
(b) daily returns on S&P100; (c) yen to £ exchange rate returns; (d) daily returns
on Legal & General's share; (e) daily returns on UK Small Cap Index; (f) daily
returns on silver.]


The amplitude of the returns fluctuations represents the amount of variation with
respect to a short instance in time. It is clear from Figures 1.2(b) to (f) that the
fluctuations of financial asset returns are 'lumpier' than the even variations of the
normally distributed variable in Figure 1.2(a). In the finance literature, this
'lumpiness' is called volatility clustering. With volatility clustering, a turbulent
trading day tends to be followed by another turbulent day, while a tranquil period
tends to be followed by another tranquil period. Rob Engle (1982) was the first to
use the ARCH (autoregressive conditional heteroscedasticity) model to capture
this type of volatility persistence; 'autoregressive' because high/low volatility
tends to persist, 'conditional' means time-varying or with respect to a point in
time, and 'heteroscedasticity' is technical jargon for non-constant volatility.²
There are several salient features of financial market returns and volatility that
are now well documented. These include the fat tails and volatility clustering
mentioned above. Other characteristics documented in the literature include:
(i) Asset returns, r_t, are not autocorrelated except possibly at lag one due to
nonsynchronous or thin trading. The lack of autocorrelation in returns
corresponds to the notion of weak-form market efficiency in the sense that
returns are not predictable.
(ii) The autocorrelation function of |r_t| and r_t² decays slowly and
corr(|r_t|, |r_{t-1}|) > corr(r_t², r_{t-1}²). The decay rate of the
autocorrelation function is much slower than the exponential rate of a
stationary AR or ARMA model. The autocorrelations remain positive for
very long lags. This is known as the long memory effect of volatility, which
will be discussed in greater detail in Chapter 5. In the table below, we give
a brief taste of the finding:


              ρ(|r|)    ρ(r²)    ρ(ln|r|)   ρ(|Tr|)
S&P100        35.687    3.912      27.466    41.930
Yen/£          4.111    1.108       0.966     5.718
L&G           25.898   14.767      29.907    28.711
Small Cap     25.381    3.712      35.152    38.631
Silver        45.504    8.275      88.706    60.545



² It is worth noting that the ARCH effect appears in many time series other than financial time series. In fact,
Engle's (1982) seminal work is illustrated with the UK inflation rate.


(iii) The numbers reported above are the sums of autocorrelations over the first
1000 lags. The last column, ρ(|Tr|), is the autocorrelation of absolute returns
after the most extreme 1% tail observations were truncated. Let r_{0.01} and
r_{0.99} be the bounds of the 98% confidence interval of the empirical
distribution, and

$$Tr = \operatorname{Min}\left[r, r_{0.99}\right] \quad \text{or} \quad \operatorname{Max}\left[r, r_{0.01}\right]. \qquad (1.2)$$

The effect of such an outlier truncation is discussed in Huber (1981). The
results reported in the table show that suppressing the large numbers
markedly increases the long memory effect (a code sketch of this truncation
rule appears after this list).
(iv) Autocorrelations of powers of absolute returns are highest at power one:
corr(|r_t|, |r_{t-1}|) > corr(|r_t|^d, |r_{t-1}|^d), d ≠ 1. Granger and Ding
(1995) call this property the Taylor effect, following Taylor (1986). We
showed above that other means of suppressing large numbers could make
the memory last longer. The absolute returns |r_t| and squared returns r_t²
are proxies of daily volatility. By analysing a more accurate volatility
estimator, we note that the strongest autocorrelation pattern is observed for
realized volatility. Figure 1.3 demonstrates this convincingly.
(v) Volatility asymmetry: it has been observed that volatility increases if the
previous day's returns are negative. This is known as the leverage effect
(Black, 1976; Christie, 1982) because the fall in stock price causes the
leverage and financial risk of the firm to increase. The phenomenon of
volatility asymmetry is most marked during large falls. The leverage effect
has not been tested between contemporaneous returns and volatility,
possibly due to the fact that it is the previous day's residual returns (and
their sign dummy) that are included in the conditional volatility
specification in many models. With the availability of realized volatility,
we find a similar, albeit slightly weaker, relationship between volatility and
the sign of contemporaneous returns.
(vi) The returns and volatility of different assets (e.g. different company shares)
and different markets (e.g. stock vs. bond markets in one or more regions)
tend to move together. More recent research finds that correlation among
volatilities is stronger than that among returns, and both tend to increase
during bear markets and financial crises.
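The following sketch implements the truncation rule (1.2), referred to in item (iii) above, together with the summed-autocorrelation measure reported in the table under item (ii). The fat-tailed simulated returns and the helper names are illustrative assumptions; only the 1000-lag horizon comes from the text.

```python
import numpy as np

def autocorr_sum(x, max_lag=1000):
    """Sum of autocorrelations of x for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = (x ** 2).mean()
    return sum((x[lag:] * x[:-lag]).mean() / var for lag in range(1, max_lag + 1))

def trim(r, lower=0.01, upper=0.99):
    """Truncation rule (1.2): cap returns at the 1% and 99% empirical quantiles."""
    lo, hi = np.quantile(r, [lower, upper])
    return np.clip(r, lo, hi)

# Illustrative data: fat-tailed (Student-t) returns; real daily returns would be used.
rng = np.random.default_rng(1)
r = rng.standard_t(df=4, size=5000)

print("sum of 1000 autocorrelations of |r| :", round(autocorr_sum(np.abs(r)), 3))
print("sum of 1000 autocorrelations of |Tr|:", round(autocorr_sum(np.abs(trim(r))), 3))
```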
The art of volatility modelling is to exploit the time series properties and stylized
facts of financial market volatility. Some financial time series have unique
characteristics of their own. The Korean stock market, for example, clearly went
through a regime shift, with a much higher volatility level after 1998. Many of the
Asian markets have behaved differently since the Asian crisis in 1997. The
difficulty and sophistication of volatility modelling lie in controlling for these
special and unique features of each individual financial time series.

[Figure 1.3 Autocorrelation of daily returns and proxies of daily volatility of
S&P100, for lags 0 to 36: (a) autocorrelation of daily returns; (b) autocorrelation
of daily squared returns; (c) autocorrelation of daily absolute returns;
(d) autocorrelation of daily realized volatility. Note: dotted lines represent two
standard errors.]


1.3 VOLATILITY ESTIMATION
Consider a time series of returns r_t, t = 1, ..., T. The standard deviation, σ, in
(1.1) is the unconditional volatility over the T-period sample. Since volatility does
not remain constant through time, the conditional volatility, σ_{t,τ}, is a more
relevant piece of information for asset pricing and risk management at time t. The
volatility estimation procedure varies a great deal depending on how much
information we have at each sub-interval t, and on the length of τ, the volatility
reference period. Many financial time series are available at the daily interval,
while τ could vary from 1 to 10 days (for risk management), to months (for
option pricing) and years (for investment analysis). Recently, intraday transaction
data has become more widely available, providing a channel for more accurate
volatility estimation and forecasts. This is the area where much research effort
has been concentrated in the last two years.
When monthly volatility is required and daily data is available, volatility can
simply be calculated using Equation (1.1). Many macroeconomic series are
available only at the monthly interval, so the current practice is to use the absolute
monthly value to proxy for macro volatility. The same applies to financial time
series when a daily volatility estimate is required and only daily data is available.
The use of the absolute value to proxy for volatility is the equivalent of forcing
T = 1 and μ = 0 in Equation (1.1). Figlewski (1997) noted that the statistical
properties of the sample mean make it a very inaccurate estimate of the true mean,
especially for small samples. Taking deviations around zero instead of the sample
mean as in Equation (1.1) typically increases volatility forecast accuracy.
The use of daily returns to proxy daily volatility produces a very noisy volatility
estimator; Section 1.3.1 explains this in greater detail. Engle (1982) was the first
to propose the use of the ARCH (autoregressive conditional heteroscedasticity)
model below to produce conditional volatility for the inflation rate r_t:

$$r_t = \mu + \varepsilon_t, \qquad \varepsilon_t \sim N(0, h_t), \qquad \varepsilon_t = z_t\sqrt{h_t},$$
$$h_t = \omega + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \cdots. \qquad (1.3)$$

The ARCH model is estimated by maximizing the likelihood of {ε_t}. This
approach of estimating conditional volatility is less noisy than the absolute return
approach, but it relies on the assumptions that (1.3) is the


true return-generating process, that ε_t is Gaussian and that the time series is long
enough for such an estimation.
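As a rough illustration of how conditional volatility of the form (1.3) is produced in practice, the sketch below simulates an ARCH(1) series (the simplest special case) and estimates its parameters by maximizing the Gaussian likelihood of {ε_t} with scipy; all numerical values and starting points are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, r):
    """Negative Gaussian log-likelihood of an ARCH(1): h_t = omega + alpha*eps_{t-1}^2."""
    mu, omega, alpha = params
    eps = r - mu
    h = np.empty_like(r)
    h[0] = eps.var()                              # initialize with the sample variance
    h[1:] = omega + alpha * eps[:-1] ** 2
    return 0.5 * np.sum(np.log(2 * np.pi * h) + eps ** 2 / h)

# Simulate an ARCH(1) series to estimate (illustrative parameter values).
rng = np.random.default_rng(2)
T, mu_true, omega_true, alpha_true = 2000, 0.05, 0.2, 0.6
eps, h = np.zeros(T), np.zeros(T)
h[0] = omega_true / (1 - alpha_true)
eps[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = omega_true + alpha_true * eps[t - 1] ** 2
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()
r = mu_true + eps

res = minimize(neg_loglik, x0=[0.0, 0.1, 0.3], args=(r,),
               bounds=[(None, None), (1e-6, None), (0.0, 0.999)])
print("estimated (mu, omega, alpha):", np.round(res.x, 3))
```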
While Equation (1.1) is an unbiased estimator for σ², the square root of σ² is a
biased estimator for σ due to Jensen's inequality.³ Ding, Granger and Engle
(1993) suggest measuring volatility directly from absolute returns. Davidian and
Carroll (1987) show that an absolute returns volatility specification is more robust
against asymmetry and nonnormality. There is some empirical evidence that
absolute deviation- or absolute return-based models produce better volatility
forecasts than models that are based on squared returns (Taylor, 1986; Ederington
and Guan, 2000a; McKenzie, 1999). However, the majority of time series
volatility models, especially the ARCH class of models, are squared returns
models. There are methods for estimating volatility that are designed to exploit or
reduce the influence of extremes.⁴ Again, these methods require the assumption
of a Gaussian variable or a particular distribution function for returns.


1.3.1 Using squared return as a proxy for daily volatility
Volatility is a latent variable. Before high-frequency data became widely
available, many researchers resorted to using daily squared returns, calculated
from market daily closing prices, to proxy daily volatility. Lopez (2001) shows
that ε_t² is an unbiased but extremely imprecise estimator of σ_t² due to its
asymmetric distribution. Let

$$Y_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad (1.4)$$

and z_t ∼ N(0, 1). Then

$$E_{t-1}\left[\varepsilon_t^2\right] = \sigma_t^2\,E_{t-1}\left[z_t^2\right] = \sigma_t^2,$$

since $z_t^2 \sim \chi^2_{(1)}$. However, since the median of a $\chi^2_{(1)}$ distribution is 0.455,
ε_t² is less than $\tfrac{1}{2}\sigma_t^2$ more than 50% of the time. In fact

$$\Pr\left(\varepsilon_t^2 \in \left[\tfrac{1}{2}\sigma_t^2,\, \tfrac{3}{2}\sigma_t^2\right]\right) = \Pr\left(z_t^2 \in \left[\tfrac{1}{2},\, \tfrac{3}{2}\right]\right) = 0.2588,$$

which means that ε_t² is 50% greater or smaller than σ_t² nearly 75% of the time!
√ √
If rt ∼ N 0, σt2 , then E (|rt |) = σt 2/π. Hence, σ t = |rt |/ 2/π if rt has a conditional normal distri-
3

bution.
4
For example, the maximum likelihood method proposed by Ball and Torous (1984), the high“low method
proposed by Parkinson (1980) and Garman and Klass (1980).
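A small Monte Carlo experiment, assuming the set-up in (1.4) with z_t ~ N(0,1) and a constant σ_t² = 1, reproduces the probabilities quoted above; the sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0                               # true conditional variance
z = rng.standard_normal(1_000_000)
eps2 = sigma2 * z ** 2                     # squared return as a proxy for sigma2

inside = np.mean((eps2 >= 0.5 * sigma2) & (eps2 <= 1.5 * sigma2))
below_half = np.mean(eps2 < 0.5 * sigma2)

print(f"P(eps^2 within [0.5, 1.5]*sigma^2) = {inside:.4f}   (theory: 0.2588)")
print(f"P(eps^2 < 0.5*sigma^2)             = {below_half:.4f}   (median of chi2(1) = 0.455)")
```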


Under the null hypothesis that returns in (1.4) are generated by a GARCH(1,1)
process, Andersen and Bollerslev (1998) show that the population R² for the
regression

$$\varepsilon_t^2 = \alpha + \beta\hat{\sigma}_t^2 + \upsilon_t$$

is equal to κ⁻¹, where κ is the kurtosis of the standardized residuals and κ is
finite. For conditional Gaussian errors, the R² from a correctly specified
GARCH(1,1) model cannot be greater than 1/3. For thick-tailed distributions, the
upper bound for R² is lower than 1/3. Christodoulakis and Satchell (1998) extend
the results to include compound normals and the Gram-Charlier class of
distributions, confirming that the mis-estimation of forecast performance is likely
to be worsened by the nonnormality known to be widespread in financial data.
Hence, the use of ε_t² as a volatility proxy will lead to low R² and undermine the
inference on forecast accuracy. Blair, Poon and Taylor (2001) report a three- to
fourfold increase in R² for the 1-day-ahead forecast when intraday 5-minute
squared returns, instead of daily squared returns, are used to proxy the actual
volatility. The R² of the regression of |ε_t| on σ_t^intra is 28.5%. Extra caution is
therefore needed when interpreting empirical findings in studies that adopt such a
noisy volatility estimator. Figure 1.4 shows the time series of these two volatility
estimates over the 7-year period from January 1993 to December 1999. Although
the overall trends look similar, the two volatility estimates differ in many details.


[Figure 1.4 S&P100 daily volatility for the period from January 1993 to
December 1999: (a) conditional variance proxied by daily squared returns;
(b) conditional variance derived as the sum of intraday squared returns.]

1.3.2 Using the high-low measure to proxy volatility

The high-low, also known as the range-based or extreme-value, method of
estimating volatility is very convenient because daily high, low, opening and
closing prices are reported by major newspapers, and the calculation is easy to
program using a hand-held calculator. The high-low volatility estimator has been
studied by Parkinson (1980), Garman and Klass (1980), Beckers (1993), Rogers
and Satchell (1991), Wiggins (1992), Rogers, Satchell and Yoon (1994) and
Alizadeh, Brandt and Diebold (2002). It is based on the assumption that the return
is normally distributed with conditional volatility σ_t. Let H_t and L_t denote,
respectively, the highest and the lowest prices on day t. Applying the Parkinson
(1980) H-L measure to a price process that follows a geometric Brownian motion
results in the following volatility estimator (Bollen and Inder, 2002):

$$\hat{\sigma}_t^2 = \frac{\left(\ln H_t - \ln L_t\right)^2}{4\ln 2}.$$

The Garman and Klass (1980) estimator is an extension of Parkinson (1980) in
which information about the opening, p_{t-1}, and closing, p_t, prices is
incorporated as follows:

$$\hat{\sigma}_t^2 = 0.5\left(\ln\frac{H_t}{L_t}\right)^2 - 0.39\left(\ln\frac{p_t}{p_{t-1}}\right)^2.$$
We have already shown that financial market returns are not likely to be normally
distributed and have long-tailed distributions. As the H-L volatility estimator is
very sensitive to outliers, it will be useful to apply the trimming procedures in
Section 1.4. Provided that there are no destabilizing large values, the H-L
volatility estimator is very efficient and, unlike the realized volatility estimator
introduced in the next section, it is the least affected by market microstructure
effects.
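A minimal sketch of the two range-based estimators above; the one-day prices are made-up numbers, and the Garman-Klass line follows the form given in the text with p_{t-1} and p_t as the two reference prices.

```python
import numpy as np

def parkinson_var(high, low):
    """Parkinson (1980) daily variance estimate from the high-low range."""
    return (np.log(high) - np.log(low)) ** 2 / (4 * np.log(2))

def garman_klass_var(high, low, p, p_lag):
    """Garman-Klass (1980) extension using the ratio of the two reference prices p_t / p_{t-1}."""
    return 0.5 * np.log(high / low) ** 2 - 0.39 * np.log(p / p_lag) ** 2

# Illustrative one-day figures (e.g. an index level around 100).
high, low, close, prev_close = 102.3, 99.1, 101.0, 100.0
print("Parkinson variance   :", round(parkinson_var(high, low), 6))
print("Garman-Klass variance:", round(garman_klass_var(high, low, close, prev_close), 6))
```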


1.3.3 Realized volatility, quadratic variation and jumps
More recently, and with the increased availability of tick data, the term realized
volatility is now used to refer to volatility estimates calculated using intraday
squared returns at short intervals such as 5 or 15 minutes.⁵ For a series that has
zero mean and no jumps, the realized volatility converges to the continuous-time
volatility. To understand this, we assume for ease of exposition that the
instantaneous returns are generated by the continuous-time martingale

$$dp_t = \sigma_t\,dW_t, \qquad (1.5)$$

where dW_t denotes a standard Wiener process. From (1.5), the conditional
variance for the one-period return, r_{t+1} ≡ p_{t+1} − p_t, is
$\int_t^{t+1}\sigma_s^2\,ds$, which is known as the integrated volatility over the
period t to t + 1. Note that while the asset price p_t can be observed at time t, the
volatility σ_t is an unobservable latent variable that scales the stochastic process
dW_t continuously through time.
Let m be the sampling frequency such that there are m continuously compounded
returns in one unit of time,

$$r_{m,t} \equiv p_t - p_{t-1/m}, \qquad (1.6)$$

and define the realized volatility

$$RV_{t+1} = \sum_{j=1,\ldots,m} r_{m,t+j/m}^2.$$

If the discretely sampled returns are serially uncorrelated and the sample path for
σ_t is continuous, it follows from the theory of quadratic variation (Karatzas and
Shreve, 1988) that

$$\operatorname*{p\,lim}_{m\to\infty}\left[\int_t^{t+1}\sigma_s^2\,ds - \sum_{j=1,\ldots,m} r_{m,t+j/m}^2\right] = 0.$$

Hence time-t volatility is theoretically observable from the sample path of the
return process, so long as the sampling is frequent enough.

⁵ See Fung and Hsieh (1991) and Andersen and Bollerslev (1998). In the foreign exchange markets, quotes
for major exchange rates are available round the clock. In the case of stock markets, the close-to-open squared
return is used in the volatility aggregation process during market close.
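A sketch of the realized volatility estimator built from intraday prices; the 5-minute grid (m = 78 returns per trading day) and the simulated price path are assumptions for illustration.

```python
import numpy as np

def realized_variance(intraday_prices):
    """Sum of squared intraday log returns (realized variance for one day)."""
    log_p = np.log(np.asarray(intraday_prices, dtype=float))
    r = np.diff(log_p)                       # continuously compounded intraday returns
    return np.sum(r ** 2)

# Simulate one trading day of 5-minute prices (m = 78 returns for a 6.5-hour session).
rng = np.random.default_rng(4)
m = 78
true_daily_vol = 0.01                        # 1% daily volatility assumed for the simulation
r_intraday = rng.normal(0.0, true_daily_vol / np.sqrt(m), size=m)
prices = 100 * np.exp(np.cumsum(np.insert(r_intraday, 0, 0.0)))

rv = realized_variance(prices)
print(f"realized variance: {rv:.6f}, realized volatility: {np.sqrt(rv):.4f} (true 0.0100)")
```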


When there are jumps in the price process, (1.5) becomes

$$dp_t = \sigma_t\,dW_t + \kappa_t\,dq_t,$$

where dq_t is a Poisson process with dq_t = 1 corresponding to a jump at time t,
and zero otherwise, and κ_t is the jump size at time t when there is a jump. In this
case, the quadratic variation for the cumulative return process is given by

$$\int_t^{t+1}\sigma_s^2\,ds + \sum_{t<s\le t+1}\kappa^2(s), \qquad (1.7)$$

which is the sum of the integrated volatility and the jumps.
In the absence of jumps, the second term on the right-hand side of (1.7)
disappears, and the quadratic variation is simply equal to the integrated volatility.
In the presence of jumps, the realized volatility continues to converge to the
quadratic variation in (1.7):

$$\operatorname*{p\,lim}_{m\to\infty}\left[\int_t^{t+1}\sigma_s^2\,ds + \sum_{t<s\le t+1}\kappa^2(s) - \sum_{j=1}^{m} r_{m,t+j/m}^2\right] = 0. \qquad (1.8)$$

Barndorff-Nielsen and Shephard (2003) studied the properties of the standardized
realized bipower variation measure

$$BV_{m,t+1}^{[a,b]} = m^{[(a+b)/2-1]}\sum_{j=1}^{m-1}\left|r_{m,t+j/m}\right|^a\left|r_{m,t+(j+1)/m}\right|^b, \qquad a, b \ge 0.$$

They showed that, when jumps are large but rare, in the simplest case where
a = b = 1,

$$\mu_1^{-2} BV_{m,t+1}^{[1,1]} = \mu_1^{-2}\sum_{j=1}^{m-1}\left|r_{m,t+j/m}\right|\left|r_{m,t+(j+1)/m}\right| \;\rightarrow\; \int_t^{t+1}\sigma_s^2\,ds,$$

where $\mu_1 = \sqrt{2/\pi}$. Hence, the realized volatility and the realized bipower
variation can be substituted into (1.8) to estimate the jump component, κ_t.
Barndorff-Nielsen and Shephard (2003) suggested imposing a nonnegativity
constraint on κ_t. This is perhaps too restrictive; for nonnegative volatility,
κ_t² + μ₁⁻² BV_t > 0 will be sufficient.
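A sketch of how realized volatility and the standardized bipower variation can be combined, as in (1.8), to back out the squared jump component; the simulated 5-minute returns with one injected jump, and the nonnegativity truncation, are illustrative choices.

```python
import numpy as np

def realized_var(r):
    return np.sum(r ** 2)

def bipower_var(r):
    """Standardized realized bipower variation (a = b = 1), mu1 = sqrt(2/pi)."""
    mu1 = np.sqrt(2 / np.pi)
    return np.sum(np.abs(r[1:]) * np.abs(r[:-1])) / mu1 ** 2

# Simulate 5-minute returns for one day and inject a single jump.
rng = np.random.default_rng(5)
m, daily_vol = 78, 0.01
r = rng.normal(0.0, daily_vol / np.sqrt(m), size=m)
r[40] += 0.02                                 # a rare, large jump

rv, bv = realized_var(r), bipower_var(r)
jump_component = max(rv - bv, 0.0)            # nonnegativity imposed on the squared-jump estimate
print(f"RV = {rv:.6f}, BV = {bv:.6f}, estimated jump^2 = {jump_component:.6f} (true 0.0004)")
```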
Characteristics of financial market data suggest that returns measured at intervals
shorter than 5 minutes are plagued by spurious serial correlation caused by
various market microstructure effects, including nonsynchronous trading, discrete
price observations, intraday periodic volatility patterns and bid-ask bounce.⁶
Bollen and Inder (2002), Ait-Sahalia, Mykland and Zhang (2003) and Bandi and
Russell (2004) have given suggestions on how to isolate the microstructure noise
from the realized volatility estimator.

1.3.4 Scaling and actual volatility
The forecast of multi-period volatility σ̂_{T,T+j} (i.e. for j periods) is taken to be
the sum of the individual multi-step point forecasts, $\sum_{s=1}^{j}\hat{h}_{T+s|T}$.
These multi-step point forecasts are produced by recursive substitution, using the
fact that $\hat{\varepsilon}_{T+i|T}^2 = h_{T+i|T}$ for i > 0 and
$\hat{\varepsilon}_{T+i|T}^2 = \varepsilon_{T+i}^2$ for i ≤ 0. Since the volatility of
financial time series has a complex structure, Diebold, Hickman, Inoue and
Schuermann (1998) warn that forecast estimates will differ depending on the
current level of volatility, the volatility structure (e.g. the degree of persistence
and mean reversion) and the forecast horizon.
If returns are iid (independent and identically distributed, or strict white noise),
then the variance of returns over a long horizon can be derived as a simple
multiple of the single-period variance. But this is clearly not the case for many
financial time series because of the stylized facts listed in Section 1.2. While a
point forecast of σ̂_{T−1,T|t−1} becomes very noisy as T → ∞, a cumulative
forecast, σ̂_{t,T|t−1}, becomes more accurate because of error cancellation and
volatility mean reversion, except when there is a fundamental change in the
volatility level or structure.⁷
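To make this aggregation concrete, the sketch below assumes a GARCH(1,1) structure (introduced in Chapter 4) with made-up parameter values, builds the multi-step point forecasts by recursive substitution and sums them into a cumulative multi-period forecast; the comparison with naive square-root-of-time scaling illustrates the mean-reversion point made above.

```python
import numpy as np

def garch11_multistep(h_next, omega, alpha, beta, horizon):
    """Multi-step conditional variance forecasts h_{T+i|T}, i = 1..horizon, by recursive substitution."""
    persistence = alpha + beta
    long_run = omega / (1 - persistence)         # unconditional (mean-reverting) variance level
    i = np.arange(1, horizon + 1)
    return long_run + persistence ** (i - 1) * (h_next - long_run)

# Illustrative parameters: current variance well above the long-run level.
omega, alpha, beta = 0.02, 0.05, 0.90
h_forecasts = garch11_multistep(h_next=1.2, omega=omega, alpha=alpha, beta=beta, horizon=20)

cumulative_variance = h_forecasts.sum()          # forecast of the 20-day (multi-period) variance
print("20-day volatility forecast:", round(np.sqrt(cumulative_variance), 3))
print("naive sqrt-time scaling   :", round(np.sqrt(20 * 1.2), 3))
```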
Complication in relation to the choice of forecast horizon is partly due to
volatility mean reversion. In general, volatility forecast accuracy improves as the
data sampling frequency increases relative to the forecast horizon (Andersen,
Bollerslev and Lange, 1999). However, for forecasting volatility over a long
horizon, Figlewski (1997) finds that the forecast error doubles in size when daily
data, instead of monthly data, is used to forecast volatility over 24 months. In
some cases where the application has a very long horizon, e.g. over 10 years, a
volatility estimate calculated using weekly or monthly data is better because
volatility mean reversion is difficult to adjust for using high-frequency data. In
general, model-based forecasts lose supremacy when the forecast horizon
increases with respect to the data frequency. For forecast horizons longer than
6 months, a simple historical method using low-frequency data over a period at
least as long as the forecast horizon works best (Alford and Boatsman, 1995;
Figlewski, 1997).

⁶ The bid-ask bounce, for example, induces negative autocorrelation in tick data and causes the realized
volatility estimator to be upwardly biased. Theoretical modelling of this issue so far assumes the price process
and the microstructure effect are not correlated, which is open to debate since market microstructure theory
suggests that trading has an impact on the efficient price. I am grateful to Frank de Jong for explaining this to me
at a conference.
⁷ σ̂_{t,T|t−1} denotes a volatility forecast formulated at time t − 1 for volatility over the period from t to T. In
pricing options, the required volatility parameter is the expected volatility over the life of the option. The pricing
model relies on a riskless hedge to be followed through until the option reaches maturity. Therefore the required
volatility input, or the implied volatility derived, is a cumulative volatility forecast over the option maturity and
not a point forecast of volatility at option maturity. The interest in forecasting σ̂_{t,T|t−1} goes beyond the riskless
hedge argument, however.
As far as sampling frequency is concerned, Drost and Nijman (1993) prove,
theoretically and for a special case (i.e. the GARCH(1,1) process, which will be
introduced in Chapter 4), that the volatility structure should be preserved through
intertemporal aggregation. This means that whether one models volatility at
hourly, daily or monthly intervals, the volatility structure should be the same. But
it is well known that this is not the case in practice: volatility persistence, which is
highly significant in daily data, weakens as the frequency of the data decreases.⁸
This further complicates any attempt to generalize volatility patterns and
forecasting results.

⁸ See Diebold (1988), Baillie and Bollerslev (1989) and Poon and Taylor (1992) for examples. Note that
Nelson (1992) points out separately that, as the sampling frequency becomes shorter, volatility modelled using a
discrete time model approaches its diffusion limit and persistence is to be expected, provided that the underlying
return process is a diffusion or near-diffusion process with no jumps.


1.4 THE TREATMENT OF LARGE NUMBERS
In this section, I use 'large numbers' to refer generally to extreme values, outliers
and rare jumps, a group of data that have similar characteristics but do not
necessarily belong to the same set. To a statistician, there are always two
'extremes' in each sample, namely the minimum and the maximum. The H-L
method for estimating volatility described in the previous section, for example, is
also called the extreme-value method. We have also noted that these H-L
estimators assume the conditional distribution is normal. In extreme value
statistics, the normal distribution is but one of the possible distributions for the
tail. There are many other extreme value distributions that have tails thinner or
thicker than the normal distribution's. We have known for a long time now that
financial asset returns are not normally distributed. We also know that the
standardized residuals from ARCH models still display large kurtosis (see
McCurdy and Morgan, 1987; Milhoj, 1987; Hsieh, 1989; Baillie and Bollerslev,
1989). Conditional heteroscedasticity alone cannot account for all the tail
thickness. This is true even when the Student-t distribution is used to construct
the likelihood function (see Bollerslev, 1987; Hsieh, 1989). Hence, in the
literature, the extreme values and the tail observations often refer to those data
that lie outside the (conditional) Gaussian region. Given that jumps are large and
are modelled as a separate component to the Brownian motion, jumps could
potentially be seen as a set similar to those tail observations, provided that they
are truly rare.
Outliers are by definition unusually large in scale. They are so large that some
have argued that they are generated from a completely different process or
distribution. The frequency of occurrence should be much smaller for outliers
than for jumps or extreme values. Outliers are so huge and rare that it is very
unlikely that any modelling effort will be able to capture and predict them. They
have, however, an undue influence on modelling and estimation (Huber, 1981).
Unless extreme value techniques are used, where scale and marginal distribution
are often removed, it is advisable that outliers are removed or trimmed before
modelling volatility. One such outlier in stock market returns is the October 1987
crash that produced a 1-day loss of over 20% in stock markets worldwide.
The ways that outliers have been tackled in the literature largely depend on their
size, the frequency of their occurrence and whether they have an additive or a
multiplicative impact. For rare and additive outliers, the most common treatment
is simply to remove them from the sample or omit them in the likelihood
calculation (Kearns and Pagan, 1993). Franses and Ghijsels (1999) find that the
forecasting performance of the GARCH model is substantially improved in four
out of five stock markets studied when the additive outliers are removed. For rare
multiplicative outliers that produce a residual impact on volatility, a dummy
variable could be included in the conditional volatility equation after the outlier
return has been dummied out in the mean equation (Blair, Poon and Taylor,
2001):

$$r_t = \mu + \psi_1 D_t + \varepsilon_t, \qquad \varepsilon_t = \sqrt{h_t}\,z_t,$$
$$h_t = \omega + \beta h_{t-1} + \alpha\varepsilon_{t-1}^2 + \psi_2 D_{t-1},$$

where D_t is 1 when t refers to 19 October 1987 and 0 otherwise. Personally, I
find a simple method such as the trimming rule in (1.2) very quick to implement
and effective.
The removal of outliers does not remove volatility persistence. In fact, the
evidence in the previous section shows that trimming the data using (1.2) actually
increases the 'long memory' in volatility, making it appear to be extremely
persistent. Since autocorrelation is defined as

$$\rho\left(r_t, r_{t-\tau}\right) = \frac{Cov\left(r_t, r_{t-\tau}\right)}{Var\left(r_t\right)},$$

the removal of outliers has a great impact on the denominator: it reduces Var(r_t)
and increases the individual and the cumulative autocorrelation coefficients.
Once the impact of outliers is removed, there are different views about how the
extremes and jumps should be handled vis-à-vis the rest of the data. There are
two schools of thought, each proposing a seemingly different model, and both can
explain the long memory in volatility. The first believes that structural breaks in
volatility cause the mean level of volatility to shift up and down. There is no
restriction on the frequency or the size of the breaks. The second advocates the
regime-switching model, where volatility switches between high and low
volatility states. The means of the two states are fixed, but there is no restriction
on the timing of the switch, the duration of each regime and the probability of
switching. Sometimes a three-regime switching is adopted but, as the number of
regimes increases, the estimation and modelling become more complex.
Technically speaking, if there are an infinite number of regimes then there is no
difference between the two models. The regime-switching model and the
structural break model will be described in Chapter 5.
2 Volatility Forecast Evaluation

Comparing the forecasting performance of competing models is one of the most
important aspects of any forecasting exercise. In contrast to the efforts made in
the construction of volatility models and forecasts, little attention has been paid to
forecast evaluation in the volatility forecasting literature. Let X̂_t be the predicted
variable, X_t the actual outcome and ε_t = X̂_t − X_t the forecast error. In the
context of volatility forecasting, X̂_t and X_t are the predicted and actual
conditional volatility. There are many issues to consider:
(i) The form of X_t: should it be σ_t² or σ_t?
(ii) Given that volatility is a latent variable, the impact of the noise introduced
in the estimation of X_t, the actual volatility.
(iii) Which form of ε_t is more relevant for volatility model selection: ε_t², |ε_t|
or |ε_t|/X_t? Do we penalize underforecasts, X̂_t < X_t, more than
overforecasts, X̂_t > X_t?
(iv) Given that all error statistics are subject to noise, how do we know if one
model is truly better than another?
(v) How do we take into account the fact that X̂_t and X̂_{t+1} (and similarly
ε_t and ε_{t+1}) cover a large amount of overlapping data and are serially
correlated?
All these issues will be considered in the following sections.


2.1 THE FORM OF X_t
Here we argue that X_t should be σ_t, and that if σ_t cannot be estimated with
some accuracy it is best not to perform comparison across predictive models at
all. The practice of using daily squared returns to proxy daily conditional variance
has been shown time and again to produce wrong signals in model selection.
Given that all time series volatility models formulate forecasts based on past
information, they are not designed to predict shocks that are new to the system.
Financial market volatility has many stylized facts. Once a shock has entered the
system, the merit of a volatility model depends on how well it captures these
stylized facts in predicting the volatility of the following days. Hence we argue
that X_t should be σ_t. The conditional variance, σ_t², formulation gives too
much weight to the errors caused by 'new' shocks, and especially the large ones,
distorting the less extreme forecasts on which the models are to be assessed.
Note also that the square of a variance error is the fourth power of the same error
measured from the standard deviation. This can complicate the task of forecast
evaluation given the difficulty of estimating fourth moments with common
distributions, let alone the thick-tailed ones in finance. The confidence interval of
the mean error statistic can be very wide when forecast errors are measured from
variances, and worse if they are squared. This leads to difficulty in finding
significant differences between forecasting models.
Davidian and Carroll (1987) make similar observations in their study of variance
function estimation for heteroscedastic regression. Using high-order theory, they
show that the use of squared returns for modelling variance is appropriate only
for approximately normally distributed data, and becomes nonrobust when there
is a small departure from normality. Estimation of the variance function that is
based on a logarithmic transformation or on absolute returns is more robust
against asymmetry and nonnormality.
Some have argued that perhaps X_t should be ln σ_t, to rescale the size of the
forecast errors (Pagan and Schwert, 1990). This is perhaps one step too far. After
all, the magnitude of the error directly impacts on option pricing, risk
management and investment decisions. Taking the logarithm of the volatility
error is likely to distort the loss function, which is directly proportional to the
magnitude of the forecast error. A decision maker might be more risk-averse
towards the larger errors.
We have explained in Section 1.3.1 the impact of using squared returns to proxy
daily volatility. Hansen and Lunde (2004b) used a series of simulations to show
that '. . . the substitution of a squared return for the conditional variance in the
evaluation of ARCH-type models can result in an inferior model being chosen as
[the] best with a probability [that] converges to one as the sample size
increases . . .'. Hansen and Lunde (2004a) advocate the use of realized volatility
in forecast evaluation, but caution against the noise introduced by market
microstructure when the intraday return intervals are too short.


2.2 ERROR STATISTICS AND THE FORM OF ε_t
Ideally an evaluation exercise should reflect the relative or absolute usefulness of
a volatility forecast to investors. However, to do that one needs to know the
decision processes that require these forecasts and the costs and benefits that
result from using better forecasts. Utility-based criteria, such as that used in West,
Edison and Cho (1993), require some assumptions about the shape and properties
of the utility function. In practice these costs, benefits and utility functions are not
known, and one often resorts simply to measures suggested by statisticians.
Popular evaluation measures used in the literature include

Mean Error (ME)
$$\frac{1}{N}\sum_{t=1}^{N}\varepsilon_t = \frac{1}{N}\sum_{t=1}^{N}\left(\hat{\sigma}_t - \sigma_t\right),$$

Mean Square Error (MSE)
$$\frac{1}{N}\sum_{t=1}^{N}\varepsilon_t^2 = \frac{1}{N}\sum_{t=1}^{N}\left(\hat{\sigma}_t - \sigma_t\right)^2,$$

Root Mean Square Error (RMSE)
$$\sqrt{\frac{1}{N}\sum_{t=1}^{N}\varepsilon_t^2} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{\sigma}_t - \sigma_t\right)^2},$$

Mean Absolute Error (MAE)
$$\frac{1}{N}\sum_{t=1}^{N}\left|\varepsilon_t\right| = \frac{1}{N}\sum_{t=1}^{N}\left|\hat{\sigma}_t - \sigma_t\right|,$$

Mean Absolute Percent Error (MAPE)
$$\frac{1}{N}\sum_{t=1}^{N}\frac{\left|\varepsilon_t\right|}{\sigma_t} = \frac{1}{N}\sum_{t=1}^{N}\frac{\left|\hat{\sigma}_t - \sigma_t\right|}{\sigma_t}.$$

Bollerslev and Ghysels (1996) suggested a heteroscedasticity-adjusted version of
MSE called HMSE, where

$$\text{HMSE} = \frac{1}{N}\sum_{t=1}^{N}\left(\frac{\sigma_t}{\hat{\sigma}_t} - 1\right)^2.$$


This is similar to squared percentage error but with the forecast error
scaled by predicted volatility. This type of performance measure is not
appropriate if the absolute magnitude of the forecast error is a major
concern. It is not clear why it is the predicted and not the actual volatility
that is used in the denominator. The squaring of the error again will give
greater weight to large errors.
Other less commonly used measures include the mean logarithm of absolute
errors (MLAE) (as in Pagan and Schwert, 1990), the Theil-U statistic and one
based on an asymmetric loss function, namely LINEX:

Mean Logarithm of Absolute Errors (MLAE)
$$\frac{1}{N}\sum_{t=1}^{N}\ln\left|\varepsilon_t\right| = \frac{1}{N}\sum_{t=1}^{N}\ln\left|\hat{\sigma}_t - \sigma_t\right|,$$

Theil-U measure
$$\text{Theil-U} = \frac{\displaystyle\sum_{t=1}^{N}\left(\hat{\sigma}_t - \sigma_t\right)^2}{\displaystyle\sum_{t=1}^{N}\left(\hat{\sigma}_t^{BM} - \sigma_t\right)^2}, \qquad (2.1)$$

where σ̂_t^BM is the benchmark forecast, used here to remove the effect of any
scalar transformation applied to σ_t.
LINEX has an asymmetric loss function whereby the positive errors are weighted
differently from the negative errors:

$$\text{LINEX} = \frac{1}{N}\sum_{t=1}^{N}\left[\exp\left\{-a\left(\hat{\sigma}_t - \sigma_t\right)\right\} + a\left(\hat{\sigma}_t - \sigma_t\right) - 1\right]. \qquad (2.2)$$

The choice of the parameter a is subjective. If a > 0, the function is
approximately linear for overprediction and exponential for underpre-
diction. Granger (1999) describes a variety of other asymmetric loss
functions of which the LINEX is an example. Given that most investors
would treat gains and losses differently, the use of asymmetric loss func-
tions may be advisable, but their use is not common in the literature.
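The evaluation measures above translate directly into code. The sketch below takes a vector of volatility forecasts and a vector of 'actual' volatilities (however the latter are estimated) and returns the statistics; the LINEX parameter a = 1, the made-up numbers and the benchmark forecast are illustrative assumptions.

```python
import numpy as np

def error_statistics(forecast, actual, benchmark=None, a=1.0):
    """ME, MSE, RMSE, MAE, MAPE, HMSE, MLAE, LINEX (and Theil-U if a benchmark is given)."""
    f, s = np.asarray(forecast, float), np.asarray(actual, float)
    e = f - s                                    # forecast error, sigma_hat - sigma
    stats = {
        "ME":    e.mean(),
        "MSE":   (e ** 2).mean(),
        "RMSE":  np.sqrt((e ** 2).mean()),
        "MAE":   np.abs(e).mean(),
        "MAPE":  (np.abs(e) / s).mean(),
        "HMSE":  ((s / f - 1) ** 2).mean(),      # forecast error scaled by predicted volatility
        "MLAE":  np.log(np.abs(e)).mean(),
        "LINEX": (np.exp(-a * e) + a * e - 1).mean(),
    }
    if benchmark is not None:
        b = np.asarray(benchmark, float)
        stats["Theil-U"] = (e ** 2).sum() / ((b - s) ** 2).sum()
    return stats

# Illustrative use with made-up volatility numbers (percent per day).
actual   = np.array([1.0, 1.2, 0.8, 1.5, 1.1])
forecast = np.array([0.9, 1.3, 0.9, 1.2, 1.0])
for name, value in error_statistics(forecast, actual, benchmark=np.full(5, actual.mean())).items():
    print(f"{name:8s} {value: .4f}")
```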

2.3 COMPARING FORECAST ERRORS OF DIFFERENT MODELS
In the special case where the error distribution of one forecasting
model dominates that of another forecasting model, the comparison is


straightforward (Granger, 1999). In practice, this is rarely the case, and most
comparisons of forecasting results are made based on the error statistics described
in Section 2.2. It is important to note that these error statistics are themselves
subject to error and noise. So if an error statistic of model A is higher than that of
model B, one cannot conclude that model B is better than A without performing
tests of significance. For statistical inference, West (1996), West and Cho (1995)
and West and McCracken (1998) show how standard errors for ME, MSE, MAE
and RMSE may be derived taking into account serial correlation in the forecast
errors and the uncertainty inherent in the volatility model parameter estimates.
If there are T observations in the sample and T is large, there are two ways in
which out-of-sample forecasts may be made. Assume that we use n observations
for estimation and make T − n forecasts. The recursive scheme starts with the
sample {1, ..., n} and makes the first forecast at n + 1. The second forecast, for
n + 2, will include the last observation and use the information set {1, ..., n + 1}.
It follows that the last forecast, for T, will include all but the last observation,
i.e. the information set is {1, ..., T − 1}. In practice, the rolling scheme is more
popular, where a fixed number of observations is used in the estimation. So the
forecast for n + 2 will be based on the information set {2, ..., n + 1}, and the last
forecast at T will be based on {T − n, ..., T − 1}. The rolling scheme omits
information in the distant past. It is also more manageable in terms of
computation when T is very large. The standard errors developed by West and
co-authors are based on asymptotic theory and work for the recursive scheme
only. For smaller samples and rolling scheme forecasts, Diebold and Mariano's
(1995) small sample methods are more appropriate.
Diebold and Mariano (1995) propose three tests of 'equal accuracy' between two
forecasting models. The tests relate the prediction errors to some very general
loss function and analyse the loss differential derived from the errors produced by
two competing models. The three tests include an asymptotic test that corrects for
serial correlation and two exact finite-sample tests based on the sign test and the
Wilcoxon sign-rank test. Simulation results show that the three tests are robust
against non-Gaussian, nonzero-mean, serially and contemporaneously correlated
forecast errors. The two sign-based tests in particular continue to work well in
small samples. The Diebold and Mariano tests have been used in a number of
volatility forecasting contests. We provide the test details here.



Let { X it }t=1 and { X jt }t=1 be two sets of forecasts for {X t }t=1 from
T
T T

models i and j respectively. Let the associated forecast errors be {eit }t=1
T

and {e jt }t=1 . Let g (·) be the loss function (e.g. the error statistics in
T

Section 2.2) such that
g X t , X it = g (eit ) .
Next de¬ne loss differential
dt ≡ g (eit ) ’ g e jt .
The null hypothesis is equal forecast accuracy and zero loss differential
E(dt ) = 0.


2.3.1 Diebold and Mariano's asymptotic test
The first test targets the mean loss differential

$$\bar{d} = \frac{1}{T}\sum_{t=1}^{T}\left[g\left(e_{it}\right) - g\left(e_{jt}\right)\right]$$

with test statistic

$$S_1 = \frac{\bar{d}}{\sqrt{\dfrac{2\pi \hat{f}_d(0)}{T}}}, \qquad S_1 \sim N(0, 1),$$

$$2\pi \hat{f}_d(0) = \sum_{\tau = -(T-1)}^{T-1} 1\!\left(\frac{\tau}{S(T)}\right)\hat{\gamma}_d(\tau),$$

$$\hat{\gamma}_d(\tau) = \frac{1}{T}\sum_{t=|\tau|+1}^{T}\left(d_t - \bar{d}\right)\left(d_{t-|\tau|} - \bar{d}\right).$$

The operator 1(τ/S(T)) is the lag window, and S(T) is the truncation lag, with
$$1\!\left(\frac{\tau}{S(T)}\right) = \begin{cases} 1 & \text{for } \left|\tau / S(T)\right| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
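A sketch of the asymptotic test statistic S₁ above, using squared errors as the loss function g and a rectangular lag window over the first h − 1 autocovariances for h-step-ahead forecasts; the two forecast-error series are simulated, and the choice of loss function is an assumption made purely for illustration.

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(e_i, e_j, h=1, loss=lambda e: e ** 2):
    """Diebold-Mariano (1995) asymptotic test of equal forecast accuracy, H0: E(d_t) = 0."""
    d = loss(np.asarray(e_i, float)) - loss(np.asarray(e_j, float))
    T = len(d)
    d_bar = d.mean()
    # Long-run variance: autocovariances up to the truncation lag S(T) = h - 1, each weighted by 1.
    gamma = [np.sum((d[tau:] - d_bar) * (d[:T - tau] - d_bar)) / T for tau in range(h)]
    var_d_bar = (gamma[0] + 2 * sum(gamma[1:])) / T
    s1 = d_bar / np.sqrt(var_d_bar)
    return s1, 2 * (1 - norm.cdf(abs(s1)))      # two-sided p-value

# Illustrative forecast errors from two hypothetical models.
rng = np.random.default_rng(6)
e_model_i = rng.normal(0, 1.0, size=250)
e_model_j = rng.normal(0, 1.2, size=250)        # model j is noisier by construction
s1, pval = diebold_mariano(e_model_i, e_model_j)
print(f"S1 = {s1:.3f}, p-value = {pval:.4f}")
```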