...deriving the Fokker-Planck equation. He considered one-dimensional motion of a spherical particle of mass $m$ and radius $R$ that is subjected to two forces. The first force is the viscous drag force described by the Stokes formula, $F = -6\pi\eta R v$, where $\eta$ is viscosity and $v = dr/dt$ is the particle velocity. Another force, $Z$, describes collisions of the water molecules with the particle and therefore has a random nature. The Langevin equation of the particle motion is

$$m\,\frac{dv}{dt} = -6\pi\eta R\,v + Z \qquad (4.2.1)$$

Let us multiply both sides of equation (4.2.1) by $r$. Since $r\frac{dv}{dt} = \frac{d}{dt}(rv) - v^2$ and $rv = \frac{1}{2}\frac{d}{dt}(r^2)$, then

$$\frac{m}{2}\frac{d^2}{dt^2}(r^2) - m\left(\frac{dr}{dt}\right)^2 = -3\pi\eta R\,\frac{d}{dt}(r^2) + Zr \qquad (4.2.2)$$

Note that the mean kinetic energy of a spherical particle, $E[\frac{1}{2}mv^2]$, equals $\frac{3}{2}kT$. Since $E[Zr] = 0$ due to the random nature of $Z$, averaging of equation (4.2.2) yields


$$m\,\frac{d^2}{dt^2}E[r^2] + 6\pi\eta R\,\frac{d}{dt}E[r^2] = 6kT \qquad (4.2.3)$$

The solution to equation (4.2.3) is

$$\frac{d}{dt}E[r^2] = kT/(\pi\eta R) + C\exp(-6\pi\eta R t/m) \qquad (4.2.4)$$

where $C$ is an integration constant. The second term in equation (4.2.4) decays exponentially and can be neglected in the asymptotic solution. Then

$$E[r^2] - r_0^2 = [kT/(\pi\eta R)]\,t \qquad (4.2.5)$$

where $r_0$ is the particle position at $t = 0$. It follows from the comparison of equations (4.2.5) and (4.1.15) that $D = kT/(\pi\eta R)$.¹

The Brownian motion can also be derived as the continuous limit of the discrete random walk (see, e.g., [3]). First, let us introduce the process $e(t)$ that is named the white noise and satisfies the following conditions

$$E[e(t)] = 0;\quad E[e^2(t)] = \sigma^2;\quad E[e(t)\,e(s)] = 0\ \text{if}\ t \neq s. \qquad (4.2.6)$$

Hence, the white noise has zero mean and constant variance $\sigma^2$. The last condition in (4.2.6) implies that there is no linear correlation between different observations of the white noise. Such a model represents an independently and identically distributed (IID) process and is sometimes denoted IID(0, $\sigma^2$). The IID process can still have nonlinear correlations (see Section 5.3). The normal distribution N(0, $\sigma^2$) is a special case of the white noise. First, consider a simple discrete process

$$y(k) = y(k-1) + e(k) \qquad (4.2.7)$$

where the white noise innovations can take only two values²

$$e(k) = \begin{cases} \Delta, & \text{with probability } p,\ p = \text{const} < 1 \\ -\Delta, & \text{with probability } (1-p) \end{cases} \qquad (4.2.8)$$

Now, let us introduce the continuous process $y_n(t)$ within the time interval $t \in [0, T]$, such that

$$y_n(t) = y([t/h]) = y([nt/T]),\quad t \in [0, T] \qquad (4.2.9)$$


In (4.2.9), [x] denotes the greatest integer that does not exceed x. The process $y_n(t)$ has a stepwise form: it is constant except at the moments $t = kh$, $k = 1, \ldots, n$. The mean and variance of the process $y_n(T)$ equal

$$E[y_n(T)] = n(2p-1)\Delta = T(2p-1)\Delta/h \qquad (4.2.10)$$
$$Var[y_n(T)] = n\Delta^2 = T\Delta^2/h \qquad (4.2.11)$$

Both mean (4.2.10) and variance (4.2.11) become infinite in the limiting case $h \to 0$ with arbitrary $\Delta$. Hence, we must impose a relation between $\Delta$ and $h$ that ensures finite values of the moments $E[y_n(T)]$ and $Var[y_n(T)]$. Namely, let us set

$$p = (1 + \mu\sqrt{h}/\sigma)/2,\quad \Delta = \sigma\sqrt{h} \qquad (4.2.12)$$

where $\mu$ and $\sigma$ are some parameters. Then

$$E[y_n(T)] = \mu T,\quad Var[y_n(T)] = \sigma^2 T \qquad (4.2.13)$$

It can be shown that $y_n(T)$ converges to the normal distribution N($\mu T$, $\sigma^2 T$) in the continuous limit. Hence, $\mu$ and $\sigma$ are the drift and diffusion parameters, respectively. Obviously, the drift parameter differs from zero only when $p \neq 0.5$, that is, when there is a preference for one direction of innovations over the other. The continuous process defined with the relations (4.2.13) is named the arithmetic Brownian motion. It is reduced to the Wiener process when $\mu = 0$ and $\sigma = 1$.
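As an illustration, the following sketch simulates the binomial walk (4.2.7)-(4.2.8) with the scaling (4.2.12) and checks the moments (4.2.13) by Monte Carlo; the parameter values (μ, σ, T, h and the number of paths) are arbitrary choices, not from the text.

```python
import numpy as np

# Sketch: binomial random walk (4.2.7)-(4.2.8) with the scaling (4.2.12).
rng = np.random.default_rng(0)
mu, sigma, T, h = 0.1, 0.2, 1.0, 1e-3
n = int(T / h)                              # number of steps per path
p = (1.0 + mu * np.sqrt(h) / sigma) / 2.0   # (4.2.12)
delta = sigma * np.sqrt(h)

# Simulate many independent paths and inspect y_n(T).
paths = 5000
steps = rng.choice([delta, -delta], size=(paths, n), p=[p, 1.0 - p])
yT = steps.sum(axis=1)

print(yT.mean(), mu * T)                    # sample mean vs mu*T, (4.2.13)
print(yT.var(), sigma**2 * T)               # sample variance vs sigma^2*T
```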

Note that in a more generic approach, the time intervals between observations of y(t) themselves represent a random variable [4, 5]. While this process (the so-called continuous-time random walk) better resembles the market price variations, its description is beyond the scope of this book.

In the general case, the arithmetic Brownian motion can be expressed in the following form

$$y(t) = \mu(t)\,t + \sigma(y(t), t)\,W(t) \qquad (4.2.14)$$

The random variable in this process may have negative values. This creates a problem for describing prices, which are essentially positive. Therefore, the geometric Brownian motion $Y(t) = \exp[y(t)]$ is often used in financial applications.

One can simulate the Wiener process with the following equation

$$\Delta W \equiv W(t + \Delta t) - W(t) = N(0, 1)\,\sqrt{\Delta t} \qquad (4.2.15)$$
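For instance, a few lines of Python suffice to generate a sample path from the increments (4.2.15); the grid size and horizon below are arbitrary choices.

```python
import numpy as np

# Sketch: a Wiener path built from the Gaussian increments (4.2.15).
rng = np.random.default_rng(1)
T, n = 1.0, 1000
dt = T / n
dW = rng.standard_normal(n) * np.sqrt(dt)    # N(0,1)*sqrt(dt) increments
W = np.concatenate(([0.0], np.cumsum(dW)))   # W(0) = 0
```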


While the Wiener process is a continuous process, its innovations are random. Therefore, the expression $\Delta W/\Delta t$ does not converge when $\Delta t \to 0$. Indeed, it follows for the Wiener process that

$$\lim_{\Delta t \to 0}[\Delta W(t)/\Delta t] = \lim_{\Delta t \to 0}[\Delta t^{-1/2}] \qquad (4.2.16)$$

As a result, the derivative dW(t)/dt does not exist in the ordinary sense. Thus, one needs a special calculus to describe stochastic processes.

4.3 STOCHASTIC DIFFERENTIAL EQUATION

The Brownian motion (4.2.14) can be presented in the differential form³

$$dy(t) = \mu\,dt + \sigma\,dW(t) \qquad (4.3.1)$$

Equation (4.3.1) is named the stochastic differential equation. Note that the term $dW(t) = [W(t + dt) - W(t)]$ has the following properties

$$E[dW] = 0,\quad E[dW\,dW] = dt,\quad E[dW\,dt] = 0 \qquad (4.3.2)$$

Let us calculate $(dy)^2$, having in mind (4.3.2) and retaining the terms O(dt):⁴

$$(dy)^2 = [\mu\,dt + \sigma\,dW]^2 = \mu^2 dt^2 + 2\mu\sigma\,dt\,dW + \sigma^2 dW^2 \approx \sigma^2 dt \qquad (4.3.3)$$

It follows from (4.3.3) that while dy is a random variable, $(dy)^2$ is a deterministic one. This result allows one to derive Ito's lemma. Consider a function F(y, t) that depends on a deterministic variable, t, and a stochastic one, y(t). Let us expand the differential of F(y, t) into the Taylor series, retaining linear terms and bearing in mind equation (4.3.3)

$$dF(y, t) = \frac{\partial F}{\partial y}dy + \frac{\partial F}{\partial t}dt + \frac{1}{2}\frac{\partial^2 F}{\partial y^2}(dy)^2 = \frac{\partial F}{\partial y}dy + \left[\frac{\partial F}{\partial t} + \frac{\sigma^2}{2}\frac{\partial^2 F}{\partial y^2}\right]dt \qquad (4.3.4)$$

The Ito expression (4.3.4) has an additional term in comparison with the differential of a function with deterministic independent variables: namely, the term $\frac{\sigma^2}{2}\frac{\partial^2 F}{\partial y^2}dt$, which has a stochastic nature. If y(t) is the Brownian motion (4.3.1), then

$$dF(y, t) = \frac{\partial F}{\partial y}[\mu\,dt + \sigma\,dW(t)] + \left[\frac{\partial F}{\partial t} + \frac{\sigma^2}{2}\frac{\partial^2 F}{\partial y^2}\right]dt = \left[\mu\frac{\partial F}{\partial y} + \frac{\partial F}{\partial t} + \frac{\sigma^2}{2}\frac{\partial^2 F}{\partial y^2}\right]dt + \sigma\frac{\partial F}{\partial y}dW(t) \qquad (4.3.5)$$

Let us consider the function $F = W^2$ as a simple example of employing Ito's lemma. In this case, $\mu = 0$, $\sigma = 1$, and equation (4.3.5) reduces to

$$dF = dt + 2W\,dW \qquad (4.3.6)$$

Finally, we specify the Ito expression for the geometric Brownian motion $F = \exp[y(t)]$. Since in this case $\frac{\partial F}{\partial y} = \frac{\partial^2 F}{\partial y^2} = F$ and $\frac{\partial F}{\partial t} = 0$, then

$$dF = \left(\mu + \frac{\sigma^2}{2}\right)F\,dt + \sigma F\,dW(t) \qquad (4.3.7)$$

Hence, if F is the geometric Brownian motion, its relative change, dF/F, behaves as the arithmetic Brownian motion.
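A quick Monte Carlo check of (4.3.7) is sketched below: for $F = \exp[y(t)]$ with $y$ the arithmetic Brownian motion and $y(0) = 0$, the drift of $F$ is $\mu + \sigma^2/2$, so $E[F(T)] = \exp[(\mu + \sigma^2/2)T]$. The parameter values are arbitrary.

```python
import numpy as np

# Sketch: verify that F = exp(y) drifts at mu + sigma^2/2, per (4.3.7).
rng = np.random.default_rng(2)
mu, sigma, T = 0.05, 0.3, 1.0
paths = 200000
yT = mu * T + sigma * np.sqrt(T) * rng.standard_normal(paths)  # y(T), y(0)=0
print(np.exp(yT).mean())                   # sample E[F(T)]
print(np.exp((mu + sigma**2 / 2) * T))     # exp[(mu + sigma^2/2) T]
```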

Ito's lemma is a pillar of the option pricing theory. It will be used for deriving the classical Black-Scholes equation in Section 9.4.

4.4 STOCHASTIC INTEGRAL

Now that the stochastic differential has been introduced, let us discuss how to perform its integration. First, the Riemann-Stieltjes integral should be defined. Consider a deterministic function f(t) on the interval $t \in [0, T]$. In order to calculate the Riemann integral of f(t) over [0, T], this interval is divided into n sub-intervals $t_0 = 0 < t_1 < \ldots < t_n = T$, and the following sum is computed

$$S_n = \sum_{i=1}^{n} f(\tau_i)(t_i - t_{i-1}) \qquad (4.4.1)$$

where $\tau_i \in [t_{i-1}, t_i]$. The Riemann integral is the limit of $S_n$


$$\int_0^T f(t)\,dt = \lim S_n,\quad \max_i(t_i - t_{i-1}) \to 0 \qquad (4.4.2)$$

Note that the limit (4.4.2) exists only if the function f(t) is sufficiently smooth. Another type of integral is the Stieltjes integral. Let us define the differential of a function g(x)

$$dg = g(x + dx) - g(x) \qquad (4.4.3)$$

Then the Stieltjes integral of f(t) with respect to g(t) on the interval $t \in [0, T]$ is defined through the sum

$$S_n = \sum_{i=1}^{n} f(\tau_i)[g(t_i) - g(t_{i-1})],\quad \tau_i \in [t_{i-1}, t_i] \qquad (4.4.4)$$

$$\int_0^T f(t)\,dg(t) = \lim S_n,\quad \max_i(t_i - t_{i-1}) \to 0 \qquad (4.4.5)$$

If g(t) has a derivative, then $dg \approx \frac{dg}{dt}dt = g'(t)\,dt$, and the sum (4.4.4) can be written as

$$S_n = \sum_{i=1}^{n} f(\tau_i)\,g'(\tau_i)(t_i - t_{i-1}) \qquad (4.4.6)$$

The similarity between the Riemann sum (4.4.1) and the Stieltjes sum (4.4.6) leads to the definition of the Riemann-Stieltjes integral. For deterministic functions, the Riemann-Stieltjes integral does not depend on the particular choice of the points $\tau_i$ within the intervals $[t_{i-1}, t_i]$. However, if the function f(t) is random, the sum $S_n$ does depend on the choice of $\tau_i$. Consider, for example, the sum (4.4.4) for the case $f(t) = g(t) = W(t)$, where W(t) is the Wiener process. It follows from (4.1.16) that

$$E[S_n] = E\left[\sum_{i=1}^{n} W(\tau_i)\{W(t_i) - W(t_{i-1})\}\right] = \sum_{i=1}^{n}[\min(\tau_i, t_i) - \min(\tau_i, t_{i-1})] = \sum_{i=1}^{n}(\tau_i - t_{i-1}) \qquad (4.4.7)$$


Let us set for all i

$$\tau_i = a\,t_i + (1-a)\,t_{i-1},\quad 0 \le a \le 1 \qquad (4.4.8)$$

Substitution of (4.4.8) into (4.4.7) leads to $E[S_n] = aT$. Hence, the sum (4.4.7) depends on the arbitrary parameter a and therefore can have any value between 0 and T. Within the Ito formalism, the value $a = 0$ is chosen, so that $\tau_i = t_{i-1}$. The stochastic Ito integral is defined as

$$\int_0^T f(t)\,dW(t) = \operatorname{ms-lim}_{n\to\infty}\sum_{i=1}^{n} f(t_{i-1})[W(t_i) - W(t_{i-1})] \qquad (4.4.9)$$

The notation ms-lim stands for the mean-square limit: the difference between the Ito integral on the left-hand side of (4.4.9) and the sum on the right-hand side has a variance that approaches zero as n increases to infinity. Thus, (4.4.9) is equivalent to

$$\lim_{n\to\infty} E\left[\int_0^T f(t)\,dW(t) - \sum_{i=1}^{n} f(t_{i-1})\{W(t_i) - W(t_{i-1})\}\right]^2 = 0 \qquad (4.4.10)$$

Let us consider the integral

$$I(t_2, t_1) = \int_{t_1}^{t_2} W(t)\,dW(t) \qquad (4.4.11)$$

as an example of calculating the Ito integral. If the function W(t) were deterministic, the Riemann-Stieltjes integral $I_{R\text{-}S}(t_2, t_1)$ would equal

$$I_{R\text{-}S}(t_2, t_1) = 0.5[W(t_2)^2 - W(t_1)^2] \qquad (4.4.12)$$

However, when W(t) is the Wiener process, the Ito integral $I_I(t_2, t_1)$ leads to a somewhat unexpected result

$$I_I(t_2, t_1) = 0.5[W(t_2)^2 - W(t_1)^2 - (t_2 - t_1)] \qquad (4.4.13)$$

This follows directly from equation (4.3.6). The result (4.4.13) can also be derived directly from the definition of the Ito integral (see Exercise 4). Note that the mean of the Ito integral (4.4.11) equals zero

$$E[I_I(t_2, t_1)] = 0 \qquad (4.4.14)$$


The difference between the right-hand sides of (4.4.12) and (4.4.13) is determined by the particular choice of $a = 0$ in (4.4.8). Stratonovich offered another definition of the stochastic integral by choosing $a = 0.5$. In contrast to equation (4.4.9), the Stratonovich integral is defined as

$$\int_0^T f(t)\,dW(t) = \operatorname{ms-lim}_{n\to\infty}\sum_{i=1}^{n} f\!\left(\frac{t_{i-1} + t_i}{2}\right)[W(t_i) - W(t_{i-1})] \qquad (4.4.15)$$

For the integrand in (4.4.11), the Stratonovich integral $I_S(t_2, t_1)$ coincides with the Riemann-Stieltjes integral

$$I_S(t_2, t_1) = 0.5[W(t_2)^2 - W(t_1)^2] \qquad (4.4.16)$$
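As a numerical illustration (a sketch; the grid size and seed are arbitrary), the left-point and midpoint sums below converge to the Ito result (4.4.13) and the Stratonovich result (4.4.16), respectively, for one sample path on [0, T] with $t_1 = 0$, $t_2 = T$.

```python
import numpy as np

# Sketch: Ito (left-point) vs Stratonovich (midpoint) sums for int W dW.
rng = np.random.default_rng(3)
T, n = 1.0, 100000
dt = T / (2 * n)                                # half-step of the partition
W = np.concatenate(([0.0], np.cumsum(rng.standard_normal(2 * n) * np.sqrt(dt))))
Wl, Wm, Wr = W[0:-1:2], W[1::2], W[2::2]        # left, midpoint, right values

ito = np.sum(Wl * (Wr - Wl))                    # a = 0 in (4.4.8)
strat = np.sum(Wm * (Wr - Wl))                  # a = 0.5, as in (4.4.15)
print(ito, 0.5 * (W[-1]**2 - T))                # matches (4.4.13)
print(strat, 0.5 * W[-1]**2)                    # matches (4.4.16)
```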

Both the Ito and Stratonovich formulations can be transformed into each other. In particular, the Ito stochastic differential equation (4.3.1)

$$dy_I(t) = \mu\,dt + \sigma\,dW(t) \qquad (4.4.17)$$

is equivalent to the Stratonovich equation

$$dy_S(t) = \left(\mu - 0.5\,\sigma\frac{\partial\sigma}{\partial y}\right)dt + \sigma\,dW(t) \qquad (4.4.18)$$

The applications of stochastic calculus in finance are based almost exclusively on the Ito theory. Consider, for example, the integral

$$\int_{t_1}^{t_2} \sigma(t)\,dW(t) \qquad (4.4.19)$$

If no correlation between the function σ(t) and the innovation dW(t) is assumed, then the Ito approximation is a natural choice. In this case, the function σ(t) is said to be a nonanticipating function [1, 2]. However, if the innovations dW(t) are correlated (so-called non-white noise), then the Stratonovich approximation appears to be the adequate theory [1, 6].

4.5 MARTINGALES

The martingale methodology plays an important role in the modern theory of finance [2, 7, 8]. A martingale is a stochastic process X(t) that satisfies the following condition

$$E[X(t+1)\,|\,X(t), X(t-1), \ldots] = X(t) \qquad (4.5.1)$$

The equivalent definition is given by

$$E[X(t+1) - X(t)\,|\,X(t), X(t-1), \ldots] = 0 \qquad (4.5.2)$$

Both these definitions are easily generalized to the continuum presentation, where the time interval, dt, between two sequential moments t and t+1 approaches zero (dt → 0). The notion of martingale is rooted in gambling theory. It is closely associated with the notion of a fair game, in which none of the players has an advantage. The condition (4.5.1) implies that the expectation of the gambler's wealth at time t+1, conditioned on the entire history of the game, is equal to the gambler's wealth at time t. Similarly, equation (4.5.2) means that the expectation of winning at every round of the game, conditioned on the history of the game, equals zero. In other words, a martingale has no trend. A process with a positive trend is named a submartingale; a process with a negative trend is called a supermartingale.

The martingale hypothesis applied to asset prices states that the expectation of the future price is simply the current price. This assumption is closely related to the Efficient Market Hypothesis discussed in Section 2.3. Generally, asset prices are not martingales, for they incorporate risk premia. Indeed, there must be some reward offered to investors for bearing the risks associated with holding the assets. It can be shown, however, that prices with discounted risk premia are martingales [3].

An important property of the Ito integral is that it is a martingale. Consider, for example, the integral (4.4.19) approximated with the sum (4.4.9). Because the innovations dW(t) are unpredictable, it follows from (4.4.14) that

$$E\left[\int_t^{t+\Delta t} \sigma(z)\,dW(z)\right] = 0 \qquad (4.5.3)$$

Therefore,

$$E\left[\int_0^{t+\Delta t} \sigma(z)\,dW(z)\right] = \int_0^t \sigma(z)\,dW(z) \qquad (4.5.4)$$


and the integral (4.4.19) satisfies the martingale definition. Note that for the Brownian motion with drift (4.2.14)

$$E[y(t + dt)] = E\left[y(t) + \int_t^{t+dt} dy\right] = y(t) + \mu\,dt \qquad (4.5.5)$$

Hence, the Brownian motion with drift is not a martingale. However, the process

$$z(t) = y(t) - \mu t \qquad (4.5.6)$$

is a martingale, since

$$E[z(t + dt)] = z(t) \qquad (4.5.7)$$

This result also follows from the Doob-Meyer decomposition theorem, which states that a continuous submartingale X(t) at $0 \le t < \infty$ with finite expectation $E[X(t)] < \infty$ can be decomposed into a continuous martingale and an increasing deterministic process.

4.6 REFERENCES FOR FURTHER READING

Theory and applications of stochastic processes in the natural sciences are described in [1, 6]. A good introduction to stochastic calculus in finance is given in [2]. For a mathematically inclined reader, presentations of the stochastic theory with increasing levels of technical detail can be found in [7, 8].

4.7 EXERCISES

1. Simulate daily price returns using the geometric Brownian motion (4.3.7) for four years. Use equation (4.2.15) for approximating ΔW. Assume that S(0) = 10, μ = 10%, σ = 20% (μ and σ are given per annum). Assume 250 working days per annum.

2. Prove that

$$\int_{t_1}^{t_2} W(s)^n\,dW(s) = \frac{1}{n+1}[W(t_2)^{n+1} - W(t_1)^{n+1}] - \frac{n}{2}\int_{t_1}^{t_2} W(s)^{n-1}\,ds$$

Hint: Calculate $d(W^{n+1})$ using Ito's lemma.

3. Solve the Ornstein-Uhlenbeck equation that describes the mean-reverting process, in which the solution fluctuates around its mean

$$dX = -\mu X\,dt + \sigma\,dW,\quad \mu > 0$$

Hint: Introduce the variable $Y = X\exp(\mu t)$.

*4. Derive the integral (4.4.13) directly from the definition of the Ito integral (4.4.9).

Chapter 5

Time Series Analysis

Time series analysis has become an indispensable theoretical tool in financial and economic research. Section 5.1 is devoted to the commonly used univariate autoregressive and moving average models. The means for modeling trends and seasonality effects are described in Section 5.2. The processes with non-stationary variance (conditional heteroskedasticity) are discussed in Section 5.3. Finally, the specifics of multivariate time series are introduced in Section 5.4.

5.1 AUTOREGRESSIVE AND MOVING AVERAGE MODELS

5.1.1 AUTOREGRESSIVE MODEL

First, we shall consider a univariate time series y(t) for a process that is observed at moments t = 0, 1, ..., n (see, e.g., [1, 2]). A time series in which the observation at moment t depends linearly on several lagged observations at moments t−1, t−2, ..., t−p,

$$y(t) = a_1 y(t-1) + a_2 y(t-2) + \ldots + a_p y(t-p) + e(t),\quad t > p \qquad (5.1.1)$$

is called the autoregressive process of order p, or AR(p). The term e(t) in (5.1.1) is the white noise that satisfies the conditions (4.2.6). The lag operator L, defined by $L^p y(t) = y(t-p)$, is often used for describing time series. Note that $L^0 y(t) = y(t)$. Equation (5.1.1) in terms of the lag operator has the form

$$A_p(L)\,y(t) = e(t) \qquad (5.1.2)$$

where

$$A_p(L) = 1 - a_1 L - a_2 L^2 - \ldots - a_p L^p \qquad (5.1.3)$$

The operator $A_p(L)$ is called the AR polynomial in the lag operator of order p. Let us consider AR(1) that starts with a random shock. Its definition implies that

$$y(0) = e(0),\quad y(1) = a_1 y(0) + e(1),\quad y(2) = a_1 y(1) + e(2) = a_1^2 e(0) + a_1 e(1) + e(2),\ \ldots$$

Hence, by induction,

$$y(t) = \sum_{i=0}^{t} a_1^i\, e(t-i) \qquad (5.1.4)$$

The mean and variance of AR(1) equal, respectively,

$$E[y(t)] = 0,\quad Var[y(t)] = \sigma^2/(1 - a_1^2) \qquad (5.1.5)$$

Obviously, the contributions of the "old" noise converge with time to zero when $|a_1| < 1$. As a result, this process does not drift too far from its mean. This feature is named mean reversion.
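A short simulation (a sketch; the coefficient, seed, and sample size are arbitrary) illustrates both (5.1.5) and mean reversion:

```python
import numpy as np

# Sketch: AR(1) driven by Gaussian white noise; check Var = sigma^2/(1-a1^2).
rng = np.random.default_rng(4)
a1, sigma, n = 0.7, 1.0, 100000
e = rng.normal(0.0, sigma, n)
y = np.empty(n)
y[0] = e[0]                              # start with a random shock
for t in range(1, n):
    y[t] = a1 * y[t - 1] + e[t]
print(y.var(), sigma**2 / (1 - a1**2))   # sample vs theoretical variance
```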

The process with $a_1 = 1$ is called the random walk

$$y(t) = y(t-1) + e(t) \qquad (5.1.6)$$

In this case, equation (5.1.4) reduces to

$$y(t) = \sum_{i=0}^{t} e(t-i)$$

The noise contributions to the random walk do not weaken with time. Therefore, the random walk does not exhibit mean reversion. Now, consider the process that represents the first difference

$$x(t) = y(t) - y(t-1) = e(t) \qquad (5.1.7)$$

Obviously, past noise has only a transitory character for the process x(t). Therefore, x(t) is mean-reverting. Some processes must be


differenced several times in order to exclude non-transitory noise shocks. Processes differenced d times are named integrated of order d and denoted I(d). The differencing operator used for describing an I(d) process is

$$\Delta_i^d = (1 - L^i)^d,\quad i, d = \ldots, -2, -1, 0, 1, 2, \ldots \qquad (5.1.8)$$

If an I(d) process can be reduced to an AR(p) process by applying the differencing operator, it is named an ARI(p, d) process and has the form

$$\Delta_1^d y(t) - a_1\Delta_1^d y(t-1) - \ldots - a_p\Delta_1^d y(t-p) = e(t),\quad t \ge p + d \qquad (5.1.9)$$

Note that differencing a time series d times reduces the number of independent variables by d, so that the total number of independent variables in ARI(p, d) within a sample with n observations equals n − p − d.

The unit root is another notion widely used for discerning permanent and transitory effects of random shocks. It is based on the roots of the characteristic polynomial of the AR(p) model. For example, AR(1) has the characteristic polynomial

$$1 - a_1 z = 0 \qquad (5.1.10)$$

If $a_1 = 1$, then z = 1 and the characteristic polynomial has the unit root. In general, the roots of the characteristic polynomial can have complex values. The solution to equation (5.1.10) is outside the unit circle (i.e., |z| > 1) when $|a_1| < 1$. It can be shown that the random shocks in AR(p) are transitory when all solutions of

$$1 - a_1 z - a_2 z^2 - \ldots - a_p z^p = 0 \qquad (5.1.11)$$

are outside the unit circle.

5.1.2 MOVING AVERAGE MODELS

A model more general than AR(p) contains both lagged observations and lagged noise

$$y(t) = a_1 y(t-1) + \ldots + a_p y(t-p) + e(t) + b_1 e(t-1) + \ldots + b_q e(t-q) \qquad (5.1.12)$$

This model is called the autoregressive moving average model of order (p, q), or simply ARMA(p, q). Sometimes modeling of empirical data


requires an AR(p) with a rather high order p. Then ARMA(p, q) may be more efficient in that the total number of its terms (p + q) needed for a given accuracy is lower than the order p in AR(p). ARMA(p, q) can be expanded into the integrated model, ARIMA(p, d, q), similar to the expansion of AR(p) into ARI(p, d). Neglecting the autoregressive terms in ARMA(p, q) yields a "pure" moving average model, MA(q)

$$y(t) = e(t) + b_1 e(t-1) + b_2 e(t-2) + \ldots + b_q e(t-q) \qquad (5.1.13)$$

MA(q) can be presented in the form

$$y(t) = B_q(L)\,e(t) \qquad (5.1.14)$$

where $B_q(L)$ is the MA polynomial in the lag operator

$$B_q(L) = 1 + b_1 L + b_2 L^2 + \ldots + b_q L^q \qquad (5.1.15)$$

The moving average model does not depend explicitly on the lagged values of y(t). Yet, it is easy to show that this model implicitly incorporates the past. Consider, for example, the MA(1) model

$$y(t) = e(t) + b_1 e(t-1) \qquad (5.1.16)$$

with e(0) = 0. For this model,

$$y(1) = e(1),\quad y(2) = e(2) + b_1 e(1) = e(2) + b_1 y(1),\quad y(3) = e(3) + b_1 e(2) = e(3) + b_1(y(2) - b_1 y(1)),\ \ldots$$

Thus, the general result for MA(1) has the form

$$(1 - b_1 L + b_1^2 L^2 - b_1^3 L^3 + \ldots)\,y(t) = e(t) \qquad (5.1.17)$$

Equation (5.1.17) can be viewed as an AR(∞) process, which illustrates that the MA model does depend on the past.

The MA(q) model is invertible if it can be transformed into an AR(∞) model. It can be shown that MA(q) is invertible if all solutions to the equation

$$1 + b_1 z + b_2 z^2 + \ldots + b_q z^q = 0 \qquad (5.1.18)$$

are outside the unit circle. In particular, MA(1) is invertible if $|b_1| < 1$. If the process y(t) has a non-zero mean value m, then the AR(1) model can be presented in the following form


$$y(t) - m = a_1[y(t-1) - m] + e(t),\quad \text{i.e.,}\quad y(t) = c + a_1 y(t-1) + e(t) \qquad (5.1.19)$$

In (5.1.19), the intercept c equals

$$c = m(1 - a_1) \qquad (5.1.20)$$

The general AR(p) model with a non-zero mean has the following form

$$A_p(L)\,y(t) = c + e(t),\quad c = m(1 - a_1 - \ldots - a_p) \qquad (5.1.21)$$

Similarly, an intercept can be included in the general moving average model MA(q)

$$y(t) = c + B_q(L)\,e(t),\quad c = m \qquad (5.1.22)$$

Note that the mean of the MA model coincides with its intercept because the mean of the white noise is zero.

5.1.3 AUTOCORRELATION AND FORECASTING

Now, let us introduce the autocorrelation function (ACF) for a process y(t)

$$\rho(k) = \gamma(k)/\gamma(0) \qquad (5.1.23)$$

where γ(k) is the autocovariance of order k

$$\gamma(k) = E[(y(t) - m)(y(t-k) - m)] \qquad (5.1.24)$$

The autocorrelation functions may have some typical patterns, which can be used for identification of empirical time series [2]. The obvious properties of the ACF are

$$\rho(0) = 1,\quad -1 < \rho(k) < 1\ \text{for}\ k \neq 0 \qquad (5.1.25)$$

The ACF is closely related to the ARMA parameters. In particular, for AR(1)

$$\rho(1) = a_1 \qquad (5.1.26)$$

The first-order ACF for MA(1) equals

$$\rho(1) = b_1/(b_1^2 + 1) \qquad (5.1.27)$$

The right-hand side of expression (5.1.27) has the same value under the inverse transform $b_1 \to 1/b_1$. For example, the two processes

$$x(t) = e(t) + 2e(t-1)$$
$$y(t) = e(t) + 0.5e(t-1)$$

have the same ρ(1). Note, however, that y(t) is an invertible process while x(t) is not.
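Indeed, substituting into (5.1.27): for $b_1 = 2$, $\rho(1) = 2/(4+1) = 0.4$; for $b_1 = 0.5$, $\rho(1) = 0.5/(0.25+1) = 0.4$.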

ARMA modeling is widely used for forecasting. Consider a forecast of a variable y(t+1) based on a set of n variables x(t) known at moment t. This set can be just past values of y, that is, y(t), y(t−1), ..., y(t−n+1). Let us denote the forecast with $\hat{y}(t+1|t)$. The quality of a forecast is usually defined with some loss function. The mean squared error (MSE) is the conventional loss function in many applications

$$MSE(\hat{y}(t+1|t)) = E[(y(t+1) - \hat{y}(t+1|t))^2] \qquad (5.1.28)$$

The forecast that yields the minimum of MSE turns out to be the expectation of y(t+1) conditioned on x(t)

$$\hat{y}(t+1|t) = E[y(t+1)\,|\,\mathbf{x}(t)] \qquad (5.1.29)$$

In the case of the linear regression

$$y(t+1) = \boldsymbol{\beta}'\mathbf{x}(t) + e(t) \qquad (5.1.30)$$

minimizing MSE reduces to the ordinary least squares (OLS) estimate for β. For a sample with T observations,

$$\boldsymbol{\beta} = \left[\sum_{t=1}^{T}\mathbf{x}(t)\,\mathbf{x}'(t)\right]^{-1}\sum_{t=1}^{T}\mathbf{x}(t)\,y(t+1) \qquad (5.1.31)$$

Another important concept in time series analysis is the maximum likelihood estimate (MLE) [2]. Consider the general ARMA model (5.1.12) with the white noise (4.2.6). The problem is how to estimate the ARMA parameters on the basis of given observations of y(t). The idea of MLE is to find the vector $\mathbf{r}' = (a_1, \ldots, a_p, b_1, \ldots, b_q, \sigma^2)$ that maximizes the likelihood function for the given observations $(y_1, y_2, \ldots, y_T)$

$$f_{1, 2, \ldots, T}(y_1, y_2, \ldots, y_T;\ \mathbf{r}') \qquad (5.1.32)$$

The likelihood function (5.1.32) has the sense of the probability of observing the data sample $(y_1, y_2, \ldots, y_T)$. In this approach, the ARMA model and the probability distribution of the white noise should be specified first. Often the normal distribution leads to reasonable estimates even if the real distribution is different. Furthermore, the likelihood function must be calculated for the chosen ARMA model. Finally, the components of the vector $\mathbf{r}'$ must be estimated. The latter step may require a sophisticated numerical optimization technique. Details of the implementation of MLE are discussed in [2].
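As a minimal illustration of (5.1.31), the sketch below recovers the AR(1) coefficient from simulated data by regressing y(t+1) on x(t) = y(t); the true coefficient 0.7 and sample size are arbitrary choices.

```python
import numpy as np

# Sketch: OLS estimate (5.1.31) for AR(1), with x(t) = y(t).
rng = np.random.default_rng(5)
a1, n = 0.7, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = a1 * y[t - 1] + rng.standard_normal()

x, y_next = y[:-1], y[1:]                   # regress y(t+1) on y(t)
beta = (x @ y_next) / (x @ x)               # scalar case of (5.1.31)
print(beta)                                 # should be close to 0.7
```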

5.2 TRENDS AND SEASONALITY

Finding trends is an important part of time series analysis. The presence of a trend implies that the time series has no mean reversion. Moreover, the mean and variance of a trending process depend on the sample. A time series with trend is named non-stationary. If a process y(t) is stationary, its mean, variance, and autocovariance are finite and do not depend on time. This implies that the autocovariance (5.1.24) depends only on the lag parameter k. Such a definition of stationarity is also called covariance-stationarity or weak stationarity because it does not impose any restrictions on the higher moments of the process. Strict stationarity implies that the higher moments also do not depend on time. Note that any MA process is covariance-stationary. However, an AR(p) process is covariance-stationary only if the roots of its characteristic polynomial are outside the unit circle.

It is important to discern a deterministic trend from a stochastic trend. They have a different nature, yet their graphs may sometimes look very similar [1]. Consider first the AR(1) model with a deterministic trend

$$y(t) - m - ct = a_1(y(t-1) - m - c(t-1)) + e(t) \qquad (5.2.1)$$

Let us introduce $z(t) = y(t) - m - ct$. Then equation (5.2.1) has the solution

$$z(t) = a_1^t z(0) + \sum_{i=1}^{t} a_1^{t-i} e(i) \qquad (5.2.2)$$

where z(0) is a pre-sample starting value of z. Obviously, the random shocks are transitory if $|a_1| < 1$. The trend incorporated in the definition of z(t) is deterministic when $|a_1| < 1$. However, if $a_1 = 1$, then equation (5.2.1) has the form


$$y(t) = c + y(t-1) + e(t) \qquad (5.2.3)$$

The process (5.2.3) is named the random walk with drift. In this case, equation (5.2.2) reduces to

$$z(t) = z(0) + \sum_{i=1}^{t} e(i) \qquad (5.2.4)$$

The sum of non-transitory shocks on the right-hand side of equation (5.2.4) is named a stochastic trend. Consider, for example, the deterministic trend model with m = 0 and e(t) = N(0, 1)

$$y(t) = 0.1t + e(t) \qquad (5.2.5)$$

and the stochastic trend model

$$y(t) = 0.1 + y(t-1) + e(t),\quad y(0) = 0 \qquad (5.2.6)$$

As can be seen from Figure 5.1, both graphs look similar. In general, however, the stochastic trend model can deviate from the deterministic trend for a long time.
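The two models are easy to reproduce (a sketch; the seed and sample length are arbitrary):

```python
import numpy as np

# Sketch: deterministic trend (5.2.5) vs stochastic trend (5.2.6).
rng = np.random.default_rng(6)
n = 40
t = np.arange(n)
e = rng.standard_normal(n)

y_det = 0.1 * t + e                  # (5.2.5): shocks are transitory
y_sto = np.cumsum(0.1 + e)           # (5.2.6): shocks accumulate
```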

[Figure 5.1: Deterministic and stochastic trends: A - equation (5.2.5), B - equation (5.2.6).]

A stochastic trend implies that the process is I(1). Then the lag polynomial (5.1.3) can be represented in the form

$$A_p(L) = (1 - L)\,A_{p-1}(L) \qquad (5.2.7)$$

Similarly, an I(2) process has the lag polynomial

$$A_p(L) = (1 - L)^2 A_{p-2}(L) \qquad (5.2.8)$$

and so on. The standard procedure for testing the presence of a unit root in a time series is the Dickey-Fuller method [1, 2]. This method is implemented in major econometric software packages (see Section 5.5).

Seasonal effects may play an important role in the properties of time series. Sometimes there is a need to eliminate these effects in order to focus on the stochastic specifics of the process. Various differencing filters can be used for achieving this goal [2]. In other cases, the seasonal effect itself may be the object of interest. The general approach for handling seasonal effects is to introduce dummy parameters D(s, t), where s = 1, 2, ..., S and S is the number of seasons; for example, S = 12 is used for modeling monthly effects. The parameter D(s, t) equals 1 at a specific season s and equals zero at all other seasons. The seasonal extension of an ARMA(p, q) model has the following form

$$y(t) = a_1 y(t-1) + \ldots + a_p y(t-p) + e(t) + b_1 e(t-1) + \ldots + b_q e(t-q) + \sum_{s=1}^{S} d_s D(s, t) \qquad (5.2.9)$$

Note that forecasting with the model (5.2.9) requires estimating (p + q + S) parameters.

5.3 CONDITIONAL HETEROSKEDASTICITY

So far, we have considered random processes with the white noise (4.2.6), which is characterized by constant unconditional variance. Conditional variance has not been discussed so far. In general, processes with unspecified conditional variance are named homoskedastic. Many random time series are not well described by the IID process. In particular, there may be strong positive autocorrelation in squared asset returns, meaning that large returns (either positive or negative) follow large returns. In this case, it is said that the return volatility is clustered. The effect of volatility clustering is also called autoregressive conditional heteroskedasticity (ARCH). It should be noted that small autocorrelation in squared returns does not necessarily mean that there is no volatility clustering. Strong outliers that lead to high values of skewness and kurtosis may lower the autocorrelation. If these outliers are removed from the sample, volatility clustering may become apparent [3].

Several models in which past shocks contribute to the current volatility have been developed. Generally, they are rooted in the ARCH(m) model, where the conditional variance is a weighted sum of m squared lagged returns

$$\sigma^2(t) = v + a_1 e^2(t-1) + a_2 e^2(t-2) + \ldots + a_m e^2(t-m) \qquad (5.3.1)$$

In (5.3.1), $e(t) \sim N(0, \sigma^2(t))$, $v > 0$, and $a_1, \ldots, a_m \ge 0$. Unfortunately, application of the ARCH(m) process to modeling financial time series often requires polynomials of high order m. A more efficient model is the generalized ARCH (GARCH) process. The GARCH(m, n) process combines the ARCH(m) process with an AR(n) process for the lagged variance

$$\sigma^2(t) = v + a_1 e^2(t-1) + \ldots + a_m e^2(t-m) + b_1\sigma^2(t-1) + \ldots + b_n\sigma^2(t-n) \qquad (5.3.2)$$

The simple GARCH(1, 1) model is widely used in applications

$$\sigma^2(t) = v + a\,e^2(t-1) + b\,\sigma^2(t-1) \qquad (5.3.3)$$

Equation (5.3.3) can be transformed into

$$\sigma^2(t) = v + (a+b)\,\sigma^2(t-1) + a[e^2(t-1) - \sigma^2(t-1)] \qquad (5.3.4)$$

The last term in equation (5.3.4) is conditioned on information available at time (t−1) and has zero mean. This term can be treated as a shock to volatility. Therefore, the unconditional expectation of volatility for the GARCH(1, 1) model equals

$$E[\sigma^2(t)] = v/(1 - a - b) \qquad (5.3.5)$$

This implies that the GARCH(1, 1) process is weakly stationary when $a + b < 1$. The advantage of the stationary GARCH(1, 1) model is that it can easily be used for forecasting. Namely, the conditional expectation of volatility at time (t + k) equals [4]


$$E[\sigma^2(t+k)] = (a+b)^k\,[\sigma^2(t) - v/(1-a-b)] + v/(1-a-b) \qquad (5.3.6)$$
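The simulation sketch below generates a GARCH(1, 1) path per (5.3.3) and compares the long-run sample variance with the unconditional level (5.3.5); the parameter values are illustrative.

```python
import numpy as np

# Sketch: simulate GARCH(1,1) per (5.3.3) and check (5.3.5).
rng = np.random.default_rng(7)
v, a, b, n = 0.1, 0.1, 0.8, 100000        # a + b < 1: weakly stationary
e = np.zeros(n)
s2 = np.zeros(n)
s2[0] = v / (1.0 - a - b)                 # start at the unconditional level
for t in range(1, n):
    s2[t] = v + a * e[t - 1]**2 + b * s2[t - 1]
    e[t] = np.sqrt(s2[t]) * rng.standard_normal()

print(e.var(), v / (1.0 - a - b))         # sample vs unconditional variance
```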

The GARCH(1, 1) model (5.3.4) can be rewritten as

$$\sigma^2(t) = v/(1-b) + a\,(e^2(t-1) + b\,e^2(t-2) + b^2 e^2(t-3) + \ldots) \qquad (5.3.7)$$

Equation (5.3.7) shows that the GARCH(1, 1) model is equivalent to an infinite ARCH model with exponentially weighted coefficients. This explains why the GARCH models are more efficient than the ARCH models.

Several GARCH models have been described in the econometric literature [1-3]. One popular GARCH(1, 1) model with $a + b = 1$ is called integrated GARCH (IGARCH). It has the autoregressive unit root; therefore, the volatility of this process follows a random walk and can be easily forecasted

$$E[\sigma^2(t+k)] = \sigma^2(t) + kv \qquad (5.3.8)$$

IGARCH can be presented in the form

$$\sigma^2(t) = v + (1-\lambda)\,e^2(t-1) + \lambda\,\sigma^2(t-1) \qquad (5.3.9)$$

where $0 < \lambda < 1$. If v = 0, IGARCH coincides with the exponentially weighted moving average (EWMA)

$$\sigma^2(t) = (1-\lambda)\sum_{i=1}^{n} \lambda^{i-1} e^2(t-i) \qquad (5.3.10)$$

Indeed, the n-period EWMA for a time series y(t) is defined as

$$z(t) = [y(t-1) + \lambda y(t-2) + \lambda^2 y(t-3) + \ldots + \lambda^{n-1} y(t-n)]/(1 + \lambda + \ldots + \lambda^{n-1}) \qquad (5.3.11)$$

where $0 < \lambda < 1$. For large n, the denominator of (5.3.11) converges to $1/(1-\lambda)$. Then, for $z(t) = \sigma^2(t)$ and $y(t) = e^2(t)$, equation (5.3.11) is equivalent to equation (5.3.7) with v = 0.
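In practice the EWMA variance is usually updated recursively; a sketch follows, with λ = 0.94 merely a common illustrative choice for daily data.

```python
import numpy as np

# Sketch: recursive EWMA volatility, i.e., (5.3.9) with v = 0.
def ewma_var(returns, lam=0.94):
    """Return the EWMA variance series for a 1-D array of returns."""
    s2 = np.empty_like(returns)
    s2[0] = returns[0]**2                # a simple initialization choice
    for t in range(1, len(returns)):
        s2[t] = (1.0 - lam) * returns[t - 1]**2 + lam * s2[t - 1]
    return s2
```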

The GARCH models discussed so far are symmetric in that the sign of the shock does not affect the resulting volatility. In practice, however, negative price shocks influence volatility more than positive shocks do. A noted example of an asymmetric GARCH model is the exponential GARCH (EGARCH) (see, e.g., [3]). It has the form

$$\log[\sigma^2(t)] = v + b\log[\sigma^2(t-1)] + \lambda z(t-1) + \gamma\left(|z(t-1)| - \sqrt{2/\pi}\right) \qquad (5.3.12)$$

where $z(t) = e(t)/\sigma(t)$. Note that $E[|z(t)|] = \sqrt{2/\pi}$. Hence, the last term in (5.3.12) is the mean deviation of z(t). If $\gamma > 0$ and $\lambda < 0$, negative shocks lead to higher volatility than positive shocks.

5.4 MULTIVARIATE TIME SERIES

Often the current value of a variable depends not only on its own past values but also on past and/or current values of other variables. Modeling of dynamic interdependent variables is conducted with multivariate time series. Multivariate models bring not only new implementation problems but also some methodological difficulties. In particular, one should be cautious with simple regression models

$$y(t) = a\,x(t) + e(t) \qquad (5.4.1)$$

that may lead to spurious results. It is said that (5.4.1) is a simultaneous equation, as both the explanatory (x) and dependent (y) variables are present at the same moment of time. A notorious example of spurious inference is the finding that the best predictor in the United Nations database for the Standard & Poor's 500 stock index is the production of butter in Bangladesh [5].

A statistically sound yet spurious relationship is named data snooping. It may appear when the data that are the subject of research are used to construct the test statistics [4]. Another problem with simultaneous equations is that noise can be correlated with the explanatory variable, which leads to inaccurate OLS estimates of the regression coefficients. Several techniques for handling this problem are discussed in [2].

A multivariate time series $\mathbf{y}(t) = (y_1(t), y_2(t), \ldots, y_n(t))'$ is a vector of n processes that have data available for the same moments of time. It is supposed also that all these processes are either stationary or have the same order of integration. In practice, multivariate moving average models are rarely used due to some restrictions [1]. Therefore, we shall focus on the vector autoregressive model (VAR), which is a simple extension of the univariate AR model to multivariate time series. Consider the bivariate VAR(1) process

$$y_1(t) = a_{10} + a_{11}y_1(t-1) + a_{12}y_2(t-1) + e_1(t)$$
$$y_2(t) = a_{20} + a_{21}y_1(t-1) + a_{22}y_2(t-1) + e_2(t) \qquad (5.4.2)$$


that can be presented in the matrix form

$$\mathbf{y}(t) = \mathbf{a}_0 + \mathbf{A}\,\mathbf{y}(t-1) + \boldsymbol{\varepsilon}(t) \qquad (5.4.3)$$

In (5.4.3), $\mathbf{y}(t) = (y_1(t), y_2(t))'$, $\mathbf{a}_0 = (a_{10}, a_{20})'$, $\boldsymbol{\varepsilon}(t) = (e_1(t), e_2(t))'$, and $\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$.
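A bivariate VAR(1) path is straightforward to simulate; in the sketch below the coefficients are illustrative and chosen so that the process is stationary (cf. the condition (5.4.11) discussed later).

```python
import numpy as np

# Sketch: simulate the bivariate VAR(1) process (5.4.3).
rng = np.random.default_rng(8)
a0 = np.array([0.1, 0.2])
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])                 # eigenvalues inside unit circle
print(np.abs(np.linalg.eigvals(A)))        # stationarity check, cf. (5.4.11)

n = 1000
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = a0 + A @ y[t - 1] + rng.standard_normal(2)
```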

The right-hand sides in example (5.4.2) depend on past values only. However, dependencies on current values can also be included (the so-called simultaneous dynamic model [1]). Consider the modification of the bivariate process (5.4.2)

$$y_1(t) = a_{11}y_1(t-1) + a_{12}y_2(t) + e_1(t)$$
$$y_2(t) = a_{21}y_1(t) + a_{22}y_2(t-1) + e_2(t) \qquad (5.4.4)$$

The matrix form of this process is

$$\begin{pmatrix} 1 & -a_{12} \\ -a_{21} & 1 \end{pmatrix}\begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} = \begin{pmatrix} a_{11} & 0 \\ 0 & a_{22} \end{pmatrix}\begin{pmatrix} y_1(t-1) \\ y_2(t-1) \end{pmatrix} + \begin{pmatrix} e_1(t) \\ e_2(t) \end{pmatrix} \qquad (5.4.5)$$

Multiplying both sides of (5.4.5) by the inverse of the left-hand matrix yields

$$\begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} = (1 - a_{12}a_{21})^{-1}\begin{pmatrix} a_{11} & a_{12}a_{22} \\ a_{11}a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} y_1(t-1) \\ y_2(t-1) \end{pmatrix} + (1 - a_{12}a_{21})^{-1}\begin{pmatrix} 1 & a_{12} \\ a_{21} & 1 \end{pmatrix}\begin{pmatrix} e_1(t) \\ e_2(t) \end{pmatrix} \qquad (5.4.6)$$

Equation (5.4.6) shows that simultaneous dynamic models can also be represented in the VAR form.

In the general case of an n-variate time series, VAR(p) has the form [2]

$$\mathbf{y}(t) = \mathbf{a}_0 + \mathbf{A}_1\mathbf{y}(t-1) + \ldots + \mathbf{A}_p\mathbf{y}(t-p) + \boldsymbol{\varepsilon}(t) \qquad (5.4.7)$$

where $\mathbf{y}(t)$, $\mathbf{a}_0$, and $\boldsymbol{\varepsilon}(t)$ are n-dimensional vectors and $\mathbf{A}_i$ (i = 1, ..., p) are n×n matrices. Generally, the white noises $\boldsymbol{\varepsilon}(t)$ are mutually independent. Let us introduce

$$\mathbf{A}_p(L) = \mathbf{I}_n - \mathbf{A}_1 L - \ldots - \mathbf{A}_p L^p \qquad (5.4.8)$$

where $\mathbf{I}_n$ is the n-dimensional identity matrix. Then equation (5.4.7) can be presented as

$$\mathbf{A}_p(L)\,\mathbf{y}(t) = \mathbf{a}_0 + \boldsymbol{\varepsilon}(t) \qquad (5.4.9)$$

Two covariance-stationary processes x(t) and y(t) are jointly covariance-stationary if their covariance Cov(x(t), y(t−s)) depends only on the lag s. The condition for a covariance-stationary VAR(p) is the generalization of (5.1.11) for AR(p). Namely, all values of z satisfying the equation

$$|\mathbf{I}_n - \mathbf{A}_1 z - \ldots - \mathbf{A}_p z^p| = 0 \qquad (5.4.10)$$

must lie outside the unit circle. Equivalently, all solutions λ of the equation

$$|\mathbf{I}_n\lambda^p - \mathbf{A}_1\lambda^{p-1} - \ldots - \mathbf{A}_p| = 0 \qquad (5.4.11)$$

must satisfy the condition $|\lambda| < 1$.

The problem of whether the lagged values of a process y can improve the prediction of a process x (so-called Granger causality) is often posed in forecasting. It is said that if y fails to Granger-cause x, then the following condition holds for all s > 0

$$MSE(E[x(t+s)\,|\,x(t), x(t-1), \ldots]) = MSE(E[x(t+s)\,|\,x(t), x(t-1), \ldots, y(t), y(t-1), \ldots]) \qquad (5.4.12)$$

In this case, y is called an exogenous variable with respect to x. For example, $y_2(t)$ is exogenous with respect to $y_1(t)$ in (5.4.2) if $a_{12} = 0$. General methods for testing Granger causality are described in [2].

The last notion introduced in this section is cointegration. Two processes are cointegrated if they both have unit roots (i.e., they both are I(1)), but some linear combination of these processes is stationary (i.e., is I(0)). This definition can be extended to an arbitrary number of processes. As an example, consider the bivariate model

$$y_1(t) = a\,y_2(t) + e_1(t)$$
$$y_2(t) = y_2(t-1) + e_2(t) \qquad (5.4.13)$$

Both processes $y_1(t)$ and $y_2(t)$ are random walks. However, the process

$$z(t) = y_1(t) - a\,y_2(t) \qquad (5.4.14)$$

is stationary. Details of testing the cointegration hypothesis are described in [2]. Implications of cointegration in financial data analysis are discussed in [3].
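The model (5.4.13) is easily simulated; the sketch below (with the arbitrary choice a = 2) shows that $y_1$ and $y_2$ wander while $z(t) = y_1(t) - a\,y_2(t)$ stays near zero.

```python
import numpy as np

# Sketch: the cointegrated pair (5.4.13) with a = 2 (arbitrary choice).
rng = np.random.default_rng(9)
n, a = 10000, 2.0
y2 = np.cumsum(rng.standard_normal(n))     # random walk
y1 = a * y2 + rng.standard_normal(n)       # cointegrated with y2
z = y1 - a * y2                            # stationary combination (5.4.14)
print(y1.var(), y2.var(), z.var())         # z's variance stays near 1
```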


5.5 REFERENCES FOR FURTHER READING AND ECONOMETRIC SOFTWARE

A good concise introduction to time series analysis is given by Franses [1]. Comprehensive presentations of the subject can be found in the monographs by Hamilton [2] and Green [6]. Important specifics of time series analysis in finance, particularly for analysis and forecasting of volatility, are discussed by Alexander in [3]. In this chapter, only time series on homogeneous grids were considered. Specifics of the analysis of tick-by-tick data on non-homogeneous grids are discussed in [7]. It should be noted that exercises with econometric software packages are very helpful for learning the subject. Besides generic scientific software such as SAS, Splus, and Matlab, which have modules for time series analysis, several econometric software packages are available: PCGive, RATS, Shazam, and TSP. While these packages may have trial and student versions, EasyReg, offered by H. J. Bierens⁵, has sufficient capability for an introductory course and is free of charge.

5.6 EXERCISES

1. Verify equations (5.1.25)-(5.1.27).

2. Verify whether the process y(t) = 1.2y(t−1) − 0.32y(t−2) + e(t) (where e(t) is IID) is covariance-stationary.

3. Estimate the linear dividend growth rate from the dividends paid in the last years (verify these data on the AMEX website: http://www.amex.com): 2000 - $1.51, 2001 - $1.42, 2002 - $1.50, and 2003 - $1.63.

4. Verify equation (5.4.6) for the processes (5.4.4).


Chapter 6

Fractals

In short, fractals are geometric objects that are constructed by repeating geometric patterns at smaller and smaller scales. The fractal theory is a beautiful theory that describes beautiful objects. Development of the fractal theory and its financial applications has been greatly influenced by Mandelbrot [1]. In this chapter, a short introduction to the fractal theory relevant to financial applications is given. In Section 6.1, the basic definitions of the fractal theory are provided. Section 6.2 is devoted to the concept of multifractals, which has been receiving a lot of attention in recent research on financial time series.

6.1 BASIC DEFINITIONS

Self-similarity is the defining property of fractals. This property implies that the geometric patterns are isotropic, meaning that shape transformations along all coordinate axes are the same. If the geometric patterns are not isotropic, say the object is contracted along the y-axis with a scale different from that along the x-axis, it is said that the object is self-affine. The difference between self-similarity and self-affinity is obvious for geometric objects. However, only self-affinity is relevant for the graphs of financial time series [1]. Indeed, since time and prices are measured in different units, their scaling factors cannot be compared.

[Figure 6.1: Deterministic (a) and stochastic (b) fractals with the same fractal dimension D = ln(5)/ln(3).]

If the geometric pattern used in fractal design is deterministic, the resulting object is named a deterministic fractal. Consider the example in part (a) of Figure 6.1, where a square is repeatedly divided into nine small squares and the four of them that have even numbers are deleted (the squares are numbered along rows). If four squares are deleted at random, one obtains a random fractal (one such fractal is depicted in part (b) of Figure 6.1). While the deterministic and stochastic fractals in Figure 6.1 look quite different, they have the same fractal dimension. Let us outline the physical sense of this notion.

Consider a jagged line, such as a coastline. It is embedded in a plane; thus, its dimension is lower than two. Yet, the more zigzagged the line is, the greater the part of the plane it covers. One may then expect that the dimension of a coastline is higher than one and that it depends on a measure of jaggedness. Another widely used example is a crumpled paper ball. It is embedded in three-dimensional space. Yet, the


volume of a paper ball depends on the sizes of its creases. Therefore, its dimension is expected to be in the range of two to three. Thus, we come to the notion of the fractal (non-integer) dimension for objects that cannot be accurately described within the framework of Euclidean geometry.

There are several technical definitions of the fractal dimension [2]. The most popular one is the box-counting dimension. It implies mapping grid boxes of size h (e.g., squares and cubes for two-dimensional and three-dimensional spaces, respectively) onto the object of interest. The number of boxes that fill the object is $N(h) \sim h^{-D}$. The fractal dimension D is then the limit

$$D = \lim_{h\to 0}[\ln N(h)/\ln(1/h)] \qquad (6.1.1)$$

The box-counting dimension has another equivalent definition with a fixed unit size of the grid box and varying object size L

$$D = \lim_{L\to\infty}[\ln N(L)/\ln(L)] \qquad (6.1.2)$$

The fractal dimension of both the deterministic and stochastic fractals in Figure 6.1 equals $D = \ln(5)/\ln(3) \approx 1.465$. Random fractals exhibit self-similarity only in a statistical sense. Therefore, scale invariance is a more appropriate concept for random fractals than self-similarity.

Iterated function systems are commonly used for generating fractals. The two-dimensional iterated function algorithm for N fixed points can be presented as

$$X(k+1) = rX(k) + (1-r)X_F(i)$$
$$Y(k+1) = rY(k) + (1-r)Y_F(i) \qquad (6.1.3)$$

In (6.1.3), r is the scaling parameter; $X_F(i)$ and $Y_F(i)$ are the coordinates of the fixed point i, i = 1, 2, ..., N. The fixed point i is selected at random at every iteration. A famous example with N = 3, the Sierpinski triangle, is shown in Figure 6.2.
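A sketch of the algorithm (6.1.3) for the Sierpinski triangle follows; the fixed-point coordinates match the hint in Exercise 1 of Section 6.4, while the seed, starting point, and iteration count are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch: iterated function system (6.1.3) for the Sierpinski triangle.
rng = np.random.default_rng(10)
fixed = np.array([[0.0, 0.0], [0.0, 100.0], [100.0, 0.0]])  # N = 3
r = 0.5
pts = np.empty((20000, 2))
p = np.array([50.0, 50.0])                 # arbitrary starting point
for k in range(len(pts)):
    i = rng.integers(3)                    # random choice of fixed point
    p = r * p + (1.0 - r) * fixed[i]       # the map (6.1.3)
    pts[k] = p

plt.scatter(pts[:, 0], pts[:, 1], s=0.1)
plt.show()
```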

Now, let us turn to the random processes relevant to financial time series. If a random process X(t) is self-affine, then it satisfies the scaling rule

$$X(ct) = c^H X(t) \qquad (6.1.4)$$

[Figure 6.2: The Sierpinski triangle with r = 0.5.]

The parameter H is named the Hurst exponent. Let us introduce the fractional Brownian motion $B_H(t)$. This random process satisfies the following conditions for all t and T [1]

$$E[B_H(t+T) - B_H(t)] = 0 \qquad (6.1.5)$$
$$E[B_H(t+T) - B_H(t)]^2 = T^{2H} \qquad (6.1.6)$$

When H = 1/2, the fractional Brownian motion is reduced to the regular Brownian motion. For the fractional Brownian motion, the correlation between the past average $E[B_H(t) - B_H(t-T)]/T$ and the future average $E[B_H(t+T) - B_H(t)]/T$ equals

$$C = 2^{2H-1} - 1 \qquad (6.1.7)$$

Obviously, this correlation does not depend on T. If 1/2 < H < 1, then C > 0 and it is said that $B_H(t)$ is a persistent process. Namely, if $B_H(t)$ grew in the past, it will most likely grow in the immediate future.


Conversely, if $B_H(t)$ decreased in the past, it will most probably continue to fall. Thus, persistent processes maintain trend. In the opposite case (0 < H < 1/2, C < 0), the process is named anti-persistent. It is also said that anti-persistent processes are mean reverting; for example, if the current process innovation is positive, then the next one will most likely be negative, and vice versa. There is a simple relationship between the box-counting fractal dimension and the Hurst exponent

$$D = 2 - H \qquad (6.1.8)$$

The fractal dimension of a time series can be estimated using Hurst's rescaled range (R/S) analysis [1, 3]. Consider the data set $x_i$ (i = 1, ..., N) with mean $m_N$ and standard deviation $\sigma_N$. To define the rescaled range, the partial sums $S_k$ must be calculated

$$S_k = \sum_{i=1}^{k}(x_i - m_N),\quad 1 \le k \le N \qquad (6.1.9)$$

The rescaled range equals

$$R/S = [\max(S_k) - \min(S_k)]/\sigma_N,\quad 1 \le k \le N \qquad (6.1.10)$$

The value of R/S is always greater than zero, since $\max(S_k) > 0$ and $\min(S_k) < 0$. For a given R/S, the Hurst exponent can be estimated using the relation

$$R/S = (aN)^H \qquad (6.1.11)$$

where a is a constant. The R/S analysis is superior to many other methods of determining long-range dependencies. But this approach has a noted shortcoming, namely, high sensitivity to short-range memory [4].
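A rough numerical sketch of the procedure: compute R/S per (6.1.9)-(6.1.10) over windows of increasing size and regress log(R/S) on log(N); by (6.1.11), the slope estimates H. White noise is used as the test signal, for which H should be near 0.5.

```python
import numpy as np

# Sketch: estimate the Hurst exponent by rescaled-range (R/S) analysis.
def rescaled_range(x):
    """R/S statistic (6.1.9)-(6.1.10) for a 1-D sample x."""
    s = np.cumsum(x - x.mean())            # partial sums S_k
    return (s.max() - s.min()) / x.std()

rng = np.random.default_rng(11)
x = rng.standard_normal(2**14)             # white noise: expect H near 0.5
sizes = [2**k for k in range(6, 14)]
rs = [np.mean([rescaled_range(w) for w in np.split(x, len(x) // n)])
      for n in sizes]
H = np.polyfit(np.log(sizes), np.log(rs), 1)[0]   # slope of the log-log fit
print(H)
```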

6.2 MULTIFRACTALS

Let us turn to the generic notion of multifractals (see, e.g., [5]). Consider the map filled with the set of boxes that are used in the box-counting fractal dimension. What matters for the fractal concept is whether a given box belongs to the fractal. The basic idea behind the notion of multifractals is that every box is assigned a measure μ that characterizes some probability density (e.g., the intensity of color between the white and black limits). The so-called multiplicative process (or cascade) defines the rule according to which the measure is fragmented when the object is partitioned into smaller components. The fragmentation ratios used in this process are named multipliers. The multifractal measure is characterized by the Hölder exponent α

$$\alpha = \lim_{h\to 0}[\ln \mu(h)/\ln(h)] \qquad (6.2.1)$$

where h is the box size. Let us denote the number of boxes with given h and α by $N_h(\alpha)$. The distribution of the Hölder exponents in the limit h → 0 is sometimes called the multifractal spectrum

$$f(\alpha) = -\lim_{h\to 0}[\ln N_h(\alpha)/\ln(h)] \qquad (6.2.2)$$

The distribution f(α) can be treated as a generalization of the fractal dimension for multifractal processes.

Let us describe the simplest multifractal, namely the binomial measure μ on the interval [0, 1] (see [5] for details). In the binomial cascade, two positive multipliers, $m_0$ and $m_1$, are chosen so that $m_0 + m_1 = 1$. At the step k = 0, the uniform probability measure for the mass distribution, $\mu_0 = 1$, is used. At the next step (k = 1), the measure $\mu_1$ spreads mass in proportion $m_0/m_1$ on the intervals [0, 1/2] and [1/2, 1], respectively. Thus, $\mu_1[0, 1/2] = m_0$ and $\mu_1[1/2, 1] = m_1$. In the next steps, every interval is again divided into two subintervals, and the mass of the interval is distributed between the subintervals in proportion $m_0/m_1$. For example, at k = 2: $\mu_2[0, 1/4] = m_0 m_0$, $\mu_2[1/4, 1/2] = \mu_2[1/2, 3/4] = m_0 m_1$, $\mu_2[3/4, 1] = m_1 m_1$, and so on. At the k-th iteration, mass is partitioned into $2^k$ intervals of length $2^{-k}$.

Let us introduce the notion of the binary expansion $0.b_1 b_2 \ldots b_k$ for the point $x = b_1 2^{-1} + b_2 2^{-2} + \ldots + b_k 2^{-k}$, where $0 \le x \le 1$ and $b_i \in \{0, 1\}$. Then the measure for every dyadic interval $I_{0.b_1 b_2 \ldots b_k}$ of length $2^{-k}$ equals

$$\mu_{0.b_1 b_2 \ldots b_k} = \prod_{i=1}^{k} m_{b_i} = m_0^{\,n}\, m_1^{\,k-n} \qquad (6.2.3)$$

where n is the number of digits 0 in the address $0.b_1 b_2 \ldots b_k$ of the interval's left end, and (k − n) is the number of digits 1. Since the subinterval mass is preserved at every step, the cascade is called conservative or microcanonical. The first five steps of the binomial cascade with $m_0 = 0.6$ are depicted in Figure 6.3.

[Figure 6.3: Binomial cascade with m0 = 0.6: (a) k = 0, (b) k = 1, (c) k = 2, (d) k = 3, (e) k = 4, (f) k = 5.]
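The cascade itself is a few lines of code (a sketch reproducing the construction with $m_0 = 0.6$, cf. Exercise 2 of Section 6.4):

```python
import numpy as np

# Sketch: k steps of the conservative binomial cascade with m0 = 0.6.
def binomial_cascade(m0, k):
    """Return the 2**k interval masses after k cascade steps."""
    m1 = 1.0 - m0
    mu = np.array([1.0])
    for _ in range(k):
        # each interval splits its mass in proportion m0 : m1
        mu = np.column_stack((m0 * mu, m1 * mu)).ravel()
    return mu

mu = binomial_cascade(0.6, 5)
print(mu.sum())        # mass is conserved: sums to 1 at every step
```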

The multifractal spectrum of the binomial cascade equals

$$f(\alpha) = -\frac{\alpha_{max} - \alpha}{\alpha_{max} - \alpha_{min}}\log_2\!\left(\frac{\alpha_{max} - \alpha}{\alpha_{max} - \alpha_{min}}\right) - \frac{\alpha - \alpha_{min}}{\alpha_{max} - \alpha_{min}}\log_2\!\left(\frac{\alpha - \alpha_{min}}{\alpha_{max} - \alpha_{min}}\right) \qquad (6.2.4)$$

The distribution (6.2.4) is confined to the interval $[\alpha_{min}, \alpha_{max}]$. If $m_0 \ge 0.5$, then $\alpha_{min} = -\log_2(m_0)$ and $\alpha_{max} = -\log_2(1 - m_0)$. The binomial cascade can be generalized in two directions. First, one can introduce a multinomial cascade by increasing the number of subintervals to N > 2. Note that the condition

$$\sum_{i=0}^{N-1} m_i = 1 \qquad (6.2.5)$$


is needed for preserving the conservative character of the cascade. Secondly, the values of $m_i$ can be randomized rather than assigned fixed values. A cascade with randomized $m_i$ is called canonical. In this case, the condition (6.2.5) is satisfied only on average, that is,

$$E\left[\sum_{i=0}^{N-1} m_i\right] = 1 \qquad (6.2.6)$$

An example of a randomized cascade that has an explicit expression for the multifractal spectrum is the lognormal cascade [6]. In this process, the multiplier M that distributes the mass of an interval is determined by the lognormal distribution (i.e., $\log_2(M)$ is drawn from the Gaussian distribution). If the Gaussian mean and variance are λ and σ², respectively, then the conservative character of the cascade, E[M] = 0.5, is preserved when

$$\sigma^2 = 2(\lambda - 1)/\ln(2) \qquad (6.2.7)$$

The multifractal spectrum of the lognormal cascade that satisfies (6.2.7) equals

$$f(\alpha) = 1 - \frac{(\alpha - \lambda)^2}{4(\lambda - 1)} \qquad (6.2.8)$$

Note that, in contrast to the binomial cascade, the lognormal cascade may yield negative values of f(α), which requires an interpretation of f(α) other than the fractal dimension.

An innovation of a multifractal process, $\Delta X = X(t + \Delta t) - X(t)$, is described by the scaling rule

$$E[|\Delta X|^q] = c(q)\,(\Delta t)^{\tau(q)+1} \qquad (6.2.9)$$

where c(q) and τ(q) (the so-called scaling function) are deterministic functions of q. It can be shown that the scaling function τ(q) is always concave. Obviously, τ(0) = −1. A self-affine process (6.1.4) can be treated as a multifractal process with $\tau(q) = Hq - 1$. In particular, for the Wiener process, H = 1/2 and $\tau_W(q) = q/2 - 1$. The scaling function of the binomial cascade can be expressed in terms of its multipliers

$$\tau(q) = -\log_2(m_0^{\,q} + m_1^{\,q}) \qquad (6.2.10)$$


The scaling function τ(q) is related to the multifractal spectrum f(α) via the Legendre transformation

$$\tau(q) = \min_{\alpha}[q\alpha - f(\alpha)] \qquad (6.2.11)$$

which is equivalent to

$$f(\alpha) = \min_{q}[q\alpha - \tau(q)] \qquad (6.2.12)$$

Note that $f(\alpha) = q(\alpha - H) + 1$ for self-affine processes.

In practice, the scaling function of a multifractal process X(t) can be calculated using the so-called partition function

$$S_q(T, \Delta t) = \sum_{i=0}^{N-1} |X(t_i + \Delta t) - X(t_i)|^q \qquad (6.2.13)$$

where the sample X(t) has N points within the interval [0, T] with mesh size Δt. It follows from (6.2.9) that

$$\log\{E[S_q(T, \Delta t)]\} = \tau(q)\log(\Delta t) + c(q)\log T \qquad (6.2.14)$$

Thus, plotting $\log\{E[S_q(T, \Delta t)]\}$ against $\log(\Delta t)$ for different values of q reveals the character of the scaling function τ(q). Multifractal models have become very popular in the analysis of financial time series. We shall return to this topic in Section 8.2.
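Sketched below is the estimation recipe: evaluate the partition function (6.2.13) on a set of lags and regress log Sq on log Δt; per (6.2.14), the slope estimates τ(q). A Wiener path is used as the test signal, for which τ(q) = q/2 − 1; the grid sizes are arbitrary, and a single path gives only a rough estimate.

```python
import numpy as np

# Sketch: estimate the scaling function tau(q) from the partition
# function (6.2.13)-(6.2.14), using a Wiener path as the test signal.
rng = np.random.default_rng(12)
N = 2**16
X = np.concatenate(([0.0], np.cumsum(rng.standard_normal(N) / np.sqrt(N))))

qs = [1.0, 2.0, 3.0]
lags = [2**k for k in range(1, 8)]
for q in qs:
    logS = [np.log(np.sum(np.abs(X[dt::dt] - X[:-dt:dt])**q)) for dt in lags]
    tau = np.polyfit(np.log(np.array(lags) / N), logS, 1)[0]
    print(q, tau, q / 2 - 1)      # Wiener process: tau(q) = q/2 - 1
```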

6.3 REFERENCES FOR FURTHER READING

Mandelbrot's work on scaling in financial time series is compiled in the collection [1]. Among many excellent books on fractals, we choose [2] for its comprehensive material, which includes a description of the relations between chaos and fractals and an important chapter on multifractals [5].

6.4 EXERCISES

*1. Implement an algorithm that draws the Sierpinski triangle with r = 0.5 (see Figure 6.2).
Hint: Choose the following fixed points: (0, 0), (0, 100), (100, 0). Use the following method for the randomized choice of the fixed point: i = [10 rand()] % 3, where rand() is the uniform distribution within [0, 1] and % is the modulus operation (explain the rationale behind this method). Note that at least 10000 iterations are required for a good-quality picture.

*2. Reproduce the first five steps of the binomial cascade with m0 = 0.6 (see Figure 6.3). How will this cascade change if m0 = 0.8?

Chapter 7

Nonlinear Dynamical Systems

7.1 MOTIVATION

It is well known that many nonlinear dynamical systems, including seemingly simple cases, can exhibit chaotic behavior. In short, the presence of chaos implies that very small changes in the initial conditions or parameters of a system can lead to drastic changes in its behavior. In the chaotic regime, the system solutions stay within the phase space region named the strange attractor. These solutions never repeat themselves; they are not periodic and they never intersect. Thus, in the chaotic regime, the system becomes unpredictable. The chaos theory is an exciting and complex topic. Many excellent books are devoted to the chaos theory and its applications (see, e.g., references in Section 7.7). Here, I only outline the main concepts that may be relevant to quantitative finance.

The first reason to turn to chaotic dynamics is a better understanding of possible causes of price randomness. Obviously, new information coming to the market moves prices. Whether it is a company's performance report, a financial analyst's comments, or a macroeconomic event, the company's stock and option prices may change, thus reflecting the news. Since news usually comes unexpectedly, prices change in unpredictable ways.¹ But is new information the only source of price randomness? One may doubt this while observing price fluctuations at times when no relevant news is


released. A tempting proposition is that the price dynamics can be attributed in part to the complexity of financial markets. The possibility that deterministic processes modulate the price variations has a very important practical implication: even though these processes can have chaotic regimes, their deterministic nature means that prices may be partly forecastable. Therefore, research on chaos in finance and economics is accompanied by discussion of the limited predictability of the processes under investigation [1].

There have been several attempts to find possible strange attractors in financial and economic time series (see, e.g., [1-3] and references therein). Discerning deterministic chaotic dynamics from a "pure" stochastic process is always a non-trivial task. This problem is even more complicated for financial markets, whose parameters may