

diffusion equation. Langevin offered another simple method for deriving the Fokker-Planck equation. He considered one-dimensional motion of a spherical particle of mass m and radius R that is subjected to two forces. The first force is the viscous drag force described by the Stokes formula, F = −6πηRv, where η is viscosity and v = dr/dt is the particle velocity. Another force, Z, describes collisions of the water molecules with the particle and therefore has a random nature. The Langevin equation of the particle motion is

m dv/dt = −6πηRv + Z    (4.2.1)
Let us multiply both sides of equation (4.2.1) by r. Since r dv/dt = d(rv)/dt − v² and rv = (1/2) d(r²)/dt, then

(m/2) d²(r²)/dt² − mv² = −3πηR d(r²)/dt + Zr    (4.2.2)
Note that the mean kinetic energy of a spherical particle, E[mv²/2], equals 3kT/2. Since E[Zr] = 0 due to the random nature of Z, averaging of equation (4.2.2) yields
Stochastic Processes

m d²E[r²]/dt² + 6πηR dE[r²]/dt = 6kT    (4.2.3)

The solution to equation (4.2.3) is
The solution to equation (4.2.3) is
E[r²] = [kT/(πηR)]t + C exp(−6πηRt/m)    (4.2.4)

where C is an integration constant. The second term in equation (4.2.4) decays exponentially and can be neglected in the asymptotic solution. Then

E[r²] − r0² = [kT/(πηR)]t    (4.2.5)

where r0 is the particle position at t = 0. It follows from the comparison of equations (4.2.5) and (4.1.15) that D = kT/(6πηR).¹
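The Stokes-Einstein result above can be evaluated numerically. The sketch below plugs in illustrative values (room temperature, water viscosity, a half-micron particle radius, all assumptions not taken from the text) and assumes the three-dimensional relation E[r²] = 6Dt for the asymptotic mean-squared displacement:

```python
import math

# Stokes-Einstein diffusion coefficient D = kT / (6*pi*eta*R),
# evaluated for an illustrative particle (parameter values assumed).
k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # temperature, K (assumed)
eta = 1.0e-3         # viscosity of water, Pa*s (assumed)
R = 0.5e-6           # particle radius, m (assumed)

D = k_B * T / (6.0 * math.pi * eta * R)   # m^2/s

def msd(t):
    """Asymptotic mean-squared displacement, assuming E[r^2] = 6*D*t (3D)."""
    return 6.0 * D * t
```

For these values D comes out near 0.44 μm²/s, a typical order of magnitude for Brownian motion of a micron-scale particle in water.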
The Brownian motion can also be derived as the continuous limit of the discrete random walk (see, e.g., [3]). First, let us introduce the process ε(t) that is named the white noise and satisfies the following conditions:

E[ε(t)] = 0;  E[ε²(t)] = σ²;  E[ε(t)ε(s)] = 0 if t ≠ s.    (4.2.6)
Hence, the white noise has zero mean and constant variance σ². The last condition in (4.2.6) implies that there is no linear correlation between different observations of the white noise. Such a model represents an independently and identically distributed (IID) process and is sometimes denoted IID(0, σ²). The IID process can still have non-linear correlations (see Section 5.3). The normal distribution N(0, σ²) is a special case of the white noise. First, consider a simple discrete process

y(k) = y(k − 1) + ε(k)    (4.2.7)
where the white noise innovations can take only two values²

ε(k) = Δ with probability p (p = const < 1), or ε(k) = −Δ with probability (1 − p)    (4.2.8)
Now, let us introduce the continuous process yn(t) within the time interval t ∈ [0, T], such that

yn(t) = y([t/h]) = y([nt/T]), t ∈ [0, T]    (4.2.9)

In (4.2.9), [x] denotes the greatest integer that does not exceed x. The process yn(t) has a stepwise form: it is constant except at the moments t = kh, k = 1, ..., n. The mean and variance of the process yn(T) equal

E[yn(T)] = n(2p − 1)Δ = T(2p − 1)Δ/h    (4.2.10)
Var[yn(T)] = nΔ² = TΔ²/h    (4.2.11)
Both the mean (4.2.10) and the variance (4.2.11) become infinite in the limiting case h → 0 with arbitrary Δ. Hence, we must impose a relation between Δ and h that ensures finite values of the moments E[yn(T)] and Var[yn(T)]. Namely, let us set

p = (1 + μ√h/σ)/2,  Δ = σ√h    (4.2.12)

where μ and σ are some parameters. Then

E[yn(T)] = μT,  Var[yn(T)] = σ²T    (4.2.13)
It can be shown that yn(T) converges to the normal distribution N(μT, σ²T) in the continuous limit. Hence, μ and σ are the drift and diffusion parameters, respectively. Obviously, the drift parameter differs from zero only when p ≠ 0.5, that is, when there is a preference for one direction of innovations over another. The continuous process defined with the relations (4.2.12) is named the arithmetic Brownian motion. It reduces to the Wiener process when μ = 0 and σ = 1.
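The convergence of the binomial walk (4.2.7)-(4.2.12) to the moments (4.2.13) can be checked by simulation. The sketch below uses numpy and illustrative values of μ, σ, and T (assumptions, not values from the text); since each path is a sum of ±Δ steps, the terminal value can be sampled directly from a binomial count of up-steps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Binomial random walk of (4.2.7)-(4.2.12): step +Delta with probability p,
# -Delta with probability 1 - p, where p and Delta are tied to the step h.
mu, sigma, T = 0.05, 0.2, 1.0        # illustrative drift/diffusion parameters
n = 250                              # steps per path
h = T / n
p = 0.5 * (1.0 + mu * np.sqrt(h) / sigma)
delta = sigma * np.sqrt(h)

n_paths = 200_000
K = rng.binomial(n, p, size=n_paths)   # number of up-steps per path
y_T = (2 * K - n) * delta              # y_n(T) for each path

mean_T = y_T.mean()                    # should approach mu * T = 0.05
var_T = y_T.var()                      # should approach sigma^2 * T = 0.04
```

With 200,000 paths the sample mean and variance land within a fraction of a percent of the limits μT and σ²T in (4.2.13).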
Note that in a more generic approach, the time intervals between
observations of y(t) themselves represent a random variable [4, 5].
While this process (so-called continuous-time random walk) better
resembles the market price variations, its description is beyond the
scope of this book.
In the general case, the arithmetic Brownian motion can be expressed in the following form

y(t) = μ(t)t + σ(y(t), t)W(t)    (4.2.14)

The random variable in this process may have negative values. This creates a problem for describing prices that are essentially positive. Therefore, the geometric Brownian motion Y(t) = exp[y(t)] is often used in financial applications.
One can simulate the Wiener process with the following equation

[W(t + Δt) − W(t)] ≡ ΔW = N(0, 1)√Δt    (4.2.15)

While the Wiener process is a continuous process, its innovations are random. Therefore, the expression ΔW/Δt does not converge when Δt → 0. Indeed, it follows for the Wiener process that

lim(Δt→0) [ΔW(t)/Δt] ~ lim(Δt→0) [Δt^(−1/2)]    (4.2.16)

As a result, the derivative dW(t)/dt does not exist in the ordinary sense. Thus, one needs a special calculus to describe stochastic differential equations.
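The divergence (4.2.16) is easy to see numerically: the spread of ΔW/Δt grows like Δt^(−1/2) as the time step shrinks. A minimal sketch using the simulation rule (4.2.15), with step sizes and sample counts chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical check that DW/Dt ~ Dt^(-1/2): the sample standard deviation of
# DW/Dt grows without bound as Dt -> 0, so dW/dt has no ordinary limit.
ratios = {}
for dt in (1e-1, 1e-2, 1e-3):
    dW = np.sqrt(dt) * rng.standard_normal(100_000)   # eq. (4.2.15)
    ratios[dt] = (dW / dt).std()                      # ~ dt**-0.5
```

Shrinking Δt by a factor of 10 multiplies the spread of ΔW/Δt by roughly √10, confirming the scaling in (4.2.16).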

The Brownian motion (4.2.14) can be presented in the differential form

dy(t) = μdt + σdW(t)    (4.3.1)

Equation (4.3.1) is named a stochastic differential equation. Note that the term dW(t) = [W(t + dt) − W(t)] has the following properties:

E[dW] = 0,  E[dW dW] = dt,  E[dW dt] = 0    (4.3.2)
Let us calculate (dy)², having in mind (4.3.2) and retaining the terms up to the order of dt:

(dy)² = [μdt + σdW]² = μ²dt² + 2μσ dt dW + σ²dW² ≈ σ²dt    (4.3.3)
It follows from (4.3.3) that while dy is a random variable, (dy)² is a deterministic one. This result allows one to derive Ito's lemma. Consider a function F(y, t) that depends on both the deterministic variable t and the stochastic variable y(t). Let us expand the differential of F(y, t) into the Taylor series, retaining the linear terms and bearing in mind equation (4.3.3):
dF(y, t) = (∂F/∂y)dy + (∂F/∂t)dt + (1/2)(∂²F/∂y²)(dy)²
         = (∂F/∂y)dy + [∂F/∂t + (σ²/2)(∂²F/∂y²)]dt    (4.3.4)
The Ito's expression (4.3.4) has an additional term in comparison with the differential for a function with deterministic independent variables. Namely, the term (σ²/2)(∂²F/∂y²)dt has a stochastic nature. If y(t) is the
Brownian motion (4.3.1), then

dF(y, t) = (∂F/∂y)[μdt + σdW(t)] + [∂F/∂t + (σ²/2)(∂²F/∂y²)]dt
         = [μ(∂F/∂y) + ∂F/∂t + (σ²/2)(∂²F/∂y²)]dt + σ(∂F/∂y)dW(t)    (4.3.5)
Let us consider the function F = W² as a simple example of employing Ito's lemma. In this case, μ = 0, σ = 1, and equation (4.3.5) reduces to

dF = dt + 2WdW    (4.3.6)
Finally, we specify the Ito's expression for the geometric Brownian motion F = exp[y(t)]. Since in this case ∂F/∂y = ∂²F/∂y² = F and ∂F/∂t = 0,

dF = (μ + σ²/2)F dt + σF dW(t)    (4.3.7)
Hence, if F is the geometric Brownian motion, its relative change, dF/F,
behaves as the arithmetic Brownian motion.
Ito's lemma is a pillar of option pricing theory. It will be used for deriving the classical Black-Scholes equation in Section 9.4.
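The result (4.3.6) can be verified on a discretized path: summing the increments dt + 2W dW along a simulated Wiener path should reproduce W(T)². A minimal sketch (grid size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Numerical check of Ito's lemma for F = W^2 (eq. 4.3.6):
# W(T)^2 should equal the accumulated increments dt + 2*W*dW.
T, n = 1.0, 100_000
dt = T / n
dW = np.sqrt(dt) * rng.standard_normal(n)
W = np.concatenate(([0.0], np.cumsum(dW)))     # W(0) = 0

ito_sum = np.sum(dt + 2.0 * W[:-1] * dW)       # sum of dF = dt + 2 W dW
exact = W[-1] ** 2                             # F(W(T)) - F(W(0))
```

The residual shrinks like √dt; without the extra dt term of Ito's lemma the sum would miss W(T)² by approximately T.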

Now that the stochastic differential has been introduced, let us discuss how to perform its integration. First, the Riemann-Stieltjes integral should be defined. Consider a deterministic function f(t) on the interval t ∈ [0, T]. In order to calculate the Riemann integral of f(t) over the interval [0, T], this interval is divided into n sub-intervals t0 = 0 < t1 < ... < tn = T, and the following sum is computed

Sn = Σᵢ f(τi)(ti − ti−1)    (4.4.1)

where τi ∈ [ti−1, ti]. The Riemann integral is the limit of Sn


∫[0,T] f(t)dt = lim Sn, where max(ti − ti−1) → 0 for all i.    (4.4.2)
Note that the limit (4.4.2) exists only if the function f(t) is sufficiently
smooth. Another type of integral is the Stieltjes integral. Let us define
the differential of a function g(x)
dg = g(x + dx) − g(x)    (4.4.3)
Then the Stieltjes integral of the function f(t) with respect to g(t) on the interval t ∈ [0, T] is defined via the sum

Sn = Σᵢ f(τi)[g(ti) − g(ti−1)]    (4.4.4)

where τi ∈ [ti−1, ti]:

∫[0,T] f(t)dg(t) = lim Sn, where max(ti − ti−1) → 0 for all i.    (4.4.5)
If g(t) has a derivative, then dg ≈ g′(t)dt, and the sum (4.4.4) can be written as

Sn = Σᵢ f(τi)g′(τi)(ti − ti−1)    (4.4.6)

The similarity between the Riemann sum (4.4.1) and the Stieltjes sum (4.4.6) leads to the definition of the Riemann-Stieltjes integral. The Riemann-Stieltjes integral over deterministic functions does not depend on the particular choice of the points τi within the intervals [ti−1, ti]. However, if the function f(t) is random, the sum Sn does depend on the choice of τi. Consider, for example, the sum (4.4.4) for the case f(t) = g(t) = W(t) (where W(t) is the Wiener process). It follows from (4.1.16) that
E[Sn] = E[Σᵢ W(τi){W(ti) − W(ti−1)}]
      = Σᵢ [min(τi, ti) − min(τi, ti−1)] = Σᵢ (τi − ti−1)    (4.4.7)

Let us set for all i

τi = a·ti + (1 − a)ti−1,  0 ≤ a ≤ 1    (4.4.8)

Substitution of (4.4.8) into (4.4.7) leads to E[Sn] = aT. Hence, the sum (4.4.7) depends on the arbitrary parameter a and therefore can have any value. Within the Ito's formalism, the value a = 0 is chosen, so that τi = ti−1. The stochastic Ito's integral is defined as
∫[0,T] f(t)dW(t) = ms-lim Σᵢ f(ti−1)[W(ti) − W(ti−1)]    (4.4.9)

The notation ms-lim stands for the mean-square limit: the difference between the Ito's integral in the left-hand side of (4.4.9) and the sum in the right-hand side has a variance that approaches zero as n increases to infinity. Thus, (4.4.9) is equivalent to

lim(n→∞) E[(∫[0,T] f(t)dW(t) − Σᵢ f(ti−1){W(ti) − W(ti−1)})²] = 0    (4.4.10)
Let us consider the integral

I(t2, t1) = ∫[t1,t2] W(t)dW(t)    (4.4.11)

as an example of calculating the Ito's integral. If the function W(t) is deterministic, then the Riemann-Stieltjes integral I_RS(t2, t1) equals

I_RS(t2, t1) = 0.5[W(t2)² − W(t1)²]    (4.4.12)

However, when W(t) is the Wiener process, the Ito's integral I_I(t2, t1) leads to a somewhat unexpected result

I_I(t2, t1) = 0.5[W(t2)² − W(t1)² − (t2 − t1)]    (4.4.13)

This follows directly from equation (4.3.6). Obviously, the result (4.4.13) can also be derived directly from the definition of the Ito's integral (see Exercise 4). Note that the mean of the Ito's integral (4.4.11) equals zero

E[I_I(t2, t1)] = 0    (4.4.14)

The difference between the right-hand sides of (4.4.12) and (4.4.13) is determined by the particular choice of a = 0 in (4.4.8). Stratonovich has offered another definition of the stochastic integral by choosing a = 0.5. In contrast to equation (4.4.9), the Stratonovich's integral is defined as

∫[0,T] f(t)dW(t) = ms-lim Σᵢ f((ti−1 + ti)/2)[W(ti) − W(ti−1)]    (4.4.15)

For the integrand in (4.4.11), the Stratonovich's integral I_S(t2, t1) coincides with the Riemann-Stieltjes integral

I_S(t2, t1) = 0.5[W(t2)² − W(t1)²]    (4.4.16)
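The contrast between (4.4.13) and (4.4.16) shows up directly in the discrete sums. The sketch below evaluates both sums on one simulated path over [0, 1]; as an assumption, the Stratonovich evaluation at the midpoint in time is approximated by the average of the two endpoint values of W:

```python
import numpy as np

rng = np.random.default_rng(3)

# Ito vs. Stratonovich sums for the integral of W dW on [0, T].
# Ito evaluates W at the left endpoint t_{i-1}; the Stratonovich sum is
# approximated here by the average of the endpoint values of W.
T, n = 1.0, 100_000
dW = np.sqrt(T / n) * rng.standard_normal(n)
W = np.concatenate(([0.0], np.cumsum(dW)))

ito = np.sum(W[:-1] * dW)                      # -> 0.5*(W(T)^2 - T)
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)    # -> 0.5*W(T)^2
```

The averaged-endpoint sum telescopes exactly to 0.5 W(T)², while the Ito sum falls short of it by approximately 0.5 T, in line with (4.4.13).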
Both the Ito's and Stratonovich's formulations can be transformed into each other. In particular, the Ito's stochastic differential equation (4.3.1)

dy_I(t) = μdt + σdW(t)    (4.4.17)

is equivalent to the Stratonovich's equation

dy_S(t) = [μ − 0.5σ(∂σ/∂y)]dt + σdW(t)    (4.4.18)
The applications of stochastic calculus in finance are based almost exclusively on the Ito's theory. Consider, for example, the integral

∫[0,t] σ(z)dW(z)    (4.4.19)

If no correlation between the function σ(t) and the innovation dW(t) is assumed, then the Ito's approximation is a natural choice. In this case, the function σ(t) is said to be a nonanticipating function [1, 2]. However, if the innovations dW(t) are correlated (so-called non-white noise), then the Stratonovich's approximation appears to be an adequate theory [1, 6].

The martingale methodology plays an important role in the modern theory of finance [2, 7, 8]. A martingale is a stochastic process X(t) that satisfies the following condition

E[X(t + 1)|X(t), X(t − 1), ...] = X(t)    (4.5.1)

The equivalent definition is given by

E[X(t + 1) − X(t)|X(t), X(t − 1), ...] = 0    (4.5.2)

Both these definitions are easily generalized for the continuous presentation, where the time interval, dt, between two subsequent moments t + 1 and t approaches zero (dt → 0). The notion of martingale is
rooted in the gambling theory. It is closely associated with the notion
of fair game, in which none of the players has an advantage. The condition (4.5.1) implies that the expectation of the gambler's wealth at time t + 1, conditioned on the entire history of the game, is equal to the gambler's wealth at time t. Similarly, equation (4.5.2) means that the expected gain at every round of the game, conditioned on the history of the game, equals zero. In other words, a martingale has no trend. A process with a positive trend is named a submartingale. A process with a negative trend is called a supermartingale.
The martingale hypothesis applied to the asset prices states that the
expectation of future price is simply the current price. This assumption
is closely related to the Efficient Market Hypothesis discussed in
Section 2.3. Generally, the asset prices are not martingales for they
incorporate risk premium. Indeed, there must be some reward offered
to investors for bearing the risks associated with keeping the assets. It
can be shown, however, that the prices with discounted risk premium
are martingales [3].
An important property of the Ito's integral is that it is a martingale. Consider, for example, the integral (4.4.19) approximated with the sum (4.4.9). Because the innovations dW(t) are unpredictable, it follows from (4.4.14) that

E[∫[t,t+Δt] σ(z)dW(z)] = 0    (4.5.3)

Hence,

E[∫[0,t+Δt] σ(z)dW(z)] = ∫[0,t] σ(z)dW(z)    (4.5.4)

and the integral (4.4.19) satisfies the martingale definition. Note that for the Brownian motion with drift (4.2.14)

E[y(t + dt)] = E[y(t) + dy] = y(t) + μdt    (4.5.5)

Hence, the Brownian motion with drift is not a martingale. However, the process

z(t) = y(t) − μt    (4.5.6)

is a martingale since

E[z(t + dt)] = z(t)    (4.5.7)

This result also follows from the Doob-Meyer decomposition theorem, which states that a continuous submartingale X(t) at 0 ≤ t ≤ ∞ with finite expectation E[X(t)] < ∞ can be decomposed into a continuous martingale and an increasing deterministic process.

Theory and applications of the stochastic processes in natural
sciences are described in [1, 6]. A good introduction to the stochastic
calculus in finance is given in [2]. For a mathematically inclined
reader, the presentation of the stochastic theory with increasing
level of technical details can be found in [7, 8].

1. Simulate daily price returns using the geometric Brownian motion (4.3.7) for four years. Use equation (4.2.15) for approximating ΔW. Assume that S(0) = 10, μ = 10%, σ = 20% (μ and σ are given per annum). Assume 250 working days per annum.
2. Prove that

∫[t1,t2] W(s)^n dW(s) = [W(t2)^(n+1) − W(t1)^(n+1)]/(n + 1) − (n/2) ∫[t1,t2] W(s)^(n−1) ds

Hint: Calculate d(W^(n+1)) using Ito's lemma.

3. Solve the Ornstein-Uhlenbeck equation that describes the mean-reverting process in which the solution fluctuates around its mean:

dX = −μX dt + σ dW,  μ > 0

Hint: introduce the variable Y = X exp(μt).
*4. Derive the integral (4.4.13) directly from the definition of the Ito's integral (4.4.9).
Chapter 5

Time Series Analysis

Time series analysis has become an indispensable theoretical tool in
financial and economic research. Section 5.1 is devoted to the com-
monly used univariate autoregressive and moving average models.
The means for modeling trends and seasonality effects are described
in Section 5.2. The processes with non-stationary variance (condi-
tional heteroskedasticity) are discussed in Section 5.3. Finally,
the specifics of the multivariate time series are introduced in
Section 5.4.

First, we shall consider a univariate time series y(t) for a process that is observed at moments t = 0, 1, ..., n (see, e.g., [1, 2]). The time series in which the observation at moment t depends linearly on several lagged observations at moments t − 1, t − 2, ..., t − p,

y(t) = a1y(t − 1) + a2y(t − 2) + ... + apy(t − p) + ε(t), t > p    (5.1.1)

is called the autoregressive process of order p, or AR(p). The term ε(t) in (5.1.1) is the white noise that satisfies the conditions (4.2.6). The lag


operator, defined by L^p y(t) = y(t − p), is often used for describing time series. Note that L^0 y(t) = y(t). Equation (5.1.1) in terms of the lag operator has the form

Ap(L)y(t) = ε(t)    (5.1.2)
Ap(L) = 1 − a1L − a2L² − ... − apL^p    (5.1.3)

The operator Ap(L) is called the AR polynomial in the lag operator of order p. Let us consider AR(1) that starts with a random shock. Its definition implies that

y(0) = ε(0), y(1) = a1y(0) + ε(1),
y(2) = a1y(1) + ε(2) = a1²ε(0) + a1ε(1) + ε(2), ...
Hence, by induction,

y(t) = Σᵢ a1^i ε(t − i), i = 0, ..., t    (5.1.4)

The mean and variance of AR(1) equal, respectively,

E[y(t)] = 0,  Var[y(t)] = σ²/(1 − a1²)    (5.1.5)
Obviously, the contributions of the 'old' noise converge with time to zero when |a1| < 1. As a result, this process does not drift too far from its mean. This feature is named mean reversion.
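Mean reversion and the variance formula (5.1.5) can be checked on a simulated AR(1) path; the coefficient, noise scale, and sample length below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# AR(1) with |a1| < 1 is mean-reverting; its variance converges to
# sigma^2 / (1 - a1^2), eq. (5.1.5). Illustrative parameters below.
a1, sigma, n = 0.5, 1.0, 200_000
eps = sigma * rng.standard_normal(n)

y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = a1 * y[t - 1] + eps[t]

sample_mean = y.mean()       # theory: 0
sample_var = y.var()         # theory: 1 / (1 - 0.25) = 4/3
```

The sample moments agree with E[y(t)] = 0 and Var[y(t)] = σ²/(1 − a1²) to within sampling error.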
The process with a1 = 1 is called the random walk

y(t) = y(t − 1) + ε(t)    (5.1.6)

In this case, equation (5.1.4) reduces to

y(t) = Σᵢ ε(t − i), i = 0, ..., t

The noise contributions to the random walk do not weaken with time. Therefore, the random walk does not exhibit mean reversion. Now, consider the process that represents the first difference

x(t) = y(t) − y(t − 1) = ε(t)    (5.1.7)

Obviously, past noise has only a transitory character for the process x(t). Therefore, x(t) is mean-reverting. Some processes must be

differenced several times in order to exclude non-transitory noise shocks. The processes differenced d times are named integrated of order d and denoted as I(d). The differencing operator is used for describing an I(d) process:

Δ_i^d = (1 − L^i)^d,  i, d = ..., −2, −1, 0, 1, 2, ...    (5.1.8)

If an I(d) process can be reduced to an AR(p) process by applying the differencing operator, it is named an ARI(p, d) process and has the form

Δ_1^d y(t) − a1Δ_1^d y(t − 1) − ... − apΔ_1^d y(t − p) = ε(t),  t ≥ p + d

Note that differencing a time series d times reduces the number of independent variables by d, so that the total number of independent variables in ARI(p, d) within the sample with n observations equals n − p − d.
The unit root is another notion widely used for discerning permanent and transitory effects of random shocks. It is based on the roots of the characteristic polynomial for the AR(p) model. For example, AR(1) has the characteristic equation

1 − a1z = 0    (5.1.10)

If a1 = 1, then z = 1 and the characteristic polynomial has the unit root. In general, the roots of a characteristic polynomial can have complex values. The solution to equation (5.1.10) is outside the unit circle (i.e., |z| > 1) when |a1| < 1. It can be shown that the AR(p) process has only transitory shocks when all solutions of its characteristic equation

1 − a1z − a2z² − ... − apz^p = 0    (5.1.11)

are outside the unit circle.
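Checking the roots of (5.1.11) is a one-liner with numpy; the AR(2) coefficients below are illustrative values chosen so that the roots are real:

```python
import numpy as np

# Roots of the AR characteristic polynomial 1 - a1*z - a2*z^2 = 0
# for an illustrative AR(2); all roots outside the unit circle means
# the shocks are transitory (no unit root).
a1, a2 = 0.2, 0.35
roots = np.roots([-a2, -a1, 1.0])        # coefficients from highest power down

outside = bool(np.all(np.abs(roots) > 1.0))   # True -> no unit root

# a1 = 1 (random walk) gives the unit root z = 1:
unit_root = np.roots([-1.0, 1.0])
```

Here the roots are 10/7 and −2, both outside the unit circle, whereas the random-walk polynomial 1 − z has the unit root z = 1.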

A model more general than AR(p) contains both lagged observations and lagged noise:

y(t) = a1y(t − 1) + a2y(t − 2) + ... + apy(t − p) + ε(t) + b1ε(t − 1) + b2ε(t − 2) + ... + bqε(t − q)    (5.1.12)

This model is called the autoregressive moving average model of order (p, q), or simply ARMA(p, q). Sometimes modeling of empirical data

requires an AR(p) with a rather high order p. Then ARMA(p, q) may be more efficient in that the total number of its terms (p + q) needed for a given accuracy is lower than the order p in AR(p). ARMA(p, q) can be expanded into the integrated model ARIMA(p, d, q), similar to the expansion of AR(p) into ARI(p, d). Neglecting the autoregressive terms in ARMA(p, q) yields a 'pure' moving average model, MA(q):

y(t) = ε(t) + b1ε(t − 1) + b2ε(t − 2) + ... + bqε(t − q)    (5.1.13)

MA(q) can be presented in the form

y(t) = Bq(L)ε(t)    (5.1.14)

where Bq(L) is the MA polynomial in the lag operator

Bq(L) = 1 + b1L + b2L² + ... + bqL^q    (5.1.15)
The moving average model does not depend explicitly on the lagged values of y(t). Yet, it is easy to show that this model implicitly incorporates the past. Consider, for example, the MA(1) model

y(t) = ε(t) + b1ε(t − 1)    (5.1.16)

with ε(0) = 0. For this model,

y(1) = ε(1), y(2) = ε(2) + b1ε(1) = ε(2) + b1y(1),
y(3) = ε(3) + b1ε(2) = ε(3) + b1(y(2) − b1y(1)), ...

Thus, the general result for MA(1) has the form

y(t)(1 − b1L + b1²L² − b1³L³ + ...) = ε(t)    (5.1.17)
Equation (5.1.17) can be viewed as an AR(∞) process, which illustrates that the MA model does depend on the past.
The MA(q) model is invertible if it can be transformed into an AR(∞) model. It can be shown that MA(q) is invertible if all solutions to the equation

1 + b1z + b2z² + ... + bqz^q = 0    (5.1.18)

are outside the unit circle. In particular, MA(1) is invertible if |b1| < 1. If the process y(t) has a non-zero mean value m, then the AR(1) model can be presented in the following form

y(t) = m + a1[y(t − 1) − m] + ε(t) = c + a1y(t − 1) + ε(t)    (5.1.19)

In (5.1.19), the intercept c equals

c = m(1 − a1)    (5.1.20)

The general AR(p) model with a non-zero mean has the following form

Ap(L)y(t) = c + ε(t),  c = m(1 − a1 − ... − ap)    (5.1.21)

Similarly, the intercept can be included into the general moving average model MA(q)

y(t) = c + Bq(L)ε(t),  c = m    (5.1.22)

Note that the mean of the MA model coincides with its intercept because the mean of the white noise is zero.


Now, let us introduce the autocorrelation function (ACF) for the process y(t)

ρ(k) = γ(k)/γ(0)    (5.1.23)

where γ(k) is the autocovariance of order k

γ(k) = E[(y(t) − m)(y(t − k) − m)]    (5.1.24)

The autocorrelation functions may have some typical patterns, which can be used for identification of empirical time series [2]. The obvious properties of ACF are

ρ(0) = 1,  −1 < ρ(k) < 1 for k ≠ 0    (5.1.25)

ACF is closely related to the ARMA parameters. In particular, for AR(1),

ρ(1) = a1    (5.1.26)

The first-order ACF for MA(1) equals

ρ(1) = b1/(b1² + 1)    (5.1.27)

The right-hand side of the expression (5.1.27) has the same value for the inverse transform b1 → 1/b1. For example, the two processes

x(t) = ε(t) + 2ε(t − 1)
y(t) = ε(t) + 0.5ε(t − 1)

have the same ρ(1). Note, however, that y(t) is an invertible process while x(t) is not.
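The identity ρ(1) = b1/(1 + b1²) and its invariance under b1 → 1/b1 can be confirmed both analytically and on simulated paths (sample length and seed are arbitrary):

```python
import numpy as np

# rho(1) = b1 / (1 + b1^2), eq. (5.1.27), takes the same value for b1
# and 1/b1, so MA(1) with b1 = 2 and b1 = 0.5 share the same lag-1 ACF.
def rho1(b1):
    return b1 / (1.0 + b1 ** 2)

rng = np.random.default_rng(5)
eps = rng.standard_normal(500_000)
x = eps[1:] + 2.0 * eps[:-1]       # non-invertible MA(1)
y = eps[1:] + 0.5 * eps[:-1]       # invertible MA(1)

def sample_rho1(z):
    """Sample lag-1 autocorrelation."""
    z = z - z.mean()
    return np.mean(z[1:] * z[:-1]) / np.mean(z * z)
```

Both sample autocorrelations come out near 0.4, so the ACF alone cannot distinguish the two processes.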
ARMA modeling is widely used for forecasting. Consider a forecast of a variable y(t + 1) based on a set of n variables x(t) known at moment t. This set can be just past values of y, that is, y(t), y(t − 1), ..., y(t − n + 1). Let us denote the forecast with ŷ(t + 1|t). The quality of a forecast is usually defined with some loss function. The mean squared error (MSE) is the conventional loss function in many applications

MSE(ŷ(t + 1|t)) = E[(y(t + 1) − ŷ(t + 1|t))²]    (5.1.28)

The forecast that yields the minimum of MSE turns out to be the expectation of y(t + 1) conditioned on x(t)

ŷ(t + 1|t) = E[y(t + 1)|x(t)]    (5.1.29)
In the case of linear regression

y(t + 1) = b′x(t) + ε(t)    (5.1.30)

minimizing MSE reduces to the ordinary least squares (OLS) estimate for b. For a sample with T observations,

b = [Σ x(t)x′(t)]⁻¹ Σ x(t)y(t + 1), the sums running over t = 1, ..., T    (5.1.31)
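The OLS formula (5.1.31) is straightforward to apply with numpy; the synthetic regression below uses illustrative coefficients so that the estimate can be checked against the truth:

```python
import numpy as np

rng = np.random.default_rng(6)

# OLS estimate b = [sum x x']^{-1} sum x y(t+1), eq. (5.1.31),
# on synthetic data with known coefficients (illustrative).
T = 10_000
x = np.column_stack([np.ones(T), rng.standard_normal(T)])  # intercept + regressor
b_true = np.array([0.5, 2.0])
y_next = x @ b_true + rng.standard_normal(T)               # y(t+1) = b'x(t) + eps

b_hat = np.linalg.solve(x.T @ x, x.T @ y_next)
```

Using `np.linalg.solve` on the normal equations avoids forming the matrix inverse explicitly, which is the numerically preferred way to evaluate (5.1.31).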

Another important concept in time series analysis is the maximum likelihood estimate (MLE) [2]. Consider the general ARMA model (5.1.12) with the white noise (4.2.6). The problem is how to estimate the ARMA parameters on the basis of given observations of y(t). The idea of MLE is to find a vector r′ = (a1, ..., ap, b1, ..., bq, σ²) that maximizes the likelihood function for given observations (y1, y2, ..., yT)

f(y1, y2, ..., yT; r′)    (5.1.32)

The likelihood function (5.1.32) has the sense of the probability of observing the data sample (y1, y2, ..., yT). In this approach, the ARMA model and the probability distribution for the white noise should be

specified first. Often the normal distribution leads to reasonable estimates even if the real distribution is different. Furthermore, the likelihood function must be calculated for the chosen ARMA model. Finally, the components of the vector r′ must be estimated. The latter step may require a sophisticated numerical optimization technique. Details of the implementation of MLE are discussed in [2].
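For the simplest case, AR(1) with Gaussian noise, the idea can be sketched without any numerical optimizer: conditioning on the first observation and profiling σ² out of the likelihood, maximizing the likelihood over a1 amounts to minimizing the sum of squared one-step errors, so the conditional MLE coincides with OLS. This is a minimal illustration, not the general ARMA procedure described in the text:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an AR(1) path with a known coefficient (illustrative).
a1_true, n = 0.6, 50_000
y = np.empty(n)
y[0] = rng.standard_normal()
for t in range(1, n):
    y[t] = a1_true * y[t - 1] + rng.standard_normal()

def neg_loglik(a1):
    """Conditional Gaussian negative log-likelihood with sigma^2 profiled out."""
    e = y[1:] - a1 * y[:-1]
    s2 = np.mean(e ** 2)
    return 0.5 * len(e) * (np.log(2.0 * np.pi * s2) + 1.0)

# Conditional MLE of a1 = OLS estimate:
a1_hat = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
```

The estimate lands close to the true coefficient, and the likelihood is indeed worse at coefficients away from it.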

Finding trends is an important part of time series analysis. The presence of a trend implies that the time series has no mean reversion. Moreover, the mean and variance of a trending process depend on the sample. A time series with trend is named non-stationary. If a process y(t) is stationary, its mean, variance, and autocovariance are finite and do not depend on time. This implies that the autocovariance (5.1.24) depends only on the lag parameter k. Such a definition of stationarity is also called covariance-stationarity or weak stationarity because it does not impose any restrictions on the higher moments of the process. Strict stationarity implies that the higher moments also do not depend on time. Note that any MA process is covariance-stationary. However, the AR(p) process is covariance-stationary only if the roots of its characteristic polynomial are outside the unit circle.
It is important to discern a deterministic trend and a stochastic trend. They have a different nature, yet their graphs may sometimes look very similar [1]. Consider first the AR(1) model with a deterministic trend:

y(t) − m − ct = a1(y(t − 1) − m − c(t − 1)) + ε(t)    (5.2.1)

Let us introduce z(t) = y(t) − m − ct. Then equation (5.2.1) has the solution

z(t) = a1^t z(0) + Σᵢ a1^(t−i) ε(i), i = 1, ..., t    (5.2.2)

where z(0) is a pre-sample starting value of z. Obviously, the random shocks are transitory if |a1| < 1. The trend incorporated in the definition of z(t) is deterministic when |a1| < 1. However, if a1 = 1, then equation (5.2.1) has the form

y(t) = c + y(t − 1) + ε(t)    (5.2.3)

The process (5.2.3) is named the random walk with drift. In this case, equation (5.2.2) reduces to

z(t) = z(0) + Σᵢ ε(i), i = 1, ..., t    (5.2.4)

The sum of non-transitory shocks in the right-hand side of equation (5.2.4) is named the stochastic trend. Consider, for example, the deterministic trend model with m = 0 and ε(t) ~ N(0, 1)

y(t) = 0.1t + ε(t)    (5.2.5)

and the stochastic trend model

y(t) = 0.1 + y(t − 1) + ε(t), y(0) = 0    (5.2.6)

As can be seen from Figure 5.1, both graphs look similar. In general, however, the stochastic trend model can deviate from the deterministic trend for a long time.
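The difference between the two trend models becomes obvious after subtracting the trend line 0.1t: model (5.2.5) leaves i.i.d. noise, while model (5.2.6) leaves a random walk that wanders far from zero. A minimal simulation (sample length and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)

# Deterministic trend (5.2.5) vs. random walk with drift (5.2.6).
n = 1_000
t = np.arange(1, n + 1)
e1 = rng.standard_normal(n)
e2 = rng.standard_normal(n)

y_det = 0.1 * t + e1                 # deterministic trend
y_sto = 0.1 * t + np.cumsum(e2)      # unrolled (5.2.6): 0.1*t + sum of shocks

resid_det = y_det - 0.1 * t          # i.i.d. noise, variance ~ 1
resid_sto = y_sto - 0.1 * t          # stochastic trend (random walk)
```

The detrended deterministic series keeps unit variance, while the detrended random walk has a far larger spread, which is the point of Figure 5.1.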
Figure 5.1 Deterministic and stochastic trends: A - equation (5.2.5), B - equation (5.2.6).

Stochastic trend implies that the process is I(1). Then the lag polynomial (5.1.3) can be represented in the form

Ap(L) = (1 − L)A(p−1)(L)    (5.2.7)

Similarly, the process I(2) has the lag polynomial

Ap(L) = (1 − L)²A(p−2)(L)    (5.2.8)

and so on. The standard procedure for testing the presence of a unit root in a time series is the Dickey-Fuller method [1, 2]. This method is implemented in major econometric software packages.

Seasonal effects may play an important role in the properties of time series. Sometimes, there is a need to eliminate these effects in order to focus on the stochastic specifics of the process. Various differencing filters can be used for achieving this goal [2]. In other cases, the seasonal effect itself may be the object of interest. The general approach for handling seasonal effects is introducing dummy parameters D(s, t), where s = 1, 2, ..., S, and S is the number of seasons. For example, S = 12 is used for modeling the monthly effects. The parameter D(s, t) equals 1 at a specific season s and equals zero at all other seasons. The seasonal extension of an ARMA(p, q) model has the following form

y(t) = a1y(t − 1) + a2y(t − 2) + ... + apy(t − p) + ε(t) + b1ε(t − 1) + b2ε(t − 2) + ... + bqε(t − q) + Σs d_s D(s, t)    (5.2.9)

Note that forecasting with the model (5.2.9) requires estimating (p + q + S) parameters.

So far, we considered random processes with the white noise (4.2.6) that are characterized by a constant unconditional variance. Conditional variance has not been discussed so far. In general, processes with a constant conditional variance are named homoskedastic. Many random time series are not well described by the IID process. In particular, there may be strong positive autocorrelation in squared asset returns. This means that large returns (either positive or negative) follow large returns. In this case, it is said that the return volatility is clustered. The effect of volatility clustering is also called autoregressive conditional heteroskedasticity (ARCH). It should be noted that small autocorrelation in squared returns does not necessarily mean that there is no volatility clustering. Strong outliers that lead to high values of skewness and kurtosis may lower the autocorrelation. If these outliers are removed from the sample, volatility clustering may become apparent [3].
Several models in which past shocks contribute to the current volatility have been developed. Generally, they are rooted in the ARCH(m) model, where the conditional variance is a weighted sum of m squared lagged returns

σ²(t) = v + a1ε²(t − 1) + a2ε²(t − 2) + ... + amε²(t − m)    (5.3.1)

In (5.3.1), ε(t) ~ N(0, σ²(t)), v > 0, and a1, ..., am ≥ 0. Unfortunately, application of the ARCH(m) process to modeling financial time series often requires polynomials with a high order m. A more efficient model is the generalized ARCH (GARCH) process. The GARCH(m, n) process combines the ARCH(m) process with the AR(n) process for lagged variance

σ²(t) = v + a1ε²(t − 1) + a2ε²(t − 2) + ... + amε²(t − m) + b1σ²(t − 1) + b2σ²(t − 2) + ... + bnσ²(t − n)    (5.3.2)

The simple GARCH(1, 1) model is widely used in applications

σ²(t) = v + aε²(t − 1) + bσ²(t − 1)    (5.3.3)
Equation (5.3.3) can be transformed into

σ²(t) = v + (a + b)σ²(t − 1) + a[ε²(t − 1) − σ²(t − 1)]    (5.3.4)

The last term in equation (5.3.4) is conditioned on information available at time (t − 1) and has zero mean. This term can be treated as a shock to volatility. Therefore, the unconditional expectation of volatility for the GARCH(1, 1) model equals

E[σ²(t)] = v/(1 − a − b)    (5.3.5)

This implies that the GARCH(1, 1) process is weakly stationary when a + b < 1. The advantage of the stationary GARCH(1, 1) model is that it can be easily used for forecasting. Namely, the conditional expectation of volatility at time (t + k) equals [4]

E[σ²(t + k)] = (a + b)^k [σ²(t) − v/(1 − a − b)] + v/(1 − a − b)    (5.3.6)

The GARCH(1, 1) model (5.3.4) can be rewritten as

σ²(t) = v/(1 − b) + a(ε²(t − 1) + bε²(t − 2) + b²ε²(t − 3) + ...)    (5.3.7)

Equation (5.3.7) shows that the GARCH(1, 1) model is equivalent to an infinite ARCH model with exponentially weighted coefficients. This explains why the GARCH models are more efficient than the ARCH models.
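A GARCH(1, 1) path is easy to simulate from (5.3.3), and the long-run level (5.3.5) can be checked against the sample average of the squared innovations; the parameter values below are illustrative and chosen so that v/(1 − a − b) = 1:

```python
import numpy as np

rng = np.random.default_rng(9)

# GARCH(1,1) simulation, eq. (5.3.3), with illustrative parameters; the
# long-run variance should approach v / (1 - a - b), eq. (5.3.5).
v, a, b, n = 0.1, 0.1, 0.8, 200_000
s2 = np.empty(n)
e = np.empty(n)
s2[0] = v / (1.0 - a - b)                      # start at the long-run level
e[0] = np.sqrt(s2[0]) * rng.standard_normal()
for t in range(1, n):
    s2[t] = v + a * e[t - 1] ** 2 + b * s2[t - 1]
    e[t] = np.sqrt(s2[t]) * rng.standard_normal()

long_run = v / (1.0 - a - b)                   # = 1.0
```

Both the average conditional variance and the average squared innovation settle near the stationary level 1.0, consistent with weak stationarity for a + b < 1.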
Several GARCH models have been described in the econometric literature [1-3]. One popular GARCH(1, 1) model with a + b = 1 is called the integrated GARCH (IGARCH). It has the autoregressive unit root. Therefore, the volatility of this process follows a random walk and can be easily forecasted:

E[σ²(t + k)] = σ²(t) + kv    (5.3.8)

IGARCH can be presented in the form

σ²(t) = v + (1 − λ)ε²(t − 1) + λσ²(t − 1)    (5.3.9)

where 0 < λ < 1. If v = 0, IGARCH coincides with the exponentially weighted moving average (EWMA)

σ²(t) = (1 − λ) Σᵢ λ^(i−1) ε²(t − i), i = 1, 2, ...    (5.3.10)

Indeed, the n-period EWMA for a time series y(t) is defined as

z(t) = [y(t − 1) + λy(t − 2) + λ²y(t − 3) + ... + λ^(n−1)y(t − n)]/(1 + λ + ... + λ^(n−1))    (5.3.11)

where 0 < λ < 1. For large n, the denominator of (5.3.11) converges to 1/(1 − λ). Then for z(t) = σ²(t) and y(t) = ε²(t), equation (5.3.11) is equivalent to equation (5.3.7) with v = 0.
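The equivalence between the IGARCH recursion (5.3.9) with v = 0 and the exponentially weighted sum (5.3.10) can be confirmed numerically once the startup effect has decayed; λ = 0.94 below is an illustrative smoothing value:

```python
import numpy as np

rng = np.random.default_rng(10)

# IGARCH with v = 0 is EWMA: the recursion (5.3.9) reproduces the
# exponentially weighted sum (5.3.10) after the startup term decays.
lam, n = 0.94, 5_000
e2 = rng.standard_normal(n) ** 2

# Recursion: s2(t) = (1 - lam) * e2(t-1) + lam * s2(t-1)
s2 = np.empty(n)
s2[0] = e2[0]
for t in range(1, n):
    s2[t] = (1.0 - lam) * e2[t - 1] + lam * s2[t - 1]

# Direct weighted sum for the last point: (1 - lam) * sum lam^(i-1) e2(t-i)
i = np.arange(1, n)
weighted = (1.0 - lam) * np.sum(lam ** (i - 1) * e2[n - 1 - i])
```

The two values agree to within the residual startup term λ^(n−1)s2(0), which is negligible here.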
The GARCH models discussed so far are symmetric in that the
shock sign does not affect the resulting volatility. In practice, how-
ever, negative price shocks influence volatility more than positive
shocks do. A noted example of the asymmetric GARCH model is the
exponential GARCH (EGARCH) (see, e.g., [3]). It has the form

\log[\sigma^2(t)] = v + b \log[\sigma^2(t-1)] + \lambda z(t-1) + g(|z(t-1)| - \sqrt{2/\pi})   (5.3.12)

where z(t) = \epsilon(t)/\sigma(t). Note that E[|z(t)|] = \sqrt{2/\pi}. Hence, the last term
in (5.3.12) is the mean deviation of |z(t)|. If g > 0 and \lambda < 0, negative
shocks lead to higher volatility than positive shocks.
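One step of the EGARCH recursion (5.3.12) can be sketched as follows; the parameter values are illustrative assumptions chosen so that g > 0 and \lambda < 0:

```python
import math

# One step of the EGARCH recursion (5.3.12); the parameter values are
# illustrative assumptions, not fitted estimates.

def egarch_step(log_sigma2_prev, z_prev, v, b, lam, g):
    """log sigma^2(t) = v + b*log sigma^2(t-1) + lam*z(t-1)
       + g*(|z(t-1)| - sqrt(2/pi)), where z = eps/sigma."""
    mean_abs_z = math.sqrt(2.0 / math.pi)  # E|z| for standard normal z
    return v + b * log_sigma2_prev + lam * z_prev + g * (abs(z_prev) - mean_abs_z)

v, b, lam, g = -0.1, 0.95, -0.08, 0.15    # lam < 0, g > 0: leverage effect
neg = egarch_step(0.0, -1.0, v, b, lam, g)
pos = egarch_step(0.0, +1.0, v, b, lam, g)
print(neg, pos)       # a negative shock raises log-volatility more
```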

5.4 Multivariate Time Series

Often the current value of a variable depends not only on its past
values, but also on past and/or current values of other variables.
Modeling of dynamic interdependent variables is conducted with
multivariate time series. The multivariate models yield not only new
implementation problems but also some methodological difficulties.
In particular, one should be cautious with simple regression models
y(t) = a x(t) + \epsilon(t)   (5.4.1)

that may lead to spurious results. It is said that (5.4.1) is a simultaneous
equation as both explanatory (x) and dependent (y) variables are
present at the same moment of time. A notorious example of spurious
inference is the finding that the best predictor in the United
Nations database for the Standard & Poor's 500 stock index is
production of butter in Bangladesh [5].
A statistically sound yet spurious relationship is named data
snooping. It may appear when the data being the subject of research
are used to construct the test statistics [4]. Another problem with
simultaneous equations is that noise can be correlated with the
explanatory variable, which leads to inaccurate OLS estimates of the
regression coefficients. Several techniques for handling this problem
are discussed in [2].
A multivariate time series y(t) = (y_1(t), y_2(t), \ldots, y_n(t))' is a vector
of n processes that have data available for the same moments of time.
It is supposed also that all these processes are either stationary or
have the same order of integration. In practice, the multivariate
moving average models are rarely used due to some restrictions [1].
Therefore, we shall focus on the vector autoregressive model (VAR)
that is a simple extension of the univariate AR model to multivariate
time series. Consider a bivariate VAR(1) process

y_1(t) = a_{10} + a_{11} y_1(t-1) + a_{12} y_2(t-1) + \epsilon_1(t)
y_2(t) = a_{20} + a_{21} y_1(t-1) + a_{22} y_2(t-1) + \epsilon_2(t)   (5.4.2)
that can be presented in the matrix form

y(t) = a_0 + A y(t-1) + \epsilon(t)   (5.4.3)

In (5.4.3), y(t) = (y_1(t), y_2(t))', a_0 = (a_{10}, a_{20})', \epsilon(t) = (\epsilon_1(t), \epsilon_2(t))', and

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
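A bivariate VAR(1) such as (5.4.2) can be simulated directly from its defining recursion. The coefficients, noise scale, and seed below are illustrative assumptions:

```python
import random

# Simulating the bivariate VAR(1) process (5.4.2); the coefficients and
# the unit-variance Gaussian noise are illustrative assumptions.

def simulate_var1(a0, A, n, seed=0):
    rng = random.Random(seed)
    y = [0.0, 0.0]                         # arbitrary starting values
    path = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        # y(t) = a0 + A y(t-1) + e(t), per (5.4.3)
        y = [a0[i] + A[i][0] * y[0] + A[i][1] * y[1] + e[i] for i in range(2)]
        path.append(y)
    return path

a0 = [0.1, -0.2]
A = [[0.5, 0.1],
     [0.2, 0.3]]
path = simulate_var1(a0, A, 500)
print(path[-1])
```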
The right-hand sides in example (5.4.2) depend on past values only.
However, dependencies on current values can also be included (so-
called simultaneous dynamic model [1]). Consider the modification of
the bivariate process (5.4.2)
y_1(t) = a_{11} y_1(t-1) + a_{12} y_2(t) + \epsilon_1(t)
y_2(t) = a_{21} y_1(t) + a_{22} y_2(t-1) + \epsilon_2(t)   (5.4.4)

The matrix form of this process is

\begin{pmatrix} 1 & -a_{12} \\ -a_{21} & 1 \end{pmatrix}
\begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} =
\begin{pmatrix} a_{11} & 0 \\ 0 & a_{22} \end{pmatrix}
\begin{pmatrix} y_1(t-1) \\ y_2(t-1) \end{pmatrix} +
\begin{pmatrix} \epsilon_1(t) \\ \epsilon_2(t) \end{pmatrix}   (5.4.5)
Multiplying both sides of (5.4.5) by the inverse of the left-hand
matrix yields

\begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} =
(1 - a_{12} a_{21})^{-1}
\begin{pmatrix} a_{11} & a_{12} a_{22} \\ a_{11} a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} y_1(t-1) \\ y_2(t-1) \end{pmatrix} +
(1 - a_{12} a_{21})^{-1}
\begin{pmatrix} 1 & a_{12} \\ a_{21} & 1 \end{pmatrix}
\begin{pmatrix} \epsilon_1(t) \\ \epsilon_2(t) \end{pmatrix}   (5.4.6)
Equation (5.4.6) shows that the simultaneous dynamic models can
also be represented in the VAR form.
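The reduction of (5.4.5) to the VAR form (5.4.6) can be checked numerically; the coefficient values below are arbitrary assumptions (this is essentially Exercise 4):

```python
# Numerical check of (5.4.6): inverting the left-hand matrix of (5.4.5)
# reproduces the VAR form. The coefficient values are arbitrary assumptions.

def mat2_mul(M, N):
    return [[M[0][0]*N[0][0] + M[0][1]*N[1][0], M[0][0]*N[0][1] + M[0][1]*N[1][1]],
            [M[1][0]*N[0][0] + M[1][1]*N[1][0], M[1][0]*N[0][1] + M[1][1]*N[1][1]]]

def mat2_inv(M):
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [[ M[1][1]/det, -M[0][1]/det],
            [-M[1][0]/det,  M[0][0]/det]]

a11, a12, a21, a22 = 0.5, 0.3, 0.2, 0.4
L = [[1.0, -a12], [-a21, 1.0]]             # left-hand matrix of (5.4.5)
R = [[a11, 0.0], [0.0, a22]]               # lag matrix of (5.4.5)
B = mat2_mul(mat2_inv(L), R)               # lag matrix of the VAR form

c = 1.0 / (1.0 - a12 * a21)                # (1 - a12*a21)^(-1)
expected = [[c * a11, c * a12 * a22],      # matrix written in (5.4.6)
            [c * a11 * a21, c * a22]]
print(B, expected)
```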
In the general case of n-variate time series, VAR(p) has the form [2]

y(t) = a_0 + A_1 y(t-1) + \ldots + A_p y(t-p) + \epsilon(t)   (5.4.7)

where y(t), a_0, and \epsilon(t) are n-dimensional vectors and A_i (i = 1, \ldots, p)
are n \times n matrices. Generally, the white noises \epsilon(t) are mutually
independent. Let us introduce

A_p(L) = I_n - A_1 L - \ldots - A_p L^p   (5.4.8)

where I_n is the n \times n identity matrix. Then equation (5.4.7) can
be presented as
A_p(L) y(t) = a_0 + \epsilon(t)   (5.4.9)
Two covariance-stationary processes x(t) and y(t) are jointly covar-
iance-stationary if their covariance Cov(x(t), y(t-s)) depends on lag
s only. The condition for the covariance-stationary VAR(p) is the
generalization of (5.1.11) for AR(p). Namely, all values of z satisfying
the equation

|I_n - A_1 z - \ldots - A_p z^p| = 0   (5.4.10)

must lie outside the unit circle. Equivalently, all solutions \lambda of the
equation

|I_n \lambda^p - A_1 \lambda^{p-1} - \ldots - A_p| = 0   (5.4.11)

must satisfy the condition |\lambda| < 1.
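For a bivariate VAR(1), condition (5.4.11) reduces to requiring that the eigenvalues of A lie inside the unit circle. A sketch, with an illustrative coefficient matrix:

```python
import cmath

# Checking the covariance-stationarity condition (5.4.11) for a bivariate
# VAR(1): the eigenvalues of A must lie inside the unit circle. The
# coefficient matrix below is an illustrative assumption.

def eig2(A):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

A = [[0.5, 0.1],
     [0.2, 0.3]]
lams = eig2(A)
print([abs(l) for l in lams], all(abs(l) < 1 for l in lams))
```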
The problem of whether the lagged values of process y can improve
prediction of process x (so-called Granger causality) is often posed in
forecasting. It is said that if y fails to Granger-cause x, then the
following condition holds for all s > 0:

MSE(E[x(t+s) | x(t), x(t-1), \ldots]) = MSE(E[x(t+s) | x(t), x(t-1), \ldots, y(t), y(t-1), \ldots])   (5.4.12)

In this case, y is called an exogenous variable with respect to x. For
example, y_2(t) is exogenous with respect to y_1(t) in (5.4.2) if a_{12} = 0.
General methods for testing the Granger causality are described in [2].
The last notion that is introduced in this section is cointegration.
Two processes are cointegrated if they both have unit roots (i.e., they
both are I(1)), but some linear combination of these processes is
stationary (i.e., is I(0)). This definition can be extended to an arbi-
trary number of processes. As an example, consider a bivariate model

y_1(t) = a y_2(t) + \epsilon_1(t)
y_2(t) = y_2(t-1) + \epsilon_2(t)   (5.4.13)

Both processes y_1(t) and y_2(t) are random walks. However, the process

z(t) = y_1(t) - a y_2(t)   (5.4.14)

is stationary. Details of testing the cointegration hypothesis are de-
scribed in [2]. Implications of cointegration in financial data analysis
are discussed in [3].
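The cointegrated pair (5.4.13) is easy to simulate: both components wander, yet z(t) of (5.4.14) remains bounded. The value of a, the noise scale, and the seed are illustrative assumptions:

```python
import random

# Simulating the cointegrated pair (5.4.13): y2 is a random walk, y1
# tracks a*y2, and z = y1 - a*y2 of (5.4.14) stays stationary.
# The value of a and the unit noise scale are illustrative assumptions.

def simulate_pair(a, n, seed=1):
    rng = random.Random(seed)
    y2, z_path = 0.0, []
    for _ in range(n):
        y2 += rng.gauss(0.0, 1.0)          # y2(t) = y2(t-1) + e2(t)
        y1 = a * y2 + rng.gauss(0.0, 1.0)  # y1(t) = a*y2(t) + e1(t)
        z_path.append(y1 - a * y2)         # z(t) = e1(t): stationary
    return z_path

z = simulate_pair(a=0.7, n=2000)
print(max(z), min(z))          # bounded, unlike the random-walk levels
```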
A good concise introduction to time series analysis is given by
Franses [1]. The comprehensive presentation of the subject can be
found in the monographs by Hamilton [2] and Green [6]. Important
specifics of time series analysis in finance, particularly for analysis
and forecasting of volatility, are discussed by Alexander in [3]. In this
chapter, only time series on homogeneous grids were considered. Spe-
cifics of analysis of tick-by-tick data on non-homogeneous grids are
discussed in [7]. It should be noted that exercises with the econo-
metric software packages are very helpful for learning the subject.
Besides the generic scientific software such as SAS, S-Plus, and Matlab
that have modules for time series analysis, several econometric
software packages are available: PCGive, RATS, Shazam, and TSP.
While these packages may have trial and student versions, Easy-
Reg offered by H. J. Bierens has sufficient capability for an intro-
ductory course and is free of charge.

Exercises

1. Verify equations (5.1.25)–(5.1.27).
2. Verify whether the process y(t) = 1.2 y(t-1) - 0.32 y(t-2) + \epsilon(t)
(where \epsilon(t) is IID) is covariance-stationary.
3. Estimate the linear dividend growth rate from the dividends
paid in the last years (verify these data on the AMEX website:
http://www.amex.com): 2000 – $1.51, 2001 – $1.42, 2002 – $1.50,
and 2003 – $1.63.
4. Verify equation (5.4.6) for the processes (5.4.4).
Chapter 6

Fractals
In short, fractals are the geometric objects that are constructed by
repeating geometric patterns at a smaller and smaller scale. The
fractal theory is a beautiful theory that describes beautiful objects.
Development of the fractal theory and its financial applications has
been greatly influenced by Mandelbrot [1]. In this chapter, a short
introduction to the fractal theory relevant to financial applications is
given. In Section 6.1, the basic definitions of the fractal theory are
provided. Section 6.2 is devoted to the concept of multifractals that
has been receiving a lot of attention in the recent research of the
financial time series.

6.1 Basic Definitions

Self-similarity is the defining property of fractals. This property
implies that the geometric patterns are isotropic, meaning shape
transformations along all coordinate axes are the same. If the geo-
metric patterns are not isotropic, say the object is contracted along
the y-axis with a scale different from that along the x-axis, it is said
that the object is self-affine. The difference between self-similarity and
self-affinity is obvious for geometric objects. However, only self-
affinity is relevant for the graphs of financial time series [1]. Indeed,
since time and prices are measured with different units, their scaling
factors cannot be compared.

60 Fractals



Figure 6.1 Deterministic (a) and stochastic (b) fractals with the same
fractal dimension D ¼ ln(5)/ln(3).

If the geometric pattern used in fractal design is deterministic, the
resulting object is named a deterministic fractal. Consider an example
in path (a) of Figure 6.1 where a square is repeatedly divided into nine
small squares and four of them that have even numbers are deleted
(the squares are numbered along rows). If four squares are deleted at
random, one obtains a random fractal (one such fractal is depicted
in path (b) of Figure 6.1). While the deterministic and stochastic
fractals in Figure 6.1 look quite different, they have the same fractal
dimension. Let us outline the physical sense of this notion.
Consider a jagged line, such as a coastline. It is embedded into a
plane; thus, its dimension is lower than two. Yet, the more zigzagged
the line is, the greater the part of the plane it covers. One may then
expect that the dimension of a coastline is higher than one and that it
depends on a measure of jaggedness. Another widely used example is
a crumpled paper ball. It is embedded in three-dimensional space.
Yet, the volume of a paper ball depends on the sizes of its creases.
Therefore, its dimension is expected to be in the range of two to
three. Thus, we come to the notion of the fractal (non-integer)
dimension for objects that cannot be accurately described within
the framework of Euclidean geometry.
There are several technical definitions for the fractal dimension [2].
The most popular one is the box-counting dimension. It implies map-
ping the grid boxes of size h (e.g., squares and cubes for the two-
dimensional and the three-dimensional spaces, respectively) onto the
object of interest. The number of boxes that fill the object is
N(h) ~ h^{-D}. The fractal dimension D is then the limit

D = \lim_{h \to 0} [\ln N(h) / \ln(1/h)]   (6.1.1)

The box-counting dimension has another equivalent definition with
the fixed unit size of the grid box and varying object size L

D = \lim_{L \to \infty} [\ln N(L) / \ln(L)]   (6.1.2)

The fractal dimension for both the deterministic and stochastic fractals in
Figure 6.1 equals D = \ln(5)/\ln(3) \approx 1.465. Random fractals exhibit
self-similarity only in a statistical sense. Therefore, scale invariance
is a more appropriate concept for random fractals than self-similarity.
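For the exactly self-similar fractal of Figure 6.1, the box-counting ratio in (6.1.1) can be evaluated at any refinement level: with box size h = 3^{-k}, exactly N(h) = 5^k boxes are occupied, so the ratio equals ln 5 / ln 3 at every level. A sketch:

```python
import math

# Illustration of the box-counting definition (6.1.1) for the fractal of
# Figure 6.1: at refinement level k the box size is h = 3^(-k) and
# N(h) = 5^k boxes are occupied, so ln N / ln(1/h) -> ln 5 / ln 3.

def box_counting_estimate(k):
    n_boxes = 5 ** k                       # occupied boxes at level k
    h = 3.0 ** (-k)                        # box size at level k
    return math.log(n_boxes) / math.log(1.0 / h)

for k in (1, 3, 6):
    print(k, box_counting_estimate(k))     # ~1.46497 at every level
```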
The iterated function systems are commonly used for generating
fractals. The two-dimensional iterated function algorithm for N fixed
points can be presented as

X(k+1) = r X(k) + (1-r) X_F(i)
Y(k+1) = r Y(k) + (1-r) Y_F(i)   (6.1.3)

In (6.1.3), r is the scaling parameter; X_F(i) and Y_F(i) are the coordin-
ates of the fixed point i, i = 1, 2, \ldots, N. The fixed point i is selected at
every iteration at random. A famous example with N = 3, the Sier-
pinski triangle, is shown in Figure 6.2.
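The iteration (6.1.3) can be sketched directly; the fixed points and iteration count follow the hint in the chapter exercise, while the seed and the starting point are arbitrary assumptions:

```python
import random

# Sketch of the iterated function system (6.1.3) generating the
# Sierpinski triangle; fixed points follow the chapter's exercise, the
# seed and starting point are arbitrary assumptions.

def sierpinski(n_iter, r=0.5, seed=42):
    fixed = [(0.0, 0.0), (0.0, 100.0), (100.0, 0.0)]
    rng = random.Random(seed)
    x, y = 50.0, 50.0                      # arbitrary starting point
    points = []
    for _ in range(n_iter):
        xf, yf = fixed[rng.randrange(3)]   # pick a fixed point at random
        x = r * x + (1.0 - r) * xf         # (6.1.3)
        y = r * y + (1.0 - r) * yf
        points.append((x, y))
    return points

pts = sierpinski(10000)
print(len(pts), pts[-1])
```

Plotting the points (e.g., with matplotlib) reproduces Figure 6.2.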
Now, let us turn to the random processes relevant to financial time
series. If a random process X(t) is self-affine, then it satisfies the
scaling rule

X(ct) = c^H X(t)   (6.1.4)

Figure 6.2 The Sierpinski triangle with r = 0.5.

The parameter H is named the Hurst exponent. Let us introduce the
fractional Brownian motion B_H(t). This random process satisfies
the following conditions for all t and T [1]

E[B_H(t+T) - B_H(t)] = 0,   (6.1.5)
E[(B_H(t+T) - B_H(t))^2] = T^{2H}   (6.1.6)

When H = 1/2, the fractional Brownian motion is reduced to the
regular Brownian motion. For the fractional Brownian motion, the correlation
between the past average [B_H(t) - B_H(t-T)]/T and the future aver-
age [B_H(t+T) - B_H(t)]/T equals

C = 2^{2H-1} - 1   (6.1.7)

Obviously, this correlation does not depend on T. If 1/2 < H < 1, then
C > 0 and it is said that B_H(t) is a persistent process. Namely, if B_H(t)
grew in the past, it will most likely grow in the immediate future.

Conversely, if BH (t) decreased in the past, it will most probably
continue to fall. Thus, persistent processes maintain trend. In the
opposite case (0 < H < 1„2, C < 0), the process is named anti-persist-
ent. It is said also that anti-persistent processes are mean reverting; for
example, if the current process innovation is positive, then the next
one will most likely be negative, and vice versa. There is a simple
relationship between the box-counting fractal dimension and the
Hurst exponent
D¼2ÀH (6:1:8)
The fractal dimension of a time series can be estimated using the
Hurst™s rescaled range (R/S) analysis [1, 3]. Consider the data set
xi (i ¼ 1, . . . N) with mean mN and the standard deviation sN . To
define the rescaled range, the partial sums Sk must be calculated
Sk ¼ (xi À mN ), 1 k N (6:1:9)

The rescaled range equals

R/S = [\max(S_k) - \min(S_k)] / \sigma_N,   1 \le k \le N   (6.1.10)

The value of R/S is always greater than zero since \max(S_k) > 0 and
\min(S_k) < 0. For a given R/S, the Hurst exponent can be estimated
using the relation

R/S = (aN)^H   (6.1.11)

where a is a constant. The R/S analysis is superior to many other
methods of determining long-range dependencies. But this approach
has a noted shortcoming, namely, high sensitivity to the short-range
memory [4].
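The statistics (6.1.9)–(6.1.10) translate directly into code. The sketch below also extracts a crude Hurst estimate from (6.1.11) with the constant a set to 1, an illustrative simplification; a proper R/S analysis regresses log(R/S) on log N across many window sizes.

```python
import math
import random

# Sketch of the rescaled-range statistic (6.1.9)-(6.1.10) and a crude
# Hurst estimate from (6.1.11) with a = 1 -- an illustrative
# simplification, not a full R/S analysis.

def rescaled_range(x):
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    s, partial = 0.0, []
    for v in x:
        s += v - mean                      # S_k of (6.1.9)
        partial.append(s)
    return (max(partial) - min(partial)) / sd   # (6.1.10)

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(4096)]
rs = rescaled_range(x)
hurst = math.log(rs) / math.log(len(x))    # from R/S = N^H with a = 1
print(rs, hurst)                           # H near 0.5 for white noise
```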

6.2 Multifractals

Let us turn to the generic notion of multifractals (see, e.g., [5]).
Consider the map filled with the set of boxes that are used in the box-
counting fractal dimension. What matters for the fractal concept is
whether a given box belongs to the fractal. The basic idea behind the
notion of multifractals is that every box is assigned a measure \mu
that characterizes some probability density (e.g., intensity of color
between the white and black limits). The so-called multiplicative
process (or cascade) defines the rule according to which the measure is
fragmented when the object is partitioned into smaller components.
The fragmentation ratios that are used in this process are named
multipliers. The multifractal measure is characterized by the Hölder
exponent \alpha

\alpha = \lim_{h \to 0} [\ln \mu(h) / \ln(h)]   (6.2.1)

where h is the box size. Let us denote by N_h(\alpha) the number of boxes with
given h and \alpha. The distribution of the Hölder exponents in the
limit h \to 0 is sometimes called the multifractal spectrum

f(\alpha) = -\lim_{h \to 0} [\ln N_h(\alpha) / \ln(h)]   (6.2.2)

The distribution f(\alpha) can be treated as a generalization of the fractal
dimension for the multifractal processes.
Let us describe the simplest multifractal, namely the binomial
measure \mu on the interval [0, 1] (see [5] for details). In the binomial
cascade, two positive multipliers, m_0 and m_1, are chosen so that
m_0 + m_1 = 1. At the step k = 0, the uniform probability measure
for mass distribution, \mu_0 = 1, is used. At the next step (k = 1), the
measure \mu_1 spreads mass in proportion m_0/m_1 between the
intervals [0, 1/2] and [1/2, 1], respectively. Thus, \mu_1[0, 1/2] = m_0 and
\mu_1[1/2, 1] = m_1. In the next steps, every interval is again divided into
two subintervals, and the mass of the interval is distributed between the
subintervals in proportion m_0/m_1. For example, at k = 2: \mu_2[0, 1/4]
= m_0 m_0, \mu_2[1/4, 1/2] = \mu_2[1/2, 3/4] = m_0 m_1, \mu_2[3/4, 1] = m_1 m_1, and so on.
At the kth iteration, mass is partitioned into 2^k intervals of length 2^{-k}.
Let us introduce the notion of the binary expansion 0.b_1 b_2 \ldots b_k for
the point x = b_1 2^{-1} + b_2 2^{-2} + \ldots + b_k 2^{-k}, where 0 \le x \le 1 and
b_i \in \{0, 1\}. Then the measure of every dyadic interval I_{0.b_1 b_2 \ldots b_k} of
length 2^{-k} equals

\mu_{0.b_1 b_2 \ldots b_k} = m_0^n m_1^{k-n}   (6.2.3)

where n is the number of digits 0 in the address 0.b_1 b_2 \ldots b_k of the
interval's left end, and (k - n) is the number of digits 1. Since the
subinterval mass is preserved at every step, the cascade is called

Figure 6.3 Binomial cascade with m_0 = 0.6: (a) k = 0, (b) k = 1, (c) k = 2,
(d) k = 3, (e) k = 4, (f) k = 5.

conservative or microcanonical. The first five steps of the binomial
cascade with m_0 = 0.6 are depicted in Figure 6.3.
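The conservative cascade is straightforward to generate by fragmenting each interval's mass in proportion m_0/m_1; m_0 = 0.6 follows Figure 6.3, while the depth is an arbitrary choice:

```python
# Construction of the conservative binomial cascade (Figure 6.3);
# m0 = 0.6 follows the text, the depth k = 5 is an arbitrary choice.

def binomial_cascade(m0, k):
    """Masses of the 2^k dyadic intervals after k cascade steps."""
    m1 = 1.0 - m0
    masses = [1.0]                         # k = 0: uniform measure
    for _ in range(k):
        # each interval splits; the left half gets fraction m0,
        # the right half gets fraction m1 of the interval's mass
        masses = [m * f for m in masses for f in (m0, m1)]
    return masses

masses = binomial_cascade(0.6, 5)
print(len(masses), sum(masses))            # 32 intervals, total mass 1
```

The largest mass, m_0^k, sits on the leftmost interval, in agreement with (6.2.3).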
The multifractal spectrum of the binomial cascade equals

f(\alpha) = -\frac{\alpha_{max} - \alpha}{\alpha_{max} - \alpha_{min}} \log_2 \frac{\alpha_{max} - \alpha}{\alpha_{max} - \alpha_{min}} - \frac{\alpha - \alpha_{min}}{\alpha_{max} - \alpha_{min}} \log_2 \frac{\alpha - \alpha_{min}}{\alpha_{max} - \alpha_{min}}   (6.2.4)

The distribution (6.2.4) is confined to the interval [\alpha_{min}, \alpha_{max}]. If
m_0 \ge 0.5, then \alpha_{min} = -\log_2(m_0) and \alpha_{max} = -\log_2(1 - m_0). The
binomial cascade can be generalized in two directions. First, one
can introduce a multinomial cascade by increasing the number of
subintervals to N > 2. Note that the condition

\sum_i m_i = 1   (6.2.5)
is needed for preserving the conservative character of the cascade.
Secondly, the values of m_i can be randomized rather than assigned
fixed values. A cascade with randomized m_i is called canonical. In this
case, the condition (6.2.5) is satisfied only on average, that is

E\left[\sum_i m_i\right] = 1   (6.2.6)

An example of the randomized cascade that has an explicit expres-
sion for the multifractal spectrum is the lognormal cascade [6]. In this
process, the multiplier M that distributes the mass of the interval is
determined with the lognormal distribution (i.e., \log_2(M) is drawn
from the Gaussian distribution). If the Gaussian mean and variance
are \lambda and \sigma^2, respectively, then the conservative character of the
cascade, E[M] = 0.5, is preserved when

\sigma^2 = 2(\lambda - 1)/\ln(2)   (6.2.7)

The multifractal spectrum of the lognormal cascade that satisfies
(6.2.7) equals

f(\alpha) = 1 - \frac{(\alpha - \lambda)^2}{4(\lambda - 1)}   (6.2.8)

Note that in contrast to the binomial cascade, the lognormal
cascade may yield negative values of f(\alpha), which requires an interpret-
ation of f(\alpha) other than the fractal dimension.
An innovation of a multifractal process, \Delta X = X(t + \Delta t) - X(t), is de-
scribed with the scaling rule

E[|\Delta X|^q] = c(q)(\Delta t)^{\tau(q)+1}   (6.2.9)

where c(q) and \tau(q) (the so-called scaling function) are deterministic func-
tions of q. It can be shown that the scaling function \tau(q) is always
concave. Obviously, \tau(0) = -1. A self-affine process (6.1.4) can be
treated as a multifractal process with \tau(q) = Hq - 1. In particular, for
the Wiener process, H = 1/2 and \tau_W(q) = q/2 - 1. The scaling func-
tion of the binomial cascade can be expressed in terms of its multi-
pliers:

\tau(q) = -\log_2(m_0^q + m_1^q)   (6.2.10)

The scaling function \tau(q) is related to the multifractal spectrum f(\alpha)
via the Legendre transformation

\tau(q) = \min_\alpha [q\alpha - f(\alpha)]   (6.2.11)

which is equivalent to

f(\alpha) = \min_q [q\alpha - \tau(q)]   (6.2.12)

Note that f(\alpha) = q(\alpha - H) + 1 for the self-affine processes.
In practice, the scaling function of a multifractal process X(t) can
be calculated using the so-called partition function

S_q(T, \Delta t) = \sum_{i=0}^{N-1} |X(i\Delta t + \Delta t) - X(i\Delta t)|^q   (6.2.13)

where the sample X(t) has N points within the interval [0, T] with the
mesh size \Delta t. It follows from (6.2.9) that

\log\{E[S_q(T, \Delta t)]\} = \tau(q) \log(\Delta t) + \log[c(q) T]   (6.2.14)

Thus, plotting \log\{E[S_q(T, \Delta t)]\} against \log(\Delta t) for different values
of q reveals the character of the scaling function \tau(q). Multifractal
models have become very popular in analysis of the financial time
series. We shall return to this topic in Section 8.2.
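For the binomial cascade, the slope of log S_q against log(\Delta t) in (6.2.14) can be computed exactly and compared with the closed form \tau(q) = -\log_2(m_0^q + m_1^q); the parameter values below are illustrative assumptions:

```python
import math

# Partition-function estimate (6.2.13)-(6.2.14) applied to the binomial
# measure, compared with the closed form -log2(m0^q + m1^q) of (6.2.10).
# The values of m0, q, and the depth k are illustrative assumptions.

def cascade_masses(m0, k):
    """Masses of the 2^k dyadic intervals of the binomial cascade."""
    masses = [1.0]
    for _ in range(k):
        masses = [m * f for m in masses for f in (m0, 1.0 - m0)]
    return masses

def tau_estimate(m0, q, k=12):
    """Slope of log S_q against log(dt) between two refinement levels,
    per (6.2.14); for the cascade, S_q = sum of interval masses^q."""
    s_coarse = sum(m ** q for m in cascade_masses(m0, k - 1))
    s_fine = sum(m ** q for m in cascade_masses(m0, k))
    # dt halves from level k-1 to level k, so log(dt) changes by -log 2
    return (math.log(s_fine) - math.log(s_coarse)) / (-math.log(2.0))

m0, q = 0.6, 2.0
print(tau_estimate(m0, q), -math.log(m0 ** q + (1.0 - m0) ** q, 2))
```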

Mandelbrot's work on scaling in the financial time series is
compiled in the collection [1]. Among many excellent books on frac-
tals, we choose [2] for its comprehensive material that includes a
description of the relations between chaos and fractals and an important
chapter on multifractals [5].

Exercises

*1. Implement an algorithm that draws the Sierpinski triangle with
r = 0.5 (see Figure 6.2).
Hint: Choose the following fixed points: (0, 0), (0, 100), (100, 0).
Use the following method for the randomized choice of the
fixed point: i = [10 \cdot rand()] % 3, where rand() is drawn from the
uniform distribution within [0, 1] and % is the modulus operator
(explain the rationale behind this method). Note that at least
10,000 iterations are required for a good-quality picture.
*2. Reproduce the first five steps of the binomial cascade with
m_0 = 0.6 (see Figure 6.3). How will this cascade change if
m_0 = 0.8?
Chapter 7

Nonlinear Dynamical Systems

It is well known that many nonlinear dynamical systems, including
seemingly simple cases, can exhibit chaotic behavior. In short, the
presence of chaos implies that very small changes in the initial condi-
tions or parameters of a system can lead to drastic changes in its
behavior. In the chaotic regime, the system solutions stay within the
phase space region named the strange attractor. These solutions never
repeat themselves; they are not periodic and they never intersect.
Thus, in the chaotic regime, the system becomes unpredictable. The
chaos theory is an exciting and complex topic. Many excellent books
are devoted to the chaos theory and its applications (see, e.g., refer-
ences in Section 7.7). Here, I only outline the main concepts that may
be relevant to quantitative finance.
The first reason to turn to chaotic dynamics is a better understand-
ing of possible causes of price randomness. Obviously, new infor-
mation coming to the market moves prices. Whether it is a
company's performance report, a financial analyst's comments, or a
macroeconomic event, the company's stock and option prices may
change, thus reflecting the news. Since news usually comes unexpect-
edly, prices change in unpredictable ways.1 But is new information the
only source of price randomness? One may doubt this while
observing the price fluctuations at times when no relevant news is
released. A tempting proposition is that the price dynamics can be
attributed in part to the complexity of financial markets. The possi-
bility that deterministic processes modulate the price variations
has a very important practical implication: even though these pro-
cesses can have chaotic regimes, their deterministic nature means
that prices may be partly forecastable. Therefore, research of chaos in
finance and economics is accompanied with discussion of limited
predictability of the processes under investigation [1].
There have been several attempts to find possible strange attractors
in the financial and economic time series (see, e.g., [1–3] and refer-
ences therein). Discerning the deterministic chaotic dynamics from a
"pure" stochastic process is always a non-trivial task. This problem is
even more complicated for financial markets whose parameters may
