in-phase output and an in-quadrature output. The in-phase output coincides pre-
cisely with any signal in the market with a frequency that lies at the center of the
pass-band of the filter. The in-quadrature output is precisely 90 degrees out-of-
phase, having zero crossings when the in-phase output is at a peak or trough, and
having peaks and troughs when the in-phase output is crossing zero. In a mathe-
matical sense, the outputs can be said to be orthogonal. Using these filters, the
instantaneous amplitude of the cyclic activity (at the frequency the filter is tuned
to) can be computed by simply taking the square of the in-phase output, adding it
to the square of the in-quadrature output, and taking the square root of the sum.
There is no need to look back for peaks and valleys in the filtered output, and to
measure their amplitude, to determine the strength of a cycle. There is also no need
to use any other unusual technique, such as obtaining a correlation between the fil-
ter output and prices over approximately the length of one cycle of bars, as we did
in 1997. Instead, if a strong cycle is detected by one of the filters in the bank, the
pair of filter outputs can generate a trading signal at any desired point in the phase
of the detected cycle.
Figure 10-1 shows a single filter responding to a cycle of fixed amplitude
(original signal), with a frequency that is being swept from low to high (left to
right on the graph). The center of the filter was set to a period of 12. The second line
down from the top (in-phase filter output) illustrates the in-phase output from the
filter as it responds to the input signal. It is evident that as the periodicity of the
original signal approaches the center of the filter's pass-band, the amplitude of the
in-phase output climbs, reaching a maximum at the center of the pass-band. As the
periodicity of the original signal becomes longer than the center of the pass-band,
the amplitude of the in-phase output declines. Near the center of the pass-band, the
in-phase output from the filter is almost perfectly aligned with the original input
signal. Except for the alignment, the in-quadrature filter output (third line) shows
the same kind of amplitude variation in response to the changing periodicity of the
driving signal. Near the center of the filter pass-band, the in-quadrature output is
almost exactly 90 degrees out-of-phase with the in-phase output. Finally, the
fourth line depicts instantaneous power, as estimated from the filter outputs. This
represents the strength or amplitude of cyclic activity in the signal near the center
of the filter pass-band. The curve for instantaneous power is exceedingly smooth,
reaches a peak when the signal has a periodicity matching the tuning of the filter,
and declines thereafter. In the chart, the center of the pass-band appears to occur
at a period of 13, rather than 12, the period to which the filter was set. The reason
for the slight distortion is that the periodicity of the original signal was being
rapidly swept from low to high. Since the filter needs to look back several cycles,
the spectral estimate is distorted. Nevertheless, it seems apparent that trades based
on the filtered output would be highly profitable. The scaling of the y-axis is irrel-
evant; it was done in the manner presented to make the signals appear clearly, at
separate locations, on the chart.
Figure 10-2 depicts the frequency (or periodicity) and phase response of the
filter. In this case, the filter is set to have a center, or pass-band, periodicity of 20.
The relative power curve shows the strength of the output from the filter as the
signal frequency is varied while its power is held constant. The filter passes the signal to
a maximum extent when it has a frequency at the center of the pass-band, and as
the frequency moves away from the center frequency of the filter, the output of the
filter smoothly and rapidly declines. There are no side lobes in the response curve,
and the output power drops to zero as the periodicity goes down or up. The filter
has absolutely no response to a steady trend or a fixed offset, a highly desirable
property for traders, since there is then no need to fuss with de-trending, or any
other preprocessing of the input signal, before applying the filter. The phase
response also shows many desirable characteristics. For the most part, the phase
response is well within ±90 degrees inside the pass-band of the filter. At the cen-
ter of the pass-band, there is no phase shift; i.e., the in-phase filter output is exact-
ly in synchronization with the input series; a trader timing his trades would
achieve perfect entries. As with the power, the phase response is smooth and
extremely well behaved. Any engineer or physicist seeing this chart should appre-
ciate the quality of these filters. When a similar chart was generated for the
Butterworth band-pass filters (used in our 1997 study), the results were much less
pleasing, especially with regard to the filter's phase response and offset characteristics.
Severe phase shifts developed very rapidly as the periodicity of the signal
moved even slightly away from the center periodicity of the filter. In real-life
circumstances, with imprecisely timed cycles, the phase response of those filters
would likely play total havoc with any effort to achieve good trade timing.
Figure 10-3 shows the impulse response for both outputs from the wavelet
filter pair: the in-quadrature output and the in-phase output. These curves look
almost as though they were exponentially decaying sines or cosines. The decay,
however, is not quite exponential, and there are slight, though imperceptible,
adjustments in the relative amplitudes of the peaks to eliminate sensitivity to off-
set or trend.
FIGURE 10-2
Frequency (Period) and Phase Response of a Quadrature Mirror Wavelet Filter Pair
In addition to the data provided in the charts, a number of other tests were
conducted using "plasmodes." A plasmode is a set of data constructed to have the
characteristics that are assumed to exist in the real data. The intention is to test the
ability of some algorithm or analytic technique to properly extract, detect, or ana-
lyze those characteristics. A good cycle-based trading system should be able to do
a good job trading a synthetic data series containing lots of noise and occasional
embedded cycles. If it cannot, there could be no expectation of it trading well in
any real market. The kind of filters used in the tests below perform very well in
tests involving plasmodes.
GENERATING CYCLE ENTRIES USING FILTER BANKS
One way to generate cycle entries is to set up a series of filters, each for a different
frequency or periodicity (e.g., going down a certain percentage per filter throughout
some range or spectrum that will be analyzed). If one of these filters shows strong
resonance, while the others show little or no activity, there is presumably a strong
cycle in the market. An entry is generated by looking at the pair of filter outputs and
buying at the next bar, if the cycle phase is such that a cyclic bottom will occur on
that bar, or selling at the next bar, if a cycle top is evident or would be expected on
that bar. Since the strongest-responding filter should produce no undesirable lag or
phase error, such cyclic entries should provide exceedingly timely signals if the mar-
ket is evidencing cyclic behavior. Attempting to buy cycle bottoms and sell cycle
tops is one of the traditional ways in which cycle information has been used to trade
the markets. Cycle information derived from filter banks, or by other means, can also
enhance other kinds of systems or adapt indicators to current market conditions. An
example of how information regarding the signal-to-noise ratio and periodicity of
the dominant cycle (if there is any) may be used within another system, or to adapt
an indicator to current market conditions, can be found in Ruggiero (1997).
CHARACTERISTICS OF CYCLE-BASED ENTRIES
A cycle-based entry of the kind studied below (which attempts to buy bottoms and
sell tops) has several strong characteristics: a high percentage of winning trades,
low levels of slippage, and the ability to capture as much of each move as possi-
ble. This is the kind of trading that is a trader's dream. The assumption being made
is that there are well-behaved cycles in the market that can be detected and, more
importantly, extrapolated by this kind of technology. It has been said that the mar-
kets evidence cyclic activity up to 70% of the time. Even if clear cycles that lead
to successful trades only occur some smaller percentage of the time, because of
the nature of the model, the losses can be kept small on the failed trades by the use
of tight stops. The main disadvantage of a cycle-based entry is that the market may
have become efficient relative to such trading methodologies, thanks to the prolif-
eration of fairly powerful cycle analysis techniques, e.g., maximum entropy. The
trading away of well-behaved cycles may hamper all cycle detection approaches.
Since cycle entries of the kind just discussed are countertrend in nature, if the
cycles show no follow-through, but the trends do, the trader can get wiped out
unless good money management practices (tight stops) are employed. Whether or
not a sophisticated cycle analysis works, at least as implemented here, is the ques-
tion to be answered by the tests that follow.
In all tests of cycle-based entry models, the standard portfolio of 36 commodities
is used. The number of contracts to buy or sell on entry, in any market at any time,
was chosen to approximate the dollar volatility of two S&P 500 contracts at the
end of 1998. Exits are the standard ones in which a money management stop clos-
es out any trade that moves more than one volatility unit into the red, a profit tar-
get limit closes out trades that push more than four volatility units into the profit
zone, and a market-at-close order ends any trade that has not yet been closed out
by the stop-loss or profit target after 10 days have elapsed. Entry rules are specified
in the discussion of the model code and the individual tests. All tests are performed
using the standard C-Trader toolkit. Here is the code implementing the wavelet fil-
ter entry model along with the standard exit strategy:
The code above implements the model being tested. The first significant
block of code specifically relevant to a cyclic trading model initializes the indi-
vidual filters that make up the filter bank. This code is set up to run only on the
first pass, or when a parameter specifically affecting the computations involved in
initializing the filter bank (e.g., the width parameter) has changed; if no relevant
parameter has changed, there is no point in reinitializing the filters every time the
Model function is called.
The next block of code applies each of the filters in the bank to the input sig-
nal. In this block, two arrays are allocated to hold the filter bank outputs. The first
array contains the in-phase outputs (inphase), and the second contains the in-quad-
rature outputs (inquad). The inputs to the filters are the raw closing prices. Because
the filters are mathematically optimal, and designed to eliminate offsets and trends,
there is no need to preprocess the closing prices before applying the filters, as might be
necessary when using less sophisticated analysis techniques. Each row in the arrays
represents the output of a single filter with a specified center frequency or periodic-
ity. Each column represents a bar. The frequencies (or periodicities) at which the fil-
ters are centered are all spaced evenly on a logarithmic scale; i.e., the ratio between
the center frequency of a given filter and the next has a fixed value. The selectivity
or bandwidth (width) is the only adjustable parameter in the computation of the fil-
ter banks, the correct value of which may be sought by optimization.
The usual bar-stepping loop is then entered and the actual trading signals
generated. First, a good, pure cycle to trade is identified, which involves deter-
mining the power at the periodicity that has the strongest resonance with current
market activity (peakpower). The cycle periodicity at which the peak power occurs
is also assessed. If the periodicity is not at one of the end points of the range of
periodicities being examined (in this case the range is 3 bars to 30 bars), one of
the conditions for a potentially good cycle is met. A check is then made to see
what the maximum power (peaknoise) is at periodicities at least 2 filters away
from the periodicity at which peak power occurs. If peakpower is more than 1.5
times the peaknoise (a signal-to-noise ratio of 1.5 or greater), the second condition
for a good cycle is met. The phase angle of that cycle is then determined (easy to
do given the pair of filter outputs), making adjustments for the slice that occurs at
180 degrees in the plane of polar coordinates. The code then checks whether the
phase is such that a cycle bottom or a cycle top is present. A small displacement
term (disp) is incorporated in the phase assessments. It acts like the displacements
in previous models, except that here it is in terms of phase angle, rather than bars.
There is a direct translation between phase angle and number of bars: specifically,
the period of the cycle is multiplied by the phase angle (in degrees), and the
product is then divided by 360, which gives the number of bars represented by the phase
angle. If the displaced phase is such that a bottom can be expected a certain num-
ber of degrees before or after the present bar, a buy is posted. If the phase angle is
such that a top can be expected, a sell signal is issued. The limit and stop prices
are then calculated, as usual. Finally, the necessary trading orders are posted.
Many other blocks of code present in the above listing have not been dis-
cussed. These were used for debugging and testing. Comments embedded in the
code should make their purpose fairly clear.
Only one model was tested. Tests were performed for entry at the open (Test 1),
entry on a limit (Test 2), and entry on a stop (Test 3). The rules were simple: Buy
predicted cycle bottoms and sell predicted cycle tops. Exits took place when a
cycle signal reversed an existing position or when the standard strategy closed out
the trade, whichever came first. This simple trading model was first evaluated on
a noisy sine wave that was swept from a period of about 4 bars to a period of about
20 bars to verify behavior of the model implementation. On this data, buy and sell
signals appeared with clockwork precision at cycle tops and bottoms. The timing
of the signals indicates that when real cycles are present, the model is able to
detect and trade them with precision.
Table 10-1 contains the best in-sample parameters, as well as the perfor-
mance of the portfolio on both the in-sample and verification sample data. In the
table, SAMP = whether the test was on the optimization sample (IN or OUT);
ROA% = the annualized return-on-account; ARRR = the annualized risk-to-
reward ratio; PROB = the associated probability or statistical significance; TRDS
= the number of trades taken across all commodities in the portfolio; WIN% = the
percentage of winning trades; $TRD = the average profit/loss per trade; BARS =
the average number of days a trade was held; NETL = the total net profit on long
trades, in thousands of dollars; and NETS = the total net profit on short trades, in
thousands of dollars. Two parameters were optimized. The first (PI) represents the
bandwidth for each filter in the filter bank. The second parameter (P2) represents
the phase displacement in degrees. In all cases, the parameters were optimized on
the in-sample data by stepping the bandwidth from 0.05 to 0.2 in increments of
0.05, and by stepping the phase angle displacement from -20 degrees to +20
degrees in increments of 10 degrees. Only the best solutions are shown.
It is interesting that, overall, the cycle model performed rather poorly. This
model was not as bad, on a dollars-per-trade basis, as many of the other systems test-
ed, but it was nowhere near as good as the best. In-sample, the loss per trade was
$1,329 with entry at open, $1,037 with entry on limit, and $1,245 with entry on stop.
TABLE 10-1
Portfolio Performance with Best In-Sample Parameters on Both In-Sample and Out-of-Sample Data

The limit order had the highest percentage of wins and the smallest average loss per
trade. The long side was slightly profitable with entry at open, was somewhat more
profitable with entry on limit, and lost with entry on stop. The behavior out-of-sample,
with entry at open and on limit, was a lot worse than the behavior of the model
in-sample.
in-sample. The loss per trade grew to $3,741 for entry at open and to $3,551 for
entry on limit. The percentage of winning trades also declined, to 34%. The per-
formance of the cycle model on the verification sample was among the worst
observed of the various models tested. The deterioration cannot be attributed to
optimization: Several other parameter sets were examined, and regardless of
which was chosen, the cycle model still performed much worse out-of-sample.
With entry on stop, the out-of-sample performance did not deteriorate. In this case,
the loss ($944) was not too different from the in-sample loss. Although the stop
order appears to have prevented the deterioration of the model that was seen with
the other orders, in more recent times the system is a loser.
The decline of system performance in recent years was unusually severe, as
observed from the results of the other models tested. One possible reason may be
the recent proliferation of sophisticated cycle trading tools. Another explanation
might be that major trading firms are conducting research using sophisticated
techniques, including wavelets of the kind studied here. These factors may have
contributed to making the markets relatively efficient to basic cycle trading.
Table 10-2 shows the in-sample and out-of-sample behavior of the cycle
model broken down by market and entry order (test). The SYM column represents
the market being studied. The center and rightmost columns (COUNT) contain the
number of profitable tests for a given market. The numbers in the first row repre-
sent test identifiers: 01, 02, and 03 represent entry at open, on limit, and on stop,
respectively. The last row (COUNT) contains the number of markets on which a
given model was profitable. The data in this table provides relatively detailed
information about which markets are and are not profitable when traded by each
of the models: One dash (-) indicates a moderate loss per trade, i.e., $2,000 to
$4,000; two dashes (--) represent a large loss per trade, i.e., $4,000 or more; one
plus sign (+) means a moderate profit per trade, i.e., $1,000 to $2,000; two plus-
es (++) indicate a large gain per trade, i.e., $2,000 or more; and a blank cell
means that the loss was between $0 and $1,999 or the profit was between $0 and
$1,000 per trade. (For information about the various markets and their symbols,
see Table II-1 in the "Introduction" to Part II.)
Only the 10-Year Notes and Cotton showed strong profits across all three
entry orders in-sample. Out-of-sample, performance on these markets was miser-
able. The S&P 500, a market that, in our experience, has many clear and tradable
cycles, demonstrated strong profitability on the in-sample data when entry was at
open or on limit. This market was strongly profitable out-of-sample with entry on
limit and on stop, but somewhat less profitable with entry at open. Interestingly,
the NYFE, although evidencing strong in-sample profits with entry at open and on
limit, had losses out-of-sample across all three orders.

TABLE 10-2
Performance Data Broken Down by Market and Test

There are a few other profitable in-sample market-order combinations, as well as
out-of-sample market-order combinations. However, very little correspondence
between the two was
observed. Perhaps markets that have not had cycles in the past (in-sample) have
cycles in the present (out-of-sample), and vice versa. At least the S&P 500
behaved as expected on the basis of prior research and may be one of the few mar-
kets consistently amenable to cycle trading in this crude form.
Figure 10-4 depicts the equity for the portfolio with entry at open. Equity
declined slowly and then became rather flat until about August 1992, at which
time it began a steady and rapid decline.
In our May 1997 study, the filter bank method appeared to have potential as the
basis for an effective trading strategy. At times it worked incredibly well, and was
almost completely insensitive to large variations in its parameters, whereas at
other times it performed poorly. The results may simply have been due to the fact
that the implementation was â€śquick and dirty.â€ť Back then, the focus was on the
S&P 500, a market that continued to trade well in the present study.
The results of the current study are disappointing, all the more so given the
theoretical elegance of the filters. It may be that other approaches to the analysis of
cycles, e.g., the use of maximum entropy, might have provided better perfor-
mance; then again, maybe not. Other traders have also experienced similar disap-
pointments using a variety of techniques when trading cycles in a simple,
buy-the-bottom/sell-the-top manner. It may be that cycles are too obvious and
detectable by any of a number of methods, and may be traded away very quickly
whenever they develop in the market. This especially seems the case in recent
years with the proliferation of cycle analysis software. The suggestion is not that
cycles should be abandoned as a concept, but that a more sophisticated use of
detected cycles must be made. Perhaps better results would ensue if cycles were
combined with other kinds of entry criteria, e.g., taking trades only if a cycle top
corresponds to an expected seasonal turning-point top.
Further studies are needed to determine whether the cycle model does indeed
have the characteristic of giving precise entries when it works, but failing miser-
ably when it does not work. Looking over a chart of the S&P 500 suggests this is
the case. There are frequently strings of four or five trades in a row, with entries
that occur precisely at market tops and bottoms, as if predicted with perfect hind-
sight. At other times, entries occur exactly where they should not. With a system
that behaves this way, our experience indicates that, combined with a proper exit,
sometimes great profits can be achieved. More specifically, losses have to be cut
very quickly when the model fails, but trades should not be prematurely terminat-
ed when the model is correct in its predictions. Because of the precision of the
model when the predictions are correct, an extremely tight stop could perhaps
accomplish the goal. When an exact cycle top or bottom is caught, the market
begins to move immediately in the favored direction, with hardly any adverse
excursion, and the stop is never hit. When the model fails, the stop is hit very
quickly, resulting in only a small loss. Given the fairly loose stop of the standard
exit, the benefits of sophisticated cycle trading may not have been realized.

FIGURE 10-4
Portfolio Equity Growth for Countertrend Cycle Trading
WHAT HAVE WE LEARNED?
- Models that are theoretically sound, elegant, and appealing do not necessarily
work well when trading real markets.
- Exception to Rule 1: The S&P 500 may respond to such methods; it did
so both in our earlier study and in the current one.
- When the model does work, it does so remarkably well. As stated earlier,
when examining its behavior on the S&P 500 and several other markets,
one can quickly and easily find strings of signals that pick off tops and
bottoms with the precision of hindsight.
- The previous point suggests that exits specifically designed for a system
that yields high precision when correct, but fails badly when incorrect,
may be required.
- The markets appear to have become more efficient relative to cycle
models, as they have to breakout models. Obvious market behavior
(such as clear, tradable cycles) is traded away before most traders can
capitalize on it. The lesson: Anything too theoretically appealing or
obvious will tend not to work.
Neural network technology, a form of artificial intelligence (or AI), arose from
endeavors to emulate the kind of information processing and decision making
that occurs in living organisms. The goal was to model the behavior of neural tis-
sue in living systems by using a computer to implement structures composed of
simulated neurons and neural interconnections (synapses). Research on neural
networks began in the 1940s on a theoretical level. When computer technology
became sophisticated enough to accommodate such research, the study of neural
networks and their applications began in earnest. It was not, however, until the
mid-to-late 1980s that neural network technology became of interest to the finan-
cial community. By 1989, a few vendors of neural network development tools
were available, and there was one commercial S&P 500 forecasting system based
on this technology (Scientific Consultant Services' NexTurn). In the early 1990s,
interest peaked and more development tools appeared, but the fervor then waned for
reasons discussed later.
While it is not within the scope of this book to present a full tutorial on neur-
al network technology, below is a brief discussion to provide basic understanding.
Those interested in exploring this subject in greater depth should read our
contributions to the books Virtual Trading (Lederman and Klein, 1995) and
Computerized Trading (Jurik, 1999), in which we also present detailed informa-
tion on system development using neural networks, as well as our articles in
Technical Analysis of Stocks and Commodities (Katz, April 1992; Katz and
McCormick, November 1996, November 1997). Neural Networks in Finance and
Investing (Trippi and Turban, 1993) should also be of interest.
WHAT ARE NEURAL NETWORKS?
Neural networks (or "nets") are basically building blocks that learn and are useful
for pattern recognition, classification, and prediction. They hold special appeal to
traders because nets are capable of coping both with probability estimates in
uncertain situations and with "fuzzy" patterns, i.e., those recognizable by eye but
difficult to define in software using precise rules; and they have the potential to
recognize almost any pattern that exists. Nets can also integrate large amounts of
information without becoming stifled by detail and can be made to adapt to chang-
ing markets and market conditions.
A variety of neural networks are available, differing in terms of their
"architecture," i.e., the ways in which the simulated neurons are interconnected, the
details of how these neurons behave (signal processing behavior or "transfer
functions"), and the process through which learning takes place. There are a number of
popular kinds of neural networks that are of some use to traders: the Kohonen and
the Learning Vector Quantization (LVQ) networks, various adaptive resonance net-
works, and recurrent networks. In this chapter, the most popular and, in many
respects, the most useful kind of network is discussed: the "feed-forward" network.
As mentioned above, nets differ in the ways they learn. The system develop-
er plays the role of the neural network's teacher, providing the net with examples
to learn from. Some nets employ "supervised learning" and others "unsupervised
learning." Supervised learning occurs when the network is taught to produce a cor-
rect solution by being shown instances of correct solutions. This is a form of
paired-associate learning: The network is presented with pairs of inputs and a
desired output; for every set of inputs, it is the task of the net to learn to produce
the desired output. Unsupervised learning, on the other hand, involves nets that
take the sets of inputs they are given and organize them as they see fit, according
to patterns they find therein. Regardless of the form of learning employed, the
main difficulty in developing successful neural network models is in finding and
"massaging" historical data into training examples or "facts" that highlight
relevant patterns so that the nets can learn efficiently and not be led astray or
confused; "preprocessing" the data is an art in itself.
The actual process of learning usually involves some mechanism for updat-
ing the neural connection weights in response to the training examples. With feed-
forward architectures, back-propagation, a form of steepest-descent optimization,
is often used. Genetic algorithms are also effective; although very computationally
intensive and time-consuming, they generally produce better final results.
Feed-Forward Neural Networks
A feed-forward network consists of layers of neurons. The input layer, the first
layer, receives data or inputs from the outside world. The inputs consist of inde-
pendent variables (e.g., market or indicator variables upon which the system is to
be based) from which some inference is to be drawn or a prediction is to be made.
The input layer is massively connected to the next layer, which is often called the
hidden layer because it has no connections to the outside world. The outputs of the
hidden layer are fed to the next layer, which may be another hidden layer (if it is,
the process repeats), or it may be the output layer. Each neuron in the output layer
produces an output composed of the predictions, classifications, or decisions made
by the network. Networks are usually identified by the number of neurons in each
layer: For example, a 10-3-l network is one that has 10 neurons in its first or input
layer, 3 neurons in its middle layer, and 1 neuron in its output layer. Networks vary
in size, from only a few neurons to thousands, from only three layers to dozens;
the size depends on the complexity of the problem. Almost always, a three- or
four-layer network suffices.
Feed-forward networks (the kind being used in this chapter) implement a par-
ticular form of nonlinear multiple regression. The net takes a number of input vari-
ables and uses them to predict a target, exactly as in regression. In a standard linear
multiple regression, if the goal is to predict cholesterol (the dependent variable or
target) on the basis of dietary fat intake and exercise (the independent variables or
inputs), the data would be modeled as follows: predicted cholesterol = a + b * fat
intake + c * exercise, where a, b, and c represent parameters that would be deter-
mined by a statistical procedure. In a least-squares sense, a line, plane, or hyper-
plane (depending on the number of independent variables) is being fitted to the
points in a data space. In the example above, a plane is being fit: The x-axis repre-
sents fat intake, the y-axis is exercise, and the height of the plane at each xy coor-
dinate pair represents predicted cholesterol.
When using neural network technology, the two-dimensional plane or n-
dimensional hyperplane of linear multiple regression is replaced by a smooth n-
dimensional curved surface characterized by peaks and valleys, ridges and troughs.
As an example, let us say there is a given number of input variables and a goal of
finding a nonlinear mapping that will provide an output from the network that best
fits the target. In the neural network, the goal is achieved via the â€śneurons,â€ť the non-
linear elements that are connected to one another. The weights of the comrections are
adjusted to fit the surface to the data. The learning algorithm adjusts the weights to
get a particular curved surface that best fits the data points. As in a standard multi-
ple regression model, in which the coefficients of the regression are needed to define
the slope of the plane or hyperplane, a neural model requires that parameters, in the
form of connection weights, be determined so that the particular surface generated
(in this case a curved surface with hills and dales) will best fit the data.
NEURAL NETWORKS IN TRADING
Neural networks had their heyday in the late 1980s and early 1990s. Then the hon-
eymoon ended. What happened? Basically, disillusionment set in among traders
who believed that this new technology could, with little or no effort on the trader's
part, magically provide the needed edge. System developers would "train" their
nets on raw or mildly preprocessed data, hoping the neural networks themselves
would discover something useful. This approach was naive; nothing is ever so sim-
ple, especially when trading the markets. Not only was this "neural newbie"
approach an ineffective way to use neural networks, but so many people were
attempting to use nets that whatever edge was originally gained was nullified by
the response of the markets, which was to become more efficient with regard to
the technology. The technology itself was blamed and discarded with little con-
sideration to the thought that it was being inappropriately applied. A more sophis-
ticated, reasoned approach was needed if success was going to be achieved.
Most attempts to develop neural network forecasting models, whether in a
simplistic manner or more elaborately, have focused on individual markets. A seri-
ous problem with the use of individual markets, however, is the limited number of
data points available on which to train the net. This situation leads to grand oppor-
tunities for curve-fitting (the bad kind)--something that can contribute greatly to
the likelihood of failure with a neural network, especially with less than ideal data
preprocessing and targets. In this chapter, however, neural networks will be trained
on a whole portfolio of tradables, resulting in the availability of many tens of thou-
sands of data points (facts), and a reduction in curve-fitting for small to moderate-
sized networks. Perhaps, in this context, a fairly straightforward attempt to have a
neural network predict current or near-future market behavior might be success-
ful. In essence, such a network could be considered a universal market forecaster,
in that, trained across an entire portfolio of tradables, it might be able to predict
on all markets, in a non-market-specific fashion.
FORECASTING WITH NEURAL NETWORKS
Neural networks will be developed to predict (1) where the market is in terms of
its near-future range and (2) whether tomorrow's open represents a turning point.
Consider, first, the goal of predicting where the market is relative to its near-future
range. An attempt will be made to build a network to predict a time-reversed
Stochastic, specifically the time-reversed Slow %K. This is the usual Stochastic,
except that it is computed with time running backward. The time-reversed Slow
%K reflects where the current close lies with respect to the price range over the
next several bars. If something could predict this, it would be useful to the trader:
Knowing that today's close, and probably tomorrow's open, lies near the bottom of
the range of the next several days' prices would suggest a good buy point; and
knowing that today's close, or tomorrow's open, lies near the top of the range
would be useful in deciding to sell. Consider, second, the goal of predicting whether
tomorrow's open is a top, a bottom, or neither. Two neural networks will be trained.
One will predict whether tomorrow's open represents a bottom turning point, i.e.,
has a price that is lower than the prices on earlier and later bars. The other will pre-
dict whether tomorrow's open represents a top turning point, i.e., has a price that is
higher than the prices on earlier and later bars. Being able to predict whether a bot-
tom or a top will occur at tomorrow's open is also useful for the trader trying to
decide when to enter the market and whether to go long or short. The goal in this
study is to achieve such predictions in any market to which the model is applied.
GENERATING ENTRIES WITH NEURAL PREDICTIONS
Three nets will be trained, yielding three entry models. Two models will be con-
structed for turning points. One model will be designed to detect bottoms, the other
model to detect tops. For the bottom detection model, if the neural net indicates that
the probability that tomorrow's open will be a bottom is greater than some thresh-
old, then a buy order will be posted. For the top detection model, if the neural net
indicates that the probability that tomorrow's open will be a top is greater than
some other threshold, then a sell order will be posted. Neither model will post an
order under any other circumstances. These rules amount to nothing more than a
simple strategy of selling predicted tops and buying predicted bottoms. If, with bet-
ter than chance accuracy, the locations of bottoms and tops can be detected in time
to trade them, trading should be profitable. The detection system does not have to
be perfect, just sufficiently better than chance so as to overcome transaction costs.
For the model that predicts the time-reversed Slow %K, a similar strategy will
be used. If the prediction indicates that the time-reversed Slow %K is likely to be
less than some lower threshold, a buy will be posted; the market is near the bottom
of its near-future range and so a profit should quickly develop. Likewise, if the pre-
dicted reverse Slow %K is high, above an upper threshold, a sell will be posted.
These entries share the characteristics of other entries based on predictive, rather
than responsive, analysis. The entries lend themselves to countertrend trading and, if
the predictions are accurate, can dramatically limit transaction costs in the form of slip-
page, and provide good fills since the trader will be buying when others are selling and
vice versa. A good predictive model is the trader's Holy Grail, providing the ability to
sell near tops and buy near bottoms. As with other predictive-based entries, if the pre-
dictions are not sufficiently accurate, the benefits will be outweighed by the costs of
bad trades when the predictions go wrong, as they often do.
TIME-REVERSED SLOW %K MODEL
The first step in developing a neural forecasting model is to prepare a training fact
set, which is the sample of data consisting of examples from which the net learns;
i.e., it is the data used to train the network and to estimate certain statistics. In this
case, the fact set is generated using the in-sample data from all commodities in the
portfolio. The number of facts in the fact set is, therefore, large: 88,092 data
points. A fact set is only generated for training, not for testing, for reasons that will
be explained later.
To generate the facts that make up the fact set for this model, the initial step
of computing the time-reversed Slow %K, which is to serve as the target, must be
taken. Each fact is then created and written to a file by stepping through the in-
sample bars for each commodity in the portfolio. For each current bar (the one cur-
rently being stepped through), the process of creating a fact begins with computing
each input variable in the fact. This is done by calculating a difference between a
pair of prices, and then dividing that difference by the square-root of the number
of bars that separate the two prices. The square-root correction is used because, in
a random market, the standard deviation of a price difference between a pair of
bars is roughly proportional to the square-root of the number of bars separating the
two prices. The correction will force each price difference to contribute about
equally to the fact. In this experiment, each fact contains 18 price changes that are
computed using the square-root correction. These 18 price change scores will
serve as the 18 inputs to the neural network after some additional processing.
The pairs of prices (used when computing the changes) are sampled with
increasing distance between them: i.e., the further back in time, the greater the dis-
tance between the pairs. For the first few bars prior to the current bar, the spacing
between the prices differenced is only 1 bar; i.e., the price of the bar prior to the
current bar is subtracted from the price of the current bar; the price 2 bars
before the current bar is subtracted from the price 1 bar ago, etc. After several such
price change scores, the sampling is increased to every 2 bars, then every 4, then
every 8, etc. The exact spacings are in a table inside the code. The rationale behind
this procedure is to obtain more detailed information on very recent price behav-
ior. The further back the prices are in time, the more likely only longer-term move-
ments will be significant. Therefore, less resolution should be required. Sampling
the bars in this way ought to provide sufficient resolution to detect cycles and other
phenomena that range from a period of 1 or 2 bars through 50 bars or more. This
approach is in accord with a suggestion made by Mark Jurik (jurikres.com).
After assembling the 18 input variables consisting of the square-root-cor-
rected price differences for a fact, a normalization procedure is applied. The inten-
tion is to preserve wave shape while discarding amplitude information. By treating
the 18 input variables as a vector, the normalization consists of scaling the vector
to unit length. The calculations involve squaring each vector element or price dif-
ference, summing the squares, taking the square-root, and then dividing each ele-
ment by the resultant number. These are the input variables for the neural network.
In actual fact, the neural network software will further scale these inputs to an
appropriate range for the input neurons.
The target (dependent variable in regression terms) for each fact is simply the
time-reversed Slow %K for the current bar. The input variables and target for each
fact are written in simple ASCII format to a file that can be analyzed with a good
neural network development package.
The resultant fact set is used to train a net to predict the time-reversed Slow
%K, i.e., to predict the relative position of today's close, and, it is hoped, tomor-
row's open, with respect to the range of prices over the next 10 bars (a 10-bar time-
reversed Slow %K).
The next step in developing the neural forecaster is to actually train some
neural networks using the just-computed fact set. A series of neural networks,
varying in size, are trained. The method used to select the most appropriately sized
and trained network is not the usual one of examining behavior on a test set con-
sisting of out-of-sample data. Instead, the correlation coefficients, which reflect
the predictive capabilities of each of the networks, are corrected for shrinkage
based on the sample size and the number of parameters or connection weights
being estimated in the corresponding network. The equation employed to correct
for shrinkage is the same one used to correct the multiple correlations derived
from a multivariate regression (see the chapters on optimization and statistics).
Shrinkage is greater for larger networks, and reflects curve-fitting of the undesir-
able kind. For a larger network to be selected over a smaller network, i.e., to over-
come its greater shrinkage, the correlation it produces must be sufficiently greater
than that produced by the smaller network. This technique enables networks to be
selected without the usual reference to out-of-sample data. All networks are fully
trained: i.e., no attempt is being made to compensate for loss of degrees of free-
dom by undertraining.
The best networks, selected on the basis of the shrinkage-corrected correla-
tions, are then tested using the actual entry model, together with the standard exit,
on both in- and out-of-sample data and across all markets. Because shrinkage results
from curve-fitting, excessively curve-fit networks should have very poor shrinkage-
corrected correlations. The large number of facts in the fact set (88,092) should help
reduce the extent of undesirable curve-fitting for moderately sized networks.
Code for the Reverse Slow %K Model
The code is comprised of two functions: the usual function that implements the
trading model (Model), and a procedure to prepare the neural network inputs
(PrepareNeuralInputs). The procedure that prepares the inputs requires the index
of the current bar (cb) and a series of closing prices (cls) on which to operate.
The PrepareNeuralInputs function, given the index of the current bar and a
series of closing prices, calculates all inputs for a given fact that are required for the
neural network. In the list, pbars, the numbers after the first zero (which is ignored)
are the lookbacks, relative to the current bar, which are used to calculate the price
differences discussed earlier. The first block of code, after the declarations, initial-
izes a price adjustment factor table. The table is initialized on the first pass through
the function and contains the square-roots of the number of bars between each pair
of prices from which a difference is computed. The next block of code calculates the
adjusted price differences, as well as the sum of the squares of these differences, i.e.,
the squared amplitude or length of the resultant vector. The final block of code in
this function normalizes the vector of price differences to unit length.
The general code that implements the model follows our standard practice.
After a block of declarations, a number of parameters are copied to local variables for
convenient reference. The 50-bar average true range, which is used for the standard
exit, and the time-reversed 10-bar Slow %K, used as a target, are then computed.
One of the parameters (mode) sets the mode in which the code will run. A
mode of 1 runs the code to prepare a fact file: The file is opened, the header (con-
sisting of the number of inputs, 18, and the number of targets, 1) is written, and the
fact count is initialized to zero. This process only occurs for the first market in the
portfolio. The file remains open during all further processing until it is closed after
the last tradable in the portfolio has been processed. After the header, facts are writ-
ten to the file. All data before the in-sample date and after the out-of-sample date
are ignored. Only the in-sample data are used. Each fact written to the file consists
of a fact number, the 18 input variables (obtained using PrepareNeuralInputs), and
the target (which is the time-reversed Slow %K). Progress information is displayed
for the user as the fact file is prepared.
If mode is set to 2, a neural network that has been trained using the fact file
discussed above is used to generate entries into trades. The first block of code
opens and loads the desired neural network before beginning to process the first
commodity. Then the standard loop begins. It steps through bars to simulate actu-
al trading. After executing the usual code to update the simulator, calculate the
number of contracts to trade, avoid limit-locked days, etc., the block of code is
reached that generates the entry signals, stop prices, and limit prices. The
PrepareNeuralInputs function is called to generate the inputs corresponding to the
current bar; these inputs are fed to the net, the network is told to run itself, the out-
put from the net is retrieved, and the entry signal is generated.
The rules used to generate the entry signal are as follows. If the output from the
network is greater than a threshold (thresh), a sell signal is issued; the net is predicting
a high value for the time-reversed Slow %K, meaning that the current closing price
might be near the high of its near-future range. If the output from the network (the pre-
diction of the time-reversed Slow %K) is below 100 minus thresh, a buy signal is
issued. As an example, if thresh were set to 80, any predicted time-reversed Slow %K
greater than 80 would result in the posting of a sell signal, and any predicted time-
reversed Slow %K less than 20 would result in the issuing of a buy signal.
Finally, there are the two blocks of code used to issue the actual entry orders
and to implement the standardized exits. These blocks of code are identical to
those that have appeared and been discussed in previous chapters.
Test Methodology for the Reverse Slow %K Model
The model is executed with the mode parameter set to 1 to generate a fact set. The
fact set is loaded into N-TRAIN, a neural network development kit (Scientific
Consultant Services, 516-696-3333), appropriately scaled for neural processing,
and shuffled, as required when developing a neural network. A series of networks
are then trained, beginning with a small network and working up to a fairly large
network. Most of the networks are simple, 3-layer nets. Two 4-layer networks are
also trained. All nets are trained to maximum convergence and then "polished" to
remove any small biases or offsets. The process of polishing is achieved by reduc-
ing the learning rate to a very small number and then continuing to train the net for
about 50 runs.
Table 11-1 contains information regarding all networks that were trained for
this model, along with the associated correlations and other statistics. In the table,
Net Name = the file name to which the net was saved; Net Size = the number of
layers and the number of neurons in each layer; Connections = the number of con-
nections in the net optimized by the training process (similar to the number of
regression coefficients in a multiple regression in terms of their impact on curve-
fitting and shrinkage); and Correlation = the multiple correlation of the network
output with the target (this is not a squared multiple correlation but an actual cor-
relation). Corrected for Shrinkage covers two columns: The left one represents the
correlation corrected for shrinkage under the assumption of an effective sample
size of 40,000 data points or facts in the training set. The right column represents
the correlation corrected for shrinkage under the assumption of 13,000 data points
or facts in the training set. The last line of the table contains the number of facts
or data points (Actual N) and the number of data points assumed for each of the
shrinkage corrections (Assumed).
The number of data points specified to the shrinkage adjustment equation is
smaller than the actual number of facts or data points in the training set. The reason
TABLE 11-1

Training Statistics for Neural Nets to Predict Time-Reversed Slow %K

Net Name   Net Size    Connections
NN3.NET    18-8-1      152
NN4.NET    18-10-1     190
NN5.NET    18-12-1     228
NN8.NET    18-14-4-1   312
is the presence of redundancy between facts. Specifically, a fact derived from one
bar is likely to be fairly similar to a fact derived from an immediately adjacent bar.
Because of the similarity, the â€śeffectiveâ€ť number of data points, in terms of con-
tributing statistically independent information, will be smaller than the actual num-
ber of data points. The two corrected correlation columns represent adjustments
assuming two differently reduced numbers of facts. The process of correcting corre-
lations is analogous to that of correcting probabilities for multiple tests in optimiza-
tion: As a parameter is stepped through a number of values, results are likely to be
similar for nearby parameter values, meaning the effective number of tests is some-
what less than the actual number of tests.
Training Results for Time-Reversed Slow %K Model
As evident from Table 11-1, the raw correlations rose monotonically with the size
of the network in terms of numbers of connections. When adjusted for shrinkage,
by assuming an effective sample size of 13,000, the picture changed dramatically:
The nets that stood out were the small 3-layer net with 6 middle-layer neurons, and
the smaller of the two 4-layer networks. With the more moderate shrinkage cor-
rection, the two large 4-layer networks had the highest estimated predictive abili-
ty, as indicated by the multiple correlation of their outputs with the target.
On the basis of the more conservative statistics (those assuming a smaller
effective sample size and, hence, more shrinkage due to curve-fitting) in Table
11-1, two neural nets were selected for use in the entry model: the 18-6-1 network
(nn2.net) and the 18-14-4-1 network (nn8.net). These were considered the best
bets for nets that might hold up out-of-sample. For the test of the entry model
using these nets, the model implementation was run with mode set to 2. As usual,
all order types (at open, on limit, on stop) were tested.
For these models, two additional fact sets are needed. Except for their targets,
these fact sets are identical to the one constructed for the time-reversed Slow %K.
The target for the first fact set is a 1, indicating a bottom turning point, if tomor-
row's open is lower than the 3 preceding and 10 succeeding opens. If not, this
target is set to 0. The target for the second fact set is a 1, indicating a top, if tomor-
rowâ€™s open has a price higher than the preceding 3 and succeeding 10 opens.
Otherwise this target is set to 0. Assuming there are consistent patterns in the mar-
ket, the networks should be able to learn them and, therefore, predict whether
tomorrow's open is going to be a top, a bottom, or neither.
Unlike the fact set for the time-reversed Slow %K model, the facts in the sets
for these models are generated only if tomorrow's open could possibly be a turn-
ing point. If, for example, tomorrow's open is higher than today's open, then
tomorrow's open cannot be considered a turning point, as defined earlier, no mat-
ter what happens thereafter. Why ask the network to make a prediction when there
is no uncertainty or need? Only in cases where there is an uncertainty about
whether tomorrow's open is going to be a turning point is it worth asking the net-
work to make a forecast. Therefore, facts are only generated for such cases.
The processing of the inputs, the use of statistics, and all other aspects of the
test methodology for the turning-point models are identical to that for the time-
reversed Slow %K model. Essentially, both models are identical, and so is the
methodology; only the subjects of the predictions, and, consequently, the targets
on which the nets are trained, differ. Lastly, since the predictions are different, the
rules for generating entries based on the predictions are different between models.
The outputs of the trained networks represent the probabilities, ranging from
0 to 1, of whether a bottom, a top, or neither is present. The two sets of rules for
the two models for generating entries are as follows: For the first model, if the bot-
tom predictor output is greater than a threshold, buy. For the second model, if the
top predictor output is greater than a threshold, sell. For both models, the thresh-
old represents a level of confidence that the nets must have that there will be a bot-
tom or a top before an entry order is placed.
// write actual in-sample facts to the fact file
for(cb = 1; cb <= nb; cb++) {
    if(dt[cb] < IS_DATE) continue;
    if(dt[cb+10] > OOS_DATE) break;       // ignore OOS data
    if(opn[cb+1] >= Lowest(opn, 3, cb))
        continue;                         // skip these facts
    fprintf(fil, "%6d", ++factcount);     // fact number
    PrepareNeuralInputs(var, cls, cb);
    for(k = 1; k <= 18; k++)
        fprintf(fil, "%7.3f", var[k]);    // standard inputs
    if(opn[cb+1] < Lowest(opn, 9, cb+10))
        netout = 1.0; else netout = 0.0;  // calculate target
    fprintf(fil, "%6.1f\n", netout);      // target
    if((cb % 500) == 1)
        printf("cb = %d\n", cb);          // progress info
}

// generate entry signals, stop prices and limit prices
if(opn[cb+1] < Lowest(opn, 3, cb)) {      // run only these
    PrepareNeuralInputs(var, cls, cb);    // preprocess data
    ntlset_inputv(nnet, &var[1]);         // feed net inputs
    ntlfire(nnet);                        // run the net
    netout = ntlget_output(nnet, 0);      // get output
    netout *= 100.0;                      // scale to percent
Since the code for the bottom predictor model is almost identical to that of the time-
reversed Slow %K model, only the two blocks that contain changed code are pre-
sented above. In the first block of code, the time-reversed Slow %K is not used.
Instead, a series of ones or zeros is calculated that indicates the presence (1) or
absence (0) of bottoms (bottom target). When writing the facts, instead of writing the
time-reversed Slow %K, the bottom target is written. In the second block of code, the
rules for comparing the neural output with an appropriate threshold, and generating
the actual entry buy signals, are implemented. In both blocks, code is included to pre-
vent the writing of facts, or use of predictions, when tomorrow's open could not pos-
sibly be a bottom. Similar code fragments for the top predictor model appear below.
if(dt[cb+10] > OOS_DATE) break;       // ignore OOS data
if(opn[cb+1] <= Highest(opn, 3, cb))
    continue;                         // skip these facts
fprintf(fil, "%6d", ++factcount);     // fact number
if(opn[cb+1] > Highest(opn, 9, cb+10))
    netout = 1.0; else netout = 0.0;  // calculate target
fprintf(fil, "%6.1f\n", netout);      // write target
Test Methodology for the Turning-Point Model
The test methodology for this model is identical to that used for the time-reversed
Slow %K model. The fact set is generated, loaded into N-TRAIN, scaled, and shuf-
fled. A series of nets (from 3- to 4-layer ones) are trained to maximum convergence
and then polished. Statistics such as shrinkage-corrected correlations are calculated.
Training Results for the Turning-Point Model
Bottom Forecaster. The structure of Table 11-2 is identical to that of Table 11-1. As
with the net trained to predict the time-reversed Slow %K, there was a monotonic
relationship between the number of connections in the network and the multiple cor-
relation of the network's output with the target; i.e., larger nets evinced higher corre-
lations. The net was trained on a total of 23,900 facts, which is a smaller fact set than
that for the time-reversed Slow %K. The difference in number of facts resulted
because the only facts used were those that contained some uncertainty about whether
tomorrow's open could be a turning point. Since the facts for the bottom forecaster
came from more widely spaced points in the time series, it was assumed that there
would be less redundancy among them. When corrected for shrinkage, effective sam-
ple sizes of 23,900 (equal to the actual number of facts) and 8,000 (a reduced effec-
tive fact count) were assumed. In terms of the more severely adjusted correlations, the
best net in this model appeared to be the largest 4-layer network; the smaller 4-layer
network was also very good. Other than these two nets, only the 3-layer network with
10 middle-layer neurons was a possible choice. For tests of trading performance, the
large 4-layer network (nn9.net) and the much smaller 3-layer network (nn4.net) were chosen.
TABLE 11-2

Training Statistics for Neural Nets to Predict Bottom Turning Points

Net Name   Net Size    Connections   Correlation   Corrected for Shrinkage
NN1.NET    18-4-1      76            0.109         0.084     0.050
NN2.NET    18-6-1      114           0.121         0.100     0.025
NN3.NET    18-8-1      152           0.148         --        --
NN4.NET    18-10-1     190           --            --        --
NN5.NET    18-12-1     228           0.167         0.137     -0.019
NN6.NET    18-16-1     304           0.185         0.148     -0.080
NN7.NET    18-20-1     380           0.225         0.188     0.057
NN8.NET    18-14-4-1   312           0.219         0.188     0.098
NN9.NET    --          488           0.294         0.200     0.188
Actual N   23900                     Assumed       23900     8000
TABLE 11-3

Training Statistics for Neural Nets to Predict Top Turning Points

Net Name   Net Size    Connections   Correlation   Corrected for Shrinkage
NN1.NET    18-4-1      76            0.103         0.068     0.035
NN2.NET    18-6-1      114           0.117         --        --
Actual N   25919                     Assumed       --        --
Top Forecaster. Table 11-3 contains the statistics for the nets in this model; they
were trained on 25,919 facts. Again, the correlations were directly related in size to
the number of connections in the net, with a larger number of connections leading
to a better model fit. When mildly corrected for shrinkage, only the smaller 4-layer
network deviated from this relationship by having a higher correlation than would
be expected. When adjusted under the assumption of large amounts of curve-fitting
and shrinkage, only the two 4-layer networks stood out, with the largest one
(nn9.net) performing best. The only other high correlation obtained was for the 18-
10-1 net (nn4.net). To maximize the difference between the nets used in the trading
tests, the largest 4-layer net, which was the best shrinkage-corrected performer, and
the fairly small (18-10-1) net were chosen.
TRADING RESULTS FOR ALL MODELS
Table 11-4 provides data regarding whole portfolio performance with the best in-
sample parameters for each test in the optimization and verification samples. The
information is presented for each combination of order type, network, and model.
In the table, SAMP = whether the test was on the training or verification sample
(IN or OUT); ROA% = the annualized return-on-account; ARRR = the annualized
risk-to-reward ratio; PROB = the associated probability or statistical significance;
TRDS = the number of trades taken across all commodities in the portfolio;
WIN% = the percentage of winning trades; $TRD = the average profit/loss per
trade; BARS = the average number of days a trade was held; NETL = the total net
profit on long trades, in thousands of dollars; NETS = the total net profit on short
trades, in thousands of dollars. Columns P1, P2, and P3 represent parameter val-
Portfolio Performance with Best In-Sample Parameters for Each Test
in the Optimization and Verification Samples
ues: P1 = the threshold, P2 = the number of the neural network within the group
of networks trained for the model (these numbers correspond to the numbers used
in the file names for the networks shown in Tables 11-1 through 11-3), P3 = not
used. In all cases, the threshold parameters (column P1) shown are those that
resulted in the best in-sample performance. Identical parameters are used for ver-
ification on the out-of-sample data.
The threshold for the time-reversed Slow %K model was optimized for each
order type by stepping it from 50 to 90 in increments of 1. For the top and bottom
predictor models, the thresholds were stepped from 20 to 80 in increments of 2. In
each case, optimization was carried out only using the in-sample data. The best
parameters were then used to test the model on both the in-sample and out-of-sam-
ple data sets. This follows the usual practice established in this book.
Trading Results for the Reverse Slow %K Model
The two networks that were selected as most likely to hold up out-of-sample, based
on their shrinkage-adjusted multiple correlations with the target, were analyzed for
trading performance, The first network was the smaller of the two, having 3 layers
(18-6-l network). The second network had 4 layers (18-14-4-l network).
Results Using the 18-6-I Network. In-sample, as expected, the trading results
were superb. The average trade yielded a profit of greater than $6,000 across all
order types, and the system provided an exceptional annual return, ranging from
192.9% (entry at open, Test 1) to 134.6% (entry on stop, Test 3). Results this
good were obtained because a complex model containing 114 free parameters
was fitted to the data. Is there anything here beyond curve-fitting? Indeed there
is. With the stop order, out-of-sample performance was actually slightly prof-
itable; nothing very tradable, but at least not in negative territory: The average
trade pulled $362 from the market. Even though losses resulted out-of-sample
for the other two order types, the losses were rather small when compared with
those obtained from many of the tests in other chapters: With entry at open, the
system lost only $233 per trade. With entry on limit (Test 2), it lost $331. Again,
as has sometimes happened in other tests of countertrend models, a stop order,
rather than a limit order, performed best. The system was profitable out-of-sam-
ple across all orders when only the long trades were considered. It lost across all
orders on the short side.
In-sample performance was fabulous in almost every market in the portfolio,
with few exceptions. This was true across all order types. The weakest perfor-
mance was observed for Eurodollars, probably a result of the large number of con-
tracts (hence high transaction costs) that must be traded in this market. Weak
performance was also noted for Silver, Soybean Oil, T-Bonds, T-Bills, Canadian
Dollar, British Pound, Gold, and Cocoa. There must be something about these
markets that makes them difficult to trade, because, in-sample, most markets per-
form well. Many of these markets also performed poorly with other models.
Out-of-sample, good trading was obtained across all three orders for the T-
Bonds (which did not trade well in-sample), the Deutschemark, the Swiss Franc, the
Japanese Yen, Unleaded Gasoline, Gold (another market that did not trade well in-
sample), Palladium, and Coffee. Many other markets were profitable for two of the
three order types. The number of markets that could be traded profitably out-of-sam-
ple using neural networks is a bit surprising. When the stop order (overall, the best-
performing order) was considered, even the S&P 500 and NYFE yielded substantial
profits, as did Feeder Cattle, Live Cattle, Soybeans, Soybean Meal, and Oats.
Figure 11-1 illustrates the equity growth for the time-reversed Slow %K pre-
dictor model with entry on a stop. The equity curve was steadily up in-sample, and
continued its upward movement throughout about half of the out-of-sample peri-
od, after which there was a mild decline.
Results of the 18-14-4-1 Network. This network provided trading performance
that showed more improvement in-sample than out-of-sample. In-sample, returns
FIGURE 11-1
Equity Growth for Reverse Slow %K 18-6-1 Net, with Entry on a Stop
ranged from a low of 328.9% annualized (stop order, Test 6) to 534.7% (entry at
open, Test 4). In all cases, there was greater than $6,000 profit per trade. As usual,
the longs were more profitable than the shorts. Out-of-sample, every order type
produced losses. However, as noted in the previous set of tests, the losses were
smaller than typical of losing systems observed in many of the other chapters: i.e.,
the losses were about $1,000 per trade, rather than $2,000. This network also took
many more trades than the previous one. The limit order performed best (Test 5).
The long side evidenced smaller losses than the short side, except in the case of
the stop order, where the short side had relatively small losses. The better in-sam-
ple performance and worsened out-of-sample performance are clear evidence of
curve-fitting. The larger network, with its 320 parameters, was able to capitalize
on the idiosyncrasies of the training data, thereby increasing its performance in-
sample and decreasing it out-of-sample.
In-sample, virtually every market was profitable across every order. There
were only three exceptions: Silver, the Canadian Dollar, and Cocoa. These mar-
kets seem hard to trade using any system. Out-of-sample, several markets were
profitable across all three order types: the Deutschemark, the Canadian Dollar,
Light Crude, Heating Oil, Palladium, Feeder Cattle, Live Cattle, and Lumber. A
few other markets traded well with at least one of the order types.
The equity curve showed perfectly increasing equity until the out-of-sample
period, at which point it mildly declined. This is typical of a curve resulting from
overoptimization. Given a sample size of 88,092 facts, this network may have
been too large.
Trading Results for the Bottom Turning-Point Model
The two networks that were selected, on the basis of their corrected multiple correla-
tions with the target, as most likely to hold up out-of-sample are analyzed for trading
performance below. The first network was the smaller of the two, having 3 layers
(18-10-1 network). The second network had 4 layers (18-20-6-1 network).
Results of the 18-10-1 Network. In-sample, this network performed exceptionally
well, which is nothing unusual given the degree of curve-fitting involved. Out-of-
sample, there was a return to the scenario of a heavily losing system. For all three
order types (at open, on limit, and on stop, or Tests 7, 8, and 9, respectively), the
average loss per trade was in the $2,000 range, typical of many of the losing mod-
els tested in previous chapters. The heavy per-trade losses occurred although this
model was only trading long positions, which have characteristically performed
better than shorts.
In-sample, only four markets did not perform well: the British Pound, Silver,
Live Cattle, and Corn. Silver was a market that also gave all the previously tested
networks problems. Out-of-sample, the network was profitable across all three
order types for the S&P 500, the Japanese Yen, Light Crude, Unleaded Gasoline,
Palladium, Soybeans, and Bean Oil. A number of other markets were also prof-
itable with one or two of the orders.
The equity curve showed strong, steady gains in-sample and losses out-of-sample.
Results of the 18-20-6-1 Network. These results were derived from Tests 10-12
(at open, on limit, and on stop, respectively). In-sample performance for this net-
work soared to unimaginable levels. With entry at open, the return was 768%
annualized, with 83% of the 699 trades taken being profitable. The average trade
produced $18,588 profit. Surprisingly, despite the larger size of this network
(therefore, the greater opportunity for curve-fitting), the out-of-sample perfor-
mance, on a dollars-per-trade basis, was better than that of the smaller network, especial-
ly in the case of the stop entry, where the loss per trade was down to $518.
All markets were profitable across all orders in-sample, without exception.
Out-of-sample, the S&P 500, the British Pound, Platinum, Palladium, Soybean
Meal, Wheat, Kansas Wheat, Minnesota Wheat, and Lumber were profitable
across all three order types.
Trading Results for the Top Turning-Point Model
The two networks that were selected as most likely to hold up out-of-sample,
based on their corrected multiple correlations with the target, are analyzed for
trading performance below. The first network was the smaller of the two, having
3 layers (18-10-1 network). The second network had 4 layers (18-20-6-1 network).
Results of the 18-10-1 Network. As usual, the in-sample performance was
excellent. Out-of-sample performance was profitable for two of the orders: entry
at open (Test 13) and on limit (Test 14). There were moderate losses for the stop
order (Test 15). This is slightly surprising, given that the short side is usually less
profitable than the long side.
The market-by-market breakdown reveals that only the Canadian Dollar,
Feeder Cattle, Bean Oil, Wheat, and Cocoa were not profitable across all three
order types, in-sample. Out-of-sample, strong profits were observed across all
three orders for the Deutschemark, the Japanese Yen, Light Crude, Heating Oil,
Feeder Cattle, Live Cattle, and Corn. The Japanese Yen, Light Crude, and, to some
extent, Corn shared profitability with the corresponding bottom (long) turning-
point model; in other words, these markets held up out-of-sample for both the bot-
tom network and the top (short) network.
The equity curve (Figure 11-2) for entry at open depicts rapidly rising equi-
ty until August 1993, and then more slowly rising equity throughout the remain-
der of the in-sample period and through about two-thirds of the out-of-sample
period. Equity then declined slightly.
FIGURE 11-2
Equity Growth for Short Turning Points, 18-10-1 Net, with Entry at Open
Results of the 18-20-6-1 Network. As expected, this network, the larger of the
two, produced greater and more consistent in-sample profits due to a higher
amount of curve-fitting. Out-of-sample, this network performed terribly across all
order types (at open, on limit, and on stop, or Tests 16, 17, and 18, respectively).
The least bad results were obtained with the stop order.
In-sample, only Silver, Wheat, Sugar, and Orange Juice did not trade prof-
itably across all orders. Out-of-sample, only Cocoa showed profitability for all
three orders. Surprisingly, all the metals showed strong out-of-sample profitability
for the entry at open and on limit, as did Feeder Cattle, Cocoa, and Cotton.
Portfolio equity showed incredibly smooth and steep gains in-sample, with
losses out-of-sample for all order types.
Table 11-5 provides in-sample and Table 11-6 contains out-of-sample perfor-
mance statistics for all of the neural network models broken down by test and mar-
ket. The SYM column represents the market being studied. The rightmost column
(COUNT) contains the number of profitable tests for a given market. The numbers
in the first row represent test identifiers. The last row (COUNT) contains the num-
ber of markets on which a given model was profitable. The data in this table pro-
vides relatively detailed information about which markets are and are not
profitable when traded by each of the models: One dash (-) indicates a moderate
loss per trade, i.e., $2,000 to $4,000; two dashes (--) represent a large loss per
trade, i.e., $4,000 or more; one plus sign (+) means a moderate profit per trade,
i.e., $1,000 to $2,000; two pluses (+ +) indicate a large gain per trade, i.e., $2,000
or more; and a blank cell means that the loss was between $0 and $1,999 or the
profit was between $0 and $1,000 per trade. (For information about the various
markets and their symbols, see Table II-1 in the "Introduction" to Part II.)
In-sample, every order and every model yielded exceptionally strong positive
returns (see Table 11-7). When averaged over all models, the entry at open and on
limit performed best, while the entry on stop was the worst; however, the differ-
ences are all very small. In-sample, the best dollars-per-trade performance was
observed with the large turning-point networks for the long (bottom) and short
(top) sides. Out-of-sample, the stop order provided the best overall results. The
time-reversed Slow %K and the short (top) turning-point models performed best
when averaged across all order types.
When a neural newbie model was tested on an individual market (Katz and
McCormick, November 1996), the conclusion was that such an approach does not
work at all.

TABLE 11-5
In-Sample Results Broken Down by Test and Market

The out-of-sample behavior in some of the current tests was much better
than expected, based on our earlier explorations of simple neural network models.
In the current tests, the more encouraging results were almost certainly due to
the large number of data points in the training set, which resulted from training the
model across an entire portfolio, rather than on a single tradable. In general, the
larger the optimization (or training) sample, the greater the likelihood of contin-
ued performance in the verification sample.

TABLE 11-6
Out-of-Sample Results Broken Down by Test and Market

Sample size could be increased by
going back further in history, which would be relatively easy to accomplish since
many commodities contracts go back well beyond the start of our in-sample peri-
od (1985). It could also be increased by enlarging the portfolio with additional
markets, perhaps a better way to bolster the training sample.
A maxim of optimization is that the likelihood of performance holding up
increases with reduction in the number of model parameters. Given the somewhat
positive results obtained in some of the tests, it might be worthwhile to experiment
with more sophisticated models.

TABLE 11-7
Performance of Neural Network Models Broken Down by Model, Order, and Sample

Specifically, better input preprocessing, in the
sense of something that could reduce the total number of inputs without much loss
of essential predictive information, would probably lead to a very good trading
system. With a smaller number of inputs, there would be fewer parameters (con-
nection weights) in the network to estimate. Consequently, curve-fitting, an
apparently significant issue judging by the results and shrinkage levels, would be
less of a problem.
WHAT HAVE WE LEARNED?
■ Under certain conditions, even neural newbie models can work. The criti-
cal issue with neural networks is the problem of achieving an adequate
ratio of sample size to free parameters for the purpose of minimizing
harmful (as opposed to beneficial) curve-fitting.
■ Curve-fitting is a problem with neural networks. Any methods by which
the total number of parameters to be estimated can be reduced, without
too much loss of predictive information, are worth exploring; e.g., more
sophisticated information-compressing input preprocessing would proba-
bly improve out-of-sample performance and reduce the effects of pernicious
curve-fitting.
■ For similar reasons, large samples are critical to the training of successful
neural network trading models. This is why training on whole portfolios
provides better results than training on individual tradables, despite the
loss of market specificity. One suggestion is to increase the number of
markets in the portfolio and, thereby, achieve a larger in-sample training
set. Carrying this to an extreme, perhaps a neural network should be
trained on hundreds of commodities, stocks, and various other trading
instruments, in an effort to develop a "universal market forecaster." If there
are any universal "technical" price patterns that exist in all markets and
that have predictive validity, such an effort might actually be worthwhile.
■ Some markets trade poorly, even in-sample. Other markets tend to hold
up out-of-sample. This has been found with other models in earlier chap-
ters: Some markets are more amenable to trading using certain techniques
than are other markets. Selecting a subset of markets to trade, based on
continued out-of-sample performance, might be an approach to take when
developing and trading neural network systems.
Extrapolating from models of biology and economics, mathematician/psycholo-
gist John Holland developed a genetic optimization algorithm and introduced it
to the world in his book, Adaptation in Natural and Artificial Systems (1975).
Genetic algorithms (or GAs) only became popular in computer science about 15
years later (Yuret and de la Maza, 1994). The trading community first took notice
around 1993, when a few articles (Burke, 1993; Katz and McCormick, 1994;
Oliver, 1994) and software products appeared. Since then, a few vendors have
added a genetic training option to their neural network development shells and a
few have "industrial strength" genetic optimization toolkits.
In the trading community, GAs never really had the kind of heyday experi-
enced by neural networks. The popularity of this technology probably never grew
due to its nature. Genetic algorithms are a bit difficult for the average person to
understand and more than a bit difficult to use properly. Regardless of their image,
from our experience, GAs can be extremely beneficial for system developers.
As with neural networks, while a brief discussion is included to provide basic
understanding, it is not within the scope of this book to present a full tutorial on
genetic algorithms. Readers interested in studying this subject further should read
Davis (1991), as well as our contributions to the book Virtual Trading (Katz and
McCormick, 1995a, 1995b) and our articles (Katz and McCormick, July/August
1994, December 1996, January 1997, February 1997).
WHAT ARE GENETIC ALGORITHMS?
A genetic algorithm solves a problem using a process similar to biological evolution.
It works by the recombination and mutation of gene sequences. Recombination and
mutation are genetic operators; i.e., they manipulate genes. A gene is a string of
codes (the genotype) that is decoded to yield a functional organism with specific
characteristics (the phenotype). A chromosome is a string of genes. In the case of
genetic optimization, as carried out on such problems as those being addressed here,
the string of codes usually takes the form of a series of numbers.
During the simulated evolutionary process, a GA engages in mating and
selecting the members of the population (the chromosomes). Mating involves
crossover and mutation. In crossover, the elements that comprise the genes of dif-
ferent chromosomes (members of the population or solutions) are combined to
form new chromosomes. Mutation involves the introduction of random alterations
to these elements. This provides additional variation in the sets of chromosomes
being generated. As with the process of biological selection (where less-fit mem-
bers of the population leave fewer progeny), the less-fit solutions are weeded out
so the more-fit solutions can proliferate, yielding another generation that may con-
tain some better solutions than the previous one. The process of recombination,
random mutation, and selection has been shown to be an extremely powerful
problem-solving technique.
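The recombination-mutation-selection loop described above can be sketched in C++. This is an illustrative toy, not the OptEvolve optimizer used later in this book; the fitness function here is an arbitrary stand-in (in a trading application, fitness would instead be a performance statistic, such as a risk-to-reward ratio).

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

typedef std::vector<double> Chromosome;

double frand() { return std::rand() / (double)RAND_MAX; }

// Stand-in fitness: peaks when every gene equals 3.0. A trading system
// would instead evaluate a performance statistic for the decoded model.
double fitness(const Chromosome& c) {
    double f = 0.0;
    for (size_t i = 0; i < c.size(); ++i) f -= (c[i] - 3.0) * (c[i] - 3.0);
    return f;
}

bool lessFit(const Chromosome& a, const Chromosome& b) {
    return fitness(a) < fitness(b);
}

// Tournament selection: the better of two randomly chosen members breeds.
const Chromosome& tournament(const std::vector<Chromosome>& pop) {
    const Chromosome& a = pop[std::rand() % pop.size()];
    const Chromosome& b = pop[std::rand() % pop.size()];
    return fitness(a) > fitness(b) ? a : b;
}

Chromosome evolve(int popSize, int genes, int generations) {
    std::vector<Chromosome> pop(popSize, Chromosome(genes));
    for (size_t i = 0; i < pop.size(); ++i)             // random initial population
        for (int j = 0; j < genes; ++j) pop[i][j] = 10.0 * frand() - 5.0;
    for (int gen = 0; gen < generations; ++gen) {
        std::vector<Chromosome> next;
        // elitism: carry the current best member forward unchanged
        next.push_back(*std::max_element(pop.begin(), pop.end(), lessFit));
        while ((int)next.size() < popSize) {
            const Chromosome& p1 = tournament(pop);
            const Chromosome& p2 = tournament(pop);
            size_t cut = std::rand() % (genes + 1);     // one-point crossover
            Chromosome child(p1.begin(), p1.begin() + cut);
            child.insert(child.end(), p2.begin() + cut, p2.end());
            for (int j = 0; j < genes; ++j)             // random mutation
                if (frand() < 0.2) child[j] += 2.0 * frand() - 1.0;
            next.push_back(child);
        }
        pop.swap(next);
    }
    return *std::max_element(pop.begin(), pop.end(), lessFit);
}
```

Even this crude implementation reliably climbs toward the fitness peak; production-grade optimizers add refinements such as fitness scaling, crowding, and gene-boundary-aware crossover.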
EVOLVING RULE-BASED ENTRY MODELS
What would happen if a GA were allowed to search, not merely for the best para-
meters (the more common way a GA is applied by traders), but also for the best
rules? In this chapter, the consequences of using a GA to evolve a complete entry
model, by discovering both the rules and the optimal parameters for those rules,
will be explored. Although somewhat complex, this methodology proved to be
effective in our first investigation (Katz and McCormick, February 1997).
How can a GA be used to discover great trading rules? The garden variety
GA just juggles numbers. It is necessary to find a way to map sets of numbers in
a one-to-one fashion to sets of rules. There are many ways this can be accom-
plished. A simple and effective method involves the construction of a set of rule
templates. A rule template is a partial specification for a rule, one that contains
blanks that need to be filled in. For example, if some of the rules in previous chap-
ters were regarded as rule templates, the blanks to be filled in would be the values
for the look-backs, thresholds, and other parameters. Using rule templates, as
defined in this manner, a one-to-one mapping of sets of numbers to fully specified
rules can easily be achieved. The first number (properly scaled) of any set is used
as an index into a table of rule templates. The remaining numbers of the set are
then used to fill in the blanks, with the result being a fully specified rule. The code
below contains a C++ function (Rules) that implements this mapping strategy; it
will be discussed later. Although C++ was used in the current study, this method
can also be implemented in TradeStation using the TS-EVOLVE software from
Scientific Consultant Services (516-696-3333).
The term genetic search applies to the use of a genetic algorithm to search
through an incredibly large set of potential solutions to find those that are best, i.e.,
that have the greatest fitness. In the current application, the intention is to use the
evolutionary process to discover sets of numbers (genotypes) that translate to rule-
based entry models (phenotypes) with the greatest degree of fitness (defined in
terms of desirable trading behavior). In short, we are going to attempt to engage
in the selective breeding of rule-based entry methods! Instead of beginning with a
particular principle on which to base a model (e.g., seasonality, breakouts, etc.),
the starting point is an assortment of ideas that might contribute to the develop-
ment of a profitable entry. Instead of testing these ideas one by one or in combi-
nations to determine what works, something very unusual will be done: The
genetic process of evolution will be allowed to breed the best possible entry model
from the raw ideas.
The GA will search an extremely broad space of possibilities to find the best
rule-based entry model that can be achieved given the constraints imposed by the
rule templates, the data, and the limitation of restricting the models to a specified
number of rules (to prevent curve-fitting). To accomplish this, it is necessary to
find the best sets of numbers (those that map to the best sets of rule-based entry
models) from an exceedingly large universe of possibilities. The kind of massive
search for solutions would be almost impossible (certainly impractical, in any
realistic sense) to accomplish without the use of genetic algorithms. There are
alternatives to GAs, e.g., brute-force searching may be used, but we do not have
thousands of years to wait for the results. Another alternative might be through the
process of rule induction, i.e., where an attempt is made to infer rules from a set
of observations; however, this approach would not necessarily allow a complex
function, such as that of the risk-to-reward ratio of a trading model, to be maxi-
mized. Genetic algorithms provide an efficient way to accomplish very large
searches, especially when there are no simple problem-solving heuristics or tech-
niques that may otherwise be used.
EVOLVING AN ENTRY MODEL
In this exercise, a population of three-rule entry models is evolved using
OptEvolve, a C++ genetic optimizer (Scientific Consultant Services, 516-696-
3333). Each gene corresponds to a block of four numbers and to a rule, via the
one-to-one mapping of sets of numbers to sets of rules. Each chromosome con-
tains three genes. A chromosome, as generated by the GA, therefore consists of 12
numbers: The first four numbers correspond to the first gene (or rule), the next
four correspond to the second gene (or rule), and the last four correspond to the
third gene (or rule). The GA itself has to be informed of the gene size so it does
not break up intrinsic genes when performing crossover. Crossover should occur
only at the boundaries of genes, i.e., at the four-number blocks. In the current example,
this will be achieved by setting the "chunk size," a property of the genetic opti-
mizer component, to four.
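The chunk-size mechanism can be illustrated with a short sketch. This is only an illustration of the idea, not OptEvolve's internal code: with a chunk size of 4, cut points fall only at multiples of four, so the four numbers composing a gene always travel together from one parent.

```cpp
#include <cstdlib>
#include <vector>

// Crossover restricted to gene boundaries. For a 12-number chromosome with
// chunk = 4, the cut point can only be 0, 4, 8, or 12, so no gene's four
// numbers are ever split between the two parents.
std::vector<double> chunkCrossover(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   size_t chunk) {
    size_t nChunks = a.size() / chunk;                   // e.g., 12 / 4 = 3 genes
    size_t cut = chunk * (std::rand() % (nChunks + 1));  // boundary-aligned cut
    std::vector<double> child(a.begin(), a.begin() + cut);
    child.insert(child.end(), b.begin() + cut, b.end());
    return child;
}
```

Without this restriction, an ordinary one-point crossover could cut a gene in the middle, splicing together a rule-template index from one parent with parameters evolved for a different template in the other parent.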
As mentioned, each gene is composed of four numbers. The first number is
nothing more than an index into a table of possible rule templates. For example, if
that number is 1, a price-comparison template in which the difference between two
closing prices is compared with a threshold (see code) is selected. The remaining
three numbers in the gene then control the two lookback periods for the prices being
compared and the threshold. If the first number of the four-number block is 2, a
price-to-moving-average comparison template would be selected. In that case, two
of the remaining three numbers would control the period of the moving average and
the direction of the comparison (whether the price should be above or below the
moving average). In general, if the first number in the block of four numbers that
represents a gene is n, then the nth rule template is used, and any required parame-
ters are determined by reference to the remaining three numbers in the four-number
block. This decoding scheme makes it easy to maintain an expandable database of
rule templates. Each of the three blocks of four numbers is mapped to a corre-
sponding rule. For any 12-number chromosome, a 3-rule entry model is produced.
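The decoding scheme just described can be sketched as follows. The structure names and the particular index-scaling formula below are assumptions made for illustration; the chapter's actual Rules function, which performs this mapping, appears later.

```cpp
#include <cmath>

// One decoded rule: a template index plus the three numbers that fill in
// that template's blanks (lookbacks, thresholds, or direction flags).
struct RuleSpec {
    int templateId;
    double p1, p2, p3;
};

// Map one four-number gene to a rule. The first number is scaled into a
// valid 1..nTemplates index; the remaining three pass through as the
// template's parameters.
RuleSpec decodeGene(const double g[4], int nTemplates) {
    RuleSpec r;
    r.templateId = 1 + (int)std::fabs(std::fmod(g[0], (double)nTemplates));
    r.p1 = g[1];
    r.p2 = g[2];
    r.p3 = g[3];
    return r;
}

// A 12-number chromosome decodes to a three-rule entry model.
void decodeChromosome(const double chrom[12], RuleSpec rules[3], int nTemplates) {
    for (int i = 0; i < 3; ++i)
        rules[i] = decodeGene(chrom + 4 * i, nTemplates);
}
```

Because decoding is table-driven, adding an eleventh template to the table automatically enlarges the search space without any change to the optimizer itself.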
The Rule Templates
The first rule template (case I in function Rules) defines a comparison between
two prices and a threshold: The rule takes on a value of TRUE if the closing price
lb1 bars ago is greater than some threshold factor (thr) plus the closing price lb2
bars ago. Otherwise, the rule takes on the value of FALSE. The unknowns (lb1,
lb2, and thr) are left as the blanks to be filled in during instantiation. This template
has been included because the kind of rule it represents was useful in previous chapters.
The second rule template (case 2) involves simple moving averages, which
are often used to determine trend. Usually the market is thought to be trending up
if the price is above its moving average and trending down if the price is below its
moving average. There are only two unknowns in this template: The first (per)
controls the number of bars in the moving average, and the second controls
the direction of comparison (above or below).
The third rule template (case 3) is identical to the second (case 2), except that
an exponential moving average is used rather than a simple moving average.
Much discussion has occurred regarding the importance of open interest.
Larry Williams (1979) mentioned that a decline in total open interest, during a
period when the market has been moving sideways, indicates potential for a strong
rally. A shrinking of open interest may be interpreted as a decline in available con-
tracts, producing a condition where demand may outweigh supply. The fourth rule
template (case 4) simply computes the percentage decline in open interest from
lb1 bars ago to 1 bar ago (open interest is generally not available for the current
bar) and compares it with a threshold (thr). If the percentage decline is greater than
the threshold, the rule takes on the value TRUE. Otherwise it evaluates to FALSE.
The threshold and the lookback (lb1) are the unknowns to be filled in at the time
of instantiation.
The fifth rule template (case 5) is similar to the fourth template, but a rise,
rather than fall, in total open interest is being sought. If the increase, as a percent-
age, is greater than a threshold, then the rule evaluates to TRUE. Otherwise it eval-
uates to FALSE. As previously, the lookback and the threshold are unknowns that
must be supplied to instantiate the rule.
The sixth rule template (case 6) can be called a â€śnew highâ€ť condition: The
template asks whether an lb1-bar new high has occurred within the last lb2 bars.
A particular instance of the rule might read: "If a new 50-day high has occurred
within the last 10 days, then TRUE, else FALSE." This rule attempts to capture a
simple breakout condition, allowing for breakouts that may have occurred several
bars ago (perhaps followed by a pull-back to the previous resistance-turned-sup-
port that another rule has detected as a good entry point). There are two blanks to
be filled in to instantiate this template: lb1 and lb2.
The seventh rule template (case 7) is identical to the sixth rule template,
except that new lows, rather than new highs, are being detected.
The eighth rule template (case 8) examines the average directional move-
ment index with respect to two thresholds (thr1 and thr2). This is a measure of
trendiness, as discussed in the chapter on breakouts. If the average directional
movement (ADX) is above a lower threshold and below an upper threshold, the
rule evaluates to TRUE. Otherwise, the rule evaluates to FALSE.
The ninth rule template (case 9) performs a threshold comparison on the
Stochastic oscillator that is similar to that performed in Rule 8.
The tenth rule template (case 10) evaluates the direction of the slope of the
MACD oscillator. The lengths (lb1 and lb2) of the two moving averages that com-
pose the MACD, and the direction of the slope required for the rule to evalu-
ate to TRUE, are specified as parameters.
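Two of the templates above (cases 1 and 6) might look like the following once instantiated. These are free-standing sketches written for clarity, not the chapter's actual Rules code; index handling assumes the bar being evaluated is far enough into the series for the lookbacks.

```cpp
#include <vector>

// Case 1: TRUE if the close lb1 bars ago exceeds the close lb2 bars ago
// plus a threshold (thr); otherwise FALSE.
bool priceComparisonRule(const std::vector<double>& close, int bar,
                         int lb1, int lb2, double thr) {
    return close[bar - lb1] > close[bar - lb2] + thr;
}

// Case 6: TRUE if an lb1-bar new high occurred within the last lb2 bars,
// i.e., if any of those bars exceeded the highs of the lb1 bars before it.
bool newHighRule(const std::vector<double>& high, int bar, int lb1, int lb2) {
    for (int b = bar - lb2 + 1; b <= bar; ++b) {
        bool isNewHigh = true;
        for (int k = 1; k <= lb1; ++k)
            if (high[b] <= high[b - k]) { isNewHigh = false; break; }
        if (isNewHigh) return true;
    }
    return false;
}
```

Each template reduces to a Boolean function of the price series and a handful of filled-in parameters, which is exactly what allows the three decoded rules of a chromosome to be combined into a single entry condition.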