. 5
( 18)


© 2009 by Taylor and Francis Group, LLC
112 Algorithmic Cryptanalysis
b,u b
computing fA for at most a few vectors u su¬ces to recover fA with a good
enough probability. For example, when fA is an irreducible polynomial over
the ¬nite ¬eld we are considering, we necessarily have fA = fA when u = 0.
b b
Even when fA is composite, a fraction of the vectors u satis¬es fA = fA .
Moreover, we can recover fA by taking the lowest common multiple of a few
polynomials of the form fA .
Thus, to complete the description of Wiedemann™s algorithm, we need to
e¬ciently compute fA from the sequence T b,u . This can be done using
Berlekamp-Massey algorithm, as described in Chapter 2. Since this algorithm
recovers the minimal polynomial with degree at most d from a sequence of
2d elements over a ¬nite ¬eld, it can be used in the context of Wiedemann™s
algorithm by letting d be the dimension of the matrix A. Note that when A
is non-square, the smallest dimension su¬ces. Block iterative algorithms

In Section, we have seen that over small ¬nite ¬elds, Lanczos™s algo-
rithm may fail by encountering self-orthogonal vectors during its computation.
We recall that a self-orthogonal vector x whose coordinates lie in a ¬nite ¬eld
Fq is self-orthogonal when (x|x) = 0 (in Fq ). Heuristically, this bad event
occurs with probability near 1/q for each scalar product occurring in the al-
gorithm. This implies that for small ¬nite ¬elds, Lanczos™s algorithm is likely
to fail. When this happens, there is a simple way to avoid the problem: in-
stead of using Equation (3.15) with an orthogonal basis (vi ) de¬ned over Fq ,
we consider an orthogonal basis de¬ned over an extension ¬eld FQ , with Q a
large enough power of q. With this simple change, the individual probabil-
ity of failure of each scalar product becomes 1/Q, assuming that the initial
vector v1 is chosen at random in this large ¬eld. The fact that we search a
solution for the linear system in Fq does not a¬ect these probabilities, since it
only impacts the scalar products with y, which are not used as denominators.
Thus, this simple change removes the bad behavior of Lanczos™s algorithm.
However, this change is very costly because each computation in Fq is replaced
by a computation in FQ , without lowering the number of computations that
need to be performed.
Another approach is the block Lanczos algorithm proposed in [Mon95],
which starts from a block of several initial random vectors. At each round,
this algorithm computes a new block of vectors by multiplying the previous
block by the matrix and by performing orthogonalization with respect to the
previous blocks. The details are similar to those of plain Lanczos™s; however,
there are some additional steps. First, the orthogonalization process requires
linear algebra on some matrices, whose dimensions are the number of blocks.
Second, some self-orthogonal vectors may be encountered during the compu-
tation and in order to deal with them, they are removed from the current

© 2009 by Taylor and Francis Group, LLC
Linear algebra 113

block and added to the next one. This implies that block Lanczos algorithm
needs to deal with block of (slightly) varying size.
In addition to the possibility of working with small ¬nite ¬elds, block Lanc-
zos has another advantage compared to ordinary Lanczos: it is much better
suited for parallel computer architecture. As a consequence, it is frequently
used for computations in large dimension. Note that Wiedemann™s algorithm
also has a block version [Cop94], which is also well adapted to parallel or even
distributed computation (see [AFK+ 07] for an example).

3.4.2 Structured Gaussian elimination
Structured Gaussian elimination was ¬rst introduced in the context of
index calculus algorithms by Odlyzko in [Odl85] (see also Lamacchia and
Odlyzko [LO91]). Its goal is, starting from a large sparse system of linear
equations, to reduce it to a smaller system, while maintaining a relative spar-
sity. It has been especially devised for the kind of linear system typically
encountered in index calculus algorithms. In these systems, some variables
occur very frequently in equations while others are quite rare. Moreover, with
enough sieving e¬ort, these systems can be largely overde¬ned. These two
speci¬c properties are essential to the behavior of Lamacchia and Odlyzko™s
algorithm. Note that this algorithm is heuristic and that no de¬nite analy-
sis of its complexity is available. However, this is not essential in practice.
Indeed, structured Gaussian elimination is not used alone, but always in con-
junction with an iterative algorithm, as a practical method to reduce the cost
of this iterative algorithm. Moreover, using such structured Gaussian elimi-
nation does not improve the asymptotic complexity of index calculus methods
as described in Chapter 15.
In the sequel, we are given a sparse linear system of equations and we
try to simplify the system, reducing the number of equations and unknowns,
without losing too much sparsity. Note that it is useful to start with an
overdetermined system, i.e., with more equations than unknowns. This allows
us to discard some equations along the way, when they become too dense. It
is also convenient to assume that most coe¬cients that appear in the linear
system are either 1 or ’1 and that an overwhelming majority of coe¬cients
are small enough to ¬t in a single computer word or even in a single byte.
To introduce the ideas used in structured Gaussian elimination, we start
by considering some special cases of interest. The simplest transform that
can be applied to the initial system of equations is the removal of singletons.
Scan the system of equations and if any unknown only appears in a single
equation, remove both the unknown and the equation from the system. This
transform is very e¬cient since it lowers the number of equations and unknows
and also reduces the total memory size of the linear system description. The
next transform that we can apply looks either for equations containing two
unknowns only or for unknowns that appear in two equations only. In both
cases, it is convenient to add the additional constraint that the two coe¬cients

© 2009 by Taylor and Francis Group, LLC
114 Algorithmic Cryptanalysis

involved are both either 1 or ’1. In this case, an equation containing two
unknowns can be rewritten as y = ±x + a, where x and y are the involved
unknowns and a is a constant; thus y can be replaced everywhere by either x or
’x, with a corresponding change to the right-hand side constants. Similarly, if
x only appears in two equations E and E , we can replace these two equations
by either E ’ E or E + E and remove x from the list of unknowns. Note that
these two transforms are in some sense dual. Applying one to the matrix of a
linear system corresponds to applying the other one to the transposed matrix.
These two transforms remove one equation and one unknown, while leaving
the memory size of the system essentially constant. In fact, the amount of
memory usually decreases by 2 coe¬cients.
These two simple transforms can be applied iteratively; indeed, removing or
merging equations may create new singletons or doubletons. Once this is done,
we need to consider heavier equations and/or unknowns that appear more
often. Typically, if an equation with t unknowns exists and if, at least, one of
its coe¬cients is 1 or ’1, then the corresponding unknown can be replaced by
an a¬ne combination of the others everywhere in the linear system. Similarly,
if an unknown appears times, at least once with coe¬cient 1 or ’1, it can be
eliminated by subtracting the corresponding equation from each of the ’ 1
others. In fact, if unknown x appears in equation E with coe¬cient 1 or
’1, with x appearing a total of t times and if E has weight , this transform
removes one unknown and one equation; moreover, assuming that no other
cancellation occurs, the memory size is increased by (t ’ 1)( ’ 1) ’ t ’ + 1 =
(t ’ 2)( ’ 2) ’ 2.
Since we took care to start from an overde¬ned system of equations, another
basic transform is available; it consists of simply removing some equations
when the system becomes too large to ¬t into memory. Of course, during this
step, it is preferable to remove the equation(s) with the largest weight.
All existing variations of structured Gaussian elimination are essentially
heuristic methods that aim at using these transforms or close variations in an
e¬cient way. Odlyzko™s method
In the original method of Odlyzko, the geometry of the speci¬c systems
of equations encountered during index calculus algorithms1 is also taken into
account. With these systems, some unknowns, corresponding to small primes,
occur frequently, while others are much rarer. This dissymmetry is re¬‚ected
in the structured Gaussian elimination by distinguishing between heavy and
light unknowns. The transforms that are applied are adapted to take this
distinction into account. More precisely, transforms in the following list are
applied repeatedly:

1 See Chapter 15.

© 2009 by Taylor and Francis Group, LLC
Linear algebra 115

• Find all light unknowns which appear only once and remove the corre-
sponding rows and columns.

• Find all light unknowns which appear alone, with coe¬cient 1 or ’1, in
some row, replace the unknown everywhere by its expression in terms
of heavy unknowns and remove the corresponding row.

• Enlarge the set of heavy unknowns, declaring as heavy some of the
previously light unknowns. These unknowns are selected among the
more frequently occurring.

• Remove some heavy rows.

The above transforms eliminate singletons, with a modi¬ed de¬nition of
weight that ignores the contribution of heavy unknowns, i.e., of unknowns
which appear too frequently. In [LO91], the elimination of doubletons was
also considered; however, experiments showed that in this case, the overall
weight grows too quickly.
This approach of Odlyzko works well for linear systems of equations that
appear during index calculus algorithms for factoring or computing discrete
logarithms. Moreover, when implementing the method it is possible to remove
heavy unknowns from memory and to recompute them afterwards by journal-
ing the history of the computation. However, the approach also presents some

• The approach is not well adapted to the large primes variation of index
calculus. To deal with such systems, it is preferable to work with the
graph method presented below.

• Linear systems arising during the computation of discrete logarithms in
¬nite ¬elds of small characteristics behave badly with this algorithm.
The problem is that instead of having unknowns associated with prime
numbers, whose frequencies slowly decrease as the primes become larger,
we have unknowns that correspond to irreducible polynomials and the
frequency is a function of the degree of this polynomial. As a con-
sequence, when modifying the set of heavy unknowns, any reasonable
strategy makes jumps and adds a whole family of unknowns, correspond-
ing to the same degree. In this context, the overall behavior is quite bad.
There, the greedy method below is preferable.

• Finally, despite the relative simplicity of the method, programming this
approach is quite tricky and there is no easy way to check that the
structured Gaussian elimination did not subtly alter the system due to
an undetected bug. When faced with this concern, using the lightweight
approach of Section is a good idea.

© 2009 by Taylor and Francis Group, LLC
116 Algorithmic Cryptanalysis Markowitz pivoting
Instead of partitioning the unknowns between the heavy and light cate-
gories, we can use Markowitz pivoting, a greedy method that simply tries
to apply the basic transforms we described in the introduction of this sec-
tion. We already know that performing Gaussian elimination to remove an
unknown that appears in di¬erent equations and in particular in an equation
of weight t, with a coe¬cient 1 or ’1, usually increases the total number of
entries in the linear system by (t ’ 2)( ’ 2) ’ 2. Markowitz pivoting selects
at each step of the computation to perform the elimination corresponding to
the smallest possible increases of the linear system size. When applying this
strategy, the size of the system slowly increases; when it reaches a predeter-
mined threshold, a fraction of the equations with highest weight is removed.
The program stops when the number of remaining equations becomes smaller
than the number of remaining unknows plus a small ¬xed security margin,
e.g., 100 extra equations. Another option for halting structured Gaussian
elimination is to compute at regular intervals an estimate of the complexity
of running an iterative solver on the current system of equations. At the
beginning of structured elimination, this estimate quickly decreases, at some
point of the elimination it reaches a minimum, when this happens, stopping
the structured elimination is a good idea. All in all, this often gives a better
result than pushing the elimination too far, which results in a denser matrix.
The principle of the greedy approach is very simple and was ¬rst described
in [Mar57]. However, the bookkeeping required by this approach is quite
massive. The key questions are to e¬ciently locate at each step the pair
unknown/equation that corresponds to the lowest value of (t’2)( ’2)’2 and
to e¬ciently perform the Gaussian elimination steps. Clearly, recomputing
(t ’ 2)( ’ 2) ’ 2 for all pairs at each step of the computation to locate the
minimum value is going to be too costly. Similarly, in order to perform the
Gaussian elimination or pivoting step, we cannot a¬ord to scan the complete
linear system in order to locate the unknown that needs to be eliminated.
Thankfully, these two tasks can be managed e¬ciently by using adequate
data structures. Structure for e¬cient pivoting Given a pair (E, x) where E
is an equation of weight t and x an unknown that appear in E with coe¬cient
1 or ’1 and is the total number of equations that contains x, we need to
e¬ciently locate all equations that contain x and add/subtract some multiple
of E from these equations. A neat solution to perform this task is to store
not only the sparse matrix corresponding to the system of equations but also
its transpose. Note that when storing this transpose, it is possible to take
advantage of the heavy column idea of Odlyzko™s original approach. Indeed,
unknowns with a very large value for are unlikely to minimize W ( , t) =
( ’ 2)(t ’ 2) ’ 2. As a consequence, it su¬ces to store a fraction of the
transpose. During pivoting, both the linear system and the transposed matrix

© 2009 by Taylor and Francis Group, LLC
Linear algebra 117

need to be updated. However, since the linear system is sparse, the update
task only a¬ects a fraction of the unknows and equations. Structure for pivot selection To select a pivot, we need to
¬nd the pair (E, x) that minimizes the quantity W ( , t). A ¬rst remark is
that for a given unknown x, W ( , t) can be computed by scanning the column
of x using the transposed copy of the linear system and by remembering the
equation where x appears with coe¬cient 1 or ’1 that has the lowest weight.
Thus, for every unknown x, we can de¬ne a best possible pair (E, x) and a
corresponding value W ( , t). During a pivoting step, thanks to sparsity, only
a small number of unknows are subject to change and need an update of their
W ( , t) value. The only remaining question is to ¬nd a data structure that
allows to e¬ciently locate the unknown with the smallest value at each step
and that also allows e¬cient updates. This can for example be achieved using
self-balancing tree techniques (see Chapter 6). A lightweight approach
One notable disadvantage of using Markowitz pivoting is that since e¬cient
implementations of this algorithm require a complex data structure, they may
contain subtle, hard to detect bugs. In the context of index calculus, such a
bug may cause failure of the sparse linear algebra algorithm used after struc-
ture Gaussian elimination. However, such a failure could also arise due to
a bug in sieving or even due to a hardware failure at some point during the
process that would modify a single equation somewhere. When this happens,
it may become necessary to use simpler algorithms. For this reason, we now
propose a lightweight approach to structured Gaussian elimination. After an
initialization step, this approach works in several rounds, each round being
broken into two phases. During initialization, we count the total number of
occurrence of each unknown and memorize the list of light unknowns which
occur less than some ¬xed threshold. The ¬rst phase of each performs statis-
tics on the sparse system of equations, stored in memory. More precisely, it
sorts the equations by increasing values of the weight, or rather estimated
weight. When an equation contains a single light (active) unknown, this un-
known is remembered as a pivot for the equation. In case of con¬‚ict between
two equations, the lighter one is given precedence to keep the pivot. The
second phase performs the pivoting step in a lazy way. It simply adds to each
equation containing one or more unknowns declared as pivot a list of entries
that remember that a multiple of the equation corresponding to the pivot
should be added to it. The addition is not performed, however, the estimated
weight of the equation is modi¬ed, by adding to it the weight of the equation
being added minus 1. After that, all pivots, together with their corresponding
equations, are declared unactive unknowns and we proceed to the next round.
In a ¬nal phase, the algorithm evaluates for each equation the sum of the
initial entry and of the contribution of all equations that have been lazily

© 2009 by Taylor and Francis Group, LLC
118 Algorithmic Cryptanalysis

added. Note, that this needs to be done in a recursive manner since the
added equations may also contain further lazy additions. Any equation that
becomes too large during this process is simply discarded. Note that for
debugging purposes, only this part of the algorithm need to be checked in
details. Indeed, by construction, this part can only output linear combinations
of the original equations, which by de¬nition are valid equations. Any bug
in the previous parts of the program might a¬ect the level of sparsity of the
output system but not its correctness. Graph method
In the so-called large prime variation of index calculus algorithms, we en-
counter systems of equations with some additional structure. More precisely,
the set of unknowns is divided in two parts: unknowns corresponding to regu-
lar primes and unknowns corresponding to large primes. The main di¬erence
is that on average, each large prime appears less than once in the linear sys-
tem, while regular primes appear at least a few times on average. Moreover,
large prime unknowns always have coe¬cients 1 or ’1. Of course, an un-
known that appears only once can be safely removed, with the corresponding
equation from the linear systems. Thus, the average large prime essentially
yields a useless equation. However, due to birthday paradox e¬ects (see Chap-
ter 6), a small fraction of large primes appears at least twice. If there is at
most one large prime in each equation, it su¬ces to merge the two equations
where a given large prime occurs as in basic structured Gaussian elimination.
The resulting equation has double weight, thus remaining reasonably sparse.
However, with current index calculus algorithms, we may have up to four large
primes per equation.
Another speci¬city of the large prime variation is that since a large frac-
tion of large primes does not appear, a large fraction of the collected linear
equations is lost. As a consequence, we need to consider a huge system, much
bigger than systems arising from index calculus without large primes. For
this reason, it is important to use algorithms that do not require loading into
memory the complete system of equations. Instead, the algorithms should
work using only the large prime parts of equations. This approach called ¬l-
tering is in particular discussed in [Cav00]. Of course, in a ¬nal stage, it is
necessary to compute the resulting linear system after large prime elimina-
tion. However, this ¬nal stage only involves the small fraction of equations
that remains and is not a real problem.
With this idea in mind, speci¬c algorithms were proposed. These algo-
rithms treat Gaussian elimination as a graph problem. To explain how, let us
¬rst consider the restricted case with at most two large primes per equation.
In that case, we can view each unknown of the large type as a node in the
graph and each equation as an edge between the two large primes that appear
in the equation. When a given large prime appears alone in an equation, we
construct an edge between the large prime and a special node no-unknown.

© 2009 by Taylor and Francis Group, LLC
Linear algebra 119

This representation is helpful, because any linear combination (with ±1 co-
e¬cients) of equations where all large prime unknowns cancel out induces a
cycle in the above graph. Moreover, the converse is “almost” true. If the
linear system is considered modulo 2 as when factoring, since any node in a
cycle appears an even number of times, usually twice, summing all equations
that correspond to the edges of the cycle, the contribution of the large primes
vanishes. When the linear system is modulo a larger prime, it is possible to
adjust the sign of the contribution of each equation in the cycle to make sure
that all unknowns but one vanish. If the special node no-unknown occurs in
the cycle, we can make the contribution of all other nodes vanish and thus
obtain a new equation without large primes. If it does not occur, we need
to rely on luck for eliminating the ¬nal large prime unknown and we succeed
with probability 1/2. In truth, the whole approach is more complicated than
that, because it does not su¬ce to ¬nd a single cycle. To make the most out
of the available equations, we need to construct a set called a cycle base which
generates all possible cycles with redundancies. Final backtracking
After applying any of the above structured Gaussian elimination techniques,
one obtains a reduced sparse system of equations which can be solved using
iterative techniques. However, after that, it is often necessary to turn back to
the initial linear system. Two cases arise depending on the target application.
For factoring, the goal of the linear algebra step is to ¬nd an element of the
kernel of the sparse matrix. For discrete logarithms, the goal is to ¬nd values
for the unknowns.
Let us start by considering the case of discrete logarithms. After apply-
ing the iterative solver, we have values only for the unknowns that remain
in the linear system after structured Gaussian elimination. We would like to
complete this result and recover a value for all unknowns present in the orig-
inal system. This can be done using a very simple backtracking algorithm.
In fact, it su¬ces to scan the initial system of equations and, whenever an
equation with a single undetermined unknown is encountered, to compute a
value for this unknown. Repeating this step until no new value can be added
yields a complete (or almost complete) table of values for the unknowns of the
initial system. Indeed, since structured Gaussian elimination eliminates one
unknown at a time, this backtracking method can recover the missing values
one by one.
The case of factoring is slightly more complicated. Here, we are given
a kernel element as a linear combination of equations in the transformed
system and we need to express it as a linear combination of equations in
the original system. The simplest way to do this is to use journaling in the
structured Gaussian elimination algorithm. In the journal, we give labels to all
initial equations and to all intermediate equations that arise during Gaussian
elimination. Note that, each intermediate equation is a linear combination

© 2009 by Taylor and Francis Group, LLC
120 Algorithmic Cryptanalysis

of two preexisting equations. We now write a kernel element as a linear
combination of equations, allowing all initial, intermediate or ¬nal equations
in the expression. Such a expression can be iteratively transformed using the
journal by replacing each equation by its expression in terms of its immediate
predecessors. At the end, we obtain a linear equation that only involves the
initial equations. Since the linear combinations that represent kernel elements
are not sparse, we can only perform backtracking on a small number of kernel
elements. However, this is enough for factoring purposes.

© 2009 by Taylor and Francis Group, LLC
Linear algebra 121

1h . Construct a linear system of 4 equations in 4 unknowns, without a solu-
tion. Another one, over the ¬eld of reals, with in¬nitely many solutions.
Write a linear system of 4 equations in 4 unknowns, with a unique so-
lution, such that Algorithm 3.4 fails. Choose your example in a way
that ensures that if the order of the equations in the system is reversed,
Algorithm 3.4 succeeds.

2. Put together the code fragments in Programs 3.4 and 3.5 to realize
a complete multiplication of 32 — 32 Boolean matrix. Extend this to
64—64 and 128—128. Pro¬le the execution of the resulting code on your
computer, using the available compiler(s), identify the critical parts of
the program. Without changing the algorithm, rewrite the critical parts
of the code to improve the overall performance.

3h . Assume that we would like to multiply one ¬xed square Boolean matrix
of small dimension by many matrices of the same size. To improve the
performance, we consider the algorithm of four Russians which precom-
putes and stores the results all the possible multiplication for rectangular
slices of the original matrix. Assuming that the number of multipli-
cations is very large, we can ignore the time of this precomputation.
Implement and optimize this on your computer.

4. Program 3.6 uses the 32 — 32 matrix multiplication as a black box, in
order to multiply 32n — 32n Boolean matrices. This can be improved
in many ways. A ¬rst option is to simply share the fold operations
rather than doing them independently for each small multiplication. A
second option is to use the algorithm of four Russians from the previous

5h . In order to make Strassen™s multiplication algorithm running time be-
have more smoothly as a function of the dimension, we should change
the rounding rule and the decision to cut o¬ the algorithm. One simple
but time-consuming option is to run the program for each size using
three possibilities (or two for even dimensions), Strassen with rounding
up, Strassen with rounding down or basic multiplication. After this,
save the most e¬cient option for this dimension in a table. Then, when
the multiplication is called recursively from a larger size, reuse the saved
option. Note that this table of decision needs to be constructed by start-
ing from small sizes and going up.

6. Reprogram a recursive multiplication algorithm using Winograd™s for-
mulas instead of Strassen™s.

© 2009 by Taylor and Francis Group, LLC
122 Algorithmic Cryptanalysis

7h . Consider Schur™s complement method for matrix multiplication. How
would you bypass the problem caused by the fact that D might be non-
invertible? The goal is to achieve asymptotic complexity with exponent
log2 7.

8. Gaussian elimination can also be used for matrices de¬ned over a ring.
For example, consider the issue of solving linear systems of equations
modulo 256 when the determinant of the system is a multiple of 256
(see Section What new di¬culties do appear? How can you
solve them?

9h . Implement the computation of Hermite normal forms over the integers.

Some of the methods presented in this chapter can also be a basis for pro-
gramming projects. Implementations of structured Gaussian elimination or
iterative algorithms are very interesting in this respect, especially in conjunc-
tion with the applications presented in Chapters 11 and 15.

© 2009 by Taylor and Francis Group, LLC
Chapter 4
Sieve algorithms

Many modern algorithms for factoring and computing discrete logarithms
in ¬nite ¬elds heavily rely on sieving in practical implementations1 . These
index calculus algorithms themselves are described in the applications part
of this book, more precisely in Chapter 15. We chose to present the sieve
algorithms in the present chapter in order to focus on the variety of possible
implementations and available trade-o¬s.

4.1 Introductory example: Eratosthenes™s sieve
Eratosthenes (276“194 BC) was an ancient Greek mathematician. Among
his many contributions, he invented one of the ¬rst algorithms known. This
algorithm, called Eratosthenes™s sieve, is an e¬cient method for computing
all prime numbers up to a given bound. In 2004, Atkin and Bernstein [AB04]
found a faster algorithm for achieving this task. However, before that, Eratos-
thenes™s sieve had essentially been unsurpassed for more than two millennia.
In this section, in order to introduce sieve algorithms, we show how to e¬-
ciently implement Eratosthenes™s sieve on modern computers.

4.1.1 Overview of Eratosthenes™s sieve
The basic idea of Eratosthenes™s sieve is to start from a list of integers from 1
up to some initial limit and to remove all non-primes from the list, thus ending
with a list of primes. The ¬rst step is to cross 1, which is a unit, thus not a
prime, in the ring of integers. After that, the smallest remaining number, 2,
is a prime. As a consequence, all multiples of 2 are composite numbers and
can be crossed from the list. The next smallest remaining number is 3, which
is prime and its multiples need to be crossed. After that, we ¬nd in turn the
prime numbers 5, 7, 11 and so on, cross their multiples and continue until

1 Aswe will see in Chapter 15, sieving is only an implementation aspect of these algorithms
and can be ignored where asymptotic analysis is concerned. Nonetheless, it is a very
important practical aspect.

© 2009 by Taylor and Francis Group, LLC
124 Algorithmic Cryptanalysis

we reach the end of our list. To make the description more precise, we write
down the pseudo-code in Algorithm 4.1.

Algorithm 4.1 Eratosthenes™s sieve
Require: Initial limit Lim
Create array IsP rime indexed from 1 to Lim, initialized to true.
IsP rime[1] ←’ false
for i from 2 to Lim do
if IsP rime[i] is true then
j ←’ 2i
IsP rime[j] ←’ false; j ←’ j + i
until j > Lim
end if
end for
for i from 2 to Lim do
if IsP rime[i] is true then
Display(i,™is prime™)
end if
end for

Before going through the process of programming and optimizing this al-
gorithm, let us analyze its main properties. First of all, the algorithm needs
to work with Boolean values. From a theoretical point-of-view, this is ex-
tremely nice. However, the practical use of Booleans in the C language is
not completely straightforward. Indeed, no Boolean type was de¬ned in early
C89, and while an extension was added in C99, it is still not supported by all
compilers. Thus, as in Chapter 3 we use the trick of packing several Booleans
into the machine representation of a single (unsigned) integer. This allows us
to represent the required array of Lim Booleans into Lim/8 bytes of memory.
From the point-of-view of correctness, let us show that the algorithm works
correctly. It is necessary to prove that any prime number has true stored in
the corresponding entry of table IsP rime and that any composite number C
has false stored. Clearly, since the table is initialized with true and since
only 1 and proper multiples of accepted numbers are crossed, no prime number
may be crossed from the list. Thus, we need to check that all composites are
e¬ectively removed. More precisely, this event happens during the ¬rst i loop,
before i reaches value C. Indeed, if C is composite, it has at least one prime
divisor, say P < C. Since P is prime, it remains in the ¬nal list and thus all
its multiples, including C are crossed. Thus the algorithm is correct.
As the runtime analysis goes, the algorithm spends Lim/p steps in the
inner loop to remove the multiples of the prime p (when i is equal to p).

© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 125

Thus, the total running time is bounded by:

. (4.1)

Looking more precisely at the algorithm, we see that every composite is
crossed more than once. This opens the way for a slight improvement, it
su¬ces to cross multiples of p starting from p2 . Indeed, each composite C has
at least one prime divisor p such that √2 ¤ C. This reduces the number of
iterations of the outside loop down to Lim and improves the running time.
However, asymptotically, the gain is negligible. The main reason is that small
primes have many multiples and sieving over these primes is the costly part of
the algorithm. In the next section, we present possible improvements in order
to speed up practical implementation of Eratosthenes™s sieve. Our starting
reference implementation is the C code of Program 4.1.

4.1.2 Improvements to Eratosthenes™s sieve
Looking at our reference Program 4.1, we quickly discover two main lim-
itations on its range of applicability. The ¬rst limitation, which is easily
overcome, is the fact that with many compilers, the integer type int repre-
sents signed 32-bit integers which cannot count above 231 ’ 1. To remove this
limitation, it su¬ces to de¬ne the loop variables i and j as 64-bit integers.
The next limitation is the need to store the Boolean array containing one bit
for each number up to the considered limit. Even with a 1 Gbyte memory,
this prevents us from building primes further than 233 . Wheel factorization
Our ¬rst improvement pushes this limit further by focusing on the regularity
of the distribution of multiples of small primes in order to use less memory.
The most basic example of this idea is to remove all even numbers from our
table in memory. This is safe since the only even prime is 2. Moreover, this can
be done without any essential change to the sieving process. Indeed, if x is an
odd multiple of p, the next odd multiple of p is x + 2p whose representation
is located p steps further than the representation of x in memory. As a
consequence, this improvement also allows us to run Eratosthenes™s sieve twice
as far for the same cost in terms of both time and memory. To push this idea
further, let us consider the three primes 2, 3 and 5. Between, 1 and 30, only
eight numbers, namely 1, 7, 11, 13, 17, 19, 23 and 29, are not multiples of
either 2, 3 or 5. Moreover, adding 30 or a multiple of 30 to any number
preserves the properties of being multiple of 2, 3 or 5. As a consequence,
in any interval of 30 consecutive integers, we know that Eratosthenes™s sieve
already removes 22 numbers, simply by sieving over the small primes 2, 3
and 5. We can use this fact and restrict our table of Booleans in memory

© 2009 by Taylor and Francis Group, LLC
126 Algorithmic Cryptanalysis

Program 4.1 Basic C code for Eratosthenes™s sieve
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int packedbool;
packedbool *_IsPrime;

#define NpckBool (8*sizeof(packedbool))
#define GetIsPrime(x) \
#define SetCompo(x) \

SievePrimes(int Limit)
int i,j, tabLimit;

for (i=0;i<tabLimit;i++) _IsPrime[i]=˜0;
for (i=2;i*i<=Limit;i++){
if (GetIsPrime(i)) {
for (j=i*i;j<=Limit;j+=i){
SetCompo(j); } } }

printf("List of primes up to %d:\n",Limit);
for (i=2;i<=Limit;i++){
if (GetIsPrime(i)) {
printf("%d\n",i); } } }

int Limit;
printf("Enter limit ?");
_IsPrime=(packedbool *)malloc((Limit+NpckBool-1)/8);

© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 127

to consider only numbers of the form 30k + δ, for all integers k and for δ
chosen among 1, 7, 11, 13, 17, 19, 23 and 29. This allows us to represent more
than three times more numbers than the initial sieve into the same amount
of memory. In order to visualize the algorithmic principle, it is useful to
represent the numbers on a modular wheel of perimeter 30 as in Figure 4.1.
For this reason, this particular approach to Eratosthenes™s sieve is often called
wheel factorization.

29 1
28 2
& ¡
27 3
& ¡
26 4
& ¡
25 5
& ¡
24 6
& ¡
23 7
30k + δ
22 8
& ¡
21 9
& ¡
20 10
& &
& &
19 11
18 12
& &
& &
17 13
16 15 14

Figure 4.1: Schematic picture of wheel factorization

Of course, we can even add more small primes into the picture to further
improve the ratio, going up to the small prime 11, we restrict ourselves to
480 numbers out of 2310, thus gaining a factor of almost ¬ve. Using wheel
factorization also speeds up the computation. Indeed, it removes the need
to sieve over the smallest, more costly, primes. However, the approach has
the drawback of complicating the matter of sieving over larger primes, since
the locations of multiples of these primes are no longer evenly distributed
in memory. To show the di¬culty, let us consider the distribution of the
multiples of 7 in a wheel of perimeter 30. We already know that our memory
only holds numbers which are multiples of neither 2, 3 nor 5. We now need to
locate multiples of 7 among these numbers. They are obtained by multiplying

© 2009 by Taylor and Francis Group, LLC
128 Algorithmic Cryptanalysis

together 7 with a number in the wheel, since multiplication by 7 does not
change the divisibility by 2, 3 or 5. Thus, we need to mark all numbers of the
form: 7 · (30k + δ) for all k and for δ in 1, 7, 11, 13, 17, 19, 23 and 29. These
numbers form eight series, one for each value of δ and each series consists in
evenly distributed numbers in memory, separated by a distance of 7 — 8 as
shown in Figure 4.2, with one series in each column. However, the ¬gure also
shows that the starting points of the di¬erent series are not evenly spaced. In
our example, the starting points correspond to the numbers 7, 49, 77, 91, 119,
133, 161 and 203 and their respective locations in memory are 1, 13, 20, 24, 31,
35, 42, 54; assuming that 1 is at location 0. This uneven distribution makes
the sieving code more complicated. We need either to sieve with irregular
steps or to repeat the sieve eight times, once for each of the di¬erent series.
With a wheel of ¬xed size, this can be achieved by embedding all the di¬erent
cases into the program.
Alternatively, we can let the size of the wheel grow with the upper bound
Lim, this is the dynamic wheel approach initially proposed by Pritchard
in [Pri81]. Programming dynamic wheels is trickier and adds some signi¬cant
computational overhead, which may be costly in implementations. However,
it truly improves asymptotic complexities as shown in Section A way
to avoid part of the di¬culty is to use dynamic compilation, i.e., to write
a program which given the input bounds generates a new program with an
adapted wheel size. Segmented sieve
Our second improvement aims at removing the memory limitation alto-
gether. It is based on a re¬nement of the remark that only primes up to
square root of any composite need to be considered in order to cross this com-
posite. As a consequence, the sieving process only needs to know the prime
numbers up to the initial limit in order to proceed further up. Thus, the main
part of the computation only depends on a short initial segment of Eratos-
thenes™s sieve. This implies that the computation can be split into several
smaller computations, each within an interval of length equal to the square
root of the initial limit. Each of these smaller computations is essentially in-
dependent of the previous ones, the notable exception being the computation
of the primes in the ¬rst of these intervals. Doing this, we can reduce the
amount of required memory drastically, representing only short intervals of
square root length. However, sieving each interval independently adds some
computational overhead. Indeed, we need to compute the location of the ¬rst
multiple of each prime in each interval. In fact, it is better to reuse some
of the memory we just saved in order to remember this location from one
interval to the next. This is illustrated by the C code of Program 4.2.
This approach, which works with each segment in turn, is called segmented
sieve. From a complexity point-of-view, it greatly reduces the amount of
required memory. However, the number of operations to perform remains

© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 129

Program 4.2 Small memory code for Eratosthenes™s sieve
#include <stdio.h>
#include <stdlib.h>
typedef unsigned int packedbool; typedef long long int int64;
packedbool *_IsBPrime; packedbool *_IsPrime; int64 *Offset;
#define Npck (8*sizeof(packedbool))
#define GetBasePrime(x) (_IsBPrime[(x)/Npck]>>((x)%Npck))&1
#define SetBaseCompo(x) _IsBPrime[(x)/Npck]&=˜(1UL<<((x)%Npck))
#define GetIsPrime(x) (_IsPrime[(x)/Npck]>>((x)%Npck))&1
#define SetCompo(x) _IsPrime[(x)/Npck]&=˜(1UL<<((x)%Npck))
#define Action(p) printf("%lld\n",(int64)(p))

int InitialSievePrimes(int Limit) {
int i, j, count, tabLimit;
count=0; tabLimit=(Limit+Npck-1)/Npck;
for (i=0;i<tabLimit;i++) _IsBPrime[i]=˜0;
for (i=2;i*i<Limit;i++){
if (GetBasePrime(i)) { count++;
for (j=i*i;j<Limit;j+=i) SetBaseCompo(j); } }
for (;i<Limit;i++){if (GetBasePrime(i)) count++;}
Offset=(int64 *)malloc(count*sizeof(int64)); count=0;
for (i=2;i<Limit;i++){
if (GetBasePrime(i)) { Action(i);
j=Limit%i; if (j!=0) j=i-j; Offset[count]=j;
count++; } } }

int SievePrimesInterval(int64 offset, int Length) {
int i, j, count, tabLimit;
count=0; tabLimit=(Length+Npck-1)/Npck;
for (i=0;i<tabLimit;i++) _IsPrime[i]=˜0;
for (i=2;i<Length;i++) {
if (GetBasePrime(i)) {
for (j=Offset[count];j<Length;j+=i) SetCompo(j);
Offset[count]=j-Length; count++; } }
for (i=0;i<Length;i++) {if (GetIsPrime(i)) Action(offset+i); }

© 2009 by Taylor and Francis Group, LLC
130 Algorithmic Cryptanalysis

Program 4.2 Small memory code for Eratosthenes™s sieve (continued)
main() {
int i,j; int64 Limit, tmp; int sqrt;

printf("Enter limit ?"); scanf("%lld",&Limit);

for(tmp=0;tmp*tmp<=Limit;tmp++); sqrt=tmp;

_IsBPrime=(packedbool *)malloc((sqrt+Npck-1)/8);
_IsPrime=(packedbool *)malloc((sqrt+Npck-1)/8);



© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 131

99 373
98 371
97 367
96 361
95 359
94 353
93 349
92 347
91 343
90 341
89 337
88 331
87 329
86 323
85 319
84 317
83 313
82 311
81 307
80 301
79 299
78 293
77 289
76 287
75 283
74 281
73 277
72 271
71 269
70 263
69 259
68 257
67 253
66 251
65 247
64 241
63 239
62 233
61 229
60 227
59 223

Numbers represented
58 221
57 217
Memory locations

56 211
55 209
54 203
53 199
52 197
51 193
50 191
49 187
48 181
47 179
46 173
45 169
44 167
43 163
42 161
41 157
40 151
39 149
38 143
37 139
36 137
35 133
34 131
33 127
32 121
31 119
30 113
29 109
28 107
27 103
26 101
25 97
24 91
23 89
22 83
21 79
20 77
19 73
18 71
17 67
16 61
15 59
14 53
13 49
12 47
11 43
10 41
9 37
8 31
7 29
6 23
5 19
4 17
3 13
2 11
1 7
0 1

Figure 4.2: Multiples of 7 in a wheel of perimeter 30

© 2009 by Taylor and Francis Group, LLC
132 Algorithmic Cryptanalysis

essentially the same and the approach does not seem to o¬er any improvements
to the running time. Yet, in practice, doing the sieve in a segmented way
allows to work in a very small amount of memory. For example, with an
upper-bound Lim = 234 , above the limit we previously had with a 1 Gbyte
memory, the basic segment contains 217 integers and is represented by 217
bits or 16 Kbytes. On most modern computers, this ¬ts without any di¬culty
in the ¬rst level of cache. Since sieving is a simple process that requires few
operations, the time of the main memory accesses dominates the computation
of our reference code for sieving. With the segmented sieve, the memory
accesses in cache are about twice as fast. It is interesting to note that the
gain is quite small compared to the most extreme memory behaviors. One of
the reason is that the basic sieve is already quite good in terms of its memory
access structure. Remember that at lot of time is spent sieving the small
primes and that for small primes we access a sequence of memory positions
not too far from each other. Another reason is that the computational load
for packing the Boolean is not negligible and hides part of the gain. Fast practical Eratosthenes™s sieve
Of course, nothing prevents us from combining both improvements to obtain
an even faster code. However, before devising a combined approach, it is
essential to pro¬le our previous codes in order to understand their limitations.
The very ¬rst remark that arises during pro¬ling is that a huge portion of the
running time is used to simply print the prime numbers. From an algorithmic
point-of-view, this is irrelevant, indeed there are many possible applications
to computing prime numbers and many of them do not involve any printing.
For example, we may want to compute general statistics on the distribution
of primes or we may require the creation of prime numbers as pre-requisite
for a subsequent algorithm. As a consequence, before pro¬ling it is important
to replace the action of printing the primes by a di¬erent, less costly action.
One simple possibility is to increment a counter and obtain the total number
of prime numbers below the considered bound.
Pro¬ling this code shows that when building the primes up to 109 , about
three quarters are spent in the sieving process itself, the ¬nal quarter of the
running time is spent in the last line of SievePrimesInterval, where the
Boolean table with the prime status is reread from memory. This is quite
surprising, because from an asymptotic point-of-view, this part of the code
has complexity O(Lim) compared to O(Lim · log log(Lim)) for the sieve itself
and should be negligible. However, for practical values of the bound Lim,
the resulting value log log(Lim) is very small, which explains the apparent
discrepancy. Fast asymptotic Eratosthenes™s sieve
From an asymptotic point-of-view, there are several di¬erent optimums
when looking at the sieve of Eratosthenes, depending on the relative contri-

© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 133

bution of time and space to the complexity. In particular, in [Sor98], Sorenson
presented four di¬erent combinations with di¬erent balance of time and space.
We can either try to optimize the asymptotic running time or take into
account both the running time and the memory use. When considering the
asymptotic running time only, the best known variation on Eratosthenes™s
sieve is an algorithm by Pritchard, described in [Pri81]. This algorithm uses
wheel factorization with the largest possible wheel applicable to the upper
sieve bound Lim given in input. Clearly, the wheel size is a function of Lim and
this approach needs to deal with the additional technicalities of using wheel
of varying size. The resulting algorithm is sublinear in time but requires a
large amount of space.
Another approach proposed by Pritchard [Pri83] is using both wheels and
a segmented sieve. It only achieves linear time but reduces the memory re-
quirements down to the square root of the bound Lim.

4.1.3 Finding primes faster: Atkin and Bernstein™s sieve
As we saw above, asymptotically, the sieve of Eratosthenes can be sublinear;
it can also be performed using a small amount of memory. However, it is not
known how to achieve both properties simultaneously for this sieve. This has
been an open problem for a long time. In 1999, Atkin and Bernstein proposed
in [AB04] a new algorithm, not based on Eratosthenes™s sieve, that solves this
problem. This algorithm is constructed from 3 subroutines each addressing a
subset of the prime numbers, using a characterization of primes which does
not make use of divisibility. The ¬rst algorithm can be used to ¬nd primes
congruent to 1 modulo 4, the second ¬nds primes congruent to 1 modulo 6
and the third ¬nds primes congruent to 11 modulo 12. Since all primes, but 2
and 3, are of these forms these three subroutines su¬ce. For primes congruent
to 1 modulo 12, we need to choose either the ¬rst or second subroutine.
Let us start with the case of primes congruent to 1 modulo 4. The algorithm
is based on the following theorem:

THEOREM 4.1 (Th. 6.1 in [AB04])
Let n be a square-free positive integer with n ≡ 1 (mod 4). Then n is prime
if and only if the cardinality of the set (x, y)|x > 0, y > 0, 4x2 + y 2 = n (or
equivalently of the set (x, y)|x > y > 0, x2 + y 2 = n ) is odd.

The ¬rst subroutine follows quite naturally from this theorem. It contains
two phases: the ¬rst phase computes the cardinality of the set given in The-
orem 4.1, the second phase removes non-square-free integers. This results in
Algorithm 4.2. For more e¬ciency, this algorithm can be combined with a
wheel. For example, in [AB04] it is presented (as Algorithm 3.1) using a wheel
of 30. Note that such a wheel is easily combined with the congruence condi-
tion modulo 4, thus resulting in an extended wheel of length 60 containing 8

© 2009 by Taylor and Francis Group, LLC
134 Algorithmic Cryptanalysis

di¬erent numbers.

Algorithm 4.2 Sieve of Atkin and Bernstein for primes ≡ 1 (mod 4)
Require: Initial range Lim . . . Lim + 4Count with Lim ≡ 1 (mod 4)
Create array IsOdd indexed from 0 to Count, initialized to false.
for all (x, y, i) with x > 0, y > 0, i ¤ Count, 4x2 + y 2 = Lim + 4i do
Negate the Boolean value of IsOdd[i]
end for
for all prime q with q 2 ¤ Lim + 4Count do
for all Lim + 4i multiples of q 2 do
Set IsOdd[i] to false
end for
end for

The other subroutines derive from two other Theorems 4.2 and 4.3. They
can also be implemented using extended wheels of 60, respectively containing
eight and four numbers.

THEOREM 4.2 (Th. 6.2 in [AB04])
Let n be a square-free positive integer with n ≡ 1 (mod 6). Then n is prime
if and only if the cardinality of the set (x, y)|x > 0, y > 0, 3x2 + y 2 = n is

THEOREM 4.3 (Th. 6.3 in [AB04])
Let n be a square-free positive integer with n ≡ 11 (mod 12). Then n is
prime if and only if the cardinality of the set (x, y)|x > y > 0, 3x2 ’ y 2 = n
is odd.

Putting together the three subroutines implemented with an extended wheel
of 60 gives a practically e¬cient code. A package called primegen written by
Bernstein and distributed on his web pages gives an e¬cient implementation
of this algorithm. Moreover, from an asymptotic point-of-view, this new al-
gorithm improves upon the best algorithms based on Eratosthenes™s sieve. In

this case, we need to use a wheel containing all the primes up to log Lim and
achieve running time O(Lim/ log log(Lim)) using Lim1/2+o(1) bits of memory. Further improvements of Atkin and Bernstein™s sieve
While more e¬cient than algorithms based on the sieve of Eratosthenes,
Atkin-Bernstein™s sieve share a drawback with it. Assume that we do not want
all the primes up to Lim, but only the primes in some interval [Lim1 , Lim2 ].

Then, we see that there is a ¬xed start-up cost O( Lim2 ) which is independent

© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 135

of the length of the interval. With the sieve of Eratosthenes, this cost comes

from creating the primes up to Lim2 and ¬nding their smallest multiples in
the interval. For Atkin and Bernstein™s sieve, the cost comes from the number
of pairs (x, y) we need to enumerate. Let us consider the ¬rst case that covers
primes congruent to one modulo 4. There, we need to consider all pairs (x, y)
with Lim1 ¤ 4x2 + y 2 ¤ Lim2 . Equivalently, we need to enumerate all pairs
(x , y) with Lim1 ¤ x + y 2 ¤ Lim2 and x even. This can be done by ¬nding
all points √
with integer coordinates located between two concentric circles of

diameter Lim1 and Lim2 . When Lim1 is near from Lim2 , it seems natural
to expect that this task requires a walk inside the perimeter of the outer circle

and costs O( Lim2 ). This is indeed the behavior of the algorithm proposed
by Atkin and Bernstein in [AB04].
Surprisingly, it is possible to enumerate all integer points between two
nearby concentric circles in a more e¬cient manner. The key idea comes
from a theorem of van der Corput and it has been applied to the algorithm of
Atkin and Bernstein by Galway in his PhD thesis. Using this technique, he
devised a √“dissected” sieve based on Atkin-Bernstein™s whose overhead cost
is only O( 3 Lim2 ). Using this algorithm, he could enumerate primes in small
intervals for much larger values of Lim2 .
At the time of writing, this dissected sieve idea is not used in any cryp-
tographic application. Thus, we do not describe it in detail and refer the
interested reader to Galway™s thesis [Gal04].

4.2 Sieving for smooth composites
In the previous section, we used sieve algorithms to ¬nd prime numbers.
However, another very frequent application of sieve algorithms in cryptog-
raphy searches for composite numbers (resp. polynomials) with many small
enough factors, which are usually called smooth numbers (resp. polynomials).
Applications are discussed in Chapter 15. In the present chapter, we simply
focus on the sieve algorithms themselves and show the many possible variants,
together with their performance issues. When sieving for composites instead
of primes, there is a ¬rst, essential di¬erence. In order to locate primes, a
Boolean table is enough, indeed, whenever a number has a divisor, it is com-
posite and can be removed. When locating smooth numbers, this is no longer
true, indeed, we need to check that each number has enough divisors and thus
we should be able to count higher than 1.
Another important di¬erence between the sieve algorithms in Sections 4.1
and 4.1.3 and the algorithms in this section is that instead of simply searching
for numbers in an interval we may have an additional degree of freedom and
search in higher dimensional sets. The most frequent case is to consider two

© 2009 by Taylor and Francis Group, LLC
136 Algorithmic Cryptanalysis

dimensional spaces. In this context, there are several possible strategies to
cover the sieving zone. We present two classical options: line and lattice
Finally, when using the algorithms presented in this section in applications,
we need to ¬nd many smooth objects but it is usually not essential to retrieve
all of them. We can overlook some without any problems as long as we ¬nd
many quickly enough.
Due to these di¬erences, to optimize the sieve algorithms in this section, we
cannot follow the same approaches as before and we need to develop speci¬c

4.2.1 General setting
The most frequent case for sieve algorithms, especially in index calculus
methods (see Chapter 15), is to consider algebraic objects of the form a + b±
where ± is ¬xed and a and b vary. Without going into the mathematical
details, which are given in Chapter 15, we need to explain what “smoothness”
means in this context. In fact, there are two di¬erent mathematical settings
to consider. The varying elements a and b may be either numbers (integers)
or univariate polynomials over a small ¬nite ¬eld. The two settings are very
similar from a high level point-of-view, but the implementation details vary a
lot between the two. Moreover, the mathematical language used to described
both cases is slightly di¬erent. For now, we give an overall description using
the language corresponding to the case of numbers. Our ¬rst remark is that
we only consider pairs (a, b) where a and b are coprime. Otherwise, we would
be considering multiples of previously seen objects. In the context of index
calculus, this would result in equivalent equations and would not be of any
help. Given such a pair (a, b), we need to consider a + b± modulo a prime p
and test whether we ¬nd 0 or not. To do this, we need to de¬ne the image
of ± modulo p. In general, this image need not be unique, for some primes ±
(1) (1) (2)
has a single image ±p , for some it has two ±p and ±p , and so on . . .
There are also primes with no image at all for ±, these primes can never
occur among the factors of a + b±. For both theoretical and practical reasons,
it is best to consider each image of ± for the same p as a di¬erent factor. With
(i) (i)
this convention, we say that a pair (p, ±p ) divides a + b± when a + b±p ≡ 0
(mod p). From a geometric point-of-view, the pairs (a, b) associated to objects
of the form a + b± divisible by (p, ±p ) are nicely distributed in the Euclidean
plane. More precisely, they form a 2 dimensional lattice. This is easily shown,
(i) (i)
let a + b± be divisible by (p, ±p ), then a + b±p ≡ 0 (mod p). Thus there
exists an integer » such that a = ’b±p + »p. This implies that:
a p
=b +» . (4.2)
b 0
Thus, a point associated to an element a+b± divisible by (p, ±p ) belongs to

© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 137

the 2-dimensional lattice generated by the two vectors appearing in the right-
hand side of Equation (4.2). Conversely, we can check that both generating
(i) (i)
vectors satisfy the divisibility equation, since ’±p + 1 · ±p ≡ 0 (mod p) and
p + 0 · ±p ≡ 0 (mod p). By linearity, any integer linear combination of the
two vectors also satis¬es the divisibility equation. As a consequence, the set
of points associated to elements a + b± divisible by (p, ±p ) is precisely this
2-dimensional lattice.


Figure 4.3: Set of points a + b± divisible by (11, 5)

To illustrate this geometric point-of-view, we choose the following simple
example: let us take the prime p = 11 and an image for ±: ±11 = 5. In this
case, the set of points divisible by (11, 5) is represented by Figure 4.3. In this
example, we see that the generating vectors we gave for this lattice are not
the best possible choice. The two dotted vectors in Figure 4.3 give a better
description of the lattice. Since the lattice is two-dimensional, it is not neces-
sary to use the general purpose algorithms described in Chapter 10 to obtain
this good representation. We can either use Gauss reduction as described in
Chapter 10 or adapt a continued fraction algorithm to our purpose. For now,
we assume when necessary that each pair (p, ±p ) is given together with a
good basis describing the corresponding lattice.

Adaptation to polynomials We mentioned at the beginning of the present
section that we adopt here the language of numbers. In the case of polyno-
mials, the necessary adaptations are quite natural. When considering a + b±,
a and b become univariate polynomials over a ¬nite ¬eld, say a(t) and b(t).
The primes p are replaced by irreducible polynomials p(t) over the considered

© 2009 by Taylor and Francis Group, LLC
138 Algorithmic Cryptanalysis

¬nite ¬eld. The images ±p also become polynomials and we say that a pair
(i) (i)
(p(t), ±p (t)) divides a(t) + b(t)± when a(t) + b(t)±p (t) is a multiple of p(t)
over the ¬nite ¬eld. Smoothness
Now that we know how to ¬nd divisors of an element a + b±, we are ready
to de¬ne the notion of smoothness. We start from a smoothness basis B
containing all pairs (p, ±p ) with p ¤ B, where B is called the smoothness
bound. We say that a + b± is B“smooth if and only if all its divisors belong to
B. To detect a smooth element, we check that product of the primes occuring
in its divisors is near a number called the norm of a + b± which roughly gives
the size of a + b±. This notion of norm is de¬ned precisely in Chapter 15. For
the rest of this chapter, we do not need an exact de¬nition and simply need
to know that a number is B“smooth if and only if its norm is a product of
prime numbers below B.
Without going into details, let us mention an additional complication due
to the fact that the notion of divisibility we presented earlier is also associated
with a notion of multiplicity: a pair (a, b) may be divisible by (p, ±p ) more
than once. We chose to ignore that point, which is needed for the algorithms
in Chapter 15, since the notion of multiplicity is usually overlooked by sieving
This means that when computing the product of the primes occuring in the
divisors of a + b±, we do not correctly account for the contribution of multiple
factors. Thus, there is no guarantee that the product is equal to the norm,
but on average we expect it to be close to it. Indeed, we expect multiple
factors to be of small size and the overall multiple contribution to remain
small. Of course, we could also try to account for divisors that occur multiple
times; however, the additional computational cost makes this approach much
less practical. Note that in practice, instead of computing the above prod-
uct, it is more convenient to compute an additive equivalent by adding the
logarithms of the primes occuring in divisors. An even simpler alternative is
to compute the numbers of di¬erent divisors, if it is high enough, we expect
a + b± to be smooth quite often. Of course, using these simpli¬cations, i.e.,
ignoring multiplicities and counting the numbers of divisors only, we may well
miss some smooth elements. However, since we only need to ¬nd many, not
all, this is not a real problem. A very important parameter when detecting
smooth candidates is the detection threshold. It should be carefully chosen by
balancing the rate of false candidates and the rate of good candidates being
overlooked. Basic Lattice Algorithm
With all these elements in mind, we are now ready to give our ¬rst algorithm
to sieve for smooth elements a + b± with |a| and |b| smaller than sieve limits

© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 139

Sa and Sb . As said before, we are only interested in pairs (a, b) where a and
b are coprime. Moreover, we may remark that it is useless to consider both
a + b± and its opposite. As a consequence, we may assume that a ≥ 0. We
need to explore as sieving zone a rectangle of Sa by 2Sb ’ 1 points. Using a
direct approach to explore the whole sieving zone results in Algorithm 4.3.

Algorithm 4.3 Two-dimensional sieving for smooth numbers
Require: Smoothness basis B
Require: Sieve limits Sa and Sb
Require: Expected value LogV al for smooth candidates
Create two-dimensional array LogSum indexed by [O · · · Sa ’ 1] — [’(Sb ’
1) . . . Sb ’ 1] initialized to zero
for all P = (p, ±p ) in B do
for all (a, b) ∈ [O · · · Sa ’ 1] — [’(Sb ’ 1) . . . Sb ’ 1] with a + b± divisible
by P do
Add log p to LogSum[a, b]
end for
end for
for all (a, b) ∈ [O · · · Sa ’ 1] — [’(Sb ’ 1) . . . Sb ’ 1] do
if LogSum[a, b] ≥ LogV al then
Check whether a + b± is really smooth
If so Display((a, b),˜corresponds to a smooth value™)
end if
end for

As stated, this algorithm is not complete; we need to know how to walk
over multiples of (p, ±p ) and also how to check whether a candidate a + b±
is really smooth or not. Moreover, even in this basic form, some potential
improvements are already worth considering: Can we expect to go faster than
this basic algorithm? Can we use less memory and work with a smaller array?
In this section, we now give general answers to these questions. More detailed
and speci¬c answers are discussed in Sections, and

(i) Walking the multiples. Given an element (p, ±p ) of the smoo-
thness basis B, we need a good presentation of the corresponding lattice given
by two basis vectors u and v. After that, we search for the lattice elements
within the total sieve area. This is a reasonably simple task performed by
starting from a known lattice element, usually the point at origin and by
adding multiples of the two basis vectors in order to ¬nd new points. This is
usually done by nesting two loops, the inner loop goes through the sieve area
by adding multiples of one basis vector, say u, thus drawing a line. The outer
loop adds multiples of the other basis vector v. This basic idea needs to be

© 2009 by Taylor and Francis Group, LLC
140 Algorithmic Cryptanalysis

adapted to each speci¬c setting and this can be done quite e¬ciently for all
of our special cases. Speeding up the approach. One basic technique to speed
up the sieve algorithm is to remark that an element of B associated with a
prime p corresponds to a walk over a 1/p fraction of the sieving zone. As
a consequence, elements associated to small primes contribute more to the
runtime. Moreover, these elements o¬er a low added value: knowing that
the norm of an element is divisible by a small prime is a smaller step toward
smoothness. To overcome these two drawbacks, one classical technique is to
remove from the sieve algorithm all elements of the smoothness basis below a
given bound. Checking for smoothness. The easiest way to check for smoo-
thness is simply to compute the norm of a + b± and then to factor this norm
using a fast algorithm. At this point, the cases of numbers and polynomials
are quite di¬erent. Indeed, as recalled in Chapter 2, factoring polynomials
can be done in polynomial time by e¬cient and practical algorithms, thus
this approach seems to be a good start in the case of polynomials. In the case
of numbers, among known algorithm, the better suited for factoring numbers
with many small factors is the elliptic curve factoring method (ECM, see
Chapter 14). As a consequence, in this case, checking smoothness seems to
be much more di¬cult and other methods need to be devised to remove this
practical obstruction. Note that from a complexity theoretic point-of-view,
using ECM is not a real problem and does not change the overall complexity.
However, in practice, getting rid of ECM, wherever it is possible, greatly
improves the e¬ciency. Interestingly, these methods can even be used in the
case of polynomials to speed up things a little.
The best approach is to view the problem of checking for smoothness as a
speci¬c factoring task. We need to factor a list of candidates in order to keep
the smooth ones. As with ordinary factoring tasks, it is a good idea to start
by using trial division, thus checking for elements of the smoothness basis
associated to small primes, whether they divide each candidate. It is essential
to follow this approach for the very small primes that were omitted from
the initial sieving phase. When a candidate is divisible by such an element
we divide its (precomputed) norm by the corresponding prime, taking the
multiplicity into account. As a consequence, after going through all the very
small primes, we are left for each candidate with a reduced norm without very
small factors. Once this is done, some candidates have a large reduced norm
and others a smaller one. Clearly, candidates with a large reduced norm have
a smaller probability of being smooth. As a consequence, to speed up things,
it is possible to ¬lter the candidates, removing those with a large norm. After
that, some additional trial division tests can be performed for slightly larger
primes. This idea of ¬ltering, i.e., of aborting early for bad candidates was


. 5
( 18)