

each tooth is considered as an independent skeletal element, the skull is considered
one element, and vertebrae, carpals, and tarsals are ignored, each individual deer has
the same number of skulls, mandibles, scapulae, humeri, radii, ulnae, innominates,
femora, tibiae, and fibulae (distal end only in ungulates) as each individual beaver.
But a beaver has two clavicles whereas a deer has none; a beaver has four incisors
whereas a deer has eight; a beaver has sixteen metapodials whereas a deer has four;
and a beaver has forty-eight phalanges (first, second, and third) whereas a deer has
twenty-four. Thus a single deer can be conceived (for sake of discussion) to have fifty-five
identifiable elements whereas an individual beaver has (for sake of discussion)
eighty-nine identifiable elements.
If we follow Holtzman's (1979) procedure, then (NISP) abundances of beaver
and deer at Cathlapotle change to the WAE values indicated in Table 3.16. Notice
that the relative abundances of beaver and deer do not change whether NISP or
quantitative paleozoology

WAE values are used. Deer always outnumber beaver. Notice as well that the WAE
values provide a sort of fractional MNI in the sense that there are sufficient skeletal
elements of beaver in the precontact assemblage to represent about one and a quarter
individuals, and there are sufficient skeletal elements of beaver in the postcontact
assemblage to represent a bit more than two and a half individuals. Calculation of
Shotwell's CNSE or Holtzman's WAE does not gain us much accuracy in estimating
taxonomic abundances.
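The arithmetic behind these fractional-MNI-like values can be sketched in a few lines. This is a hedged illustration of WAE as characterized here, not Holtzman's published formulation; the NISP figures are hypothetical rather than the Cathlapotle counts of Table 3.16, and the per-individual element counts (55 and 89) are the for-sake-of-discussion values given above.

```python
# Hedged sketch of a WAE-style calculation: divide a taxon's NISP by the
# number of identifiable skeletal elements per individual of that taxon
# (55 for deer, 89 for beaver, per the discussion above).
# The NISP values below are hypothetical, not those of Table 3.16.
elements_per_individual = {"deer": 55, "beaver": 89}

def wae(nisp, taxon):
    return nisp / elements_per_individual[taxon]

print(round(wae(1100, "deer"), 2))   # 20.0
print(round(wae(111, "beaver"), 2))  # 1.25 - a "fractional MNI"
```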
Rogers (2000a) has been concerned with how the differential accumulation of
skeletal parts, in conjunction with the differential preservation of those parts, might
influence estimates of animal abundance (among other things). He notes that empirical
work has focused on the two taphonomic processes of accumulation (what he
terms deposition) and attrition, and that this focus has resulted in major gains in
knowledge. But he was concerned that statistical methods had not developed or improved
at the same pace as empirical knowledge had (Rogers 2000b). Rogers therefore
developed what he termed "analysis of bone counts by maximum likelihood"
(ABCML). ABCML is a complex statistical algorithm into which one plugs various
data such as skeletal part frequencies (see Chapter 6), bulk density per skeletal part (as
a proxy for preservation potential), economic utility per part, and the like. Attrition
of skeletal parts is assumed to be proportional to the bulk density of each part, and
the probability of transport of a part is assumed to be proportional to the utility of
that part. Rogers (2000a:122–123) explicitly acknowledged that ABCML was "incomplete"
because it has empirical weaknesses. These weaknesses include the modeling of attrition
and transport based on density and utility, respectively; actualistic (ethnoarchaeological)
research indicates that both of these processes are influenced by more than
just density and utility. Application of the ABCML protocol to a zooarchaeological
collection (Rogers and Broughton 2001) makes these points explicitly clear for two
reasons. First, the assumptions that are required are numerous, and second, the qualitative
results are characterized as "more trustworthy" and more likely to be correct
than are the quantitative results (Rogers and Broughton 2001:763, 772).
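The two proportionality assumptions can be made concrete with a toy calculation. To be clear, this is not Rogers's ABCML, which is a maximum-likelihood estimator; it is only a sketch of how density and utility might enter such a model, and every number in it is hypothetical.

```python
# Toy sketch (NOT Rogers's ABCML) of the two proportionality assumptions:
# transport probability scales with economic utility, and survival through
# attrition scales with bulk density. All figures are hypothetical.

parts = {
    # part: (elements per carcass, utility index, bulk density in g/cm3)
    "femur":      (2, 100, 0.37),
    "tibia":      (2,  64, 0.50),
    "metatarsal": (2,  30, 0.55),
    "phalanx":    (24, 13, 0.42),
}

def expected_count(carcasses, n_per, utility, density,
                   max_utility=100.0, max_density=0.60):
    transport_p = utility / max_utility   # assumed proportional to utility
    survival_p = density / max_density    # assumed proportional to density
    return carcasses * n_per * transport_p * survival_p

for part, (n, u, d) in parts.items():
    print(part, round(expected_count(50, n, u, d), 1))
```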
It is likely that ABCML has not often been used for two reasons. First, many paleozoologists
lack the statistical sophistication necessary to comprehend what ABCML
is and how it works (Rogers 2000b). Paleozoologists must learn more about statistical
methods, and they must overcome about seven decades of disciplinary historical inertia
that has focused on deterministic questions rather than probabilistic ones (Lyman
1994c). Second, the method requires one to assume much regarding the taphonomic
history of the collection. Has the collection been subjected to differential transport
intrataxonomically, intertaxonomically, or both? Has it been subjected to differential
attrition intrataxonomically, intertaxonomically, or both? A colleague
estimating taxonomic abundances: other methods 137

suggested that other methods analytically overlook these possibilities when
estimating taxonomic abundances, but ABCML deals with them. It also provides
confidence intervals, allowing one to assess how tight (or loose) an estimate is.
Rogers and Broughton (2001) advocate the use of ABCML because of perceived
flaws in simpler analyses. In particular, they correctly note that even nonparametric
measures of association such as Spearman's rho (between, say, NISP and
MNI) assume that NISP counts represent independent tallies. We know NISP counts
do not (or are highly unlikely to) represent tallies of independent specimens (Chapter 2).
Correlation coefficients also assume that different taxa have equal numbers of
identifiable skeletal elements (ignoring fragmentation), and we know that they do
not. Finally, Rogers and Broughton (2001) correctly argue that different values of a
correlation coefficient cannot be related theoretically to the intensity of a taphonomic
process. Correlation coefficients are, however, used by paleozoologists to gain insight
into possible associations; no one builds an argument on a correlation coefficient alone,
but instead consults other pertinent data to help understand why a correlation exists
(or does not). Few paleozoologists infer the intensity or degree of a taphonomic
process simply from the magnitude of a correlation coefficient; rather, a
coefficient is usually interpreted in nominal-scale terms: two variables are correlated,
or they are not. What the presence (or absence) of a correlation means is a
taphonomic issue more than a statistical one.
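For readers unfamiliar with Spearman's rho, a minimal computation shows how easily a strong coefficient arises; the NISP and MNI values for six taxa below are hypothetical (and tie-free, so the classic rank-difference formula applies), and, as argued above, the coefficient by itself says nothing about which taphonomic process produced it.

```python
# Spearman's rho from scratch for tie-free data (hypothetical values).
def ranks(values):
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

nisp = [276, 103, 37, 20, 15, 13]  # hypothetical per-taxon NISP
mni = [12, 6, 4, 5, 3, 2]          # hypothetical per-taxon MNI
print(round(spearman_rho(nisp, mni), 3))  # 0.943
```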
ABCML presents a detailed view of the sorts of variables that one must consider
in any analysis of transport and attrition and of how those processes are likely
to influence taxonomic abundances. In this I (Lyman 2004c) agree completely
with Rogers's (2000a:123) observation that detailed consideration of the empirical
requirements of ABCML will "help identify the [taphonomic, biological, archaeological,
etc.] parameters that should be estimated and reported." Time will tell if the
method gains popularity among paleozoologists.


In a recent discussion, Orchard (2005) argued that the relationship of bone size
to body size can be used to assist with the estimation of taxonomic abundances.
The method is easy to grasp conceptually. Each skeletal element has a particular
statistical relationship to body size that can be established with museum specimens.
That relationship can be described by a regression equation. Once such equations are
established for multiple skeletal elements, different skeletal elements in a prehistoric
collection may be used to estimate the body sizes represented. Let's say that the most

common skeletal element in a prehistoric collection is right astragali, and the sizes
of those specimens suggest that there are five individuals (= MNI) of varied sizes. If
distal right tibiae suggest that there are only four individuals (= MNI), but one of those
tibiae indicates an individual body size larger than any of the individual body sizes
indicated by right astragali, then a sixth individual is added to the tally of individuals.
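The tallying logic just described can be sketched as follows. The body-size estimates and the tolerance value are hypothetical, and choosing that tolerance is precisely the unresolved problem raised later in this section.

```python
# Sketch of Orchard-style MNI augmentation with hypothetical estimates (kg).
# A specimen of a second element adds an individual only when its size
# estimate matches no estimate already represented by the first element.

astragali = [46.0, 52.7, 59.8, 66.5, 74.8]  # five individuals (MNI = 5)
tibiae = [46.5, 59.5, 66.0, 81.0]           # four individuals, one very large

tolerance = 1.0  # kg: two estimates this close are taken as the same animal

mni = len(astragali)
for t in tibiae:
    if all(abs(t - a) > tolerance for a in astragali):
        mni += 1  # no astragalus-based estimate matches: a new individual
print(mni)  # 6
```

Changing `tolerance` changes the tally, which is the crux of the critique that follows.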
There is a practical problem: "the time and effort involved in gathering comparative
data and generating regression formulae, as well as the difficulty in obtaining
adequate comparative samples, can be prohibitive" (Orchard 2005:357). Generating
regression formulae is relatively easy. It is the process of acquiring the requisite
data that is difficult. Consider, for example, Emerson's (1978) data for white-tailed
deer summarized in Table 3.12. He worked the several weeks of an annual hunting
season. I spent two weeks visiting eight collections of deer skeletons in various museums
and comparative zooarchaeological collections to generate the data presented
in Figure 3.12. Those collections are housed in widely separated localities (Wyoming,
Montana, Washington, British Columbia). There is a more fundamental problem
with Orchard's suggested procedure, however, and it can be illustrated with some of
the data collected when the Figure 3.12 data were collected.
Consider Emerson's (1978) data summarized in Table 3.12. As noted earlier, those
data provide the equation Y = −104.96 + 4.11X, where Y is the body weight or
individual biomass in kilograms, and X is the maximum lateral length of the astragalus
in millimeters. Applying that equation to the seventeen left and seventeen right
astragali constituting bilaterally paired elements of white-tailed deer that I measured
produces the results summarized in Table 3.17. Variation between the individual body
size estimated by the length of a left astragalus and the body size estimated from
the length of its bilaterally paired right mate ranges from 0.25 kilograms to
2.3 kilograms, with an average of 0.94 kilograms. Similar analysis of forty-three pairs
of astragali from mule deer indicates that the difference in body size estimates provided by left
and right elements averages 1.01 kilograms and ranges from 0.08 to 4.11 kilograms.
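These summary figures can be recomputed directly from the astragalus lengths listed in Table 3.17:

```python
# Emerson's (1978) equation applied to the seventeen bilateral pairs of
# Table 3.17: Y = -104.96 + 4.11X (Y = biomass in kg, X = length in mm).
def biomass(length_mm):
    return -104.96 + 4.11 * length_mm

# (right length, left length) for each pair, from Table 3.17
pairs = [
    (38.78, 38.36), (40.38, 40.08), (43.10, 42.86), (39.90, 40.00),
    (36.84, 36.74), (43.74, 43.32), (38.94, 38.88), (42.76, 42.44),
    (38.94, 38.38), (43.74, 43.46), (42.86, 42.94), (41.72, 41.36),
    (41.80, 41.92), (37.44, 37.54), (39.70, 39.82), (40.88, 40.72),
    (40.52, 40.38),
]

diffs = [abs(biomass(r) - biomass(l)) for r, l in pairs]
print(round(min(diffs), 2), round(max(diffs), 2),
      round(sum(diffs) / len(diffs), 2))  # 0.25 2.3 0.94
```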
The problem that presents itself is precisely the one illustrated in Figure 3.13, but
the variables are different. In the former case the problem concerned the degree of
symmetry of distal left and right humeri in terms of size; now it is symmetry in
estimates of body weight derived from the size of astragali. Recall that only those
specimens that provide asymmetrical results (that do not have bilateral mates in the
collection) will add to the tally of individuals represented by an assemblage of
skeletal elements. What tolerance level should be chosen, and why? How symmetrical
should the two estimates of body size be in order to conclude that the size of the same
individual has been estimated twice? Orchard (2005) provides no guidance, and he is

Table 3.17. Estimates of individual body size (biomass) of seventeen
white-tailed deer based on the maximum length of right and left astragali.
Estimation equation is Y = −104.96 + 4.11X, where Y is the body weight or
individual biomass in kilograms, and X is the maximum lateral length of the
astragalus in millimeters (after Emerson 1978)

Pair   Length of right   Right weight   Length of left   Left weight   Difference
1 38.78 54.426 38.36 52.700 1.726
2 40.38 61.002 40.08 59.769 1.233
3 43.10 72.181 42.86 71.195 0.986
4 39.90 59.029 40.00 59.440 0.411
5 36.84 46.452 36.74 46.041 0.411
6 43.74 74.811 43.32 73.085 1.726
7 38.94 55.083 38.88 54.837 0.246
8 42.76 70.784 42.44 69.468 1.316
9 38.94 55.083 38.38 52.782 2.301
10 43.74 74.811 43.46 73.661 1.150
11 42.86 71.195 42.94 71.523 0.328
12 41.72 66.509 41.36 65.030 1.479
13 41.80 66.838 41.92 67.331 0.493
14 37.44 48.918 37.54 49.329 0.411
15 39.70 58.207 39.82 58.700 0.493
16 40.88 63.057 40.72 62.399 0.658
17 40.52 61.577 40.38 61.002 0.575

wise not to do so (Lyman 2006a). All of the problems that attend identifying bilateral
pairs also plague Orchard's (2005) method.


None of the quantitative units and methods occasionally used to estimate or measure
taxonomic abundances reviewed in this chapter has been widely adopted or seen
more than sporadic use. When Grayson (1984) wrote his synopsis of quantitative
zooarchaeology, he focused on NISP and MNI because they were at the time the
most widely used units. His discussions of other methods, such as the Lincoln–Petersen
index, were terse. I have attempted here to empirically support, or refute,
claims regarding these other methods. Thus, we find that biomass and meat weight

estimates compound many weaknesses of MNI because of requisite assumptions
regarding average live weights and edible tissue amounts. Ubiquity as a measure
of the "importance" of a taxon is strongly influenced by sample size measured as
NISP; it provides information on taxonomic abundance that is virtually identical
to information provided by NISP. Ubiquity might measure some as yet unknown
(taphonomic?) variable if two taxa with statistically indistinguishable sample sizes
have different ubiquities, but what that variable might be is unclear.
Numerous quantitative methods have been proposed as improvements to MNI.
Virtually all involve investing what Theodore White thought would be a great deal of
time determining which left elements pair up bilaterally with which right elements.
The validity of estimates of taxonomic abundance provided by those methods rests
on the validity of pair identification. The pair identification procedure rests on the
notion of bilateral symmetry, but no organism is perfectly bilaterally symmetrical,
so one must decide how symmetrical is symmetrical enough. Even if that decision
can be made, in large samples (of more than, say, two dozen specimens) potential
bilateral mates for any specimen are multiple. This highlights the fact that false pairs
are likely to be identified even in small samples of lefts and rights.
Other methods discussed in this chapter concern efforts to correct for intertaxonomic
variation in (i) number of identifiable elements per individual, (ii) transport or
accumulation, and (iii) fragmentation. These variables can be analytically accounted
for in NISP (see Chapter 2). Where are we left, then, with respect to measuring taxonomic
abundances? As alluded to in the preceding chapter, I agree with Grayson
(1979, 1984) and the numerous paleobiologists who use NISP to measure taxonomic
abundances. It is cumulative or simply additive, meaning it is primary data or an
observed measure; and it serves as the basis for many derived measures (and hence
is correlated with many of them). I use it virtually exclusively in subsequent chapters
when issues of taxonomic abundance are under study. When I do not, I explain why;
generally I use quantitative units other than NISP when the target variable is not
taxonomic abundance.
Sampling, Recovery, and Sample Size

It is commonsensical to note that what is recovered – the amount recovered of
each kind, and the number of kinds – will influence quantitative analyses (Cannon
1999). As we have seen in earlier chapters, sample size influences many measures and
estimates of taxonomic abundances. The size of a sample of faunal remains, measured
as the number of specimens recovered, is in turn influenced by the sampling design
chosen (how much is excavated) and the recovery techniques (passing sediment
through fine- or coarse-mesh sieves) used to implement that sampling design. This
chapter focuses on how one generates a collection of faunal remains (sampling and
recovery), properties of the resultant sample, and ways to examine the influences of
sample size on selected target variables.
Paleozoologists have long worried about how methods of recovery might produce
collections that are not representative of a target variable (e.g., Hibbard 1949;
Kuehne 1971; McKenna 1962; Payne 1972; Thomas 1969). Exacerbating this worry
is the fact that paleozoologists collect samples of faunal remains from geological
contexts (Krumbein 1965; Ward 1984). This is true on at least two levels. First, paleozoologists
never (or at least very seldom) collect all of the faunal remains from
a deposit, paleontological location, or archaeological site. Second, the target variable
usually resides in an entity other than the "identified assemblage" (Figure 2.1).
If the target variable is the taphocoenose (either that which is preserved or that
which was deposited), the thanatocoenose, or the biocoenose, the paleozoologist is
dealing with a sample regardless of whether or not the complete deposit has been excavated.
For more than 50 years, paleozoologists have suggested that probabilistic sampling
methods will produce "representative samples" (e.g., Gamble 1978; Krumbein 1965;
Voorhies 1970). These methods concern techniques to choose portions of the geological
record to examine for faunal remains. There are many excellent discussions

of probabilistic sampling (e.g., Orton 2000), so methods of probabilistic sampling
are not discussed here. Instead the focus is on a couple of related sampling issues.
Choosing deposits to inspect for faunal remains is but one part of the sampling
problem. Another part concerns how faunal remains are retrieved or collected from
deposits chosen for inspection. If more sediment samples are chosen for inspection,
or more remains are recovered from the chosen sediments because of how remains
are retrieved, then the resultant sample is different (larger) than it might otherwise
have been.
In this chapter, two issues are of concern. One is collection or recovery technique.
Are faunal remains picked by hand from exposed sediments; are sediments passed
through hardware cloth (screens or sieves) and the faunal remains that do not pass
through the cloth collected; or are faunal remains collected from bulk samples, from
flotation samples, or by some other means? Because the recovery methods used
influence what is collected, efforts have been made to correct for these influences,
and some of these correction procedures are reviewed here. The other issue discussed
in this chapter is sample size, measured in either or both of two ways – as the amount of
sediment examined (either area or volume) and as the amount of faunal material studied.
Both measures of sample size often correlate with the number of taxa recovered, the
relative abundances of those taxa, and the like. To make valid interpretations of
quantitative faunal data, we must understand both the nature of a sample and how
sampling techniques may influence the faunal variables that we hope to measure.
All of the variables discussed here – amount of sediment inspected, screen mesh
size, NISP – are particular manifestations of sampling effort. Greater sampling effort
(however measured) will produce larger samples, but how sampling effort influences
other characteristics of the sample is not always recognized.
One distinction that must be made at the start concerns the difference between
discovery sampling and statistical precision sampling (Nance 1983). Discovery sampling
concerns sampling designs built to discover new phenomena and the sampling effort
required to find various categories of phenomena. The rarer a kind of thing is in a
sampling universe, the more sampling effort is required to find an instance of that kind.
Statistical precision sampling generates a sample that provides an accurate estimate
of a target variable. Whereas discovery sampling focuses on finding examples of
rarely occurring kinds of phenomena, statistical precision sampling seeks to estimate
properties of commonly occurring categories of phenomena, whether abundances of
individual instances within each category, average size of members of each category,
or any of a plethora of other variables that might be measured. The distinction between
discovery sampling and statistical precision sampling will be important in this and
subsequent chapters.
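The inverse relationship between rarity and required discovery effort can be quantified under a simple random-sampling assumption. This calculation is an illustration added here, not one drawn from the text: a taxon forming proportion p of the sampled population appears at least once in a sample of n specimens with probability 1 − (1 − p)^n, so the n needed for a given confidence follows directly.

```python
import math

# Illustration (not from the text): sample size needed to find at least one
# specimen of a taxon of relative abundance p, assuming simple random
# sampling of independent specimens.
def n_for_detection(p, confidence=0.95):
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

for p in (0.10, 0.01, 0.001):
    print(p, n_for_detection(p))
```

A taxon making up 1 percent of the population requires roughly ten times the sampling effort of one making up 10 percent, which is the point of the discovery-sampling distinction.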
sampling, recovery, and sample size 143


Many of the quantitative variables that we seek to measure with paleozoological
collections are often strongly influenced by sampling effort (however such effort is
measured), itself a quantitative variable. In this chapter, analytical techniques that
have been suggested for controlling those influences when comparing samples of
markedly different sizes are outlined. These techniques are based on the assumption
that no other deposits will be inspected for faunal remains, and thus that no new
faunal remains will be added to the samples at hand. Another method, which assumes
new specimens are forthcoming until a sample that is representative of the target
variable(s) is in hand, is also described. This latter method can be implemented with
either or both of two distinct measures of sample size.
Paleozoologists typically collect a sample of faunal remains from a population of
remains. (For sake of discussion, the identity of the target variable – taphocoenose,
thanatocoenose, biocoenose – will be ignored.) The population may comprise all the
faunal remains encased within a stratum, or those within several strata thought for
non-faunal reasons to represent the same zoological property of interest and thus for
analytical purposes to be instances of the same population. Because paleozoologists
sample the depositional (geological) record for faunal remains, they generally collect
multiple samples. One sample may be collected today and another tomorrow; one
sample may be collected from a particular geographic and geological location and
another from a different location. In many cases, individual samples are collected over
multiple time periods, whether those periods are consecutive months or consecutive
annual field seasons. Because consecutively gathered samples from a deposit (from
what is thought to represent the same population) are cumulative, a basic method
of empirically assessing sample adequacy suggests itself.
Assume that a target variable has been specified by the research problem one is
attempting to solve. Let us say that the target variable requires measurement of the
number of mammalian taxa in a collection, generally known as taxonomic richness.
The acronym NTAXA for "number of taxa" will be used here. (NTAXA is used
by ecologists and archaeologists [e.g., Broughton and Grayson 1993] to measure
niche breadth [among other things].) How do we know when we have collected
enough faunal remains to have a sample that provides a relatively accurate estimate
of NTAXA? Archaeologist Robert Leonard (1987:499) suggested that one could sample
"to redundancy" and that the way to know when additional samples were redundant
with previous samples was simple: "plot the information gained against the number
of samples taken and determine if the curve is becoming asymptotic. It may then be
reasonable to assume that the sample is sufficiently representative with regard to that

Table 4.1. Volume excavated and NISP of mammals per annual field season at the
Meier site

Year   Volume (m3)   NISP    NTAXA   Deer NISP   Deer NISP/m3
1973   11.0           519    16      276         25.1
1987   40.7          1,359   21      778         19.1
1988   31.2          1,232   22      756         24.2
1989   46.3           970    19      562         12.1
1990   29.2           956    18      570         19.5
1991   37.7          1,385   20      838         22.2

particular information." In paleozoology, sample size can be measured in one of two
ways – either as the amount excavated or as NISP.

Excavation Amount

Paleontologists have examined the influence of sample size, measured as the amount of
sediment examined, on NTAXA (e.g., Raup 1972). Efforts to correct for such influences continue
to this day (e.g., Crampton et al. 2003). Wolff (1975) used an empirical means to
determine if sufficient sediment had been examined to argue that his samples were
representative of NTAXA. He compiled data on cumulative NTAXA across increasing
amounts of sediment from which faunal remains had been extracted. When
his cumulative NTAXA curve leveled off across several additional units of sediment
volume, Wolff (1975) argued that his total sample was representative of that target
variable. With the cumulative NTAXA plotted on the vertical or y-axis of a bivariate
plot, and new unit volumes of sediment added along the horizontal or x-axis of the
plot, Wolff showed that taxa were initially added quickly by new samples, but as
the number of samples increased the rate of addition of new taxa slowed until it
leveled off across multiple new samples. The latter was taken by Wolff to mean that
he had sampled sufficiently to have a representative sample; Leonard (1987) would
say that Wolff had sampled to redundancy because faunal remains from additional
unit volumes of sediment failed to produce any new taxa.
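Wolff's protocol amounts to tracking the size of the union of the taxa seen so far as samples accumulate. A minimal sketch, with hypothetical samples rather than Wolff's data:

```python
# Minimal sketch of the Wolff/Leonard protocol with hypothetical data: each
# successive sample is a set of taxa; richness is tracked cumulatively and
# the curve is judged asymptotic when new samples stop adding taxa.
samples = [
    {"deer", "elk", "beaver"},
    {"deer", "raccoon", "muskrat"},
    {"deer", "elk", "bobcat"},
    {"deer", "beaver", "raccoon"},  # adds no new taxa
    {"elk", "muskrat"},             # adds no new taxa
]

seen, curve = set(), []
for s in samples:
    seen |= s
    curve.append(len(seen))
print(curve)  # [3, 5, 6, 6, 6] - levels off after the third sample
```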
The protocol is easy to illustrate with the data in Tables 4.1 and 4.2 for the Meier
site. If we use those data to construct a cumulative NTAXA curve based on volume
excavated, we obtain the result in Figure 4.1. As the volume excavated from one
year to the next increased, NTAXA initially increased, but then it leveled off and no

Table 4.2. Annual NISP samples of mammalian genera at the Meier site

Taxon 1973 1987 1988 1989 1990 1991 Total
Scapanus 4 4 3 4 1 2 18
Sylvilagus 2 3 1 1 10 1 18
Aplodontia 2 1 1 3 7
Tamias 1 1
Tamiasciurus 2 2
Thomomys 2 1 5 1 9
Castor 13 100 65 52 41 71 342
Peromyscus 4 12 12 4 3 35
Rattus 1 1
Neotoma 1 1
Microtus 15 25 34 15 11 100
Ondatra 37 97 55 59 74 52 374
Erethizon 1 1
Canis 2 25 13 16 11 25 111
Vulpes 3 1 1 5
Ursus 20 16 20 7 13 26 102
Procyon 15 79 51 35 43 64 287
Martes 1 6 1 1 11 20
Mustela 4 35 17 19 38 21 134
Mephitis 1 1 2 4
Lutra 6 12 6 2 11 14 51
Puma 4 1 3 1 9
Lynx 9 5 4 1 4 8 31
Phoca 3 6 5 10 6 13 43
Cervus 103 165 191 152 106 218 935
Odocoileus 276 788 756 562 570 838 3,780
Annual NISP 519 1,359 1,232 970 956 1,385 6,421
Annual NTAXA 16 21 22 19 18 20 26
Cumulative NISP 519 1,878 3,110 4,080 5,036 6,421 “
Cumulative NTAXA 16 21 24 26 26 26 “

new taxa were added after the first 129.2 m3 of sediment had been inspected. With
respect to cumulative volume of sediment excavated, the Meier site sample contains
representatives of at least the most common taxa (see the following section); very
rarely represented taxa may not be present in the collection, but the sampling-to-redundancy
procedure suggests that we have a statistically precise representation of
the common taxa.

figure 4.1. Cumulative richness of mammalian genera across cumulative volume (m3)
of sediment excavated annually at the Meier site. Numbers adjacent to plotted points are
cumulative m3. Data from Table 4.1.

NISP as a Measure of Sample Redundancy

Retaining taxonomic richness or NTAXA as our target variable for illustrative purposes,
consider the Meier and Cathlapotle collections. Meier was sampled over
a period of six annual field seasons (the last five were consecutive) by two archaeologists.
Cathlapotle was sampled over a period of four annual field seasons by one
archaeologist, but early in the first annual field season recovery techniques varied
considerably from those used later that year, so the first year is split into two chronologically
consecutive samples for illustrative purposes. At both sites, each annual field
season spanned a period of 8 weeks. Annual NISP samples from Meier are described
in Table 4.2 and those for Cathlapotle are described in Table 4.3. Summed values
for Meier in Table 4.2 are larger than those given in Table 1.3 because an additional
sample analyzed in 1973 is included in the former table. Values for Cathlapotle in
Table 4.3 are greater than those given in Table 1.3 because the former table includes
specimens that could not be assigned to a temporal component and thus could
not be included in Table 1.3.
It has long been recognized that the order in which samples are added to cumulative
frequency curves can influence the result (e.g., Kerrich and Clarke 1967). The total

Table 4.3. Annual NISP samples of mammalian genera at Cathlapotle. The two 1993
samples represent different recovery techniques

Taxon 1993a 1993b 1994 1995 1996 Total
Didelphis 10 10
Scapanus 3 3
Sorex 4 4
Lepus 14 20 18 52
Aplodontia 2 18 41 42 33 136
Castor 1 32 123 185 51 392
Peromyscus 4 1 5
Microtus 1 12 16 39 68
Ondatra 19 34 32 21 106
Canis 4 27 5 3 39
Vulpes 1 3 1 5
Ursus 1 23 29 31 18 102
Procyon 1 57 59 70 20 207
Martes 2 2
Mustela 3 14 7 5 29
Mephitis 3 3
Lutra 14 19 13 19 65
Puma 5 3 3 1 12
Lynx 2 6 12 6 26
Phoca 1 19 41 4 65
Ovis 2 2
Cervus 16 462 879 1,184 683 3,224
Odocoileus 18 332 797 821 408 2,376
Equus 2 2 4
Annual NISP 40 973 2,091 2,488 1,345 6,937
Annual NTAXA 7 14 20 19 18 24
Cumulative NISP 40 1,013 3,104 5,592 6,937 “
Cumulative NTAXA 7 15 20 21 24 “

cumulative sample size at which the curve levels off – and thus suggests that new
samples are providing no new information, but only redundant information –
can vary considerably depending on the order of sample addition. Thus, the choice of the
order in which samples are added must be explicit and logical. Given that there is an
inherent (chronological) order to the annual samples from Meier and also to those
from Cathlapotle, it is logical to treat the samples as cumulative in the temporal order
in which they were collected. Doing so for the Meier annual samples produces the

figure 4.2. Cumulative richness of mammalian genera across cumulative annual samples
(NISP) from the Meier site. Numbers adjacent to plotted points are cumulative NISP. Data
from Table 4.2.

cumulative NTAXA curve shown in Figure 4.2; doing so for the Cathlapotle annual
samples produces the cumulative NTAXA curve shown in Figure 4.3 (both curves
are slightly different from those described in Lyman and Ames [2004] because all taxa
are included here; Lyman and Ames [2004] excluded historically introduced taxa).
What do those curves suggest?
On the one hand, the cumulative NTAXA curve for Meier levels off after the
addition of the fourth, or 1989, sample (Figure 4.2). Despite the addition of more than
2,000 NISP, no new taxa are added with the 1990 and 1991 samples. These last two,
most recent samples are redundant with earlier samples in terms of their influence
on the target variable of NTAXA. This suggests that the total Meier collection can
be treated as representative of the mammalian genera deposited at the site. The
cumulative NTAXA curve for the Cathlapotle sample, on the other hand, does not
level off but rises with the addition of each new sample (Figure 4.3). An argument
cannot be made that additional collection of faunal remains from this site will fail to
produce evidence of additional mammalian genera. The cumulative NTAXA curve
for Cathlapotle also suggests that the total sample, though it consists of nearly 7,000
NISP, does not represent all mammalian genera in the site deposits.
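Using the cumulative NTAXA rows reported in Tables 4.2 and 4.3, Leonard's asymptote criterion can be checked mechanically. The two-sample "flat tail" threshold below is an arbitrary choice made for illustration, not a rule from the text:

```python
# Cumulative NTAXA rows from Tables 4.2 (Meier) and 4.3 (Cathlapotle).
meier = [16, 21, 24, 26, 26, 26]   # 1973-1991 annual samples
cathlapotle = [7, 15, 20, 21, 24]  # 1993a-1996 annual samples

def levels_off(curve, flat_tail=2):
    """True if the last `flat_tail` samples added no new taxa."""
    tail = curve[-(flat_tail + 1):]
    return tail == [tail[0]] * len(tail)

print(levels_off(meier))        # True: 1990 and 1991 add nothing
print(levels_off(cathlapotle))  # False: still rising at the 1996 sample
```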
In the ecological literature, curves such as those illustrated in Figures 4.1, 4.2, and
4.3 are sometimes referred to as "accumulation curves" (Gotelli and Colwell 2001)

figure 4.3. Cumulative richness of mammalian genera across cumulative annual samples
(NISP) from Cathlapotle. Numbers adjacent to plotted points are cumulative NISP. Data
from Table 4.3.

for an obvious reason. Sampling to redundancy has not been mentioned very often in
paleozoological research (e.g., Lyman 1995a; Monks 2000; Reitz and Wing 1999:107),
and it has been used even less often (e.g., Butler 1990; Lyman and Ames 2004; Wolff 1975).
Many paleoethnobotanical examples are cases in which sampling effort is plotted
against richness, and the influences of sample size differences are noted (Lepofsky
and Lertzman 2005). This underscores the ease with which a quantitative tool can
be misrepresented as doing one thing when in fact it is doing something else. We
return to this general kind of curve later in this chapter. Here it suffices to note that
the curves in Figures 4.1, 4.2, and 4.3 are but one instance of a more general kind of curve
that is used to examine the relationship between sample size and ecological variables
such as NTAXA.

Volume Excavated or NISP

The amount of sediment examined for faunal remains is one measure of sample size,
but the NISP per unit volume of sediment can vary considerably. This means that
correlations between sediment volume and, say, NTAXA are likely to be less strong

figure 4.4. Relationship of mammalian genera richness (NTAXA) and sample size (NISP)
per annual sample at the Meier site. The relationship is described by the simple best-fit
regression line (Y = 13.048 + 0.0059X) and is significant (r = 0.89, p < 0.01). The year each
sample was collected is indicated. Data from Table 4.2.

than those between NISP and NTAXA. The NISP of deer varies by as much as thirteen
NISP per cubic meter across the six samples from the Meier site (Table 4.1). There
is no statistically significant relationship between the volume excavated per year and
the NISP per annual sample for the 1987–1991 samples (r = 0.05, p > 0.2); the smallest
(1973) sample is deleted because it influences the result considerably (r = 0.73 if
that sample is included). NISP per annual sample and richness per annual sample at
the Meier site, on the other hand, are strongly correlated (Figure 4.4). This suggests
that the better variable with which to monitor the influence of sample size on variables
such as NTAXA is NISP, though this may vary.
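The kind of check just described, regressing NTAXA on NISP and inspecting the correlation, can be sketched in a few lines. The (NISP, NTAXA) pairs below are hypothetical stand-ins for illustration, not the actual values from Table 4.2.

```python
import math

# Hypothetical (NISP, NTAXA) pairs for six annual samples -- illustrative
# stand-ins, not the actual Meier site values from Table 4.2.
samples = [(119, 14), (778, 18), (1022, 19), (1491, 21), (1962, 24), (1049, 20)]

def pearson_r(pairs):
    """Pearson's product-moment correlation coefficient."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def best_fit(pairs):
    """Least-squares intercept and slope for Y = a + bX."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum(
        (x - mx) ** 2 for x, _ in pairs)
    return my - b * mx, b

r = pearson_r(samples)
a, b = best_fit(samples)
```

A strong positive r in such a check signals that richness is tracking sample size, so NTAXA differences between samples cannot be read at face value.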
Were I to examine the adequacy of the sample from Cathlapotle in a real analysis
rather than simply illustrating an analytical technique, I would apply the sampling
to redundancy protocol to the precontact assemblage and also to the postcontact
assemblage rather than to the site collection as a whole. This protocol demands that
the target variable of interest be explicitly defined and the boundaries of the appropriate
sample be unambiguous. Nevertheless, several lessons can be taken from the
preceding. First, the absolute size of a sample is not necessarily a good measure of
that sample's representativeness of a particular target variable. The total mammalian

genera NISP from Cathlapotle (= 6,937) is larger than the total mammalian genera
NISP from Meier (= 6,421), yet the latter seems to be representative of NTAXA
whereas the former does not seem to be representative of NTAXA. Second, cumulative
chronological samples, whether by week, month, or year, provide logical units
with an inherent cumulative order that may provide an indication of when enough
material has been collected. Such an argument presumes that identification of the
recovered faunal remains proceeds apace with recovery, or that the time lag between
the two is insignificant. If identification can keep pace with recovery, then paleobiological
resources can be saved in situ rather than disturbed (some would say
“destroyed”) by recovery because it will be clear when a sample sufficiently large to
provide an accurate answer (one not influenced by inadequate sample size) has been
collected. The final lesson is that, presuming one knows the identity of the variables plotted
on both axes (and there is no reason the analyst should not), the meaning of the
cumulative curve is commonsensical. When the curve is steep, much new information
is being added with each new sample; when it is horizontal, new samples are adding no
new information about the target variable. This makes such a curve useful as a simple
(and readily visible) way to monitor what is being learned as sample size increases, and
to determine whether additional samples are necessary or not. Samples can comprise
the material collected during one temporal period, or they can be structured some
other way, such as choosing (using probability sampling) a sample of 10 percent of all
units excavated or exposures inspected (or collected), then another 10 percent, then
another, and so on. Similarly, any target variable that consists of a single value can
be plotted on the y-axis, just as any kind of sample can be plotted on the x-axis. Such
target variables might involve the average or mean size of individuals of a taxon, or
the frequencies of skeletal parts of a taxon, or virtually any variable.
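The bookkeeping behind such a cumulative curve is minimal, as the sketch below shows; the annual taxon lists are invented for illustration.

```python
# Hypothetical annual samples (sets of identified taxa), in collection order.
annual_samples = [
    {"deer", "beaver", "raccoon"},
    {"deer", "elk", "beaver"},
    {"deer", "dog"},
    {"deer", "beaver", "elk"},   # no new taxa: curve goes flat
    {"elk", "raccoon"},          # still no new taxa
]

def cumulative_richness(samples):
    """Cumulative NTAXA after each successive sample is added."""
    seen, curve = set(), []
    for taxa in samples:
        seen |= taxa
        curve.append(len(seen))
    return curve

curve = cumulative_richness(annual_samples)   # -> [3, 4, 5, 5, 5]
# A flat tail suggests sampling to redundancy; a still-rising curve does not.
redundant = len(curve) >= 2 and curve[-1] == curve[-2]
```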
There is more to say about the type of graph shown in Figures 4.1–4.4. This
graph type is one in which a measure of sample size is plotted against a measure of a
biological property, and more is said about such graphs and different versions of them
later. The sampling to redundancy type of graph is introduced here to illustrate its
value for evaluating sample adequacy as sample size is being actively increased. Once
the fieldwork is completed, little else can be done to increase the size of a collection.
Knowing whether more specimens are needed as collection is taking place would be
valuable knowledge. Knowing whether one is losing material rather than collecting
it as sediment is inspected (e.g., screened) would be equally valuable knowledge.
Zooarchaeologists in particular have devoted a great deal of energy to figuring out
how to generate this latter sort of knowledge, and it is to the results of those energy
expenditures that we turn next.


Once a geographic and geological context in which to look for faunal remains has
been chosen, the next step is to choose how those remains will be searched for
and retrieved from sediments. Faunal remains can be hand picked from sediments as
those sediments are excavated. Bones and teeth can be collected from screens or sieves,
the function of which is to allow sediment to pass through while faunal remains
are caught in the mesh, where they are more visible than when in the sediment in
the excavation. Screens were not always used by zooarchaeologists who gathered, by
hand, those bones and teeth they saw in the sediment as it was excavated. Watson
(1972) showed that many bone fragments ≤ 3 cm in maximum dimension tended to be
overlooked when hand picking alone was used. Passing sediments through screens
increased the return of small fragments by an order of magnitude. An earlier study
showed exactly the same thing using remains of mollusks.

Hand Picking Specimens by Eye

Sparks (1961) demonstrated that the percentage of recovered remains of terrestrial
mollusks differed markedly by size class. The sample collected by eye, unaided by
screens, tended to have more specimens representing large size classes (>50 percent
of all specimens recovered) whereas the sample collected from a screen was dominated
by specimens representing small size classes (>70 percent of all specimens
recovered). In Figure 4.5, Sparks's data are graphed differently than he originally
presented them, to allow comparison of the taxonomic abundances in the two samples. The identity of
the taxa themselves is unimportant to this exercise, so categories of specimens are
distinguished on the basis of ordinal-scale average size. Thus, whereas Sparks (1961)
distinguished eighteen taxa, there are only fifteen size classes here. Figure 4.5 shows
that the taxa with the largest shells were those that were most often collected by hand,
and those taxa with the smallest shells were seldom collected by hand. What seems
to have been a 2 mm mesh sieve produced many more individuals of small size, and
Sparks (1961:72) concluded in an understated way that “Any attempt to pick out shells
by eye from a deposit is bound to lead to distortion in the percentage frequencies of
species.” Study by invertebrate paleobiologists of what is now referred to as size bias
continues (e.g., Cooper et al. 2006; Kowalewski and Hoffmeister 2003).
Results like those Sparks (1961) derived for mollusks were found by Payne (1972,
1975) for mammal remains. Although he did not explicitly list the body size or average
live weight of an adult animal, Payne (1975) found that more of the larger remains

figure 4.5. Relative abundances of fifteen size classes of mollusk shells recovered during
hand picking from the excavation, and recovered from fine-mesh sieves. Original data from
Sparks (1961).

of large-bodied taxa were found by hand while excavating whereas more of the small
remains of small-bodied taxa were found in sieves or screens. Taxa were rank ordered
in five size classes, from largest to smallest: cattle (Bos sp.), pig (Sus sp.), sheep and goat
(Ovis sp., Capra sp.), canid (Canis sp., Vulpes sp.), and hares (Lepus sp.). Assuming
that the remains that were recovered by hand picking from the excavation would
also have been recovered from the screen, Figure 4.6 shows two things about Payne's (1975)


figure 4.6. The effect of passing sediment through screens or sieves on recovery of mam-
mal remains relative to hand picking specimens from an excavation unit. Numbers within
bars are NISP. Data from Payne (1975).

data. First, more remains in general are collected from screens than by hand from an
excavation, something not so obvious given how Sparks (1961) presented his data.
Second, the smaller the body size of a taxon, the more of its remains will be found
in the screen than in the excavation; this echoed Sparks's original observation on
mollusk remains but expanded it to include remains of mammals.

Screen Mesh Size

It is commonsensical to believe that small bones and small fragments thereof will fall
through coarse-mesh hardware cloth (that with large holes) whereas many will be
caught by and thus be recovered if fine-mesh hardware cloth is used. Thomas (1969)
and Payne (1972, 1975) demonstrated this empirically (see also Casteel 1972; Clason
and Prummel 1977), and showed that the magnitude of loss when coarse mesh was
used had been underestimated. Their seminal work spawned over the next 30 years
a plethora of studies on the influence of screen-mesh size on recovery (see James
[1997] for a relatively complete listing of references as of a decade ago). Such studies
continue to this day (e.g., Nagaoka 2005b; Partlow 2006), sometimes with much
more statistical sophistication than that found in the original studies (e.g., Cannon
1999). Although the lessons learned have been significant ones, many of them were
learned with Thomas's (1969) seminal effort. For that reason, various analysts have
subsequently used his data to substantiate arguments concerning the influence of
screen-mesh size on recovery (e.g., Casteel 1972; Grayson 1984).
Thomas (1969) used zooarchaeological data from three sites, which demonstrated
that recovery was not simply a function of the particular sample (geographic and
geological location) chosen. As each site was excavated, sediment was passed through
a series of nested screens with increasingly finer mesh. The first screen was 1/4-inch
(6.4 mm) mesh, the second was 1/8-inch (3.2 mm) mesh, and the final screen was 1/16-inch
(1.6 mm) mesh. All faunal remains in each screen were retrieved and recorded as
to the screen mesh in which they were found. After all remains were identified, Thomas
categorized the remains as to the average adult live weight of an individual of the taxon
represented. He distinguished five size classes: Class I: live weight < 100 g (e.g., mice);
Class II: live weight 100 to 700 g (e.g., squirrels); Class III: live weight 700 g to 5 kg (e.g.,
rabbits); Class IV: live weight 5 to 25 kg (mid-size mammals); and Class V: live weight
> 25 kg (e.g., deer). Thomas retained distinctions between site-specific samples, and
also those between each vertical analytical level within each site. Such distinctions
are irrelevant to studies of the loss of faunal remains, so we can ignore them and
lump all data into categories defined by screen-mesh size and body-size class (Table 4.4).

Table 4.4. Mammalian NISP per screen-mesh size class and body-size class for
three sites. Percentages are calculated for each body-size class. Data from
Thomas (1969)

Body-size class     1/4 inch (%)    1/8 inch (%)    1/16 inch (%)    Total
I (<100 g)          141 (5)         910 (31)        1,930 (64)       2,981
II (100–700 g)      626 (14)        1,478 (33)      2,450 (53)       4,554
III (0.7–5 kg)      1,069 (29)      1,358 (37)      1,275 (34)       3,702
IV (5–25 kg)        85 (96)         4 (4)           0                89
V (>25 kg)          1,308 (100)     1 (0.1)         0                1,309
Total               3,229           3,751           5,655            12,635

Understating the issue, Grayson (1984:170) noted that there are “a number of ways
in which [Thomas's] recovery data can be analyzed, but no matter how the analysis
proceeds, the effects of screen-mesh size on recovery are dramatic.” Although it
doubtless is untrue, assume, for instance, that 100 percent of all faunal remains were
recovered by the 1/16-inch screen mesh. We can then determine the cumulative
percentage of NISP of each body-size class of mammal that was recovered across
the increasingly finer screen-mesh size classes. These cumulative percentages are all
plotted in Figure 4.7. That figure shows that the larger the body-size class is, the
more of a taxon's remains are recovered in coarse-mesh screens, and the smaller the
body-size class, the more of a taxon's remains are recovered in fine-mesh screens.
Thomas's data empirically demonstrated what had long been suspected prior to his
study – remains of small organisms are lost through coarse-mesh screens – and they
demonstrate it with remarkable clarity. They demonstrate it on at least an ordinal
scale because screen-mesh size classes and body-size classes are treated in Figure 4.7
as ordinal-scale variables. The one thing that we do not know from Thomas's data is
the nature of what is lost through the 1/16-inch mesh screens. But even without such
information, Thomas's data should prompt us to worry about taxonomic abundance
data even if we use a fine-mesh hardware cloth, such as 1/8-inch or 1/16-inch mesh.
Small taxa will be underrepresented relative to large taxa even when fine-mesh sieves
are used. Deciding how thorough to be in recovery efforts (finer mesh will result in
greater thoroughness) is a tactical decision that will depend on the research question
asked and its attendant target variables.
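The cumulative percentages plotted in Figure 4.7 can be recomputed directly from the Table 4.4 counts; the short sketch below does that tally (the data are Thomas's, the code organization is mine).

```python
# NISP per body-size class across nested screens (Table 4.4; Thomas 1969),
# ordered coarse to fine: 1/4-inch, 1/8-inch, 1/16-inch.
nisp = {
    "I":   (141, 910, 1930),
    "II":  (626, 1478, 2450),
    "III": (1069, 1358, 1275),
    "IV":  (85, 4, 0),
    "V":   (1308, 1, 0),
}

def cumulative_pct(counts):
    """Running percentage of the class total recovered as mesh gets finer."""
    total = sum(counts)
    running, out = 0, []
    for c in counts:
        running += c
        out.append(round(100 * running / total, 1))
    return out

curves = {cls: cumulative_pct(counts) for cls, counts in nisp.items()}
# Large-bodied class V is nearly complete in 1/4-inch mesh alone; small-bodied
# class I recovers most of its NISP only once 1/16-inch mesh is added.
```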
Even though numerous empirical studies indicate that the coarser the screen mesh,
the more small specimens pass through the sieve and are not recovered, occasionally
this does not seem to hold true (e.g., Vale and Gargett 2002). The potential reasons
for this are several (Gobalet 2005; Zohar and Belmaker 2005), but the most likely ones

figure 4.7. Cumulative percentage recovery of remains of different size classes (Roman
numerals) of mammals. The critical but empirically unvalidated assumption is that all
remains will be caught in the 1/16-inch mesh screen. Data from Table 4.4 (originally from
Thomas 1969).

are taphonomic (Gargett and Vale 2005). If small remains are taxonomically unidentifiable
because they are anatomically incomplete due to fragmentation, corrosion,
or some other taphonomic process, then it is possible that the use of small sieves
will not increase the value of NTAXA (e.g., Cooper et al. 2006). This is an empirical
matter; every collection is unique and subject to investigation as to whether or not
fine mesh makes a difference.

To Correct or Not to Correct for Differential Loss

If one passes site sediments through coarse-mesh hardware cloth, it is likely that
small bones and small teeth will, like the sedimentary particles themselves, pass
through the screen and thus not be recovered. The coarser the mesh of the hardware
cloth – the larger the openings – the more remains of, first, small animals, and then
progressively larger animals, as coarseness increases, will be lost because they are able
to pass through the hardware cloth. The total magnitude of such loss will depend
on the population of remains of small animals present in the screened sediments
(Clason and Prummel 1977). The choice of sieve mesh size should depend on the

research questions one is asking because using finer mesh means it will take longer
(and cost more) to complete an excavation – there will be more material caught in the
screen that must be looked over and from which faunal remains must be removed.
One way to avoid the total cost of using fine-mesh sieves throughout an excavation
is to take bulk samples every so often (how often is a matter of choice within the
sampling design used) and to pass those bulk samples through one or more finer-meshed
sieves to determine what and how much is being lost. Some analysts have
argued that if the rate of loss can be determined, then what has been recovered
can be mathematically adjusted to account for what has been lost (e.g., James 1997;
Thomas 1969; Ziegler 1965, 1973). Because differential recovery is often a troublesome
concern, it is worthwhile to review one way to correct for differential loss.
Thomas (1969) suggested that the analyst determine a correction factor to analytically
compensate for differential recovery of small remains. This might involve first
using a formula like this:

Percentage of NISP lost = 100 × (NISP from fine-mesh or bulk samples) /
    (NISP from fine-mesh or bulk samples
     + NISP from coarse-mesh or standard recovery)

Once the percentage lost is known, the inverse of the fraction lost (represented by
the percentage lost) can be multiplied by what has been recovered to estimate what
would have been recovered if there had been no loss. Alternatively, Thomas (1969)
suggests simply calculating the recovery ratio using the formula:

Recovery ratio = NISP for all recovery methods/
NISP for recovery method of interest.

This formula is used for each size class of taxa. Thus, using the data in Table 4.4
for illustrative purposes, the recovery ratios per size class are: I: 21.14 (2,981/141); II:
7.27 (4,554/626); III: 3.46 (3,702/1,069); IV: 1.05 (89/85); and V: 1.00 (1,309/1,308). This
means that if one wanted to correct for differential recovery that resulted from use of
different screen-mesh sizes at these sites, then the NISP of size class I remains should
be multiplied by 21.14, the NISP of size class II remains should be multiplied by 7.27,
size class III by 3.46, size class IV by 1.05, and size class V by 1.00.
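Thomas's recovery-ratio correction is easy to express in code. The counts are from Table 4.4; the function name `correct_for_loss` and the example assemblage it is applied to are hypothetical.

```python
# NISP caught in 1/4-inch mesh alone vs. all meshes combined (Table 4.4).
quarter_inch = {"I": 141, "II": 626, "III": 1069, "IV": 85, "V": 1308}
all_meshes   = {"I": 2981, "II": 4554, "III": 3702, "IV": 89, "V": 1309}

# Recovery ratio = NISP for all recovery methods / NISP for method of interest.
ratios = {cls: all_meshes[cls] / quarter_inch[cls] for cls in quarter_inch}

def correct_for_loss(observed_nisp):
    """Scale a 1/4-inch-only assemblage by each class's recovery ratio.

    This presumes the loss rates computed above also apply to the
    assemblage being corrected -- a strong assumption, as noted in the text.
    """
    return {cls: observed_nisp[cls] * ratios[cls] for cls in observed_nisp}
```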
There is a critical assumption that must be granted if a correction protocol such
as that described by Thomas is to be used. The assumption is that the rate of loss
determined from the subsample is representative of the entire sample. The weakness

of the assumption is that the recovery rate will likely vary from recovery context
to recovery context because faunal remains tend to not be randomly distributed
throughout a site or throughout a stratum. Loss will not be stable but in fact will likely
vary not only from site to site and from stratum to stratum, but also from horizontal
context to horizontal context within a site or stratum. Few researchers have explored
this potentiality of a nonhomogeneous distribution of faunal remains with real data.
Thomas (1969) used statistical procedures to determine that there seemed to be
minimal vertical variation in the distributions of faunal remains, and so had an
empirical warrant to apply his correction factor across entire site collections.
Not all sites have homogeneous distributions of faunal remains, and thus it is
ill-advised to calculate a correction factor based on one excavation unit (whether
horizontally distinct, vertically distinct, or both) and to then apply that correction
to another unit to obtain, say, a site-wide value (e.g., Cannon 1999; Lyman 1992a;
Shaffer and Baker 1999). Occasionally paleozoologists have noted the proportion of a
deposit that has been excavated, and then estimated frequencies of taxa in the entire
site or deposit (e.g., Lorrain 1968). Again, such an estimation procedure assumes that
the density of NISP per unit of area or unit of volume observed applies to the entire
site or deposit under study. As data presented by Cannon (1999) demonstrate, such
an assumption should be empirically validated, else estimates of total site content
will be in error.


Thus far several issues with respect to generating collections of faunal remains have
been touched on. The focus has been to describe how one might determine if a
collection is representative of a target variable by determining if one has sampled to
redundancy or not, to illustrate how a particular recovery technique might influence
what is collected (hand picking and screen mesh size), and to argue that despite
being able to calculate a recovery rate in a mathematically elegant fashion, to utilize
that rate as a correction factor is unwise given the requisite assumption that faunal
remains are homogeneously distributed over the sampled deposit(s). For the sake of
simplicity, throughout the chapter the focus has been on samples from which one
seeks to measure taxonomic richness, or NTAXA. But the arguments hold with equal
force for taxonomic abundances and other quantitative measures of the taxonomic
composition of a collection, as demonstrated in Chapter 5.
The arguments made here also hold for nontaxonomic quantitative measures. For
example, if the remains of taxa comprised of small individuals are lost more often

than the remains of taxa comprised of large individuals (Shaffer 1992), then it stands
to reason that such intertaxonomic variation in recovery likely also applies intratax-
onomically. In particular, small skeletal elements of a taxon will be lost more often
than large skeletal elements (e.g., Nagaoka 2005b). Similarly, small fragments will be
lost more often than large fragments (Cannon 1999). In general, small specimens will
be lost more often than large specimens, regardless of the taxonomy or anatomical
completeness of those specimens. The general lessons from such observations are two.
The ¬rst lesson rests on the fact that a relationship between sample size and the
variable of interest may exist, so paleozoologists should search for such relationships
(e.g., Koch 1987). If a relationship is found, then although the sample might in fact
be representative of the variable of interest, the observed value of that variable might
be a result of sample size (Leonard 1997). Until such possible sample-size effects are
controlled for analytically, or the relationship is found to be merely a correlation and
not causal, it is ill-advised to interpret the variable in terms of some ecological or
anthropogenic factor. The second lesson is that virtually any conceivable quantitative
variable that can correlate with NISP will display values that are also potentially a
function of sample size. Finding correlations between target variables and sample
sizes does not preclude analysis and interpretation, but such ¬ndings suggest that
cautious interpretation is warranted if the sample-size effects cannot be analytically
controlled or eliminated. This brings up the important topic of how we might detect
sample-size effects and how we might control for them.


Botanists recognized in the early twentieth century that the larger the area they sampled,
the more species of plant they identified (Leonard 1989). Initially the relationship
was thought to be linear – that as the area sampled increased, the number of species
would increase at a constant rate. Within a decade or two it was empirically demonstrated
that the relationship was semilogarithmic when large areas were considered.
The number of species identified increased as the logarithm of the area increased. By
the late 1930s, the relationship between amount of area sampled and number of plant
species identified was being graphed as shown in Figure 4.8 (after Cain 1938). Within
a few years, a graph of like form was generated for animal taxa but instead of the
area sampled the independent variable was the total number of individual animals
tallied (Fisher et al. 1943). The relationship between area examined and the number
of taxa identified (NTAXA), and that between number of individuals tallied and
NTAXA, are the same because the more area examined, the more individuals (whether

figure 4.8. Model of the relationship between area sampled (or sampling intensity) and
number of taxa identified.

plants or animals) are encountered. In ecology, graphs with the form of Figure 4.8
are sometimes referred to as accumulation curves. They are more often referred to as
“species–area curves” because of the seminal discovery of the relationship between
these two variables.
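A minimal rendering of the semilogarithmic species-area model, with arbitrary illustrative coefficients rather than values fitted to any real flora or fauna:

```python
import math

def expected_richness(area, c=5.0, z=8.0):
    """Semilog species-area model: S = c + z * log10(area).

    c and z are illustrative constants; in practice they are fitted to data.
    """
    return c + z * math.log10(area)

# Under a semilog model, every doubling of area adds the same number of
# taxa (z * log10(2)), which is why the curve rises steeply at first and
# then appears to level off when plotted on an arithmetic area axis.
gain_small = expected_richness(20) - expected_richness(10)
gain_large = expected_richness(2000) - expected_richness(1000)
```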
Given the nature of the relationship between the two variables, ecologists in the
middle of the twentieth century became concerned with determination of how much
area to sample, or how many individuals to tally, to ensure that their samples were
representative of the target variable (often a habitat or biological community of
some scale). One solution was to hold the area sampled constant at some minimum
size thought to be adequate. Another is an analytical procedure termed “rarefaction”
(Sanders 1968). Rarefaction involves determination of the number of species
expected if all samples were the same size (if all samples included the same number
of individuals). Richness or NTAXA for a fraction of a collection can be estimated by
drawing a (random) subsample (equal to the fraction) of a sample (equal to the collection)
of the population of interest. As Tipper (1979) states in his terse history (with
pertinent references as of the late 1970s), the method is termed “rarefaction” because
it involves reducing or rarefying a sample to a smaller size. Figure 4.9 illustrates the
basic procedure and outcome of rarefaction in two ways.

figure 4.9. Two models of the results of rarefaction. (a) histogram with high white bars
of 100 percent sample and black lower bars of rarefied (60 percent) sample; (b) rarefaction
curve (compare with Figure 4.8) showing 100 percent sample and corresponding 60 percent
rarefied sample.
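A bare-bones version of the procedure can be sketched as follows. The collection is hypothetical, and resampling individual specimens presumes statistically independent tallies, which NISP values may not provide.

```python
import random

def rarefied_richness(specimens, n, draws=1000, seed=0):
    """Mean NTAXA over repeated random subsamples of size n (no replacement)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(draws):
        total += len(set(rng.sample(specimens, n)))
    return total / draws

# Hypothetical collection: one taxon label per identified specimen.
collection = (["deer"] * 60 + ["elk"] * 25 + ["beaver"] * 10
              + ["dog"] * 4 + ["otter"])
full_richness = len(set(collection))          # 5 taxa at NISP = 100
rarefied = rarefied_richness(collection, 10)  # expected NTAXA at NISP = 10
# Rarefying shows how much of the richness difference between a large and a
# small sample is attributable to sample size alone.
```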

Ecologists have been grappling with rarefaction for decades – its various forms
and how to make results more valid (e.g., Colwell and Coddington 1994; Colwell
2004; Colwell et al. 2004; Gotelli and Colwell 2001; Scheiner 2003; Schoereder et al.
2004; Smith et al. 1985; Wolda 1981). Zooarchaeologists have been aware of the basic
rarefaction procedure for more than 20 years (Styles 1981), although few individuals
have used it (see Lyman and Ames 2007 for references). Paleobiologists are also
aware of the method, and they have devoted considerable effort to developing and

perfecting it (e.g., Alroy 2000; Barnosky et al. 2005; Bush et al. 2004; Miller and Foote
1996). Early efforts to develop standard species–area curves for paleozoology (Koch
1987) have not been pursued, probably because general patterns are too general to
be of predictive value.
Given that the basic rarefaction procedure involves reducing a sample to a smaller
size, it is not surprising that as the statistical sophistication of scientists increased
and access to electronic computing power increased in the 1970s, programs were
written explicitly to perform rarefaction analysis. The best known of these among
zooarchaeologists is one designed by Kintigh (1984). This procedure sums all available
samples in order to model taxonomic abundances in the population, and then
draws random samples of various sizes from that modeled population. Richness is
determined multiple times for each sample size, and a mean richness and confidence
levels thereof are calculated for each sample size. Finally, the procedure generates not
only a best-fit line (mean) through the sample data sets but also confidence intervals
for the line in graphic form. This rarefaction program has been used by zooarchaeologists
to compare faunas of different sizes (e.g., McCartney and Glass 1990). The
resulting model approximates the effects of varying sample size on richness and is
designed to test the null hypothesis that all samples (of whatever size) were derived
from the same population, and thus to identify samples that are not members of the
population but are instead (statistical) outliers. An outlier is a sample that seems not
to have been drawn from the same population as all others because it falls far above or
below the richness expected given its size; an outlier is a sample that, probabilistically,
could not have been drawn from the modeled population.
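The outlier test can be mimicked with a short Monte Carlo sketch. This simplified version pools the samples and resamples specimens directly, rather than reproducing Kintigh's actual program, and the samples themselves are invented.

```python
import random

def richness_band(population, n, draws=1000, seed=0):
    """Middle ~95% range of NTAXA for random samples of size n."""
    rng = random.Random(seed)
    values = sorted(len(set(rng.sample(population, n))) for _ in range(draws))
    return values[int(0.025 * draws)], values[int(0.975 * draws) - 1]

# Hypothetical samples: one taxon label per identified specimen.
samples = [
    ["deer"] * 40 + ["elk"] * 10 + ["beaver"] * 5,
    ["deer"] * 30 + ["elk"] * 15 + ["dog"] * 3,
    ["gopher"] * 50,   # a sample unlike the others
]
population = [taxon for s in samples for taxon in s]  # pooled "population"

flags = []
for s in samples:
    lo, hi = richness_band(population, len(s))
    observed = len(set(s))
    flags.append(not (lo <= observed <= hi))  # True = probable outlier
```

Note that Rhode's caution, discussed next, applies here too: each flagged sample helped build the very pooled model it is compared against.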
Several seldom acknowledged assumptions and problems attend rarefaction. Early
on, Grayson (1984:152) noted that the rarefaction method in general as originally
developed by Sanders (1968) and later perfected by Tipper (1979) used quantitative
units that were statistically independent of one another; it used individual animals.
No similar quantitative unit is available for paleozoology. One might use MNI, but
these values are dependent on aggregation; one might use NISP, but these values are
likely interdependent to some unknown degree.
Rhode (1988) noted that if one uses Kintigh's (1984) procedure (and null hypothesis),
then one is assuming that a great deal is already known about the population
being investigated. In particular, such use assumes that the samples used to generate
the rarefaction curve are, when summed, representative of taxonomic richness as
manifest in the population of interest and, more importantly, that their sum is also
representative of the distribution of individuals across taxa (known as taxonomic
evenness). Using the sum of all samples to generate a rarefaction curve such as in
Figure 4.9b may result in the inclusion of samples that are not members of the (target)

population; if the samples derive from different populations, their sum will represent
a sample of organisms derived from those multiple populations. The statistical effect
of including all samples is to produce expected richness values for various sample
sizes that have been influenced by one or more samples that may not actually be part
of the same (target) population (the same holds for taxonomic evenness). Differences
between a nonmember sample and the model generated from all samples including
the nonmember would be muted to some unknown degree (see also Byrd 1997).
As Rhode (1988:711–712) astutely observes, if a particular sample used to model the
target population seems to differ significantly from that modeled population, how
can “the choice of that population as the comparative baseline be justified?”
As Kintigh (1984) originally noted, the key step in his rarefaction procedure involves
the definition of the population; in particular, which samples are to be included when
summing samples to create the population model? Producing an answer to this
question is where the assumption that we already know much about the population
we are studying comes into play. Analytical means of evaluating whether samples
of different sizes might have been derived from the same underlying population are
discussed later in this chapter.
On the one hand, Buzas and Hayek (2005) recently defined “within-community
sampling” as drawing >1 sample from a population with a particular frequency
distribution (set of taxonomic abundances) or constant value of a variable of interest.
“Between-community sampling,” on the other hand, involves drawing the >1
samples from populations with different frequency distributions or the same distribution
with different values of a variable of interest. The distinction could be used as
a basis for lumping two samples (they are statistically indistinguishable with respect
to the property of interest) or for not lumping two samples (they are statistically
distinguishable with respect to that property).
The preceding returns us to the question of what constitutes the target variable.
If it is NTAXA in a biological community, how is the community defined (see Chapter
2)? If it is the taxa exploited by human occupants of an archaeological site, the
differences between the thanatocoenose, taphocoenose, and identified assemblage
must be kept in mind (Figure 2.1). This volume is not the place to explore these
issues. Rather, it is relevant to illustrate how analysts have studied and analytically
used the generic species–area relationship. To do that in the following, it is assumed
that the target variable is NTAXA within the identified assemblage. This simplification
allows us to focus on the species–area relationship and methods of investigating
it, although it is important to note that the relationship may well be found to exist
between any measure of sampling effort or sample size and any target variable (rich-
ness, evenness, heterogeneity).

Species–Area Curves Are Not All the Same

In preceding pages, techniques to explore relationships manifest in species–area
curves have been mentioned (e.g., Figures 4.1–4.4 and 4.8–4.9, and associated
discussion). At this juncture it must be made clear that species–area curves do
not all express the same relationships or have the same implications with
respect to the relationship between sample size and NTAXA. This is so because
they are constructed differently, and they are constructed differently because
they have different analytical purposes and address different analytical
questions. To demonstrate this, in the following the data in Table 4.2 are used
to construct three different kinds of what are generically known as species–area
curves. (A portion of this section is derived from Lyman and Ames [2007].)
One kind of species–area curve is shown in Figure 4.2. In this curve, samples
increase in size by being added together and thus are statistically
interdependent. This kind of species–area curve is a sampling to redundancy
curve. The particular curve in Figure 4.2 has leveled off, suggesting that all
of the information in the last couple of samples (identities of the mammalian
genera present) is redundant with information provided by earlier (smaller)
samples. If the curve had not leveled off, as is the case in Figure 4.3, then
new samples would still be adding new information and there would be no
empirical basis to argue that we have sampled to redundancy. The sampling to
redundancy curve can be plotted manually by simply connecting points, or it can
be drawn statistically (Lepofsky and Lertzman 2005). A sampling to redundancy
curve has a very narrow analytical purpose: to determine if increases in sample
size (accomplished by summing samples) influence the target variable. Its
utility is that it provides an empirical indication of sample adequacy in the
form of a static value for the target variable across samples of varied sizes
that comprise one total collection. Constructing a species–area curve of the
sampling to redundancy kind is straightforward, but remember that the order of
sample addition will influence the ultimate sample size at which the curve
levels off (see the discussion of Figures 4.2 and 4.3).
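The mechanics of a sampling to redundancy curve, and the effect of the order of
sample addition, can be sketched in a few lines. The four "annual samples" below
are hypothetical stand-ins, not the Table 4.2 data:

```python
# Sampling-to-redundancy sketch: cumulative NTAXA as samples are summed.
# The four "annual samples" are hypothetical, not the Table 4.2 data.
samples = [
    {"deer", "beaver", "elk"},   # sample 1
    {"deer", "beaver"},          # sample 2: nothing new
    {"deer"},                    # sample 3: nothing new
    {"deer", "raccoon"},         # sample 4: adds a new genus

]

def redundancy_curve(samples):
    """Cumulative richness (NTAXA) after each sample is added."""
    seen, curve = set(), []
    for s in samples:
        seen |= s
        curve.append(len(seen))
    return curve

print(redundancy_curve(samples))        # [3, 3, 3, 4]: still climbing at the end
print(redundancy_curve(samples[::-1]))  # [2, 2, 3, 4]: a different shape entirely
```

The two printed curves contain the same total information, but where (and
whether) each appears to level off depends entirely on the order in which the
samples were summed.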
Many species–area curves have been constructed in one of two ways distinctly
different from how a sampling to redundancy curve is built. They are distinct
because they have different analytical purposes. Some of those other curves were
constructed to compare statistically independent samples of different sizes
(e.g., McCartney and Glass 1990); some were constructed from statistically
independent samples derived from one population in order to predict
representative statistically independent sample sizes drawn from other
populations (e.g., Zohar and Belmaker 2005); some were used to determine or
compare rates of increase in richness (slope of the curve) (e.g., Grayson 1998);
some were constructed by rarefying samples (reducing their sizes
probabilistically) (e.g., Styles 1981). How were the other curves constructed?
One way that species–area curves are constructed involves generating bivariate
plots of statistically independent samples and then statistically fitting a
curve to the plot to determine if sample size may be influencing the target
variable across the different samples. The example in Figure 4.4 uses the six
annual samples from the Meier site described in Table 4.2. The best-fit
regression line defined by the point scatter is included. The correlation and
the regression line are statistically significant (p < 0.01) and suggest that
NTAXA per statistically independent annual sample is a function of sample size
measured as NISP. If each point represented a sample from a different stratum or
different site, Figure 4.4 would suggest those samples were strongly influenced
by sample size, and thus NTAXA values for those samples should not be compared.
Figure 4.4 does not allow us to surmise whether our total sample from the Meier
site is representative of taxonomic richness (compare with Figure 4.2); the kind
of curve in Figure 4.4 has a different analytical purpose and utility. The
protocol of building a species–area curve exemplified in Figure 4.4 is sometimes
referred to as the "regression approach" (Leonard 1997). The name reflects the
statistical analysis performed. Regression analysis ascertains the strength of
the relationship between samples of different sizes (in Figure 4.4, NISP values)
and a target variable (in Figure 4.4, NTAXA). The strength of the relationship
is reflected by the magnitude and statistical significance of the correlation
coefficient. If there is a significant correlation between sample size and the
target variable, then the magnitudes of the target variable could be a result of
sample sizes rather than a property of interest. With respect to Figure 4.4,
taxonomic richness varies according to sample size. Therefore, if these samples
had come from different strata or sites, we would not want to conclude something
like "the sample from the 1991 site/stratum is taxonomically richer than the
1990 site/stratum, so the people who deposited the remains in the 1991
site/stratum had greater diet breadth than those who deposited the 1990
site/stratum materials." Remembering that correlations do not necessarily imply
a causal relationship between two variables, our inference regarding diet
breadth might be correct, but it might not. The regression approach is merely a
way to detect those instances when caution is advisable.
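The regression approach reduces to fitting a least-squares line and inspecting
the correlation coefficient. The sketch below uses six hypothetical (NISP,
NTAXA) pairs, not the actual Table 4.2 values (analysts often log-transform both
variables before fitting; that step is omitted here for brevity):

```python
# Regression-approach sketch: regress NTAXA on NISP across independent samples.
# The six (NISP, NTAXA) pairs are hypothetical, not the Table 4.2 values.
from math import sqrt

pairs = [(120, 6), (340, 9), (560, 10), (900, 12), (1400, 13), (2100, 15)]
x = [p[0] for p in pairs]  # NISP per sample
y = [p[1] for p in pairs]  # NTAXA per sample
n = len(pairs)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

slope = sxy / sxx              # best-fit line: NTAXA = intercept + slope * NISP
intercept = my - slope * mx
r = sxy / sqrt(sxx * syy)      # Pearson correlation coefficient

print(f"slope={slope:.5f} intercept={intercept:.2f} r={r:.3f}")
# A strong, significant r is a warning that NTAXA may be a function of
# sample size, so richness comparisons across these samples are suspect.
```

For these hypothetical pairs r is close to 1, which is exactly the situation in
which NTAXA comparisons across the samples would be ill-advised.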
figure 4.10. Rarefaction curve (solid line) and 95 percent confidence intervals
(dotted lines) of richness of mammalian genera based on six annual samples from
the Meier site (black squares). Data from Table 4.2; curves determined using
Holland's (2005) Analytical Rarefaction software.

If the regression approach prompts the conclusion that sample-size effects may
be present in a set of samples, the analyst has options. The samples can be
pooled and a rarefaction analysis performed, if one is willing to make the
necessary assumptions. Alternatively, slopes of lines describing the
relationship between sample size and the target variable may vary across
different sets of samples (see Chapter 5). Comparisons of slopes may reveal a
property of the compared sets of samples, not otherwise detectable, that is free
of sample-size effects. A third possibility is to identify statistical outliers,
or samples that fall significant distances (usually ≥ 2 standard deviations)
from the regression line (Grayson 1984). Ascertaining why samples fall far from
the regression line may reveal a unique property of those unusual assemblages,
not otherwise perceived, that is free of sample-size effects. Study of slopes
and of outliers avoids one weakness of the regression approach: small samples
may in fact be 100 percent samples, or populations (Rhode 1988), and thus the
sample-size effect is an artifact of the size of the populations from which the
samples derive.
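Outlier screening of this sort can be sketched as follows. The data are
hypothetical, with one sample deliberately much richer than its NISP predicts:

```python
# Sketch: flag samples >= 2 residual standard deviations from the regression
# line. Hypothetical data; the (300, 14) sample is deliberately too rich.
from math import sqrt

pairs = [(100, 5.5), (200, 6), (400, 7), (600, 8),
         (800, 9), (1200, 11), (1600, 13), (300, 14)]
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]
n = len(pairs)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sd = sqrt(sum(e * e for e in residuals) / (n - 2))  # residual std. deviation

outliers = [p for p, e in zip(pairs, residuals) if abs(e) >= 2 * sd]
print(outliers)  # [(300, 14)]
```

Having flagged the unusual sample, the analytically interesting step is the
taphonomic or behavioral question of *why* it departs from the line.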
The third way that species–area curves are constructed involves rarefaction
(e.g., Sanders 1968; Tipper 1979). Rarefaction has been used by zooarchaeologists
for some time (e.g., Byrd 1997; McCartney and Glass 1990; Styles 1981). There are
several ways to construct rarefaction curves, but describing them is beyond my
scope here. It suffices to say that one can use statistically independent samples
or statistically interdependent (summed) samples (and sample without replacement
or with replacement) to estimate what NTAXA would be were a sample of a
particular (smaller) size drawn. A rarefaction curve constructed using the six
annual samples from the Meier site is shown in Figure 4.10. To generate this
curve, Holland's (2005) Analytical Rarefaction software
was used. If the six samples from Meier were independent of one another and from
different strata or sites, the rarefaction curve would allow comparison of NTAXA
across assemblages of different size without fear of sample size differences driving the
results. As noted earlier, the rarefaction procedure assumes the included samples all
derive from the same population, and it also assumes that specimens used to provide
(NISP) values for drawing the curve are independent of one another. In Figure 4.10,
we know the samples all derive from the same population (the Meier site), and thus
we also know the specimens are to some degree interdependent.
The three kinds of species“area curves shown in Figures 4.2, 4.4, and 4.10 are
not very similar in general appearance despite the similarities in the variables used
to build them. They are not very similar because each curve is meant to address a
distinct analytical question, so each has been built in a unique, distinctive way. The
sampling to redundancy approach (Figure 4.2) determines if one total collection
represents the value of the target variable. Regression analysis (Figure 4.4) allows
detection of possible sample size effects on the target variable among independent
samples of different size. Rarefaction (Figure 4.10) allows two or more samples of
different sizes to be compared as if they were the same size by reducing the larger
samples to a common small size.
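One simple way to rarefy, sketched below, is Monte Carlo subsampling: repeatedly
draw k specimens without replacement from the pooled specimen list and average
the number of taxa observed. The pooled assemblage is hypothetical, not the
Meier site data, and this is only one of the several construction methods noted
above:

```python
# Rarefaction sketch: expected NTAXA in subsamples of k specimens drawn
# without replacement. The pooled assemblage (NISP = 20) is hypothetical.
import random

random.seed(1)

pool = ["deer"] * 10 + ["elk"] * 5 + ["beaver"] * 3 + ["raccoon", "otter"]

def rarefied_ntaxa(pool, k, trials=1000):
    """Mean NTAXA over repeated random subsamples of size k."""
    total = 0
    for _ in range(trials):
        total += len(set(random.sample(pool, k)))
    return total / trials

for k in (5, 10, 20):
    print(k, round(rarefied_ntaxa(pool, k), 2))
# Expected richness climbs with k toward the full-sample NTAXA of 5, so
# samples of different NISP can be compared at a common (small) size.
```

Note that rare taxa (here, raccoon and otter) are the ones most likely to be
absent from small subsamples, which is precisely why raw NTAXA tracks NISP.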


There is an analytical means of evaluating whether samples of different sizes
might have been derived from the same underlying population. The analytical
technique was developed by biogeographers studying insular faunas such as those
on archipelagos or island chains (see Brown and Lomolino [1998] for details).
They reasoned that the faunas on land-bridge islands (those once connected to
the mainland when sea levels were low) likely originated on the mainland and,
given the species–area relationship, that islands (which have varied but
relatively small land areas) would have subsets of the taxa found on the
mainland (which has a large land area relative to islands). Further, small
islands would have smaller subsets of taxa (lower NTAXA values) than would large
islands. Islands can be oceanic, or they can be habitat islands surrounded not
by water but by habitats unfavorable to the taxa located in the insular habitat
patch. The pattern of organismal distribution (presence/absence of taxa across
the islands) is referred to as the "nested subset pattern" (Patterson and Atmar
1986; see also Cutler 1994; Patterson 1987; Wright et al. 1998).
figure 4.11. Examples of perfectly nested faunas and poorly nested faunas.
(a) perfectly nested set of faunas, each capital letter represents a unique
species; (b) poorly nested set of faunas, each capital letter represents a
unique species; (c) Venn diagram of three perfectly nested faunas in which the
larger the circle, the greater the number of taxa; (d) Venn diagram of three
imperfectly nested faunas in which the larger the circle, the greater the number
of taxa. (a) after Cutler (1994); (c) and (d) after Patterson (1987).

The concept of a nested subset pattern is straightforward. Figure 4.11 shows
both a perfectly nested set of faunas and a poorly nested set of faunas, in two
graphic forms. Table 4.5 shows a perfectly nested set of faunas and a poorly
nested set of faunas in tabular form. In the perfectly nested sets, taxa absent
from one fauna are
also absent from all smaller faunas, and taxa present in a fauna are also
present in all larger faunas. In poorly or weakly nested faunas, some taxa may
occur unexpectedly in small faunas and large faunas but not in midsized faunas,
and other taxa may not occur in large faunas but occur in midsized or small
ones. The unexpected occurrences are "outliers" whereas the unexpected absences
are "holes" in the nested pattern (Cutler 1991).
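A much-simplified departure count (not Atmar and Patterson's temperature
algorithm, introduced below) illustrates the idea: with assemblages ordered from
richest to poorest, tally the "holes," i.e., absences lying above the lowest
presence in each taxon's column. The two toy matrices are hypothetical:

```python
# Simplified nestedness check (not Atmar and Patterson's temperature
# algorithm): rows are assemblages ordered richest to poorest, columns are
# taxa; 1 = present, 0 = absent. A "hole" is an absence sitting above the
# lowest presence in a column. A perfectly nested matrix has no holes.

perfect = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

messy = [
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

def count_holes(matrix):
    """Count unexpected absences (holes) column by column."""
    holes = 0
    for col in range(len(matrix[0])):
        column = [row[col] for row in matrix]
        presences = [i for i, v in enumerate(column) if v]
        if presences:
            lowest = presences[-1]
            holes += sum(1 for i in range(lowest) if column[i] == 0)
    return holes

print(count_holes(perfect))  # 0
print(count_holes(messy))    # 2
```

More holes (and outliers) mean weaker nestedness; published metrics differ
mainly in how such departures are weighted and normalized.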
The extremes of nestedness are easy to tell apart (Figure 4.11, Table 4.5). What
about intermediate cases? Can we determine if one set of faunas is more nested
than another? Biogeographers have developed quantitative ways to measure exactly
how nested a set of faunas is, and thus one can compare the nestedness of
multiple sets of faunas (e.g., Cutler 1991). Atmar and Patterson (1993) refer to
their algorithm for measuring the degree of nestedness as a means to measure an
archipelago's "heat of disorder" or "temperature." The algorithm measures the
degree of nestedness on a scale of zero to 100 degrees; faunas that are
perfectly nested have a temperature of 0° whereas faunas that display no
nestedness whatsoever are 100°. (The 100 degrees are an arbitrary interval-scale
measure of amount of nestedness.) The value of the nestedness concept is great
because, theoretically, nestedness provides an indication of whether two or more
faunas derive from the same population. In a way, the examination of nestedness
is like rarefaction without rarefying; it compares samples rather than summing
them and rarefying the sum.
Atmar and Patterson's (1993) thermometer of nestedness provides a measure of
whether multiple faunal (island) samples derive from the same underlying
(mainland) population. If the faunas are strongly nested, then it is probable
that the samples derive from the same population, and one might perform a
rarefaction analysis using those faunas lumped together (assuming quantitative
units are independent). If faunas are weakly nested, then one could argue either
that the samples are so small as to not accurately reflect the heterogeneity of
the population, or that the samples derive from different populations. How
strong must nestedness be, or how weak? That is difficult to answer. But the
point is that the nestedness thermometer provides a measure that constitutes
information bearing on the answer. And a well-informed decision is likely to be
better than one that is poorly informed.

Table 4.5. Two sets of faunal samples showing (a) a perfectly nested set of
faunas and (b) a poorly nested set of faunas. +, taxon present; −, taxon
absent. (b) was generated with a table of random numbers

              Taxon
Assemblage    A  B  C  D  E  F  G  H  I  J
a. Nested
I             +  +  +  +  +  +  +  +  +  +
II            +  +  +  +  +  +  +  +  +  −
III           +  +  +  +  +  +  +  +  −  −
IV            +  +  +  +  +  +  +  −  −  −
V             +  +  +  +  +  +  −  −  −  −
VI            +  +  +  +  +  −  −  −  −  −
VII           +  +  +  +  −  −  −  −  −  −
VIII          +  +  +  −  −  −  −  −  −  −
IX            +  +  −  −  −  −  −  −  −  −
X             +  −  −  −  −  −  −  −  −  −
b. Not nested
I             −  +  −  −  +  −  −  −  +  −
II            +  −  +  +  −  +  +  −  +  −
III           −  −  +  −  +  −  +  −  −  +
IV            +  +  −  +  −  +  −  +  −  +
V             −  +  −  +  −  −  +  −  +  −
VI            +  −  −  +  +  −  −  −  −  +
VII           −  +  +  −  −  +  −  +  −  +
VIII          +  −  −  −  +  −  −  +  −  −
IX            −  −  +  +  −  −  +  −  +  −
X             +  +  −  +  +  +  −  +  −  +
The nestedness diagram of the eighteen assemblages from eastern Washington State
generated by Atmar and Patterson's (1993) thermometer is shown in Figure 4.12.

figure 4.12. Nestedness diagram of eighteen assemblages of mammalian genera
from eastern Washington State. Note that the NISP per assemblage and the rank
order of the assemblages are strongly correlated (Spearman's rho = 0.812,
p < 0.0001).

That figure suggests there is some nestedness among the faunas. This set of
faunas has a nested "temperature" of 18.23°, a value that suggests there is
indeed some nestedness (0° is perfectly nested), but the faunas are hardly
perfectly nested. In conjunction with the facts that NISP and NTAXA values per
assemblage for this set of assemblages are strongly correlated (Figure 4.4), and
that the order of nested faunas produced by the nestedness thermometer is
strongly correlated with NISP per assemblage (rho = 0.812, p < 0.0001), it seems
reasonable to conclude that all eighteen assemblages derive from the same
population of mammals. The assemblages merely differ in size (= NISP), and that
difference is the major variable creating taxonomic differences between them.
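The rank-order correlation invoked here is Spearman's rho, which is simply
Pearson's r computed on ranks. The sketch below uses hypothetical NISP values
and a hypothetical nestedness rank order, not the eastern Washington data:

```python
# Spearman's rho sketch: Pearson's r computed on ranks. The NISP values and
# nestedness rank order below are hypothetical, not the eastern Washington
# data. No tie handling, for brevity.
from math import sqrt

nisp = [2130, 940, 880, 610, 400, 260, 180, 95, 40, 12]  # per assemblage
nest_rank = [1, 2, 4, 3, 5, 6, 8, 7, 9, 10]              # nestedness order

def ranks(values):
    """Rank values 1..n, largest value = rank 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

rho = pearson(ranks(nisp), nest_rank)  # largest NISP holds nestedness rank 1
print(round(rho, 3))  # 0.976
```

A rho near 1, as in this hypothetical case and in the eastern Washington
assemblages, indicates that the nestedness ordering largely recapitulates the
ordering by sample size.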
The value of the nestedness concept, however it is measured (and there are
several ways to do so; compare Cutler [1991] with Atmar and Patterson [1993]),
is great. If one grants the assumption that a small fauna should approximate a
random sample of a large fauna, then when comparing two or more faunas of
different sizes, if all faunas derive from the same population, they should be
nested. The nestedness concept takes advantage not only of the relationship
between sample size and NTAXA (say), but also of the taxonomic composition of
the faunas. Rarefaction does as well, but it effectively begins with the
assumption that the faunal samples are all from the same population. The
nestedness concept and the techniques for measuring nestedness allow that
assumption to be tested and evaluated empirically. Given that the concept has
been discussed in the ecological literature for more than two decades, and given
the near ubiquitous concern with sample size issues among paleozoologists, it is
a bit surprising that nestedness has not been used more often by
paleozoologists. Indeed, I am aware of only one instance of a paleozoologist
using it (Jones 2004).


There is a particularly telling example in the recent literature that highlights the lack
of interdisciplinary contact. Leonard (1987) mentioned the sampling to redundancy
approach 20 years ago in the archaeology literature. That technique was mentioned,
if not used very often, in the zooarchaeological literature several times since then
(e.g., Lyman 1995a; Monks 2000; Reitz and Wing 1999). A recent analysis by paleon-
tologists began with rarefaction to identify the general shape of a curve produced by
samples of different sizes and NTAXA (Jamniczky et al. 2003). Those paleontologists
discovered that the sampling to redundancy approach was a valuable tool for assess-
ing sample representativeness. They seem unaware of any of the discussions in the
zooarchaeological literature of their discovery. These paleontologists also do some
impressive computer modeling to try to predict how many additional samples might
be necessary, but this echoes Kintigh's (1984) method (which they do not reference),
and they tend to caution against its use.
The paleontologists cited in the preceding paragraph made a significant
contribution to paleontology and introduced an important analytical technique to
that discipline. The point here is simple: read some paleontology if you are a
zooarchaeologist; if you are a paleontologist or paleobiologist, read some
zooarchaeology. The cross-fertilization will be worthwhile.
The means by which a collection of faunal remains is generated (which sampling
design is used, how large the sample is, how faunal remains are extracted from
sediments) can and typically does influence what is recovered. Specifically, the
size of the sample collected, and the frequencies of many of its attributes, are
influenced by what (and how much) is collected. In the next three chapters,
several different target variables are discussed. Throughout, analytical means
of detecting and controlling for sample-size effects are described as
quantitative measures of the target variables are sought. Given that the
variables of interest are quantitative, analysts need to be aware of sample-size
effects and to take every precaution to avoid allowing conclusions to be
influenced by them. Means to detect such effects and possible means to control
for them have been described in this chapter. We will return to these analytical
techniques often in Chapters 5, 6, and 7.
Measuring the Taxonomic Structure
and Composition ("Diversity") of Faunas

One of the most common analytical procedures in paleozoology is to compare
faunas from different time periods, from different geographic locales, or both
(e.g., Barnosky et al. 2005 and references therein). Comparisons may be geared
toward answering any number of questions. Does the taxonomic composition of the
compared faunas differ (and why), and if so, by how much (and why)? Does the
number of taxa represented (NTAXA) differ between faunas (and why), and if so,
by how much (and why)? Do the abundances of taxa vary (and why)? Ignoring the
"and whys," these and similar queries can be considered proximal questions. The
why questions are the ultimate questions of interest; they constitute the
reason(s) to identify and quantify the faunal remains in the first place. Was
hominid or human dietary change over time the cause of the change in taxonomic
composition, abundance, and so on? Did the environment (particularly the
climate) change such that different ecologies prompted a change in the taxa
present, the number of taxa present, or the abundances of various taxa? It is
beyond the scope of this volume to consider these ultimate why questions other
than as examples. The purpose of this chapter is to explore how quantitative
faunal data can be analytically manipulated in order to produce answers to these
kinds of proximal questions.
Once faunal remains have been identified as to the taxa they represent, they can
be quantified or counted in any number of ways, many of which are described in
Chapters 2 and 3. As indicated in those earlier chapters, NISP tends to be the
quantitative unit of choice for many analyses. NISP is used in this chapter to
illustrate how taxonomic abundance data can be analytically manipulated in order
to measure the taxonomic structure and composition of a collection of
paleofaunal remains. MNI and biomass might also be used to calculate the indices
described, but in many cases there are good reasons not to use them, as argued
in earlier chapters.
Use of NISP throughout this chapter is meant to endorse it as the quantitative
unit of choice in such efforts. This does not mean that NISP is without flaws
that might influence analytical results. Whether a set of NISP values for an
assemblage suffers from interdependence should be ascertained prior to
performing analyses like those described in this chapter. Methods to do this are
described in Chapter 2. If the NISP values do not seem to be influenced by
variation in interdependence, then use NISP values as ordinal scale values. If
the NISP values are plagued by interdependence, then the data are best treated
as nominal scale data. As we will see, even if interdependence does not seem to
be a problem, there are other concerns with using NISP as an estimate of a
property of a paleofauna.
The analytical gymnastics involving number of taxa, shared taxa, taxonomic
abundances, and the like typically are implicitly aimed at the biocoenose
(biological community) by paleobiologists, whereas zooarchaeologists may seek
measures of the thanatocoenose (killed population) or the biocoenose, depending
on the research question. Recall that a biological community is a slippery
entity empirically and conceptually. Allowing that a community can indeed be
defined as, say, a naturally delineated habitat patch (though defined with even
greater difficulty in the prehistoric record than in modern ecosystems),
ecologists tend to recognize three levels of inclusiveness of biological
diversity (Whittaker 1972, 1977). Alpha diversity is the diversity within a
single local community; beta diversity is the change in diversity among or
across several communities (recognizably distinct but adjacent habitats); and
gamma diversity is the diversity evident in a set of communities such as is
found across a large area involving more than one kind of habitat (Loreau
2000). Paleozoologists do not always ignore these various sorts of diversity
(e.g., Sepkoski 1988) when they compare diversity across geographic localities,
temporal periods, or both, but sometimes they do (e.g., Osman and Whitlatch
1978). Of course, sometimes they must assume (or analytically warrant the
belief) that the samples they use are each derived from a single community and
are not time and space averaged (e.g., Bush et al. 2004), though this is not
always possible or necessary, depending on the question being asked (e.g.,
Jackson and Johnson 2001; Sepkoski 1997).
Given its central role in identifying the target variable, diversity is a
concept in need of explicit definition. The title of this chapter is "Measuring
the Taxonomic Structure and Composition ('Diversity') of Faunas." This wording
is meant to imply that diversity signifies the structure and composition of a
fauna. By structure and composition is meant such variables as the particular
taxa represented in a collection of faunal remains, the number of taxa
represented regardless of which taxa they are, the abundances of various taxa,
and the like. In the ecological and biological literature, diversity has come to
mean any number of these variables (Magurran 1988; Spellerberg and Fedor 2003).
In fact, some years ago the term "diversity" signified so many concepts and
variables within ecological research that one ecologist suggested it be
abandoned because it was too ambiguous (Hurlbert 1971).
The term "diversity" was not abandoned, but the lesson here is an important one:
be aware that the terms analyst A uses may have different meanings than those
intended by analyst B. I follow precedent in zooarchaeology (e.g., Byrd 1997;
McCartney and Glass 1990) and some ecological literature (e.g., Lande 1996), and
use the term "diversity" to signify a family of variables that describe the
structure and composition of faunas and collections of faunal remains. The
members of that family of diversity variables are introduced in the following
section. In subsequent sections, quantitative indices for each variable are
discussed. Although it is sometimes obvious that one fauna differs in one or
more ways from another fauna, the indices are designed to provide a quantitative
measurement of similarities and differences. Many of these indices provide a
continuous measure of similarity or difference, and this facilitates comparative
analyses.


