Lecture © John Brzustowski

BIOL 606 Session, University of Alberta, April 12, 2000

There are several possible goals of integrating fossil stratigraphic information with reconstructed phylogenies:

- assessing the extent of agreement between the phylogeny implied by a cladogram, and the fossil record
- deciding among several equally parsimonious cladograms based on how well they agree with the fossil record
- searching for cladograms that might have been rejected by purely morphological maximum parsimony, but which match the total data (fossil ranges + morphology) more closely.

The first quantitative method for assessing agreement between fossils and phylogeny (of those I surveyed, at least) is the Spearman Rank Correlation [1] (SRC). This is the classical (Pearson) correlation coefficient applied to n pairs of ranks (Rf, Rp). Each pair corresponds to one taxon: Rf is the rank of that taxon in the fossil record (1 = earliest taxon to appear, n = most recent), and Rp is the rank of that taxon in the phylogeny (1 = most basal taxon, n = most derived taxon). SRC ranges from -1 to 1, representing complete disagreement (earliest clades branched most recently) to complete agreement (most recent clades are most derived). Unfortunately, the range of possible SRC values depends on tree topology, so comparisons of SRC between phylogenies are problematic. Moreover, ranking the fossil range data discards the magnitudes of age differences, thereby losing information.
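As a minimal sketch of the SRC calculation described above, the following Python computes a Pearson correlation on two lists of ranks. The function names and the sample data are hypothetical, and giving tied values the mean of their ranks is an assumption, since the text does not say how ties are handled.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Classical Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman_rank_correlation(first_appearance, clade_rank):
    """SRC: Pearson correlation applied to the ranks Rf and Rp."""
    return pearson(ranks(first_appearance), ranks(clade_rank))

# Hypothetical data: one entry per taxon.
# first_appearance: stratigraphic zone of first occurrence (1 = oldest zone).
# clade_rank: position in the phylogeny (1 = most basal).
agree = spearman_rank_correlation([1, 3, 4, 7], [1, 2, 3, 4])
disagree = spearman_rank_correlation([7, 4, 3, 1], [1, 2, 3, 4])
print(agree, disagree)
```

Note that only the order of the first appearances enters the result: replacing the zones 1, 3, 4, 7 with 1, 2, 3, 4 gives the same value, which is the loss of information mentioned above.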

The Stratigraphic Consistency Index [2] (SCI) is similar to SRC, but is based on the n-1 clades in a tree, and is the fraction of these which are consistent with the fossil stratigraphic record. A clade is consistent if the earliest appearance of any taxon in that clade is later than the earliest appearance of any taxon in its sister clade. SCI suffers from the same problems as SRC.
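One possible reading of the SCI definition can be sketched as follows. Several things here are assumptions rather than part of the definition above: the tree is a nested 2-tuple with taxon names as leaves, first appearances are zone numbers with 1 as the oldest zone, ties count as inconsistent (a literal reading of "later than"), and only clades of two or more taxa that have a sister are scored, which leaves the root out of the denominator. The exact set of clades to score should be checked against the cited paper.

```python
def earliest(node, first_zone):
    """Earliest first-appearance zone among all taxa in a clade."""
    if isinstance(node, str):
        return first_zone[node]
    return min(earliest(child, first_zone) for child in node)

def stratigraphic_consistency_index(tree, first_zone):
    """Fraction of scored clades whose earliest taxon appears later than
    the earliest taxon of the sister clade (sketch, see assumptions above)."""
    consistent = 0
    scored = 0

    def visit(node):
        nonlocal consistent, scored
        if isinstance(node, str):
            return
        left, right = node
        for clade, sister in ((left, right), (right, left)):
            if not isinstance(clade, str):  # score only clades of 2+ taxa
                scored += 1
                if earliest(clade, first_zone) > earliest(sister, first_zone):
                    consistent += 1
        visit(left)
        visit(right)

    visit(tree)
    if scored == 0:
        raise ValueError("tree has no clade with a sister to score")
    return consistent / scored

# Hypothetical pectinate tree of four taxa, D most basal.
tree = ((("A", "B"), "C"), "D")
print(stratigraphic_consistency_index(tree, {"A": 4, "B": 3, "C": 2, "D": 1}))
print(stratigraphic_consistency_index(tree, {"A": 2, "B": 4, "C": 1, "D": 3}))
```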

Other proposed measures of fossil/phylogeny consensus use differences in dates of first appearance in the fossil record, rather than just their ranks. They all use a number called variously the Stratigraphic Debt (SD) or Minimum Implied Gap (MIG), which is the sum of all Ghost Ranges implied by the phylogeny. If two sister clades have different dates of earliest appearance in the fossil record, the associated ghost range is the magnitude of the difference between these dates. The implication is that the later appearing clade did not yield any fossils during a period when it must have existed, and our cladogram therefore makes a hypothesis of non-preservation. The SD is measured in units corresponding to a division of the fossil record into stratigraphic zones, so each time a lineage passes through a zone without leaving us a fossil, the SD increases by one.
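The sum of ghost ranges can be sketched as a single pass over the tree. This is an illustration under stated assumptions, not a reference implementation: the tree is a nested 2-tuple of taxon names, `first_zone` (a hypothetical name) maps each taxon to the zone number of its first appearance, and zone 1 is the oldest.

```python
def stratigraphic_debt(tree, first_zone):
    """Sum, over every pair of sister clades, of the difference between
    their earliest first-appearance zones (the ghost range)."""
    def walk(node):
        # Returns (earliest zone in this clade, ghost ranges summed within it).
        if isinstance(node, str):
            return first_zone[node], 0
        (a_first, a_debt), (b_first, b_debt) = walk(node[0]), walk(node[1])
        ghost = abs(a_first - b_first)  # ghost range between the two sisters
        return min(a_first, b_first), a_debt + b_debt + ghost
    return walk(tree)[1]

# Hypothetical pectinate tree of four taxa, D most basal.
tree = ((("A", "B"), "C"), "D")

# Taxa appear in the order implied by the tree: one zone of ghost range per node.
print(stratigraphic_debt(tree, {"A": 4, "B": 3, "C": 2, "D": 1}))  # 3

# The most derived taxon A appears first: its three relatives imply longer gaps.
print(stratigraphic_debt(tree, {"A": 1, "B": 4, "C": 2, "D": 3}))  # 6
```

In the second call the sister pair (A, B) alone contributes three zones, because B is hypothesized to have existed unpreserved since A first appeared.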

The following consensus measures differ only in how they scale SD to allow comparison between different cladograms and datasets:

- the Relative Completeness Index [3] (RCI) divides SD by the sum of fossil ranges for all taxa (measured in the same units as SD), then subtracts this ratio from one. This is a bit odd in that additional fossil information is used (i.e. the range, not just the date of first occurrence), but without relating it in any way to the cladogram. RCI values can range from minus infinity to one, but the latter is only obtained when SD is zero (i.e. all taxa are of the same age).
- the Manhattan Stratigraphic Metric [4] (MSM) divides the difference in age of first occurrence between the earliest and latest appearing taxa by SD. (NB: the paper cited here gives a rather involved procedural definition of MSM, but it is not too difficult to show that the definition I give here yields the same number.) MSM values are always positive, and attain their maximum possible value of one precisely when the SRC equals one.
- the Gap Excess Ratio [5] (GER) scales SD relative to the range given by SDmin and SDmax, the smallest and largest values of SD obtained when all possible permutations of first occurrence dates among taxa are examined. GER values range from 0 (when SD=SDmax) to 1 (when SD=SDmin).
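The three scalings listed above can be compared side by side in a short sketch. It follows the definitions as worded in this abstract rather than the procedures in the cited papers. The tree encoding (nested 2-tuples), the zone numbering (1 = oldest), the function names, and the sample range lengths are all assumptions made for illustration. The exhaustive permutation used for GER is only feasible for a handful of taxa.

```python
from itertools import permutations

def stratigraphic_debt(tree, first_zone):
    """Sum of ghost ranges between all pairs of sister clades."""
    def walk(node):
        if isinstance(node, str):
            return first_zone[node], 0
        (a_first, a_debt), (b_first, b_debt) = walk(node[0]), walk(node[1])
        return min(a_first, b_first), a_debt + b_debt + abs(a_first - b_first)
    return walk(tree)[1]

def rci(sd, range_lengths):
    """One minus SD divided by the summed range lengths of all taxa."""
    return 1 - sd / sum(range_lengths.values())

def msm(sd, first_zone):
    """Spread of first occurrences divided by SD (undefined when SD is zero)."""
    zones = first_zone.values()
    return (max(zones) - min(zones)) / sd

def ger(tree, first_zone):
    """Position of SD between SDmin and SDmax over all permutations of dates."""
    taxa = list(first_zone)
    dates = [first_zone[t] for t in taxa]
    debts = [stratigraphic_debt(tree, dict(zip(taxa, p)))
             for p in permutations(dates)]
    sd_min, sd_max = min(debts), max(debts)
    sd = stratigraphic_debt(tree, first_zone)
    return (sd_max - sd) / (sd_max - sd_min)

# Hypothetical data for a pectinate tree of four taxa.
tree = ((("A", "B"), "C"), "D")
first_zone = {"A": 2, "B": 4, "C": 1, "D": 3}
range_lengths = {"A": 3, "B": 1, "C": 4, "D": 2}  # in zones, same units as SD

sd = stratigraphic_debt(tree, first_zone)
print(sd)                      # 5
print(rci(sd, range_lengths))  # 0.5
print(msm(sd, first_zone))     # 0.6
print(ger(tree, first_zone))
```

For this tree the permuted SD values run from 3 to 6, so an observed SD of 5 sits one third of the way from SDmax toward SDmin.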

In all three cases, a significance test can be obtained by permuting first occurrence dates among taxa in all possible ways (or a random subset thereof) and asking how often one observes as large (or small) a consensus measure as was obtained with the non-permuted data.
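The permutation test can be sketched for the raw SD, where a smaller value means better agreement. This is an illustration only: the tree encoding, the `n_random` and `seed` parameters, and the choice of SD as the test statistic are assumptions, and the same loop would apply to any of the scaled measures.

```python
import random
from itertools import permutations

def stratigraphic_debt(tree, first_zone):
    """Sum of ghost ranges between all pairs of sister clades."""
    def walk(node):
        if isinstance(node, str):
            return first_zone[node], 0
        (a_first, a_debt), (b_first, b_debt) = walk(node[0]), walk(node[1])
        return min(a_first, b_first), a_debt + b_debt + abs(a_first - b_first)
    return walk(tree)[1]

def permutation_p_value(tree, first_zone, n_random=None, seed=0):
    """Fraction of permutations of first-occurrence dates among taxa whose
    SD is as small as (or smaller than) the observed SD.

    n_random=None enumerates every permutation; otherwise n_random random
    permutations are drawn with a seeded generator."""
    taxa = list(first_zone)
    dates = [first_zone[t] for t in taxa]
    observed = stratigraphic_debt(tree, first_zone)
    if n_random is None:
        shuffles = permutations(dates)
    else:
        rng = random.Random(seed)
        shuffles = (rng.sample(dates, len(dates)) for _ in range(n_random))
    hits = total = 0
    for shuffled in shuffles:
        total += 1
        if stratigraphic_debt(tree, dict(zip(taxa, shuffled))) <= observed:
            hits += 1
    return hits / total

# Hypothetical pectinate tree of four taxa, D most basal.
tree = ((("A", "B"), "C"), "D")
good_fit = {"A": 4, "B": 3, "C": 2, "D": 1}
print(permutation_p_value(tree, good_fit))                # 2 of 24 permutations
print(permutation_p_value(tree, good_fit, n_random=1000))
```

With only four taxa, even a perfect fit cannot give a value below 2/24, because swapping the dates of A and B leaves SD unchanged. Small trees therefore have little power under this kind of test.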

One simulation study [6] uses a simple model of evolution and preservation to ask whether the use of admittedly incomplete fossil information can, at least in principle, yield trees closer to the true phylogeny than can purely morphological cladistics. While the results are strongly affirmative, the evolutionary model is ad hoc, much of the model's parameter space was not explored, and the issue of whether ancestors are to be allowed in the phylogenetic reconstructions is not addressed.

A recent approach [7] uses a Maximum Likelihood (ML) model, with preservation and extinction probabilities (per taxon per stratigraphic layer) as parameters, to assign probabilities to cladograms of various lengths. Effectively, SD has been added as a character, with one unit of SD given equal weight to one character state transition, and the method seeks a tree for which this hybrid length has maximum likelihood, given estimates of the preservation and extinction probabilities. Unfortunately, the method of computing ML is ad hoc due to the computational infeasibility of doing so correctly.

---------

1. M.A. Norell and M.J. Novacek. 1992. The fossil record and evolution: comparing cladistic and paleontologic evidence for vertebrate history. Science 255:1690-1693.
2. J.P. Huelsenbeck. 1994. Comparing the stratigraphic record to estimates of phylogeny. Paleobiology 20:470-483.
3. M.J. Benton. 1994. Palaeontological data and identifying mass extinctions. Trends Ecol. Evol. 9:181-185.
4. M.E. Siddall. 1998. Stratigraphic fit to phylogenies: a proposed solution. Cladistics 14:201-208.
5. M.A. Wills. 1999. Congruence between phylogeny and stratigraphy: randomization tests and the gap excess ratio. Syst. Biol. 48:559-580.
6. D.L. Fox, D.C. Fisher and L.R. Leighton. 1999. Reconstructing phylogeny with and without temporal data. Science 284:1816-1819.
7. P.J. Wagner. 1998. A likelihood approach for evaluating estimates of phylogenetic relationships among fossil taxa. Paleobiology 24:430-449.

Discussion

Rapporteur: