BIOL 606 Home

 The Use and Abuse of Molecular Phylogenies

Presenter: Greg Dueck

Discussants: Stephanie Zaklan and Chris Kyle
Rapporteur: Greg Wilson

Molecular systematics has become popular since 1966, when protein electrophoresis was first used to estimate genetic variation, for a number of reasons. Molecular data are genetic, and therefore heritable. They are universal, and there is an abundance of potential data. The molecular clock may provide a common yardstick for dating branching events. Finally, analogy can be distinguished from homology.

Many molecular phylogenies today make use of mitochondrial DNA (mtDNA) sequences. Mitochondrial DNA is maternally inherited, and each individual typically carries only one type (haplotype) of the molecule. Different regions of mtDNA evolve at different rates, but the average is about 1% sequence divergence per lineage per million years, or about 2% between a pair of lineages. Besides mtDNA sequence, differences in tRNA secondary structure, rRNA size and features of the control region can be examined for polymorphism.
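The 1% per lineage per million years figure can be turned into a rough date under a strict clock, as sketched below. Because both lineages keep evolving after a split, pairwise divergence is divided by twice the per-lineage rate. The sketch also applies a Jukes-Cantor (JC69) correction for multiple substitutions at the same site; that model choice, the function names and the example value are assumptions for illustration, not taken from the papers discussed.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) corrected substitutions per site, computed from the
    observed proportion p of differing sites (corrects for multiple hits)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time_my(pairwise_distance: float,
                       rate_per_lineage_per_my: float = 0.01) -> float:
    """Strict-clock age of a split in millions of years.

    Both lineages accumulate change after the split, so the pairwise
    distance is divided by twice the per-lineage rate.
    """
    if rate_per_lineage_per_my <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate_per_lineage_per_my)


if __name__ == "__main__":
    p = 0.10  # hypothetical observed proportion of differing sites
    d = jc69_distance(p)
    print(f"observed p = {p:.3f}, JC69 distance = {d:.3f}")
    print(f"uncorrected age: {divergence_time_my(p):.1f} My")
    print(f"corrected age:   {divergence_time_my(d):.1f} My")
```

The corrected estimate is always somewhat older than the uncorrected one, and the gap widens as the observed divergence approaches saturation, which is why uncorrected distances tend to underestimate deep splits.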

Janke et al. (1996, 1997) attempted to determine the branching pattern of eutherians (placental mammals), monotremes (egg-laying mammals) and marsupials. They sequenced the complete mitochondrial genome of the platypus (17,019 base pairs). Non-protein-coding regions, sequence gaps and transitions at third codon positions were excluded from the analysis. Maximum likelihood, neighbour joining and maximum parsimony were used to analyze the platypus sequence together with several eutherian and marsupial sequences. When Xenopus was used as the outgroup, all three methods significantly supported the Marsupionta hypothesis, in which monotremes and marsupials are each other's closest relatives. The traditional view of the relationships among marsupials, monotremes and eutherians is the Theria hypothesis, in which marsupials and eutherians are most closely related. The third, unnamed hypothesis, in which monotremes and eutherians are the most closely related taxa, received more support in the molecular phylogenies than the Theria hypothesis did. The amino acid composition of several potential outgroups deviated significantly from the mean composition of the data set, so these taxa were excluded from the calculations. The analysis assumed that all positions in an amino acid sequence are free to vary.
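Excluding outgroups with unusual amino acid composition can be illustrated with a chi-square comparison of each sequence's amino acid counts against the counts expected from the pooled composition of all sequences. This is a sketch of one common form of such a test, assumed here rather than taken from the papers; the function name and toy sequences are hypothetical. Each statistic can be compared with a chi-square distribution whose degrees of freedom are one fewer than the number of amino acid types, for example with `scipy.stats.chi2.sf` if SciPy is available.

```python
from collections import Counter
from typing import Dict


def composition_chi2(seqs: Dict[str, str], gap: str = "-") -> Dict[str, float]:
    """Chi-square statistic comparing each sequence's amino acid counts with
    the counts expected from the pooled composition of all sequences."""
    pooled = Counter(aa for s in seqs.values() for aa in s if aa != gap)
    total = sum(pooled.values())
    alphabet = sorted(pooled)  # only residues that actually occur
    stats: Dict[str, float] = {}
    for name, seq in seqs.items():
        counts = Counter(aa for aa in seq if aa != gap)
        n = sum(counts.values())
        if n == 0:  # an all-gap sequence carries no composition information
            stats[name] = 0.0
            continue
        stat = 0.0
        for aa in alphabet:
            expected = n * pooled[aa] / total
            stat += (counts[aa] - expected) ** 2 / expected
        stats[name] = stat
    return stats


if __name__ == "__main__":
    # Toy "alignments" for illustration only, not real mitochondrial proteins.
    toy = {
        "taxon_1": "MLKLLSTAAF",
        "taxon_2": "MLKLLSTAAF",
        "taxon_3": "MLKILSTSAF",
        "outgroup_x": "MKKKKSKKAF",
    }
    for name, stat in sorted(composition_chi2(toy).items(), key=lambda kv: -kv[1]):
        print(f"{name}: chi2 = {stat:.2f}")
```

In this toy set the lysine-rich `outgroup_x` gives by far the largest statistic, so it is the sequence a composition filter would flag first.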

The divergence of artiodactyls and cetaceans, dated at 60 million years before present (mybp), was used to calibrate the molecular clock, which placed the monotreme-marsupial split at 115 mybp on their tree. The data were reanalyzed by Penny and Hasegawa (1997) with a "correction that allows unequal frequencies of amino acids to be taken into account". Their analysis also supported the Marsupionta theory.
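The calibration step can be sketched as follows, assuming a strict clock and pairwise distances already corrected for multiple substitutions. The 60 mybp calibration date comes from the text; the two distances in the example are hypothetical placeholders, not values from Janke et al., and the function names are illustrative.

```python
def calibrated_rate(calibration_distance: float, calibration_age_my: float) -> float:
    """Per-lineage substitution rate implied by a split of known age.

    Both lineages diverge from the common ancestor, hence the factor of 2.
    """
    if calibration_age_my <= 0:
        raise ValueError("calibration age must be positive")
    return calibration_distance / (2.0 * calibration_age_my)


def date_split(distance: float, rate_per_lineage: float) -> float:
    """Age (in millions of years) of a split, given its pairwise distance."""
    return distance / (2.0 * rate_per_lineage)


if __name__ == "__main__":
    # Calibration from the text: artiodactyl-cetacean split at 60 mybp.
    # Both distances below are hypothetical, for illustration only.
    rate = calibrated_rate(calibration_distance=0.12, calibration_age_my=60.0)
    age = date_split(distance=0.23, rate_per_lineage=rate)
    print(f"estimated split: {age:.0f} mybp")
```

Algebraically the estimate reduces to 60 x (0.23 / 0.12), so every date on the tree scales linearly with the calibration date: an error in the single calibration point propagates proportionally to all other estimates.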

The Marsupionta theory has been supported in the past by morphological characters, but these characters were later shown to be plesiomorphic or convergent. Earlier molecular studies were either unresolved or supported the Theria theory.

It must be remembered that genes may diverge before species do, and may diverge in a different pattern, so a gene tree does not necessarily mirror the species tree. The species tree represents the history of the species as a whole, as reflected across all of its genes. Incomplete lineage sorting, uneven evolutionary rates, introgression and genetic polymorphism in the ancestral species can all cause a gene tree to differ from the species tree. A gene tree is more likely to have the same topology as the species tree when many generations separate successive divergences, effective population sizes are small and few taxa are examined. Examining more loci increases the probability of obtaining the correct phylogenetic tree. Because mtDNA does not recombine, the entire molecule acts as a single locus.
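For the simplest case of three taxa, standard coalescent theory gives the probability that a single non-recombining locus recovers the species-tree topology as 1 - (2/3)exp(-T), where T is the internal branch length in coalescent units. The sketch below assumes a neutral locus, constant population size and no gene flow; the function and parameter names are illustrative. For a haploid, maternally inherited locus such as mtDNA, the relevant size is the female effective size and one time unit is N generations rather than 2N, which is one reason mtDNA lineages tend to sort faster than nuclear ones.

```python
import math


def p_gene_tree_matches(internode_generations: float,
                        effective_size: float,
                        haploid: bool = False) -> float:
    """Probability that a three-taxon gene tree matches the species tree.

    internode_generations: generations between the two speciation events.
    effective_size: ancestral effective population size (female size for mtDNA).
    haploid: True for a uniparental haploid locus such as mtDNA.
    """
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    # One coalescent time unit: 2N generations (diploid) or N (haploid).
    generations_per_unit = effective_size if haploid else 2.0 * effective_size
    t = internode_generations / generations_per_unit
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    # Hypothetical values: 100,000 generations between splits, N = 50,000.
    for haploid in (False, True):
        p = p_gene_tree_matches(100_000, 50_000, haploid=haploid)
        print(f"haploid={haploid}: P(gene tree matches species tree) = {p:.3f}")
```

With no time between splits the probability falls to 1/3 (any of the three topologies is equally likely), and it approaches 1 as the internode grows long relative to the effective population size.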

The molecular clock, while useful, may not be universal, for a number of reasons. Different regions of the mtDNA genome evolve at different rates, mean mtDNA evolutionary rates differ between species, and physiological and life-history traits, such as generation time, may affect the rate of nucleotide substitution.
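One common way to check whether two lineages have evolved at the same rate, as a strict clock assumes, is a relative rate test against an outgroup. The sketch below implements Tajima's one-degree-of-freedom relative rate test on aligned sequences; it was not necessarily used in the papers discussed, and the example sequences and names are made up.

```python
from typing import Tuple

CHI2_1DF_5PCT = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def tajima_relative_rate(seq_a: str, seq_b: str, outgroup: str) -> Tuple[int, int, float]:
    """Tajima's one-degree-of-freedom relative rate test.

    m1 counts sites where only lineage A has changed (A != B, B == outgroup);
    m2 counts sites where only lineage B has changed (A != B, A == outgroup).
    Under equal rates m1 and m2 should be similar; the statistic
    (m1 - m2)^2 / (m1 + m2) is compared with a chi-square with 1 df.
    """
    if not len(seq_a) == len(seq_b) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if "-" in (a, b, o) or a == b:
            continue  # skip gapped sites and sites where A and B agree
        if b == o:
            m1 += 1
        elif a == o:
            m2 += 1
        # sites where all three differ are uninformative for this test
    chi2 = (m1 - m2) ** 2 / (m1 + m2) if (m1 + m2) else 0.0
    return m1, m2, chi2


if __name__ == "__main__":
    # Hypothetical aligned sequences, not real data.
    m1, m2, chi2 = tajima_relative_rate("CCCCCAAAAA", "CAAAAAAAAA", "AAAAAAAAAA")
    verdict = "rates differ" if chi2 > CHI2_1DF_5PCT else "consistent with equal rates"
    print(m1, m2, round(chi2, 3), verdict)
```

A significant result only says that the two lineages differ in rate relative to this outgroup; it does not say which region or trait is responsible.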

A maximum parsimony tree constructed by Greg Dueck from 11 structural characters of the mtDNA genome described in Janke et al. (1997) supported the Theria theory with most outgroups. When chicken was used as the outgroup, the unnamed monotreme-eutherian grouping was supported instead.
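To show how a maximum parsimony comparison of this kind works, the sketch below scores the three competing topologies with Fitch's small-parsimony algorithm and reports how many character-state changes each requires; parsimony prefers the topology with the fewest. The character matrix is hypothetical (four made-up binary characters, not the 11 characters used in the presentation), and the taxon labels are illustrative.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, matrix: Dict[str, str], site: int) -> Tuple[Set[str], int]:
    """Fitch pass for one character on a rooted binary tree.

    Returns the candidate state set at this node and the number of
    changes required in the subtree below it.
    """
    if isinstance(tree, str):
        return {matrix[tree][site]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, matrix, site)
    right_set, right_cost = fitch(right, matrix, site)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Total changes summed over all characters (unweighted parsimony)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, matrix, i)[1] for i in range(n_chars))


if __name__ == "__main__":
    # Hypothetical 0/1 coding of four structural characters (NOT the 11
    # characters from Janke et al. 1997); "Out" is a generic outgroup.
    matrix = {
        "Out": "0000",
        "Mono": "1001",
        "Mars": "0111",
        "Euth": "0110",
    }
    hypotheses = {
        "Theria": ("Out", ("Mono", ("Mars", "Euth"))),
        "Marsupionta": ("Out", ("Euth", ("Mono", "Mars"))),
        "monotreme-eutherian": ("Out", ("Mars", ("Mono", "Euth"))),
    }
    for name, topology in hypotheses.items():
        print(name, tree_length(topology, matrix))
```

With this toy matrix the Theria topology needs the fewest changes. Changing the outgroup's character states, as happens when a different outgroup such as chicken is chosen, changes the data and can change which topology wins.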

It was suggested that constructing greatest (maximum) agreement subtrees of the molecular and morphological trees may be a good way of dealing with disagreements between these two types of tree when they arise.
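A greatest agreement subtree is the largest set of taxa on which two trees have the same topology. The brute-force sketch below finds one for two small rooted trees by testing leaf subsets from largest to smallest; faster algorithms exist for two trees, and this exponential search is only meant for a handful of taxa. The taxon sampling and both topologies are hypothetical illustrations.

```python
from itertools import combinations
from typing import Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> Set[str]:
    """All leaf labels in a tree written as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def restrict(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Prune leaves not in `keep` and suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def canonical(tree: Optional[Tree]):
    """Order-independent form, so that (a, b) and (b, a) compare equal."""
    if tree is None or isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


def greatest_agreement_subtree(t1: Tree, t2: Tree) -> Set[str]:
    """Largest leaf set on which both rooted trees share a topology."""
    shared = sorted(leaves(t1) & leaves(t2))
    for size in range(len(shared), 0, -1):
        for subset in combinations(shared, size):
            keep = set(subset)
            if canonical(restrict(t1, keep)) == canonical(restrict(t2, keep)):
                return keep
    return set()


if __name__ == "__main__":
    # Illustrative topologies only; the taxon sampling is hypothetical.
    molecular = ("Xenopus", (("platypus", ("opossum", "wallaroo")), ("human", "mouse")))
    morphological = ("Xenopus", ("platypus", (("opossum", "wallaroo"), ("human", "mouse"))))
    print(sorted(greatest_agreement_subtree(molecular, morphological)))
```

Here the Marsupionta-style and Theria-style trees disagree only in the placement of the platypus, so dropping that single taxon leaves five taxa on which both trees agree.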

Molecular phylogenies generally fall into three categories. They may support conventional understanding and serve as one more piece of evidence for a well-established phylogeny; they may resolve a phylogeny that morphology cannot; or they may support an unconventional relationship, in which case more research is needed to establish the correct phylogeny. The papers discussed describe the mtDNA gene tree, not necessarily the species tree. More molecular, morphological or fossil evidence is required before the monotreme-marsupial-eutherian phylogeny can be resolved.

Janke, A., Gemmell, N.J., Feldmaier-Fuchs, G., von Haeseler, A. & Pääbo, S. (1996) J. Mol. Evol. 42, 153-159.
Janke, A., Xu, X. & Árnason, U. (1997) Proc. Natl. Acad. Sci. USA 94, 1276-1281.
Penny, D. & Hasegawa, M. (1997) Nature 387, 549-550.



Discussion

A number of different issues were raised during the discussion.

1) Why were the secondary structures of the mtDNA not used in the analysis, especially since they seemed to support the Theria theory? It was hypothesized that they were not used because their rate of change is not known, and weighting secondary-structure changes against nucleotide differences would be difficult. However, if different nucleotide changes can be weighted differentially, secondary structures could be weighted as well. Traditionally, trees have not combined morphological and molecular data. Also, gross structural changes of unknown evolutionary origin should not be included in a phylogenetic analysis.

2) Why were so many base pairs of sequence eliminated from the analyses, and could their deletion bias the tree? It was mentioned that some sequences change so rapidly that they provide no useful information for a phylogeny. The third position of a codon is generally highly variable, so on the time scale of this phylogeny it may have undergone multiple substitutions, including back-mutations, which could produce an incorrect tree if those sites were analyzed. Other large regions are excluded if it cannot be decided how many evolutionary events they represent, or if they are so divergent that alignment is impossible. Sequencing more related species may show how to align such regions, but some will still remain unique to a single taxon.

3) How should one decide which tree to use if molecular and morphological trees disagree? We decided that in such a case, we should do as many analyses as possible, using as much information as we can get, to design species trees. The addition of more loci, or more morphological evidence, may help to eliminate differences between trees.

4) Would weighting amino acids differently result in a different tree, and should molecular evidence be used in a parsimony analysis at all? Someone mentioned that amino acids important to function should be weighted more heavily than those that are not, but the importance of an amino acid change is difficult to measure. The rapid turnover at variable nucleotide sites makes it difficult to distinguish shared-derived characters from convergent ones. Adding more taxa to a study can make it easier to track nucleotide changes and to determine whether a character is shared-derived or convergent. Morphological characters are much less likely to converge than molecular characters, which may make them better suited to parsimony analyses. However, parsimony can accommodate reversions as long as they occur at a constant rate over a period of time: because variable nucleotides do not all change at the same time, their reversions can average out over long time spans. Again, adding more loci to a study can increase the reliability of the resulting phylogeny.

5) What was the significance of the outgroup chosen, and what determined which animals were used in the study? It was pointed out that only species whose entire mtDNA sequence was known could be used, so the choice of animals was limited. Some outgroups besides Xenopus were used initially, but they were eliminated from further analyses because their amino acid composition differed significantly from the mean composition of the data set. These outgroups, especially the chicken, either supported the unnamed grouping or gave non-significant results. The inclusion of a large number of primates in the study was questioned.

6) Why are there so many methods available for building molecular trees, and is there any way to standardize them? Someone stated that the three main methods for building molecular trees are maximum parsimony, maximum likelihood and neighbour joining, while morphological trees are usually built by maximum parsimony. Molecular data are often weighted because, for example, transitions occur much more frequently than transversions. It is difficult to standardize the construction of molecular phylogenies because studies vary so much. It was suggested that many methods should be used and an average (consensus) or best-fit tree be constructed. Concern was raised that the Janke et al. (1997) paper did not explain why its tree-building methods were chosen.

7) Why was only the artiodactyl/cetacean separation date used to calibrate the molecular clock? An average of several separation dates was proposed as an alternative. However, the artiodactyl/cetacean date is the best understood, so averaging it with less certain dates may decrease the precision of the molecular clock.

8) Is a gene tree really different from a species tree? Someone mentioned that morphological data could also be affected by the gene tree phenomenon, provided the trait was polymorphic before the lineages split. A longer separation between taxa may reduce the chance of paraphyly and incomplete lineage sorting in a gene tree.

9) Why was long-branch attraction ignored in these studies? It was said that long-branch attraction may affect this study because of the long separation times and the large number of changes between taxa, but it is not discussed in the papers (Janke et al. 1997, Penny and Hasegawa 1997). The wallaroo mtDNA was sequenced in an attempt to break up the long marsupial branch, but more taxa could be added to break up other long branches, especially the outgroup lineages.

10) The large number of morphological characters that would have to be reversals or convergences for the Marsupionta theory to be true was briefly discussed.
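Items 4 and 6 raise the idea of weighting some changes more heavily than others. One minimal way to do this in parsimony is Sankoff's algorithm with a step-cost matrix; the sketch below assumes a transition costs 1 and a transversion costs a hypothetical weight (`tv`, 2 by default), reflecting that transitions occur more often. The tree, taxon labels and sites are made-up examples.

```python
import math
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
BASES = "ACGT"
PURINES = frozenset("AG")


def step_cost(x: str, y: str, tv: float) -> float:
    """Cost of changing base x to base y: 0 if equal, 1 for a transition
    (purine<->purine or pyrimidine<->pyrimidine), `tv` for a transversion."""
    if x == y:
        return 0.0
    is_transition = (x in PURINES) == (y in PURINES)
    return 1.0 if is_transition else tv


def sankoff(tree: Tree, states: Dict[str, str], tv: float) -> Dict[str, float]:
    """Minimum weighted cost of the subtree for each possible root state."""
    if isinstance(tree, str):
        return {b: 0.0 if b == states[tree] else math.inf for b in BASES}
    left, right = (sankoff(child, states, tv) for child in tree)
    return {
        b: min(left[c] + step_cost(b, c, tv) for c in BASES)
        + min(right[c] + step_cost(b, c, tv) for c in BASES)
        for b in BASES
    }


def site_cost(tree: Tree, states: Dict[str, str], tv: float = 2.0) -> float:
    """Weighted parsimony length of one site on a rooted binary tree."""
    return min(sankoff(tree, states, tv).values())


if __name__ == "__main__":
    theria = ("Out", ("Mono", ("Mars", "Euth")))  # hypothetical labels
    transition_site = {"Out": "A", "Mono": "A", "Mars": "G", "Euth": "G"}
    transversion_site = {"Out": "A", "Mono": "A", "Mars": "C", "Euth": "C"}
    for tv in (1.0, 2.0, 5.0):
        print(tv, site_cost(theria, transition_site, tv),
              site_cost(theria, transversion_site, tv))
```

Raising `tv` penalizes trees that require extra transversions more heavily than trees that require extra transitions, which is one way a different weighting scheme can change which tree is preferred.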
