BIOL 606 Home

The Role of Organellar and Nuclear DNA in Inferring Phylogenetic Relationships

Lecture by Renee Polziehn

Rapporteur: Jan Jekielek

Historically, plant taxonomy was developed using primarily morphological characters. However, species complexes exist where there is much variation in morphology, making taxonomic relationships unclear. For instance, Heuchera species exhibit variation in leaf morphology and floral structures, which can be attributed to geographic differences (Soltis & Kuzoff, 1995). When faced with such problems, taxonomists have begun to use other characters to determine taxon relationships.

Within the last 20 years, several methods that use genetic variation to study taxonomy have been developed. These include, in chronological order, allozymes, RFLPs, and the sequencing of mitochondrial (mt), chloroplast (cp), and nuclear (n) genomes. Recently, most emphasis has been placed on mt and cp sequencing. Chloroplasts appear to have been secondarily lost several times throughout evolutionary history, while mitochondria have seldom disappeared from eukaryotic cells.

There are significant differences between the mt and cp genomes. In plants, the mt genome can be an order of magnitude larger than the cp genome (200-2500 kb vs 120-160 kb). The animal mt genome is only 14-30 kb long. Inheritance of cpDNA is paternal in conifers and largely maternal in angiosperms, whereas mt inheritance is usually maternal in both the plant and animal kingdoms. cpDNA carries more transfer RNA (tRNA), ribosomal RNA (rRNA), and protein-coding genes than does mtDNA. There is significant transfer of genes between organellar genomes and nDNA in plants; in animals, no transfer from n to mtDNA is known. Gene order rarely differs among taxa in plant cp genomes and animal mt genomes, but it often differs among taxa in plant mt genomes. Repeats are present in both genomes, but inverted and dispersed repeats are found only in cpDNA. There are variable numbers of introns and spacers in cp genomes and plant mtDNA; there are no introns and few spacers in mammalian mtDNA. Nuclear genomes range between 8 x 10^8 bp (birds) and 1 x 10^11 bp (angiosperms). The tandemly repeated rRNA cistron is the most commonly assayed nuclear gene. Most of the variation in rRNA is found in internal transcribed spacer (ITS) regions.

The cp genome evolves at a very slow rate: there is less than 2% divergence between congenerics. Most variation is found in the inverted repeat regions. In plants, mtDNA likewise evolves at a slow rate, but in animals up to a 100-fold increase in mutation rate can be observed. Most of this variation is found in the repetitive D-loop region. The GC content of mt genomes is higher in endotherms than in ectotherms.
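
The divergence figures quoted above are proportions of differing sites between aligned sequences. As a minimal sketch of how such a figure could be computed, the Python function below counts mismatches between two aligned sequences and skips gaps and ambiguous bases. The function name p_distance and the two 50-base sequences are invented for illustration and are not data from the lecture; no correction for multiple substitutions at a site is applied.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases so they are not counted as differences.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical 50-site fragments from two congeneric species (one mismatch).
species_1 = "ATGGCTTCAGGTACCATTGACGGTACCTTAGCAAGTCCGATAGCTTAGGA"
species_2 = "ATGGCTTCAGGTACCATTGACGGTACCTTAGCAAGTCCGATAGCTTAGGC"
print(f"{p_distance(species_1, species_2):.1%}")  # prints 2.0%
```

With real data, the two strings would come from a multiple sequence alignment, and a model-based distance would usually be preferred once divergence is high enough for repeated substitutions at the same site to matter.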

The three types of DNA discussed above are useful at different scales of phylogenetic analysis. At the population level, mtDNA is potentially useful. At the species, genus, family, and order taxonomic levels, all three types of DNA can be used to find characters bearing phylogenetic information. Care must be taken to select genes with levels of variability appropriate to the scale of divergence being examined. Potential hybridization between taxa can be identified by observing incongruence between phylogenies determined from different types of DNA. Identification of sequence divergence among populations and subspecies can aid in the selection of genetically important groups for conservation purposes, as well as provide biogeographic information.
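
One simple way to look for such incongruence is to list the clades of one tree that cannot coexist with some clade of the other. The Python sketch below assumes rooted trees written as nested tuples of taxon names. The function names and the two four-taxon trees are hypothetical. A conflict found this way only flags a candidate for hybridization, since lineage sorting or sampling error can produce the same pattern.

```python
def clades(tree):
    """Collect the taxon set under every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is a single taxon name
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def conflicting_clades(tree_a, tree_b):
    """Clades of tree_a that overlap a clade of tree_b without either nesting."""
    conflicts = set()
    for a in clades(tree_a):
        for b in clades(tree_b):
            if a & b and not (a <= b or b <= a):
                conflicts.add(a)
    return conflicts


# Hypothetical nuclear and chloroplast trees for the same four species.
nuclear_tree = ((("sp1", "sp2"), "sp3"), "sp4")
cp_tree = ((("sp1", "sp3"), "sp2"), "sp4")

for clade in conflicting_clades(nuclear_tree, cp_tree):
    print(sorted(clade))  # prints ['sp1', 'sp2']
```

Here the nuclear tree groups sp1 with sp2 while the chloroplast tree groups sp1 with sp3, so sp1 would be the taxon to examine for possible capture of a foreign cpDNA type.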

When looking at molecular diversity of northern populations of Silene acaulis and Saxifraga oppositifolia, Abbott et al. (1995) found little cpDNA and allozyme variation among populations of the former, and considerable cpDNA variation in the latter. They concluded that Silene had recently invaded the northern regions, while Saxifraga was likely to have had an ancient dispersal followed by divergence of each population. When examining mt and n gene variation in the genus Salmo, Bernatchez & Osinov (1995) found that mtDNA was informative in grouping northern and southern sea basin populations into two ancient lineages. The nDNA did not show such a clear result. In the focal paper, n and cp phylogenies were developed for the Heuchera group (Soltis & Kuzoff, 1995). There was considerable discordance between phylogenies determined using the different types of DNA, and many polytomies occurred among lower branches in the trees. Genes with higher rates of evolution would be required to resolve the polytomies. The discordance between the two data sets was deemed likely to be the result of hybridization between taxa, that is, the fixation of cpDNA types from one taxon in another.

There are several advantages, and some drawbacks, to using organellar DNA to infer taxonomic relationships. The mt and cp genomes are usually inherited uniparentally and exist only in one copy in individuals. Since there are so few copies, gene divergence is more likely to occur before species divergence. Since there are low levels of divergence in cpDNA, few characters are likely to be homoplasious. Conversely, high mutation rates in mt genomes increase the likelihood of convergence on specific characters. Furthermore, mutation rates in particular genomes can vary between taxa, confounding phylogenetic inferences. Introgressive hybridization, reflecting a reticulate pattern in the true taxon phylogeny, may also confound phylogenetic analyses.

Nuclear DNA can be used to provide an alternate inference of phylogeny. When used in combination with organellar DNA, it may be possible to identify instances of hybridization. However, variation in nDNA is limited, with base substitutions sometimes occurring serially at the same location, making homoplasy a strong possibility. In simulations, mtDNA trees reflect species trees with higher fidelity than do nDNA trees. The recombination of nDNA is the major confounding factor.

Molecular data can be used to infer taxonomic relationships independently of more traditional (i.e. morphological) methods. It is necessary to choose genes, either organellar or nuclear, which have variation at a scale that is useful for the taxonomic level that is being studied. Organellar and nuclear evidence supporting the same species tree provide a high level of confidence in that phylogenetic inference.  

Works Cited:

Abbott, R.J., H.M. Chapman, R.M.M. Crawford & D.G. Forbes. Molecular Ecology 4(2), 199-207 (1995)

Bernatchez, L. & A. Osinov. Molecular Ecology 4(3), 285-297 (1995)

Soltis, D.E. & R.K. Kuzoff. Evolution 49, 727-742 (1995)

Discussion

Discussants: Ranessa Cooper, Greg Dueck  

1. When conducting a parsimony analysis, it is useful to look for skewness in the frequency distribution of the lengths of the phylogenetic trees generated (see figure). Why is this the case? Ideally, we want a single most parsimonious tree, on the far left of the frequency distribution. When the distribution is skewed to the left, its bulk is made up of trees many steps longer than the most parsimonious one, which sits alone at the end of the left tail. This skew is often used as a measure of the accuracy of the most parsimonious tree.

2. Presence or absence of a sequence gap (insertion or deletion) is treated as an unknown character state in the focal paper. How does this differ from other methods? These include treating presence or absence as one of two possible character states, or simply leaving the gaps out of the analysis altogether. The latter method is the most common. The problem with treating gaps as single characters is that they may be the result of several independent mutations. It is also common practice to leave out regions immediately adjacent to gaps from the analysis, since their variation can have multiple origins.

3. Changing the sequence alignment among taxa being compared often has a significant effect on the length and composition of the most parsimonious tree associated with them. What methods are used to account for this fact? Most phylogenetic studies to date do not include alignment differences in their analyses. However, software packages have been developed that can treat alignment as a character state.

4. There are several potential sources of incongruence between organellar phylogenies and their organismal counterparts. These include taxon density, sampling error, convergence, long branch attraction, heterogeneity of evolutionary rates, phylogenetic sorting, and hybridization. Why does the focal paper focus on hybridization as the most likely source of incongruence? There are many documented cases of chloroplast capture (one taxon inheriting the cpDNA from another through hybridization). The Heuchera group has the potential for such hybridization, even at the intergeneric level. Hybridization is a reasonable hypothesis for incongruence when there exists supporting ecological and/or biogeographic information. Capture of mtDNA may have occurred several times in the genus Salmo: hybridization is not restricted to plants. However, it is important not to discount the possibility that other factors, such as differential evolutionary rates of organellar DNA in different taxa, are important.

5. Herbaria are an excellent potential source of genetic sequences for a variety of taxa. How likely is it that old specimens can be used to obtain DNA for phylogenetic analysis? The major limiting factor in herbarium collections is not age, but rather the method of preservation of the sample. A well-preserved 100-year-old specimen may be a better source of DNA than a poorly treated recent sample. There is likely to be considerable variation in preservation techniques among collectors.

6. Researchers who rely on taxonomic techniques other than molecular ones often see discordance among trees from different genetic data sets as proof that the methods are unsound. Is this a reasonable position to hold? Molecular data should hold relevant taxonomic information; the trick is to identify the proper way of analysing it. Discordance may be evidence of events such as historic hybridization and introgression, or may reveal evolutionary differences among different types of DNA. Both of these factors can be important to the understanding of the phylogenies of taxa. However, discordance between data sets can be severely confounding to the determination of true organismal phylogeny.

7. It appears that the phylogenetic trees in the focal paper are not very well supported. Many branches have very low decay values (the number of extra steps, beyond the length of the most parsimonious tree, that must be allowed before a particular clade breaks down). The trees depicted are therefore probably not very good representations of actual organismal phylogeny.

8. When determining strict consensus trees from n data in the focal paper, populations were removed so as to include only one population per species. Why was this done? The cp data set had one population per species, so this was an attempt to make the data sets similar. Also, when there are few population samples from some taxa and many from others, there exists a danger that the resulting tree will be overweighted by some taxa. This method allows for comparison of decay values of different branches in the consensus tree without bias.

9. Molecular phylogenetic data have fueled controversies about long established taxonomic designations. Taxonomies based on morphology are sometimes discordant with molecular data sets, leading to confusion about species concepts. The biological species concept may not have been meant to be an absolute designation, but rather a rough guideline to aid in classification. Nonetheless, it is used extensively as an absolute, particularly in endangered species legislation. It is useful to state which sort of species concept (e.g. one based on phylogeny, or on behaviour) is being used when describing taxonomic relationships.
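
The skewness test raised in point 1 can be illustrated with a short calculation. The Python sketch below computes the skewness statistic (usually written g1) for a list of tree lengths; a negative value means the distribution has a long left tail, with a few short trees and many long ones. The function name and the tree lengths are invented for illustration and do not come from the focal paper.

```python
def g1_skewness(tree_lengths):
    """Skewness (g1) of a collection of tree lengths, measured in steps."""
    n = len(tree_lengths)
    mean = sum(tree_lengths) / n
    # Second and third central moments of the length distribution.
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n
    if m2 == 0:
        raise ValueError("all trees have the same length")
    return m3 / m2 ** 1.5


# Hypothetical lengths: one shortest tree at 100 steps, most trees near 110-112.
lengths = [100, 104, 107] + [109] * 4 + [110] * 8 + [111] * 10 + [112] * 6
print(g1_skewness(lengths) < 0)  # prints True: the distribution is left-skewed
```

In practice the lengths would come from an exhaustive search or a large random sample of trees for the data set, and the observed g1 would be compared against values expected for random data.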
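
The three treatments of sequence gaps described in point 2 can be compared on a toy alignment. The Python sketch below is only an illustration under stated assumptions: the alignment is a dictionary of equal-length strings with "-" marking gaps, all function names are hypothetical, and the binary coding shown (one character per distinct gap, defined by its start and end position) is one of several possible coding schemes, not necessarily the one used in any of the cited papers.

```python
def gaps_as_missing(alignment):
    """Recode every gap as '?', i.e. as an unknown character state."""
    return {taxon: seq.replace("-", "?") for taxon, seq in alignment.items()}


def drop_gap_columns(alignment):
    """Exclude every alignment column in which any taxon has a gap."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    keep = [i for i in range(n_sites)
            if all(alignment[t][i] != "-" for t in taxa)]
    return {t: "".join(alignment[t][i] for i in keep) for t in taxa}


def gap_intervals(seq):
    """Return the (start, end) span of each run of gaps in one sequence."""
    spans, start = [], None
    for i, base in enumerate(seq + "X"):  # sentinel closes a trailing gap
        if base == "-" and start is None:
            start = i
        elif base != "-" and start is not None:
            spans.append((start, i))
            start = None
    return spans


def gaps_as_binary(alignment):
    """Score each distinct gap as one presence (1) / absence (0) character."""
    per_taxon = {t: set(gap_intervals(s)) for t, s in alignment.items()}
    all_gaps = sorted(set().union(*per_taxon.values()))
    return {t: [int(g in per_taxon[t]) for g in all_gaps] for t in alignment}


# Hypothetical alignment: taxa B and C share a two-base deletion.
alignment = {
    "taxon_A": "ACGTTGCA",
    "taxon_B": "AC--TGCA",
    "taxon_C": "AC--TGTA",
}
print(gaps_as_missing(alignment)["taxon_B"])   # prints AC??TGCA
print(drop_gap_columns(alignment)["taxon_C"])  # prints ACTGTA
print(gaps_as_binary(alignment))
```

Only the binary coding lets the shared deletion group taxon_B with taxon_C, which is also why it is risky when a gap may have arisen through several independent mutations.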
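
The decay values mentioned in point 7 can be worked out from a set of candidate trees once each tree's length and clades are known. The Python sketch below assumes that this set has already been produced by a parsimony search and includes trees longer than the shortest one. The data structure (a list of length and clade-set pairs), the function name, and the example values are all hypothetical.

```python
def decay_index(trees, clade):
    """Extra steps needed before `clade` is absent from some tree.

    `trees` is a list of (length, clades) pairs, where `clades` is a set of
    frozensets of taxon names present on that tree.
    """
    clade = frozenset(clade)
    shortest = min(length for length, _ in trees)
    without = [length for length, clades in trees if clade not in clades]
    if not without:
        raise ValueError("every supplied tree contains the clade")
    # Shortest tree lacking the clade, minus the most parsimonious length.
    return min(without) - shortest


# Hypothetical search results for four taxa A-D.
trees = [
    (120, {frozenset("AB"), frozenset("ABC")}),  # most parsimonious tree
    (121, {frozenset("AB"), frozenset("CD")}),   # one step longer
    (123, {frozenset("BC"), frozenset("ABC")}),  # three steps longer
]
print(decay_index(trees, "AB"))   # prints 3
print(decay_index(trees, "ABC"))  # prints 1
```

In this example the clade A+B survives until trees three steps longer than the shortest are accepted, while A+B+C collapses after a single extra step and would count as weakly supported.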

