The Evolution of Microsatellites

Lecture by Chris Kyle, for Biology 606

Report by Colin Reynolds

Microsatellites are a class of molecular markers used extensively in population biology and paternity studies. Studying their evolution provides an understanding of mutation rates and the associated mutational models. Using these markers properly helps avoid incorrect conclusions, particularly about the relatedness of populations and the variation among species. Microsatellites (STRs, VNTRs) are tandemly repeated DNA motifs 1-6 bp long, usually present as 6-30 repeats. They occur in the genomes of eukaryotes, prokaryotes and some organelles. However, different organisms have characteristic repeat types, and these differences may be tied to the evolution of the repetitive sequences themselves.

In mammals the most common dinucleotide repeat is GT, while in plants it is CT. The abundance of GT repeats in mammals may be due to methylation of C residues, which in mammals occurs at C residues followed by G (CpG sites). A methylated C is relatively unstable and can undergo deamination, which converts it to T (a C-to-T transition). Within a GC repeat array this would change the GC dinucleotide into GT, resulting in more GT repeat arrays.
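The deamination argument can be illustrated with a small sketch. The helper `deaminate_cpg` is hypothetical. It assumes that every CpG cytosine is methylated and deaminated and that the resulting mismatch is never repaired. It models only the strand it is given. Applied to a (GC)n array, it turns almost every GC unit into GT.

```python
def deaminate_cpg(seq: str) -> str:
    """Return a copy of seq with the C of every CpG step changed to T.

    Hypothetical helper: assumes every CpG cytosine is methylated and
    deaminated and that the mismatch is never repaired. Only the strand
    passed in is modelled.
    """
    seq = seq.upper()
    out = list(seq)
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            out[i] = "T"  # 5-methylcytosine -> thymine (C-to-T transition)
    return "".join(out)


gc_array = "GC" * 8               # a (GC)8 repeat array
print(deaminate_cpg(gc_array))    # GTGTGTGTGTGTGTGC
```

The last C is unchanged only because no G follows it in this toy sequence. In a real genome, whether it changes depends on the next base.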

Microsatellites are generally thought to be randomly distributed, neutral markers, but there is some evidence that they cluster. Tetranucleotide repeats cluster in the centromeric regions of human and tomato chromosomes and on the sex chromosomes of snakes, which could indicate a specific function for some microsatellites.

Microsatellites evolve quickly and therefore show high levels of variation, which makes them well suited to fine-scale evolutionary questions. They are also codominant Mendelian markers and give reproducible results. They are neutral, with the possible exceptions of the tetranucleotide clusters mentioned above and trinucleotide repeats located within genes.

Microsatellites are found in two ways: 1) by searching DNA repositories (e.g. GenBank) for species that have a sequencing project, and 2) by cloning, which involves screening a DNA library for microsatellite repeat motifs. Clones containing a motif are then sequenced. Primers for the polymerase chain reaction (PCR) are designed in the unique sequence flanking the microsatellite repeat. Finally, the PCR-amplified fragments containing the microsatellite are run on a polyacrylamide sequencing gel. The gel separates DNA by size, so different alleles can be distinguished.

The rapid rate of evolution (mutation) produces a large number of alleles. Mutation rates have been measured directly through pedigree analysis, but few studies have followed a mutation through several generations. Linkage data have been used to measure mutation rates indirectly in cell lines. However, somatic mutations may occur at a different frequency than the evolutionarily important germline mutations. Inbred cell lines could also be expressing recessive mutations that affect DNA repair, which could inflate the detectable mutation rate. Current estimates range from 10^-2 to 10^-5 mutations per locus per generation. This range of three orders of magnitude may partly reflect different mutation rates for different repeat motifs (dinucleotides are the fastest, followed by tri- and tetranucleotides). It may also reflect different numbers of repeats (a minimum of 8 repeats is needed for polymorphism within a species).
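Simple arithmetic shows why direct pedigree estimates are hard. Each allele passed from parent to offspring counts as one transmission. The direct estimate is then the number of observed mutations divided by the number of transmissions, and rare mutations need very large pedigrees. The function names and the tallies below are hypothetical, and the sketch assumes every mutation in the pedigree is detected.

```python
def mutation_rate(mutations: int, transmissions: int) -> float:
    """Direct (pedigree) estimate: observed mutations per allele transmission."""
    if transmissions <= 0:
        raise ValueError("transmissions must be positive")
    return mutations / transmissions


def transmissions_for(rate: float, expected_mutations: float = 1.0) -> float:
    """Parent-to-offspring transmissions needed to expect this many mutations."""
    return expected_mutations / rate


# Hypothetical pedigree tallies, for illustration only.
print(mutation_rate(mutations=3, transmissions=3000))   # 0.001

# Pedigree effort needed across the reported range of rates.
for rate in (1e-2, 1e-3, 1e-4, 1e-5):
    print(f"{rate:.0e}: about {transmissions_for(rate):,.0f} transmissions per mutation")
```

At the slow end of the range, roughly 100,000 transmissions are needed to expect a single mutation. This helps explain why few studies follow mutations through several generations.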

The favoured mechanism for microsatellite mutation is replication slippage, in which the repeat number changes by one repeat and only rarely by more (Levinson & Gutman, 1987). The array increases in length if a repeat on the newly synthesized strand loops out, and decreases if a repeat on the template strand loops out. This mechanism is supported by:

1) in vitro studies with bacterial cell lines;
2) population studies;
3) direct mutational analysis;
4) a 100- to 700-fold increase in array instability when genes involved in mismatch repair were mutated;
5) no change in array stability when proofreading genes were mutated.
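The slippage mechanism can be modelled at the sequence level. A loop-out on the nascent strand adds one repeat unit to the daughter array, and a loop-out on the template strand removes one. The sketch below is a one-step model only. The slip rate and the equal chance of either direction are assumptions, the function names are hypothetical, and mismatch repair, which corrects most slipped intermediates, is ignored.

```python
import random


def slipped_copy(array: str, motif: str, looped_out: str) -> str:
    """Daughter-strand array after one slippage event of a single repeat unit."""
    if looped_out == "nascent":
        return array + motif            # extra unit on the new strand: +1 repeat
    if looped_out == "template":
        if len(array) <= len(motif):
            raise ValueError("array too short to lose a repeat")
        return array[len(motif):]       # unit skipped on the template: -1 repeat
    raise ValueError("looped_out must be 'nascent' or 'template'")


def replicate(array: str, motif: str, slip_rate: float, rng: random.Random) -> str:
    """One round of replication under a one-step slippage model (no repair)."""
    if rng.random() >= slip_rate:
        return array                    # faithful copy
    return slipped_copy(array, motif, rng.choice(("nascent", "template")))


rng = random.Random(42)
array = "GT" * 20
for _ in range(10_000):
    array = replicate(array, "GT", slip_rate=1e-3, rng=rng)
print(len(array) // 2, "repeats after 10,000 replications")
```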

Mutational models are used to derive the expected number of alleles in a population from the observed heterozygosity, and they also underlie statistical analyses of genetic variation. The two main models are the infinite allele model (IAM) and the stepwise mutation model (SMM). Under the IAM, each mutation, occurring at rate u, creates a new allele not previously present in the population. The SMM predicts that a mutation produces an allele one repeat larger or smaller. Neither model fully matches microsatellite mutation. Wright's F-statistic, based on the IAM, underestimates population variation. Slatkin's R-statistic, based on the SMM, overestimates it.
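The practical effect of model choice shows up when observed heterozygosity H is converted to the population mutation parameter theta = 4Nu. Here N is the effective population size and u is the mutation rate from the text. The sketch uses the classical equilibrium expressions: Kimura and Crow's H = theta/(1 + theta) for the IAM, and Ohta and Kimura's H = 1 - 1/sqrt(1 + 2*theta) for the one-step SMM. It assumes mutation-drift equilibrium at a single diploid locus. The function names are hypothetical.

```python
def _check(h: float) -> None:
    if not 0.0 <= h < 1.0:
        raise ValueError("heterozygosity must be in [0, 1)")


def theta_from_h_iam(h: float) -> float:
    """Infinite allele model: H = theta / (1 + theta), so theta = H / (1 - H)."""
    _check(h)
    return h / (1.0 - h)


def theta_from_h_smm(h: float) -> float:
    """Stepwise mutation model: H = 1 - 1/sqrt(1 + 2*theta),
    so theta = ((1 - H)**-2 - 1) / 2."""
    _check(h)
    return ((1.0 - h) ** -2 - 1.0) / 2.0


def effective_alleles(h: float) -> float:
    """Effective number of alleles, n_e = 1 / (1 - H) (equals 1 + theta under the IAM)."""
    _check(h)
    return 1.0 / (1.0 - h)


for h in (0.5, 0.7, 0.8, 0.9):
    print(f"H={h}: theta_IAM={theta_from_h_iam(h):.2f}  "
          f"theta_SMM={theta_from_h_smm(h):.2f}  n_e={effective_alleles(h):.1f}")
```

At H = 0.8 the IAM gives theta of about 4, while the SMM gives about 12. The same data therefore imply very different levels of variation depending on the model assumed.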

Microsatellites are inappropriate for the study of deep phylogeny. Their high mutation rates lead to a large amount of homoplasy over a relatively short period of time. Their use should be limited to recent splits.
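A short simulation shows why high mutation rates produce homoplasy. Two alleles are separated by k one-step mutations along the path joining them (k is a hypothetical parameter). Their observed size difference is the net sum of those +1/-1 steps. The sketch assumes a strict, symmetric SMM with no limit on allele size. Even under these assumptions, the observed difference grows only roughly with the square root of k. Many pairs are identical in size despite having mutated.

```python
import random


def net_size_change(k: int, rng: random.Random) -> int:
    """Net change in repeat number after k one-step (+1 or -1) mutations."""
    return sum(rng.choice((-1, 1)) for _ in range(k))


def hidden_change(k: int, replicates: int = 5000, seed: int = 1) -> tuple[float, float]:
    """Return (mean |observed size difference|, fraction identical in size)
    for two alleles separated by k mutations."""
    rng = random.Random(seed)
    diffs = [net_size_change(k, rng) for _ in range(replicates)]
    mean_abs = sum(abs(d) for d in diffs) / replicates
    identical = sum(d == 0 for d in diffs) / replicates
    return mean_abs, identical


for k in (1, 2, 10, 50, 200):
    mean_abs, identical = hidden_change(k)
    print(f"{k:>3} mutations: mean size difference {mean_abs:5.1f} repeats, "
          f"identical in size {identical:.0%}")
```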

Discussion

Based on Ellegren et al. (1997), a reciprocal study of repeat length at homologous loci in cattle and sheep.

Discussants: Greg Wilson and Grant McIntyre

1) What does this paper have to do with microsatellite evolution? It is a rebuttal to the papers by Rubinsztein et al. (1995). One paper stated that microsatellites evolve more quickly in humans than in chimpanzees. The other stated that microsatellites evolve directionally (i.e. get larger) with interspecific variation. The group felt that the evidence put forward by Ellegren et al. (1997) would have been stronger had they studied more species. Although they attributed the size differences to varying repeat lengths, they did not sequence any of their microsatellite loci. How, then, could they determine that the size differences were due to different numbers of repeats rather than to insertion or deletion events? The use of repeat length as the measure of microsatellite evolution was also questioned: why not use variability as well as length?
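The sequencing objection can be made concrete. Fragment sizes alone can rule out a pure repeat-number change, but they cannot confirm one. If the size difference is not a whole number of repeat units, a flanking insertion/deletion (or an imperfect repeat) must be involved. If it is a whole number, an indel of that length remains possible. The helper and the fragment sizes below are hypothetical, and a dinucleotide motif is assumed.

```python
from typing import Optional


def repeat_difference(size_a_bp: int, size_b_bp: int, motif_length: int) -> Optional[int]:
    """Repeat-number difference implied by two PCR fragment sizes, or None when
    the size difference is not a whole number of repeat units.

    A whole-number result is only consistent with a repeat-number change;
    an indel of the same length cannot be excluded without sequencing.
    """
    if motif_length <= 0:
        raise ValueError("motif_length must be positive")
    diff = size_b_bp - size_a_bp
    if diff % motif_length:
        return None
    return diff // motif_length


print(repeat_difference(150, 146, 2))   # -2
print(repeat_difference(150, 153, 2))   # None
```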

2) Could the use of domesticated animals affect the results of Ellegren et al. (1997)? It was felt that variation might be reduced if the animals were inbred, but that large alleles should still be present. The use of only one breed of sheep but several cattle breeds seemed inadequately explained. Were the cattle breeds inbred while the sheep breed was not? Why not examine the loci in more than one breed of sheep?

3) Does it matter if microsatellite arrays are longer in one species than another? The issue is that people compare the variation of two or more species using microsatellites isolated from a single species. Suppose microsatellites are always longer, and hence more variable, in the species from which they were isolated. Would these comparisons then be invalid? This has been done in a couple of examples (monkey flowers and bears). In both, the conclusion reached was that the species from which the microsatellites were isolated was the most variable.

4) Are primers homologous between species? Yes: a specific 20 bp primer sequence occurs by chance with a frequency of only (1/4)^20. The flanking regions of microsatellites are ancient (conserved across species). The microsatellites themselves, by contrast, may have 36 repeats in each of two species (X and Y) but are probably not identical by descent (i.e. the similarity is convergent). This has been shown using interrupted repeats.
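The primer-specificity argument can be checked with a back-of-the-envelope calculation. It assumes equal, independent base frequencies and a mammal-sized genome of about 3 x 10^9 bp scanned on both strands. These are simplifying assumptions, not figures from the lecture.

```latex
\begin{aligned}
P(\text{a given site matches a specific 20-mer}) &= \left(\tfrac{1}{4}\right)^{20} = \frac{1}{2^{40}} \approx 9.1 \times 10^{-13} \\
E[\text{chance matches in the genome}] &\approx \left(2 \times 3 \times 10^{9}\right) \times 9.1 \times 10^{-13} \approx 5.5 \times 10^{-3}
\end{aligned}
```

A primer that amplifies a product in both species is therefore far more likely to be binding the homologous flanking sequence than a chance match.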

5) Should one use the flanking regions of microsatellites for studies of evolution? The SMM predicts that there should be an infinite number of alleles. It also predicts that alleles of similar size should be more closely related than alleles with larger size differences. The problem is that microsatellite allele size appears to be constrained, so only a limited number of alleles are possible. Combined with the high mutation rate, this causes homoplasy. Constructing phylogenies may therefore not be possible using microsatellites alone. Black and grizzly bears diverged 15-20 thousand years ago, which was too long ago to be detected with microsatellites.
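The effect of a size constraint can be sketched by confining the stepwise random walk to a fixed range of repeat numbers. The bounds of 10-30 repeats and the reflecting boundary are illustrative assumptions, not an established mechanism of constraint. Once the number of mutations k is large, the mean size change stops growing. The difference in allele size then carries little information about how long ago two lineages split.

```python
import random


def constrained_walk(start: int, k: int, lo: int, hi: int, rng: random.Random) -> int:
    """Repeat number after k one-step mutations, reflected at hypothetical bounds [lo, hi]."""
    if hi <= lo or not lo <= start <= hi:
        raise ValueError("need lo < hi and lo <= start <= hi")
    size = start
    for _ in range(k):
        step = rng.choice((-1, 1))
        if not lo <= size + step <= hi:
            step = -step                # reflect off the assumed size limit
        size += step
    return size


def mean_abs_change(k: int, lo: int = 10, hi: int = 30, start: int = 20,
                    replicates: int = 1000, seed: int = 2) -> float:
    """Mean |size change| from the ancestral allele after k mutations."""
    rng = random.Random(seed)
    total = sum(abs(constrained_walk(start, k, lo, hi, rng) - start)
                for _ in range(replicates))
    return total / replicates


for k in (1, 10, 100, 1000):
    print(f"{k:>4} mutations: mean |size change| {mean_abs_change(k):.1f} repeats")
```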

The flanking regions of microsatellites change slowly and could be used to construct phylogenies. Microsatellite size differences could be used to identify haplotypes in the flanking regions. Microsatellites show very little crossing over, except in cases such as fragile X, where the repeat array is very large.

6) When cloning, screening with a (GT)20 probe under stringent hybridization conditions selects for long repeats. The use of M13 (a phage vector) selects against very long microsatellites, so clones with 800 bp repeats are not recovered.

It is hard to believe that microsatellites are limited to only 30 repeats on average. In fish, different alleles can be 50 repeats apart (i.e. size is not constrained). Kangaroo rats have 50-60 repeats, and ungulates 60+ repeats. Studies of repeat number may be biased because they use humans, mice and rats, which may be too closely related. Alternatively, the bias may come from the way microsatellites are isolated for these studies.

7) Microsatellites could be used to analyze the phylogeny of island populations, provided the analysis allows for the fixation of alleles caused by a severe population bottleneck. This approach works only over short time periods.

8) Ellegren et al. (1997) provided 13 diagrams in the focal paper. Only RM011 and RM044 were considered good loci, and they are not consistent with the authors' conclusion.

9) Evolution of functional vs. neutral traits: the environment selects homoplasies in functional traits (e.g. in desert plants), while homoplasies in neutral traits arise at random. With DNA sequences, you do not know the selection pressure.

10) What aspects of microsatellite evolution should we be interested in? The mechanism of change, and how often and why microsatellites appear and disappear.