Opinion Page is a forum for views and ideas of potential interest to
DNA Barcoding: Deus ex Machina
DNA taxonomy (Tautz et al. 2003) and DNA barcoding (Hebert et al. 2003a) have captured considerable attention during the first half of this year, with feature articles appearing in publications ranging from The Economist (4 Jan. 2003) and Der Spiegel (21 May 2003) to Nature (Blaxter 2003) and Science (Pennisi 2003). In a nutshell, the proponents of DNA taxonomy advocate the use of DNA sequences as the central "scaffold of a taxonomic reference system" (Tautz et al. 2003). Hebert et al. (2003a) go further in proposing reliance on a DNA "barcode", which is the sequence for a 658 base pair fragment of the cytochrome oxidase subunit I (COI) gene of mitochondrial (mt) DNA, as a substitute for species diagnoses by traditional methods. Such barcodes are described as "the sole prospect for a sustainable identification capability" that will allow biologists to cope with the "harsh burden" of the diversity of life.
By themselves, such strong claims would be sure to capture public attention. Hebert has also shown astute media management skills in obtaining coverage from The Economist and Nature as simultaneous publicity for the appearance of his barcode article. Moreover, it is clear that numerous scientists are taking these claims seriously, whether they agree with their merits or not. In casual discussions with colleagues, any mention of DNA taxonomy soon raises the question: Is traditional morphology-based taxonomy on its way out? At least one geneticist has bet me $100 that classical taxonomy will wither away. He predicts that, within 15 years, all routine pest and border identifications will be done using DNA, and morphology-based investigations will be relegated to structure/function studies and accessory documentation of new species. Others among my colleagues worry that a rapid increase in funding for DNA taxonomy will inevitably be at the expense of traditional taxonomy. They fear that once classical taxonomists have been used to provide names for a specimen or two of the currently available species, further funding for their line of work will dry up. However, I think we have confused biological, practical, and sociological issues in a haze of hype and apprehension.
At the biological level, we need to ask whether DNA barcodes really work to identify species. The answer is a clear yes – in all except the kinds of identifications that matter most, namely those at the level of closely related sister species that cannot readily be distinguished by traditional morphological characters. As Hebert et al. (2003b) conclude in a follow-up study, in which they survey GenBank and compile COI divergences among congeneric species across 11 animal phyla, even the least informative DNA barcodes (in the Cnidaria) allow identification to the genus level and above. However, neither the local faunal sample of moths used in their first paper (2003a) nor the more general GenBank survey (2003b) constitutes a rigorous test of the effectiveness of COI DNA barcodes for species identifications. That is because the most closely related species tend not to have overlapping ranges, and hence surveys from a geographically limited area will only rarely include the most recently diverged pairs. Also, sequences represented in GenBank are biased toward the more distinct species within a genus. Based on a series of projects in my lab over the last decade, in which we have used COI sequences to document divergences between closely related species in five insect orders, I would estimate that up to a quarter of species will prove resistant to easy characterisation using DNA barcodes.
There are several reasons for an artificially dispersed distribution of congeneric sequences in GenBank. First, many sequences have not been deposited in GenBank if they are very similar to other haplotypes. Instead, this kind of minor variation is usually documented in the form of a condensed table in the paper publication (e.g. Sperling et al. 1995). Such cases will be missed by later data mining of GenBank. Second, there has been relatively little documentation of geographic variation in mtDNA sequence within species. In fact, those cases that have been studied show that species frequently contain polymorphic haplotypes with deep divergences that predate species divergences (e.g. Sperling and Hickey 1994; Nice et al. 2002; Wahlberg et al. 2003). Thus many species cannot be characterised by either monophyletic mtDNA clades or distinct phenetic clusters based on percent sequence divergences, and the effectiveness of DNA barcodes for species identification is not properly tested by sequencing two or three specimens from the same location, as in Hebert et al. (2003a). Third, although studies where variations in mtDNA sequence confirm prior species designations have been easy to publish, it has become increasingly difficult to publish studies that don’t confirm such preconceptions. In such cases, reviewers increasingly expect that other (presumably nuclear) gene sequences should be compared to mtDNA, and in the process they betray their assumption that morphological characters are by themselves not worthy of comparison with DNA. The problem is that, after mtDNA has been characterised, it is much harder to find nuclear genes that provide informative sequences at the level of closely related species. The genes commonly used in phylogenetic work, wingless and elongation factor 1α, and even non-coding internal transcribed spacer (ITS) sequences, are simply too slow-evolving to be very informative. Also, random amplified polymorphic DNAs are unreliable, and allozymes require completely different equipment and skills. So we have a substantial backlog of unpublished studies in which polymorphic mitochondrial DNA is not providing the simple picture portrayed by Hebert et al.’s (2003b) GenBank survey, where "the clear delineation of most congeneric species pairs indicates a surprising ferocity of lineage pruning".
Of course, an alternative solution might be to define species primarily on the basis of DNA barcodes, perhaps using the 3% divergence rule advocated by Hebert et al. (2003a). However, that would only conceal incongruent character distributions without solving the underlying biological problems. Close sister species are usually the most important ones to identify correctly, whether they are pests, disease vectors, or ecological indicators. There can be little doubt that DNA sequences, in conjunction with morphology, are a rich source of characters for identification and classification of species. However, it will take sampling across the full range of each species to establish the credibility of DNA barcodes, one species at a time, and assumptions about ferocious lineage pruning are no substitute for such legwork. At least, however, Hebert et al.’s claims have provided new incentive to publish complex results, and I predict a surge in studies that show that many mtDNA-based delineations of species are not as simple as hoped.
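For readers who want to see what a fixed divergence threshold amounts to in practice, the following is a minimal sketch in Python. It is an illustration only, not the procedure of Hebert et al.: the function names, the 100-base toy sequences, and the use of an uncorrected proportion of differing sites are all assumptions made here, and published analyses may well use a corrected distance model instead. The toy data are invented to mimic the two problem cases discussed above, a close sister species and a deeply divergent haplotype within a single species.

```python
# Hypothetical helper names and toy data; not taken from any published pipeline.

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in TRANSITION or b not in TRANSITION:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def same_species_by_threshold(seq_a: str, seq_b: str, threshold: float = 0.03) -> bool:
    """Apply a fixed divergence cutoff (3% by default) to a pair of sequences."""
    return p_distance(seq_a, seq_b) < threshold


def with_substitutions(seq: str, positions) -> str:
    """Return a copy of seq with a transition substituted at each given position."""
    bases = list(seq)
    for i in positions:
        bases[i] = TRANSITION[bases[i]]
    return "".join(bases)


# Invented 100-base reference "barcode".
reference = "ACGT" * 25

# A distinct sister species that differs at only 1 site in 100.
sister_species = with_substitutions(reference, [10])

# A haplotype from the same species that differs at 5 sites in 100.
divergent_conspecific = with_substitutions(reference, [0, 20, 40, 60, 80])

print(p_distance(reference, sister_species),
      same_species_by_threshold(reference, sister_species))
print(p_distance(reference, divergent_conspecific),
      same_species_by_threshold(reference, divergent_conspecific))
```

In this toy case the threshold lumps the sister species with the reference and splits the divergent conspecific haplotype from it, which is the kind of incongruence that a fixed cutoff conceals rather than resolves.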
In addition to biological issues that challenge the value of DNA barcodes, insufficient attention has been paid to several practical problems raised by reliance on DNA-based identifications. I have participated in the progress of various DNA sequencing methods over the last two decades, and although the advances are inspiring, we are a long, long way from having a tool that will work with the rapidity of the "tricorder" depicted in Star Trek shows. Regular species identifications using DNA are practical now for some economically important taxa, just as they are in the identification of criminals. A portable device, perhaps using DNA chips, that will give sequence in an hour or so and only require a few minutes of direct interaction by the user, might be widely available within this decade. It would be realistic to use this device in circumstances where taxonomists now do dissections of genitalia. But such a device would have to be orders of magnitude faster and more flexible than current technology for it to compete effectively with a trained entomologist who knows insects by eye and can identify them in seconds, as is currently expected for extension entomologists for most insect pests in a given geographic area. I’m sure that we won’t have a practical device that will rival the efficiency of such people for many decades.
In fact, voucher specimens and sight-based identifications will remain necessary far into the future. GenBank is already rife with errors (Harris 2003), with more than half of all published human mtDNA studies containing sequencing mistakes. Problems with switched or contaminated samples are likely to be even worse for insects. Current hand-held calculators provide an insightful comparison, in that they allow calculations that are much faster than by other means, but their results need to be constantly checked with mental math to avoid embarrassment due to errors in data entry. I am firmly convinced that quick (if rough) identification by eye will always remain crucial to weeding out the most egregious mistakes due to misuse of technology like DNA barcodes. It would be a serious loss if unrealistic expectations about the accuracy and efficiency of DNA barcodes were to diminish training in traditional insect classification.
Finally, I think it is important to consider DNA barcoding at a sociological level, in order to understand why this issue has so effectively captured the public imagination. If biological and practical criteria were the sole grounds on which the value of DNA-based identification was being judged, I would expect that it would develop gradually but unremarkably as a valued component of normal taxonomic identifications and delineations. Other technologies, such as scanning electron microscopy, have allowed access to rich new series of characters across a great range of taxa, and have been absorbed without controversy into the repertoire of working taxonomists. The basic idea of rapid, automated identification based on COI sequences has been around for many years. The opportunities for using "universal" polymerase chain reaction primers to amplify mtDNA from a vast array of taxa were already obvious in more than one lab in the late 1980s, including the Wilson lab at Berkeley (Kocher et al. 1989), and the Harrison lab at Cornell, where fellow graduate students and I embarked on sequencing COI across a large variety of insects. Later, when I was a postdoc in the Hickey lab at the University of Ottawa in the early 1990s, we speculated on how long it would be before our automated sequencer could be efficiently attached to a miniaturized DNA extractor at one end, and a voice synthesizer at the other end, to give a functional tricorder. It seemed only a few years away, and now more than a decade later we are incrementally closer, with no obvious breakthroughs except that people are talking about it more. Even the idea of sequencing a limited, standard set of genes (including COI) across half of the known biodiversity of the planet (the insects) is not new. My postdocs and I published a review paper structured around this idea three years ago (Caterino et al. 2000), which Hebert et al. (2003a) neglected to cite.
But none of us took the final step that was required to make DNA-based identification using a standard gene region into the hot issue of the year – it needed energetic and adept marketing.
Hebert has shown real insight into what DNA taxonomy represents to popular culture. By coining the term DNA barcodes, he has given DNA-based identification an immediacy, practicality, and comprehensibility that anyone can relate to. By encouraging comparisons with Star Trek tricorders he has unleashed memories of optimism about the beneficial power of technology, deus ex machina (god from a machine), beneath the accumulated cynicism of the last decades. And by explicitly invoking the "harsh burden" of the diversity of life, rather than Darwin’s more uplifting sense of "grandeur in this view of life", Hebert has bluntly reminded us of our daily struggle to come to grips with relentlessly expanding amounts of information and complexity. Like Martha Stewart, J.K. Rowling, and Oprah Winfrey, Hebert has identified and capitalized on a latent yearning for something that is missing from our daily lives: DNA barcodes hold out the promise of a simplifying elegance that is both broad and deep, and tames the confusion of life.
Too bad it won’t be able to deliver the goods.
Biological Survey of Canada (Terrestrial Arthropods)