Yet, large-scale phylogeny estimation turns out to be much more difficult than expected. First, all the best methods are computationally intensive, and standard techniques do not scale well to large datasets; massive parallelism helps but does not really address the basic challenge inherent in searching an exponential search space. Another issue is that the statistical models of sequence evolution that properly address genomic data are substantially more complex than the ones that model individual loci, and methods to estimate genome-scale phylogenies are (relatively speaking) in their infancy compared to methods for single gene phylogenies. Finally, there is a substantial gap between performance as suggested by mathematical theory (which is used to establish guarantees about methods under statistical models of evolution) and how well the methods actually perform on data – even on data generated under the same statistical models! Indeed, this gap is one of the most interesting things about doing research in computational phylogenetics, because it means that the most impactful research in the area must draw on mathematical theory (especially probability theory and graph theory) as well as on observations from data.

Computer scientists have brought innovative algorithm design techniques into computational phylogenetics that are dramatically improving the accuracy and scalability of phylogeny estimation. Many of these new methods are now being used by evolutionary biologists to compute multiple sequence alignments, construct species trees and phylogenetic networks from genome-scale datasets, and make biological discoveries. It is clear that computer science techniques can, and will, enable breakthroughs in biological discovery for the genome-scale datasets that are being assembled around the world.

*Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation* is designed to train the next generation of algorithm developers so that they can develop these new methods and enable these breakthroughs. The book is self-contained, and no biology background is needed. Although the focus is on communicating mathematical foundations and innovative algorithm design, much of the material is accessible to biologists and others who are interested in critically evaluating the scientific literature about phylogeny estimation methods in this post-genome era.

Find out more about *Computational Phylogenetics* and Tandy Warnow