Developing and Using RAxML for large-scale phylogeny reconstruction on the CIPRES cluster at the San Diego Supercomputer Center.

Dr. Alexandros Stamatakis

RAxML (Randomized Axelerated Maximum Likelihood) is a program for Maximum Likelihood-based inference of large phylogenetic trees. The cluster is currently being used to further develop RAxML and to conduct analyses of large-scale real-world biological problems.

The program is being developed specifically to infer trees efficiently for extremely large datasets, whether large in the number of taxa, in the sequence length, or in both. For example, the analysis of a 25,000-taxon alignment of proteobacteria with an alignment length of 1,500 base pairs took only 13.5 days on a single CPU of the cluster and consumed only 1.5 GB of memory.
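To put the memory figure in perspective, here is a back-of-the-envelope sketch. It assumes, and none of this is stated above, that memory is dominated by one conditional likelihood vector per inner node of the tree, with 4 nucleotide states, one rate category per site, and double-precision values. The function name and its parameters are hypothetical.

```python
def likelihood_vector_bytes(num_taxa, num_sites, num_states=4,
                            rate_categories=1, bytes_per_value=8):
    """Rough size of the conditional likelihood vectors (an assumed model)."""
    # An unrooted binary tree with n tips has n - 2 inner nodes.
    inner_nodes = num_taxa - 2
    return inner_nodes * num_sites * num_states * rate_categories * bytes_per_value


# The 25,000-taxon, 1,500 bp example from the text.
estimate = likelihood_vector_bytes(25_000, 1_500)
print(f"{estimate / 1e9:.2f} GB")
```

Under these assumptions the estimate comes to about 1.2 GB, which is the same order of magnitude as the 1.5 GB reported. Setting `rate_categories=4` would quadruple it, which suggests why the choice of rate model matters at this scale.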

Through OpenMP-based (www.openmp.org) parallelization, RAxML can also efficiently exploit the SMP (Symmetric Multiprocessing) capabilities of the cluster's 8-way SMP nodes. This type of parallelism is especially useful for very long alignments. For example, together with Olaf Bininda-Emonds and Usman Roshan, we are currently working on a multi-gene analysis of almost 70 genes for a total of 2,100 mammal species. The resulting multi-gene alignment is extremely long (50,000 base pairs), so its analysis scales well on the SMP nodes.
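The reason long alignments suit this kind of parallelism can be sketched as follows: the score of each alignment column can be computed independently, so the columns can be split among workers and the partial sums added at the end. The sketch below uses Python threads as a stand-in for OpenMP and a deliberately trivial per-site score (fixed base frequencies, no tree), so it illustrates only the decomposition, not RAxML's actual likelihood code. All names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fixed base frequencies; a real analysis estimates model parameters.
BASE_FREQS = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def site_log_likelihood(column):
    """Toy per-site score: log-probability of a column under fixed base frequencies."""
    return sum(math.log(BASE_FREQS[base]) for base in column)


def chunk_log_likelihood(alignment, start, stop):
    """Sum the per-site scores for columns start .. stop - 1."""
    total = 0.0
    for site in range(start, stop):
        column = [seq[site] for seq in alignment]
        total += site_log_likelihood(column)
    return total


def parallel_log_likelihood(alignment, num_workers=8):
    """Split the columns into contiguous chunks, score each chunk, then reduce."""
    num_sites = len(alignment[0])
    chunk = max(1, math.ceil(num_sites / num_workers))
    ranges = [(s, min(s + chunk, num_sites)) for s in range(0, num_sites, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial = pool.map(lambda r: chunk_log_likelihood(alignment, *r), ranges)
        return sum(partial)
```

With 8 workers, a 50,000-column alignment gives each worker 6,250 columns, while a 1,500-column alignment gives each fewer than 200, so the fixed cost of coordinating the workers weighs more heavily on short alignments. Python threads will not show a real speedup here; the point is the column-wise split and the final reduction.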

The new version of RAxML can carry out multiple independent inferences on the original alignment in parallel, each starting from a distinct randomized Maximum Parsimony tree, as well as parallel bootstrap analyses. This parallelization is based on the Message Passing Interface (MPI), so all CPUs of the cluster can be used simultaneously to carry out large bootstrap analyses.
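Bootstrap replicates are independent of one another, which is what makes them easy to distribute. A minimal sketch of the idea, assuming each worker receives the alignment and its own random seed, is shown below. It uses a Python thread pool as a stand-in for MPI processes, and `infer_tree` is a hypothetical placeholder for the actual tree search, not a RAxML function.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def bootstrap_replicate(alignment, seed):
    """Resample alignment columns with replacement, using a per-replicate seed."""
    rng = random.Random(seed)
    num_sites = len(alignment[0])
    columns = [rng.randrange(num_sites) for _ in range(num_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def infer_tree(alignment):
    """Hypothetical placeholder for a tree search; returns a summary, not a tree."""
    return {"taxa": len(alignment), "sites": len(alignment[0])}


def run_replicate(job):
    """Unit of work for one worker: build one replicate and analyse it."""
    alignment, seed = job
    return seed, infer_tree(bootstrap_replicate(alignment, seed))


def parallel_bootstrap(alignment, num_replicates, num_workers=4, base_seed=12345):
    """Hand out one seed per replicate so that no two workers repeat a resampling."""
    jobs = [(alignment, base_seed + i) for i in range(num_replicates)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_replicate, jobs))
```

Because no replicate depends on another, the workers never need to exchange data during the analysis; only the results are collected at the end. The same pattern applies to multiple inferences on the original alignment, with a distinct starting tree per worker in place of a distinct resampling.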

On the computer-science side of things, the cluster is also currently being used by my student, Michael Ott, to develop the coarse-grained MPI parallelization of the RAxML algorithm. This type of parallelization will allow for the parallel inference of a single huge tree on all CPUs of the cluster.

Future plans include investigating novel ways to distribute the tree data structure among CPUs, given that memory consumption currently limits our ability to compute trees of more than 25,000 taxa. In addition, together with Usman Roshan, we will assess how his Rec-I-DCM3 meta-method can improve upon the performance of the new RAxML version.

Dr. Stamatakis is currently a postdoc with Dr. Bernard Moret at the Swiss Institute of Bioinformatics in Lausanne, and is moving to Munich in the near future to run his own group.

References and Availability: