Developing and Using RAxML for large-scale phylogeny reconstruction on the CIPRES cluster at the San Diego Supercomputer Center
Dr. Alexandros Stamatakis
RAxML (Randomized Axelerated Maximum Likelihood) is a program for Maximum Likelihood-based inference of large phylogenetic trees. The cluster is currently being used to further develop RAxML and to analyze large-scale, real-world biological datasets. RAxML is one of the test applications in the Cluster Challenge at the IEEE/ACM Supercomputing 2008 conference. More information can be found on Alexis' RAxML page.
The program is being developed specifically to infer trees efficiently for extremely large datasets, whether large in the number of taxa, the sequence length, or both. For example, a 25,000-taxon alignment of proteobacteria with an alignment length of 1,500 base pairs ran in only 13.5 days on a single CPU of the cluster, with a memory consumption of only 1.5 GB.
Through OpenMP-based (www.openmp.org) parallelization, RAxML can also efficiently exploit the SMP (symmetric multiprocessing) capabilities of the cluster with its 8-way SMP nodes. This type of parallelism is especially useful for very long alignments. For example, together with Olaf Bininda-Emonds and Usman Roshan, we are currently working on a multi-gene analysis of almost 70 genes for a total of 2,100 mammal species.
Recently, rapid bootstrap (RBS) heuristics for RAxML were released that are more than an order of magnitude faster than current algorithms. Computational experiments were conducted on 22 DNA and AA (amino acid) alignments containing 125 up to 7764 sequences. The RBS inferences are between 8 and 20 times faster (average 14.73) than standard bootstrap (SBS) analyses with RAxML and between 18 and 495 times faster than bootstrap (BS) analyses with competing programs, such as PHYML or GARLI. The performance improvement increases with alignment size. These heuristics are available through the CIPRES portal.
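For readers unfamiliar with bootstrapping, the sketch below illustrates the resampling step of a standard nonparametric bootstrap: alignment columns are drawn with replacement to build a replicate alignment of the same length, and a tree is then inferred on each replicate. It is a minimal illustration only and does not reproduce RAxML's rapid bootstrap heuristics; the function name `bootstrap_replicate` and the dictionary representation of an alignment are assumptions made for this example.

```python
import random
from typing import Dict


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon names to aligned sequences of equal length
    (a hypothetical in-memory representation, not a RAxML data structure).
    Columns are sampled with replacement, and the same sampled columns are
    used for every taxon so that each column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()

    # Draw n_sites column indices with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]

    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGAAC", "taxon_c": "TCGTAC"}
    replicate = bootstrap_replicate(toy, random.Random(42))
    for name, seq in replicate.items():
        print(name, seq)
```

Passing an explicit `random.Random` instance keeps each replicate reproducible from its seed, which makes it easier to rerun a single replicate when debugging.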
Current Version
RAxML 7.0.4 (source code) and a comprehensive Manual (v7.0.4) are the most recent versions. Version 7.0.4 is running on the new CIPRES portals V 1.0 and 2.0. The Vital-IT server still uses an intermediate version but will be upgraded soon.
New Features (version 7.0.4):
· Ability to run rapid BS algorithm with constraint trees (-r and -g options)
· Added taxon-name error checking
· Increased allowed taxon name length to 256 characters
· Added option to compute pair-wise ML-distances between taxa
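As a rough illustration of how the rapid BS and constraint-tree options combine on the command line, the sketch below assembles an argument list for a RAxML run. The helper `build_raxml_command`, the executable name `raxmlHPC`, the file names, and the default model `GTRGAMMA` are assumptions for this example. The flags `-s`, `-n`, `-m`, `-x`, and `-#` (alignment, run name, model, rapid BS seed, number of replicates) and the mapping of `-r` to binary and `-g` to multifurcating constraint trees reflect our reading of the RAxML manual and should be checked against the v7.0.4 manual before use.

```python
from typing import List, Optional


def build_raxml_command(
    alignment: str,
    run_name: str,
    model: str = "GTRGAMMA",
    rapid_bs_seed: Optional[int] = None,
    replicates: Optional[int] = None,
    constraint_tree: Optional[str] = None,
    constraint_kind: str = "multifurcating",
    executable: str = "raxmlHPC",  # assumed binary name; may differ per install
) -> List[str]:
    """Assemble a RAxML argument list (hypothetical helper, not part of RAxML)."""
    cmd = [executable, "-s", alignment, "-n", run_name, "-m", model]

    if rapid_bs_seed is not None:
        # -x enables the rapid bootstrap and sets its random number seed.
        cmd += ["-x", str(rapid_bs_seed)]
    if replicates is not None:
        # -# sets the number of runs (here: bootstrap replicates).
        cmd += ["-#", str(replicates)]

    if constraint_tree is not None:
        # -r: binary constraint tree, -g: multifurcating constraint tree.
        flags = {"binary": "-r", "multifurcating": "-g"}
        if constraint_kind not in flags:
            raise ValueError("constraint_kind must be 'binary' or 'multifurcating'")
        cmd += [flags[constraint_kind], constraint_tree]

    return cmd


if __name__ == "__main__":
    command = build_raxml_command(
        "mammals.phy", "rbs_test",
        rapid_bs_seed=12345, replicates=100,
        constraint_tree="constraint.tre",
    )
    print(" ".join(command))
```

Building the command as a list rather than a single string lets it be passed directly to `subprocess.run` without shell quoting issues.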
References and Availability:
- A. Stamatakis, M. Ott, T. Ludwig (2005) “RAxML-OMP: An Efficient Program for Phylogenetic Inference on SMPs”. In Proceedings of 8th International Conference on Parallel Computing Technologies (PaCT2005), Lecture Notes in Computer Science, 3506:288-302, Springer Verlag.
- A. Stamatakis, P. Hoover, J. Rougemont (2008) “A Rapid Bootstrap Algorithm for the RAxML Web Servers”. Systematic Biology, 57(5):758-771.
- Related papers and the software are freely available at http://icwww.epfl.ch/~stamatak/
- RAxML has recently been updated and is now available for download as RAxML (7.0.0).
- A manual for RAxML is available at http://icwww.epfl.ch/~stamatak/index-Dateien/countManual7.0.0.php

