Using GARLI for large-scale phylogeny reconstruction on the CIPRES cluster at the San Diego Supercomputer Center.
Dr. Derrick Zwickl
GARLI (Genetic Algorithm for Rapid Likelihood Inference) is a program for maximum likelihood inference of large phylogenetic trees. GARLI is currently available on the CIPRES cluster for analyses of large-scale problems, both as a stand-alone program and as a part of REC-I-DCM3.
GARLI performs phylogenetic searches on aligned nucleotide datasets using the maximum likelihood criterion. The assumed model of nucleotide substitution is the General Time Reversible (GTR) model, with gamma-distributed rate heterogeneity and an estimated proportion of invariable sites. The implementation of this model is exactly equivalent to that in PAUP*, making the log likelihood (lnL) scores obtained directly comparable. All model parameters may be estimated, including the equilibrium base frequencies (which are not equal to the empirical base frequencies). The gamma model of rate heterogeneity assumes four rate categories (the default in PAUP*).
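As a rough illustration of how invariable sites and gamma rate categories combine, the sketch below mixes per-category likelihoods for a single alignment site. It is not GARLI or PAUP* code: the function name, its arguments, and the example numbers are hypothetical, and the per-category likelihoods are assumed to have been computed elsewhere (for example by a pruning algorithm on the tree). It also assumes the rate categories are equally probable.

```python
def site_likelihood(category_likelihoods, invariant_likelihood, p_invariant):
    """Combine rate-category likelihoods for one site (illustrative sketch).

    category_likelihoods: likelihood of the site under each gamma rate
        category (four values for a four-category model). Assumed to be
        computed elsewhere.
    invariant_likelihood: likelihood of the site if it cannot change at all,
        i.e. the equilibrium frequency of the base for a constant site and
        0.0 for a variable site.
    p_invariant: estimated proportion of invariable sites.
    """
    # Assumption: categories are equally probable, so the gamma part is a
    # plain average over the categories.
    gamma_part = sum(category_likelihoods) / len(category_likelihoods)
    return p_invariant * invariant_likelihood + (1.0 - p_invariant) * gamma_part


# Hypothetical numbers for a constant site whose base has frequency 0.25.
example = site_likelihood([0.02, 0.03, 0.025, 0.015], 0.25, 0.2)
```

The lnL of a dataset would then be the sum of the logarithms of these per-site values over all sites.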
GARLI is loosely based on the program GAML (Lewis 1998). It uses a genetic algorithm approach to simultaneously find the topology, branch lengths and model parameters that maximize the lnL. This involves the evolution of a population of solutions termed individuals, with each individual encoding a tree topology, a set of branch lengths and a set of model parameters. Each individual is assigned a fitness based on its lnL score. In each generation, random mutations are applied to some of the components of the individuals, and their fitnesses are recalculated. Individuals are then chosen to be the parents of the next generation in proportion to their fitnesses. This process is repeated many times, and the population of individuals evolves toward higher-fitness solutions. Note that the highest-fitness individual is automatically maintained in the population, ensuring that it is not lost due to chance (genetic drift).
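The selection step described above can be sketched as follows. This is a minimal illustration, not GARLI's implementation: the function names are hypothetical, individuals are represented by placeholder strings, and the mapping from lnL to fitness (here exp(lnL minus the best lnL)) is an assumption, since the text only says that fitness is based on the lnL score.

```python
import math
import random


def selection_weights(lnls):
    """Map log likelihoods to positive selection weights.

    Assumption: weight = exp(lnL - best lnL). Subtracting the best lnL keeps
    the exponent near zero and avoids underflow for large negative scores.
    """
    best = max(lnls)
    return [math.exp(lnl - best) for lnl in lnls]


def next_generation(population, lnls, rng):
    """Choose parents in proportion to fitness, always keeping the best one."""
    best_index = max(range(len(population)), key=lambda i: lnls[i])
    weights = selection_weights(lnls)
    # Elitism: the first slot is reserved for the highest-fitness individual,
    # so it cannot be lost by chance.
    parents = [population[best_index]]
    # Remaining slots are filled by fitness-proportional sampling
    # with replacement.
    parents += rng.choices(population, weights=weights, k=len(population) - 1)
    return parents


rng = random.Random(1)
population = ["tree_a", "tree_b", "tree_c", "tree_d"]  # placeholder individuals
lnls = [-1052.3, -1049.8, -1060.1, -1050.4]            # hypothetical scores
new_population = next_generation(population, lnls, rng)
```

In a full search, each selected parent would then be mutated and rescored before the next round of selection.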
The mutations used by GARLI fall into three types: topological mutations, model parameter mutations and branch-length mutations. Topological mutations consist of the standard NNI and SPR rearrangement types, as well as a localized form of SPR in which the pruned subtree may only be reattached to branches within a certain radius of its former location. Topological mutations are followed by some degree of rough branch-length optimization. Model mutations simply choose one of the model parameters and multiply it by a gamma-distributed variable with mean 1.0. When branch-length mutations are performed, a number of branches are chosen and each has its current length multiplied by a different gamma-distributed variable.
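The model parameter and branch-length mutations can be sketched with a gamma multiplier whose mean is 1.0. This is an illustrative sketch only: the function names, the parameter names in the example, and the shape value of 100 are assumptions, not GARLI settings. A gamma distribution with shape k and scale 1/k has mean 1.0, and a larger k gives smaller perturbations. Branch-length multipliers are assumed here to share that mean.

```python
import random


def gamma_multiplier(rng, shape=100.0):
    """Draw a gamma-distributed multiplier with mean 1.0.

    random.gammavariate(alpha, beta) has mean alpha * beta, so a scale of
    1 / shape gives a mean of exactly 1.0. The shape value is an assumption.
    """
    return rng.gammavariate(shape, 1.0 / shape)


def mutate_model_parameter(params, rng, shape=100.0):
    """Pick one model parameter at random and rescale it."""
    name = rng.choice(sorted(params))
    mutated = dict(params)
    mutated[name] = params[name] * gamma_multiplier(rng, shape)
    return name, mutated


def mutate_branch_lengths(lengths, rng, n_branches, shape=100.0):
    """Rescale n_branches randomly chosen branches, each by its own draw."""
    chosen = rng.sample(range(len(lengths)), n_branches)
    mutated = list(lengths)
    for i in chosen:
        mutated[i] *= gamma_multiplier(rng, shape)
    return chosen, mutated


rng = random.Random(7)
# Hypothetical parameter names and values, for illustration only.
params = {"rate_AC": 1.2, "rate_AG": 3.4, "alpha": 0.6, "p_invariant": 0.2}
changed_name, new_params = mutate_model_parameter(params, rng)
branch_lengths = [0.01, 0.05, 0.10, 0.02, 0.07]
chosen_branches, new_lengths = mutate_branch_lengths(branch_lengths, rng, 2)
```

Because the multiplier is always positive, mutated parameters and branch lengths stay positive. A real implementation would also need to keep constrained parameters, such as a proportion of invariable sites, within their valid range.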
Dr. Zwickl is currently a postdoctoral researcher at NESCent, the National Evolutionary Synthesis Center.
References and Availability:
- GARLI (0.951) software is available at http://www.bio.utexas.edu/faculty/antisense/garli/Garli.html.
- A manual for GARLI is available.
- Please cite: Zwickl, D. J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin.

