MrBayes on XSEDE

MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes' theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.
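
As a compact reminder of what "conditioning via Bayes' theorem" means here, the posterior probability of one tree can be written as below. The symbols are our own shorthand, not notation taken from this page: \tau_i is the i-th candidate tree, X is the alignment, and B(s) is the number of possible trees for s taxa. The likelihood P(X | \tau_i) is understood to be averaged over branch lengths and substitution model parameters.

```latex
\[
P(\tau_i \mid X) \;=\;
\frac{P(X \mid \tau_i)\, P(\tau_i)}
     {\sum_{j=1}^{B(s)} P(X \mid \tau_j)\, P(\tau_j)}
\]
```

The sum in the denominator runs over every possible tree, which is why the posterior cannot be computed directly and is instead approximated by MCMC sampling.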

MrBayes 3.2.2 provides new features. It supports commands for BEST (Bayesian Estimation of Species Trees). It also supports checkpointing, which makes it possible to restart a run that has terminated unexpectedly, or that has reached the end of the maximum allowed run time without converging. While the MrBayes 3.2.2 code supports the use of GPUs via BEAGLE, we do not currently support that option. Nevertheless, the new code offers significant speedups on XSEDE.

MrBayes 3.2.2 on XSEDE provides an interface that allows one to configure and submit jobs to Gordon, a large NSF XSEDE Resource. The interface also supports submissions for jobs that are configured via a MrBayes Block in the Nexus input file. We always recommend use of the Nexus file for job configuration, because it seems much simpler to manage.

IF YOU ARE MAKING A LONG RUN: Let's say you want to run 100,000,000 generations and your data set is large. By default, samplefreq=500, which at that run length will likely create huge output files. In this case, set samplefreq high enough that the job's output will not exceed the maximum size CIPRES allows (8 GB). If you aren't sure, we recommend you run for a few thousand generations and monitor the size of the output files using the intermediate files link. If your files grow very quickly, stop the job and edit the input file, setting samplefreq= to a larger value.
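
The pilot-run advice above can be turned into a rough estimate. The following Python sketch is only an illustration: the function names are hypothetical, it assumes output size grows linearly with the number of sampled generations, and it reads the 8 GB limit as 8 * 1024^3 bytes. Treat the result as a lower bound and pick a larger value if in doubt.

```python
import math

GIB = 1024 ** 3  # assumption: the 8 GB limit is read as 8 * 1024^3 bytes


def bytes_per_sample_from_pilot(total_output_bytes, pilot_ngen, pilot_samplefreq):
    """Average bytes written per sampled generation in a short pilot run."""
    samples = pilot_ngen // pilot_samplefreq
    if samples < 1:
        raise ValueError("pilot run produced no samples")
    return total_output_bytes / samples


def min_samplefreq(ngen, bytes_per_sample, max_bytes=8 * GIB, round_to=100):
    """Smallest samplefreq (rounded up to a multiple of round_to) that keeps
    the projected output at or below max_bytes, assuming linear growth."""
    if ngen <= 0 or bytes_per_sample <= 0:
        raise ValueError("ngen and bytes_per_sample must be positive")
    max_samples = math.floor(max_bytes / bytes_per_sample)
    if max_samples < 1:
        raise ValueError("a single sample already exceeds the size limit")
    freq = math.ceil(ngen / max_samples)
    # Round up so the value is easy to combine with checkfreq later.
    return max(round_to, math.ceil(freq / round_to) * round_to)


if __name__ == "__main__":
    # Hypothetical pilot: 10,000 generations at samplefreq=500 wrote 4 MB total.
    per_sample = bytes_per_sample_from_pilot(4_000_000, 10_000, 500)
    print(min_samplefreq(100_000_000, per_sample))
```

The pilot numbers in the usage section are invented for illustration; substitute the sizes you see under the intermediate files link.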

IF YOU WANT TO RESTART MRBAYES: A key issue is to make sure that your checkfreq and samplefreq values are compatible. The checkfreq= option in your Nexus file sets the number of generations between writes to the .ckp file. This value should always be greater than or equal to samplefreq=. The default for checkfreq is 2000; the default for samplefreq is 500. Note that checkfreq must also be a whole-number multiple of samplefreq. If you set checkfreq to 10000 in the mrbayes block of your input file, say for a long run, you should set samplefreq to a value that divides it evenly, such as 1000 or 5000. samplefreq=3000 or samplefreq=7000 will not work, because 10000 is not a whole-number multiple of either.
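
The rule that checkfreq must be a whole multiple of samplefreq is easy to check before submitting a job. The Python sketch below is a hypothetical helper, not part of MrBayes or CIPRES. It assumes the mcmc command in the mrbayes block takes ngen, samplefreq and checkfreq options in the form shown; confirm the exact syntax against the MrBayes manual.

```python
def check_frequencies(samplefreq, checkfreq):
    """Return a list of problems; an empty list means the pair is compatible."""
    problems = []
    if samplefreq <= 0 or checkfreq <= 0:
        problems.append("samplefreq and checkfreq must be positive")
    elif checkfreq < samplefreq:
        problems.append("checkfreq is smaller than samplefreq")
    elif checkfreq % samplefreq != 0:
        problems.append("checkfreq is not a whole multiple of samplefreq")
    return problems


def mcmc_line(ngen, samplefreq, checkfreq):
    """Build an mcmc line for the mrbayes block, refusing incompatible values.

    The option names are assumed to match the MrBayes mcmc command.
    """
    problems = check_frequencies(samplefreq, checkfreq)
    if problems:
        raise ValueError("; ".join(problems))
    return f"mcmc ngen={ngen} samplefreq={samplefreq} checkfreq={checkfreq};"


if __name__ == "__main__":
    print(check_frequencies(500, 2000))     # the defaults
    print(check_frequencies(3000, 10000))   # an incompatible pair
    print(mcmc_line(100_000_000, 5000, 10000))
```

Running the check on the defaults (samplefreq=500, checkfreq=2000) reports no problems, while samplefreq=3000 with checkfreq=10000 is rejected.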

Manual for MrBayes 3.2.2: http://mrbayes.sourceforge.net/mb3.2_manual.pdf

MrBayes mail list: http://sourceforge.net/mailarchive/forum.php?forum=mrbayes-users

MRBAYES home page here.

INPUT = DNA or protein matrices in Nexus format

Simple Example of Run Input/Output

Input File Type         File Name
input file              infile.nex
example MB block        mbblock.nex

Output File Type        File Name
log file                log.txt and mrbayeslog.out
sump_output             infile.nex.run1.p
                        infile.nex.run2.p
                        sumpoutput.out
sumt_output             infile.nex.run1.t
                        infile.nex.run2.t
all_mcmc_trees          infile.nex.trprobs
partition information   infile.nex.parts
consensus_tree          infile.nex.con
acceptance_ratios       infile.nex.mcmc

Known Issues:

  • MrBayes 3.2.2 runs with BEAGLE fail quickly with a few data sets. The cause is presently unknown, but it is related to the interaction between MrBayes and BEAGLE. To fix this, clone the failed job, uncheck the "Run Using Beagle" box, and resubmit. These runs will always succeed without BEAGLE, with a slight (20%) run time penalty.

     

If there is a tool or a feature you need, please let us know.


CIPRES – Cyberinfrastructure for Phylogenetic Research