
Using Rec-I-DCM3 for fast phylogeny reconstruction of large trees on the CIPRES cluster at the San Diego Supercomputer Center

Drs. Usman Roshan, Tiffani Williams, Bernard Moret, and Tandy Warnow

Rec-I-DCM3 (Recursive Iterative Disk-Covering Method 3) is one of a family of Disk-Covering Methods that speed the inference of phylogenetic trees from large data sets. Rec-I-DCM3 is currently available on the CIPRES cluster to speed analyses of large-scale problems using PAUP, GARLI, or RAxML as the inference engine.

Phylogenetic trees are often estimated using heuristics for maximum parsimony (MP) and maximum likelihood (ML), both of which are NP-hard problems. Although apparently good heuristics have been devised, even these fail to produce good solutions in reasonable time for large datasets. The practical limit today is probably less than one thousand sequences; reconstructing much larger trees remains a Grand Challenge problem. Rec-I-DCM3 is a promising new divide-and-conquer technique, one of a family of Disk-Covering Methods (DCMs) that operate by iteratively dividing the input set of sequences into smaller overlapping subproblems, solving each with some base method (e.g., neighbor-joining, heuristic MP, or heuristic ML), and then merging the resulting subtrees into a single phylogenetic tree. All DCMs boost the performance of the base method. The method available here is built on a new DCM algorithm, which we call DCM3, and adds recursion and iteration, hence the name Rec-I-DCM3. Rec-I-DCM3 can produce dramatic improvements in speed for standard heuristics, as well as substantial improvements over the very best methods (which are harder to improve). We demonstrate the power of this new DCM on ten large biological datasets ranging from 1,322 to 13,921 sequences.
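The divide, solve, and merge cycle described above can be illustrated with a minimal Python sketch. It shows only the control flow and is not the Rec-I-DCM3 implementation: the split by list position is a placeholder for the DCM3 decomposition, and the names `rec_solve`, `rec_i_dcm_sketch`, `base_method`, `merge`, `leaf_order`, and `score` are hypothetical, as is the assumption that a lower score is better (as with parsimony tree length).

```python
from typing import Callable, List, Sequence, TypeVar

Tree = TypeVar("Tree")


def rec_solve(
    taxa: Sequence[str],
    base_method: Callable[[Sequence[str]], Tree],
    merge: Callable[[List[Tree]], Tree],
    max_size: int,
    overlap: int,
) -> Tree:
    """Recursively split `taxa` into two overlapping subsets, solve, and merge."""
    if max_size < 1 or overlap < 0 or 2 * overlap > max_size:
        raise ValueError("need max_size >= 1 and 0 <= 2 * overlap <= max_size")
    if len(taxa) <= max_size:
        # Small enough: hand the subproblem to the base method (hypothetical callable).
        return base_method(taxa)
    mid = len(taxa) // 2
    # Placeholder split by list position; the two parts share `overlap` taxa.
    # This is an assumption for illustration, not the DCM3 decomposition.
    left, right = taxa[: mid + overlap], taxa[mid:]
    subtrees = [
        rec_solve(part, base_method, merge, max_size, overlap)
        for part in (left, right)
    ]
    return merge(subtrees)


def rec_i_dcm_sketch(
    taxa: Sequence[str],
    base_method: Callable[[Sequence[str]], Tree],
    merge: Callable[[List[Tree]], Tree],
    leaf_order: Callable[[Tree], List[str]],
    score: Callable[[Tree], float],
    max_size: int,
    overlap: int,
    iterations: int,
) -> Tree:
    """Repeat the recursive step, letting the best tree so far reorder the taxa."""
    best = rec_solve(list(taxa), base_method, merge, max_size, overlap)
    for _ in range(iterations - 1):
        # The current best tree guides the next decomposition (via `leaf_order`).
        candidate = rec_solve(leaf_order(best), base_method, merge, max_size, overlap)
        if score(candidate) < score(best):  # assumption: lower score is better
            best = candidate
    return best
```

In this sketch a caller would supply a real tree-search routine as `base_method` and a supertree-style merger as `merge`; the `max_size` parameter caps how many sequences the base method ever sees at once, which is where the speedup of a divide-and-conquer approach comes from.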

Dr. Roshan is currently an Assistant Professor at the New Jersey Institute of Technology.
Dr. Williams is currently an Assistant Professor at Texas A&M University.
Dr. Moret is currently a Professor at the Swiss Federal Institute of Technology.
Dr. Warnow is currently a Professor at UT Austin.

References and Availability: