Probalign uses partition function posterior probability estimates to compute maximum expected accuracy multiple sequence alignments. It performs statistically significantly better than the leading alignment programs Probcons v1.1, MAFFT v5.851, and MUSCLE v3.6 on the BAliBASE 3.0, HOMSTRAD, and OXBENCH benchmarks. Probalign's improvements are largest on datasets containing N/C-terminal extensions and on datasets with long sequences of heterogeneous length. On heterogeneous-length datasets containing repeats, Probalign's alignment accuracy is 10% and 15% higher than that of the other three methods when the standard deviation of length is at least 300 and 400, respectively.
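
To make the "maximum expected accuracy" idea concrete, here is a minimal Python sketch of the pairwise step only: given a matrix of posterior probabilities that residue i of one sequence aligns with residue j of another, dynamic programming finds the alignment that maximizes the sum of posteriors over aligned pairs. This is an illustration and not Probalign's actual code. The function name mea_align and the toy posterior matrix are hypothetical, and computing the posteriors themselves (the partition function step) and building the multiple alignment are not shown.

```python
def mea_align(post):
    """Return (score, pairs) for a maximum expected accuracy alignment.

    post[i][j] is an assumed, precomputed posterior probability that
    residue i of sequence x is aligned with residue j of sequence y.
    Hypothetical helper for illustration only.
    """
    n = len(post)
    m = len(post[0]) if n else 0
    # S[i][j] = best summed posterior for the prefixes x[:i] and y[:j]
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + post[i - 1][j - 1],  # align x[i-1] with y[j-1]
                S[i - 1][j],                           # x[i-1] against a gap
                S[i][j - 1],                           # y[j-1] against a gap
            )
    # Trace back from the bottom-right corner to recover the aligned pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif S[i][j] == S[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return S[n][m], pairs


# Toy posterior matrix (made-up values) for two sequences of length 2
score, pairs = mea_align([[0.9, 0.1], [0.2, 0.8]])
print(pairs)
```

Gaps carry no explicit penalty in this sketch; the posteriors alone decide which residues are paired.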

PROBALIGN manual here.

PROBALIGN home page here.

INPUT = DNA or protein sequences in multiple FASTA format (MFA).
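
As a rough guide to what a multiple FASTA file looks like, the Python sketch below parses MFA text into (name, sequence) records: each record starts with a ">" header line, followed by one or more sequence lines. The function name read_mfa and the sample sequences are hypothetical, and this is not how PROBALIGN itself reads its input.

```python
def read_mfa(text):
    """Parse multi-FASTA text into a list of (name, sequence) tuples.

    Hypothetical helper for illustration; it checks only basic structure.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Made-up example input with two short nucleotide sequences
example = ">seq1\nACGTAC\nGGTA\n>seq2\nACGTTC\n"
for name, seq in read_mfa(example):
    print(name, len(seq))
```

A multiple alignment needs at least two sequences, so checking that a file yields two or more records before submitting is a reasonable sanity test.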

Test input file (nucleic acid): PROBALIGN_in.txt

Test output file 1 (Clustal format): probalign_clustaloutfile.aln

If you use PROBALIGN, please cite:

Roshan, U. and Livesay, D. R. (2006) Probalign: Multiple sequence alignment using partition function posterior probabilities, Bioinformatics 22(22):2715-2721 (pdf)

Data used in the paper:

Related: Probalign study for RNA-genome alignment here

