CIPRES

PROBALIGN

Probalign uses partition function posterior probability estimates to compute maximum expected accuracy multiple sequence alignments. It performs statistically significantly better than the leading alignment programs Probcons v1.1, MAFFT v5.851, and MUSCLE v3.6 on the BAliBASE 3.0, HOMSTRAD, and OXBENCH benchmarks. Probalign's improvements are largest on datasets containing N/C-terminal extensions and on datasets with long sequences of heterogeneous length. On heterogeneous-length datasets containing repeats, Probalign alignment accuracy is 10% and 15% higher than that of the other three methods when the standard deviation of sequence length is at least 300 and 400, respectively.
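To illustrate what "maximum expected accuracy" means in the pairwise case, the sketch below takes a matrix of posterior probabilities and finds the alignment that maximizes the sum of posteriors over the aligned residue pairs. It is a minimal sketch, not Probalign's code: the function name mea_align and the input matrix are hypothetical, the posteriors are assumed to be already computed (for example from a partition function), and the multiple-alignment steps are omitted entirely.

```python
def mea_align(post):
    """Pairwise maximum expected accuracy alignment (illustrative sketch).

    post[i][j] is an assumed, precomputed posterior probability that
    residue i of sequence x is aligned to residue j of sequence y.
    Returns (score, pairs), where pairs lists the aligned (i, j) indices.
    """
    n = len(post)
    m = len(post[0]) if n else 0
    # dp[i][j]: best summed posterior for the first i residues of x
    # and the first j residues of y. Gaps contribute nothing.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + post[i - 1][j - 1],  # align i with j
                dp[i - 1][j],                           # gap in y
                dp[i][j - 1],                           # gap in x
            )
    # Trace back, preferring gaps so that pairs adding nothing are skipped.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j]:
            i -= 1
        elif dp[i][j] == dp[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

For a 2 x 2 matrix with high posteriors on the diagonal, the sketch pairs residue 0 with 0 and residue 1 with 1. Only the match posteriors are scored here; no gap penalties are applied.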

PROBALIGN manual here.

PROBALIGN home page here.

INPUT = DNA or protein sequences in multi-FASTA format (MFA).
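Before submitting a file, it can help to confirm that it parses as multi-FASTA and to check how heterogeneous the sequence lengths are. The following is a minimal sketch under stated assumptions: read_fasta and length_stdev are hypothetical helper names, not part of Probalign or CIPRES, and the population standard deviation is used as an assumption, since the description above does not say which estimator was applied.

```python
import statistics


def read_fasta(text):
    """Parse multi-FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def length_stdev(records):
    """Population standard deviation of sequence lengths (assumed estimator)."""
    return statistics.pstdev([len(seq) for _, seq in records])
```

Example usage: records = read_fasta(open("PROBALIGN_in.txt").read()) followed by length_stdev(records) gives a rough measure of length heterogeneity for the test input file.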

Test input file (nucleic acid): PROBALIGN_in.txt

Test output file1 (clustal format): probalign_clustaloutfile.aln

If you use PROBALIGN, please cite:

Roshan, U. and Livesay, D. R. (2006) Probalign: Multiple sequence alignment using partition function posterior probabilities, Bioinformatics 22(22):2715-2721 (doi.org/10.1093/bioinformatics/btl472)

Data used in the paper:

N/C extension simulated data. Includes all programs used for simulating the data as well as the simulated datasets.
BAliBASE 2.0 repeat alignments. True alignments in FASTA format, with core regions in upper case and ambiguous regions in lower case. The qscore program can be used to evaluate alignment accuracy.
BAliBASE 3.0, HOMSTRAD, and OXBENCH multiple sequence alignment benchmarks from the websites hosting the distributions.
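The upper-case convention for core regions suggests how accuracy can be restricted to reliable columns. The sketch below computes a sum-of-pairs style score: the fraction of residue pairs aligned in the upper-case core of a reference alignment that are also aligned in a test alignment. It is an illustration only and is not the qscore program. The names aligned_pairs and core_sp_score are hypothetical, and both alignments are assumed to be lists of equal-length gapped rows, given in the same sequence order, with '-' or '.' as gap characters.

```python
def aligned_pairs(rows, core_only=False):
    """Return the set of residue pairs that share an alignment column.

    Each residue is identified as (sequence index, ungapped position).
    With core_only=True, only upper-case residues are counted.
    """
    if len({len(row) for row in rows}) > 1:
        raise ValueError("alignment rows must all have the same length")
    pairs = set()
    positions = [0] * len(rows)  # ungapped position reached in each row
    for col in range(len(rows[0]) if rows else 0):
        present = []
        for s, row in enumerate(rows):
            ch = row[col]
            if ch in "-.":
                continue  # gap: no residue consumed
            if not core_only or ch.isupper():
                present.append((s, positions[s]))
            positions[s] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs


def core_sp_score(reference, test):
    """Fraction of reference core pairs that the test alignment recovers."""
    ref = aligned_pairs(reference, core_only=True)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)
```

For example, with reference rows "ACg-" and "AC-t" and test rows "ACG-" and "A-CT", one of the two core pairs is recovered, so the score is 0.5.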

Related: Probalign study for RNA-genome alignment here

If there is a tool or a feature you need, let us know.
