CLUSTAL Format:

CLUSTAL format files are usually identified by the suffix ".aln".

From the EBI Site: The ALN format originated in the alignment program ClustalW. The file starts with the word "CLUSTAL", followed by information about which Clustal program was run and which version was used, e.g. "CLUSTAL W (2.1) multiple sequence alignment". Here the type of Clustal program is "W" and the version is 2.1. The alignment is written in blocks of 60 residues. Each line of a block starts with a sequence name, taken from the input sequence, and ends with a count of the total number of residues shown so far for that sequence. The information about which residues match is shown below each block of residues:

"*" means that the residues or nucleotides in that column are identical in all sequences in the alignment.
":" means that conserved substitutions have been observed.
"." means that semi-conserved substitutions are observed.

An example is shown below.

CLUSTAL W 2.1 multiple sequence alignment      


FOSB_MOUSE      ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSYSTPGLSAYSTGGASGS 60  
FOSB_HUMAN      ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGS 60
                ********************************.***************:*.**:******  
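
The conservation line in the example above can be recomputed from the two sequence rows. The Python sketch below is illustrative only and is not the ClustalW implementation: the function name conservation_line is hypothetical, and the "strong" and "weak" residue groups are the ones commonly quoted from the ClustalW documentation, which this page does not list, so treat them as an assumption. Columns that contain a gap ("-") are assumed to receive no symbol. For nucleotide alignments only the "*" rule is meaningful.

```python
# Assumed residue groups (commonly quoted from the ClustalW documentation).
# A column whose residues all fall inside one strong group gets ":",
# inside one weak group gets ".".
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return the conservation symbols for one block of aligned rows."""
    if not rows:
        return ""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows in a block must have the same length")

    marks = []
    for column in zip(*rows):
        residues = {symbol.upper() for symbol in column}
        if "-" in residues:
            marks.append(" ")   # assumption: gapped columns are left unmarked
        elif len(residues) == 1:
            marks.append("*")   # identical in all sequences
        elif any(residues <= set(group) for group in STRONG_GROUPS):
            marks.append(":")   # conserved substitution
        elif any(residues <= set(group) for group in WEAK_GROUPS):
            marks.append(".")   # semi-conserved substitution
        else:
            marks.append(" ")   # no conservation
    return "".join(marks)
```

Passing the FOSB_MOUSE and FOSB_HUMAN rows from the example as a two-element list should give back the symbol line printed beneath them, with the four differing columns (A/V, L/M, A/G, T/S) marked ".", ":", "." and ":" respectively.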

A more strict definition of the format is as follows:

  • The first line in the file must start with the words "CLUSTAL W".
  • One or more empty lines.
  • One or more blocks of sequence data. Each block consists of:
    • One line for each sequence in the alignment. Each line consists of:
      1. the sequence name
      2. white space
      3. up to 60 sequence symbols.
      4. optional - white space followed by a cumulative count of residues for the sequences
    • A line showing the degree of conservation for the columns of the alignment in this block.
    • One or more empty lines.
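
The definition above maps fairly directly onto a small reader. The following Python sketch is one possible interpretation, not a reference parser: the function name parse_clustal and its parameters header_prefix and max_symbols are hypothetical. It assumes that conservation lines always begin with white space (because sequence names occupy the first columns), that sequence names contain no white space, and it checks the optional count only for being a number. It does not enforce the empty line after the header.

```python
def parse_clustal(text, header_prefix="CLUSTAL W", max_symbols=60):
    """Parse CLUSTAL-formatted text into a dict of name -> aligned sequence."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith(header_prefix):
        raise ValueError(f"first line must start with {header_prefix!r}")

    chunks = {}  # sequence name -> list of per-block pieces, in file order
    for number, line in enumerate(lines[1:], start=2):
        # Skip empty lines and conservation lines (assumed to start with a space).
        if not line.strip() or line[0].isspace():
            continue

        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError(f"line {number}: expected name, symbols, optional count")
        name, symbols = fields[0], fields[1]
        if len(symbols) > max_symbols:
            raise ValueError(f"line {number}: more than {max_symbols} symbols")
        if len(fields) == 3 and not fields[2].isdigit():
            raise ValueError(f"line {number}: residue count is not a number")

        chunks.setdefault(name, []).append(symbols)

    alignment = {name: "".join(parts) for name, parts in chunks.items()}
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("aligned sequences have different lengths")
    return alignment
```

Called on the example file shown earlier, this would return a dictionary with the keys FOSB_MOUSE and FOSB_HUMAN, each mapped to a 60-symbol string. Passing header_prefix="CLUSTAL" relaxes the header check for files written by other Clustal programs.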


CIPRES – Cyberinfrastructure for Phylogenetic Research