READSEQ

READSEQ reads and writes nucleic acid and protein sequences in various formats. Data files may contain multiple sequences. The Java version employed here is more efficient than the classic compiled C version and runs faster. Bear in mind that the program is designed to extract sequences and does not pay strict attention to the metadata surrounding them, so information can be lost during conversion. We have made a couple of fixes to READSEQ so that it no longer truncates taxon names; we are happy to share those fixes on request. We have made a minimal installation that exposes only the features we think will be used. If you require a feature we haven't exposed, please let us know.

Supported INPUT formats: Fasta, Clustal, Nexus, Phylip and Phylip 3.2, Plain/Raw, GCG, MSF, IG/Stanford, GenBank, NBRF, EMBL, PIR/CODATA, DNAStrider, FlatFeat, GFF, ACEDB, SCF

Supported OUTPUT formats: Fasta, Clustal, Nexus, Phylip and Phylip 3.2, Plain/Raw, GCG, MSF, Pretty, IG/Stanford, GenBank/G, NBRF, EMBL, PIR/CODATA, DNAStrider, FlatFeat, GFF, ACEDB, SCF

INPUT = DNA and protein sequences, in various formats.

Test input file (fasta): readseq_in_fasta.txt

Test output file1 (nexus): readseq_out_nex.txt
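To give a rough idea of what a FASTA-to-NEXUS conversion involves, here is a minimal Python sketch. It is not READSEQ's code and does not reproduce its exact output; the function names `parse_fasta` and `to_nexus` are hypothetical, and the sketch assumes the input is an alignment (all sequences the same length). It keeps only the first word of each FASTA header, which also illustrates how descriptive metadata can be lost in a conversion.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs from FASTA text.

    Only the first word of each header line is kept as the name;
    the rest of the header (the description) is discarded.
    """
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_nexus(records: List[Tuple[str, str]], datatype: str = "dna") -> str:
    """Write records as a simple, non-interleaved NEXUS data block."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("a NEXUS matrix needs sequences of equal length")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in records)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        f"  format datatype={datatype} missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in records:
        # Pad names so the sequence columns line up.
        lines.append(f"    {name.ljust(width)}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ">taxonA first example\nACGT\nAC\n>taxonB\nAC-TGG\n"
    print(to_nexus(parse_fasta(example)))
```

In this sketch the description "first example" on the first header never reaches the NEXUS output, so compare converted files against the originals if the header text matters to you.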

If you use readseq, please cite: Gilbert D. Sequence file format conversion with command-line readseq. Curr Protoc Bioinformatics. 2003 Feb; Appendix 1:Appendix 1E. doi: 10.1002/0471250953.bia01es00. PMID: 18428689.

If there is a tool or a feature you need, please let us know.

 
