seqmagick

Motivation

We often need to convert between sequence formats and perform small tasks on sequence files, and writing a one-off script each time isn’t worth it. Seqmagick is a kickass little utility, built in the spirit of ImageMagick, that exposes Biopython’s file format conversion in a convenient way. Instead of a big mess of scripts, there is one command that takes arguments:

seqmagick convert a.fasta b.phy    # convert from fasta to phylip
seqmagick mogrify --ungap a.fasta  # remove all gaps from a.fasta, in place
seqmagick info *.fasta             # describe all FASTA files in the current directory

And more.

Installation

First, you’ll need to install Biopython. NumPy (which parts of Biopython depend on) is not required for seqmagick to function. Once Biopython is installed, install the latest seqmagick release with:

pip install seqmagick

Or install the bleeding edge version:

pip install git+git://github.com/fhcrc/seqmagick.git@master#egg=seqmagick

Use

Seqmagick can be used to query information about sequence files, convert between formats, and modify sequence files. All functions are accessed through subcommands:

seqmagick <subcommand> [options] arguments

Supported File Extensions

By default, seqmagick infers the file type from extension. Currently mapped extensions are:

Extension   Format
.afa        fasta
.aln        clustal
.fa         fasta
.faa        fasta
.fas        fasta
.fasta      fasta
.fastq      fastq
.ffn        fasta
.fna        fasta
.frn        fasta
.gb         genbank
.gbk        genbank
.needle     emboss
.phy        phylip
.phylip     phylip
.phyx       phylip-relaxed
.qual       qual
.sff        sff-trim
.sth        stockholm
.sto        stockholm

When reading from stdin or writing to stdout, seqmagick defaults to fasta format. This behavior may be overridden with the --input-format and --output-format flags.

If an extension is not listed, you can either rename the file to a supported extension, or specify it manually via --input-format or --output-format.
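
The lookup rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not seqmagick's actual code: the function name infer_format, its override parameter (standing in for --input-format / --output-format), and the use of "-" to mean stdin/stdout are all assumptions made for the example.

```python
import os

# Extension-to-format table copied from the list in this section.
EXTENSION_TO_FORMAT = {
    ".afa": "fasta", ".aln": "clustal", ".fa": "fasta", ".faa": "fasta",
    ".fas": "fasta", ".fasta": "fasta", ".fastq": "fastq", ".ffn": "fasta",
    ".fna": "fasta", ".frn": "fasta", ".gb": "genbank", ".gbk": "genbank",
    ".needle": "emboss", ".phy": "phylip", ".phylip": "phylip",
    ".phyx": "phylip-relaxed", ".qual": "qual", ".sff": "sff-trim",
    ".sth": "stockholm", ".sto": "stockholm",
}


def infer_format(path, override=None, default="fasta"):
    """Return the format name for path (hypothetical helper).

    override plays the role of an explicit --input-format/--output-format.
    Treating "-" as stdin/stdout is an assumption of this sketch.
    """
    if override is not None:
        return override          # an explicit format always wins
    if path == "-":
        return default           # streams default to fasta
    ext = os.path.splitext(path)[1]
    if ext not in EXTENSION_TO_FORMAT:
        raise ValueError(
            "unknown extension %r: rename the file or give the format explicitly" % ext
        )
    return EXTENSION_TO_FORMAT[ext]
```

For example, infer_format("b.phy") gives "phylip", while a file with an unlisted extension such as "reads.dat" needs an explicit format: infer_format("reads.dat", override="fastq").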

Additionally, most commands support gzip (files ending in .gz) and bzip2 (files ending in .bz2 or .bz) compressed inputs and outputs. The file type of a compressed file is inferred from the extension that remains after the compression extension is stripped, so input.fasta.gz is inferred to be in FASTA format.
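
As a rough sketch of that rule (again, not seqmagick's own implementation), the compression suffix can be peeled off first and the remaining extension used for format inference. The helper names split_compression, opener_for, and open_maybe_compressed are hypothetical; gzip.open and bz2.open are the standard-library functions.

```python
import bz2
import gzip
import os

# Compression extensions named in this section, mapped to stdlib openers.
COMPRESSION_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".bz": bz2.open}


def split_compression(path):
    """Return (path without compression suffix, compression suffix or None)."""
    base, ext = os.path.splitext(path)
    if ext in COMPRESSION_OPENERS:
        return base, ext
    return path, None


def opener_for(path):
    """Pick the function used to open path, falling back to built-in open."""
    _, compression = split_compression(path)
    return COMPRESSION_OPENERS.get(compression, open)


def open_maybe_compressed(path, mode="rt"):
    """Open path in text mode, transparently handling .gz/.bz2/.bz files."""
    return opener_for(path)(path, mode)


# The extension used for format inference is the one left after stripping.
base, compression = split_compression("input.fasta.gz")
inner_extension = os.path.splitext(base)[1]   # ".fasta"
```

Here base is "input.fasta" and compression is ".gz", so the format lookup sees the .fasta extension, matching the input.fasta.gz example above.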

Acknowledgements

seqmagick is written and maintained by the Matsen Group at the Fred Hutchinson Cancer Research Center.

Contributing

We welcome contributions! Simply fork the repository on GitHub and send a pull request.