pplacer_demo.sh |
|
---|---|
This is a demonstration of the pplacer suite of programs. It covers placement, visualization, classification, and comparison. If you are looking at this file in a web browser after processing with shocco, the left column describes what is going on in the right column. It is assumed that java is available and that you have installed the pplacer suite (pplacer, guppy, and rppr). Download the tutorial files before running the script. |
#!/bin/bash -eu
|
Getting set up (for this demo) |
|
We start with a couple of little functions to make this script run smoothly. You can safely ignore them. |
|
We have a little script function, aptx, to run the Archaeopteryx tree viewer (bin/forester.jar) from the command line. |
aptx() {
java -jar bin/forester.jar -c bin/_aptx_configuration_file "$1"
}
|
A little pause function to pause between steps. |
pause() {
echo "Please press return to continue..."
read
}
|
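When the demo is run unattended (for example with its output redirected to a log), waiting for the return key is unwelcome. As a minimal sketch, a variant of pause can skip the prompt when standard input is not a terminal. The name pause_if_interactive is hypothetical and not part of the original script. |
```shell
# Hypothetical variant of pause(): only wait for the return key
# when standard input is a terminal.
pause_if_interactive() {
    # [ -t 0 ] is true when file descriptor 0 (stdin) is a terminal.
    [ -t 0 ] || return 0
    echo "Please press return to continue..."
    read -r _unused
}
```
|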
Make sure that guppy can be found. |
which guppy > /dev/null 2>&1 || {
echo "Couldn't find guppy. \
There is a download script in the bin directory for you to use."
exit 1
}
|
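The same kind of check extends naturally to the other programs this script calls. The sketch below reports every missing command at once instead of stopping at the first. missing_tools is a hypothetical helper name, and it uses the shell builtin command -v in place of which. |
```shell
# Hypothetical helper: print each named command that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Example call, listing whichever of the demo's programs cannot be found:
#   missing_tools pplacer guppy rppr java
```
|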
Echo the commands to the terminal. |
set -o verbose
|
Phylogenetic placement |
|
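This makes p4z1r36.jplace, which is a "place" file in JSON format. Place files
contain information about collections of phylogenetic placements on a tree.
You may notice that one of the arguments to this command is
vaginal_16s.refpkg, which is a "reference package": an organized collection of
files that includes the reference tree, the reference alignment, and taxonomic
information. |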
pplacer -c vaginal_16s.refpkg src/p4z1r36.fasta
pause
|
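As this demo shows, placing src/p4z1r36.fasta produces p4z1r36.jplace in the current directory: the output takes the input's base name with the extension changed to .jplace. Below is a minimal sketch of that naming rule, which is handy when scripting placement over several files. jplace_for is a hypothetical helper, not part of the pplacer suite, and the loop assumes additional FASTA files in src/. |
```shell
# Hypothetical helper: name of the place file that corresponds to a
# FASTA input, following the naming seen in this demo.
jplace_for() {
    echo "$(basename "$1" .fasta).jplace"
}

# Example: place every FASTA file in src/ that has no place file yet.
# The commands are only echoed; drop the "echo" to actually run pplacer.
for fasta in src/*.fasta; do
    [ -e "$fasta" ] || continue                   # no matches: skip the literal pattern
    [ -e "$(jplace_for "$fasta")" ] && continue   # already placed
    echo pplacer -c vaginal_16s.refpkg "$fasta"
done
```
|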
Grand Unified Phylogenetic Placement Yanalyzer (guppy) |
|
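guppy is our tool for working with phylogenetic placements. It has many subcommands, which you can list with the --cmds option. |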
guppy --cmds
pause
|
These subcommands are used by writing out the name of the subcommand after guppy, like guppy SUBCOMMAND [options] placefile[s]. |
|
For example, we can get help for the fat subcommand. |
guppy fat --help
pause
|
Visualization |
|
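Now run guppy fat to make a phyloXML "fat tree" visualization, in which branch widths reflect the number of placements, and run aptx to look at it. Note that guppy fat can be run without the reference package argument (-c),
but in that case there won't be any taxonomic information in the visualizations. An online version of this visualization is also available. |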
guppy fat -c vaginal_16s.refpkg p4z1r36.jplace
aptx p4z1r36.xml &
|
Statistical comparison |
|
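guppy kr calculates the Kantorovich-Rubinstein (KR) distance between each pair of samples. It takes in a collection of place files and prints the distances. |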
guppy kr src/*.jplace
pause
|
The KR metric can be thought of as the amount of work it takes to move the
distribution of reads from one collection of samples to another along the
edges of the tree. This can be visualized by thickening the branches of the
tree in proportion to the number of reads transported along that branch. To
get such a visualization, we use guppy's kr_heat subcommand. |
guppy kr_heat -c vaginal_16s.refpkg/ src/p1z1r2.jplace src/p1z1r34.jplace
aptx p1z1r2.p1z1r34.heat.xml &
|
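To make the "work" interpretation concrete: on a tree, moving mass from one distribution to another costs, for each edge, the edge length times the absolute difference between the mass the two distributions carry below that edge, and the first-order KR distance is the sum of these costs over all edges. The sketch below computes this for a made-up three-leaf star tree. The numbers are hypothetical, kr_toy is a hypothetical name, and this illustrates the idea rather than guppy's implementation. |
```shell
# Toy star tree with leaves A, B, C hanging off the root.
# Each input line describes one edge:
#   edge_length  mass_of_sample_1_below_edge  mass_of_sample_2_below_edge
# Sample 1 has mass 0.5 at A and 0.5 at B; sample 2 has mass 1.0 at B.
kr_toy() {
    awk '{ d = $2 - $3; if (d < 0) d = -d; total += $1 * d }
         END { print total + 0 }'
}

printf '%s\n' \
    "1.0 0.5 0.0" \
    "2.0 0.5 1.0" \
    "0.5 0.0 0.0" | kr_toy
# Moving 0.5 of mass up edge A (length 1.0) and down edge B (length 2.0)
# costs 0.5 * 1.0 + 0.5 * 2.0, so this prints 1.5
```
|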
Phylogenetic placement data has a special structure, and we have developed variants of classical ordination and clustering techniques, called "edge principal components analysis" and "squash clustering", which leverage this structure. You can read more about these methods in our paper. |
|
Edge principal components analysis |
|
With edge principal components analysis (edge PCA), it is possible to
visualize the principal component axes, and find differences between
samples which may only differ in terms of read distributions on closely
related taxa. |
guppy pca --prefix pca_out -c vaginal_16s.refpkg src/*.jplace
aptx pca_out.xml &
|
The pca_out.trans file has the samples projected onto the principal component axes. |
cat pca_out.trans
|
Squash clustering |
|
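guppy squash performs squash clustering. With the --out-dir option, it writes its results, including the cluster tree cluster.tre, to the given directory. |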
rm -rf squash_out; mkdir squash_out
guppy squash -c vaginal_16s.refpkg --out-dir squash_out src/*.jplace
aptx squash_out/cluster.tre &
|
We can look at 6.phy.fat.xml, the mass tree for internal node number 6 of the cluster tree. |
aptx squash_out/mass_trees/6.phy.fat.xml &
|
Classification |
|
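Next we run guppy's classify subcommand to classify the reads. We use head here to show only the first 30 lines of output. |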
guppy classify --mrca-class -c vaginal_16s.refpkg p4z1r36.jplace | head -n 30
pause
|
We can quickly explore the classification results via SQL by importing them into a SQLite3 database. We exit if SQLite3 is not available, and remove any existing example.db in case the script is being run a second time. |
which sqlite3 > /dev/null 2>&1 || {
echo "No sqlite3, so stopping here."
exit 0
}
rm -f example.db
|
Create a table containing the taxonomic names. |
rppr prep_db -c vaginal_16s.refpkg --sqlite example.db
|
Explore the taxonomic table itself, without reference to placements. |
sqlite3 -header -column example.db "SELECT tax_name FROM taxa WHERE rank = 'phylum'"
pause
|
guppy classify can also write its results directly into the SQLite database. |
guppy classify --mrca-class --sqlite example.db -c vaginal_16s.refpkg src/*.jplace
|
Now we can investigate placement classifications using SQL queries. Here we ask for the lineage of a specific sequence. |
sqlite3 -header example.db "
SELECT pc.rank,
tax_name,
likelihood
FROM placement_names AS pn
JOIN placement_classifications AS pc USING (placement_id)
JOIN taxa USING (tax_id)
JOIN ranks USING (rank)
WHERE pc.rank = desired_rank
AND pn.name = 'FUM0LCO01DX37Q'
ORDER BY rank_order
"
pause
|
Here is another example, with somewhat less confidence in the species-level classification result. |
sqlite3 -header example.db "
SELECT pc.rank,
tax_name,
likelihood
FROM placement_names AS pn
JOIN placement_classifications AS pc USING (placement_id)
JOIN taxa USING (tax_id)
JOIN ranks USING (rank)
WHERE pc.rank = desired_rank
AND pn.name = 'FUM0LCO01A2HOA'
ORDER BY rank_order
"
pause
|
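The two queries above differ only in the read name, so the query text can be generated by a small function. This is a sketch: lineage_query is a hypothetical helper, and it assumes read names are plain identifiers (it refuses names containing a single quote rather than attempting to escape them). |
```shell
# Hypothetical helper: print the lineage query used above for one read name.
lineage_query() {
    case $1 in
        *"'"*) echo "lineage_query: invalid read name" >&2; return 1 ;;
    esac
    cat <<EOF
SELECT pc.rank,
       tax_name,
       likelihood
  FROM placement_names AS pn
       JOIN placement_classifications AS pc USING (placement_id)
       JOIN taxa USING (tax_id)
       JOIN ranks USING (rank)
 WHERE pc.rank = desired_rank
   AND pn.name = '$1'
 ORDER BY rank_order
EOF
}

# Usage, once example.db has been populated as above:
#   sqlite3 -header example.db "$(lineage_query FUM0LCO01DX37Q)"
```
|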
That's it for the demo. For further information, please consult the pplacer documentation. |
echo "Thanks!"
|
| |