FSA: Fast Statistical Alignment


See a movie of this alignment.


Introduction

FSA is a probabilistic multiple sequence alignment algorithm which uses a "distance-based" approach to aligning homologous protein, RNA or DNA sequences. Much as distance-based phylogenetic reconstruction methods like Neighbor-Joining build a phylogeny using only pairwise divergence estimates, FSA builds a multiple alignment using only pairwise estimations of homology. This is made possible by the sequence annealing technique for constructing a multiple alignment from pairwise comparisons, developed by Ariel Schwartz in "Posterior Decoding Methods for Optimization and Control of Multiple Alignments."

FSA brings the high accuracies previously available only for small-scale analyses of proteins or RNAs to large-scale problems such as aligning thousands of sequences or megabase-long sequences. FSA introduces several novel methods for constructing better alignments:

More information is available in the FAQ.


Download and Installation

FSA is an open-source project hosted by SourceForge. You can download the latest version from the SourceForge project page.

FSA is built and installed by running the following commands:

tar xvzf fsa-X.X.X.tar.gz
cd fsa-X.X.X
./configure
make
make install

(Replace fsa-X.X.X.tar.gz with the name of the file that you downloaded.) The FSA executables can then be found in your system's standard binary directory (e.g., /usr/local/bin). To install to other locations, see the FAQ. Alternatively, you may run FSA directly from the src/main subdirectory in which it is built; this does not require the make install step.

If you wish to align long sequences, then you must download and install MUMmer, which FSA calls to get candidate anchors between sequences. When running ./configure, either have the MUMmer executable in your path or specify the executable with the --with-mummer option to configure. See the included README and FAQ for more information.
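As a rough sketch of the second option, the helper below prints the extra argument to pass to ./configure when MUMmer is not in the default search path. The function name fsa_mummer_flag is hypothetical and not part of FSA. The --with-mummer=PATH form and the executable name "mummer" are assumptions based on the usual Autoconf convention, so check the included README for the exact syntax.

```shell
# Hypothetical helper (not part of FSA): print the extra ./configure argument
# that points the build at a MUMmer executable.
# ASSUMPTION: the option takes the Autoconf-style form --with-mummer=PATH and
# the executable is named "mummer"; the FSA README is the authority here.
fsa_mummer_flag() {
    # Use the explicit path given as $1, otherwise search the current PATH.
    mummer_path="${1:-$(command -v mummer 2>/dev/null || true)}"
    # Print the flag only if the candidate exists and is executable;
    # print nothing otherwise.
    if [ -n "$mummer_path" ] && [ -x "$mummer_path" ]; then
        printf '%s\n' "--with-mummer=$mummer_path"
    fi
}

# Example use from inside the unpacked fsa-X.X.X directory
# (the path below is a placeholder for wherever MUMmer is installed):
#   ./configure $(fsa_mummer_flag /path/to/mummer)
#   make
```

If the helper prints nothing, no MUMmer executable was found at the given location or in the path, and the installation should be checked before configuring FSA for long-sequence alignment.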

Please contact us if you have any build problems.


Webserver

You can submit alignment jobs to the FSA webserver. Be aware that the webserver may reject jobs containing more than 100 sequences because of computational limitations. To align larger sets of sequences, please download and install FSA and run the alignment on your own computer.


Applications

FSA can be used for all alignment problems, including:


Citation

Please cite:
Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter L (2009) Fast Statistical Alignment. PLoS Computational Biology. 5:e1000392.

The FSA manuscript can also be found in the doc/ directory of the FSA source code distribution.


Contact

Please contact us at fsa@math.berkeley.edu with any questions or feedback.


References

Please see:

[1] I. Holmes and R. Durbin. Dynamic Programming Alignment Accuracy. Journal of Computational Biology. 1998, 5 (3):493-504.

[2] G.A. Lunter. HMMoC - a Compiler for Hidden Markov Models. Bioinformatics. 2007, 23 (18):2485-2487.

[3] A.S. Schwartz. Posterior Decoding Methods for Optimization and Control of Multiple Alignments. Ph.D. Thesis, UC Berkeley. 2007.

[4] A.S. Schwartz and L. Pachter. Multiple Alignment by Sequence Annealing. Bioinformatics. 2007, 23 (2):e24-e29.

[5] N. Bray and L. Pachter. MAVID: Constrained Ancestral Alignment of Multiple Sequences. Genome Research. 2004, 14:693-699.

The SIV sequence data in the image and movie is from:

[6] B. D. Redelings and M. A. Suchard. Incorporating indel information into phylogeny estimation for rapidly emerging pathogens. BMC Evolutionary Biology. 2007, 7:40.


Acknowledgements

FSA was created by Robert Bradley. It was developed by Robert Bradley, Colin Dewey, Jaeyoung Do, Sudeep Juvekar, Lior Pachter, Adam Roberts, and Michael Smoot, along with assistance from many other people. All have made intellectual contributions and contributed code.

We give our heartfelt thanks to SourceForge for hosting this project.

