Oct 19, 2008


Software for basic next-gen sequencing operations.
Each commercial vendor has its own proprietary software, so this list emphasizes open-source tools.

A great page about Next-Gen software on SeqAnswers

Image Processing


PyroBayes: alternative base calling for the 454 sequencer with improved quality scores. Developed by the Marth lab at Boston College.

Open source primary data analysis for next-gen DNA sequencers

Alignment to a Reference Sequence

Maq (Mapping and Assembly with Quality)

Mosaik: gapped alignments to a reference genome, another tool from the Marth lab at Boston College

Novoalign from Novocraft in Kuala Lumpur, Malaysia

SOAP (Short Oligonucleotide Alignment Program)
GNU General Public License
From the Bioinformatics Department of the Beijing Genomics Institute

RMAP: maps sequence reads to genomic locations, allowing a variable number of mismatches and an overall quality cutoff score. By Andrew D. Smith and Zhenyu Xuan in the Zhang lab at Cold Spring Harbor.
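
To make the idea of mismatch-tolerant mapping with a quality cutoff concrete, here is a minimal Python sketch. It is a brute-force scan, not the algorithm any of the tools above actually use, and the function name, the parameters, and the choice of mean base quality as the cutoff are all assumptions made for illustration.

```python
def map_read(read, quals, reference, max_mismatches=2, min_mean_quality=20):
    """Return every start position in `reference` where `read` aligns
    with at most `max_mismatches` substitutions (no gaps).

    Hypothetical helper: reads whose mean base quality falls below
    `min_mean_quality` are skipped entirely (an assumed filter).
    """
    if not read or sum(quals) / len(quals) < min_mean_quality:
        return []
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        for offset, base in enumerate(read):
            if reference[start + offset] != base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches at this position
        else:
            hits.append(start)  # inner loop finished without breaking
    return hits


reference = "ACGTACGTTTGACCA"
exact_hits = map_read("ACGT", [30] * 4, reference, max_mismatches=0)
fuzzy_hits = map_read("ACGA", [30] * 4, reference, max_mismatches=1)
```

Real mappers index the reference or the reads so they never scan every position, which is what makes them practical for millions of reads.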

ZOOM (Zillions Of Oligos Mapped)
Product of the Michael Zhang lab at Cold Spring Harbor
"Zoom is freely available to non-commercial users at http://bioinfor.com/zoom"

de novo Assembly

Edena  (Exact de novo Assembler)
De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.
D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel.

Velvet (GPL, 64-bit Linux)
Velvet: algorithms for de novo short read assembly using de Bruijn graphs.
D.R. Zerbino and E. Birney. Genome Research 18:821-829.
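
For readers new to de Bruijn graphs, here is a rough Python sketch of the basic construction: each k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. It is a toy illustration only; the function name is made up, and it ignores reverse complements, coverage, and error correction, all of which a real assembler has to handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    found in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


graph = build_de_bruijn(["ACGT", "CGTA"], k=3)
edges = {node: sorted(targets) for node, targets in graph.items()}
```

Overlapping reads share k-mers, so they fall onto the same path in the graph, and contigs can be read off by walking unbranched paths.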

SHARCGS (SHort read Assembler based on Robust Contig extension for Genome Sequencing)
Dohm JC, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing.

ALLPATHS: De novo assembly of whole-genome shotgun microreads
Jonathan Butler, Iain MacCallum, Michael Kleber, Ilya A. Shlyakhter, Matthew K. Belmonte, Eric S. Lander, Chad Nusbaum, and David B. Jaffe
Genome Res. 2008 18: 810-820.

SSAKE (GNU General Public License, written in Perl)
The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree.
René L Warren, Granger G Sutton, Steven JM Jones, Robert A Holt. 2007.
Assembling millions of short DNA sequences using SSAKE. 
Bioinformatics. 23:500-501.
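
The greedy 3' extension idea described above can be sketched in a few lines of Python. This is a simplified illustration, not SSAKE's code: a linear scan over the unused reads stands in for the prefix tree lookup, and the function name and the `min_overlap` parameter are assumptions.

```python
def greedy_extend(seed, reads, min_overlap=3):
    """Extend `seed` at its 3' end by repeatedly appending the unused read
    whose prefix exactly matches the longest suffix of the contig."""
    contig = seed
    unused = list(reads)
    while True:
        best = None  # (overlap length, read)
        for read in unused:
            # Cap the overlap so every accepted read adds at least one base.
            max_len = min(len(contig), len(read) - 1)
            for overlap in range(max_len, min_overlap - 1, -1):
                if contig.endswith(read[:overlap]):
                    if best is None or overlap > best[0]:
                        best = (overlap, read)
                    break  # longest overlap for this read found
        if best is None:
            return contig  # no read overlaps the 3' end any more
        overlap, read = best
        contig += read[overlap:]
        unused.remove(read)


contig = greedy_extend("ACGTAC", ["GTACGG", "CGGTTA"], min_overlap=3)
```

Each read is used at most once, so the loop always terminates. A prefix tree makes the "which read starts with this k-mer" lookup fast enough for millions of reads.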

SNP Discovery
Another from the Marth Lab

Genome Viewers

Another tool from the Marth Lab

IGB (Integrated Genome Browser) from Affymetrix & GenoViz
Some IGB tips from the Huntsman Cancer Institute at the University of Utah



Uday Deshpande said...

Kindly post a news on your blog that Labindia GPOD is organizing a Workshop on Next-Gen Sequencing Data Analysis. More Information is available at http://labindia-gpod.blogspot.com
