Oct 28, 2008

Gene-Boosted Assembly

Steven Salzberg describes a method for de novo assembly of a bacterial genome (Pseudomonas aeruginosa strain PAb1 = 6.2 MB) from a set of 33 bp Solexa fragments, using two closely related strains as reference sequences, and "boosting" assembly using predicted protein coding regions.

Salzberg SL, Sommer DD, Puiu D, Lee VT (2008) Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads. PLoS Comput Biol 4(9): e1000186. doi:10.1371/journal.pcbi.1000186

The AMOS assembler used in this project employs several different software modules and a considerable amount of hands-on effort. 

AMOScmp is a comparative alignment tool - it aligns short reads to a similar reference genome, and then builds contigs. This avoids the challenge of all-vs-all assembly for de novo genome sequencing projects. 

Minimus is a highly stringent assembler that uses Smith-Waterman alignments to identify overlaps between reads.

Contigs were then scanned for protein coding sequences using a combination of Glimmer and BLAST. The ABBA program uses protein coding information - especially at the ends of contings and singletons to close gaps.

Velvet was also used to independently assemble all the reads into contigs, them MUMMer was used to combine contigs and fill gaps. 


This method is not going to work for every de novo sequencing problem, but we are going to try something similar for some new Plasmodium and Trichomonas species. 

All software from the Salzberg lab at the Univ. of Maryland is freely available here:

and a page describing the Short Read Assembly methods here:

No comments: