May 19, 2011

$10K bioinformatics on thousand-dollar genome

It is now possible to get 100x coverage of the exome sequence for a cancer sample (or any other type of human genomic sample) on one lane of an Illumina HiSeq machine. With the SureSelect 50 Mb exome kit, it still costs quite a bit more than one thousand dollars to get this data, but it is getting close. At maximum yield, it might already be possible to multiplex four samples into a single lane and still get 100x coverage of each; this will certainly be true once the planned upgrades to the HiSeq machine are available.
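The coverage arithmetic behind these claims is simple enough to sketch. The Python below is a rough estimate, not a vendor formula: the function names are hypothetical, it assumes single 100 bp reads, and the on-target fraction and lane yield are placeholder inputs you would replace with real numbers from your own capture and instrument. It ignores PCR duplicates and uneven capture, so the result is a lower bound.

```python
import math


def reads_needed(target_bp, mean_coverage, read_length_bp, on_target_fraction):
    """Rough number of reads needed to reach a mean coverage over a capture target.

    Ignores PCR duplicates and uneven capture, so treat the result as a lower bound.
    """
    on_target_bases = target_bp * mean_coverage          # bases that must land on the exome
    total_bases = on_target_bases / on_target_fraction   # off-target reads are wasted
    return math.ceil(total_bases / read_length_bp)


def samples_per_lane(lane_reads, reads_per_sample):
    """How many samples fit in one lane at the required depth."""
    return lane_reads // reads_per_sample


if __name__ == "__main__":
    # 50 Mb target, 100x, 100 bp reads; the 50% on-target rate is an assumption.
    per_sample = reads_needed(50_000_000, 100, 100, 0.5)
    print(per_sample)  # 100000000
    # Hypothetical lane yield, chosen only to illustrate the division.
    print(samples_per_lane(400_000_000, per_sample))  # 4
```

Doubling the on-target fraction halves the reads needed, which is why capture efficiency matters as much as raw lane yield when deciding how many samples to multiplex.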

Illumina provides some nice software (called CASAVA) that core labs and sequencing outsourcing companies typically run at the default settings. This software gives high-quality genome alignments and pretty good SNP calls, useful for many purposes. However, real-world research needs are often not satisfied by default automated bioinformatics analysis. Narrowing hundreds of thousands of SNP calls down to the few real disease-related mutations is difficult hands-on work for skilled bioinformaticians.

Today in my lab group, we are fighting with false negatives: SNPs that were present in the germline sample but not called, which leads to their false identification as mutations unique to the tumor. It looks like we will have to re-run the SNP detection software many times, making small changes to various parameters, to optimize the trade-off between specificity and sensitivity in each sample. Investigators may subcontract this type of work to the lab that does the sequencing, they may have skilled bioinformaticians in their own lab group, or they may hire bioinformatics consultants. In any case, $1K of sequence data may cost more than $10K to analyze.
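Why a germline false negative turns into a false tumor mutation can be sketched with a toy tumor-versus-normal comparison. This is a minimal illustration, not CASAVA's method or the exact workflow described above: the `Variant` type, the functions, the alt-read threshold, and the data are all hypothetical. Instead of re-running the caller with new parameters, it swaps in a simpler post-hoc filter on alternate-allele read counts in the normal sample, but it shows the same specificity vs. sensitivity trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    alt: str


def naive_somatic(tumor_calls, normal_calls):
    """Tumor calls absent from the normal call set.

    Any germline variant the caller missed in the normal sample leaks through
    here as a false 'tumor-only' mutation.
    """
    return tumor_calls - normal_calls


def filtered_somatic(tumor_calls, normal_calls, normal_alt_reads, max_normal_alt_reads):
    """Also reject candidates whose normal sample has more than
    max_normal_alt_reads reads supporting the alternate allele, called or not."""
    return {
        v for v in naive_somatic(tumor_calls, normal_calls)
        if normal_alt_reads.get(v, 0) <= max_normal_alt_reads
    }


def sweep(tumor_calls, normal_calls, normal_alt_reads, thresholds):
    """Number of candidate somatic mutations left at each threshold."""
    return {
        t: len(filtered_somatic(tumor_calls, normal_calls, normal_alt_reads, t))
        for t in thresholds
    }


if __name__ == "__main__":
    a = Variant("chr1", 100, "T")  # germline, called in both samples
    b = Variant("chr2", 200, "G")  # germline, missed in the normal (low coverage)
    c = Variant("chr3", 300, "A")  # truly tumor-only
    tumor, normal = {a, b, c}, {a}
    alt_reads = {a: 20, b: 3}      # toy read counts, not real data

    print(len(naive_somatic(tumor, normal)))           # 2
    print(sweep(tumor, normal, alt_reads, [0, 2, 3]))   # {0: 1, 2: 1, 3: 2}
```

A stricter threshold removes the missed germline variant, but it can also discard real somatic mutations, for example when tumor cells contaminate the normal sample. That is why the cutoff has to be tuned sample by sample rather than left at a default.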