May 8, 2009

Targeted Resequencing

Targeted Resequencing is one area of the DNA sequencing landscape that has not yet been revolutionized by Next-Gen technologies.

Targeted resequencing typically investigates a few genes (or a few dozen) across large populations. The largest portion of the effort involves lots of PCR to collect all the exons (or, in some projects, entire gene regions), then sequencing each amplification product while keeping track of which PCR product comes from which individual. Even a small project (10 genes with 10 exons each, on 100 individuals) means 10,000 PCR reactions and 10,000 sequencing reactions, all while keeping accurate track of 10,000 different DNA fragments and avoiding cross-contamination.

The Next-Gen approach would amplify the genomic regions in larger chunks, combine all of the chunks from one individual together, then run the library prep protocol (fragment, attach linkers, etc). So how does this play out in reality?

I read a paper in Genome Biology yesterday (Harismendy et al.) about targeted sequencing. They looked at six genes, which were covered by 28 large PCR amplicons (all exons plus some introns) ranging in size from 3 to 14 kb, for a total of 266 kb of genomic DNA. These PCR products were then combined and used in the sample prep protocols for 454, ABI SOLiD, and Illumina GA sequencing. The same genes were also sequenced by standard Sanger methods using 273 short PCR reactions (88 kb).

Overall, the NG seq methods showed a distinct bias favoring the ends of PCR products, and required very high coverage (34-fold, 110-fold and 101-fold for Roche 454, Illumina GA, and ABI SOLiD, respectively) to achieve a 10% false positive rate; false negative rates were much lower.

Let's talk about costs. Sanger sequencing costs $3-10 per reaction. I've got an Internet offer here for $4 per reaction, so let's use that for this study:

Sanger: $4 x 273 = $1092 per individual
Illumina: about $1000 per sample for sequencing plus about $300 per sample for the library prep kit = about $1300 per individual

So I think they are about the same.
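
The comparison above can be written out as a small calculation. This is only a sketch using the round numbers quoted in this post; the function and constant names are made up for illustration, and the Illumina figures are rough estimates rather than a price list.

```python
# Per-individual cost comparison, using the approximate prices quoted in the post.
SANGER_PRICE_PER_REACTION = 4.00  # USD, the Internet offer mentioned above
SANGER_REACTIONS = 273            # short PCR amplicons used for Sanger in Harismendy et al.
ILLUMINA_SEQUENCING = 1000.00     # USD per sample, approximate
ILLUMINA_LIBRARY_PREP = 300.00    # USD per sample, approximate


def sanger_cost(reactions=SANGER_REACTIONS, price=SANGER_PRICE_PER_REACTION):
    """Cost of sequencing every amplicon from one individual by Sanger."""
    return reactions * price


def illumina_cost(sequencing=ILLUMINA_SEQUENCING, library_prep=ILLUMINA_LIBRARY_PREP):
    """Cost of one unmultiplexed Illumina sample: sequencing plus library prep."""
    return sequencing + library_prep


if __name__ == "__main__":
    print(f"Sanger:   ${sanger_cost():.2f} per individual")
    print(f"Illumina: ${illumina_cost():.2f} per individual")
    print(f"Illumina / Sanger ratio: {illumina_cost() / sanger_cost():.2f}")
```

Without multiplexing the two come out within about 20% of each other, which is why the rest of the argument hinges on how many individuals can share a lane.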

However, the Next Gen methods come out far ahead if you multiplex a group of individuals together in the same sequencing reaction. This is not possible with Sanger methods, since the sequence is read from the average of a large number of molecules. The question then becomes how deeply you can multiplex while still producing enough reads from each individual research subject to achieve the depth of coverage needed. Our Illumina GAII currently produces about 2 million (usable) 35 bp reads per lane, but we are ramping up toward 5 million 50 bp reads with the latest upgrades.

2 M X 35 bp = 70 M bases
5 M X 50 bp = 250 M bases

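A 250 kb target at 100x coverage needs 250 kb X 100 = 25 M bp per individual, so one lane can hold about 2 individuals at the current yield (70 M / 25 M) and 10 at the upgraded yield (250 M / 25 M).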
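
Here is the same arithmetic as a short script, which makes it easy to plug in other read counts or coverage targets. It is a back-of-the-envelope sketch: the function names are invented for this example, and it assumes every usable read lands on the target and that reads are split evenly across individuals and across the target region, which the end-of-amplicon bias described above says is optimistic.

```python
def bases_per_lane(reads, read_length_bp):
    """Total usable bases produced by one lane."""
    return reads * read_length_bp


def required_bases(target_bp, coverage):
    """Bases needed to cover one individual's target at the given mean depth."""
    return target_bp * coverage


def individuals_per_lane(reads, read_length_bp, target_bp, coverage):
    """Whole number of individuals that fit in one lane (idealized, even split)."""
    return bases_per_lane(reads, read_length_bp) // required_bases(target_bp, coverage)


if __name__ == "__main__":
    target_bp = 250_000   # roughly the 266 kb of amplicons, rounded as in the post
    coverage = 100        # roughly what the Illumina GA needed in the paper

    for label, reads, length in [("current", 2_000_000, 35), ("upgraded", 5_000_000, 50)]:
        lane = bases_per_lane(reads, length)
        n = individuals_per_lane(reads, length, target_bp, coverage)
        print(f"{label}: {lane / 1e6:.0f} M bases per lane, {n} individuals at {coverage}x")
```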

So it looks like the current generation of NG machines does have a cost advantage over Sanger methods if you include 8x, 10x, or 12x multiplexing. Improved accuracy and reduced sampling bias (through better sample prep methods) could bring down the coverage requirements and increase the advantage of NG methods.
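
To see how the multiplexing level trades cost against depth, here is a rough sketch. It rests on assumptions that are mine and not established facts: that the roughly $1000 sequencing figure is the cost of one lane shared equally by everyone in the pool, that each individual still needs a separate $300 library, that barcoding adds no extra cost, and that the lane delivers the upgraded 250 M bases split evenly. All names are hypothetical.

```python
LANE_COST = 1000.00             # USD, assumed to be the cost of one shared lane
LIBRARY_PREP = 300.00           # USD per individual, one library each (assumption)
LANE_BASES = 5_000_000 * 50     # upgraded yield: 5 M reads x 50 bp
TARGET_BP = 250_000             # target size per individual
SANGER_COST = 4.00 * 273        # USD per individual, from the comparison above


def cost_per_individual(plex):
    """Share of the lane cost plus one library prep per individual."""
    return LANE_COST / plex + LIBRARY_PREP


def coverage_per_individual(plex):
    """Mean depth per individual if the lane's bases are split evenly."""
    return LANE_BASES / plex / TARGET_BP


if __name__ == "__main__":
    print(f"Sanger: ${SANGER_COST:.2f} per individual")
    for plex in (1, 8, 10, 12):
        print(
            f"{plex:>2}-plex: ${cost_per_individual(plex):7.2f} per individual, "
            f"about {coverage_per_individual(plex):.0f}x coverage"
        )
```

Under these assumptions the library prep quickly becomes the dominant cost, and 12-plex falls to roughly 83x, below the roughly 100x the paper needed for Illumina, so 10-plex is about the limit unless coverage requirements come down.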

I'd really like to hear some other opinions about this issue. We are writing several grant proposals for projects like these and I need some convincing arguments.



Keith Robison said...

I wonder if the trouble with overrepresentation of the PCR product ends is just showing that these very short fragments don't shear well. I think in the Broad group's sequence capture paper (using Agilent chips) at one phase they concatenated the fragments with intervening linkers and then fragmented that product.

You might also want to read the article in The Scientist on capture methods; the last two described seem to be appropriate for the level of multiplexing described in this paper, which is perhaps where you are looking for something.

The DNAcowboy said...

Could you explain, for someone like me who doesn't have an NGS instrument yet, how you arrive at the $1000 cost you are mentioning?

My guess is that you'll need to order more than $1000 of reagents from Illumina (or the others) to run the experiment you are describing.

Also, what level of multiplexing is commercially available from Illumina? I heard AB has a 20-barcode kit available.

Thanks for your very interesting blog.

Marco said...

We (at BMR GENOMICS - ITALY) are trying to set up a targeted resequencing service covering the 10 most interesting (in terms of how widespread they are in the population) autosomal recessive genetic diseases.
We have a 454 and will probably use target enrichment systems such as DNA microarrays.
CombiMatrix's arrays are under test right now.

P.S. The $1000 total cost is not far from the true cost once reagents, labour, and instruments are added. We have ~70-80 kb of target to sequence.

Anonymous said...

How do you account for the cost of the 12x libraries in your per-run equation? I understand that sequencing reagent costs go down, but you still have to make 12 libraries. Naturally, if you are only planning on running 12 samples then that's one thing. But with multiplexing, wouldn't you find ways to perform more runs?

Anonymous said...

Capture technology is NOT what it is cracked up to be, and I don't think even the big 3 US centers know why. But there are incredible problems with the inability to cover the target comprehensively using capture since, well, the sequence is only as good as the capture, and capture is extremely inefficient.

bioinformatics training said...

Yes, I completely agree that the NG seq methods showed distinct bias favoring the ends of PCR products, and required very high coverage (34-fold, 110-fold and 101-fold for Roche 454, Illumina GA, and ABI SOLiD, respectively) to achieve a 10% false positive rate - false negative rates were much lower.