I learned something interesting today about the SNP arrays used for GWAS. There has been a lot of discussion about the nature of the mutations/alleles discovered by GWAS in terms of the "common disease, common variant" hypothesis. It is clear that SNP arrays are designed to cover common variants: alleles that are present in at least 2% of the human population (or at least of some population). Conversely, genome sequencing studies tend to focus on rare variants. In fact, a number of recent studies show that major diseases such as cancer and autism tend to be associated with novel, very severe mutations in the coding regions of genes.
Now this is the interesting part. We took a look at the intersection between the Illumina 2.5M SNP array and the regions targeted by the Agilent SureSelect exon enrichment kit. It turns out that only about 90K of the Illumina SNPs (roughly 3.6% of the array) fall in the exon regions. This matches up with Illumina's own annotation file, which shows that more than 80% of the SNPs on the array are intronic or intergenic. My human genetics colleague suggests that the SNP array targets sequence variants (alleles) with small effects, while the exon sequencing strategy targets mutations with large effects. So we can't really replace the SNP array with exome sequencing, because the two are looking at completely different things.
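For anyone who wants to try a similar comparison, here is a minimal sketch of how an array-versus-capture intersection could be counted. It is not the code we actually ran. It assumes the capture targets are available as BED-style intervals (0-based, half-open) and the array SNPs as 1-based chromosome/position pairs; the function names and the toy data are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(intervals):
    """Group BED-style (chrom, start, end) intervals by chromosome,
    merge overlapping ones, and return sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_targets(index, chrom, pos):
    """True if a 1-based SNP position falls inside any target interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    p = pos - 1  # convert to the 0-based coordinate used by BED
    i = bisect_right(starts, p) - 1  # last interval starting at or before p
    return i >= 0 and p < ends[i]


def count_overlap(snps, intervals):
    """Count SNPs (chrom, 1-based pos) that land in the target intervals."""
    index = build_index(intervals)
    return sum(1 for chrom, pos in snps if in_targets(index, chrom, pos))


# Hypothetical toy data, not real array or capture coordinates.
toy_targets = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 0, 50)]
toy_snps = [("chr1", 101), ("chr1", 100), ("chr1", 250),
            ("chr1", 251), ("chr2", 50), ("chr3", 10)]

hits = count_overlap(toy_snps, toy_targets)
print(f"{hits} of {len(toy_snps)} SNPs fall in target regions")
```

Merging the intervals first means a single binary search per SNP is enough, which keeps the count cheap even with millions of array positions. For real files, an established tool such as bedtools would be the more usual choice.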