Yesterday I attended an excellent symposium on genomic structural variation organized by the Simons Foundation. The unifying theme across all of the speakers was the use of Pacific Biosciences long-read technology to resolve large-scale duplicated sequences in the human genome. These long PacBio reads (5-10 kb) can be assembled across genetic regions with complex patterns of repeat structures, segmental duplications, inversions and deletions.
For me, the highlight of the afternoon was a talk by Evan Eichler from the University of Washington. Dr. Eichler presented both detailed sequencing data from specific loci and a grand overview of structural variation that synthesizes copy number variation, multi-gene families, the biology of autism and human evolution. His first point was that the reference genome is missing substantial sections of duplicated DNA, which vary significantly from person to person. Assembly software tends to collapse multiple, nearly identical paralogous gene copies into one locus. Dr. Eichler’s group has constructed more accurate sequences for regions with these complex patterns of segmental duplication using long PacBio reads. He has identified paralogous copies of genes, which actually exist as multi-gene families, and then created specific tags to track the copy number of various gene isoforms in different human genomes (such as those from the 1000 Genomes Project). For example, the SRGAP2 locus has 4 isoforms, each of which may be repeated several times in the genomes of some people.
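To make the idea of copy-specific tags a little more concrete, here is a minimal Python sketch. It is not Dr. Eichler's method, which the talk summary above does not spell out. It simply assumes that a "tag" is a short sequence (a k-mer) found in exactly one paralog, and that counting tag matches in sequencing reads gives a rough signal of how many copies of each paralog a genome carries. The function names and the toy sequences are made up for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def paralog_specific_tags(paralogs, k):
    """Map each paralog name to the k-mers that occur in that paralog only."""
    kmer_sets = {name: kmers(seq, k) for name, seq in paralogs.items()}
    tags = {}
    for name, own in kmer_sets.items():
        # Pool the k-mers of every other paralog, then subtract them.
        others = set().union(*(s for n, s in kmer_sets.items() if n != name))
        tags[name] = own - others
    return tags


def tag_hits(reads, tags, k):
    """Count, per paralog, how many k-mers in the reads match one of its tags."""
    owner = {kmer: name for name, tagset in tags.items() for kmer in tagset}
    hits = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            name = owner.get(read[i:i + k])
            if name is not None:
                hits[name] += 1
    return hits


# Toy example (hypothetical sequences): two paralogs that differ at one base.
paralogs = {
    "copy_a": "ACGTACGGTCAATG",
    "copy_b": "ACGTACGATCAATG",
}
k = 5
tags = paralog_specific_tags(paralogs, k)

# Simulated reads from a genome carrying copy_a twice and copy_b once.
reads = ["GTACGGTCAA", "GTACGGTCAA", "GTACGATCAA"]
hits = tag_hits(reads, tags, k)
print(dict(hits))
```

In this toy case the tag hits for copy_a come out at twice those for copy_b, mirroring the 2:1 copy ratio in the simulated reads. A real analysis would also need to handle reverse-complement strands and sequencing errors, and would normalize counts by sequencing depth.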
Second, he explained that these regions of frequent copy number variation (CNV) are often the sites of deletions in the genomes of people with autism. These deletions and duplications may be quite large and typically include dozens of other genes besides the family of paralogs. In fact, the genome has hotspots of CNVs that are flanked by high-identity duplicated regions. Moreover, some people carry extra duplications at these hotspots, which create a predisposition for deletion or expansion events in their progeny.
Why do these deletions and duplications cause autism? Dr. Eichler suggested that brain development is a process that involves many genes, and that it is particularly sensitive to gene dosage.
Dr. Eichler proposed a link to human evolution that is quite tantalizing. Many of the families of duplicated genes at the CNV hotspots are involved in brain development. These same genes are not duplicated in apes. A process of gene duplication and sequence variation allows for positive selection for new brain development phenotypes. So the gene duplication process that created expanded and more complex human brains may also make us susceptible to neurologically damaging CNV mutations.