Yesterday I attended an excellent symposium on genomic structural variation
organized by the Simons Foundation. The unifying theme across all of the
speakers was the use of Pacific Biosciences long-read technology to resolve
large-scale duplicated sequences in the human genome. These long PacBio reads
(5-10 kb) can be assembled across genomic regions with complex patterns of
repeat structures, segmental duplications, inversions and deletions.
For me, the highlight of the afternoon was a talk by Evan Eichler from the
University of Washington. Dr.
Eichler presented both detailed sequencing data from specific loci and a grand
overview of structural variation that synthesizes copy number variation,
multi-gene families, the biology of autism and human evolution. His first point
was that the reference genome is missing substantial sections of duplicated
DNA, which varies significantly from person to person. Assembly software
tends to collapse multiple, nearly identical paralogous gene copies into one
locus. Dr. Eichler’s group has constructed more accurate sequences for regions
with these complex patterns of segmental duplication using long PacBio reads.
He has identified paralogous copies of genes that actually exist as multi-gene
families, and then created specific sequence tags to track the copy number of
each paralog in different human genomes (such as those from the 1000 Genomes
Project). For example, the SRGAP2 locus has four paralogs, each of which may be
repeated several times in the genomes of some people.
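
To make the tagging idea concrete, here is a minimal Python sketch of one way such tags could work. It assumes a tag is a short k-mer that occurs in exactly one paralog, and that counting reads containing each tag gives a rough per-paralog depth. This is an illustration only, not Dr. Eichler's actual pipeline: the sequences, the names copy_A and copy_B, the k-mer length and the read set are all invented toy values.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def paralog_specific_kmers(paralogs, k):
    """Map each paralog name to the k-mers that occur in that paralog only."""
    owners = {}
    for name, seq in paralogs.items():
        for kmer in set(kmers(seq, k)):
            owners.setdefault(kmer, set()).add(name)
    tags = {name: set() for name in paralogs}
    for kmer, names in owners.items():
        if len(names) == 1:  # seen in a single paralog, so usable as a tag
            tags[next(iter(names))].add(kmer)
    return tags


def tag_depth(reads, tags, k):
    """Average number of occurrences per tag k-mer across reads, per paralog."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return {
        name: (sum(counts[t] for t in tag_set) / len(tag_set) if tag_set else 0.0)
        for name, tag_set in tags.items()
    }


# Toy data (hypothetical): two paralogs that differ at a single base.
paralogs = {
    "copy_A": "ACGTACGGTCAATGCCTAGG",
    "copy_B": "ACGTACGGTCTATGCCTAGG",
}
K = 5  # illustrative k-mer length; real tags would be much longer
tags = paralog_specific_kmers(paralogs, K)

# Pretend this genome yields 2 reads from copy_A and 6 reads from copy_B.
reads = [paralogs["copy_A"]] * 2 + [paralogs["copy_B"]] * 6
depth = tag_depth(reads, tags, K)
print(depth)
```

In a real analysis the per-paralog depth would still need to be normalized against a region of known copy number before it could be read as a copy-number estimate.
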
Second, he explained that these regions of frequent copy number variation are
often the sites of deletions in the genomes of people with autism. These
deletions and duplications may be quite large and typically include dozens of
other genes besides the family of paralogs. In fact, the genome has hotspots of
copy number variants (CNVs)
that are flanked by high-identity duplicated regions. In addition, some people
may have additional duplications at hotspots, which create a predisposition for
deletion or expansion events in their progeny.
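
The notion of a hotspot flanked by high-identity duplications can be sketched in a few lines of Python. The sketch assumes a list of pairwise segmental duplication alignments is already available, and it flags the interval between two copies on the same chromosome when the copies are very similar and a moderate distance apart. The record layout, the thresholds (95% identity, 50 kb to 10 Mb spacing) and the coordinates are all illustrative assumptions, not figures from the talk.

```python
from typing import NamedTuple


class DupPair(NamedTuple):
    """One pairwise alignment between two duplicated segments (hypothetical layout)."""
    chrom: str
    start1: int
    end1: int
    start2: int
    end2: int
    identity: float  # fraction of identical bases, 0.0 to 1.0


def candidate_hotspots(pairs, min_identity=0.95,
                       min_gap=50_000, max_gap=10_000_000):
    """Return (chrom, start, end) intervals flanked by high-identity duplications."""
    hotspots = []
    for p in pairs:
        # Order the two copies by position so the gap is measured left to right.
        (s1, e1), (s2, e2) = sorted([(p.start1, p.end1), (p.start2, p.end2)])
        gap = s2 - e1
        if p.identity >= min_identity and min_gap <= gap <= max_gap:
            hotspots.append((p.chrom, e1, s2))
    return hotspots


# Made-up coordinates for illustration only.
pairs = [
    DupPair("chr1", 1_000_000, 1_050_000, 2_500_000, 2_550_000, 0.99),
    DupPair("chr1", 5_000_000, 5_010_000, 5_020_000, 5_030_000, 0.99),  # too close together
    DupPair("chr2", 100_000, 200_000, 3_000_000, 3_100_000, 0.90),      # identity too low
]
hotspots = candidate_hotspots(pairs)
print(hotspots)
```

Only the first pair passes both filters, so the interval between its two copies is the single candidate reported.
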
Why do these deletions and duplications cause autism? Dr. Eichler suggested
that brain development is a process that involves many genes, and it is
particularly sensitive to gene dosage.
Dr. Eichler proposed a link to human evolution that is quite tantalizing. Many
of the families of duplicated genes at the CNV hotspots are involved in brain
development. These same genes are not duplicated in other apes. A process of
gene duplication and
sequence variation allows for positive selection for new brain development
phenotypes. So the gene duplication process that created expanded and more
complex human brains may also make us susceptible to neurologically damaging
CNV mutations.