So what I would like to do is
focus on
where I think the field is heading
with respect to copy number
and genome structure variation
and highlight some of the limitations
and excitement in the field
with respect to future discovery.
So there are a number of unanswered questions.
I think the most important is
how we're going to accurately
genotype copy number variation,
particularly within these
difficult duplicated regions
and there is a need to develop tools
specifically for that purpose.
I and other labs really have argued
that sequence resolution
is the key for understanding
the genetic basis of disease.
I will highlight an example of this.
And finally, I want to share with you
some of the excitement
really with respect
to more complete ascertainment
in terms of discovery of smaller variation
using next-generation sequencing technology,
and the potential promise
of developing personalized CNV maps
for individual genomes.
This slide actually highlights
one of the limitations
of many of the commercial SNP microarrays
that are commonly used
to detect copy number variation.
So what I'm showing you
are two histograms,
for various versions of the Illumina and Affymetrix
SNP platforms,
of the number of probes
that are placed within sites
that we've actually sequenced
and confirmed at the base pair level
as copy number polymorphic.
Most groups would agree that
you need at least two probes,
preferably more than two,
to accurately genotype
copy number variation
once you know where it exists.
So these two histograms
actually show you
that there is in fact a deficit of probes
in regions of copy number polymorphism
owing largely to the fact that they map
to complex regions of the genome
where there have been fewer probes
laid down than average.
So typically, in these roughly
500 regions that we've sequenced,
only about 60% of the sites
could in principle be genotyped,
and in fact, the actual number
is much less than that.
If you then look at,
for example, new insertion sequences
or duplication intervals
as opposed to deletion intervals,
the number in fact gets considerably worse.
In those cases, fewer than 20% of the existing
or known sequence-resolved duplications
and new insertion sequences
could be adequately resolved
using these types of platforms.
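
As a rough illustration of the probe-counting argument above, the following Python sketch counts how many array probes fall inside each sequence-resolved CNV interval and reports the fraction of intervals that meet a minimum probe count. The coordinates, the function names count_probes and genotypable_fraction, and the default threshold of two probes (taken from the "at least two probes" rule of thumb) are illustrative assumptions, not the data or the method behind the histograms.

```python
from bisect import bisect_left, bisect_right


def count_probes(probe_positions, start, end):
    """Count probes with position in [start, end].

    probe_positions must be a sorted list of integer coordinates
    on a single chromosome.
    """
    return bisect_right(probe_positions, end) - bisect_left(probe_positions, start)


def genotypable_fraction(sites, probes_by_chrom, min_probes=2):
    """Fraction of CNV sites covered by at least min_probes probes.

    sites: list of (chrom, start, end) tuples (hypothetical intervals).
    probes_by_chrom: dict mapping chrom -> sorted list of probe positions.
    min_probes: assumed threshold for a site to count as genotypable.
    """
    if not sites:
        return 0.0
    covered = 0
    for chrom, start, end in sites:
        positions = probes_by_chrom.get(chrom, [])
        if count_probes(positions, start, end) >= min_probes:
            covered += 1
    return covered / len(sites)


# Made-up example data: three CNV intervals and a handful of probes.
example_sites = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
example_probes = {"chr1": [120, 150, 550], "chr2": []}

if __name__ == "__main__":
    frac = genotypable_fraction(example_sites, example_probes)
    print(f"Sites with at least 2 probes: {frac:.0%}")
```

Raising min_probes above two in this sketch mirrors the point that the number of sites that can actually be genotyped is lower than the in-principle figure.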