Genetic variation in gene regulation

Published on April 21, 2015   34 min

Other Talks in the Series: Human Population Genetics II

My name's Jonathan Pritchard. I'm at Stanford University. Today I'm going to be talking about genetic variation in gene regulation.
We now know that a lot of the genetic basis of complex traits is noncoding, and presumably this is because of variants that are affecting gene regulation, as opposed to variants that are affecting protein coding sequences. So just as one example, the figure here shows the results of a genome-wide association study for Crohn's disease. The dots on the figure show the strength of signal for association between individual SNPs and risk of Crohn's disease. Down below, you can see the locations of coding regions in yellow. What you can see is that there's a very significant region of association for Crohn's disease; however, this lies outside any known genes. And in a case like this, presumably what's going on is that there is a SNP in this region that's affecting a regulatory element that drives regulation of one of those genes marked in yellow in such a way that it affects risk for disease. And so it's become clear during the last few years that this is a major mechanism by which genetic variation affects complex traits, and so there's been a great deal of interest in trying to understand how regulatory variants work and how we detect them and understand them.
So what we know now is that only a minority of genome-wide association hits are due to non-synonymous variants. This is a figure from a paper by Joe Pickrell in 2014 where he estimates the fraction of associated SNPs that are non-synonymous, i.e., that are changing protein coding sequences. And you can see that for all of the traits in this study, approximately 3% to 20% of the association hits that were discovered are due to non-synonymous variants, and this suggests that the large majority of genome-wide association hits are due to regulatory variation.
So in the last few years, there's been a great deal of interest in trying to understand how genetic variation produces changes in gene regulation. One of the main tools for studying this is using what are known as eQTLs, or expression Quantitative Trait Loci. And eQTLs are genetic variants that affect expression levels of genes. So in this cartoon example, you can see here two individuals, and there's a SNP that's upstream of the gene, and if you have the pink allele at this particular SNP, then you're producing less mRNA than if you have the green allele at the same SNP. So this would be an example of an eQTL. People have used eQTL mapping in a number of contexts, both to understand the basic principles by which genetic variation affects regulation as well as to link particular variants to gene expression in studying genome-wide association hits. As you can see on the right-hand side, there's an example of an eQTL for expression levels of the gene HLA-C, so in this example, if you have the CC genotype on the right, you can see that your expression level of HLA-C on average is quite a lot higher than if you have the AA genotype. Now you can see that this SNP only produces an incomplete correlation with HLA-C expression, and this is because there are many other factors in the genome as well as measurement error that are also affecting expression of HLA-C. It turns out that this SNP is also associated with an important phenotype, namely progression rates of HIV, and this exemplifies the fact that eQTLs often play causal roles in driving risk for complex traits.
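To make the idea of an eQTL concrete, here is a minimal sketch of how the association between a SNP and expression could be measured: code each genotype as the number of copies of one allele (AA = 0, AC = 1, CC = 2) and regress expression on that dosage. The function name `eqtl_slope_and_r` and the toy numbers are assumptions made up for illustration; they are not the HLA-C data from the figure, and real eQTL studies also correct for covariates and multiple testing.

```python
from statistics import mean

def eqtl_slope_and_r(dosages, expression):
    """Least-squares slope and Pearson r of expression on allele dosage."""
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need equal-length samples with at least 2 individuals")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    slope = sxy / sxx              # change in expression per extra allele copy
    r = sxy / (sxx * syy) ** 0.5   # strength of the (incomplete) correlation
    return slope, r

# Toy, made-up data: dosage = number of C alleles (AA=0, AC=1, CC=2).
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [1.0, 1.4, 0.9, 1.6, 2.1, 1.5, 2.4, 2.0, 2.9]

slope, r = eqtl_slope_and_r(dosages, expression)
print(f"slope per C allele: {slope:.3f}, Pearson r: {r:.3f}")
```

A positive slope with a correlation below 1 matches the picture described above: the genotype shifts average expression, but other genetic factors and measurement error keep the correlation incomplete.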
So the first kind of question is to understand how genetic variants might influence expression. In these eQTL studies, essentially what we're measuring is, on the right-hand side, steady state mRNA levels, and steady state mRNA levels are driven primarily by DNA sequence in the region of the gene, which encodes cis-acting regulatory information. This cis-acting regulatory information is interpreted by trans-acting factors in the cell, notably transcription factors, and because the transcription factors are cell-type specific, the way in which the cis regulatory information is interpreted is often cell-type specific or context specific. When there is genetic variation that lies within this regulatory information, this can disrupt the encoded information and that can affect levels of gene expression. These variants that affect the encoded information are then referred to as cis-eQTLs. We can also have trans-eQTLs. Cis-eQTLs are eQTLs where the genetic variation affecting the gene lies in the same genomic region as the gene, usually within 100 kilobases or so. Trans-eQTLs are cases where a SNP in one location of the genome is affecting expression of genes in a completely different part of the genome. The main way that this can occur is when a SNP is a cis-eQTL for one gene and that gene in turn is regulating the expression levels of other genes in the genome. Trans-eQTLs could also arise through non-synonymous variation, for example a coding variant in a transcription factor that affects the function of that transcription factor, although this is probably less common than trans-eQTLs that act through changes in expression level of a cis-acting locus.
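The cis versus trans distinction can be sketched as a simple distance rule. This is only an illustration under stated assumptions: the function `classify_eqtl` is hypothetical, distance is measured from the SNP to the gene's transcription start site (one common convention, not something specified in the talk), and the 100 kilobase window is the rough figure mentioned above rather than a fixed standard.

```python
CIS_WINDOW_BP = 100_000  # "within 100 kilobases or so"; studies vary this cutoff

def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss, window=CIS_WINDOW_BP):
    """Label a SNP-gene pair 'cis' if the SNP lies near the gene, else 'trans'.

    Assumption: distance is measured to the transcription start site (TSS).
    """
    same_chrom = snp_chrom == gene_chrom
    if same_chrom and abs(snp_pos - gene_tss) <= window:
        return "cis"
    return "trans"

# Made-up coordinates, for illustration only.
nearby = classify_eqtl("chr6", 31_200_000, "chr6", 31_250_000)
far_same_chrom = classify_eqtl("chr6", 1_000_000, "chr6", 31_250_000)
other_chrom = classify_eqtl("chr2", 31_200_000, "chr6", 31_250_000)
print(nearby, far_same_chrom, other_chrom)
```

Note that this rule only labels pairs by position. It says nothing about mechanism, so a SNP labelled "trans" here could still be acting through a nearby gene that it regulates in cis.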
