An introduction to genetic association analysis

Published on April 19, 2016   39 min
0:00
I'm Jenny Barrett, Professor of Statistical Genetics at the University of Leeds in the UK. And in this talk, I'm going to give An Introduction to Genetic Association Analysis.
0:12
Genetic association simply refers to the statistical association between a genetic variant and a trait. The trait could be categorical, such as whether or not someone has diabetes, or continuous, like height. So for a disease trait, genetic association is seen if the disease frequency varies according to genotype, or equivalently, if particular genotypes are more common in disease cases than in controls from the same population. Here, if we look at the three genotypes AA, AG, GG, defined by one particular A-G single nucleotide polymorphism or SNP, we see the frequency of a common disease in people with the three genotypes varies from 515 per 10,000 adults in those with the AA genotype to 712 per 10,000 adults in those with the GG genotype. Similarly, for a quantitative trait, we might see a slightly different distribution of trait values according to genotype, illustrated here by height in adult males, where we can see the mean height is highest in those with the AA genotype.
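As a small worked example of "disease frequency varies according to genotype", the two frequencies quoted above can be turned into a risk ratio and a risk difference. The variable names are illustrative only, and the AG frequency is left out because it is not stated in the talk.

```python
# Disease frequencies quoted in the talk, per 10,000 adults.
PER = 10_000
freq_aa = 515 / PER  # AA genotype
freq_gg = 712 / PER  # GG genotype

# Risk ratio: how many times more frequent the disease is in GG than in AA.
risk_ratio = freq_gg / freq_aa

# Risk difference, expressed again per 10,000 adults.
risk_difference_per_10k = (freq_gg - freq_aa) * PER

print(f"GG vs AA risk ratio: {risk_ratio:.2f}")
print(f"GG vs AA risk difference: {risk_difference_per_10k:.0f} per 10,000")
```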
1:26
Tests for genetic association with a single genetic variant are very simple. We'll begin by considering binary traits, typically a comparison between cases with a particular disease and controls. Note that most of the examples in this talk will be about SNPs, although there are other types of genetic variants, such as simple indels, that could be analyzed in a similar manner. A simple test for association is the chi-squared test. So a chi-squared test on this 3 x 2 table shows very strong evidence for association, where we can see the difference in genotype frequencies between cases and controls. The GG genotype and, to a lesser extent, the AG genotype are much more frequent in cases. Furthermore, we'd probably be interested in estimating the relative risk associated with each genotype. Assuming the data are from a case-control study, we could do this by calculating an odds ratio to estimate the risk to individuals with the AG or GG genotypes, compared with AA as baseline. If the disease is rare, this gives a good approximation to the relative risk. So we see, in this case, that the heterozygotes with the AG genotype are at about 30 percent increased risk, with an odds ratio of 1.29, and the rare homozygotes are at about twice the risk compared with the baseline. Note that this test of the 3 x 2 table of genotypes versus disease status is a very general test. No assumptions are made about the nature or pattern of risk. This has advantages, since it's good not to make assumptions, but also disadvantages in terms of likely loss of power. If instead we're prepared to make assumptions about the genetic model, then other, more specific tests can be used.
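The general 3 x 2 test and the genotype-specific odds ratios can be sketched as follows. The counts are made up for illustration and are not the figures from the talk's slide; the function names are hypothetical, and only the Python standard library is used.

```python
import math

# Hypothetical genotype counts (NOT the talk's data), in the order AA, AG, GG.
cases = [500, 370, 100]
controls = [600, 340, 60]


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def odds_ratio(case_g, control_g, case_base, control_base):
    """Odds ratio for one genotype group against the baseline group."""
    return (case_g * control_base) / (case_base * control_g)


# Three rows (genotypes) by two columns (cases, controls).
table = [[c, k] for c, k in zip(cases, controls)]
chi2 = chi_squared(table)

# With 2 degrees of freedom the chi-squared upper tail is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# AA is the baseline genotype.
or_ag = odds_ratio(cases[1], controls[1], cases[0], controls[0])
or_gg = odds_ratio(cases[2], controls[2], cases[0], controls[0])

print(f"chi-squared (2 df) = {chi2:.2f}, p = {p_value:.2g}")
print(f"OR AG vs AA = {or_ag:.2f}, OR GG vs AA = {or_gg:.2f}")
```

No genetic model is assumed here: the three genotype rows are treated as unordered categories, which is why the test has two degrees of freedom.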
3:24
If we assume that the G allele acts in a dominant fashion, so the probability of disease is the same whether an individual carries one or two copies of the G allele, then the AG and GG genotype groups can be combined. A chi-squared test can be applied to the 2 x 2 table, and the odds ratio for carriers of the G allele versus non-carriers is estimated to be 1.4. We can gain power by doing this, but only if the model reflects reality, which in this case it doesn't appear to.
4:00
Similarly, if we assume that G acts recessively, we could compare the GG genotype group to everyone else, combining the AA and AG groups as a new baseline.
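Collapsing the genotypes under a dominant or a recessive model for G can be sketched as below. The counts are again made up for illustration (not the talk's data), the helper names are hypothetical, and the chi-squared statistic is the standard 2 x 2 shortcut without a continuity correction.

```python
import math

# Hypothetical genotype counts (NOT the talk's data).
cases = {"AA": 500, "AG": 370, "GG": 100}
controls = {"AA": 600, "AG": 340, "GG": 60}


def collapse(exposed_genotypes):
    """Collapse genotype counts into an exposed group versus everyone else."""
    a = sum(cases[g] for g in exposed_genotypes)     # exposed cases
    b = sum(controls[g] for g in exposed_genotypes)  # exposed controls
    c = sum(cases.values()) - a                      # baseline cases
    d = sum(controls.values()) - b                   # baseline controls
    return a, b, c, d


def two_by_two_test(a, b, c, d):
    """Odds ratio, chi-squared statistic (1 df) and p-value for a 2 x 2 table."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Dominant model for G: carriers (AG or GG) versus AA.
dom_or, dom_chi2, dom_p = two_by_two_test(*collapse(["AG", "GG"]))

# Recessive model for G: GG versus AA and AG combined.
rec_or, rec_chi2, rec_p = two_by_two_test(*collapse(["GG"]))

print(f"dominant:  OR = {dom_or:.2f}, chi2 = {dom_chi2:.2f}, p = {dom_p:.2g}")
print(f"recessive: OR = {rec_or:.2f}, chi2 = {rec_chi2:.2f}, p = {rec_p:.2g}")
```

Both comparisons use one degree of freedom, which is where the potential gain in power comes from when the assumed model is right.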
4:14
In practice, an additive genetic model is most widely used. This assumes that the effect of a heterozygous genotype is halfway between the other two groups on some measurement scale; in this context, the log odds scale. This would mean the odds ratio would increase multiplicatively with each additional copy of the allele. We can test for this trend in risk using the Cochran-Armitage test statistic. This is calculated similarly to a standard contingency table chi-squared test, but it includes weights for each row: zero, one, and two in this application. Under the null hypothesis, the statistic should follow a chi-squared distribution with one degree of freedom. We'll return to the issue of estimating effect sizes shortly.
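A minimal sketch of the trend test with row weights 0, 1 and 2 follows. It uses one common textbook form of the statistic, with N rather than N - 1 in the variance (some references differ on this), and hypothetical counts that are not the talk's data. The function name is illustrative.

```python
import math

# Hypothetical genotype counts (NOT the talk's data), in the order AA, AG, GG.
cases = [500, 370, 100]
controls = [600, 340, 60]
weights = [0, 1, 2]  # number of copies of the G allele


def cochran_armitage(cases, controls, weights):
    """Trend chi-squared statistic (1 df) and p-value for ordered rows."""
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)        # all subjects
    n_cases = sum(cases)   # all cases
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    # Squared weighted excess of cases, scaled by its null variance.
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * (n - n_cases) * (n * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


trend_chi2, trend_p = cochran_armitage(cases, controls, weights)
print(f"trend chi-squared (1 df) = {trend_chi2:.2f}, p = {trend_p:.2g}")
```

With weights 0, 1, 2 this is equivalent to N times the squared correlation between allele count and case status, so it is sensitive to a steady change in risk per copy of G.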