Topics Covered
- Hidden Markov models
- Imputation and genetic imputation
- Principal Component Analysis (PCA)
- Mixed Models
- Shrinkage and regularisation methods
Talk Citation
O'Reilly, P. (2025, March 28). An introduction to statistics for statistical genetics: models and techniques common in statistical genetics [Video file]. In The Biomedical & Life Sciences Collection, Henry Stewart Talks. Retrieved October 25, 2025, from https://doi.org/10.69645/XROV3727.
Financial Disclosures
- Dr. Paul O'Reilly has not informed HSTalks of any commercial/financial relationship that it is appropriate to disclose.
An introduction to statistics for statistical genetics: models and techniques common in statistical genetics
Transcript
0:00
This lecture, An Introduction to Statistics for Statistical Genetics, is the first talk in the statistical genetics series. I'm Dr. Paul O'Reilly, a senior lecturer in statistical genetics, performing research at King's College London. This is Part Two: Models and Techniques Common in Statistical Genetics. In this section of the talk, I will give a basic overview of several models and techniques popular in statistical genetics, with the aim of providing an introductory-level understanding of each. If you plan to use any of these approaches, then you will need to obtain further details from other lectures in this series, in statistical textbooks, or online.
                  
                
              
0:38
In part two of the talk, I'll introduce a number of statistical models and techniques that are often used in statistical genetics. First, I'll describe hidden Markov models. Then I'll explain the process of statistical imputation, as well as a method of imputation especially tailored for application to genetic data. Next, I'll explain principal component analysis, and then mixed models, and finally, I'll describe shrinkage and regularisation methods. For each of these, I'll give the general intuition of the model or technique and explain its relevance to statistical genetics using examples from the field.
                  
                
              
1:13
A process has the Markov property if the next state, in space or time, is governed only by the present state. A useful way to think about this in a real-life context is to consider the weather. If it is raining now, it doesn't matter too much that it was dry and sunny yesterday; it's very likely still to be raining one minute from now. Strictly speaking, weather isn't Markovian, since the recent weather or present season is also informative about future weather, but over short periods of time weather is close enough to Markovian to be a useful analogy.
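
As a compact restatement of the Markov property described above, it can be written as a conditional probability. The notation is an assumption introduced here, not taken from the talk: X_t denotes the state at step t (a time point or a position), and lower-case x values denote particular states.

```latex
P(X_{t+1} = x \mid X_t = x_t, X_{t-1} = x_{t-1}, \dots, X_1 = x_1)
  = P(X_{t+1} = x \mid X_t = x_t)
```

In words: once the present state is known, earlier states add no further information about the next state.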
                  
If we model a system or process as having the Markov property, then the model is called a Markov model. A Markov model is typically made up of some number of possible states, which in the case of weather might be raining, snowing, sunny, and overcast, along with transition probabilities of switching from one state to another. The transition probabilities may be different for different transitions. For example, going from overcast to snowing has higher probability than going from sunny to snowing.
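
The weather Markov model just described can be sketched in a few lines of Python. This is only an illustration: the four states come from the talk, but every transition probability below is a made-up value, and the names `STATES`, `TRANSITIONS` and `simulate` are hypothetical.

```python
import random

# States from the weather example; all probabilities are invented for illustration.
STATES = ["raining", "snowing", "sunny", "overcast"]
TRANSITIONS = {
    "raining":  {"raining": 0.60, "snowing": 0.05, "sunny": 0.10, "overcast": 0.25},
    "snowing":  {"raining": 0.10, "snowing": 0.60, "sunny": 0.05, "overcast": 0.25},
    "sunny":    {"raining": 0.05, "snowing": 0.01, "sunny": 0.74, "overcast": 0.20},
    "overcast": {"raining": 0.25, "snowing": 0.10, "sunny": 0.25, "overcast": 0.40},
}


def simulate(start, n_steps, rng):
    """Simulate a Markov chain: each new state depends only on the current one."""
    path = [start]
    for _ in range(n_steps):
        probs = TRANSITIONS[path[-1]]
        # Draw the next state using the row of probabilities for the current state.
        next_state = rng.choices(STATES, weights=[probs[s] for s in STATES])[0]
        path.append(next_state)
    return path


if __name__ == "__main__":
    print(simulate("sunny", 10, random.Random(1)))
```

Each row of `TRANSITIONS` sums to one, and the overcast-to-snowing entry is larger than the sunny-to-snowing entry, mirroring the example in the text.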
                  
Markov models are extremely useful in analysing genetic data because the ancestral contributions to our DNA sequence are highly Markovian. Consider a chromosome that came from either your mother or father. This chromosome will be a mosaic of your grandparents' chromosomes as a result of recombination. It can be viewed as being made up of two states, either grandmother or grandfather, and if at a particular locus the sequence is from your grandmother, then the sequence at the very next locus is most likely from your grandmother as well, but with some probability there will be a transition to sequence that came from your grandfather.
                  
Likewise, we can view our chromosomes as being a mosaic of our great-grandparents' chromosomes or of our ancestors from any number of generations ago. Regions with high recombination rates will involve many transitions between these ancestral sequence states, whereas those with little recombination may correspond to only a single ancestral state. Because of the relatedness among all individuals, this means that a sample of individuals' chromosomes, and genetic variation data in general, can be well captured by Markov models.
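
The chromosome mosaic described above can be sketched as a two-state Markov chain along loci. This is a toy illustration, not a model of real recombination: the function names and the per-locus switch probabilities are assumptions chosen only to contrast a low-recombination region with a high-recombination one.

```python
import random


def simulate_mosaic(n_loci, switch_prob, rng):
    """Label each locus by grandparental origin using a two-state Markov chain.

    switch_prob is a hypothetical probability of a recombination between
    adjacent loci; real rates vary along the genome.
    """
    state = rng.choice(["grandmother", "grandfather"])
    origins = [state]
    for _ in range(n_loci - 1):
        if rng.random() < switch_prob:
            # A recombination event: switch to the other grandparent.
            state = "grandfather" if state == "grandmother" else "grandmother"
        origins.append(state)
    return origins


def count_transitions(origins):
    """Count how often the ancestral state changes between adjacent loci."""
    return sum(a != b for a, b in zip(origins, origins[1:]))


rng = random.Random(42)
low = simulate_mosaic(1000, 0.001, rng)   # region with little recombination
high = simulate_mosaic(1000, 0.05, rng)   # region with a high recombination rate
print(count_transitions(low), count_transitions(high))
```

On average, the region simulated with the higher switch probability shows many more transitions between ancestral states, while the low-rate region often stays in a single state for long stretches.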
                  
In practice, hidden Markov models, or HMMs for short, are usually employed, because the ancestral states are unknown but genotype data can be used to estimate them. Going back to the weather analogy, applying hidden Markov models is a bit like trying to estimate the state of the weather only from data on what clothes people are wearing or if they're applying sun cream or holding umbrellas.
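
To make the weather analogy concrete, here is a minimal sketch of the Viterbi algorithm, a standard way of finding the most probable sequence of hidden states in an HMM. The talk does not specify an algorithm or any numbers: the two hidden states, the three kinds of observation, and all probabilities below are assumptions made for illustration.

```python
import math

# Hypothetical two-state weather HMM; every probability is an invented value.
STATES = ["raining", "sunny"]
START = {"raining": 0.5, "sunny": 0.5}
TRANS = {
    "raining": {"raining": 0.8, "sunny": 0.2},
    "sunny":   {"raining": 0.2, "sunny": 0.8},
}
EMIT = {
    "raining": {"umbrella": 0.80, "sun_cream": 0.05, "neither": 0.15},
    "sunny":   {"umbrella": 0.05, "sun_cream": 0.60, "neither": 0.35},
}


def viterbi(observations):
    """Return the most probable hidden-state path for a non-empty observation list."""
    # best[s] is the log-probability of the best path so far that ends in state s.
    best = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            new_best[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["umbrella", "umbrella", "sun_cream", "sun_cream"]))
# prints ['raining', 'raining', 'sunny', 'sunny']
```

The weather is never observed directly; it is inferred from what people carry, with the transition probabilities discouraging implausibly frequent changes of state.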
                  
In genetics, our observed data are usually genotypes, and we can use these to estimate the different hypothetical ancestral sequences underlying the genotypes at a genomic locus in a sample of individuals. This has the effect of clustering the sample of DNA sequences into groups of similar sequences. HMMs are also used to estimate where there are transitions between different ancestral sequences along individual chromosomes.
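
As a rough sketch of estimating transitions between ancestral sequences along a chromosome, the toy example below treats an observed sequence of 0/1 alleles as an imperfect copy of one of several reference sequences, and uses Viterbi decoding to pick which reference is copied at each locus. This is a heavily simplified illustration and not the method of any particular software: the function name `decode_ancestry`, the uniform switching scheme, and the `switch_prob` and `error_prob` values are all assumptions.

```python
import math


def decode_ancestry(target, references, switch_prob=0.05, error_prob=0.01):
    """Viterbi decoding of which reference sequence best explains each locus.

    target: list of 0/1 alleles. references: two or more 0/1 lists of the
    same length as target. switch_prob and error_prob are hypothetical values.
    """
    k = len(references)

    def log_emit(ref, locus):
        # An allele matching the copied reference is likely; a mismatch is an "error".
        match = references[ref][locus] == target[locus]
        return math.log(1 - error_prob if match else error_prob)

    def log_trans(prev, cur):
        # Stay with probability 1 - switch_prob; otherwise switch uniformly.
        if prev == cur:
            return math.log(1 - switch_prob)
        return math.log(switch_prob / (k - 1))

    best = [math.log(1 / k) + log_emit(r, 0) for r in range(k)]
    back = []
    for locus in range(1, len(target)):
        new_best, pointers = [], []
        for cur in range(k):
            prev = max(range(k), key=lambda p: best[p] + log_trans(p, cur))
            new_best.append(best[prev] + log_trans(prev, cur) + log_emit(cur, locus))
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace back from the most probable final state.
    state = max(range(k), key=lambda r: best[r])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


references = [[0] * 8, [1] * 8]
print(decode_ancestry([0, 0, 0, 0, 1, 1, 1, 1], references))
# prints [0, 0, 0, 0, 1, 1, 1, 1]
```

With these settings, a single mismatching allele in the middle of a long matching stretch is explained as an error rather than as two switches, which is how the model smooths over noise while still detecting genuine transitions.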
                  
Because they capture the structure of genetic variation data in samples of individuals in a way that reflects their present similarities and differences, and their ancestral histories, hidden Markov models are extremely useful in statistical genetics and have been employed in a wide range of applications, including estimating haplotypes from genotypes, identifying copy number variants, characterising population admixture, and performing genetic imputation.
                  
                    The problem of missing data plagues medical and scientific research,
                  
                
              