Common Disease

  • Created by: Hpgrice
  • Created on: 02-10-19 09:58

Chromosomal Abnormalities

A visible alteration in chromosome number or structure, e.g. a change in chromosome number caused by an error during meiosis

1. Numerical

  • Triploid/tetraploid (polyploidy) - generally lethal; not compatible with survival in humans
  • Aneuploidy - gain (or loss) of one or more chromosomes, usually due to nondisjunction during meiosis; some aneuploidies are compatible with life, e.g. trisomy 21 results in Down's syndrome

2. Structural

  • deletion
  • duplication
  • translocation - part of one chromosome moves to another
  • inversion - a segment of a chromosome is reversed; occurs when the chromosome breaks and the segment is rearranged within itself; paracentric (centromere not included) or pericentric (centromere included)

If it occurs in the egg/sperm = constitutional abnormality = present in all cells

If it occurs later in life = somatic abnormality = present in some cells (mosaicism)


Mendelian Disorders

  • Rare, monogenic (caused by a single gene)
  • Mendelian inheritance - recessive, dominant or sex-linked
  • Doesn't include de novo mutations or mosaics
  • High penetrance - prophylactic intervention possible
  • Environmental effect weak, but the effect of the mutation can be modulated by common genetic variants and environmental factors e.g. diet

Gene mutation:

  • Point mutation - e.g. sickle cell anaemia, mutation in HBB. Silent, missense (changes an amino acid) or nonsense (changes a codon to a stop codon)
  • Deletion - e.g. cystic fibrosis, deletion in CFTR
  • Insertion - e.g. Huntington's disease, CAG repeat expanded up to approx. 120 copies

1. Somatic (body) cells - not transmitted to progeny (acquired mutation) but can cause malignant transformation and congenital diseases

2. Gamete (germ) cells - transmitted to progeny (inherited mutation); can give rise to inherited disease


Mendel's Laws

Not valid for multifactorial complex disease

Law of Segregation - during gamete formation, the 2 alleles for each gene (locus) segregate from each other so that each gamete carries only one allele for each gene (locus)

Law of Independent Assortment - Alleles for different traits separate independently during the formation of gametes

Law of Dominance - Some alleles are dominant, some recessive. An organism with at least one dominant allele will display the phenotype of that allele.
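As a minimal sketch of the laws of segregation and dominance, the Python below enumerates the gametes of two parents at one locus (a Punnett square). The allele names "A" (dominant) and "a" (recessive) and the function names are hypothetical, chosen for illustration; the sketch models a single locus only, so it says nothing about independent assortment.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for every combination of gametes.

    Each parent is a two-letter genotype such as "Aa". Law of segregation:
    each gamete carries exactly one of the parent's two alleles, so every
    allele pairing below is equally likely.
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # sorted() puts uppercase first, so "aA" and "Aa" count as one genotype
        counts["".join(sorted(allele1 + allele2))] += 1
    return counts


def dominant_phenotype_fraction(counts: Counter, dominant_allele: str = "A") -> float:
    """Law of dominance: one copy of the dominant allele shows its phenotype."""
    total = sum(counts.values())
    showing = sum(n for genotype, n in counts.items() if dominant_allele in genotype)
    return showing / total


cross = offspring_genotypes("Aa", "Aa")
print(dict(cross))                         # 1 AA : 2 Aa : 1 aa
print(dominant_phenotype_fraction(cross))  # 3 in 4 show the dominant phenotype
```

Crossing two heterozygotes gives the familiar 1:2:1 genotype ratio and 3:1 phenotype ratio.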


Multi-factorial Complex Disorder


Polygenic - multiple SNPs or genes increase or decrease disease risk; SNPs can modify the effects of other SNPs

Environment - strong effect; genetic, lifestyle and environmental factors and age interact, and the effect of a SNP can be modified by environmental factors

Pattern of inheritance - non-Mendelian, unknown

Penetrance - low, prophylactic intervention unlikely

Includes de novo mutations and mosaics


Overlap of Mendelian and Complex Disease

Example of a Mendelian Disorder with influence of other factors:

CFTR mutation causes Cystic Fibrosis but onset, severity and outcome can vary depending on other factors such as:

  • physical environment - passive smoke, outdoor pollution, pathogenic microorganisms
  • socioeconomic status, cultural, family context - nutrition, stress level, social support and disease self management skills
  • other genetic factors - gene modifiers

Some forms of complex disorders can be monogenic (e.g. monogenic obesity), but the majority are polygenic/multifactorial



Single nucleotide polymorphism = single base change

Most common form of genetic variation in the human genome = 90% of all variations = approx. 85 million SNPs


Minor allele frequency (MAF) is the frequency at which the second most common allele occurs in a given population. If a base change has a MAF of more than 1% it is called a SNP; if less, it is called a mutation
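A minimal Python sketch of the MAF calculation and the 1% rule, assuming genotypes are given as two-letter strings (e.g. "AG") for one biallelic site. Treating exactly 1% as a SNP is an assumption, since the rule above only says "more than 1%"; the genotype counts in the example are made up.

```python
from collections import Counter


def minor_allele_frequency(genotypes: list[str]) -> float:
    """MAF at one site from diploid genotypes such as "AA", "AG", "GG"."""
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(allele_counts) < 2:
        return 0.0  # only one allele seen: no minor allele in this sample
    total = sum(allele_counts.values())
    # most_common(2)[1] is the second most common allele and its count
    _, minor_count = allele_counts.most_common(2)[1]
    return minor_count / total


def classify_variant(maf: float, cutoff: float = 0.01) -> str:
    """SNP if MAF reaches the 1% cut-off, otherwise a (rare) mutation."""
    return "SNP" if maf >= cutoff else "mutation"


# Hypothetical sample of 100 people: 90 AA, 9 AG, 1 GG -> 11 G alleles out of 200
common = ["AA"] * 90 + ["AG"] * 9 + ["GG"]
print(minor_allele_frequency(common), classify_variant(minor_allele_frequency(common)))

# Hypothetical sample: 99 AA, 1 AG -> 1 G allele out of 200
rare = ["AA"] * 99 + ["AG"]
print(minor_allele_frequency(rare), classify_variant(minor_allele_frequency(rare)))
```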

An advantageous allele can become a disadvantage in a different environment.

Mode of inheritance

Crossing over in prophase I of meiosis - homologous recombination


SNP Effects

1. Promoters - transcription factor (TF) binding sites, which can increase/decrease gene expression

2. 5' or 3' UTR - mRNA regulation, which affects mRNA stability and translation

3. Coding region - synonymous = silent due to genetic code degeneracy; non-synonymous missense = change in amino acid and thus protein activity/stability/regulation; non-synonymous nonsense = premature STOP codon

4. Intron - SNPs at splice sites affect mRNA processing (splicing)

5. Intergenic - SNPs in enhancers or silencers can increase/decrease gene expression


Epigenetic Factors

Regulate gene expression independently of gene sequence

Heritable but reversible - turn genes on/off in a particular cell type, in response to a stimulus or in distinct disease states - potential therapeutic target

1. Histone modifications: acetylation, phosphorylation - affect how tightly DNA is wrapped around histones and hence gene regulation

2. DNA methylation (promoter regions, CpG islands): hypomethylation increases gene expression; hypermethylation decreases gene expression by reducing binding of transcription factors to methylated promoter regions

3. MicroRNA: promote mRNA degradation or suppression of protein translation 

Methylation patterns are determined during embryogenesis and passed on to differentiating cells and tissues

Epigenetic mechanisms can be affected by environmental factors such as development (nutrition in utero), diet, environmental chemicals, drugs and pharmaceuticals, and ageing, and can be altered in complex disorders, modulating health outcomes e.g. cancer, autoimmune disease, diabetes


Human Genome Project

Large number of inter-individual genetic variations in our genome 

Novel sequencing technologies and analytical tools

Cost of sequencing a genome was originally approx. $100 million but is now approx. $1000

Development of genomics research

Application of high-throughput technologies

Large biological data sets

Personalised medicine is an emerging field

Discover aetiology, novel therapeutic targets, identification of biomarkers


Candidate SNP approach

Aim: discovery of new functional genetic associations (or absence of association) between functional SNPs within pre-specified genes of interest and disease

Approach: based on prior knowledge about the genes and SNPs in relation to disease = hypothesis driven; study a small number of SNPs (1-20); genotype these SNPs in a population (e.g. case-control study) and compare genotype frequencies between cases and controls at each locus

Advantages: relatively cheap, small to medium size study population (100s-1000s), can look at the functional impact of the SNPs, can account for SNP x SNP or SNP x environment interactions

Limits: needs prior knowledge; small to medium size study population - is it representative of the whole population?


Pathway approach and GWAS

SNP - known or unknown functionality

Large number of SNPs studied:

  • multiple statistical test - risk of false positives
  • requires correction for multiple testing e.g. if the significance threshold for 1 test is p < 0.05, then for n tests (i.e. n SNPs) use p < 0.05/n (Bonferroni correction)
  • to keep sufficient statistical power, need a very large study population and to reduce the number of SNPs to genotype: use of tag SNPs

A tag SNP is a representative SNP in a genome region with high linkage disequilibrium (LD) that represents a group of SNPs (haplotype). This means you don't have to genotype all the SNPs, reducing cost and the number of tests required
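Whether one SNP can "tag" another is usually judged with an LD measure such as r². A minimal Python sketch of r², assuming phased haplotypes (which allele sits at each of the two SNPs on each chromosome) are already known. The haplotype data are hypothetical, and any cut-off for calling a SNP a good tag (r² ≥ 0.8 is a common choice) is an assumption, not taken from these notes.

```python
def ld_r_squared(haplotypes: list[tuple[str, str]], allele1: str, allele2: str) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: one (allele at SNP 1, allele at SNP 2) pair per chromosome.
    allele1, allele2: the allele whose frequency is tracked at SNP 1 and SNP 2.
    """
    n = len(haplotypes)
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    p12 = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    d = p12 - p1 * p2  # D: deviation from random (independent) association
    denominator = p1 * (1 - p1) * p2 * (1 - p2)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denominator


# Alleles always travel together: SNP 1 perfectly tags SNP 2 (r^2 = 1)
linked = [("A", "C")] * 50 + [("G", "T")] * 50
# All four haplotypes equally common: no LD (r^2 = 0)
unlinked = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")] * 25
print(ld_r_squared(linked, "A", "C"), ld_r_squared(unlinked, "A", "C"))
```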


Pathway Analysis

Aim: discover new genetic SNP-disease associations within a given metabolic pathway; identify genes involved in the disease mechanism within this pathway

Approach: based on prior knowledge about the role of a given metabolic pathway in a disease; study 100s-1000s of SNPs, including functional and tag SNPs to capture simultaneously the genotypes of all SNPs in all genes present in the metabolic pathway; case-control study

Advantages: not too expensive; includes some functional SNPs to see the functional impact of SNPs; can account for SNP x SNP or SNP x environment (e.g. diet) interactions; medium size study population (1000s)

Limits: needs prior knowledge about the metabolic pathway and disease (but not about the genes or SNPs); medium size study population


GWAS - Genome Wide Association Study

Aim: discovery of new genetic SNPs - disease associations. 

Approach: hypothesis free, no prior knowledge required; genotyping of tag SNPs to capture simultaneously the genotypes of all SNPs in the genome (in theory; in reality this depends on the genotyping platform); case-control study. At each measured locus (SNP), carry out a χ² (or other) test of association between genotype and phenotype.

Advantages: no prior knowledge required; can identify potential novel genes involved in the disease pathway; very large study population (1000s - 10,000s), so more confidence that the results reflect what is happening in the general population.

Limits: very expensive; relies on tag SNPs and LD; genotype and LD patterns may differ between populations, so the genotyping platform may not be relevant for the study population; no information on functionality (a tag SNP may not be the causative SNP, so further functional studies are needed); cannot account for SNP x SNP or SNP x environment interactions (a problem for multifactorial disorders); very large study population - no matching, expensive. Requires a more stringent significance level (e.g. p < 5 × 10^-8): if testing 1 million SNPs at p < 0.05, you would obtain 50,000 'significant' results just by chance. Large sample sizes (1000s of individuals) are needed to have sufficient power.
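The arithmetic behind the genome-wide threshold can be checked with a short Python sketch: under the null hypothesis each test has probability α of giving p < α, so the expected number of chance "hits" is n × α, and the Bonferroni threshold α/n for 1 million SNPs gives the 5 × 10^-8 figure. Treating the tests as exactly 1 million independent ones is a simplifying assumption (SNPs in LD are correlated).

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of p < alpha results when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold keeping the chance of any false positive at most alpha."""
    return alpha / n_tests


n_snps = 1_000_000
print(expected_false_positives(n_snps))  # about 50,000 'significant' SNPs by chance alone
print(bonferroni_threshold(n_snps))      # about 5 x 10^-8, the usual genome-wide threshold
```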


Limits of Candidate SNP, Pathway Analysis, GWAS

  • Need functional studies to determine functionality of a given SNP in a given complex trait
  • SNP associations identified in one population may not be transferable to other populations - different allele frequencies between populations, different LD patterns between populations, different risk factors for a given disease between populations
  • Study population - controls versus cases: age, gender, disease outcome, disease stage, or matched case-control study
  • Do not study epigenetics, CNVs, rare variant associations or the microbiome


  • GWAS point to genomic regions likely to harbour disease genes but don't identify the functional variant that causes the disease
  • SNPs identified through GWAS generally have small ORs (< 1.5), suggesting their effects are not very 'important'
    • As we increase sample size, we detect more and more 'significant' SNPs with smaller and smaller effect sizes (ORs), but the SNPs identified do not have strong predictive value (e.g. for predicting disease status)
  • GWAS are best considered a hypothesis-generating exercise to identify 'candidate' genomic regions for further investigation, potentially pointing us to new biology

Obesity Intro

Metabolic disorder associated with insulin resistance, type 2 diabetes, dyslipidemia and hypertension

Consequence of the imbalance between energy intake and expenditure

Growing prevalence worldwide - epidemiology

Risk factor for other major chronic disease: cardiovascular disease, type 2 diabetes and several cancers

Causes: genetics, epigenetics, environment (excess food intake, lack of physical activity), genetic x environment interactions, socio-economic factors

Obesity related traits - high BMI, unhealthy weight gain, high waist circumference, excess of body fat

Two types: monogenic - Mendelian obesity (early onset) and polygenic - common obesity


Obesity Causes


  • Natural obesity in animals for hibernation/migration etc.
  • Is obesity in humans an adaptive response in case of famine? No, because approx. 70% of the population is not obese; if it were positively selected, everyone would be obese

Evidence for genetic contribution

1. Inter-individual variation of BMI/body fatness studies: identical twins, adoptees (shared environment only) and family data in Canada, Denmark and Norway

  • genetic factors: 25-40%
  • non hereditary factors (e.g diet/exercise): 60-75%

2. Animal models ob/ob mice

  • mutation in ob (obese) gene results in severe obesity and type 2 diabetes
  • Leptin (product of the ob (or lep) gene): satiety hormone produced in adipose tissue; decreases appetite, body fat stores and insulin production - see lecture 3 for image of how leptin works

Obesity Monogenic - Intro

Monogenic mutations lead to defects in the satiety (loss of leptin signaling) and appetite control centres in the brain

Four patterns of inheritance

  • autosomal dominant
  • autosomal recessive
  • X-linked dominant
  • X-linked recessive

Obesity Monogenic - Satiety Centre

Mutations affecting the satiety centre:

1. Congenital leptin deficiency: extremely rare mutations in LEP gene = no leptin. Children with the deficiency treated with leptin rapidly lose fat mass and body weight but not lean mass (see lec 3)

2. Leptin Receptor deficiency (leptin resistance): rare mutations in LEPR gene = no response to leptin

Both autosomal recessive and result in excessive hunger leading to severe obesity, decreased production of hormones directing sexual development and reproductive function, hyperinsulinemia (insulin resistance), onset in first few months of life


Obesity Monogenic - POMC Pathway

Mutations affecting the appetite centre: POMC pathway

POMC (proopiomelanocortin) is a precursor that is extensively cleaved into multiple peptide hormones

  • ACTH (adrenocorticotropic hormone) - stimulates cortisol release by the adrenal gland
  • α-MSH and β-MSH (melanocyte stimulating hormone) - weight regulation and energy balance maintenance; skin pigmentation (melanin production) - the pigmentation change helps identify the disease but doesn't cause it
  • β-endorphin and met-enkephalin - endogenous opiates

POMC deficiency

  • autosomal recessive
  • mutation in POMC gene - truncated or absent POMC = decreased levels/loss of the above peptides, dysregulation of the body's energy balance, excessive hunger = excessive feeding and severe obesity
  • severe obesity: normal birth weight but obese by 1 year old
  • red hair and pale skin - common phenotype

Obesity - Regulation of Feeding and Metabolism

Hypothalamus in Regulation of Feeding and Metabolism

Leptin/ Melanocortin Pathway

Two Neuron populations control food intake and satiety

1. POMC neurons

  • activated by leptin and insulin
  • produce α-MSH, which activates melanocortin receptors (MC4R & MC3R) and increases the satiety signal

2. NPY/AGRP neurons

  • express neuropeptide Y (NPY) and agouti-related protein (AGRP)
  • activated by ghrelin (hunger hormone)
  • inhibit MC4R and POMC neuron signalling, resulting in increased appetite and weight gain

see lecture 3 for diagram


Obesity Polygenic (Common)- Candidate SNP Approach

Most obese people have very high circulating concentrations of leptin and are leptin resistant - failure of leptin to access the brain or to stimulate the downstream signalling pathway

Candidate SNP approach

  • more than 400 studies covering 114 candidate genes have reported significant associations between candidate genes and obesity-related phenotypes
  • associations found in humans: body weight, BMI, overweight and obesity, body composition (fat distribution) and energy expenditure, changes in body weight and composition

Some studies found a positive association between risk of obesity and SNPs in genes of the leptin/melanocortin pathway including AGRP, LEPR, MC4R, NPY and POMC. Other studies could not replicate these associations, possibly due to environmental factors e.g. diet. These genes are in the same metabolic pathway as those mutated in Mendelian obesity



Obesity Polygenic - GWAS

Association with SNPs in the FTO (fat mass and obesity associated) gene - first obesity susceptibility locus, reported in 2 independent GWAS

1. Frayling TM et al. 2007: cluster of 10 genetically linked SNPs; association replicated in 13 cohorts with over 38,000 participants

2. Scuteri et al. 2007: cluster of SNPs on chromosome 16 strongly associated with BMI, hip circumference and weight; over 12,000 participants

Many more GWAS confirmed the association between FTO and BMI, obesity or adiposity across populations of diverse ancestry

An FTO functional study showed that adults homozygous for the risk allele of a SNP found in the Frayling study (rs9939609) were approx. 3 kg heavier than adults with no copy of the risk allele; they also had increased BMI and waist circumference.

In individuals who were physically active, FTO's effect was attenuated by around 30% (Celis-Morales C et al. 2016). The odds ratio for obesity-related disease for each SNP is relatively low - the effect comes from many SNPs combined


Obesity - SNPs in FTO Gene

SNPs in FTO gene associated with increased BMI

FTO (fat mass and obesity associated protein)

  • mRNA demethylase
  • highly expressed in the hypothalamus

SNP (rs9939609) in FTO gene - risk allele linked to

  • decreased satiety
  • increased food intake but not energy expenditure
  • ghrelin expression and leptin resistance

Animal model (mice)

  • over-expression of FTO = increased food intake and obesity
  • deletion or missense mutations of FTO = increased leanness
  • fasting can increase expression of FTO
  • increased orexigenic hormones including ghrelin and neuropeptide Y (NPY)

Obesity - Epigenetic Effects

DNA methylation

Impact of Nutrition in utero: foetal origin of obesity

  • high or low birth weight associated with increased risk of complex diseases such as type 2 diabetes and obesity

Foetal over-nutrition and under-nutrition

  • associated with increased risk of complex diseases and obesity e.g. famine during pregnancy/development such as the Dutch famine (mothers exposed to famine during pregnancy had obese offspring)

Maternal obesity

  • associated with increased risk of obesity in offspring

Epigenetic modifications in adult life

  • environmental/ lifestyle factors

Obesity - Gut Microbiome

Gut microbiome of obese people is very different from that of lean people.

Twin studies (1 lean twin, 1 obese twin)

  • microbiota transplants in mice
  • gut bacterial population composition (decreased Bacteroidetes) associated with altered expression of genes involved in energy uptake and adipogenesis metabolic pathways

Early life environment: maternal gut bacterial composition affects foetal gut microbiota and offspring obesity

Study: Turnbaugh PJ, Gordon JI 2009

  • microbiota from the obese twin caused a lean mouse to become obese

Genotype Relative Risk

Approximately equal to the odds ratio when the disease is rare

Defined as the factor by which the baseline penetrance has to be multiplied to get the penetrance in the genotype category of interest

One category is chosen as the baseline genotype category e.g. NN (homozygous for the non-disease allele). The genotype relative risk for this category is 1.0. To work out the relative risk for each other category, divide that category's penetrance by the baseline penetrance. E.g. if the penetrance of DN is 0.5 and the penetrance of NN is 0.1, then DN penetrance / NN penetrance = 0.5/0.1 = 5, so the relative risk for DN is 5.0

The genotype relative risk of whichever category is chosen as the baseline equals 1, because the calculation divides its penetrance by itself e.g. NNpen / NNpen = 1
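The same calculation as a minimal Python sketch. The NN and DN penetrances (0.1 and 0.5) are the ones from the worked example; the DD penetrance (0.8) and the rare-disease penetrances are hypothetical values added for illustration. Computing odds ratios alongside shows why a GRR is only approximately equal to an OR when the disease is rare.

```python
def genotype_relative_risks(penetrance: dict[str, float], baseline: str = "NN") -> dict[str, float]:
    """GRR = penetrance of each genotype divided by the baseline penetrance."""
    base = penetrance[baseline]
    return {genotype: p / base for genotype, p in penetrance.items()}


def odds_ratios(penetrance: dict[str, float], baseline: str = "NN") -> dict[str, float]:
    """OR = odds of disease for each genotype divided by the baseline odds."""
    def odds(p: float) -> float:
        return p / (1 - p)
    base = odds(penetrance[baseline])
    return {genotype: odds(p) / base for genotype, p in penetrance.items()}


common_disease = {"NN": 0.1, "DN": 0.5, "DD": 0.8}  # DD value is hypothetical
print(genotype_relative_risks(common_disease))  # about 1, 5 and 8 for NN, DN, DD
print(odds_ratios(common_disease))              # DN OR about 9, far from the GRR of 5

rare_disease = {"NN": 0.001, "DN": 0.005}  # hypothetical rare-disease penetrances
print(genotype_relative_risks(rare_disease)["DN"], odds_ratios(rare_disease)["DN"])  # both close to 5
```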


Family Studies

Traditional method to investigate genetic contribution to disease using large families/pedigrees. Used to address questions such as:

  • Is there any evidence for a genetic contribution to disease?
  • Can we localise the underlying genetic contributors (to a chromosomal location or genomic region)?

Only in certain situations do pedigrees provide enough information to answer: can we identify the genetic cause of disease (the gene involved and the genetic mutation/variant)? So other techniques are required, such as study of model organisms (mouse, zebrafish etc.) through breeding experiments or CRISPR/Cas9 gene knockouts, or functional experiments in different human or animal cell types.


Segregation Analysis

Can be used to fit a mathematical model to the pattern of inheritance seen in a family:

  • To estimate the parameters (gene frequencies, shared environmental effects etc.) that best explain the inheritance pattern seen
  • Allows fitting of more complicated oligogenic models (rather than just dominant or recessive) including one or more major genes, possibly operating against a polygenic background (lots of minor genetic effects), possibly subject to environmental effects

However in practice still not that useful for complex disease


Recurrence Risk

Another method to assess genetic contribution to a given disease

Denoted $K_R$ and defined as the probability of developing the disease for specified "type R" relatives of an affected proband.

R takes different values according to the relationship we are interested in e.g.

  • $K_S$ is the risk of disease for a sibling of an affected individual
  • $K_O$ is the risk for an offspring of an affected individual
  • $K_{MZ}$ is the risk for a monozygotic twin of an affected individual

Recurrence risk is often expressed relative to the background population risk $K$:

$\lambda_R = K_R / K$

$\lambda_R$ is known as the recurrence risk ratio (for relationship type R) e.g. the sibling relative risk, $\lambda_S$. If $\lambda_S$ for a given disease is 3, then $K_S / K = 3$, so a sibling of an affected person has 3 times the population risk. The assumption is that this increase is due to shared genetic factors, but note that siblings also share environmental factors, which may contribute to the increase.
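A minimal Python sketch of estimating $K_S$ and $\lambda_S$ from counts, assuming we know how many siblings of affected probands were examined and how many of them are affected. The counts and the 1% population risk are hypothetical, chosen so that $\lambda_S = 3$ as in the example.

```python
def recurrence_risk(affected_relatives: int, total_relatives: int) -> float:
    """K_R: proportion of 'type R' relatives of affected probands who are affected."""
    return affected_relatives / total_relatives


def recurrence_risk_ratio(k_relative: float, k_population: float) -> float:
    """lambda_R = K_R / K."""
    return k_relative / k_population


k_population = 0.01                  # hypothetical population risk K
k_sibling = recurrence_risk(9, 300)  # hypothetical: 9 of 300 siblings affected
print(k_sibling, recurrence_risk_ratio(k_sibling, k_population))  # K_S = 0.03, lambda_S about 3
```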


Parametric Linkage Analysis

Traditionally genetic determinants of disease have been identified using parametric linkage analysis.

  • Find a set of large families (pedigrees), each containing a number of affected individuals
  • Obtain DNA from all or a subset of individuals via blood or saliva
  • Use a genotyping technique to measure alleles in each individual at one or more loci
  • Fit a mathematical model of the co-segregation (co-transmission) of the disease phenotype and alleles at the genetic marker loci

We are not testing the hypothesis that a particular locus causes disease, but the hypothesis that a particular genotyped locus is linked to (lies close to) a locus that causes the disease.

The locus being tested has to be genotyped in a set of related individuals. In general, the alleles at the disease locus are not measured, but some information about the disease alleles present in each person is gained from measuring the disease phenotype.


Parametric Linkage Analysis cont.

E.g. if the disease is recessive, then all affected individuals will have 2 copies of the disease allele, so their parents must be carriers. This information can be used to investigate the co-transmission of marker and disease alleles in the pedigree.

Highly successful for Mendelian disorders (e.g. Huntington's) but not for complex disorders.

Calculate the likelihood L(θ) = the probability of obtaining the observed (genotype and phenotype) data

  • as a function of the recombination fraction, θ
  • under the assumption that the disease is caused by a disease locus situated at recombination fraction θ from a genotyped marker locus

We compare evidence for linkage (θ < 0.5) against the null hypothesis that the two loci are unlinked (θ = 0.5) using the likelihood ratio: $LR_{\max} = L(\hat{\theta}) / L(0.5)$

  • where $\hat{\theta}$ is the value of θ that maximises the likelihood (makes the observed data 'most likely' to have occurred)
  • test the null hypothesis that θ = 0.5

Association Analysis

Aim is to directly examine the association (correlation) between alleles present at a genetic locus and a phenotype of interest.

  • Could indicate a direct causal relationship
  • Allows investigation of mechanisms and pathways in disease progression
  • Or could indicate indirect relationship due to correlation between the test variant and the causal variant known as linkage disequilibrium.
  • Can help us localize causal variant.

Most popular design is a case/control study (unrelated people)

  • Collect sample of affected individuals (cases) and unaffected individuals (controls)
  • Or a random population sample as controls - most will not have disease
  • Examine the association between alleles present at a genetic locus and presence/ absence of disease by comparing the distribution of genotypes seen in affected individuals with that seen in controls

Controls can be found from birth cohorts, population-based cohorts such as UK Biobank, blood donors or 'bring a friend/family member' designs. See case/control card


HapMap and 1000G

Large scale projects designed to investigate human genetic variation (and genetic correlation patterns) worldwide


Phase I:

  • YRI: Yoruba people in Ibadan, Nigeria (30 parent-and-adult-child trios)
  • JPT: Japanese in Tokyo (45 unrelated individuals)
  • CHB: Han Chinese in Beijing (45 unrelated individuals)
  • CEU: Utah residents of north/west European ancestry (30 trios).

Later phases: expanded to additional populations, followed by the 1000 Genomes (1000G) project


Linkage and Recombination

Genetic distance is measured in Morgans (M) or centimorgans (cM)

  • depends on the likelihood of recombination between alleles at two loci
  • related to the physical distance between the loci

θ represents the probability of recombination between the loci

  • θ ranges from 0 to 0.5
  • if the loci lie close together on the same chromosome, θ is small (≈ 0) and the loci are said to be (tightly) linked
  • if the loci are farther apart θ approaches 0.5, loci are said to be unlinked
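The notes don't say how θ is calculated from map distance. One standard textbook model is Haldane's map function, θ = ½(1 − e^(−2d)) with d in Morgans, which assumes crossovers occur independently (no interference); other map functions exist. A minimal Python sketch shows θ ≈ d for nearby loci and θ levelling off at 0.5 for distant ones.

```python
import math


def haldane_theta(distance_morgans: float) -> float:
    """Recombination fraction from map distance (Haldane: no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haldane_distance(theta: float) -> float:
    """Inverse: map distance in Morgans from a recombination fraction in [0, 0.5)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


for cm in (1, 10, 50, 100, 500):
    print(f"{cm:>4} cM -> theta = {haldane_theta(cm / 100):.4f}")
```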

Likelihood Ratio Test

Use a computer program to calculate the probability of the observed genotype and phenotype data in large families.

  • under the assumption that the disease is caused by a disease locus close to the genotyped marker locus
  • under an assumed mode of inheritance (e.g. dominant/recessive)
  • how likely the data are depends on how well the observations match the assumed model e.g. do the alleles at the marker locus segregate down the pedigree in the expected way?

Test for linkage: likelihood ratio test (LOD score)

  • tests the null hypothesis that the disease locus lies far away from the genotyped marker locus
  • hope to find evidence against the null hypothesis
  • i.e. reject the null and conclude that the disease locus lies close to the genotyped marker locus

Non-Parametric Linkage Analysis

Non-parametric linkage analysis (e.g. affected sibling pair studies) uses a similar approach and tries to determine whether members of a family with similar trait values (e.g. both affected with disease) tend to inherit genetic material in common from their common ancestors more often than expected by chance. Only useful on occasion.

Success examples:

  • Type 1 diabetes - confirmed roles of HLA and insulin genes
  • Crohn's disease - NOD2/CARD15 genes implicated
  • Age-related macular degeneration - complement factor H gene identified through a combination of approaches e.g. follow-up of significant regions from a non-parametric linkage scan

In general, linkage analysis (both parametric and non-parametric) has lacked success for complex disease

Risch and Merikangas (Science 273:1516-1517, 1996): genes of small effect lead to only small increases in sharing of genetic material by affected relatives, so association analysis may have greater power.


LOD Score

The LOD score corresponds to the log base 10 of the likelihood ratio

$\mathrm{LOD} = \log_{10}\left[ L(\hat{\theta}) / L(0.5) \right]$

Evidence for linkage is usually taken as a LOD of 3, which corresponds to a likelihood ratio of 1000 i.e. the data are 1000 times more likely under the alternative hypothesis than under the null hypothesis

To maximise the LOD score, we calculate the likelihood (or likelihood ratio) at different values of θ to create a curve and find the θ at which it peaks
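A minimal Python sketch of computing a LOD curve and finding θ̂, assuming the simplest possible case: N fully informative, phase-known meioses in which R recombinants can be counted directly, so that L(θ) ∝ θ^R (1 − θ)^(N − R). Real pedigrees with unknown phase or missing genotypes need dedicated linkage software; the counts (2 recombinants in 20 meioses) are hypothetical.

```python
import math


def lod(theta: float, n_meioses: int, n_recombinants: int) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    r = n_recombinants
    non_r = n_meioses - n_recombinants
    if theta == 0.0 and r > 0:
        return float("-inf")  # any recombinant is impossible if theta = 0
    log_l = (r * math.log10(theta) if r else 0.0) + non_r * math.log10(1.0 - theta)
    log_l_null = n_meioses * math.log10(0.5)  # unlinked: every meiosis has p = 0.5
    return log_l - log_l_null


def max_lod(n_meioses: int, n_recombinants: int, steps: int = 500) -> tuple[float, float]:
    """Evaluate the LOD curve on a grid of theta in [0, 0.5] and return its peak."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    theta_hat = max(grid, key=lambda t: lod(t, n_meioses, n_recombinants))
    return theta_hat, lod(theta_hat, n_meioses, n_recombinants)


theta_hat, best = max_lod(n_meioses=20, n_recombinants=2)
print(f"theta_hat = {theta_hat:.3f}, max LOD = {best:.2f}")
print("evidence for linkage" if best >= 3 else "no significant evidence")
```

With these counts the curve peaks at θ̂ = R/N = 0.1 with a maximum LOD of about 3.2, just above the conventional threshold of 3.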


Multi-point Analysis

Can use a set of marker loci with known genetic map positions to assess the evidence for the disease locus lying at different positions along the genetic map

  • means that you can use information from all the marker loci together rather than just one locus at a time
  • calculate the LOD score with the disease locus placed at different positions along the marker map

At each position the LOD score corresponds to the log10 of:

  • the likelihood of the data assuming the disease locus lies at that position
  • divided by the likelihood of the data assuming the disease locus lies far away

Computer programs to carry out the analysis:

  • Merlin (smallish pedigrees, exact calculation)
  • SIMWALK or MORGAN (larger pedigrees, approximate calculation using Markov chain Monte Carlo (MCMC) techniques)

Data Files

Analysis programs (e.g. PLINK) can take the data as two files:

1. .tfam file - no genotype data

  • each row is a different person
  • columns from left: family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female), phenotype (2 = affected, 1 = unaffected)

2. .tped file

  • each row is a different SNP
  • columns from left: chromosome number, SNP name, genetic distance, base-pair position of the SNP, then the genotypes as pairs of alleles, one pair per person in the same order as the .tfam file (e.g. if the first two allele columns are A A, the first person's genotype at that locus is AA)
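A minimal Python sketch of reading one line of each file, assuming whitespace-separated columns laid out as in the .tfam/.tped description. The example IDs, SNP name and position are hypothetical, and real files also use codes for missing data (e.g. 0) that this sketch simply passes through.

```python
def parse_tfam_line(line: str) -> dict:
    """One person: family ID, individual ID, father ID, mother ID, sex, phenotype."""
    family, individual, father, mother, sex, phenotype = line.split()[:6]
    return {
        "family": family,
        "individual": individual,
        "father": father,
        "mother": mother,
        "sex": int(sex),               # 1 = male, 2 = female
        "affected": phenotype == "2",  # 2 = affected, 1 = unaffected
    }


def parse_tped_line(line: str) -> dict:
    """One SNP: 4 description columns, then two allele columns per person."""
    fields = line.split()
    chromosome, snp, genetic_distance, position = fields[:4]
    alleles = fields[4:]
    if len(alleles) % 2 != 0:
        raise ValueError("allele columns must come in pairs (one pair per person)")
    # Pair up alleles: person k's genotype is alleles[2k] + alleles[2k + 1]
    genotypes = [alleles[i] + alleles[i + 1] for i in range(0, len(alleles), 2)]
    return {
        "chromosome": chromosome,
        "snp": snp,
        "genetic_distance": float(genetic_distance),
        "position": int(position),
        "genotypes": genotypes,  # same person order as the .tfam file
    }


person = parse_tfam_line("FAM1 IND3 IND1 IND2 1 2")
snp = parse_tped_line("1 rs0001 0 10500 A A A B B B")
print(person["affected"], snp["snp"], snp["genotypes"])  # True rs0001 ['AA', 'AB', 'BB']
```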

Case/ Control studies

Each person can have one of 3 possible genotypes at a diallelic genetic locus. Test for association (correlation) between genotype and presence/absence of disease using a standard χ² test for independence on 2 df.

Defined as $\chi^2 = \sum_{i=1}^{6} (O_i - E_i)^2 / E_i$, where $O_i$ and $E_i$ are the observed and expected counts (expected counts calculated from the row and column totals) in each of the 6 cells of the table. Generates a p value indicating how significant the association/correlation appears to be

Two odds ratios can be estimated: OR (2|2 : 1|1) = af/be and OR (1|2 : 1|1) = cf/de (see lecture 5 for table). Odds of disease are defined as P(diseased)/P(not diseased). OR (2|2 : 1|1) represents the factor by which your odds of disease must be multiplied if you have genotype 2|2 as opposed to 1|1 i.e. the 'effect' of genotype 2|2.

Similarly, OR (1|2 : 1|1) is the factor by which your odds of disease must be multiplied if you have genotype 1|2 as opposed to 1|1 i.e. the 'effect' of genotype 1|2. ORs are closely related to (often ≈) the genotype relative risks. If your genotype has no effect on your probability (and therefore on your odds) of disease, then the ORs = 1, so the association test can be thought of as a test of the null hypothesis that the ORs = 1
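A minimal Python sketch of the 2-df genotype test and both odds ratios. The layout of the lecture 5 table is assumed here, not copied: rows 2|2, 1|2, 1|1 with case counts a, c, e and control counts b, d, f, which is the arrangement that makes af/be and cf/de the ORs against 1|1. It relies on the fact that a χ² distribution on 2 df has upper-tail probability exp(−x/2); the counts are hypothetical.

```python
import math


def genotype_association(a: int, b: int, c: int, d: int, e: int, f: int):
    """Chi-square test (2 df) and ORs for a 3 x 2 genotype-by-status table.

    Assumed layout (cases, controls): 2|2 -> (a, b), 1|2 -> (c, d), 1|1 -> (e, f).
    """
    observed = [[a, b], [c, d], [e, f]]
    row_totals = [sum(row) for row in observed]
    col_totals = [a + c + e, b + d + f]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # from row and column totals
            chi2 += (o - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2)  # exact upper-tail probability for 2 df
    or_22_vs_11 = (a * f) / (b * e)
    or_12_vs_11 = (c * f) / (d * e)
    return chi2, p_value, or_22_vs_11, or_12_vs_11


# Hypothetical study: 200 cases and 200 controls
chi2, p, or22, or12 = genotype_association(a=40, b=20, c=80, d=80, e=80, f=100)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, OR(2|2:1|1) = {or22:.2f}, OR(1|2:1|1) = {or12:.2f}")
```

With these counts χ² ≈ 8.89 (p ≈ 0.012), OR (2|2 : 1|1) = 2.5 and OR (1|2 : 1|1) = 1.25.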


Testing for Association

There are more sophisticated ways to perform the association test.

  • Rearrange your data to test specifically for dominant or recessive effects (or some other specific penetrance model e.g. allelic effects)
  • Use linear regression for quantitative outcomes (blood pressure, weight, height etc.), with a predictor (x) variable defined according to genotype (similarly, logistic regression can be used for case/control data).
  • Use family-based association tests (FBATs) such as the transmission disequilibrium test (TDT) or linear mixed models (LMMs), for analysing family-based data
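For the regression approach, a common choice of x (an assumption here, since the notes don't specify the coding) is additive coding: x = number of copies of allele 2 (0, 1 or 2). A minimal Python sketch fitting the least-squares line by hand for a quantitative outcome; the genotypes and weights are hypothetical, and obtaining a p value for the slope would need a full regression routine, which is not shown.

```python
def additive_genotype_code(genotype: str) -> int:
    """Additive coding: count copies of allele 2 in a genotype such as "1|2"."""
    return genotype.split("|").count("2")


def least_squares_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


genotypes = ["1|1", "1|1", "1|2", "1|2", "2|2", "2|2"]
weight_kg = [70, 72, 73, 75, 76, 78]  # hypothetical quantitative phenotype
x = [additive_genotype_code(g) for g in genotypes]
slope, intercept = least_squares_fit(x, weight_kg)
print(f"each copy of allele 2 adds about {slope:.1f} kg (intercept {intercept:.1f} kg)")
```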

All methods produce a test statistic and a p value indicating how significant the association/correlation between genotype and phenotype appears to be i.e. how likely it was to have occurred by chance.


GWAS - Quality Control

Stringent QC (quality control) checks are required.

Discard samples (people) deemed unreliable

  • Low genotype call rates, excess heterozygosity etc.
  • X chromosome markers useful for checking sex
    • Males should ‘appear’ homozygous at all X markers
  • Genome-wide SNP data useful for checking relationships and ethnicity

Discard data from SNPs deemed unreliable

  • On basis of genotype call rates, Mendelian misinheritances, Hardy-Weinberg disequilibrium
  • Exclude SNPs with low minor allele frequency (MAF)
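One common form of the Hardy-Weinberg check is a 1-df χ² goodness-of-fit test comparing observed genotype counts with those expected from the allele frequencies (p², 2pq, q²). A minimal Python sketch, assuming a biallelic SNP with counts for the three genotypes; the counts are hypothetical, and the notes give no p-value cut-off for excluding a SNP, so any threshold applied is your own choice. For 1 df, the upper-tail probability equals erfc(√(x/2)).

```python
import math


def hardy_weinberg_test(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium at one biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("SNP is monomorphic in this sample; test undefined")
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square on 1 df
    return chi2, p_value


print(hardy_weinberg_test(25, 50, 25))  # matches p^2 : 2pq : q^2 exactly
print(hardy_weinberg_test(40, 20, 40))  # too few heterozygotes: tiny p value
```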

Asthma Intro


  • common inflammatory disease caused by genetic and environmental factors
  • Airway narrowing and excess mucus cause breathing difficulties
  • symptoms: wheezing, chest tightness, shortness of breath and coughing
  • mild to severe, and symptoms can come and go
  • can result in asthma attacks
  • Affects all ages but usually starts in childhood
  • most common chronic disease in children
43 of 49

Asthma Causes


  • exact cause unknown
  • thought to result from a strong immune response to an allergen in the environment, e.g. ragweed allergen; some people do not react while others react strongly, which suggests a genetic explanation
  • environmental factors may affect the risk of asthma, such as airborne allergens and viral infections in infancy, when the immune system is still developing

Race: African Americans and Puerto Ricans are at higher risk of asthma than other races

Gender: In children, more boys than girls are affected; in adults, more women than men

44 of 49

Asthma Factors

Environmental Factors

  • Exposure to cigarette smoke during pregnancy or during a child's first few years - affects lung growth and development
  • Exposure to different microbes in the environment - affects development of the immune system - can either increase or protect against the risk of developing asthma
  • Workplace exposures - chemical irritants, industrial dusts = occupational asthma
  • Poor air quality from pollution or allergens may worsen asthma - gases from heaters or vehicles, pollen, dust etc.

Other Medical Conditions: Allergies, Obesity, Respiratory infections and wheezing

Genetic Contribution: indicated by clustering of asthma and allergy symptoms among relatives and by twin studies, which estimate a genetic contribution of 55-90%
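Twin studies estimate the genetic contribution by comparing how similar identical (MZ) and non-identical (DZ) twins are. One classical, simple estimator is Falconer's formula, h² = 2(r_MZ − r_DZ). The published asthma estimates may come from more elaborate twin models, so this is only a sketch of the idea, and the correlations below are invented.

```python
def falconer_estimates(r_mz, r_dz):
    """Classical Falconer decomposition from twin correlations.

    Returns (h2, c2, e2): genetic, shared-environment and unique-environment shares.
    """
    h2 = 2 * (r_mz - r_dz)   # MZ twins share ~2x the segregating genes of DZ twins
    c2 = 2 * r_dz - r_mz     # similarity not explained by genes
    e2 = 1 - r_mz            # whatever makes MZ twins differ
    return h2, c2, e2

# Invented twin correlations, for illustration only
print(falconer_estimates(0.75, 0.40))  # h2 ≈ 0.7, c2 ≈ 0.05, e2 ≈ 0.25
```

The three shares sum to 1 by construction. The formula assumes MZ and DZ twins share their environment equally, which is one reason such estimates span a wide range.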

45 of 49

Asthma Candidate Gene Analysis

Candidate Gene Analysis

  • 2017 study BA Almomani et al
  • 10 candidate genes, e.g. STIP1, ALOX5, ABCC1, assessed for association with asthma in a Jordanian population of Arab descent
  • Case-control study - 245 adult asthmatics and 249 controls
  • 1 significant genetic association, in STIP1: the rs2236647 (T/C) SNP
  • The C allele and CC genotype of this SNP were significantly more frequent in asthmatics than in controls - could be a way to identify risk of developing asthma and provide early intervention in populations of Arab descent
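Comparing genotype frequencies (TT, TC, CC) between cases and controls is typically done with a 2x3 chi-square test with 2 degrees of freedom, alongside allele frequencies. This is a minimal sketch of that style of test. The counts are invented and are not the study's data, and the function names are hypothetical. For 2 df, the upper-tail probability has the closed form exp(-x / 2).

```python
import math

def genotypic_chi_square(case_counts, control_counts):
    """Pearson chi-square on a 2x3 table of (TT, TC, CC) counts; 2 df.

    Returns (statistic, p_value).
    """
    table = [case_counts, control_counts]
    total = sum(case_counts) + sum(control_counts)
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(case_counts, control_counts)]
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 2 df is exactly exp(-x / 2)
    return stat, math.exp(-stat / 2)


def c_allele_frequency(counts):
    """Frequency of the C allele from (TT, TC, CC) genotype counts."""
    tt, tc, cc = counts
    return (tc + 2 * cc) / (2 * (tt + tc + cc))

# Invented counts, with CC more common in cases
cases, controls = (40, 100, 60), (60, 100, 40)
print(genotypic_chi_square(cases, controls))
print(c_allele_frequency(cases), c_allele_frequency(controls))
```

With these invented counts, the C allele frequency is 0.55 in cases vs 0.45 in controls, and the genotypic test gives a statistic of 8 (p ≈ 0.018).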

In another case-control study on the effects of corticosteroids in asthma (2019, M Salhi), two STIP1 SNPs, including the one above, were concluded to be potential risk biomarkers in a Tunisian population, so they could also be biomarkers in other populations.

46 of 49

Asthma GWAS


  • 2011 R Anantharaman
  • Aim: Identify genetic variants that influence predisposition towards asthma in an ethnic Chinese population in Singapore
  • 2-stage GWAS on allergic asthma cases and controls without asthma or atopy
  • 1st stage: 490 cases and 490 controls genotyped
  • 2nd stage: significant associations from the 1st stage were analysed in the 2nd stage to see if they could be replicated - 521 cases and 524 controls
  • 19 promising SNPs that passed the genome-wide P value threshold of 5.52 × 10^-8 were genotyped
  • SNP rs2941504 in PERLD1 on chromosome 17q12 was found to be significantly associated with asthma at both the genotypic and allelic level
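Genome-wide thresholds are so strict because hundreds of thousands of SNPs are tested at once. A common way to set one is a Bonferroni correction, α divided by the number of tests. The widely used 5 × 10^-8 corresponds to 0.05 / 1,000,000. For comparison only, 0.05 / 5.52 × 10^-8 ≈ 906,000 tests; these notes do not say how the study's threshold was derived. This is a minimal sketch, and the SNP names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold keeping the family-wise error rate at or below alpha."""
    return alpha / n_tests


def genome_wide_hits(p_values, threshold=5e-8):
    """Keep SNPs whose p value falls below the genome-wide threshold."""
    return {snp: p for snp, p in p_values.items() if p < threshold}

# Hypothetical p values for three SNPs
print(bonferroni_threshold(0.05, 1_000_000))
print(genome_wide_hits({"snp_A": 1e-9, "snp_B": 3e-6, "snp_C": 0.02}))  # {'snp_A': 1e-09}
```

SNPs that only approach the threshold in a first stage are the natural candidates to carry forward for replication in a second sample.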
47 of 49

Asthma GWAS cont.

  • These findings were replicated in 3 other asthma GWAS studies - validation
  • Genotyping of additional SNPs within 100 kb flanking the SNP further confirmed that the association was with PERLD1
  • PERLD1 is involved in the modification of the glycosylphosphatidylinositol anchors for cell surface markers such as CD48 and CD69 which are known to play roles in T-cell activation and proliferation
  • Conclusion: PERLD1 is a novel asthma candidate gene 
48 of 49

Asthma Early Life Exposure to Smoking

  • 2019 PE Sugier et al
  • Aim: Identify new loci interacting with early life tobacco exposure on time to asthma onset in childhood
  • Genome-wide interaction analyses in 5 European-ancestry studies (8273 subjects) using a Cox proportional-hazards model; results were meta-analysed
  • rs7334050 SNP in KLHL1 was consistent across all 5 studies, P = 4.3 × 10^-8
  • Suggestive interactions were found at 3 other loci within MACROD2.
  • Functional annotations and the literature showed that the lead SNPs at these 4 loci influence DNA methylation in the blood and are located near CpG sites reported to be associated with exposure to tobacco smoke components, which strongly supports the findings
  • Conclusions: Identified novel candidate genes interacting with early life tobacco exposure on time to asthma onset in childhood. The genes have plausible biological relevance related to tobacco smoke exposure. Further epigenetic and functional studies are needed to confirm these findings and determine underlying mechanisms
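Each study's Cox model yields an interaction coefficient (a log hazard ratio for SNP × early-life tobacco exposure) with a standard error, and these are then pooled. These notes do not say which meta-analysis method was used. The sketch below assumes a standard inverse-variance fixed-effect approach, and the per-study numbers are invented.

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    estimates: per-study effects on the log scale (e.g. log hazard ratios for
    the SNP x exposure interaction term).
    Returns (pooled estimate, pooled SE, z, two-sided p).
    """
    weights = [1 / se ** 2 for se in standard_errors]   # more precise studies count more
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))                # two-sided normal p value
    return pooled, pooled_se, z, p

# Invented interaction estimates from 5 hypothetical studies
betas = [0.30, 0.25, 0.35, 0.20, 0.28]
ses = [0.10, 0.12, 0.15, 0.09, 0.11]
print(fixed_effect_meta(betas, ses))
```

Pooling gives a smaller standard error than any single study, which is why a consistent but modest effect across 5 studies can reach a p value near the genome-wide threshold.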
49 of 49