A systematic and comprehensive approach for large-scale genome-wide association studies. Unraveling non-additive inheritance models in age-related diseases

Author

Guindo Martínez, Marta

Director

Torrents Arenales, David

Mercader Bigas, Josep Maria

Tutor

Gelpi Buchaca, Josep Lluís

Date of defense

2019-12-18

Pages

267 p.



Department/Institute

Universitat de Barcelona. Facultat de Biologia

Abstract

Genome-wide association studies (GWAS) have proven useful for identifying thousands of associations between genetic variants and human complex diseases and traits. However, the identified loci account for only a small proportion of the estimated heritability (i.e., the proportion of variance for a particular phenotype that can be explained by genetic factors). The usually small effect size of common variants and the low frequencies of some variants with potentially larger effect sizes limit the statistical power of GWAS. These limitations can be addressed by analyzing larger sample sizes and imputing genotypes using dense reference panels. However, there is still room for improvement beyond increasing the sample size and the number of variants. Because current GWAS focus predominantly on the autosomes and test only the additive model, current strategies still constrain the full potential of GWAS. In this thesis, we hypothesized that a comprehensive analysis improving on current GWAS strategies by 1) analyzing the X chromosome alongside the autosomes, 2) including genetic variants from a broader allele frequency spectrum and of more types, such as small insertions and deletions (INDELs), through genotype imputation using multiple reference panels, and 3) testing different models of inheritance in the association test, would improve our understanding of the genetic architecture of complex diseases. To test these hypotheses, we developed an integrated framework implementing our methodology, called GUIDANCE.
GUIDANCE integrates state-of-the-art tools for GWAS analysis, including analysis of the X chromosome, a two-step imputation with multiple reference panels, association testing under additive, dominant, recessive, heterodominant and genotypic inheritance models, and cross-phenotype association analysis when more than one disease is available in the cohort under study. We used GUIDANCE to analyze the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort, a publicly available cohort that includes 62,281 subjects of European ancestry with an average age of 63 years and data on 22 diseases, representing the largest cohort for age-related diseases to date. After quality control, we analyzed 56,637 subjects from populations of European descent. Following our methodology, we imputed genotypes using 1000 Genomes Project (1000G) phase 3, the Genome of the Netherlands project (GoNL), the UK10K project, and the Haplotype Reference Consortium (HRC) as reference panels. Using this strategy, we identified 26 new associated loci for 16 phenotypes (p < 5 × 10⁻⁸), 13 of which showed significant dominance deviation (p < 0.05). Importantly, we identified three recessive loci with large effects that could not have been identified with the additive model. These include a region led by an INDEL associated with cardiovascular disease in CACNB4 (rs201654520, minor allele frequency [MAF] = 0.017, odds ratio [OR] = 19.02, p = 4.32 × 10⁻⁸), a locus near PELO associated with type 2 diabetes with the greatest odds ratio for type 2 diabetes in Europeans reported to date (rs77704739, MAF = 0.036, OR = 4.32, p = 1.75 × 10⁻⁸), and a rare INDEL associated with age-related macular degeneration near THUMPD2 (rs557998486, MAF = 0.009, OR = 10.5, p = 2.75 × 10⁻⁸).
Despite the phenotype discrepancies and different demographic characteristics of the GERA cohort and UK Biobank, four of the novel loci were replicated with an equivalent phenotype in UK Biobank, and we found additional supporting associations in related traits, treatments or biomarkers in UK Biobank for the remaining novel loci. Of note, the PELO and THUMPD2 recessive loci, which could not have been found with the additive model, were replicated using the recessive model in UK Biobank (combined results: PELO, rs77704739, OR = 2.46, p = 4.68 × 10⁻¹¹, and THUMPD2, rs557998486, OR = 26.51, p = 3.29 × 10⁻⁸). Overall, these results highlight the importance of performing a comprehensive analysis of the full spectrum of genetic variation and considering non-additive models when performing GWAS, especially with well-powered biobanks and the increasing ability to impute low-frequency variants. For the benefit of the research community, we make available both GUIDANCE, to boost the analysis of existing and ongoing GWAS projects, and the GERA cohort results, which constitute the largest non-additive genetic variation association database to date, through the Type 2 Diabetes Knowledge Portal (http://www.type2diabetesgenetics.org).
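
The inheritance models named in the abstract differ only in how a genotype (the number of minor alleles a subject carries: 0, 1 or 2) is turned into a predictor for the association test. The sketch below shows the conventional textbook codings for illustration. It is not taken from GUIDANCE, and the names `ENCODINGS`, `encode` and `encode_genotypic` are hypothetical; the coding actually used by the tools GUIDANCE integrates may differ in detail.

```python
from typing import Dict, List, Tuple

# Conventional single-predictor codings (an assumption, not GUIDANCE's own code).
# Keys are minor-allele counts: 0 = homozygous major, 1 = heterozygous,
# 2 = homozygous minor.
ENCODINGS: Dict[str, Dict[int, int]] = {
    "additive":       {0: 0, 1: 1, 2: 2},  # effect scales with allele count
    "dominant":       {0: 0, 1: 1, 2: 1},  # one copy is enough
    "recessive":      {0: 0, 1: 0, 2: 1},  # two copies are required
    "heterodominant": {0: 0, 1: 1, 2: 0},  # heterozygotes differ from both homozygotes
}


def encode(genotypes: List[int], model: str) -> List[int]:
    """Map minor-allele counts to the predictor used under `model`."""
    table = ENCODINGS[model]
    return [table[g] for g in genotypes]


def encode_genotypic(genotypes: List[int]) -> List[Tuple[int, int]]:
    """Genotypic (general) model: two indicator columns, so each genotype
    class gets its own effect (a 2-degree-of-freedom test)."""
    return [(int(g == 1), int(g == 2)) for g in genotypes]


if __name__ == "__main__":
    sample = [0, 1, 2, 1, 0]
    for model in ENCODINGS:
        print(model, encode(sample, model))
    print("genotypic", encode_genotypic(sample))
```

Under the recessive coding, heterozygotes are grouped with non-carriers, which is why an effect confined to subjects with two copies of a low-frequency allele can be diluted under the additive coding.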

Keywords

Genetics; Genetic epidemiology; Bioinformatics

Subjects

577 - Biochemistry. Molecular biology. Biophysics

Knowledge Area

Experimental Sciences and Mathematics

Note

Doctoral Programme in Biomedicine / Thesis carried out at the Barcelona Supercomputing Center (BSC)

Documents

MGM_PhD-THESIS.pdf

44.46Mb

 

Rights

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/4.0/
