dc.description.abstract
[eng] In recent years, the genetics field has placed a significant emphasis on
identifying and characterizing genetic factors contributing to complex diseases,
alongside environmental factors. Genome-wide association studies (GWAS)
have emerged as one of the principal methodologies for this purpose, as they
analyze extensive genetic and phenotypic data from multiple individuals to
identify genetic variations associated with specific traits. This approach has
advanced our understanding of the genetic architecture of complex diseases,
allowing the development of prevention strategies and genetic risk estimation.
However, despite progress, much information remains to be uncovered, leading
to a heritability discrepancy, which refers to the difference between heritability
estimated in population studies and that explained by known genetic
variations.
Many methodological and statistical limitations are slowing down the
identification of the genetic variation associated with the risk of developing
complex diseases. Current GWAS rely on single nucleotide polymorphism
(SNP) arrays that cover a limited number of variants. To overcome this, the
number of variants analyzed can be increased by imputing known genetic
variants from reference panels. However, reference panels frequently
exclude rare variants and structural variants (SVs), so these variants are
not considered in the imputation process, which can lead to missed
associations.
Another element neglected in most studies of complex diseases is the X
chromosome, which is one of the two sex chromosomes and has unique biology
that results in different copy numbers in females and males. When examining
the SNP-trait associations reported in the National Human Genome Research
Institute's (NHGRI) GWAS catalog, a clear shortfall in the representation of the
X chromosome becomes apparent: only 0.5% of the known associations
map to chromosome X. This under-representation is primarily due to the
methodological challenges associated with its analysis. The unique pattern of
inheritance and the effects of allelic inactivation in females can result in allelic
imbalances between the sexes and decrease the statistical power during
genetic association studies.
In this thesis, we aim to address these challenges in two ways: by creating a
comprehensive genetic resource, a haplotype map particularly enriched in
well-characterized, phased SVs; and by closing the gap in X-chromosome
analysis through the design, implementation and application of a targeted
methodology for studying the role of the X chromosome across multiple
phenotypes.
The haplotype map was generated from 785 Illumina high-coverage (30x)
whole genomes from the Iberian GCAT Cohort, using multiple variant
identification methods and Logistic Regression Models (LRMs) to validate the
resulting calls. The catalog comprises 35,431,441 variants, including 89,178
SVs (≥50 bp), 30,325,064 SNVs and 5,017,199 indels, across all individuals in
the cohort. The haplotype panel demonstrates improved imputation
capabilities, with 14,360,728 SNVs/indels and 23,179 SVs being imputed,
representing a 2.7-fold increase in SVs compared to other available genetic
variation panels. This panel's significance is highlighted by the imputation of a
rare Alu element located in a new locus associated with Mononeuritis of the
lower limb, a rare neuromuscular disease. This study represents the first in-
depth characterization of genetic variation in the Iberian population and the
first haplotype panel that systematically includes SVs in genome-wide genetic
studies.
The X-chromosome-targeted strategy was designed and applied to nearly
800,000 individuals across 600 phenotypes from publicly available cohorts (UK
Biobank and dbGaP). The pipeline comprises data collection, a dedicated
quality control step that is essential for X-chromosome analysis, and phasing,
imputation and association testing, which were performed separately in
females and males before meta-analyzing the results, thus allowing the
detection of sex differences.
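The sex-stratified step can be illustrated with a minimal sketch. The abstract does not state which meta-analysis model was used, so the code below assumes a fixed-effect inverse-variance scheme and independent female and male samples; the function names and the example effect sizes are hypothetical and are not taken from the thesis.

```python
import math


def inverse_variance_meta(beta_f, se_f, beta_m, se_m):
    """Combine female and male effect estimates (assumed fixed-effect model)."""
    # Each stratum is weighted by the inverse of its sampling variance.
    w_f = 1.0 / se_f ** 2
    w_m = 1.0 / se_m ** 2
    beta = (w_f * beta_f + w_m * beta_m) / (w_f + w_m)
    se = math.sqrt(1.0 / (w_f + w_m))
    return beta, se


def sex_difference_z(beta_f, se_f, beta_m, se_m):
    """Z statistic for a difference in effect between sexes.

    Assumes the female and male estimates come from independent samples.
    """
    return (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)


def two_sided_p(z):
    """Two-sided p-value of a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical per-sex summary statistics for a single variant.
    beta_f, se_f = 0.30, 0.10
    beta_m, se_m = 0.10, 0.10
    beta, se = inverse_variance_meta(beta_f, se_f, beta_m, se_m)
    z_diff = sex_difference_z(beta_f, se_f, beta_m, se_m)
    print(f"combined beta={beta:.3f}, se={se:.3f}")
    print(f"sex-difference z={z_diff:.3f}, p={two_sided_p(z_diff):.3f}")
```

Under these assumptions, a variant can show a significant combined effect, a significant difference between sexes, or both, which is why the two statistics are computed separately.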
Our analysis of nearly 500,000 X-linked variants, including SVs, resulted in 96
significant associations with 77 traits, with 75 of these being novel. By
incorporating sex-specific analyses, we identified 41 loci that behave differently
between males and females. These findings give us insight into the level of
missing information and the X chromosome's potential role in complex
diseases, as well as its contribution to sex-specific risk and manifestation.
In conclusion, this work highlights the importance of considering SVs and the
X chromosome in genetic studies, particularly in the context of exploring the
genetic architecture of human complex diseases. The findings offer a valuable
asset for further examination of the genetic components that contribute to
complex diseases, marking a progression towards a more complete
comprehension of the genetic landscape and its effects on human health.