Universitat de Barcelona. Facultat de Biologia
One of the major and most challenging goals of Biomedicine during the last centuries has been the study of human biological mechanisms and their relation to traits and diseases. In the case of complex diseases, such as Type 2 Diabetes (T2D), asthma or Alzheimer's disease, special interest has been devoted to understanding the underlying molecular mechanisms that affect their development, and the biological processes involved in their transmission across generations (genetic basis). During the last decades, advances in computing and the development of new DNA-related technologies have greatly accelerated the development of methods, tools, and resources that have enhanced the genetic study of traits and diseases. As a result of this revolution, new specialised fields such as Biomedicine, Bioinformatics, and Computational genomics have emerged to find the genomic basis of disease using computational tools. Hence, the identification of the genetic factors behind complex diseases has evolved into a multidisciplinary effort that combines disciplines as diverse as Biology, Mathematics, Physics, Chemistry, and Information technology. The Computational genomics field, in the context of Biomedicine, focuses on the study of the relationship between genomic changes (variants) and the predisposition to, or the onset of, disease, with the final aim of understanding, predicting and preventing diseases and, ultimately, designing better treatments. Numerous contributions have been made in this field to discover variants associated with the risk of developing a disease, and to interpret these associations in terms of function.
Notably, some of these contributions, such as the assembly and annotation of the human reference genome, improvements in disease characterization, a better understanding of the effects of genomic variation in different populations, and the introduction of Genome Wide Association Studies (GWAS), have been major landmarks in the understanding of the genetic basis of diseases. In particular, the broad use of GWAS, which mostly rely on the statistical comparison between the variants present in groups of diseased and non-diseased individuals, has led to the discovery of thousands of genomic variants associated with a great diversity of complex traits and diseases. Despite the great success of GWAS, the multiple limitations of this type of approach mean that the study of complex diseases remains a challenging problem. Several factors, such as the need to analyse large cohorts of individuals and the difficulty of generating a complete model that captures the whole complexity of common traits, limit the discovery power of GWAS. As a result, GWAS findings explain only a small fraction of disease heritability. Moreover, the lack of biological and functional interpretation of GWAS results has complicated their translation into something meaningful that can be applied in the clinic. Consequently, many statistical and computational efforts have been devoted to improving GWAS discovery power and to developing new analytical frameworks to find new disease-susceptibility variants. Additionally, other biological approaches, such as transcriptomics and epigenetics, have emerged as key to facilitating the interpretation of GWAS outcomes. Finally, the need for access to this valuable genomic, transcriptomic and epigenetic information has led to the generation of a wide diversity of publicly available databases.
This is the case of Type 2 diabetes (T2D), a complex metabolic disorder mainly known to be caused by islet beta-cell dysfunction, usually against a background of insulin resistance. T2D is an example of a common disease that has been broadly studied from the perspective of different omic layers. In particular, the genetic study of T2D has led to the discovery of more than 700 genomic variants significantly associated with the disease, thousands of genes with a putative effect on the disorder, and thousands of target genomic regions with potential regulatory effects. However, although its heritability is estimated at around 70%, only approximately 20% has been explained so far and, most importantly, the use of these markers to detect the predisposition of an individual to develop the disease is still far from the clinic. Additionally, most of these genomic signals lack a functional explanation, which represents a challenge for the understanding of disease pathophysiology. Consequently, the general objective of this thesis is to broaden the genetic understanding of complex diseases, focusing on the analysis of T2D, by finding new disease-susceptibility loci and improving the functional interpretation of genetic markers. The objectives of this thesis can be summarised as follows: 1) Discover epistatic groups of variants associated with T2D, applying combined machine learning and statistical approaches, and analyse their underlying molecular mechanisms to enhance the early detection of the disease and improve the comprehension of its pathophysiology. 2) Generate a comprehensive database of gene expression regulatory variation in human pancreatic islets, which integrates genomic, transcriptomic and epigenetic data related to diseases, genes and variants to improve the functional study of T2D and other islet-related traits (Alonso, Piron, Morán, et al., 2021).
Additionally, this thesis recapitulates the participation in two studies with the following objectives: 3) Support the relevance of inversions and their effect on islet expression to improve the genetic knowledge about the shared susceptibility of complex diseases (González et al., 2020). 4) Review current GWAS statistical frameworks to promote the development of new methods and tools that can enhance the study of complex diseases (Alonso, Morán, Salvoro, & Torrents, 2021). I start this document with a detailed introduction that aims to facilitate the comprehension and motivation of this study, followed by the hypotheses related to milestones 1) and 2), and the corresponding list of objectives. This section is followed by a report by Dr. David Torrents, the director of this thesis, summarising my trajectory during the PhD and detailing my contributions to the studies related to milestones 1) to 4) during this period. This report is followed by a brief summary of the studies presented in this thesis. Then, for the study of milestone 1), an unpublished manuscript is provided summarising the preliminary results obtained from the analysis of variant-variant interactions and their association with T2D using machine learning and statistical approaches. The manuscript describes the latest advances, specifies the methods used, and discusses the outcomes and limitations of the preliminary analyses. Next, a publication is provided to support the results obtained from the study of milestone 2), detailing and discussing the results on gene expression variation in human pancreatic islets that constitute the core of the database. Additionally, two appendix sections are included in this document with the publication and the review related to milestones 3) and 4). Finally, the global results obtained from the study of milestones 1) and 2) are summarised and discussed, and a list of conclusions is provided to briefly recapitulate the main outcomes of this thesis.
Genomics; Gene expression; Diabetes; Islets of Langerhans
575 - General genetics. General cytogenetics. Immunogenetics. Evolution. Phylogeny
Experimental Sciences and Mathematics
Thesis carried out at the Barcelona Supercomputing Center (BSC)