Universitat de Barcelona. Departament de Bioquímica i Biomedicina Molecular
The understanding of proteins as dynamical entities rather than static structures marked a significant advance in the interpretation of their functional role in life. The capacity of proteins to interact with their environment, sense molecular perturbations and exert responses can be explained effectively by specific dynamical events. Studying proteins from this perspective has become possible in recent decades thanks to the emergence of computational approaches. Among these techniques, Molecular Dynamics (MD) simulations have emerged as a potent tool, playing a pivotal role in investigating conformational transitions at atomic resolution across diverse biomacromolecular systems. As computational power and infrastructures keep evolving, we are increasingly able to generate longer MD simulations that capture dynamical events at biologically relevant timescales. MD simulations typically generate an overwhelming amount of data in the form of a collection of snapshots, called a trajectory. Thus, we need suitable metrics to extract, quantify and present the relevant information depending on the target of the study. The scenario is even more challenging when we aim to analyze multiple trajectories and compare them. Among the strategies proposed for comparing trajectories, essential dynamics analysis (EDA) approaches are a common choice: Principal Component Analysis (PCA) or other dimensionality reduction techniques are applied to express the differential behavior between trajectories in terms of the underlying collective features of the ensemble. The work presented in this thesis delves further into this analytical field with the aim of improving the applicability of EDA in functional studies of proteins.
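As an illustration of the PCA step that underlies essential dynamics analysis, the following is a minimal sketch of standard PCA applied to a single trajectory. It is not the protocol developed in the thesis. The function name `essential_dynamics`, the array layout `(n_frames, n_atoms, 3)` and the synthetic toy trajectory are assumptions made for this example, and the frames are assumed to be already superimposed on a common reference structure.

```python
import numpy as np


def essential_dynamics(traj, n_components=3):
    """Standard PCA of a trajectory of shape (n_frames, n_atoms, 3).

    Assumption: frames are already aligned to a common reference.
    Returns (mean_structure, eigenvalues, components, projections).
    """
    n_frames = traj.shape[0]
    X = traj.reshape(n_frames, -1)      # flatten to (n_frames, 3 * n_atoms)
    mean = X.mean(axis=0)
    Xc = X - mean                       # center the coordinates
    # SVD of the centered data: rows of vt are the eigenvectors of the
    # coordinate covariance matrix, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (n_frames - 1)
    components = vt[:n_components]
    projections = Xc @ components.T     # trajectory expressed along each PC
    return mean, eigvals[:n_components], components, projections


# Hypothetical toy data: one dominant collective motion plus small noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 500, 10
direction = rng.normal(size=n_atoms * 3)
direction /= np.linalg.norm(direction)
amplitude = 5.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, n_frames))
noise = 0.1 * rng.normal(size=(n_frames, n_atoms * 3))
toy_traj = (amplitude[:, None] * direction + noise).reshape(n_frames, n_atoms, 3)

mean, eigvals, pcs, proj = essential_dynamics(toy_traj)
# Overlap (absolute dot product) between PC1 and the planted motion.
overlap = abs(pcs[0] @ direction)
```

In this toy case the first principal component should recover the planted collective motion, so `overlap` is expected to be close to 1. With real MD data the coordinates would typically come from a trajectory reader and be restricted to a subset of atoms such as the C-alpha atoms.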
The developed approach, termed Consensus Essential Dynamics Analysis (CEDA), introduces a protocol to integrate the information from independent PCAs and derive a consensus set of vectors, the Consensus Principal Components (CPCs). CPCs encapsulate the most representative (consensus) collective motions of an ensemble of trajectories of the system under study, allowing for sharper descriptions and comparisons of its relevant dynamical events. The CEDA framework also facilitates the comparative study of alternative trajectory ensembles of the same system in terms of the reference set of CPCs. The outcomes of such comparisons may be interpreted using different data analysis techniques and graphical representations. In this thesis, a strategy was proposed to evaluate the underlying similarities and differences between trajectory ensembles by comparing their conformational profiles and applying similarity metrics between statistical distributions. The capabilities of the CEDA protocol were demonstrated with the analysis of a collection of MD simulations of human erythrocyte pyruvate kinase (PKR) that covers multiple conditions of the enzymatic complex with its natural ligands, as well as a large array of human genomic missense variants of the protein. Pyruvate kinase is among the most studied proteins from the perspective of biochemistry, given both its role in glycolysis and its paradigmatic and complex set of allosteric properties. This study has provided new support for several of the proposed conformational changes associated with the transition between the inactive and active states of the enzyme. Following the study of the wild-type protein, a second experimental part of the project revolved around the characterization of the functional effects of missense variants of the enzyme.
Analysis with CEDA enabled the detection of altered dynamical behavior both in variants with a previously validated pathogenic status and in variants for which no functional details were previously known. This research is presented in depth throughout this manuscript. The results are discussed in light of the potential application of this protocol in functional studies of proteins in general, and with a particular perspective on pathogenicity prediction studies.
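The idea of comparing trajectory ensembles through similarity metrics between statistical distributions can be illustrated with a small sketch. The abstract does not name the metric used in the thesis, so the Jensen-Shannon divergence below is an assumed example, as are the function name `js_divergence`, the histogram binning and the synthetic projection values, which stand in for the projections of two ensembles onto one shared vector.

```python
import numpy as np


def js_divergence(a, b, bins=30):
    """Jensen-Shannon divergence (in bits) between two 1-D samples.

    Both samples are histogrammed on shared bin edges so that the two
    discrete distributions are directly comparable. Ranges from 0 to 1.
    """
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        # Kullback-Leibler term; empty bins of x contribute zero.
        mask = x > 0
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical projections of three ensembles onto one common vector.
rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, 5000)   # e.g. a reference ensemble
similar = rng.normal(0.0, 1.0, 5000)     # same underlying behavior
shifted = rng.normal(3.0, 1.0, 5000)     # displaced conformational profile

d_similar = js_divergence(reference, similar)
d_shifted = js_divergence(reference, shifted)
```

Here `d_similar` should be close to 0 and `d_shifted` markedly larger, which is the kind of contrast such a metric is meant to expose between an ensemble that behaves like the reference and one whose profile is displaced along the shared vector.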
Bioinformatics; Molecular dynamics; Proteins; Protein kinases
577 - Biochemistry. Molecular biology. Biophysics
Experimental Sciences and Mathematics
Doctoral Programme in Biomedicine / Thesis carried out at the Barcelona Supercomputing Center (BSC)