Universitat de Barcelona. Facultat de Farmàcia i Ciències de l'Alimentació
Determining and understanding protein structure in atomic detail is fundamental to the advancement of biotechnology and biomedicine, shedding light on the role of macromolecules and their complexes, their biological functions and pathways. The work presented in this thesis contributes to this aim by exploring new ways of identifying, analysing and classifying small, disconnected fragments and their association into local folds, both to provide specialised input models for X-ray crystallography phasing methods and to serve as a general structural bioinformatics tool that enriches our insight into the results. The first part of this work developed algorithms that capture the complexity of the backbone atoms through geometric descriptors called Characteristic Vectors, in which the context of each amino acid is abstracted into a vector. The method encodes structural properties in a graph, whose exploration allows secondary and tertiary structure annotation, decomposition of a structure into compact local folds, fragment comparison and superposition, generation of fragment libraries from geometrical constraints, and the identification and classification of protein interfaces and new unique local folds. The results of this thesis have been implemented in the program ALEPH. ALEPH is also deeply integrated within the ARCIMBOLDO framework as a bioinformatics tool used in multiple contexts, fundamentally providing model hypotheses for fragment-based phasing. In ARCIMBOLDO, small and accurate fragments are placed with Phaser, and solutions are identified and expanded to the full structure via density modification and autotracing within SHELXE. In addition, new strategies addressing multimeric structures have been developed in this work. Multimers are very frequent and their presence increases the complexity of the structure to be determined, especially for fragment-based approaches that must rely on a very small percentage of the total scattering. 
The problem has been successfully overcome in a computationally efficient way. The usefulness of the methods developed has been established through their practical application to 10 previously unknown macromolecular structures. Very recently, new efficient and powerful computational approaches based on deep learning have achieved a breakthrough in the accuracy of three-dimensional models predicted from sequence. As of July 2021, the tools to compute such models have been made available. We did not want to conclude this work without a preliminary exploration of the use of AlphaFold2 and RoseTTAFold models in the context of phasing with small fragments. ALEPH and ARCIMBOLDO and their graphical user interfaces have been distributed worldwide and have been used successfully by independent groups and collaborators in their published work.
Molecular biology; Bioinformatics; Macromolecules; Crystallography; Layer structure (Solids)
577 - Biochemistry. Molecular biology. Biophysics
Experimental Sciences and Mathematics
Doctoral Programme in Biotechnology / Thesis carried out at the Institut de Biologia Molecular de Barcelona (IBMB-CSIC)