Structural analysis of macromolecular folds and the application to phasing

Author

Medina Bernal, Ana del Rocío

Director

Usón, Isabel

Sammito, Massimo Domenico

Tutor

Badia Palacín, Josefa

Date of defense

2021-11-19

Pages

92 p.



Department/Institute

Universitat de Barcelona. Facultat de Farmàcia i Ciències de l'Alimentació

Abstract

Determining and understanding protein structure in atomic detail is fundamental to the advancement of biotechnology and biomedicine, shedding light on the role of macromolecules and their complexes, their biological functions and pathways. The work presented in this thesis contributes to this aim by exploring new ways of identifying, analysing and classifying small, disconnected fragments and their association into local folds, both to provide specialised input models for X-ray crystallography phasing methods and as a general structural bioinformatics tool to enrich our insight into the results. The first part of this work developed algorithms that capture the complexity of the backbone atoms through geometric descriptors called Characteristic Vectors. The context of each amino acid is abstracted into a vector. The method encodes structural properties in a graph, whose exploration allows secondary and tertiary structure annotation, decomposition of a structure into compact local folds, fragment comparison and superposition, generation of fragment libraries from geometrical constraints, and the identification and classification of protein interfaces and new unique local folds. The results of this thesis have been implemented in the program ALEPH. ALEPH is also deeply integrated within the ARCIMBOLDO framework as a bioinformatics tool used in multiple contexts, fundamentally providing model hypotheses for fragment-based phasing. In ARCIMBOLDO, small and accurate fragments are placed with Phaser, and solutions are identified and expanded to the full structure via density modification and autotracing within SHELXE. In addition, new strategies addressing multimeric structures have been developed in this work. Multimers are very frequent and their presence increases the complexity of the structure to be determined, especially for fragment-based approaches, which must rely on a very small percentage of the total scattering.
This problem has been successfully overcome in a computationally efficient way. The usefulness of the methods developed has been established through their practical application to 10 previously unknown macromolecular structures. Very recently, new efficient and powerful computational approaches based on deep learning have made a breakthrough in the accuracy of three-dimensional models predicted from the sequence. As of July 2021, the tools to compute such models have been made available. We did not want to conclude this work without a preliminary exploration of the use of AlphaFold2 and RoseTTAFold models in the context of phasing with small fragments. ALEPH and ARCIMBOLDO and their graphical user interfaces have been distributed worldwide and have been successfully used by independent groups and collaborators in their published work.

Keywords

Molecular biology; Bioinformatics; Macromolecules; Crystallography; Layer structure (Solids)

Subjects

577 - Biochemistry. Molecular biology. Biophysics

Knowledge Area

Experimental Sciences and Mathematics

Note

Doctoral Programme in Biotechnology / Thesis carried out at the Institut de Biologia Molecular de Barcelona (IBMB-CSIC)

Documents

AdRMB_PhD_THESIS.pdf

11.54Mb

 

Rights

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons licence: http://creativecommons.org/licenses/by-nc-sa/4.0/

This item appears in the following Collection(s)