Enforcing secondary and tertiary structure for crystallographic phasing. Developing ARCIMBOLDO and BORGES

Author

Sammito, Massimo Domenico

Director

Usón, Isabel

Tutor

Badía Palacín, Josefa

Date of defense

2015-06-22

Legal Deposit

B 18341-2015

Pages

189 p.



Department/Institute

Universitat de Barcelona. Facultat de Biologia

Abstract

ARCIMBOLDO is an ab initio phasing method for macromolecular crystallographic X-ray diffraction data. It combines the location of model fragments, such as polyalanine α-helices, with the program PHASER and density modification and main-chain autotracing with the program SHELXE. The method is named after the Italian painter Giuseppe Arcimboldo (1526-1593), who composed portraits out of common objects such as fruits and vegetables. Following the analogy, ARCIMBOLDO composes an unknown structure by assembling small secondary structure elements, which are conserved across families of unrelated tertiary structure. Exploiting this method requires a multi-solution approach because correct solutions are difficult to recognize at early stages. Moreover, phasing a structure from the partial information provided by such a small percentage of the total model (around 10% of the main-chain atoms) is challenging and requires evaluating alternative hypotheses under statistical constraints to avoid combinatorial explosion. ARCIMBOLDO methods have proven successful in many cases of previously unknown structures[3] and also on a pool of test structures[4]. The program accepts any Sohncke space group, and all the most frequent ones are represented in the pool of structures solved so far. In both studies data were collected in the most common protein space groups. Data quality is crucial for phasing methods, and ARCIMBOLDO is particularly sensitive to it: low resolution (worse than 2.1 Å) and lack of completeness (less than 98%) drastically decrease the chance of success. Location of secondary structure elements is not indicated as a phasing method for large structures or complexes (over 400 residues) unless very long helices are present and high-resolution data are available. Such cases would require the placement of many fragments in order to assemble 10% of the main chain, which can lead to an unmanageable number of solutions.
To address this different scenario properly, we have implemented dedicated methods in ARCIMBOLDO_BORGES[7] and ARCIMBOLDO_SHREDDER[8]. These programs exploit libraries of folds or large search models and are described later in the text. The current implementation[4], coded in Python, is deployed as a standalone binary, freely available upon registration from http://chango.ibmb.csic.es/download. The binary is compatible with common Linux distributions and recent versions of the Mac OS X operating system. Users can find online manuals, tutorials and documentation on our website. As of 30th April 2015, it has been downloaded 664 times and distributed to 121 research groups; furthermore, it has been installed at many European synchrotron facilities such as the Alba Synchrotron in Spain, the Diamond Light Source in the United Kingdom and the SOLEIL Synchrotron in France. The software is also available through the SBGrid Consortium (https://sbgrid.org), a network of institutions across 19 countries, which provides a distributed grid network of computers to run structural biology software. We have recently started a collaboration with the San Diego Supercomputer Center (http://www.sdsc.edu) in California (USA) to develop optimized and dedicated versions of the programs for their platform, with the aim of addressing difficult phasing cases. Owing to this recent spread in the crystallographic community, ARCIMBOLDO has been presented at many international conferences, such as the International Union of Crystallography Meeting in Madrid (ES) 2011 and in Montreal (CA) 2014 and the European Crystallographic Meeting in Bergen (NO) 2012 and Warwick (UK) 2013, and at many schools and workshops, such as the International School of Crystallography in Erice (IT) 2012 and the Macromolecular Crystallography School in Madrid (ES) 2014. This thesis is organised in the standard scientific format, comprising five main parts:
1. INTRODUCTION: introducing the theoretical topics directly or indirectly related to the contents of the thesis and reviewing the state of the art relevant to the proposed objectives.
2. OBJECTIVES: listing the general goals and particular aims of the doctoral project.
3. MATERIALS AND METHODS: detailing the hardware and software environment, including third-party software and algorithms employed in the project.
4. RESULTS AND DISCUSSION: presenting the algorithms, software, experiments and tests produced in pursuit of the stated objectives.
5. CONCLUSION: summarising the whole project and listing its achievements by the end of the doctoral studies.
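
The figures quoted in the abstract (resolution no worse than 2.1 Å, completeness of at least 98%, at most about 400 residues, fragments amounting to roughly 10% of the main chain) can be turned into a quick back-of-the-envelope check. The sketch below is only an illustration of that arithmetic and is not part of the ARCIMBOLDO software: the function names, the 14-residue helix length and the assumption that main-chain coverage scales with residue count are all hypothetical choices made here.

```python
"""Rule-of-thumb checks based on the figures quoted in the abstract.

All names and the 14-residue default helix length are illustrative
assumptions; none of this is taken from the ARCIMBOLDO code base.
"""

MAX_RESOLUTION_A = 2.1        # abstract: worse than 2.1 A lowers the chance of success
MIN_COMPLETENESS_PCT = 98.0   # abstract: below 98% lowers the chance of success
MAX_RESIDUES = 400            # abstract: larger structures are not indicated
TARGET_PERCENT = 10           # abstract: around 10% of the main chain


def fragments_needed(n_residues, fragment_length=14, target_percent=TARGET_PERCENT):
    """Smallest number of equal-length fragments covering target_percent of residues.

    Assumes coverage of main-chain atoms is proportional to residue count,
    which is reasonable for polyalanine fragments.
    """
    if n_residues <= 0 or fragment_length <= 0:
        raise ValueError("n_residues and fragment_length must be positive")
    needed_x100 = n_residues * target_percent      # residues to cover, times 100
    per_fragment_x100 = fragment_length * 100      # residues per fragment, times 100
    return -(-needed_x100 // per_fragment_x100)    # integer ceiling division


def guidance_warnings(resolution_a, completeness_pct, n_residues):
    """Return a list of warnings for values outside the abstract's guidance."""
    warnings = []
    if resolution_a > MAX_RESOLUTION_A:
        warnings.append("resolution is worse than 2.1 A")
    if completeness_pct < MIN_COMPLETENESS_PCT:
        warnings.append("completeness is below 98%")
    if n_residues > MAX_RESIDUES:
        warnings.append("structure exceeds 400 residues")
    return warnings


if __name__ == "__main__":
    print(fragments_needed(400))
    print(guidance_warnings(2.5, 90.0, 600))
```

Under these assumptions a 400-residue structure needs three 14-residue helices to reach 10% of the main chain, whereas a 1000-residue structure needs eight, which illustrates why the number of placements, and with it the number of candidate solutions, grows quickly for large structures.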


ARCIMBOLDO is an ab initio method for solving macromolecular crystal structures, which combines the location of small model fragments, such as alpha helices, with electron density modification and automatic tracing of the polypeptide chain. The method is named after the Italian painter Giuseppe Arcimboldo, who composed portraits out of common objects such as books or vegetables. Analogously, our method composes structural hypotheses by placing small structure fragments: when the resulting substructure is sufficiently close to the real one, electron density modification reveals its “portrait”. If, on the contrary, the hypothesis is incorrect, the result is a mere “still life”. The present work focuses on the development of this method and on its extension from the use of secondary structure fragments to small local folds and tertiary structures derived from low-homology models.
• The method has been characterized by its need for massive computation to handle the enormous number of hypotheses generated, but in the present work we have implemented a version so optimized that it solves crystal structures on a single workstation.
• Beyond polyalanine alpha helices, the method has been extended to fragments cut from low-homology models, developing a procedure to determine and extract the substructure most similar to the experimental data, specifically by means of the rotation function. Implementation of SHREDDER.
• Extending the ab initio method from secondary to tertiary structure required libraries of fold hypotheses representing a vast collection of possibilities. A tool to generate such libraries, the program BORGES, has been developed, together with an underlying formalism based on characteristic vectors.
• Development of a method and implementation of a program to solve structures using libraries of nonspecific folds: ARCIMBOLDO_BORGES.
All these objectives have been satisfactorily achieved.

Keywords

Cristal·lografia; Cristalografía; Crystallography; Proteïnes; Proteínas; Proteins

Subjects

548 - Crystallography

Knowledge Area

Experimental Sciences and Mathematics

Note

Thesis carried out at the Institut de Biologia Molecular de Barcelona (IBMB-CSIC)

Documents

MDS_PhD_THESIS.pdf

5.864Mb

 

Rights

Access to the contents of this thesis is subject to acceptance of the conditions of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-sa/3.0/es/
