Structural prediction and characterization of protein-RNA interactions / Predicción y caracterización estructural de interacciones proteína-ARN

Author

Pérez Cano, Laura

Supervisor

Fernández-Recio, Juan

Defense date

2013-06-28

Legal deposit

B. 21953-2013

Pages

277 p.



Department/Institute

Universitat de Barcelona. Departament de Bioquímica i Biologia Molecular (Biologia)

Abstract

Computational methods are increasingly important for predicting and characterizing protein interactions. However, most efforts so far have focused on protein-protein and protein-ligand interactions, and few computational methods are available for modeling and characterizing protein-RNA interactions, in spite of their biological and biomedical importance. Given the difficulties and resource limitations of experimental procedures, developing computational methods for studying protein-RNA interactions is essential for a better understanding of gene expression and cellular function. In this context, the main purpose of this thesis has been the development and application of computational methods for the structural prediction and characterization of protein-RNA complexes. This doctoral thesis has fulfilled all of its expected objectives. A more detailed summary is given below.

In 2008, the community-wide CAPRI (Critical Assessment of PRedicted Interactions) experiment proposed its first protein-RNA docking case. We devised a new protocol for this challenge, based on our previous protein-protein docking programs, and obtained excellent results, generating the second best model among all participants. This experiment showed for the first time the potential of our new approach and the possibilities for further improvement. The next step was the extraction of statistical potentials to be applied to protein-RNA docking and interface prediction. For that purpose, we compiled the largest structural set of non-redundant protein-RNA complexes reported so far, in order to derive individual and pairwise propensities of ribonucleotides and amino acid residues to be located at the binding interfaces. We found that the most significantly enriched residues at protein-RNA interfaces were Arg, Lys and His, while the least favoured were Asp, Glu, Cys, Val, Leu and Ile.
On the other hand, we did not observe a significant preference among the four types of ribonucleotides to be at protein-RNA interfaces. Along the same lines, the pairwise propensities showed similar values for the different types of ribonucleotides. We then developed the OPRA method to identify regions on the protein surface with a global preference to bind RNA. This method was tested on an independent set of known protein-RNA structures and showed a high positive predictive value for the prediction of residues involved in RNA binding. In addition, we found that this method was able to identify RNA-binding proteins. The next objective was the application of pairwise statistical potentials to the scoring of protein-RNA docking solutions. Unexpectedly, the statistical potentials showed worse predictive success rates than the FTDock scoring function (which is closely related to structural complementarity), although the results improved when both scoring terms were combined. However, we still needed more test cases in order to draw more reliable and general conclusions. Therefore, the next objective was to build a benchmark that could be used for the optimization and development of protein-RNA docking methods. For this, we collected as many non-redundant protein-RNA cases as possible with a known complex structure and a known or modeled structure for at least one of the subunits. This was the first publicly available protein-RNA docking benchmark. It comprised 106 cases: 71 with unbound coordinates available for at least one of the molecules, and 35 in which at least one of the molecules was built by homology modelling. One of the conclusions that emerged from the analysis of this set of structures is that protein-RNA complexes are much more flexible than protein-protein and even protein-DNA complexes.
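As an illustration of the individual interface propensities mentioned in this abstract, the following is a minimal sketch only. It assumes a common log-ratio definition (frequency of a residue type at the interface divided by its frequency on the whole protein surface); the abstract does not state the exact formula used in the thesis, and the function name and toy counts are hypothetical.

```python
import math
from collections import Counter

def interface_propensities(interface_residues, surface_residues):
    """Assumed definition: ln(f_interface / f_surface) per residue type.

    Positive values mean the type is enriched at the interface relative
    to the surface as a whole; negative values mean it is depleted.
    """
    iface = Counter(interface_residues)
    surf = Counter(surface_residues)
    n_iface = sum(iface.values())
    n_surf = sum(surf.values())
    propensities = {}
    for res, n in surf.items():
        if iface[res] == 0:
            continue  # log undefined; skip types never seen at the interface
        propensities[res] = math.log((iface[res] / n_iface) / (n / n_surf))
    return propensities

# Toy, made-up residue lists (not data from the thesis).
surface = ["ARG"] * 10 + ["LYS"] * 10 + ["ASP"] * 10 + ["LEU"] * 10
interface = ["ARG"] * 6 + ["LYS"] * 4 + ["ASP"] * 1 + ["LEU"] * 1
props = interface_propensities(interface, surface)
for res in sorted(props, key=props.get, reverse=True):
    print(f"{res}: {props[res]:+.2f}")
```

With these toy counts, the positively charged types come out enriched and Asp and Leu depleted, which mirrors the qualitative trend reported above without reproducing any of its values.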
We then performed a docking study over the full protein-RNA docking benchmark, which showed that pairwise statistical potentials give a noisy signal when used to identify near-native protein-RNA solutions. The results confirmed that the best determinant of docking success is structural interface complementarity, as measured by parameters such as the FTDock score or the van der Waals energy. Combining these effective terms with electrostatics yields a scoring function that is able to identify high-quality models in most cases when the bound coordinates of the interacting molecules are used. However, its efficiency in a more realistic scenario (using the unbound coordinates of the molecules) is highly dependent on the capability of sampling methods to generate high-quality docking solutions. The results also underlined important differences with protein-protein interactions. The experience acquired during these more methodological parts of this PhD thesis facilitated the application of computational methods to the study of translin, a highly conserved nucleic acid-binding protein of significant biomedical interest. By combining computational tools with experimental techniques, we contributed to the elucidation of the translin multimerization interfaces and nucleic acid-binding sites, and provided a first structural and dynamic picture of the functions carried out by the protein.
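The idea of combining a complementarity-like term with van der Waals and electrostatic energies to rank docking poses can be sketched as follows. This is an illustration under stated assumptions, not the scoring function developed in the thesis: the `Pose` fields, the sign conventions, the linear form and the weights `w_vdw` and `w_elec` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    name: str
    shape_score: float  # complementarity-like term; higher is better (assumed)
    vdw_energy: float   # van der Waals energy; lower is better (assumed)
    elec_energy: float  # electrostatic energy; lower is better (assumed)

def combined_score(pose, w_vdw=1.0, w_elec=1.0):
    """Hypothetical linear combination; higher means a better-ranked pose."""
    return pose.shape_score - w_vdw * pose.vdw_energy - w_elec * pose.elec_energy

def rank_poses(poses, w_vdw=1.0, w_elec=1.0):
    """Return poses sorted from best to worst combined score."""
    return sorted(poses, key=lambda p: combined_score(p, w_vdw, w_elec), reverse=True)

# Toy, made-up poses (not results from the thesis).
poses = [
    Pose("A", shape_score=10.0, vdw_energy=-5.0, elec_energy=-2.0),
    Pose("B", shape_score=12.0, vdw_energy=3.0, elec_energy=0.0),
    Pose("C", shape_score=8.0, vdw_energy=-1.0, elec_energy=-10.0),
]
print([p.name for p in rank_poses(poses)])
print([p.name for p in rank_poses(poses, w_elec=0.1)])
```

Changing the electrostatics weight reorders the toy poses, which is why the relative weighting of terms in a scoring function of this kind would normally be tuned on a benchmark rather than chosen by hand.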


The structural characterization of protein-RNA complexes is essential for a deeper understanding of molecular biology and cellular regulation. Computational structure-prediction methods offer a fast and inexpensive alternative for detecting and characterizing biological complexes. However, in contrast to the wide variety of computational methods aimed at the structural prediction of protein-protein interactions, very few methods focus on protein-RNA complexes. In this context, the main purpose of this thesis project has been the development and application of computational methods for the analysis, characterization and structural prediction of protein-RNA complexes. To this end, in the first phase of this doctoral thesis, new protocols for the structural prediction of this type of complex were developed from previously described protein-protein docking methods, and statistical potentials per residue, per nucleotide and per residue-nucleotide pair were derived from known structures of protein-RNA complexes. The per-residue statistical potentials were applied to the development of a method for predicting RNA-binding sites in proteins and for identifying RNA-binding proteins. In addition, a benchmark set of protein-RNA complexes was built for the evaluation of docking methods. Using this benchmark, the predictive power of the residue-nucleotide pairwise statistical potentials, as well as of other energy terms, was studied for the scoring of protein-RNA docking solutions, and a new scoring function for candidate docking orientations in protein-RNA complexes was developed by integrating the energy terms that were most effective individually.
The experience gained during the initial phases of the thesis enabled the application of computational modelling techniques, in combination with experimental techniques carried out by collaborators, to the study of translin, a nucleic acid-binding protein of great biological interest. Thus, during the final phase of this doctoral thesis project, we contributed to the identification of the multimerization and nucleic acid-binding sites in translin, and proposed a first structural and dynamic picture of the functions carried out by the protein, helping to resolve aspects as fundamental as the structural determinants of RNA binding.

Keywords

Proteïnes; Proteínas; Proteins; Àcids nucleics; Ácidos nucleicos; Nucleic acids

Subjects

577 - Biochemistry. Molecular biology. Biophysics

Knowledge area

Experimental Sciences and Mathematics

Documents

LPC_THESIS.pdf

23.20Mb

 

Rights

NOTICE. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service. Presentation of its content in a window or frame external to TDX (framing) is also not authorized. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.

This item appears in the following collection(s)