Deciphering the functional organization of molecular networks via graphlets-based methods and network embedding techniques

Author

Doria Belenguer, Sergio

Director

Przulj, Natasa

Tutor

Gelpi Buchaca, Josep Lluís

Date of defense

2023-10-07

Pages

172 p.



Department/Institute

Universitat de Barcelona. Facultat de Biologia

Abstract

[eng] Advances in capturing technologies have yielded a massive production of large-scale molecular data that describe different aspects of cellular functioning. These data are often modeled as networks, in which nodes are molecular entities, and the edges connecting them represent their relationships. These networks are a valuable source of biological information, but they need to be untangled by new algorithms to reveal the information hidden in their wiring patterns. State-of-the-art approaches for deciphering these complex networks are based on graphlets and network embeddings. This thesis focuses on the development of novel algorithms to overcome the limitations of the current graphlet and network embedding methodologies in the field of biology.
Graphlets are a powerful tool for characterizing the local wiring patterns of molecular networks. However, current graphlet-based methods are mostly applicable to unweighted networks, whereas real-world molecular networks may have weighted edges that represent the probability of an interaction occurring in the cell. This probabilistic information is commonly discarded when applying thresholds to generate unweighted networks, which may lead to information loss. To address this challenge, we introduce probabilistic graphlets, a novel approach that can capture the local wiring patterns of weighted networks and uncover hidden probabilistic relationships between molecular entities. We use probabilistic graphlets to generalize the graphlet methods and apply these to the probabilistic representation of real-world molecular interactions. We show that probabilistic graphlets robustly uncover relevant biological information from the molecular networks. Furthermore, we demonstrate that probabilistic graphlets exhibit a higher sensitivity to identifying condition-specific functions compared to their unweighted counterparts.
Network embedding algorithms learn a low-dimensional vectorial representation for each gene in the network while preserving the structural information of the molecular network. Currently available embedding approaches strictly focus on clustering the genes’ embedding vectors and interpreting such clusters to reveal the hidden information of the biological networks. Thus, we investigate new perspectives and methods that go beyond gene-centric approaches. First, we shift the exploration of the embedding space’s functional organization from the genes to their functions. We introduce the Functional Mapping Matrix and apply it to investigate the changes in the organization of cancer and control network embedding spaces from a functional perspective. We demonstrate that our methodology identifies novel cancer-related functions and genes that the currently available methods for gene-centric analyses cannot identify. Finally, we go even further and switch the perspective from the organization of the embedded entities (genes and functions) in the embedding space to the space itself. We annotate axes of the network embedding spaces of six species with both functional annotations and genes. We demonstrate that the embedding space axes represent coherent cellular functions and offer a functional fingerprint of the cell’s functional organization. Moreover, we show that the analysis of the axes reveals new functional evolutionary connections between species.
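
The abstract's point about thresholding discarding probabilistic information can be illustrated with a minimal sketch. This is not the thesis's definition of probabilistic graphlets: it assumes a hypothetical four-node network, treats edge probabilities as independent, and looks only at the simplest closed pattern (a triangle). All names (`edge_prob`, `expected_triangles`, `thresholded_triangles`) are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy network: each edge carries the probability that the
# interaction occurs. These values are illustrative, not real data.
edge_prob = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("A", "C"): 0.5,
    ("C", "D"): 0.3,
}


def prob(u, v):
    """Probability of the undirected edge (u, v); 0.0 if it is absent."""
    return edge_prob.get((u, v), edge_prob.get((v, u), 0.0))


def expected_triangles(nodes):
    """Expected triangle count, assuming edges occur independently."""
    total = 0.0
    for u, v, w in combinations(nodes, 3):
        total += prob(u, v) * prob(v, w) * prob(u, w)
    return total


def thresholded_triangles(nodes, threshold):
    """Triangle count after keeping only edges with probability >= threshold."""
    count = 0
    for u, v, w in combinations(nodes, 3):
        if all(prob(a, b) >= threshold for a, b in ((u, v), (v, w), (u, w))):
            count += 1
    return count


nodes = sorted({n for edge in edge_prob for n in edge})
print("expected (probabilistic):", round(expected_triangles(nodes), 2))
print("thresholded at 0.7:", thresholded_triangles(nodes, 0.7))
print("thresholded at 0.5:", thresholded_triangles(nodes, 0.5))
```

In this toy case the thresholded count flips between zero and one depending on the cutoff, while the probability-weighted count gives a single graded value, which is the kind of information the abstract says is lost when weights are discarded.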

Keywords

Health sciences; Biomedical sciences; Medical sciences; Biometry

Subjects

577 - Material bases of life. Biochemistry. Molecular biology. Biophysics

Knowledge Area

Health Sciences

Note

Doctoral Programme in Biomedicine / Thesis carried out at the Barcelona Supercomputing Center (BSC)

Documents

SDB_PhD_THESIS.pdf

20.65Mb

 

Rights

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc/4.0/
