Development of Computational Workflows for Volatile Metabolomics applied to Life Sciences

Author

Mallafré Muro, Celia

Director

Pardo Martínez, Antonio

Fernández Romero, Luis

Tutor

Pardo Martínez, Antonio

Date of defense

2024-06-03

Pages

156 p.



Department/Institute

Universitat de Barcelona. Facultat de Física

Abstract

[eng] Medicine and the life sciences face a significant challenge: an overwhelming influx of information and a rapidly increasing volume of data. Handling this abundance requires statistical analysis, data processing, and machine learning. Beyond sheer data volume, studies of metabolic profiles suffer from poor reproducibility. Despite a growing number of studies, inconsistencies arise from experimental design problems and instrumental limitations. Experimental design issues include factors influencing measurement campaigns, unbalanced groups, and confounding variables, which lead to inadequate control of study metadata and potentially erroneous conclusions. Instrumental limitations include insufficient selectivity, poor reproducibility, time drift, and non-linear responses. These challenges underscore the importance of effective data analysis tools. To obtain consistent and repeatable results, scientists are promoting standard procedures, both experimental and computational, to reduce variation between studies. Signal pre-processing is a crucial step for improving data quality before analysis; it is often overlooked but essential, particularly in applications involving Artificial Intelligence (AI) models. The principle of "garbage in, garbage out" underscores the significance of enhancing data quality through effective pre-processing. The thesis emphasizes the importance of considering the tools and purposes involved in preparing data, especially when measuring Volatile Organic Compounds (VOCs).
The thesis explains the challenges associated with measuring instruments such as electronic noses (e-Nose), gas chromatography-mass spectrometry (GC-MS), and ion mobility spectrometry (IMS), which differ in cost, sensitivity, and complexity. Its goal is to introduce computational workflows suited to these instruments, focusing on the analysis of VOC data. The main idea is that proper data preparation steps, combined with machine learning, can overcome limitations in studying these compounds, improving consistency and dealing with instrument-specific challenges such as selectivity, time drift, and non-linear behaviour. The emphasis is on comprehensive pre-processing workflows that account for potential limitations in volatilomics, breathomics, and foodomics data to ensure reproducibility. The overarching goal is to understand and address the issues behind the lack of reproducibility in volatilomics within medicine and other life sciences. The thesis examines the concept of volatilomics and its significance across life science fields, including foodomics and cancer research, and discusses commonly used instruments. It explores the complexity of volatilomics data and of the signals obtained from these instruments, and then develops the proposed workflows aimed at enhancing reproducibility in VOC analysis. The key conclusions emphasize the significant potential of volatilomics in applications such as foodomics and disease diagnosis within the life sciences. Given the complexity of life sciences samples, highly sensitive instrumentation is deemed necessary. Instruments for metabolomics analysis vary in sensitivity and complexity, which influences the complexity of the signals obtained. Well-established experimental protocols underscore the need for defined computational workflows in pre-processing to ensure reproducibility.
While GC-IMS holds promise in volatilomics, its expansion is hindered by a lack of open-source software. The thesis proposes application-dependent workflows, with the general recommendation to correct noise before baseline removal and to align peaks as the final step. An R package developed for GC-IMS pre-processing demonstrated discriminatory capabilities between male and female urine samples. The recent release of open-source software for Python and R has facilitated the application of GC-IMS in volatilomics, particularly in foodomics, where promising results have been obtained. In biomedicine, GC-IMS holds potential for colorectal cancer detection, emphasizing the need for further research with rigorous methodologies.
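
The ordering recommended above (noise correction, then baseline removal, then peak alignment last) can be illustrated with a minimal sketch on a one-dimensional signal. This is not the thesis's R package or its algorithms: the moving-average filter, the median baseline, and the integer-lag alignment are simple stand-ins chosen for illustration, and every function name here is hypothetical.

```python
from statistics import median

def denoise(signal, window=5):
    """Step 1: moving-average smoothing (stand-in for a real noise filter)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))  # truncated window at the edges
    return out

def remove_baseline(signal):
    """Step 2: subtract a constant baseline, estimated here as the median."""
    base = median(signal)
    return [x - base for x in signal]

def align(signal, reference, max_shift=10):
    """Step 3: shift by the integer lag that maximizes the dot product with the reference."""
    n = len(signal)

    def shifted(lag):
        # A positive lag moves the signal to the right; gaps are zero-filled.
        return [signal[i - lag] if 0 <= i - lag < n else 0.0 for i in range(n)]

    def score(lag):
        return sum(a * b for a, b in zip(shifted(lag), reference))

    best = max(range(-max_shift, max_shift + 1), key=score)
    return shifted(best), best

def preprocess(signal, reference):
    """Apply the three steps in the recommended order."""
    return align(remove_baseline(denoise(signal)), reference)

def peak(center, length=50, offset=0.0):
    """Synthetic test signal: a triangular peak on a flat offset (illustrative data only)."""
    values = [offset] * length
    for delta, height in zip(range(-2, 3), [1.0, 2.0, 3.0, 2.0, 1.0]):
        values[center + delta] += height
    return values

if __name__ == "__main__":
    reference = peak(20)
    sample = peak(23, offset=2.0)  # same peak, drifted by 3 points, raised baseline
    aligned, lag = preprocess(sample, reference)
    print("applied lag:", lag)
```

Aligning last matters in this sketch because the lag is chosen by comparing signal shapes: a residual baseline or noise spikes would otherwise contribute to the score and could bias the chosen shift.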


[spa] This doctoral thesis focuses on the development of computational workflows for the analysis of volatile metabolites in several areas of the life sciences. It centers on volatilomics, a growing field that is receiving increasing attention. The thesis seeks to establish standardized computational workflows to guarantee reproducibility in science. It highlights the importance of volatilomics in applications such as foodomics, breath analysis, and medical diagnosis. Recognizing the complexity of volatilomics samples and the challenges of specialized instrumentation, the thesis explores the need for sophisticated workflows to interpret data accurately. It examines in depth the instruments used, covering the signals and complexities associated with each one. It identifies data problems and proposes techniques to mitigate their impact, presenting methodologies adapted to specific challenges. The thesis presents three workflows designed to process data from Gas Chromatography-Ion Mobility Spectrometry (GC-IMS), a crucial technique in volatile metabolomics. These workflows are adapted to specific applications, such as foodomics, accessibility, and colon cancer detection. In summary, this research advances computational methodologies in volatile metabolomics, establishing a solid foundation for tackling challenges in diverse life science applications. It provides standardized and automated solutions to improve the reliability and reproducibility of volatile metabolomics analyses, paving the way for broader adoption and application in scientific research and medical diagnosis.

Keywords

Ciències de la vida; Ciencias de la vida; Life sciences; Metabolòmica; Metabolómica; Metabolomics; Cicle de treball; Ciclo de trabajo; Workflow; Teoria de la computació; Teoría de la computación; Theory of computation

Subjects

62 - Engineering. Technology in general

Knowledge Area

Ciències Experimentals i Matemàtiques

Note

Programa de Doctorat en Enginyeria i Ciències Aplicades

Documents

CMM_PhD_THESIS.pdf

29.56Mb

 

Rights

WARNING. All rights reserved. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service. Presenting its content in a window or frame external to TDX (framing) is also not authorized. This reservation of rights covers the contents of the thesis as well as its abstracts and indexes.
