dc.description.abstract
[eng] The field of medicine and life sciences faces a significant challenge due to the
overwhelming influx of information and a rapidly increasing volume of data.
Addressing this data abundance necessitates the integration of essential tools such as
statistical analysis, data processing, and machine learning.
In addition to managing vast amounts of data, the field encounters issues of
reproducibility in studies related to metabolic profiles. Despite a growing number of
studies, inconsistencies arise due to experimental design problems and instrumental
limitations. Experimental design issues encompass factors influencing measurement
campaigns, unbalanced groups, and confounding variables, leading to inadequate
control of study metadata and potentially erroneous conclusions. Instrumental
limitations include insufficient selectivity, poor
reproducibility, time drift, and non-linear responses. These challenges underscore the
importance of effective data analysis tools in addressing the complexity introduced by
these factors.
To address the need for consistent and repeatable research results, scientists are
promoting the creation of standard procedures, both in experiments and computer
analyses, to reduce variations in scientific studies. Signal pre-processing emerges as a
crucial step in improving data quality before analysis, often overlooked but essential,
particularly in applications involving Artificial Intelligence (AI) models. The underlying
principle of "garbage in, garbage out" underscores the significance of enhancing data
quality through effective pre-processing.
The thesis emphasizes the importance of considering the tools and purposes
involved in preparing data, especially when measuring Volatile Organic Compounds
(VOCs). It explains the challenges associated with measuring instruments such as
electronic noses (e-Nose), gas chromatography-mass spectrometry (GC-MS), and ion
mobility spectrometry (IMS), which differ in cost, sensitivity, and complexity.
The goal of the thesis is to introduce computational workflows suited to
these instruments, focusing on the analysis of VOC data. The main idea is that
proper data preparation steps, combined with machine learning, can overcome
limitations in studying these compounds, improving consistency and addressing
instrument-specific challenges such as selectivity, time drift, and non-linear
behaviour.
The emphasis is on comprehensive pre-processing workflows that account for
potential limitations in volatilomics, breathomics, and foodomics data to ensure
reproducibility. The overarching goal is to understand and address issues leading to
the lack of reproducibility in volatilomics within medicine and other life sciences. The
thesis intends to delve into the concept of volatilomics, its significance across various
life science fields, including foodomics and cancer research, and discuss commonly
used instruments. The complexity of volatilomics data and signals obtained from these
instruments will be explored, followed by the development of proposed workflows
aimed at enhancing reproducibility in VOC analysis.
The thesis's key conclusions emphasize the significant potential of volatilomics in
applications such as foodomics and disease diagnosis within the life sciences.
Given the complexity of life sciences samples, highly sensitive
instrumentation is deemed necessary. Instruments for metabolomics analysis
differ in sensitivity and complexity, which in turn affects the complexity of the signals obtained.
Well-established experimental protocols underscore the need for defined
computational workflows in pre-processing to ensure reproducibility. While GC-IMS
holds promise in volatilomics, its expansion is hindered by a lack of open-source
software. The thesis proposes application-dependent workflows, with the general
recommendation to reduce noise before baseline removal and to align peaks as the
final step. An R package developed for GC-IMS pre-processing enabled
discrimination between male and female urine samples. The recent release of
open-source software for Python and R has facilitated the application of GC-IMS in
volatilomics, particularly in foodomics, where promising results have been obtained. In
biomedicine, GC-IMS holds potential for colorectal cancer detection, although
further research with rigorous methodologies is needed.
ca
dc.description.abstract
[spa] This doctoral thesis focuses on the development of computational workflows for the analysis of volatile metabolites in various areas of the life sciences. It concentrates on volatilomics, a growing field that is receiving increasing attention. The thesis seeks to establish standardized computational workflows to ensure reproducibility in science. It highlights the importance of volatilomics in applications such as foodomics, breath analysis, and medical diagnosis.
Recognizing the complexity of volatilomics samples and the challenges posed by specialized instrumentation, the thesis explores the need for sophisticated workflows to interpret data accurately. It examines in depth the instruments used, covering the signals and complexities associated with each. It identifies data problems and proposes techniques to mitigate their impact, presenting methodologies tailored to specific challenges.
The thesis presents three workflows designed to process Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data, a crucial technique in volatile metabolomics. These workflows are tailored to specific applications, such as foodomics, accessibility, and colon cancer detection.
In summary, this research advances computational methodologies in volatile metabolomics, establishing a solid foundation for tackling challenges in diverse life science applications. It provides standardized, automated solutions to improve the reliability and reproducibility of volatile metabolomic analyses, paving the way for broader adoption and application in scientific research and medical diagnosis.
ca
dc.rights.license
WARNING. All rights reserved. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for reference or private study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Revised Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service. Presenting its content in a window or frame external to TDX (framing) is also not authorized. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.
ca