Functional characterization of single amino acid variants

López Ferrando, Víctor

dc.contributor

Universitat de Barcelona. Facultat de Biologia

dc.contributor.author

López Ferrando, Víctor

dc.date.accessioned

2020-02-11T12:10:30Z

dc.date.available

2020-02-11T12:10:30Z

dc.date.issued

2019-12-11

dc.identifier.uri

http://hdl.handle.net/10803/668545

dc.description

Doctoral programme: Biomedicine / Thesis carried out at the Barcelona Supercomputing Center (BSC)

en_US

dc.description.abstract

Single amino acid variants (SAVs) are one of the main causes of Mendelian disorders, and play an important role in the development of many complex diseases. At the same time, they are the most common kind of variation affecting coding DNA, and generally have no damaging effect. With the advent of next-generation sequencing technologies, detecting these variants in patients and in the general population is easier than ever, but characterizing the functional effects of each variant remains an open challenge. Our objective in this work is to tackle this problem by developing machine-learning-based in silico SAV pathology predictors. Taking the classic PMut predictor as a starting point, we have rethought the entire supervised learning pipeline, building new training sets, features and classifiers. PMut2017 is the first result of these efforts: a new general-purpose predictor based on SwissVar and trained on 12 different conservation scores. Its performance, evaluated both by cross-validation and by different blind tests, was in line with the best predictors published to date. Continuing our search for more accurate predictors, especially for those cases where general predictors tend to fail, we developed PMut-S, a suite of 215 protein-specific predictors. Similar to PMut in nature, PMut-S introduced the use of co-evolution conservation features and balanced training sets, and showed improved performance, especially for those proteins that were more commonly misclassified by PMut. By comparing PMut-S with other specific predictors, we showed that it is possible to train specific predictors using a single automated pipeline and match the results of most gene-specific predictors released to date.
The implementation of the machine learning pipeline of both PMut and PMut-S was released as an open source Python module, PyMut, which bundles functions for feature computation and selection, classifier training and evaluation, and plotting, among others. Their predictions were also made available in a rich web portal, which includes a precomputed repository with analyses of more than 700 million variants on over 100,000 human proteins, together with relevant contextual information such as 3D visualizations of protein structures, links to databases, functional annotations, and more.

en_US

dc.description.abstract

Single amino acid mutations are the main cause of many Mendelian diseases, and play an important role in the development of many complex diseases. At the same time, they are the most common type of variant affecting protein-coding DNA, generally without causing any adverse effect. With the advent of next-generation sequencing, detecting these variants in patients and in the general population is easier than ever, but characterizing the functional effects of each variant remains a challenge. Our objective in this work is to address this problem by developing in silico pathology predictors based on machine learning. Taking the classic PMut predictor as a starting point, we have rethought the entire supervised learning process, building new training sets, features and classifiers. PMut2017 is the first result of these efforts, a new predictor based on SwissVar and trained with 12 sequence conservation metrics. Its accuracy, measured by cross-validation and with blind tests, proved to be in line with the best predictors published to date. Continuing our efforts in the search for more accurate predictors, we developed PMut-S, a set of 215 protein-specific predictors. Similar to PMut in its design, PMut-S introduces the use of coevolution-based features and balanced training sets, improving on the performance of PMut2017 by 0.1 points of the Matthews correlation coefficient. By comparing PMut-S with other specific predictors, we have shown that it is possible to train specific predictors following a single automated procedure and achieve results as good as those of most published specific predictors.
The implementation of the machine learning procedure of both PMut and PMut-S has been published as an open source Python module, PyMut, which includes the functions implementing the computation of the features and their selection, the training and evaluation of the classifiers, the drawing of various plots, and more. The predictions are also available in a web portal that includes a precomputed repository with the analyses of more than 700 million variants in more than 100 thousand human proteins, together with relevant contextual information such as 3D visualizations of the proteins, links to databases, functional annotations and much more.

en_US

dc.format.extent

197 p.

en_US

dc.format.mimetype

application/pdf

dc.language.iso

eng

en_US

dc.publisher

Universitat de Barcelona

dc.rights.license

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by/4.0/

dc.rights.uri

http://creativecommons.org/licenses/by/4.0/

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.subject

Aminoàcids

en_US

dc.subject

Aminoácidos

en_US

dc.subject

Amino acids

en_US

dc.subject

Medicina preventiva

en_US

dc.subject

Preventive medicine

en_US

dc.subject

Seqüència de nucleòtids

en_US

dc.subject

Cadenas de nucleótidos

en_US

dc.subject

Nucleotide sequence

en_US

dc.subject

Bioinformàtica

en_US

dc.subject

Bioinformática

en_US

dc.subject

Bioinformatics

en_US

dc.subject.other

Experimental Sciences and Mathematics

en_US

dc.title

Functional characterization of single amino acid variants

en_US

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

577

en_US

dc.contributor.director

Gelpí Buchaca, Josep Lluís

dc.contributor.director

Orozco López, Modesto

dc.contributor.tutor

Gelpí Buchaca, Josep Lluís

dc.embargo.terms

none

en_US

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

Documents

VLF_PhD_THESIS.pdf

6.941Mb PDF

This item appears in the following Collection(s)

Facultat de Biologia [236]