Lagrangian duality for efficient large-scale reinforcement learning

Bas Serrano, Joan

dc.contributor

Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions

dc.contributor.author

Bas Serrano, Joan

dc.date.accessioned

2022-07-12T10:57:43Z

dc.date.available

2022-07-12T10:57:43Z

dc.date.issued

2022-06-28

dc.identifier.uri

http://hdl.handle.net/10803/674767

dc.description.abstract

Reinforcement learning is an expanding field in which the strong empirical performance of algorithms is often at odds with their weak theoretical justification. There is therefore a need for algorithms that are well grounded in theory, come with strong mathematical guarantees, and are efficient at solving large-scale problems. In this work we explore the linear programming approach to optimal control in Markov decision processes (MDPs). To develop novel reinforcement learning algorithms, we apply tools from constrained optimization to this linear programming framework. Concretely, we propose a variety of new algorithms using techniques such as constraint relaxation, regularization, and Lagrangian duality. We provide a formal performance analysis for all of these algorithms and evaluate them on a range of benchmark tasks.

en_US

dc.description.abstract

Reinforcement learning is an expanding field in which the high effectiveness of algorithms often does not go hand in hand with a good theoretical justification for them. For this reason, there is a need for algorithms that are well grounded in theory, with robust mathematical guarantees, and that are at the same time efficient at solving large-scale problems. In this work we explore the formulation of optimal control in Markov decision problems based on linear programming. In order to develop new reinforcement learning algorithms, we apply tools from the field of convex optimization to this linear programming formulation. Specifically, we use techniques such as constraint relaxation, regularization, and Lagrangian duality. We also carry out a formal analysis of the performance of these algorithms and evaluate them on various benchmark tasks.

en_US

dc.format.extent

126 p.

en_US

dc.format.mimetype

application/pdf

dc.language.iso

eng

en_US

dc.publisher

Universitat Pompeu Fabra

dc.rights.license

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by/4.0/

dc.rights.uri

http://creativecommons.org/licenses/by/4.0/

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.subject

Reinforcement learning

en_US

dc.subject

Lagrangian duality

en_US

dc.subject

Linear programming

en_US

dc.subject

Constraint relaxation

en_US

dc.subject

Convex optimization

en_US

dc.subject

Entropy regularization

en_US

dc.subject

Aprenentatge per reforç

en_US

dc.subject

Dualitat lagrangiana

en_US

dc.subject

Programació lineal

en_US

dc.subject

Relaxació de condicions

en_US

dc.subject

Regularització entròpica

en_US

dc.title

Lagrangian duality for efficient large-scale reinforcement learning

en_US

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

en_US

dc.contributor.authoremail

joanbasserrano@gmail.com

en_US

dc.contributor.director

Neu, Gergely

dc.embargo.terms

none

en_US

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

dc.description.degree

Programa de doctorat en Tecnologies de la Informació i les Comunicacions

Documents

tjb.pdf

2.016Mb PDF
