dc.contributor
Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
dc.contributor.author
Gabbianelli, Germano
dc.date.accessioned
2024-06-26T15:16:10Z
dc.date.available
2024-06-26T15:16:10Z
dc.date.issued
2024-03-07
dc.identifier.uri
http://hdl.handle.net/10803/691518
dc.description.abstract
Reinforcement Learning (RL), a subfield of machine learning and artificial
intelligence, is a learning paradigm in which an artificial agent learns to
reach a predefined goal by trying to maximize a reward signal while interacting
with the environment. In recent years, RL has witnessed unprecedented
breakthroughs, driven mainly by the integration of deep learning
techniques. However, the deployment of RL algorithms in real-world scenarios
poses challenges, particularly in environments where exploration is
impractical or hazardous, such as autonomous driving or healthcare applications.
Moreover, the currently poor theoretical understanding of RL
algorithms further limits their usefulness in safety-critical
scenarios.
This thesis focuses on the design of provably efficient algorithms for the
settings of off-policy and offline learning. These paradigms constrain the
agent to learn without directly receiving any feedback for its actions,
instead observing the rewards obtained by another policy. In particular,
the task of offline learning consists of learning a near-optimal policy while
only having access to a dataset of past interactions.
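One common way to formalize the offline learning task just described is sketched below; this is a generic textbook-style formalization, and the notation (behavior policy $\pi_B$, value $V^{\pi}$, accuracy $\varepsilon$) is assumed here rather than taken from the thesis itself.
\[
\mathcal{D} = \{(s_i, a_i, r_i, s_i')\}_{i=1}^{n} \sim \pi_B,
\qquad
\text{find } \hat{\pi} \ \text{such that}\ V^{\pi^\ast} - V^{\hat{\pi}} \le \varepsilon,
\]
where $\pi_B$ is the behavior policy that generated the dataset, $V^{\pi}$ is the expected return of a policy $\pi$, and $\pi^\ast$ is an optimal policy; the learner must output $\hat{\pi}$ without any further interaction with the environment.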
In summary, the theoretical exploration of off-policy and offline RL not
only contributes to the broader understanding of RL algorithms but also
offers a principled approach to training in scenarios where safety and reliability
are paramount. The findings presented in this thesis aim to be a
small step towards a broader adoption of RL in high-stakes environments,
underpinned by robust theoretical frameworks and regret bounds.
ca
dc.format.extent
155 p.
ca
dc.publisher
Universitat Pompeu Fabra
dc.rights.license
Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-sa/4.0/
ca
dc.rights.uri
http://creativecommons.org/licenses/by-nc-sa/4.0/
*
dc.source
TDX (Tesis Doctorals en Xarxa)
dc.title
Large scale off-policy and offline learning
ca
dc.type
info:eu-repo/semantics/doctoralThesis
dc.type
info:eu-repo/semantics/publishedVersion
dc.contributor.director
Neu, Gergely
dc.rights.accessLevel
info:eu-repo/semantics/openAccess
dc.description.degree
Programa de Doctorat en Tecnologies de la Informació i les Comunicacions