dc.contributor
Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
dc.contributor.author
Gabbianelli, Germano
dc.date.accessioned
2024-06-26T15:16:10Z
dc.date.available
2024-06-26T15:16:10Z
dc.date.issued
2024-03-07
dc.identifier.uri
http://hdl.handle.net/10803/691518
dc.description.abstract
Reinforcement Learning (RL), a subfield of machine learning and artificial
intelligence, is a learning paradigm in which an artificial agent learns to
reach a predefined goal by trying to maximize a reward signal while interacting
with the environment. In recent years, RL has witnessed unprecedented
breakthroughs, driven mainly by the integration of deep learning
techniques. However, the deployment of RL algorithms in real-world scenarios
poses challenges, particularly in environments where exploration is
impractical or hazardous, such as autonomous driving or healthcare applications.
Moreover, the currently poor theoretical understanding of RL
algorithms further limits their usefulness in safety-critical
scenarios.
This thesis focuses on the design of provably efficient algorithms for the
settings of off-policy and offline learning. These paradigms constrain the
agent to learn without directly receiving any feedback for its actions,
instead observing the rewards obtained by another policy. In particular,
the task of offline learning consists of learning a near-optimal policy while
only having access to a dataset of past interactions.
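One common way to formalize the offline learning task just described is sketched below; this is a generic textbook-style formalization, and the notation (behavior policy $\pi_B$, value $V^{\pi}$, accuracy $\varepsilon$) is assumed here rather than taken from the thesis itself.
\[
\mathcal{D} = \{(s_i, a_i, r_i, s_i')\}_{i=1}^{n} \sim \pi_B,
\qquad
\text{find } \hat{\pi} \ \text{such that}\ V^{\pi^\ast} - V^{\hat{\pi}} \le \varepsilon,
\]
where $\pi_B$ is the behavior policy that generated the dataset, $V^{\pi}$ is the expected return of a policy $\pi$, and $\pi^\ast$ is an optimal policy; the learner must output $\hat{\pi}$ without any further interaction with the environment.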
In summary, the theoretical exploration of off-policy and offline RL not
only contributes to the broader understanding of RL algorithms but also
offers a principled approach to training in scenarios where safety and reliability
are paramount. The findings presented in this thesis aim to be a
small step towards a broader adoption of RL in high-stakes environments,
underpinned by robust theoretical frameworks and regret bounds.
ca
dc.format.extent
155 p.
ca
dc.publisher
Universitat Pompeu Fabra
dc.rights.license
Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-sa/4.0/
ca
dc.rights.uri
http://creativecommons.org/licenses/by-nc-sa/4.0/
*
dc.source
TDX (Tesis Doctorals en Xarxa)
dc.title
Large scale off-policy and offline learning
ca
dc.type
info:eu-repo/semantics/doctoralThesis
dc.type
info:eu-repo/semantics/publishedVersion
dc.contributor.director
Neu, Gergely
dc.rights.accessLevel
info:eu-repo/semantics/openAccess
dc.description.degree
Programa de Doctorat en Tecnologies de la Informació i les Comunicacions