Visual understanding of human behavior: 3D pose, motion, actions and context

Hernández Ruiz, Alejandro José

dc.contributor

Institut de Robòtica i Informàtica Industrial

dc.contributor.author

Hernández Ruiz, Alejandro José

dc.date.accessioned

2024-01-25T10:04:36Z

dc.date.available

2024-01-25T10:04:36Z

dc.date.issued

2023-05-30

dc.identifier.uri

http://hdl.handle.net/10803/689862

dc.description

dc.description.abstract

(English) Visual understanding of human behavior is a broad topic that, in the abstract, means understanding what a person or group of people is doing in an image or video. In practice, it can be broken down into a series of steps: detecting people in the image, estimating their posture and motion, recognizing objects in the environment, recognizing the action performed, predicting the subsequent motion, and predicting the next actions to be performed. Each of these steps is a computer vision task, and in this thesis we focus on the following:
- 3D action recognition: identifying the actions being performed based on the movements of people.
- 3D motion prediction: predicting the motion of a person based on a sequence of previous motion.
- Program generation: generating a program with an expected target that can adapt its behavior depending on the context.
For each of these tasks, the main contributions of this thesis are:
- A novel method for action recognition that uses a 3D CNN in conjunction with Euclidean Distance Matrices to analyze motion sequences.
- State-of-the-art results for motion prediction using a generative adversarial network that can generate realistic sequences of up to 4 seconds.
- The creation of a new type of neural network, the Neural Cellular Automata Manifold, which can generate programs in the form of cellular automata whose behavior is learned from data.
In summary, understanding human behavior is central to many applications and is not trivial to solve. We propose methods for recognizing actions, predicting movements, and generating programs, and we have achieved very good results on each of these tasks.
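As an illustration of the Euclidean Distance Matrix representation mentioned in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes each pose is given as a list of (x, y, z) joint coordinates; the function names and the toy sequence are hypothetical. Stacking one matrix per frame yields a frames x joints x joints volume, which is the kind of input a 3D CNN could consume.

```python
import math


def euclidean_distance_matrix(joints):
    """Return the pairwise Euclidean distance matrix for one pose.

    `joints` is a list of (x, y, z) tuples, one per skeleton joint
    (an assumed input format, chosen for illustration).
    """
    n = len(joints)
    edm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(joints[i], joints[j])
            # The matrix is symmetric with a zero diagonal.
            edm[i][j] = d
            edm[j][i] = d
    return edm


def sequence_to_edm_stack(sequence):
    """Map a motion sequence (list of poses) to one EDM per frame."""
    return [euclidean_distance_matrix(pose) for pose in sequence]


# Hypothetical toy sequence: 2 frames, 3 joints each.
toy_sequence = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
]
edm_stack = sequence_to_edm_stack(toy_sequence)
```

Because the matrices depend only on distances between joints, they are unchanged when the whole skeleton is rotated or translated, which is one plausible reason to prefer them over raw coordinates.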

dc.description.abstract

(Español) La comprensión visual del comportamiento humano es un tema muy amplio que, en abstracto, significa comprender lo que una persona o grupo de personas está haciendo en una imagen o video. En la práctica, se puede dividir en una serie de pasos: detectar personas en la imagen, estimar su postura y movimiento, reconocer objetos en el entorno, reconocer la acción realizada, predecir el movimiento subsiguiente, y predecir las próximas acciones a realizar. Cada uno de estos pasos es una tarea de visión artificial, y en esta tesis nos centraremos en lo siguiente:
- Reconocimiento de acciones en 3D: identificar las acciones que se están realizando en función de los movimientos de las personas.
- Predicción de movimiento 3D: predecir el movimiento de una persona en función de una secuencia de movimiento anterior.
- Generación de programas: generar un programa con un objetivo esperado que pueda adaptar su comportamiento dependiendo del contexto.
Para cada una de estas tareas, las principales contribuciones de esta tesis son:
- Un método novedoso para el reconocimiento de acciones que utiliza una CNN 3D junto con matrices de distancia euclidiana para analizar secuencias de movimiento.
- Resultados de vanguardia para la predicción de movimiento utilizando una GAN que puede generar secuencias realistas de hasta 4 segundos.
- La creación de un nuevo tipo de red neuronal, el Neural Cellular Automata Manifold, que puede generar programas en forma de autómatas celulares cuyo comportamiento se aprende a partir de los datos.
En resumen, comprender el comportamiento humano es fundamental para muchas aplicaciones y no es trivial de resolver. Proponemos métodos de reconocimiento de acciones, predicción de movimientos y generación de programas, y hemos logrado muy buenos resultados en cada una de estas tareas.

dc.format.extent

80 p.

dc.language.iso

eng

dc.publisher

Universitat Politècnica de Catalunya

dc.rights.license

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by/4.0/

dc.rights.uri

http://creativecommons.org/licenses/by/4.0/

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.subject

Action recognition

dc.subject

3D CNN

dc.subject

Motion prediction

dc.subject

Generative adversarial networks

dc.subject

Program generation

dc.subject

Neural cellular automata

dc.subject.other

Àrees temàtiques de la UPC::Informàtica

dc.title

Visual understanding of human behavior: 3D pose, motion, actions and context

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

004

dc.contributor.director

Moreno-Noguer, Francesc

dc.embargo.terms

none

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

dc.identifier.doi

https://dx.doi.org/10.5821/dissertation-2117-400814

dc.description.degree

DOCTORAT EN AUTOMÀTICA, ROBÒTICA I VISIÓ (Pla 2013)

Documents

TAJHR1de1.pdf

9.643Mb PDF

This item appears in the following collection(s)

Programa de Doctorat en Automàtica, Robòtica i Visió [149]
