dc.description.abstract
Remote sensing (RS) images have become fundamental in many important application fields, including land-cover mapping, urban planning, forest monitoring, agricultural land management, and weather forecasting. Recently, the availability of new satellites carrying very different instruments has brought new opportunities and challenges to the processing of RS imagery. Public access to a growing volume of multispectral (MS) and hyperspectral (HS) RS data has made it possible to train deep learning (DL) models, which were previously restricted to natural images. However, state-of-the-art DL models for image processing cannot be directly applied to RS products. RS images, particularly MS and HS images, differ in nature from natural images in many ways that make them challenging to process, and even more challenging to fuse in a multimodal fashion for tasks such as image registration, classification, segmentation, or regression. In this thesis, we propose DL methodologies to fuse and process this demanding kind of imagery, designing state-of-the-art models for several image processing tasks. Until recently, the scarcity of RS data and the high computational cost of DL models meant that only traditional machine learning and image processing algorithms were viable for extracting features and information from RS images. Here, we develop state-of-the-art DL models, such as convolutional neural networks and transformers, and apply them to different RS tasks. Specifically, this work can be divided into four main areas within RS: dataset generation, image registration, index regression, and classification. First, we define a new multimodal MS dataset containing images from across Europe.
Second, after researching different image registration methods for RS imagery, we develop a self-supervised DL optical flow method that registers low-resolution MS images using higher-resolution multimodal information from a different instrument. This information allows us to correct local deformations and reduce geolocation errors. Once the images are geometrically aligned, fusing their information becomes more effective and accurate. Third, we design a DL model to predict unavailable vegetation indices in a time series from the low-resolution multimodal image. We then combine low- and medium-resolution MS RS images in a multimodal DL regression model to improve the completion of those unavailable vegetation indices. Finally, we propose a self-supervised pretrained DL model to classify HS RS images.
ca