TY - THES
AU - Carola Figueroa Flores
ED - Joost Van de Weijer
ED - Bogdan Raducanu
PY - 2021//
TI - Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency
PB - Ediciones Graficas Rey
KW - computer vision
KW - visual saliency
KW - fine-grained object recognition
KW - convolutional neural networks
KW - image classification
N2 - For humans, the recognition of objects is an almost instantaneous, precise and extremely adaptable process. Furthermore, we have the innate capability to learn new object classes from only a few examples. The human brain lowers the complexity of the incoming data by filtering out part of the information and only processing those things that capture our attention. This, combined with our biological predisposition to respond to certain shapes or colors, allows us to recognize at a single glance the most important or salient regions of an image. This mechanism can be observed by analyzing on which parts of images subjects place attention; where they fix their eyes when an image is shown to them. The most accurate way to record this behavior is to track eye movements while displaying images. Computational saliency estimation aims to identify to what extent regions or objects stand out with respect to their surroundings to human observers. Saliency maps can be used in a wide range of applications, including object detection, image and video compression, and visual tracking. The majority of research in the field has focused on automatically estimating saliency maps given an input image. Instead, in this thesis, we set out to incorporate saliency maps into an object recognition pipeline: we want to investigate whether saliency maps can improve object recognition results. In this thesis, we identify several problems related to visual saliency estimation. First, to what extent can the estimation of saliency be exploited to improve the training of an object recognition model when scarce training data is available? To solve this problem, we design an image classification network that incorporates saliency information as input. This network processes the saliency map through a dedicated network branch and uses the resulting characteristics to modulate the standard bottom-up visual characteristics of the original image input. We refer to this technique as saliency-modulated image classification (SMIC). In extensive experiments on standard benchmark datasets for fine-grained object recognition, we show that our proposed architecture can significantly improve performance, especially on datasets with scarce training data. Next, we address the main drawback of the above pipeline: SMIC requires an explicit saliency algorithm that must be trained on a saliency dataset. To solve this, we implement a hallucination mechanism that allows us to incorporate the saliency estimation branch into an end-to-end trained neural network architecture that only needs the RGB image as input. A side effect of this architecture is the estimation of saliency maps. In experiments, we show that this architecture can obtain results on object recognition similar to SMIC, but without requiring ground-truth saliency maps to train the system. Finally, we evaluate the accuracy of the saliency maps that occur as a side effect of object recognition. For this purpose, we use a set of benchmark datasets for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps computed from human eye-tracking experiments. Our results show that these saliency maps can obtain competitive results on saliency benchmarks. On one synthetic saliency dataset, this method even achieves state-of-the-art results without ever having seen an actual saliency image during training.
SN - 978-84-122714-4-7
N1 - LAMP; 600.120
ID - Carola Figueroa Flores2021
U1 - Ph.D. thesis
ER -