|   | 
Details
   web
Records
Author David Berga; Xose R. Fernandez-Vidal; Xavier Otazu; Xose M. Pardo
Title SID4VAM: A Benchmark Dataset with Synthetic Images for Visual Attention Modeling Type Conference Article
Year 2019 Publication 18th IEEE International Conference on Computer Vision Abbreviated Journal
Volume Issue (up) Pages 8788-8797
Keywords
Abstract A benchmark of saliency models performance with a synthetic image dataset is provided. Model performance is evaluated through saliency metrics as well as the influence of model inspiration and consistency with human psychophysics. SID4VAM is composed of 230 synthetic images, with known salient regions. Images were generated with 15 distinct types of low-level features (e.g. orientation, brightness, color, size...) with a target-distractor popout type of synthetic patterns. We have used Free-Viewing and Visual Search task instructions and 7 feature contrasts for each feature category. Our study reveals that state-ofthe-art Deep Learning saliency models do not perform well with synthetic pattern images, instead, models with Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation. This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye tracking image datasets.
Address Seul; Corea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCV
Notes NEUROBIT; 600.128 Approved no
Call Number Admin @ si @ BFO2019b Serial 3372
Permanent link to this record
 

 
Author David Berga; Xavier Otazu
Title Computations of inhibition of return mechanisms by modulating V1 dynamics Type Conference Article
Year 2019 Publication 28th Annual Computational Neuroscience Meeting Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract In this study we present a unifed model of the visual cortex for predicting visual attention using real image scenes. Feedforward mechanisms from RGC and LGN have been functionally modeled using wavelet filters at distinct orientations and scales for each chromatic pathway (Magno-, Parvo-, Konio-cellular) and polarity (ON-/OFF-center), by processing image components in the CIE Lab space. In V1, we process cortical interactions with an excitatory-inhibitory network of fring rate neurons, initially proposed by (Li, 1999), later extended by (Penacchio et al. 2013). Firing rates from model’s output have been used as predictors of neuronal activity to be projected in a map in superior colliculus (with WTA-like computations), determining locations of visual fxations. These locations will be considered as already visited areas for future saccades, therefore we integrated a spatiotemporal function of inhibition of return mechanisms (where LIP/FEF is responsible) to feed to the model with spatial memory for next saccades. Foveation mechanisms have been simulated with a cortical magnifcation function, which distort spatial viewing properties for each fxation. Results show lower prediction errors than with respect no IoR cases (Fig. 1), and it is functionally consistent with human psychophysical measurements. Our model follows a biologically-constrained architecture, previously shown to reproduce visual saliency (Berga & Otazu, 2018), visual discomfort (Penacchio et al. 2016), brightness (Penacchio et al. 2013) and chromatic induction (Cerda & Otazu, 2016).
Address Barcelona; July 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CNS
Notes NEUROBIT; no menciona Approved no
Call Number Admin @ si @ BeO2019a Serial 3373
Permanent link to this record
 

 
Author David Berga; Xavier Otazu
Title Computational modelingof visual attention: What do we know from physiology and psychophysics? Type Conference Article
Year 2019 Publication 8th Iberian Conference on Perception Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract Latest computer vision architectures use a chain of feedforward computations, mainly optimizing artificial neural networks for very specific tasks. Although their impressive performance (i.e. in saliency) using real image datasets, these models do not follow several biological principles of the human visual system (e.g. feedback and horizontal connections in cortex) and are unable to predict several visual tasks simultaneously. In this study we present biologically plausible computations from the early stages of the human visual system (i.e. retina and lateral geniculate nucleus) and lateral connections in V1. Despite the simplicity of these processes and without any type of training or optimization, simulations of firing-rate dynamics of V1 are able to predict bottom-up visual attention at distinct contexts (shown previously as well to predict visual discomfort, brightness and chromatic induction). We also show functional top-down selection mechanisms as feedback inhibition projections (i.e. prefrontal cortex for search/task-based attention and parietal area for inhibition of return). Distinct saliency model predictions are tested with eye tracking datasets in free-viewing and visual search tasks, using real images and synthetically-generated patterns. Results on predicting saliency and scanpaths show that artificial models do not outperform biologically-inspired ones (specifically for datasets that lack of common endogenous biases found in eye tracking experimentation), as well as, do not correctly predict contrast sensitivities in pop-out stimulus patterns. This work remarks the importance of considering biological principles of the visual system for building models that reproduce this (and any other) visual effects.
Address San Lorenzo El Escorial; July 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CIP
Notes NEUROBIT; no menciona Approved no
Call Number Admin @ si @ BeO2019b Serial 3374
Permanent link to this record
 

 
Author David Berga; Xose R. Fernandez-Vidal; Xavier Otazu; Victor Leboran; Xose M. Pardo
Title Measuring bottom-up visual attention in eye tracking experimentation with synthetic images Type Conference Article
Year 2019 Publication 8th Iberian Conference on Perception Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract A benchmark of saliency models performance with a synthetic image dataset is provided. Model performance is evaluated through saliency metrics as well as the influence of model inspiration and consistency with human psychophysics. SID4VAM is composed of 230 synthetic images, with known salient regions. Images were generated with 15 distinct types of low-level features (e.g. orientation, brightness, color, size...) with a target-distractor pop-out type of synthetic patterns. We have used Free-Viewing and Visual Search task instructions and 7 feature contrasts for each feature category. Our study reveals that state-of-the-art Deep Learning saliency models do not perform well with synthetic pattern images, instead, models with Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation. This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye tracking image datasets.
Address San Lorenzo El Escorial; July 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CIP
Notes NEUROBIT; 600.128 Approved no
Call Number Admin @ si @ BFO2019c Serial 3375
Permanent link to this record
 

 
Author David Berga; Xavier Otazu
Title Computations of top-down attention by modulating V1 dynamics Type Conference Article
Year 2020 Publication Computational and Mathematical Models in Vision Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract
Address St. Pete Beach; Florida; May 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MODVIS
Notes NEUROBIT Approved no
Call Number Admin @ si @ BeO2020a Serial 3376
Permanent link to this record
 

 
Author Estefania Talavera; Alexandre Cola; Nicolai Petkov; Petia Radeva
Title Towards Egocentric Person Re-identification and Social Pattern Analysis. Type Book Chapter
Year 2019 Publication Frontiers in Artificial Intelligence and Applications Abbreviated Journal
Volume 310 Issue (up) Pages 203 - 211
Keywords
Abstract CoRR abs/1905.04073
Wearable cameras capture a first-person view of the daily activities of the camera wearer, offering a visual diary of the user behaviour. Detection of the appearance of people the camera user interacts with for social interactions analysis is of high interest. Generally speaking, social events, lifestyle and health are highly correlated, but there is a lack of tools to monitor and analyse them. We consider that egocentric vision provides a tool to obtain information and understand users social interactions. We propose a model that enables us to evaluate and visualize social traits obtained by analysing social interactions appearance within egocentric photostreams. Given sets of egocentric images, we detect the appearance of faces within the days of the camera wearer, and rely on clustering algorithms to group their feature descriptors in order to re-identify persons. Recurrence of detected faces within photostreams allows us to shape an idea of the social pattern of behaviour of the user. We validated our model over several weeks recorded by different camera wearers. Our findings indicate that social profiles are potentially useful for social behaviour interpretation.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no proj Approved no
Call Number Admin @ si @ TCP2019 Serial 3377
Permanent link to this record
 

 
Author Estefania Talavera; Nicolai Petkov; Petia Radeva
Title Towards Unsupervised Familiar Scene Recognition in Egocentric Videos Type Miscellaneous
Year 2019 Publication Arxiv Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract CoRR abs/1905.04093
Nowadays, there is an upsurge of interest in using lifelogging devices. Such devices generate huge amounts of image data; consequently, the need for automatic methods for analyzing and summarizing these data is drastically increasing. We present a new method for familiar scene recognition in egocentric videos, based on background pattern detection through automatically configurable COSFIRE filters. We present some experiments over egocentric data acquired with the Narrative Clip.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no menciona Approved no
Call Number Admin @ si @ TPR2019b Serial 3379
Permanent link to this record
 

 
Author Estefania Talavera; Petia Radeva; Nicolai Petkov
Title Towards Emotion Retrieval in Egocentric PhotoStream Type Miscellaneous
Year 2019 Publication Arxiv Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract CoRR abs/1905.04107
The availability and use of egocentric data are rapidly increasing due to the growing use of wearable cameras. Our aim is to study the effect (positive, neutral or negative) of egocentric images or events on an observer. Given egocentric photostreams capturing the wearer's days, we propose a method that aims to assign sentiment to events extracted from egocentric photostreams. Such moments can be candidates to retrieve according to their possibility of representing a positive experience for the camera's wearer. The proposed approach obtained a classification accuracy of 75% on the test set, with a deviation of 8%. Our model makes a step forward opening the door to sentiment recognition in egocentric photostreams.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no proj Approved no
Call Number Admin @ si @ TRP2019 Serial 3381
Permanent link to this record
 

 
Author Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
Title How Much Does Audio Matter to Recognize Egocentric Object Interactions? Type Miscellaneous
Year 2019 Publication Arxiv Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract CoRR abs/1906.00634
Sounds are an important source of information on our daily interactions with objects. For instance, a significant amount of people can discern the temperature of water that it is being poured just by using the sense of hearing. However, only a few works have explored the use of audio for the classification of object interactions in conjunction with vision or as single modality. In this preliminary work, we propose an audio model for egocentric action recognition and explore its usefulness on the parts of the problem (noun, verb, and action classification). Our model achieves a competitive result in terms of verb classification (34.26% accuracy) on a standard benchmark with respect to vision-based state of the art systems, using a comparatively lighter architecture.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no menciona Approved no
Call Number Admin @ si @ CLR2019 Serial 3383
Permanent link to this record
 

 
Author Md. Mostafa Kamal Sarker; Hatem A. Rashwan; Mohamed Abdel-Nasser; Vivek Kumar Singh; Syeda Furruka Banu; Farhan Akram; Forhad U. H. Chowdhury; Kabir Ahmed Choudhury; Sylvie Chambon; Petia Radeva; Domenec Puig
Title MobileGAN: Skin Lesion Segmentation Using a Lightweight Generative Adversarial Network Type Miscellaneous
Year 2019 Publication Arxiv Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract CoRR abs/1907.00856
Skin lesion segmentation in dermoscopic images is a challenge due to their blurry and irregular boundaries. Most of the segmentation approaches based on deep learning are time and memory consuming due to the hundreds of millions of parameters. Consequently, it is difficult to apply them to real dermatoscope devices with limited GPU and memory resources. In this paper, we propose a lightweight and efficient Generative Adversarial Networks (GAN) model, called MobileGAN for skin lesion segmentation. More precisely, the MobileGAN combines 1D non-bottleneck factorization networks with position and channel attention modules in a GAN model. The proposed model is evaluated on the test dataset of the ISBI 2017 challenges and the validation dataset of ISIC 2018 challenges. Although the proposed network has only 2.35 millions of parameters, it is still comparable with the state-of-the-art. The experimental results show that our MobileGAN obtains comparable performance with an accuracy of 97.61%.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; no menciona Approved no
Call Number Admin @ si @ MRA2019 Serial 3384
Permanent link to this record
 

 
Author Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
Title Seeing and Hearing Egocentric Actions: How Much Can We Learn? Type Conference Article
Year 2019 Publication IEEE International Conference on Computer Vision Workshops Abbreviated Journal
Volume Issue (up) Pages 4470-4480
Keywords
Abstract Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have considered to integrate the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.
Address Seul; Korea; October 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW
Notes MILAB; no proj Approved no
Call Number Admin @ si @ CLR2019b Serial 3385
Permanent link to this record
 

 
Author Felipe Codevilla
Title On Building End-to-End Driving Models Through Imitation Learning Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract Autonomous vehicles are now considered as an assured asset in the future. Literally, all the relevant car-markers are now in a race to produce fully autonomous vehicles. These car-makers usually make use of modular pipelines for designing autonomous vehicles. This strategy decomposes the problem in a variety of tasks such as object detection and recognition, semantic and instance segmentation, depth estimation, SLAM and place recognition, as well as planning and control. Each module requires a separate set of expert algorithms, which are costly specially in the amount of human labor and necessity of data labelling. An alternative, that recently has driven considerable interest, is the end-to-end driving. In the end-to-end driving paradigm, perception and control are learned simultaneously using a deep network. These sensorimotor models are typically obtained by imitation learning fromhuman demonstrations. The main advantage is that this approach can directly learn from large fleets of human-driven vehicles without requiring a fixed ontology and extensive amounts of labeling. However, scaling end-to-end driving methods to behaviors more complex than simple lane keeping or lead vehicle following remains an open problem. On this thesis, in order to achieve more complex behaviours, we
address some issues when creating end-to-end driving system through imitation
learning. The first of themis a necessity of an environment for algorithm evaluation and collection of driving demonstrations. On this matter, we participated on the creation of the CARLA simulator, an open source platformbuilt from ground up for autonomous driving validation and prototyping. Since the end-to-end approach is purely reactive, there is also the necessity to provide an interface with a global planning system. With this, we propose the conditional imitation learning that conditions the actions produced into some high level command. Evaluation is also a concern and is commonly performed by comparing the end-to-end network output to some pre-collected driving dataset. We show that this is surprisingly weakly correlated to the actual driving and propose strategies on how to better acquire data and a better comparison strategy. Finally, we confirmwell-known generalization issues
(due to dataset bias and overfitting), new ones (due to dynamic objects and the
lack of a causal model), and training instability; problems requiring further research before end-to-end driving through imitation can scale to real-world driving.
Address May 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Antonio Lopez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ Cod2019 Serial 3387
Permanent link to this record
 

 
Author Zhijie Fang
Title Behavior understanding of vulnerable road users by 2D pose estimation Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract Anticipating the intentions of vulnerable road users (VRUs) such as pedestrians
and cyclists can be critical for performing safe and comfortable driving maneuvers. This is the case for human driving and, therefore, should be taken into account by systems providing any level of driving assistance, i.e. from advanced driver assistant systems (ADAS) to fully autonomous vehicles (AVs). In this PhD work, we show how the latest advances on monocular vision-based human pose estimation, i.e. those relying on deep Convolutional Neural Networks (CNNs), enable to recognize the intentions of such VRUs. In the case of cyclists, we assume that they follow the established traffic codes to indicate future left/right turns and stop maneuvers with arm signals. In the case of pedestrians, no indications can be assumed a priori. Instead, we hypothesize that the walking pattern of a pedestrian can allow us to determine if he/she has the intention of crossing the road in the path of the egovehicle, so that the ego-vehicle must maneuver accordingly (e.g. slowing down or stopping). In this PhD work, we show how the same methodology can be used for recognizing pedestrians and cyclists’ intentions. For pedestrians, we perform experiments on the publicly available Daimler and JAAD datasets. For cyclists, we did not found an analogous dataset, therefore, we created our own one by acquiring
and annotating corresponding video-sequences which we aim to share with the
research community. Overall, the proposed pipeline provides new state-of-the-art results on the intention recognition of VRUs.
Address May 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Antonio Lopez;David Vazquez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-6-6 Medium
Area Expedition Conference
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ Fan2019 Serial 3388
Permanent link to this record
 

 
Author Juan Ignacio Toledo
Title Information Extraction from Heterogeneous Handwritten Documents Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract In this thesis we explore information Extraction from totally or partially handwritten documents. Basically we are dealing with two different application scenarios. The first scenario are modern highly structured documents like forms. In this kind of documents, the semantic information is encoded in different fields with a pre-defined location in the document, therefore, information extraction becomes roughly equivalent to transcription. The second application scenario are loosely structured totally handwritten documents, besides transcribing them, we need to assign a semantic label, from a set of known values to the handwritten words.
In both scenarios, transcription is an important part of the information extraction. For that reason in this thesis we present two methods based on Neural Networks, to transcribe handwritten text.In order to tackle the challenge of loosely structured documents, we have produced a benchmark, consisting of a dataset, a defined set of tasks and a metric, that was presented to the community as an international competition. Also, we propose different models based on Convolutional and Recurrent neural networks that are able to transcribe and assign different semantic labels to each handwritten words, that is, able to perform Information Extraction.
Address July 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Alicia Fornes;Josep Llados
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-7-3 Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ Tol2019 Serial 3389
Permanent link to this record
 

 
Author David Berga
Title Understanding Eye Movements: Psychophysics and a Model of Primary Visual Cortex Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue (up) Pages
Keywords
Abstract Humansmove their eyes in order to learn visual representations of the world. These eye movements depend on distinct factors, either by the scene that we perceive or by our own decisions. To select what is relevant to attend is part of our survival mechanisms and the way we build reality, as we constantly react both consciously and unconsciously to all the stimuli that is projected into our eyes. In this thesis we try to explain (1) how we move our eyes, (2) how to build machines that understand visual information and deploy eyemovements, and (3) how to make these machines understand tasks in order to decide for eye movements.
(1) We provided the analysis of eye movement behavior elicited by low-level feature distinctiveness with a dataset of 230 synthetically-generated image patterns. A total of 15 types of stimuli has been generated (e.g. orientation, brightness, color, size, etc.), with 7 feature contrasts for each feature category. Eye-tracking data was collected from 34 participants during the viewing of the dataset, using Free-Viewing and Visual Search task instructions. Results showed that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. Temporality of fixations, 4. task difficulty and 5. center bias. From such dataset (SID4VAM), we have computed a benchmark of saliency models by testing performance using psychophysical patterns. Model performance has been evaluated considering model inspiration and consistency with human psychophysics. Our study reveals that state-of-the-art Deep Learning saliency models do not performwell with synthetic pattern images, instead, modelswith Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation.
(2) Computations in the primary visual cortex (area V1 or striate cortex) have long been hypothesized to be responsible, among several visual processing mechanisms, of bottom-up visual attention (also named saliency). In order to validate this hypothesis, images from eye tracking datasets have been processed with a biologically plausible model of V1 (named Neurodynamic SaliencyWaveletModel or NSWAM). Following Li’s neurodynamic model, we define V1’s lateral connections with a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation and scale. Early subcortical processes (i.e. retinal and thalamic) are functionally simulated. The resulting saliency maps are generated from the model output, representing the neuronal activity of V1 projections towards brain areas involved in eye movement control. We want to pinpoint that our unified computational architecture is able to reproduce several visual processes (i.e. brightness, chromatic induction and visual discomfort) without applying any type of training or optimization and keeping the same parametrization. The model has been extended (NSWAM-CM) with an implementation of the cortical magnification function to define the retinotopical projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return and selection mechanisms), are also proposed to predict attention in Free-Viewing and Visual Search conditions. Results show that our model outperforms other biologically-inpired models of saliency prediction as well as to predict visual saccade sequences, specifically for nature and synthetic images. We also show how temporal and spatial characteristics of inhibition of return can improve prediction of saccades, as well as how distinct search strategies (in terms of feature-selective or category-specific inhibition) predict attention at distinct image contexts.
(3) Although previous scanpath models have been able to efficiently predict saccades during Free-Viewing, it is well known that stimulus and task instructions can strongly affect eye movement patterns. In particular, task priming has been shown to be crucial to the deployment of eye movements, involving interactions between brain areas related to goal-directed behavior, working and long-termmemory in combination with stimulus-driven eyemovement neuronal correlates. In our latest study we proposed an extension of the Selective Tuning Attentive Reference Fixation ControllerModel based on task demands (STAR-FCT), describing novel computational definitions of Long-TermMemory, Visual Task Executive and Task Working Memory. With these modules we are able to use textual instructions in order to guide the model to attend to specific categories of objects and/or places in the scene. We have designed our memorymodel by processing a visual hierarchy of low- and high-level features. The relationship between the executive task instructions and the memory representations has been specified using a tree of semantic similarities between the learned features and the object category labels. Results reveal that by using this model, the resulting object localizationmaps and predicted saccades have a higher probability to fall inside the salient regions depending on the distinct task instructions compared to saliency.
Address July 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Xavier Otazu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-8-0 Medium
Area Expedition Conference
Notes NEUROBIT Approved no
Call Number Admin @ si @ Ber2019 Serial 3390
Permanent link to this record