|   | 
Details
   web
Records
Author Lluis Gomez; Ali Furkan Biten; Ruben Tito; Andres Mafla; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas
Title Multimodal grid features and cell pointers for scene text visual question answering Type Journal Article
Year 2021 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume (up) 150 Issue Pages 242-249
Keywords
Abstract This paper presents a new model for the task of scene text visual question answering. In this task questions about a given image can only be answered by reading and understanding scene text. Current state of the art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on an single attention mechanism that attends to multi-modal features conditioned to the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance in two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data is made available through this link.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.084; 600.121 Approved no
Call Number Admin @ si @ GBT2021 Serial 3620
Permanent link to this record
 

 
Author Marc Serra
Title Estimating Intrinsic Images from Physical and Categorical Color Cues Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 151 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC Approved no
Call Number Admin @ si @ Ser2010 Serial 1345
Permanent link to this record
 

 
Author Ivet Rafegas; Maria Vanrell
Title Color encoding in biologically-inspired convolutional neural networks Type Journal Article
Year 2018 Publication Vision Research Abbreviated Journal VR
Volume (up) 151 Issue Pages 7-17
Keywords Color coding; Computer vision; Deep learning; Convolutional neural networks
Abstract Convolutional Neural Networks have been proposed as suitable frameworks to model biological vision. Some of these artificial networks showed representational properties that rival primate performances in object recognition. In this paper we explore how color is encoded in a trained artificial network. It is performed by estimating a color selectivity index for each neuron, which allows us to describe the neuron activity to a color input stimuli. The index allows us to classify whether they are color selective or not and if they are of a single or double color. We have determined that all five convolutional layers of the network have a large number of color selective neurons. Color opponency clearly emerges in the first layer, presenting 4 main axes (Black-White, Red-Cyan, Blue-Yellow and Magenta-Green), but this is reduced and rotated as we go deeper into the network. In layer 2 we find a denser hue sampling of color neurons and opponency is reduced almost to one new main axis, the Bluish-Orangish coinciding with the dataset bias. In layers 3, 4 and 5 color neurons are similar amongst themselves, presenting different type of neurons that detect specific colored objects (e.g., orangish faces), specific surrounds (e.g., blue sky) or specific colored or contrasted object-surround configurations (e.g. blue blob in a green surround). Overall, our work concludes that color and shape representation are successively entangled through all the layers of the studied network, revealing certain parallelisms with the reported evidences in primate brains that can provide useful insight into intermediate hierarchical spatio-chromatic representations.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC; 600.051; 600.087 Approved no
Call Number Admin @ si @RaV2018 Serial 3114
Permanent link to this record
 

 
Author Ahmed Mounir Gad
Title Object Localization Enhancement by Multiple Segmentation Fusion Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 152 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Mou2010 Serial 1346
Permanent link to this record
 

 
Author Antonio Hernandez
Title Pose and Face Recovery via Spatio-temporal GrabCut Human Segmentation Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 153 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA;MILAB Approved no
Call Number Admin @ si @ Her2010 Serial 1347
Permanent link to this record
 

 
Author Jorge Bernal; Fernando Vilariño; F. Javier Sanchez
Title Feature Detectors and Feature Descriptors: Where We Are Now Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 154 Issue Pages
Keywords
Abstract Feature Detection and Feature Description are clearly nowadays topics. Many Computer Vision applications rely on the use of several of these techniques in order to extract the most significant aspects of an image so they can help in some tasks such as image retrieval, image registration, object recognition, object categorization and texture classification, among others. In this paper we define what Feature Detection and Description are and then we present an extensive collection of several methods in order to show the different techniques that are being used right now. The aim of this report is to provide a glimpse of what is being used currently in these fields and to serve as a starting point for future endeavours.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area 800 Expedition Conference
Notes MV;SIAI Approved no
Call Number Admin @ si @ BVS2010; IAM @ iam @ BVS2010 Serial 1348
Permanent link to this record
 

 
Author David Berga; Xose R. Fernandez-Vidal; Xavier Otazu; V. Leboran; Xose M. Pardo
Title Psychophysical evaluation of individual low-level feature influences on visual attention Type Journal Article
Year 2019 Publication Vision Research Abbreviated Journal VR
Volume (up) 154 Issue Pages 60-79
Keywords Visual attention; Psychophysics; Saliency; Task; Context; Contrast; Center bias; Low-level; Synthetic; Dataset
Abstract In this study we provide the analysis of eye movement behavior elicited by low-level feature distinctiveness with a dataset of synthetically-generated image patterns. Design of visual stimuli was inspired by the ones used in previous psychophysical experiments, namely in free-viewing and visual searching tasks, to provide a total of 15 types of stimuli, divided according to the task and feature to be analyzed. Our interest is to analyze the influences of low-level feature contrast between a salient region and the rest of distractors, providing fixation localization characteristics and reaction time of landing inside the salient region. Eye-tracking data was collected from 34 participants during the viewing of a 230 images dataset. Results show that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. temporality of fixations, 4. task difficulty and 5. center bias. This experimentation proposes a new psychophysical basis for saliency model evaluation using synthetic images.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes NEUROBIT; 600.128; 600.120 Approved no
Call Number Admin @ si @ BFO2019a Serial 3274
Permanent link to this record
 

 
Author Nataliya Shapovalova
Title On Importance of Interaction and Context Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 155 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ Sha2010 Serial 1355
Permanent link to this record
 

 
Author Mariella Dimiccoli; Marc Bolaños; Estefania Talavera; Maedeh Aghaei; Stavri G. Nikolov; Petia Radeva
Title SR-Clustering: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation Type Journal Article
Year 2017 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU
Volume (up) 155 Issue Pages 55-69
Keywords
Abstract While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time consuming processes. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments. First, contextual and semantic information is extracted for each image by employing a Convolutional Neural Networks approach. Later, by integrating language processing, a vocabulary of concepts is defined in a semantic space. Finally, by exploiting the temporal coherence in photo streams, images which share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from activity and event recognition to semantic indexing and summarization. Experiments over egocentric sets of nearly 17,000 images, show that the proposed approach outperforms state-of-the-art methods.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; 601.235 Approved no
Call Number Admin @ si @ DBT2017 Serial 2714
Permanent link to this record
 

 
Author Zhanwu Xiong
Title A Pompd Model for Active Camera Control Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 156 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ Xio2010 Serial 1356
Permanent link to this record
 

 
Author Debora Gil; Jose Maria-Carazo; Roberto Marabini
Title On the nature of 2D crystal unbending Type Journal Article
Year 2006 Publication Journal of Structural Biology Abbreviated Journal
Volume (up) 156 Issue 3 Pages 546-555
Keywords Electron microscopy
Abstract Crystal unbending, the process that aims to recover a perfect crystal from experimental data, is one of the more important steps in electron crystallography image processing. The unbending process involves three steps: estimation of the unit cell displacements from their ideal positions, extension of the deformation field to the whole image and transformation of the image in order to recover an ideal crystal. In this work, we present a systematic analysis of the second step oriented to address two issues. First, whether the unit cells remain undistorted and only the distance between them should be changed (rigid case) or should be modified with the same deformation suffered by the whole crystal (elastic case). Second, the performance of different extension algorithms (interpolation versus approximation) is explored. Our experiments show that there is no difference between elastic and rigid cases or among the extension algorithms. This implies that the deformation fields are constant over large areas. Furthermore, our results indicate that the main source of error is the transformation of the crystal image.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1047-8477 ISBN Medium
Area Expedition Conference
Notes IAM; Approved no
Call Number IAM @ iam @ GCM2006 Serial 1519
Permanent link to this record
 

 
Author Patricia Marquez
Title Conditions Ensuring Accuracy of Local Optical Flow Schemes Type Report
Year 2010 Publication CVC Tehcnical Report Abbreviated Journal
Volume (up) 157 Issue Pages
Keywords
Abstract Accurate computation of optical flow is a key-point in many image processing fields. Detection of anomalous and unpredicted agents (such as pedestrians, bikers or cars) in urban scenes or pathology discrimination in medical imaging sequences, to mention just a two. The above kinds sequences present two main difficulties for standard optical flow techniques. On one hand, variability in acquisition conditions (illuminance, medical imaging modality, ...) force an alterantive representation for images fulfilling the britghtness constancy constrain. On the hand, current variational schemes produce oversmoothed fields unable to properly model discontinuous behaviours such as collisions or functionless pathological areas. This master project explores the abilities and limitations of local and global optical flow approaches. The master student will put especial emphasis in the theoretical grounds behind in order to design a variational framework combining the theoretical advantages of the considered techniques. In particular an optical flow based on Gabor phase tracking (developed in the group for medical imaging) will be generalized to urban scenes.
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Bellaterra 08193, Barcelona, Spain Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; Approved no
Call Number IAM @ iam @ Mar2010 Serial 1582
Permanent link to this record
 

 
Author Lluis Pere de las Heras
Title Syntactic Model for Semantic Document Analysis Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 158 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Per2010 Serial 1350
Permanent link to this record
 

 
Author Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera; Huamin Ren; Thomas B. Moeslund; Elham Etemad
Title Locality Regularized Group Sparse Coding for Action Recognition Type Journal Article
Year 2017 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU
Volume (up) 158 Issue Pages 106-114
Keywords Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition
Abstract Bag of visual words (BoVW) models are widely utilized in image/ video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantages of locality coding such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate the outperformance of our methodology compared to the state-of-the-art on the considered datasets.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ BGE2017 Serial 3014
Permanent link to this record
 

 
Author Anjan Dutta
Title Symbol Spotting in Graphical Documents by Serialized Subgraph Matching Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume (up) 159 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ Dut2010 Serial 1351
Permanent link to this record