|
Jorge Bernal, Fernando Vilariño, & F. Javier Sanchez. (2010). Feature Detectors and Feature Descriptors: Where We Are Now (Vol. 154).
Abstract: Feature Detection and Feature Description are clearly topical nowadays. Many Computer Vision applications rely on several of these techniques to extract the most significant aspects of an image so that they can help in tasks such as image retrieval, image registration, object recognition, object categorization and texture classification, among others. In this paper we define what Feature Detection and Description are and then we present an extensive collection of methods in order to show the different techniques that are being used right now. The aim of this report is to provide a glimpse of what is being used currently in these fields and to serve as a starting point for future endeavours.
|
|
|
David Berga, Xose R. Fernandez-Vidal, Xavier Otazu, V. Leboran, & Xose M. Pardo. (2019). Psychophysical evaluation of individual low-level feature influences on visual attention. VR - Vision Research, 154, 60–79.
Abstract: In this study we provide an analysis of eye movement behavior elicited by low-level feature distinctiveness with a dataset of synthetically-generated image patterns. The design of the visual stimuli was inspired by the ones used in previous psychophysical experiments, namely in free-viewing and visual searching tasks, to provide a total of 15 types of stimuli, divided according to the task and feature to be analyzed. Our interest is to analyze the influences of low-level feature contrast between a salient region and the rest of the distractors, providing fixation localization characteristics and the reaction time of landing inside the salient region. Eye-tracking data was collected from 34 participants during the viewing of a dataset of 230 images. Results show that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. temporality of fixations, 4. task difficulty and 5. center bias. This experimentation proposes a new psychophysical basis for saliency model evaluation using synthetic images.
Keywords: Visual attention; Psychophysics; Saliency; Task; Context; Contrast; Center bias; Low-level; Synthetic; Dataset
|
|
|
Antonio Hernandez. (2010). Pose and Face Recovery via Spatio-temporal GrabCut Human Segmentation (Vol. 153). Master's thesis.
|
|
|
Ahmed Mounir Gad. (2010). Object Localization Enhancement by Multiple Segmentation Fusion (Vol. 152). Master's thesis.
|
|
|
Marc Serra. (2010). Estimating Intrinsic Images from Physical and Categorical Color Cues (Vol. 151). Master's thesis.
|
|
|
Ivet Rafegas, & Maria Vanrell. (2018). Color encoding in biologically-inspired convolutional neural networks. VR - Vision Research, 151, 7–17.
Abstract: Convolutional Neural Networks have been proposed as suitable frameworks to model biological vision. Some of these artificial networks showed representational properties that rival primate performances in object recognition. In this paper we explore how color is encoded in a trained artificial network. This is done by estimating a color selectivity index for each neuron, which allows us to describe the neuron's activity in response to a color input stimulus. The index allows us to classify whether neurons are color selective or not and whether they are of a single or double color. We have determined that all five convolutional layers of the network have a large number of color selective neurons. Color opponency clearly emerges in the first layer, presenting 4 main axes (Black-White, Red-Cyan, Blue-Yellow and Magenta-Green), but this is reduced and rotated as we go deeper into the network. In layer 2 we find a denser hue sampling of color neurons and opponency is reduced almost to one new main axis, the Bluish-Orangish, coinciding with the dataset bias. In layers 3, 4 and 5 color neurons are similar amongst themselves, presenting different types of neurons that detect specific colored objects (e.g., orangish faces), specific surrounds (e.g., blue sky) or specific colored or contrasted object-surround configurations (e.g., blue blob in a green surround). Overall, our work concludes that color and shape representation are successively entangled through all the layers of the studied network, revealing certain parallelisms with the reported evidence in primate brains that can provide useful insight into intermediate hierarchical spatio-chromatic representations.
Keywords: Color coding; Computer vision; Deep learning; Convolutional neural networks
|
|
|
Javier Marin. (2009). Virtual learning for real testing (Vol. 150). Master's thesis, bell.
|
|
|
Monica Piñol, Angel Sappa, & Ricardo Toledo. (2015). Adaptive Feature Descriptor Selection based on a Multi-Table Reinforcement Learning Strategy. NEUCOM - Neurocomputing, 150(A), 106–115.
Abstract: This paper presents and evaluates a framework to improve the performance of visual object classification methods, which are based on the usage of image feature descriptors as inputs. The goal of the proposed framework is to learn the best descriptor for each image in a given database. This goal is reached by means of a reinforcement learning process using the minimum information. The visual classification system used to demonstrate the proposed framework is based on a bag of features scheme, and the reinforcement learning technique is implemented through the Q-learning approach. The behavior of the reinforcement learning with different state definitions is evaluated. Additionally, a method that combines all these states is formulated in order to select the optimal state. Finally, the chosen actions are obtained from the best set of image descriptors in the literature: PHOW, SIFT, C-SIFT, SURF and Spin. Experimental results using two public databases (ETH and COIL) are provided showing both the validity of the proposed approach and comparisons with state of the art. In all the cases the best results are obtained with the proposed approach.
Keywords: Reinforcement learning; Q-learning; Bag of features; Descriptors
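
The abstract above names Q-learning as the technique used to pick a descriptor per image. As a rough illustration only, the sketch below shows a generic tabular Q-learning update with epsilon-greedy action selection, where the actions stand for descriptor names. The state labels, reward value, and the function names `q_learning_step` and `choose_action` are assumptions made for this example and are not taken from the paper.

```python
import random

def q_learning_step(q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q is a dict mapping (state, action) pairs to values; missing pairs
    are treated as 0.0. alpha is the learning rate, gamma the discount.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

# Hypothetical usage: actions are descriptor names, states are placeholder labels.
descriptors = ["PHOW", "SIFT", "C-SIFT", "SURF", "Spin"]
q_table = {}
q_learning_step(q_table, "s0", "SIFT", 1.0, "s1", descriptors)
best = choose_action(q_table, "s0", descriptors, epsilon=0.0)
```

With `epsilon=0.0` the choice is purely greedy, so after a single positive reward for one descriptor in a state, that descriptor is selected for that state.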
|
|
|
Francisco Alvaro, Francisco Cruz, Joan Andreu Sanchez, Oriol Ramos Terrades, & Jose Miguel Benedi. (2015). Structure Detection and Segmentation of Documents Using 2D Stochastic Context-Free Grammars. NEUCOM - Neurocomputing, 150(A), 147–154.
Abstract: In this paper we define a bidimensional extension of Stochastic Context-Free Grammars for structure detection and segmentation of images of documents. Two sets of text classification features are used to perform an initial classification of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of historical marriage license books to validate this approach. We also tested several inference algorithms for Probabilistic Graphical Models and the results showed that the proposed grammatical model outperformed the other methods. Furthermore, grammars also provide the document structure along with its segmentation.
Keywords: document image analysis; stochastic context-free grammars; text classification features
|
|
|
Daniel Sanchez, Miguel Angel Bautista, & Sergio Escalera. (2015). HuPBA 8k+: Dataset and ECOC-GraphCut based Segmentation of Human Limbs. NEUCOM - Neurocomputing, 150(A), 173–188.
Abstract: Human multi-limb segmentation in RGB images has attracted a lot of interest in the research community because of the huge number of possible applications in fields like Human-Computer Interaction, Surveillance, eHealth, or Gaming. Nevertheless, human multi-limb segmentation is a very hard task because of the changes in appearance produced by different points of view, clothing, lighting conditions, occlusions, and number of articulations of the human body. Furthermore, this huge pose variability makes the availability of large annotated datasets difficult. In this paper, we introduce the HuPBA8k+ dataset. The dataset contains more than 8000 labeled frames at pixel precision, including more than 120000 manually labeled samples of 14 different limbs. For completeness, the dataset is also labeled at frame-level with action annotations drawn from an 11-action dictionary which includes both single person actions and person-person interactive actions. Furthermore, we also propose a two-stage approach for the segmentation of human limbs. In a first stage, human limbs are trained using cascades of classifiers to be split in a tree-structure way, which is included in an Error-Correcting Output Codes (ECOC) framework to define a body-like probability map. This map is used to obtain a binary mask of the subject by means of GMM color modelling and GraphCuts theory. In a second stage, we embed a similar tree-structure in an ECOC framework to build a more accurate set of limb-like probability maps within the segmented user mask, which are fed to a multi-label GraphCut procedure to obtain the final multi-limb segmentation. The methodology is tested on the novel HuPBA8k+ dataset, showing performance improvements in comparison to state-of-the-art approaches. In addition, a baseline of standard action recognition methods for the 11 action categories of the novel dataset is also provided.
Keywords: Human limb segmentation; ECOC; Graph-Cuts
|
|
|
Marta Diez-Ferrer, Debora Gil, Elena Carreño, Susana Padrones, Samantha Aso, Vanesa Vicens, et al. (2016). Positive Airway Pressure-Enhanced CT to Improve Virtual Bronchoscopic Navigation. CHEST - Chest Journal, 150(4), 1003A.
|
|
|
Razieh Rastgoo, Kourosh Kiani, & Sergio Escalera. (2020). Hand sign language recognition using multi-view hand skeleton. ESWA - Expert Systems With Applications, 150, 113336.
Abstract: Hand sign language recognition from video is a challenging research area in computer vision, whose performance is affected by hand occlusion, fast hand movement, illumination changes, or background complexity, just to mention a few. In recent years, deep learning approaches have achieved state-of-the-art results in the field, though previous challenges are not completely solved. In this work, we propose a novel deep learning-based pipeline architecture for efficient automatic hand sign language recognition using Single Shot Detector (SSD), 2D Convolutional Neural Network (2DCNN), 3D Convolutional Neural Network (3DCNN), and Long Short-Term Memory (LSTM) from RGB input videos. We use a CNN-based model which estimates the 3D hand keypoints from 2D input frames. After that, we connect these estimated keypoints to build the hand skeleton by using the midpoint algorithm. In order to obtain a more discriminative representation of hands, we project the 3D hand skeleton into three views surface images. We further employ the heatmap image of detected keypoints as input for refinement in a stacked fashion. We apply 3DCNNs on the stacked features of the hand, including pixel level, multi-view hand skeleton, and heatmap features, to extract discriminant local spatio-temporal features from these stacked inputs. The outputs of the 3DCNNs are fused and fed to an LSTM to model long-term dynamics of hand sign gestures. Analyzing 2DCNN vs. 3DCNN using different numbers of stacked inputs into the network, we demonstrate that 3DCNNs better capture spatio-temporal dynamics of hands. To the best of our knowledge, this is the first time that this multi-modal and multi-view set of hand skeleton features has been applied for hand sign language recognition. Furthermore, we present a new large-scale hand sign language dataset, namely RKS-PERSIANSIGN, including 10,000 RGB videos of 100 Persian sign words. Evaluation results of the proposed model on three datasets, NYU, First-Person, and RKS-PERSIANSIGN, indicate that our model outperforms state-of-the-art models in hand sign language recognition, hand pose estimation, and hand action recognition.
Keywords: Multi-view hand skeleton; Hand sign language recognition; 3DCNN; Hand pose estimation; RGB video; Hand action recognition
|
|
|
Carola Figueroa Flores, David Berga, Joost Van de Weijer, & Bogdan Raducanu. (2021). Saliency for free: Saliency prediction as a side-effect of object recognition. PRL - Pattern Recognition Letters, 150, 1–7.
Abstract: Saliency is the perceptual capacity of our visual system to focus our attention (i.e. gaze) on relevant objects instead of the background. So far, computational methods for saliency estimation have required the explicit generation of a saliency map, a process which is usually achieved via eye-tracking experiments on still images. This is a tedious process that needs to be repeated for each new dataset. In the current paper, we demonstrate that it is possible to automatically generate saliency maps without ground-truth. In our approach, saliency maps are learned as a side effect of object recognition. Extensive experiments carried out on both real and synthetic datasets demonstrated that our approach is able to generate accurate saliency maps, achieving competitive results when compared with supervised methods.
Keywords: Saliency maps; Unsupervised learning; Object recognition
|
|
|
Kai Wang, Joost Van de Weijer, & Luis Herranz. (2021). ACAE-REMIND for online continual learning with compressed feature replay. PRL - Pattern Recognition Letters, 150, 122–129.
Abstract: Online continual learning aims to learn from a non-IID stream of data from a number of different tasks, where the learner is only allowed to consider data once. Methods are typically allowed to use a limited buffer to store some of the images in the stream. Recently, it was found that feature replay, where an intermediate layer representation of the image is stored (or generated), leads to better results than image replay, while requiring less memory. Quantized exemplars can further reduce the memory usage. However, a drawback of these methods is that they use a fixed (or very intransigent) backbone network. This significantly limits the learning of representations that can discriminate between all tasks. To address this problem, we propose an auxiliary classifier auto-encoder (ACAE) module for feature replay at intermediate layers with high compression rates. The reduced memory footprint per image allows us to save more exemplars for replay. In our experiments, we conduct task-agnostic evaluation under the online continual learning setting and obtain state-of-the-art performance on the ImageNet-Subset, CIFAR100 and CIFAR10 datasets.
Keywords: online continual learning; autoencoders; vector quantization
|
|
|
Lluis Gomez, Ali Furkan Biten, Ruben Tito, Andres Mafla, Marçal Rusiñol, Ernest Valveny, et al. (2021). Multimodal grid features and cell pointers for scene text visual question answering. PRL - Pattern Recognition Letters, 150, 242–249.
Abstract: This paper presents a new model for the task of scene text visual question answering. In this task, questions about a given image can only be answered by reading and understanding scene text. Current state-of-the-art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes it difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on a single attention mechanism that attends to multi-modal features conditioned on the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance on two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data are made available through this link.
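
The abstract above describes reading attention weights over a spatial grid as the probability that each cell contains the answer. The sketch below illustrates only that interpretation step, under the assumption that the weights come from a softmax over raw per-cell scores. The score values and the function names `grid_attention_probs` and `most_likely_cell` are hypothetical and do not reproduce the paper's model.

```python
import math

def grid_attention_probs(scores):
    """Softmax over every cell of a 2D grid of raw attention scores.

    Returns a grid of the same shape whose entries sum to 1, so each entry
    can be read as the probability that the cell holds the answer text.
    """
    peak = max(s for row in scores for s in row)  # subtract max for stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def most_likely_cell(probs):
    """Return the (row, column) index of the highest-probability cell."""
    cells = ((r, c) for r in range(len(probs)) for c in range(len(probs[0])))
    return max(cells, key=lambda rc: probs[rc[0]][rc[1]])

# Hypothetical 2x2 grid of scores; the cell at row 1, column 0 scores highest.
probs = grid_attention_probs([[0.0, 1.0], [2.0, 0.0]])
answer_cell = most_likely_cell(probs)
```

Treating the whole grid as one softmax means the cells compete for probability mass, which matches the idea of pointing at a single location.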
|
|