|
C. Alejandro Parraga, Ramon Baldrich, & Maria Vanrell. (2010). Accurate Mapping of Natural Scenes Radiance to Cone Activation Space: A New Image Dataset. In 5th European Conference on Colour in Graphics, Imaging and Vision and 12th International Symposium on Multispectral Colour Science (50–57).
Abstract: The characterization of trichromatic cameras is usually done in terms of a device-independent color space, such as the CIE 1931 XYZ space. This is indeed convenient since it allows the testing of results against colorimetric measures. We have characterized our camera to represent human cone activation by mapping the camera sensor's (RGB) responses to human (LMS) through a polynomial transformation, which can be “customized” according to the types of scenes we want to represent. Here we present a method to test the accuracy of the camera measures and a study on how the choice of training reflectances for the polynomial may alter the results.
|
|
|
D.Sanchez, J.C.Ortega, & Miguel Angel Bautista. (2013). Human Body Segmentation with Multi-limb Error-Correcting Output Codes Detection and Graph Cuts Optimization. In 6th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 7887, pp. 50–58). LNCS. Springer Berlin Heidelberg.
Abstract: Human body segmentation is a hard task because of the high variability in appearance produced by changes in the point of view, lighting conditions, and number of articulations of the human body. In this paper, we propose a two-stage approach for the segmentation of the human body. In a first step, a set of human limbs are described, normalized to be rotation invariant, and trained using cascade of classifiers to be split in a tree structure way. Once the tree structure is trained, it is included in a ternary Error-Correcting Output Codes (ECOC) framework. This first classification step is applied in a windowing way on a new test image, defining a body-like probability map, which is used as an initialization of a GMM color modelling and binary Graph Cuts optimization procedure. The proposed methodology is tested in a novel limb-labelled data set. Results show performance improvements of the novel approach in comparison to classical cascade of classifiers and human detector-based Graph Cuts segmentation approaches.
Keywords: Human Body Segmentation; Error-Correcting Output Codes; Cascade of Classifiers; Graph Cuts
|
|
|
Aura Hernandez-Sabate, Lluis Albarracin, Daniel Calvo, & Nuria Gorgorio. (2016). EyeMath: Identifying Mathematics Problem Solving Processes in a RTS Video Game. In 5th International Conference Games and Learning Alliance (Vol. 10056, pp. 50–59). LNCS.
Abstract: Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment with the aim to highlight its utility for level 3 driving automation. In particular, an experiment is performed using the presented simulator to explore the influence of different variables regarding control transfer of the car after the system was driving autonomously in a highway scenario. The results show that the speed of the car at the time when the system needs to transfer the control to the human driver is critical.
Keywords: Simulation environment; Automated Driving; Driver-Vehicle interaction
|
|
|
Maria Elena Meza de Luna, Juan Ramon Terven Salinas, Bogdan Raducanu, & Joaquin Salas. (2019). A Social-Aware Assistant to support individuals with visual impairments during social interaction: A systematic requirements analysis. IJHC - International Journal of Human-Computer Studies, 122, 50–60.
Abstract: Visual impairment affects the normal course of activities in everyday life including mobility, education, employment, and social interaction. Most of the existing technical solutions devoted to empowering the visually impaired people are in the areas of navigation (obstacle avoidance), access to printed information and object recognition. Less effort has been dedicated so far in developing solutions to support social interactions. In this paper, we introduce a Social-Aware Assistant (SAA) that provides visually impaired people with cues to enhance their face-to-face conversations. The system consists of a perceptive component (represented by smartglasses with an embedded video camera) and a feedback component (represented by a haptic belt). When the vision system detects a head nodding, the belt vibrates, thus suggesting the user to replicate (mirror) the gesture. In our experiments, sighted persons interacted with blind people wearing the SAA. We instructed the former to mirror the noddings according to the vibratory signal, while the latter interacted naturally. After the face-to-face conversation, the participants had an interview to express their experience regarding the use of this new technological assistant. With the data collected during the experiment, we have assessed quantitatively and qualitatively the device usefulness and user satisfaction.
|
|
|
Meysam Madadi, Sergio Escalera, Xavier Baro, & Jordi Gonzalez. (2022). End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth data. IETCV - IET Computer Vision, 16(1), 50–66.
Abstract: Despite recent advances in 3D pose estimation of human hands, especially thanks to the advent of CNNs and depth cameras, this task is still far from being solved. This is mainly due to the highly non-linear dynamics of fingers, which make hand model training a challenging task. In this paper, we exploit a novel hierarchical tree-like structured CNN, in which branches are trained to become specialized in predefined subsets of hand joints, called local poses. We further fuse local pose features, extracted from hierarchical CNN branches, to learn higher order dependencies among joints in the final pose by end-to-end training. Lastly, the loss function used is also defined to incorporate appearance and physical constraints about doable hand motion and deformation. Finally, we introduce a non-rigid data augmentation approach to increase the amount of training depth data. Experimental results suggest that feeding a tree-shaped CNN, specialized in local poses, into a fusion network for modeling joints correlations and dependencies, helps to increase the precision of final estimations, outperforming state-of-the-art results on NYU and SyntheticHand datasets.
Keywords: Computer vision; data acquisition; human computer interaction; learning (artificial intelligence); pose estimation
|
|
|
Fahad Shahbaz Khan, Joost Van de Weijer, & Maria Vanrell. (2012). Modulating Shape Features by Color Attention for Object Recognition. IJCV - International Journal of Computer Vision, 98(1), 49–64.
Abstract: Bag-of-words based image representation is a successful approach for object recognition. Generally, the subsequent stages of the process: feature detection,feature description, vocabulary construction and image representation are performed independent of the intentioned object classes to be detected. In such a framework, it was found that the combination of different image cues, such as shape and color, often obtains below expected results. This paper presents a novel method for recognizing object categories when using ultiple cues by separately processing the shape and color cues and combining them by modulating the shape features by category specific color attention. Color is used to compute bottom up and top-down attention maps. Subsequently, these color attention maps are used to modulate the weights of the shape features. In regions with higher attention shape features are given more weight than in regions with low attention. We compare our approach with existing methods that combine color and shape cues on five data sets containing varied importance of both cues, namely, Soccer (color predominance), Flower (color and hape parity), PASCAL VOC 2007 and 2009 (shape predominance) and Caltech-101 (color co-interference). The experiments clearly demonstrate that in all five data sets our proposed framework significantly outperforms existing methods for combining color and shape information.
|
|
|
Volkmar Frinken, Markus Baumgartner, Andreas Fischer, & Horst Bunke. (2012). Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting. In 13th International Conference on Frontiers in Handwriting Recognition (pp. 49–54).
Abstract: State-of-the-art handwriting recognition systems are learning-based systems that require large sets of training data. The creation of training data, and consequently the creation of a well-performing recognition system, requires therefore a substantial amount of human work. This can be reduced with semi-supervised learning, which uses unlabeled text lines for training as well. Current approaches estimate the correct transcription of the unlabeled data via handwriting recognition which is not only extremely demanding as far as computational costs are concerned but also requires a good model of the target language. In this paper, we propose a different approach that makes use of keyword spotting, which is significantly faster and does not need any language model. In a set of experiments we demonstrate its superiority over existing approaches.
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas, & Josep Llados. (2014). Classification of Administrative Document Images by Logo Identification. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 49–58). Springer Berlin Heidelberg.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier’s graphical logo. Two different methods are proposed, the first one uses a bag-of-visual-words model whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminar results are reported with a dataset of real administrative documents.
Keywords: Administrative Document Classification; Logo Recognition; Logo Spotting
|
|
|
Antonio Hernandez, Sergio Escalera, & Stan Sclaroff. (2016). Poselet-basedContextual Rescoring for Human Pose Estimation via Pictorial Structures. IJCV - International Journal of Computer Vision, 118(1), 49–64.
Abstract: In this paper we propose a contextual rescoring method for predicting the position of body parts in a human pose estimation framework. A set of poselets is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body part hypotheses. A method is proposed for the automatic discovery of a compact subset of poselets that covers the different poses in a set of validation images while maximizing precision. A rescoring mechanism is defined as a set-based boosting classifier that computes a new score for each body joint detection, given its relationship to detections of other body joints and mid-level parts in the image. This new score is incorporated in the pictorial structure model as an additional unary potential, following the recent work of Pishchulin et al. Experiments on two benchmarks show comparable results to Pishchulin et al. while reducing the size of the mid-level representation by an order of magnitude, reducing the execution time by 68 % accordingly.
Keywords: Contextual rescoring; Poselets; Human pose estimation
|
|
|
Alejandro Ariza-Casabona, Bartlomiej Twardowski, & Tri Kurniawan Wijaya. (2023). Exploiting Graph Structured Cross-Domain Representation for Multi-domain Recommendation. In European Conference on Information Retrieval – ECIR 2023: Advances in Information Retrieval (Vol. 13980, 49–65). LNCS.
Abstract: Multi-domain recommender systems benefit from cross-domain representation learning and positive knowledge transfer. Both can be achieved by introducing a specific modeling of input data (i.e. disjoint history) or trying dedicated training regimes. At the same time, treating domains as separate input sources becomes a limitation as it does not capture the interplay that naturally exists between domains. In this work, we efficiently learn multi-domain representation of sequential users’ interactions using graph neural networks. We use temporal intra- and inter-domain interactions as contextual information for our method called MAGRec (short for Multi-dom Ain Graph-based Recommender). To better capture all relations in a multi-domain setting, we learn two graph-based sequential representations simultaneously: domain-guided for recent user interest, and general for long-term interest. This approach helps to mitigate the negative knowledge transfer problem from multiple domains and improve overall representation. We perform experiments on publicly available datasets in different scenarios where MAGRec consistently outperforms state-of-the-art methods. Furthermore, we provide an ablation study and discuss further extensions of our method.
|
|
|
Naveen Onkarappa, & Angel Sappa. (2013). A Novel Space Variant Image Representation. JMIV - Journal of Mathematical Imaging and Vision, 47(1-2), 48–59.
Abstract: Traditionally, in machine vision images are represented using cartesian coordinates with uniform sampling along the axes. On the contrary, biological vision systems represent images using polar coordinates with non-uniform sampling. For various advantages provided by space-variant representations many researchers are interested in space-variant computer vision. In this direction the current work proposes a novel and simple space variant representation of images. The proposed representation is compared with the classical log-polar mapping. The log-polar representation is motivated by biological vision having the characteristic of higher resolution at the fovea and reduced resolution at the periphery. On the contrary to the log-polar, the proposed new representation has higher resolution at the periphery and lower resolution at the fovea. Our proposal is proved to be a better representation in navigational scenarios such as driver assistance systems and robotics. The experimental results involve analysis of optical flow fields computed on both proposed and log-polar representations. Additionally, an egomotion estimation application is also shown as an illustrative example. The experimental analysis comprises results from synthetic as well as real sequences.
Keywords: Space-variant representation; Log-polar mapping; Onboard vision applications
|
|
|
Mirko Arnold, Anarta Ghosh, Glen Doherty, Hugh Mulcahy, Stephen Patchett, & Gerard Lacey. (2013). Towards Automatic Direct Observation of Procedure and Skill (DOPS) in Colonoscopy. In Proceedings of the International Conference on Computer Vision Theory and Applications (pp. 48–53).
|
|
|
Karim Lekadir, Alfiia Galimzianova, Angels Betriu, Maria del Mar Vila, Laura Igual, Daniel L. Rubin, et al. (2017). A Convolutional Neural Network for Automatic Characterization of Plaque Composition in Carotid Ultrasound. J-BHI - IEEE Journal Biomedical and Health Informatics, 21(1), 48–55.
Abstract: Characterization of carotid plaque composition, more specifically the amount of lipid core, fibrous tissue, and calcified tissue, is an important task for the identification of plaques that are prone to rupture, and thus for early risk estimation of cardiovascular and cerebrovascular events. Due to its low costs and wide availability, carotid ultrasound has the potential to become the modality of choice for plaque characterization in clinical practice. However, its significant image noise, coupled with the small size of the plaques and their complex appearance, makes it difficult for automated techniques to discriminate between the different plaque constituents. In this paper, we propose to address this challenging problem by exploiting the unique capabilities of the emerging deep learning framework. More specifically, and unlike existing works which require a priori definition of specific imaging features or thresholding values, we propose to build a convolutional neural network (CNN) that will automatically extract from the images the information that is optimal for the identification of the different plaque constituents. We used approximately 90 000 patches extracted from a database of images and corresponding expert plaque characterizations to train and to validate the proposed CNN. The results of cross-validation experiments show a correlation of about 0.90 with the clinical assessment for the estimation of lipid core, fibrous cap, and calcified tissue areas, indicating the potential of deep learning for the challenging task of automatic characterization of plaque composition in carotid ultrasound.
|
|
|
Daniel Ponsa, & Antonio Lopez. (2007). Feature Selection Based on a New Formulation of the Minimal-Redundancy-Maximal-Relevance Criterion. In 3rd Iberian Conference on Pattern Recognition and Image Analysis, LNCS 4477 (pp. 47–54).
|
|
|
J.A.Perez, Enric Marti, & Juan J.Villanueva. (1992). Interfase de Usuario de Entrada de Datos 3D en un CAD de Cartografía Urbana a partir de Pares Estereoscópicos. In II Congreso Español de Informática Gráfica (pp. 47–60).
|
|