|
Mario Rojas, David Masip, A. Todorov, & Jordi Vitria. (2011). Automatic Prediction of Facial Trait Judgments: Appearance vs. Structural Models. Plos - PloS one, 6(8), e23323.
Abstract: JCR Impact Factor 2010: 4.411
Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and it is the focus of attention for research in diverse fields such as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to increase the natural aspect of interaction and to improve the performance of interactive computer systems. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made by using the full appearance information of the face and whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State-of-the-art machine learning methods are applied to a) derive a facial trait judgment model from training data and b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that a) prediction of perception of facial traits is learnable by both holistic and structural approaches; b) the most reliable prediction of facial trait judgments is obtained by certain types of holistic descriptions of the face appearance; and c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.
|
|
|
Noha Elfiky, Jordi Gonzalez, & Xavier Roca. (2012). Compact and Adaptive Spatial Pyramids for Scene Recognition. IMAVIS - Image and Vision Computing, 30(8), 492–500.
Abstract: Most successful approaches to scene recognition tend to efficiently combine global image features with spatial local appearance and shape cues. On the other hand, less attention has been devoted to studying spatial texture features within scenes. Our method is based on the insight that scenes can be seen as a composition of micro-texture patterns. This paper analyzes the role of texture along with its spatial layout for scene recognition. However, one main drawback of the resulting spatial representation is its huge dimensionality. Hence, we propose a technique that addresses this problem by presenting a compact Spatial Pyramid (SP) representation. The basis of our compact representation, namely, Compact Adaptive Spatial Pyramid (CASP), consists of a two-stage compression strategy. This strategy is based on the Agglomerative Information Bottleneck (AIB) theory for (i) compressing the least informative SP features, and (ii) automatically learning the most appropriate shape for each category. Our method exceeds the state-of-the-art results on several challenging scene recognition data sets.
|
|
|
Marc Castello, Jordi Gonzalez, Ariel Amato, Pau Baiget, Carles Fernandez, Josep M. Gonfaus, et al. (2013). Exploiting Multimodal Interaction Techniques for Video-Surveillance. In Multimodal Interaction in Image and Video Applications Intelligent Systems Reference Library (Vol. 48, pp. 135–151). Springer Berlin Heidelberg.
Abstract: In this paper we present an example of a video surveillance application that exploits Multimodal Interactive (MI) technologies. The main objective of the so-called VID-Hum prototype was to develop a cognitive artificial system for both the detection and description of a particular set of human behaviours arising from real-world events. The main procedure of the prototype described in this chapter entails: (i) adaptation, since the system adapts itself to the most common behaviours (qualitative data) inferred from tracking (quantitative data), thus being able to recognize abnormal behaviours; (ii) feedback, since an advanced interface based on Natural Language understanding allows end-users to communicate with the prototype by means of conceptual sentences; and (iii) multimodality, since a virtual avatar has been designed to describe what is happening in the scene, based on those textual interpretations generated by the prototype. Thus, the MI methodology has provided an adequate framework for all these cooperating processes.
|
|
|
Laura Igual, Joan Carles Soliva, Sergio Escalera, Roger Gimeno, Oscar Vilarroya, & Petia Radeva. (2012). Automatic Brain Caudate Nuclei Segmentation and Classification in Diagnostic of Attention-Deficit/Hyperactivity Disorder. CMIG - Computerized Medical Imaging and Graphics, 36(8), 591–600.
Abstract: We present a fully automatic diagnostic imaging test for Attention-Deficit/Hyperactivity Disorder diagnosis assistance based on previously found evidence of caudate nucleus volumetric abnormalities. The proposed method consists of different steps: a new automatic method for external and internal segmentation of the caudate based on Machine Learning methodologies, and the definition of a set of new volume relation features, 3D Dissociated Dipoles, used for caudate representation and classification. We separately validate the contributions using real data from a pediatric population and show precise internal caudate segmentation and discrimination power of the diagnostic test, showing significant performance improvements in comparison to other state-of-the-art methods.
Keywords: Automatic caudate segmentation; Attention-Deficit/Hyperactivity Disorder; Diagnostic test; Machine learning; Decision stumps; Dissociated dipoles
|
|
|
Carlo Gatta, & Francesco Ciompi. (2014). Stacked Sequential Scale-Space Taylor Context. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8), 1694–1700.
Abstract: We analyze sequential image labeling methods that sample the posterior label field in order to gather contextual information. We propose an effective method that extracts local Taylor coefficients from the posterior at different scales. Results show that our proposal outperforms state-of-the-art methods on MSRC-21, CAMVID, eTRIMS8 and KAIST2 data sets.
|
|
|
Fahad Shahbaz Khan, Joost Van de Weijer, Muhammad Anwer Rao, Michael Felsberg, & Carlo Gatta. (2014). Semantic Pyramids for Gender and Action Recognition. TIP - IEEE Transactions on Image Processing, 23(8), 3633–3645.
Abstract: Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.
|
|
|
Katerine Diaz, Francesc J. Ferri, & W. Diaz. (2015). Incremental Generalized Discriminative Common Vectors for Image Classification. TNNLS - IEEE Transactions on Neural Networks and Learning Systems, 26(8), 1761–1775.
Abstract: Subspace-based methods have become popular due to their ability to appropriately represent complex data in such a way that both dimensionality is reduced and discriminativeness is enhanced. Several recent works have concentrated on the discriminative common vector (DCV) method and other closely related algorithms also based on the concept of null space. In this paper, we present a generalized incremental formulation of the DCV methods, which allows the update of a given model by considering the addition of new examples even from unseen classes. Having efficient incremental formulations of well-behaved batch algorithms allows us to conveniently adapt previously trained classifiers without the need of recomputing them from scratch. The proposed generalized incremental method has been empirically validated in different case studies from different application domains (faces, objects, and handwritten digits) considering several different scenarios in which new data are continuously added at different rates starting from an initial model.
|
|
|
G. Lisanti, I. Masi, Andrew Bagdanov, & Alberto del Bimbo. (2015). Person Re-identification by Iterative Re-weighted Sparse Ranking. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), 1629–1642.
Abstract: In this paper we introduce a method for person re-identification based on discriminative, sparse basis expansions of targets in terms of a labeled gallery of known individuals. We propose an iterative extension to sparse discriminative classifiers capable of ranking many candidate targets. The approach makes use of soft- and hard- re-weighting to redistribute energy among the most relevant contributing elements and to ensure that the best candidates are ranked at each iteration. Our approach also leverages a novel visual descriptor which we show to be discriminative while remaining robust to pose and illumination variations. An extensive comparative evaluation is given demonstrating that our approach achieves state-of-the-art performance on single- and multi-shot person re-identification scenarios on the VIPeR, i-LIDS, ETHZ, and CAVIAR4REID datasets. The combination of our descriptor and iterative sparse basis expansion improves state-of-the-art rank-1 performance by six percentage points on VIPeR and by 20 on CAVIAR4REID compared to other methods with a single gallery image per person. With multiple gallery and probe images per person our approach improves by 17 percentage points the state-of-the-art on i-LIDS and by 72 on CAVIAR4REID at rank-1. The approach is also quite efficient, capable of single-shot person re-identification over galleries containing hundreds of individuals at about 30 re-identifications per second.
|
|
|
Adriana Romero, Petia Radeva, & Carlo Gatta. (2015). Meta-parameter free unsupervised sparse feature learning. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), 1716–1722.
Abstract: We propose a meta-parameter free, off-the-shelf, simple and fast unsupervised feature learning algorithm, which exploits a new way of optimizing for sparsity. Experiments on CIFAR-10, STL-10 and UCMerced show that the method achieves state-of-the-art performance, providing discriminative features that generalize well.
|
|
|
Ciprian Corneanu, Marc Oliu, Jeffrey F. Cohn, & Sergio Escalera. (2016). Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(8), 1548–1568.
Abstract: Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion about trends, important questions and future lines of research.
Keywords: Facial expression; affect; emotion recognition; RGB; 3D; thermal; multimodal
|
|
|
Mikhail Mozerov, & Joost Van de Weijer. (2017). Improved Recursive Geodesic Distance Computation for Edge Preserving Filter. TIP - IEEE Transactions on Image Processing, 26(8), 3696–3706.
Abstract: All known recursive filters based on the geodesic distance affinity are realized by two 1D recursions applied in two orthogonal directions of the image plane. The 2D extension of the filter is not valid and has theoretical drawbacks, which lead to known artifacts. In this paper, a maximum influence propagation method is proposed to approximate the 2D extension for the geodesic distance-based recursive filter. The method allows us to partially overcome the drawbacks of the 1D recursion approach. We show that our improved recursion better approximates the true geodesic distance filter, and the application of this improved filter for image denoising outperforms the existing recursive implementation of the geodesic distance. As an application, we consider a geodesic distance-based filter for image denoising. Experimental evaluation of our denoising method demonstrates comparable, and for several test images better, results than state-of-the-art approaches, while our algorithm is considerably faster, with computational complexity O(8P).
Keywords: Geodesic distance filter; color image filtering; image enhancement
|
|
|
Xialei Liu, Joost Van de Weijer, & Andrew Bagdanov. (2019). Exploiting Unlabeled Data in CNNs by Self-Supervised Learning to Rank. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1862–1878.
Abstract: For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50 percent.
Keywords: Task analysis; Training; Image quality; Visualization; Uncertainty; Labeling; Neural networks; Learning from rankings; image quality assessment; crowd counting; active learning
|
|
|
Akhil Gurram, Ahmet Faruk Tuna, Fengyi Shen, Onay Urfalioglu, & Antonio Lopez. (2021). Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision. TITS - IEEE Transactions on Intelligent Transportation Systems, 23(8), 12738–12751.
Abstract: Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing since it allows for appearance and depth being on direct pixelwise correspondence without further calibration. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, using only a monocular system at training time as well is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate for the SfM self-supervision limitations by leveraging virtual-world images with accurate semantic and depth supervision and addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
|
|
|
Jaume Garcia. (2004). Generalized Active Shape Models Applied to Cardiac Function Analysis. Master's thesis.
Abstract: Medical imaging is very useful in the assessment and treatment of many diseases. To deal with the great amount of data provided by imaging scanners and extract quantitative information that physicians can interpret, many analysis algorithms have been developed. Any process of analysis always consists of a first step of segmenting some particular structure. In medical imaging, structures are not always well defined and suffer from noise artifacts; thus, ordinary segmentation methods are not well suited. The ones that seem to give better results are those based on deformable models. Nevertheless, despite their capability of mixing image features together with smoothness constraints that may compensate for image irregularities, these are naturally local methods, i.e., each node of the active contour evolves taking into account information about its neighbors and some other weak constraints about flexibility and smoothness, but not about the global shape that they should find. Because the structures to be segmented are the same for all cases but with some inter- and intra-patient variation, the incorporation of a priori knowledge about shape in the segmentation method will provide robustness to it. Active Shape Models is an algorithm based on the creation of a shape model called Point Distribution Model. It performs a segmentation using only shapes similar to those previously learned from a training set that captures most of the variation presented by the structure. This algorithm works by updating shape nodes along a normal segment, which often can be too restrictive. For this reason we propose a generalization of this algorithm, which we call Generalized Active Shape Models, that fully integrates the a priori knowledge given by the Point Distribution Model with deformable models or any other appropriate segmentation method. Two different applications of this generalized method to cardiac imaging are developed and promising results are shown.
Keywords: Cardiac Analysis; Deformable Models; Active Contour Models; Active Shape Models; Tagged MRI; HARP; Contrast Echocardiography.
|
|
|
Debora Gil, Jaume Garcia, Aura Hernandez-Sabate, & Enric Marti. (2010). Manifold parametrization of the left ventricle for a statistical modelling of its complete anatomy. In 8th Medical Imaging (Vol. 7623, 304). SPIE.
Abstract: Distortion of Left Ventricle (LV) external anatomy is related to some dysfunctions, such as hypertrophy. The architecture of myocardial fibers determines LV electromechanical activation patterns as well as mechanics. Thus, their joint modelling would allow the design of specific interventions (such as pacemaker implantation and LV remodelling) and therapies (such as resynchronization). On one hand, accurate modelling of external anatomy requires either a dense sampling or a continuous infinite dimensional approach, which requires non-Euclidean statistics. On the other hand, computation of fiber models requires statistics on Riemannian spaces. Most approaches compute separate statistical models for external anatomy and fiber architecture. In this work we propose a general mathematical framework based on differential geometry concepts for computing a statistical model including both external and fiber anatomy. Our framework provides a continuous approach to external anatomy supporting standard statistics. We also provide a straightforward formula for the computation of the Riemannian fiber statistics. We have applied our methodology to the computation of a complete anatomical atlas of canine hearts from diffusion tensor studies. The orientation of fibers over the average external geometry agrees with the segmental description of orientations reported in the literature.
|
|