|
Fahad Shahbaz Khan, Joost Van de Weijer, Muhammad Anwer Rao, Michael Felsberg, & Carlo Gatta. (2014). Semantic Pyramids for Gender and Action Recognition. TIP - IEEE Transactions on Image Processing, 23(8), 3633–3645.
Abstract: Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2015). Combining Local and Global Learners in the Pairwise Multiclass Classification. PAA - Pattern Analysis and Applications, 18(4), 845–860.
Abstract: Pairwise classification is a well-known class binarization technique that converts a multiclass problem into a number of two-class problems, one problem for each pair of classes. However, in the pairwise technique, nuisance votes of many irrelevant classifiers may result in a wrong class prediction. To overcome this problem, a simple, but efficient method is proposed and evaluated in this paper. The proposed method is based on excluding some classes and focusing on the most probable classes in the neighborhood space, named Local Crossing Off (LCO). This procedure is performed by employing a modified version of standard K-nearest neighbor and large margin nearest neighbor algorithms. The LCO method takes advantage of nearest neighbor classification algorithm because of its local learning behavior as well as the global behavior of powerful binary classifiers to discriminate between two classes. Combining these two properties in the proposed LCO technique will avoid the weaknesses of each method and will increase the efficiency of the whole classification system. On several benchmark datasets of varying size and difficulty, we found that the LCO approach leads to significant improvements using different base learners. The experimental results show that the proposed technique not only achieves better classification accuracy in comparison to other standard approaches, but also is computationally more efficient for tackling classification problems which have a relatively large number of target classes.
Keywords: Multiclass classification; Pairwise approach; One-versus-one
|
|
|
Mariella Dimiccoli, Benoît Girard, Alain Berthoz, & Daniel Bennequin. (2013). Striola Magica: a functional explanation of otolith organs. JCN - Journal of Computational Neuroscience, 35(2), 125–154.
Abstract: Otolith end organs of vertebrates sense linear accelerations of the head and gravitation. The hair cells on their epithelia are responsible for transduction. In mammals, the striola, parallel to the line where hair cells reverse their polarization, is a narrow region centered on a curve with curvature and torsion. It has been shown that the striolar region is functionally different from the rest, being involved in a phasic vestibular pathway. We propose a mathematical and computational model that explains the necessity of this amazing geometry for the striola to be able to carry out its function. Our hypothesis, related to the biophysics of the hair cells and to the physiology of their afferent neurons, is that striolar afferents collect information from several type I hair cells to detect the jerk in a large domain of acceleration directions. This predicts a mean number of two calyces for afferent neurons, as measured in rodents. The domain of acceleration directions sensed by our striolar model is compatible with the experimental results obtained on monkeys considering all afferents. Therefore, the main result of our study is that phasic and tonic vestibular afferents cover the same geometrical fields, but at different dynamical and frequency domains.
Keywords: Otolith organs ;Striola; Vestibular pathway
|
|
|
Carlo Gatta, Eloi Puertas, & Oriol Pujol. (2011). Multi-Scale Stacked Sequential Learning. PR - Pattern Recognition, 44(10-11), 2414–2416.
Abstract: One of the most widely used assumptions in supervised learning is that data is independent and identically distributed. This assumption does not hold true in many real cases. Sequential learning is the discipline of machine learning that deals with dependent data such that neighboring examples exhibit some kind of relationship. In the literature, there are different approaches that try to capture and exploit this correlation, by means of different methodologies. In this paper we focus on meta-learning strategies and, in particular, the stacked sequential learning approach. The main contribution of this work is two-fold: first, we generalize the stacked sequential learning. This generalization reflects the key role of neighboring interactions modeling. Second, we propose an effective and efficient way of capturing and exploiting sequential correlations that takes into account long-range interactions by means of a multi-scale pyramidal decomposition of the predicted labels. Additionally, this new method subsumes the standard stacked sequential learning approach. We tested the proposed method on two different classification tasks: text lines classification in a FAQ data set and image classification. Results on these tasks clearly show that our approach outperforms the standard stacked sequential learning. Moreover, we show that the proposed method allows to control the trade-off between the detail and the desired range of the interactions.
Keywords: Stacked sequential learning; Multiscale; Multiresolution; Contextual classification
|
|
|
Margarita Torre, Beatriz Remeseiro, Petia Radeva, & Fernando Martinez. (2020). DeepNEM: Deep Network Energy-Minimization for Agricultural Field Segmentation. JSTAEOR - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 726–737.
Abstract: One of the main characteristics of agricultural fields is that the appearance of different crops and their growth status, in an aerial image, is varied, and has a wide range of radiometric values and high level of variability. The extraction of these fields and their monitoring are activities that require a high level of human intervention. In this article, we propose a novel automatic algorithm, named deep network energy-minimization (DeepNEM), to extract agricultural fields in aerial images. The model-guided process selects the most relevant image clues extracted by a deep network, completes them and finally generates regions that represent the agricultural fields under a minimization scheme. DeepNEM has been tested over a broad range of fields in terms of size, shape, and content. Different measures were used to compare the DeepNEM with other methods, and to prove that it represents an improved approach to achieve a high-quality segmentation of agricultural fields. Furthermore, this article also presents a new public dataset composed of 1200 images with their parcels boundaries annotations.
|
|