|
Shiqi Yang, Kai Wang, Luis Herranz, & Joost Van de Weijer. (2021). On Implicit Attribute Localization for Generalized Zero-Shot Learning. IEEE Signal Processing Letters, 28, 872–876.
Abstract: Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their attribute-based descriptions. Since attributes are often related to specific parts of objects, many recent works focus on discovering discriminative regions. However, these methods usually require additional complex part detection modules or attention mechanisms. In this paper, 1) we show that common ZSL backbones (without explicit attention or part detection) can implicitly localize attributes, yet this property is not exploited. 2) Exploiting it, we then propose SELAR, a simple method that further encourages attribute localization, surprisingly achieving very competitive generalized ZSL (GZSL) performance when compared with more complex state-of-the-art methods. Our findings provide useful insight for designing future GZSL methods, and SELAR provides an easy-to-implement yet strong baseline.
|
|
|
Shida Beigpour, Christian Riess, Joost Van de Weijer, & Elli Angelopoulou. (2014). Multi-Illuminant Estimation with Conditional Random Fields. TIP - IEEE Transactions on Image Processing, 23(1), 83–95.
Abstract: Most existing color constancy algorithms assume uniform illumination. However, in real-world scenes, this is not often the case. Thus, we propose a novel framework for estimating the colors of multiple illuminants and their spatial distribution in the scene. We formulate this problem as an energy minimization task within a conditional random field over a set of local illuminant estimates. In order to quantitatively evaluate the proposed method, we created a novel data set of two-dominant-illuminant images comprised of laboratory, indoor, and outdoor scenes. Unlike prior work, our database includes accurate pixel-wise ground truth illuminant information. The performance of our method is evaluated on multiple data sets. Experimental results show that our framework clearly outperforms single illuminant estimators as well as a recently proposed multi-illuminant estimation approach.
Keywords: color constancy; CRF; multi-illuminant
|
|
|
Sergio Vera, Debora Gil, Agnes Borras, Marius George Linguraru, & Miguel Angel Gonzalez Ballester. (2013). Geometric Steerable Medial Maps. MVA - Machine Vision and Applications, 24(6), 1255–1266.
Abstract: In order to provide more intuitive and easily interpretable representations of complex shapes/organs, medial manifolds should reach a compromise between simplicity in geometry and capability for restoring the anatomy/shape of the organ/volume. Existing morphological methods show excellent results when applied to 2D objects, but their quality drops across dimensions.
This paper contributes to the computation of medial manifolds in two aspects. First, we provide a standard scheme for the computation of medial manifolds that avoids degenerated medial axis segments. Second, we introduce a continuous operator for accurate and efficient computation of medial structures of arbitrary dimension. We evaluate quantitatively the performance of our method with respect to existing approaches, by applying them to synthetic shapes of known medial geometry. We also show its higher performance for medical imaging applications in terms of simplicity of medial structures and capability for reconstructing the anatomical volume.
Keywords: Medial Representations; Medial Manifolds Comparison; Surface; Reconstruction
|
|
|
Sergio Escalera, Xavier Baro, Jordi Vitria, Petia Radeva, & Bogdan Raducanu. (2012). Social Network Extraction and Analysis Based on Multimodal Dyadic Interaction. SENS - Sensors, 12(2), 1702–1719.
Abstract: IF=1.77 (2010)
Social interactions are a very important component in people's lives. Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. For our study, we used a set of videos belonging to New York Times' Blogging Heads opinion blog.
The Social Network is represented as an oriented graph, whose directed links are determined by the Influence Model. The links' weights are a measure of the "influence" a person has over the other. The states of the Influence Model encode automatically extracted audio/visual features from our videos using state-of-the-art algorithms. Our results are reported in terms of accuracy of audio/visual data fusion for speaker segmentation and centrality measures used to characterize the extracted social network.
|
|
|
Sergio Escalera, Vassilis Athitsos, & Isabelle Guyon. (2016). Challenges in multimodal gesture recognition. JMLR - Journal of Machine Learning Research, 17, 1–54.
Abstract: This paper surveys the state of the art on multimodal gesture recognition and introduces the JMLR special topic on gesture recognition 2011-2015. We began right at the start of the Kinect™ revolution when inexpensive infrared cameras providing image depth recordings became available. We published papers using this technology and other more conventional methods, including regular video cameras, to record data, thus providing a good overview of uses of machine learning and computer vision using multimodal data in this area of application. Notably, we organized a series of challenges and made available several datasets we recorded for that purpose, including tens of thousands of videos, which are available to conduct further research. We also overview recent state-of-the-art works on gesture recognition based on a proposed taxonomy for gesture recognition, discussing challenges and future lines of research.
Keywords: Gesture Recognition; Time Series Analysis; Multimodal Data Analysis; Computer Vision; Pattern Recognition; Wearable sensors; Infrared Cameras; Kinect™
|
|
|
Sergio Escalera, R. M. Martinez, Jordi Vitria, Petia Radeva, & Maria Teresa Anguera. (2010). Deteccion automatica de la dominancia en conversaciones diadicas. EP - Escritos de Psicologia, 3(2), 41–45.
Abstract: Dominance refers to the level of influence that a person has in a conversation. Dominance is an important research area in social psychology, but the problem of its automatic estimation is a very recent topic in the contexts of social and wearable computing. In this paper, we focus on the dominance detection of visual cues. We estimate the correlation among observers by categorizing the dominant people in a set of face-to-face conversations. Different dominance indicators from gestural communication are defined, manually annotated, and compared to the observers' opinion. Moreover, these indicators are automatically extracted from video sequences and learnt by using binary classifiers. Results from the three analyses show a high correlation and allow the categorization of dominant people in public discussion video sequences.
Keywords: Dominance detection; Non-verbal communication; Visual features
|
|
|
Sergio Escalera, Oriol Pujol, Petia Radeva, Jordi Vitria, & Maria Teresa Anguera. (2010). Automatic Detection of Dominance and Expected Interest. EURASIPJ - EURASIP Journal on Advances in Signal Processing, , 12.
Abstract: Article ID 491819
Social Signal Processing is an emergent area of research that focuses on the analysis of social constructs. Dominance and interest are two of these social constructs. Dominance refers to the level of influence a person has in a conversation. Interest, when referred to in terms of group interactions, can be defined as the degree of engagement that the members of a group collectively display during their interaction. In this paper, we argue that, using only behavioral motion information, we are able to predict the interest of observers when looking at face-to-face interactions as well as the dominant people. First, we propose a simple set of movement-based features from body, face, and mouth activity in order to define a higher set of interaction indicators. The considered indicators are manually annotated by observers. Based on the opinions obtained, we define an automatic binary dominance detection problem and a multiclass interest quantification problem. The Error-Correcting Output Codes framework is used to learn to rank the perceived observer's interest in face-to-face interactions, while AdaBoost is used to solve the dominance detection problem. The automatic system shows good correlation between the automatic categorization results and the manual ranking made by the observers in both dominance and interest detection problems.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2009). Separability of Ternary Codes for Sparse Designs of Error-Correcting Output Codes. PRL - Pattern Recognition Letters, 30(3), 285–297.
Abstract: Error Correcting Output Codes (ECOC) represent a successful framework to deal with multi-class categorization problems based on combining binary classifiers. In this paper, we present a new formulation of the ternary ECOC distance and the error-correcting capabilities in the ternary ECOC framework. Based on the new measure, we stress how to design coding matrices that prevent codification ambiguity and propose a new Sparse Random coding matrix with ternary distance maximization. The results on the UCI Repository and in a real speed traffic categorization problem show that when the coding design satisfies the new ternary measures, significant performance improvement is obtained independently of the decoding strategy applied.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2010). Traffic sign recognition system with β-correction. MVA - Machine Vision and Applications, 21(2), 99–111.
Abstract: Traffic sign classification represents a classical application of multi-object recognition processing in uncontrolled adverse environments. Lack of visibility, illumination changes, and partial occlusions are just a few problems. In this paper, we introduce a novel system for multi-class classification of traffic signs based on error correcting output codes (ECOC). ECOC is based on an ensemble of binary classifiers that are trained on bi-partitions of classes. We classify a wide set of traffic sign types using robust error correcting codings. Moreover, we introduce the novel β-correction decoding strategy that outperforms the state-of-the-art decoding techniques, classifying a high number of classes with great success.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2010). On the Decoding Process in Ternary Error-Correcting Output Codes. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), 120–134.
Abstract: A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-correcting output codes (ECOC) represent a successful framework to deal with this type of problem. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a "do not care" symbol that allows a given classifier to ignore some classes. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies over a set of UCI machine learning repository data sets and on a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2010). Error-Correcting Output Codes Library. JMLR - Journal of Machine Learning Research, 11, 661–664.
Abstract: (Feb):661−664
In this paper, we present an open source Error-Correcting Output Codes (ECOC) library. The ECOC framework is a powerful tool to deal with multi-class categorization problems. This library contains both state-of-the-art coding (one-versus-one, one-versus-all, dense random, sparse random, DECOC, forest-ECOC, and ECOC-ONE) and decoding designs (hamming, euclidean, inverse hamming, laplacian, β-density, attenuated, loss-based, probabilistic kernel-based, and loss-weighted) with the parameters defined by the authors, as well as the option to include your own coding, decoding, and base classifier.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2010). Re-coding ECOCs without retraining. PRL - Pattern Recognition Letters, 31(7), 555–562.
Abstract: A standard way to deal with multi-class categorization problems is by the combination of binary classifiers in a pairwise voting procedure. Recently, this classical approach has been formalized in the Error-Correcting Output Codes (ECOC) framework. In the ECOC framework, the one-versus-one coding has been shown to achieve higher performance than the rest of coding designs. The binary problems that we train in the one-versus-one strategy are significantly smaller than in the rest of designs, and usually easier to learn, given the smaller overlap between classes. However, a high percentage of the positions coded by zero of the coding matrix, which implies a high sparseness degree, does not codify meta-class membership information. In this paper, we show that using the training data we can redefine without re-training, in a problem-dependent way, the one-versus-one coding matrix so that the new coded information helps the system to increase its generalization capability. Moreover, the new re-coding strategy is generalized to be applied over any binary code. The results over several UCI Machine Learning repository data sets and two real multi-class problems show that performance improvements can be obtained by re-coding the classical one-versus-one and Sparse random designs compared to different state-of-the-art ECOC configurations.
|
|
|
Sergio Escalera, Oriol Pujol, J. Mauri, & Petia Radeva. (2009). Intravascular Ultrasound Tissue Characterization with Sub-class Error-Correcting Output Codes. Journal of Signal Processing Systems, 55(1-3), 35–47.
Abstract: Intravascular ultrasound (IVUS) represents a powerful imaging technique to explore coronary vessels and to study their morphology and histologic properties. In this paper, we characterize different tissues based on radial frequency, texture-based, and combined features. To deal with the classification of multiple tissues, we require the use of robust multi-class learning techniques. In this sense, error-correcting output codes (ECOC) have been shown to robustly combine binary classifiers to solve multi-class problems. In this context, we propose a strategy to model multi-class classification tasks using sub-class information in the ECOC framework. The new strategy splits the classes into different sub-sets according to the applied base classifier. Complex IVUS data sets containing overlapping data are learnt by splitting the original set of classes into sub-classes, and embedding the binary problems in a problem-dependent ECOC design. The method automatically characterizes different tissues, showing performance improvements over the state-of-the-art ECOC techniques for different base classifiers. Furthermore, the combination of RF and texture-based features also shows improvements over the state-of-the-art approaches.
|
|
|
Sergio Escalera, Jordi Gonzalez, Xavier Baro, & Jamie Shotton. (2016). Guest Editor Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 1489–1491.
Abstract: The sixteen papers in this special section focus on human pose recovery and behavior analysis (HuPBA). This is one of the most challenging topics in computer vision, pattern analysis, and machine learning. It is of critical importance for application areas that include gaming, computer interaction, human-robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of cluttered scenes, such as background artifacts, occlusions, and illumination changes. These papers represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others.
|
|
|
Sergio Escalera, Jordi Gonzalez, Hugo Jair Escalante, Xavier Baro, & Isabelle Guyon. (2018). Looking at People Special Issue. IJCV - International Journal of Computer Vision, 126(2-4), 141–143.
|
|