|
Huamin Ren, Nattiya Kanhabua, Andreas Mogelmose, Weifeng Liu, Kaustubh Kulkarni, Sergio Escalera, et al. (2018). Back-dropout Transfer Learning for Action Recognition. IETCV - IET Computer Vision, 12(4), 484–491.
Abstract: Transfer learning aims at adapting a model learned from source dataset to target dataset. It is a beneficial approach especially when annotating on the target dataset is expensive or infeasible. Transfer learning has demonstrated its powerful learning capabilities in various vision tasks. Despite transfer learning being a promising approach, it is still an open question how to adapt the model learned from the source dataset to the target dataset. One big challenge is to prevent the impact of category bias on classification performance. Dataset bias exists when two images from the same category, but from different datasets, are not classified as the same. To address this problem, a transfer learning algorithm has been proposed, called negative back-dropout transfer learning (NB-TL), which utilizes images that have been misclassified and further performs back-dropout strategy on them to penalize errors. Experimental results demonstrate the effectiveness of the proposed algorithm. In particular, the authors evaluate the performance of the proposed NB-TL algorithm on UCF 101 action recognition dataset, achieving 88.9% recognition rate.
Keywords: Learning (artificial intelligence); Pattern Recognition
|
|
|
Pau Rodriguez, Miguel Angel Bautista, Sergio Escalera, & Jordi Gonzalez. (2018). Beyond Oneshot Encoding: lower dimensional target embedding. IMAVIS - Image and Vision Computing, 75, 21–31.
Abstract: Target encoding plays a central role when learning Convolutional Neural Networks. In this realm, one-hot encoding is the most prevalent strategy due to its simplicity. However, this so widespread encoding schema assumes a flat label space, thus ignoring rich relationships existing among labels that can be exploited during training. In large-scale datasets, data does not span the full label space, but instead lies in a low-dimensional output manifold. Following this observation, we embed the targets into a low-dimensional space, drastically improving convergence speed while preserving accuracy. Our contribution is two fold: (i) We show that random projections of the label space are a valid tool to find such lower dimensional embeddings, boosting dramatically convergence rates at zero computational cost; and (ii) we propose a normalized eigenrepresentation of the class manifold that encodes the targets with minimal information loss, improving the accuracy of random projections encoding while enjoying the same convergence rates. Experiments on CIFAR-100, CUB200-2011, Imagenet, and MIT Places demonstrate that the proposed approach drastically improves convergence speed while reaching very competitive accuracy rates.
Keywords: Error correcting output codes; Output embeddings; Deep learning; Computer vision
|
|
|
Sergio Escalera, Alicia Fornes, O. Pujol, Petia Radeva, Gemma Sanchez, & Josep Llados. (2009). Blurred Shape Model for Binary and Grey-level Symbol Recognition. PRL - Pattern Recognition Letters, 30(15), 1424–1433.
Abstract: Many symbol recognition problems require the use of robust descriptors in order to obtain rich information of the data. However, the research of a good descriptor is still an open issue due to the high variability of symbols appearance. Rotation, partial occlusions, elastic deformations, intra-class and inter-class variations, or high variability among symbols due to different writing styles, are just a few problems. In this paper, we introduce a symbol shape description to deal with the changes in appearance that these types of symbols suffer. The shape of the symbol is aligned based on principal components to make the recognition invariant to rotation and reflection. Then, we present the Blurred Shape Model descriptor (BSM), where new features encode the probability of appearance of each pixel that outlines the symbols shape. Moreover, we include the new descriptor in a system to deal with multi-class symbol categorization problems. Adaboost is used to train the binary classifiers, learning the BSM features that better split symbol classes. Then, the binary problems are embedded in an Error-Correcting Output Codes framework (ECOC) to deal with the multi-class case. The methodology is evaluated on different synthetic and real data sets. State-of-the-art descriptors and classifiers are compared, showing the robustness and better performance of the present scheme to classify symbols with high variability of appearance.
|
|
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2007). Boosted Landmarks of Contextual Descriptors and Forest-ECOC: a Novel Framework to Detect and Classify Objects in Cluttered Scenes.
|
|
|
Shifeng Zhang, Ajian Liu, Jun Wan, Yanyan Liang, Guogong Guo, Sergio Escalera, et al. (2020). CASIA-SURF: A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing. TTBIS - IEEE Transactions on Biometrics, Behavior, and Identity Science, 182–193.
Abstract: Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much of the progresses have been made by the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have limited number of subjects (≤170) and modalities (≤2), which hinder the further development of the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and modalities. Specifically, it consists of 1,000 subjects with 21,000 videos and each sample has 3 modalities ( i.e. , RGB, Depth and IR). We also provide comprehensive evaluation metrics, diverse evaluation protocols, training/validation/testing subsets and a measurement tool, developing a new benchmark for face anti-spoofing. Moreover, we present a novel multi-modal multi-scale fusion method as a strong baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality across different scales. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at https://sites.google.com/qq.com/face-anti-spoofing/welcome/challengecvpr2019?authuser=0
|
|