|
Ivet Rafegas, & Maria Vanrell. (2017). Color representation in CNNs: parallelisms with biological vision. In ICCV Workshop on Mutual Benefits ofr Cognitive and Computer Vision.
Abstract: Convolutional Neural Networks (CNNs) trained for object recognition tasks present representational capabilities approaching to primate visual systems [1]. This provides a computational framework to explore how image features
are efficiently represented. Here, we dissect a trained CNN
[2] to study how color is represented. We use a classical methodology used in physiology that is measuring index of selectivity of individual neurons to specific features. We use ImageNet Dataset [20] images and synthetic versions
of them to quantify color tuning properties of artificial neurons to provide a classification of the network population.
We conclude three main levels of color representation showing some parallelisms with biological visual systems: (a) a decomposition in a circular hue space to represent single color regions with a wider hue sampling beyond the first
layer (V2), (b) the emergence of opponent low-dimensional spaces in early stages to represent color edges (V1); and (c) a strong entanglement between color and shape patterns representing object-parts (e.g. wheel of a car), objectshapes (e.g. faces) or object-surrounds configurations (e.g. blue sky surrounding an object) in deeper layers (V4 or IT).
|
|
|
Mohammed Al Rawi, & Ernest Valveny. (2019). Compact and Efficient Multitask Learning in Vision, Language and Speech. In IEEE International Conference on Computer Vision Workshops (pp. 2933–2942).
Abstract: Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains.
|
|
|
Bogdan Raducanu, Alireza Bosaghzadeh, & Fadi Dornaika. (2014). Facial Expression Recognition based on Multi-view Observations with Application to Social Robotics. In 1st Workshop on Computer Vision for Affective Computing (pp. 1–8).
Abstract: Human-robot interaction is a hot topic nowadays in the social robotics community. One crucial aspect is represented by the affective communication which comes encoded through the facial expressions. In this paper, we propose a novel approach for facial expression recognition, which exploits an efficient and adaptive graph-based label propagation (semi-supervised mode) in a multi-observation framework. The facial features are extracted using an appearance-based 3D face tracker, view- and texture independent. Our method has been extensively tested on the CMU dataset, and has been conveniently compared with other methods for graph construction. With the proposed approach, we developed an application for an AIBO robot, in which it mirrors the recognized facial
expression.
|
|
|
Bogdan Raducanu, Alireza Bosaghzadeh, & Fadi Dornaika. (2015). Multi-observation Face Recognition in Videos based on Label Propagation. In 6th Workshop on Analysis and Modeling of Faces and Gestures AMFG2015 (pp. 10–17).
Abstract: In order to deal with the huge amount of content generated by social media, especially for indexing and retrieval purposes, the focus shifted from single object recognition to multi-observation object recognition. Of particular interest is the problem of face recognition (used as primary cue for persons’ identity assessment), since it is highly required by popular social media search engines like Facebook and Youtube. Recently, several approaches for graph-based label propagation were proposed. However, the associated graphs were constructed in an ad-hoc manner (e.g., using the KNN graph) that cannot cope properly with the rapid and frequent changes in data appearance, a phenomenon intrinsically related with video sequences. In this paper, we
propose a novel approach for efficient and adaptive graph construction, based on a two-phase scheme: (i) the first phase is used to adaptively find the neighbors of a sample and also to find the adequate weights for the minimization function of the second phase; (ii) in the second phase, the
selected neighbors along with their corresponding weights are used to locally and collaboratively estimate the sparse affinity matrix weights. Experimental results performed on Honda Video Database (HVDB) and a subset of video
sequences extracted from the popular TV-series ’Friends’ show a distinct advantage of the proposed method over the existing standard graph construction methods.
|
|
|
Pau Rodriguez, Miguel Angel Bautista, Sergio Escalera, & Jordi Gonzalez. (2018). Beyond Oneshot Encoding: lower dimensional target embedding. IMAVIS - Image and Vision Computing, 75, 21–31.
Abstract: Target encoding plays a central role when learning Convolutional Neural Networks. In this realm, one-hot encoding is the most prevalent strategy due to its simplicity. However, this so widespread encoding schema assumes a flat label space, thus ignoring rich relationships existing among labels that can be exploited during training. In large-scale datasets, data does not span the full label space, but instead lies in a low-dimensional output manifold. Following this observation, we embed the targets into a low-dimensional space, drastically improving convergence speed while preserving accuracy. Our contribution is two fold: (i) We show that random projections of the label space are a valid tool to find such lower dimensional embeddings, boosting dramatically convergence rates at zero computational cost; and (ii) we propose a normalized eigenrepresentation of the class manifold that encodes the targets with minimal information loss, improving the accuracy of random projections encoding while enjoying the same convergence rates. Experiments on CIFAR-100, CUB200-2011, Imagenet, and MIT Places demonstrate that the proposed approach drastically improves convergence speed while reaching very competitive accuracy rates.
Keywords: Error correcting output codes; Output embeddings; Deep learning; Computer vision
|
|
|
Oriol Ramos Terrades, Albert Berenguel, & Debora Gil. (2020). A flexible outlier detector based on a topology given by graph communities.
Abstract: Outlier, or anomaly, detection is essential for optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data cleaning process but a key topic in many fields such as fraudulent document detection, in medical applications and assisted diagnosis systems or detecting security threats. In contrast to population-based methods, neighborhood based local approaches are simple flexible methods that have the potential to perform well in small sample size unbalanced problems. However, a main concern of local approaches is the impact that the computation of each sample neighborhood has on the method performance. Most approaches use a distance in the feature space to define a single neighborhood that requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space considered as a topological manifold. Topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. This way, we provide with a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine tuning. The extensive experiments on real-world data sets show that our approach overall outperforms, both, local and global strategies in multi and single view settings.
|
|
|
Oriol Ramos Terrades, Albert Berenguel, & Debora Gil. (2022). A Flexible Outlier Detector Based on a Topology Given by Graph Communities. BDR - Big Data Research, 29, 100332.
Abstract: Outlier detection is essential for optimal performance of machine learning methods and statistical predictive models. Their detection is especially determinant in small sample size unbalanced problems, since in such settings outliers become highly influential and significantly bias models. This particular experimental settings are usual in medical applications, like diagnosis of rare pathologies, outcome of experimental personalized treatments or pandemic emergencies. In contrast to population-based methods, neighborhood based local approaches compute an outlier score from the neighbors of each sample, are simple flexible methods that have the potential to perform well in small sample size unbalanced problems. A main concern of local approaches is the impact that the computation of each sample neighborhood has on the method performance. Most approaches use a distance in the feature space to define a single neighborhood that requires careful selection of several parameters, like the number of neighbors.
This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space considered as a topological manifold. Topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. This way, we provide with a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine tuning. The extensive experiments on real-world and synthetic data sets show that our approach outperforms, both, local and global strategies in multi and single view settings.
Keywords: Classification algorithms; Detection algorithms; Description of feature space local structure; Graph communities; Machine learning algorithms; Outlier detectors
|
|
|
Antoni Rosell, Sonia Baeza, S. Garcia-Reina, JL. Mate, Ignasi Guasch, I. Nogueira, et al. (2022). EP01.05-001 Radiomics to Increase the Effectiveness of Lung Cancer Screening Programs. Radiolung Preliminary Results. JTO - Journal of Thoracic Oncology, 17(9), S182.
|
|
|
Antoni Rosell, Sonia Baeza, S. Garcia-Reina, JL. Mate, Ignasi Guasch, I. Nogueira, et al. (2022). Radiomics to increase the effectiveness of lung cancer screening programs. Radiolung preliminary results. ERJ - European Respiratory Journal, 60(66).
|
|
|
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, & Yoshua Bengio. (2015). FitNets: Hints for Thin Deep Nets. In 3rd International Conference on Learning Representations ICLR2015.
Abstract: While depth tends to improve network performances, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student intermediate hidden layer will generally be smaller than the teacher's intermediate hidden layer, additional parameters are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This allows one to train deeper students that can generalize better or run faster, a trade-off that is controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with almost 10.4 times less parameters outperforms a larger, state-of-the-art teacher network.
Keywords: Computer Science ; Learning; Computer Science ;Neural and Evolutionary Computing
|
|
|
Marçal Rusiñol, K. Bertet, Jean-Marc Ogier, & Josep Llados. (2010). Symbol Recognition Using a Concept Lattice of Graphical Patterns. In Graphics Recognition. Achievements, Challenges, and Evolution. 8th International Workshop, GREC 2009. Selected Papers (Vol. 6020, pp. 187–198). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we propose a new approach to recognize symbols by the use of a concept lattice. We propose to build a concept lattice in terms of graphical patterns. Each model symbol is decomposed in a set of composing graphical patterns taken as primitives. Each one of these primitives is described by boundary moment invariants. The obtained concept lattice relates which symbolic patterns compose a given graphical symbol. A Hasse diagram is derived from the context and is used to recognize symbols affected by noise. We present some preliminary results over a variation of the dataset of symbols from the GREC 2005 symbol recognition contest.
|
|
|
Marçal Rusiñol, T.Benkhelfallah, & V. Poulain d'Andecy. (2013). Field Extraction from Administrative Documents by Incremental Structural Templates. In 12th International Conference on Document Analysis and Recognition (pp. 1100–1104).
Abstract: In this paper we present an incremental framework aimed at extracting field information from administrative document images in the context of a Digital Mail-room scenario. Given a single training sample in which the user has marked which fields have to be extracted from a particular document class, a document model representing structural relationships among words is built. This model is incrementally refined as the system processes more and more documents from the same class. A reformulation of the tf-idf statistic scheme allows to adjust the importance weights of the structural relationships among words. We report in the experimental section our results obtained with a large dataset of real invoices.
|
|
|
Mohamed Ramzy Ibrahim, Robert Benavente, Daniel Ponsa, & Felipe Lumbreras. (2024). SWViT-RRDB: Shifted Window Vision Transformer Integrating Residual in Residual Dense Block for Remote Sensing Super-Resolution. In 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
Abstract: Remote sensing applications, impacted by acquisition season and sensor variety, require high-resolution images. Transformer-based models improve satellite image super-resolution but are less effective than convolutional neural networks (CNNs) at extracting local details, crucial for image clarity. This paper introduces SWViT-RRDB, a new deep learning model for satellite imagery super-resolution. The SWViT-RRDB, combining transformer with convolution and attention blocks, overcomes the limitations of existing models by better representing small objects in satellite images. In this model, a pipeline of residual fusion group (RFG) blocks is used to combine the multi-headed self-attention (MSA) with residual in residual dense block (RRDB). This combines global and local image data for better super-resolution. Additionally, an overlapping cross-attention block (OCAB) is used to enhance fusion and allow interaction between neighboring pixels to maintain long-range pixel dependencies across the image. The SWViT-RRDB model and its larger variants outperform state-of-the-art (SoTA) models on two different satellite datasets in terms of PSNR and SSIM.
|
|
|
Mohammad Rouhani, E. Boyer, & Angel Sappa. (2014). Non-Rigid Registration meets Surface Reconstruction. In International Conference on 3D Vision (pp. 617–624).
Abstract: Non rigid registration is an important task in computer vision with many applications in shape and motion modeling. A fundamental step of the registration is the data association between the source and the target sets. Such association proves difficult in practice, due to the discrete nature of the information and its corruption by various types of noise, e.g. outliers and missing data. In this paper we investigate the benefit of the implicit representations for the non-rigid registration of 3D point clouds. First, the target points are described with small quadratic patches that are blended through partition of unity weighting. Then, the discrete association between the source and the target can be replaced by a continuous distance field induced by the interface. By combining this distance field with a proper deformation term, the registration energy can be expressed in a linear least square form that is easy and fast to solve. This significantly eases the registration by avoiding direct association between points. Moreover, a hierarchical approach can be easily implemented by employing coarse-to-fine representations. Experimental results are provided for point clouds from multi-view data sets. The qualitative and quantitative comparisons show the outperformance and robustness of our framework. %in presence of noise and outliers.
|
|
|
Gemma Roig, Xavier Boix, & Fernando De la Torre. (2009). Optimal Feature Selection for Subspace Image Matching. In 2nd IEEE International Workshop on Subspace Methods in conjunction.
Abstract: Image matching has been a central research topic in computer vision over the last decades. Typical approaches to correspondence involve matching feature points between images. In this paper, we present a novel problem for establishing correspondences between a sparse set of image features and a previously learned subspace model. We formulate the matching task as an energy minimization, and jointly optimize over all possible feature assignments and parameters of the subspace model. This problem is in general NP-hard. We propose a convex relaxation approximation, and develop two optimization strategies: naïve gradient-descent and quadratic programming. Alternatively, we reformulate the optimization criterion as a sparse eigenvalue problem, and solve it using a recently proposed backward greedy algorithm. Experimental results on facial feature detection show that the quadratic programming solution provides better selection mechanism for relevant features.
|
|