CVC Publications -- Query Results

Jorge Charco, Angel Sappa, Boris X. Vintimilla, & Henry Velesaca. (2020). Transfer Learning from Synthetic Data in the Camera Pose Estimation Problem. In 15th International Conference on Computer Vision Theory and Applications. Abstract: This paper presents a novel Siamese network architecture, as a variant of ResNet-50, to estimate the relative camera pose in multi-view environments. To improve the performance of the proposed model, a transfer learning strategy based on synthetic images obtained from a virtual world is considered. The transfer learning consists of first training the network using pairs of images from the virtual-world scenario under different conditions (i.e., weather, illumination, objects, buildings, etc.); then, the learned weights of the network are transferred to the real case, where images from real-world scenarios are considered. Experimental results and comparisons with the state of the art show both improvements in relative pose estimation accuracy using the proposed model and further improvements when the transfer learning strategy (from synthetic-world data to real-world data) is considered to tackle the limitation on training due to the reduced number of pairs of real images in most public data sets. http://refbase.cvc.uab.es/show.php?record=3433
Xavier Soria, Edgar Riba, & Angel Sappa. (2020). Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection. In IEEE Winter Conference on Applications of Computer Vision. Abstract: This paper proposes a Deep Learning based edge detector, which is inspired by both HED (Holistically-Nested Edge Detection) and Xception networks. The proposed approach generates thin edge-maps that are plausible for human eyes; it can be used in any edge detection task without previous training or fine tuning process. As a second contribution, a large dataset with carefully annotated edges has been generated. This dataset has been used for training the proposed approach as well as the state-of-the-art algorithms for comparisons. Quantitative and qualitative evaluations have been performed on different benchmarks showing improvements with the proposed method when F-measure of ODS and OIS are considered. http://refbase.cvc.uab.es/show.php?record=3434
Ciprian Corneanu, Sergio Escalera, & Aleix M. Martinez. (2020). Computing the Testing Error Without a Testing Set. In 33rd IEEE Conference on Computer Vision and Pattern Recognition. Abstract: Oral. Paper award nominee. Deep Neural Networks (DNNs) have revolutionized computer vision. We now have DNNs that achieve top (performance) results in many problems, including object recognition, facial expression analysis, and semantic segmentation, to name but a few. The design of the DNNs that achieve top results is, however, non-trivial and mostly done by trial-and-error. That is, typically, researchers will derive many DNN architectures (i.e., topologies) and then test them on multiple datasets. However, there are no guarantees that the selected DNN will perform well in the real world. One can use a testing set to estimate the performance gap between the training and testing sets, but avoiding overfitting-to-the-testing-data is almost impossible. Using a sequestered testing dataset may address this problem, but this requires a constant update of the dataset, a very expensive venture. Here, we derive an algorithm to estimate the performance gap between training and testing that does not require any testing dataset. Specifically, we derive a number of persistent topology measures that identify when a DNN is learning to generalize to unseen samples. This allows us to compute the DNN’s testing error on unseen samples, even when we do not have access to them. We provide extensive experimental validation on multiple networks and datasets to demonstrate the feasibility of the proposed approach. http://refbase.cvc.uab.es/show.php?record=3437
Swathikiran Sudhakaran, Sergio Escalera, & Oswald Lanz. (2020). Gate-Shift Networks for Video Action Recognition. In 33rd IEEE Conference on Computer Vision and Pattern Recognition. Abstract: Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice however, because of the large number of parameters and computations involved, they may under-perform in the lack of sufficiently large datasets for training them at scale. In this paper we introduce spatial gating in spatial-temporal decomposition of 3D kernels. We implement this concept with Gate-Shift Module (GSM). GSM is lightweight and turns a 2D-CNN into a highly efficient spatio-temporal feature extractor. With GSM plugged in, a 2D-CNN learns to adaptively route features through time and combine them, at almost no additional parameters and computational overhead. We perform an extensive evaluation of the proposed module to study its effectiveness in video action recognition, achieving state-of-the-art results on Something Something-V1 and Diving48 datasets, and obtaining competitive results on EPIC-Kitchens with far less model complexity. http://refbase.cvc.uab.es/show.php?record=3438
Eduardo Aguilar, Bhalaji Nagarajan, Rupali Khatun, Marc Bolaños, & Petia Radeva. (2020). Uncertainty Modeling and Deep Learning Applied to Food Image Analysis. In 13th International Joint Conference on Biomedical Engineering Systems and Technologies. Abstract: Recently, computer vision approaches, especially those assisted by deep learning techniques, have shown unexpected advances that practically solve problems never imagined to be automatable, such as face recognition or automated driving. However, food image recognition has received little attention in the Computer Vision community. In this project, we review the field of food image analysis and focus on how to combine it with two challenging research lines: deep learning and uncertainty modeling. After discussing our methodology to advance in this direction, we comment on the potential research, social and economic impact of research on food image analysis. http://refbase.cvc.uab.es/show.php?record=3526
Mohamed Ali Souibgui, Y. Kessentini, & Alicia Fornes. (2020). A conditional GAN based approach for distorted camera captured documents recovery. In 4th Mediterranean Conference on Pattern Recognition and Artificial Intelligence. http://refbase.cvc.uab.es/show.php?record=3450
Fernando Vilariño. (2020). Unveiling the Social Impact of AI. In Workshop at Digital Living Lab Days Conference. http://refbase.cvc.uab.es/show.php?record=3459
Hassan Ahmed Sial, Ramon Baldrich, Maria Vanrell, & Dimitris Samaras. (2020). Light Direction and Color Estimation from Single Image with Deep Regression. In London Imaging Conference. Abstract: We present a method to estimate the direction and color of the scene light source from a single image. Our method is based on two main ideas: (a) we use a new synthetic dataset with strong shadow effects with similar constraints to the SID dataset; (b) we define a deep architecture trained on the mentioned dataset to estimate the direction and color of the scene light source. Apart from showing good performance on synthetic images, we additionally propose a preliminary procedure to obtain light positions of the Multi-Illumination dataset, and, in this way, we also prove that our trained model achieves good performance when it is applied to real scenes. http://refbase.cvc.uab.es/show.php?record=3460
Sagnik Das, Hassan Ahmed Sial, Ke Ma, Ramon Baldrich, Maria Vanrell, & Dimitris Samaras. (2020). Intrinsic Decomposition of Document Images In-the-Wild. In 31st British Machine Vision Conference. Abstract: Automatic document content processing is affected by artifacts caused by the shape of the paper, non-uniform and diverse color of lighting conditions. Fully-supervised methods on real data are impossible due to the large amount of data needed. Hence, the current state of the art deep learning models are trained on fully or partially synthetic images. However, document shadow or shading removal results still suffer because: (a) prior methods rely on uniformity of local color statistics, which limits their application to real scenarios with complex document shapes and textures; and (b) synthetic or hybrid datasets with non-realistic, simulated lighting conditions are used to train the models. In this paper we tackle these problems with our two main contributions. First, a physically constrained learning-based method that directly estimates document reflectance based on intrinsic image formation which generalizes to challenging illumination conditions. Second, a new dataset that clearly improves previous synthetic ones, by adding a large range of realistic shading and diverse multi-illuminant conditions, uniquely customized to deal with documents in-the-wild. The proposed architecture works in two steps. First, a white balancing module neutralizes the color of the illumination on the input image. Based on the proposed multi-illuminant dataset we achieve a good white-balancing in really difficult conditions. Second, the shading separation module accurately disentangles the shading and paper material in a self-supervised manner where only the synthetic texture is used as a weak training signal (obviating the need for very costly ground truth with disentangled versions of shading and reflectance). The proposed approach leads to significant generalization of document reflectance estimation in real scenes with challenging illumination. We extensively evaluate on the real benchmark datasets available for intrinsic image decomposition and document shadow removal tasks. Our reflectance estimation scheme, when used as a pre-processing step of an OCR pipeline, shows a 21% improvement of character error rate (CER), thus proving the practical applicability. The data and code will be available at: https://github.com/cvlab-stonybrook/DocIIW. http://refbase.cvc.uab.es/show.php?record=3461
Xinhang Song, Haitao Zeng, Sixian Zhang, Luis Herranz, & Shuqiang Jiang. (2020). Generalized Zero-shot Learning with Multi-source Semantic Embeddings for Scene Recognition. In 28th ACM International Conference on Multimedia. Abstract: Recognizing visual categories from semantic descriptions is a promising way to extend the capability of a visual classifier beyond the concepts represented in the training data (i.e. seen categories). This problem is addressed by (generalized) zero-shot learning methods (GZSL), which leverage semantic descriptions that connect them to seen categories (e.g. label embedding, attributes). Conventional GZSL are designed mostly for object recognition. In this paper we focus on zero-shot scene recognition, a more challenging setting with hundreds of categories where their differences can be subtle and often localized in certain objects or regions. Conventional GZSL representations are not rich enough to capture these local discriminative differences. Addressing these limitations, we propose a feature generation framework with two novel components: 1) multiple sources of semantic information (i.e. attributes, word embeddings and descriptions), 2) region descriptions that can enhance scene discrimination. To generate synthetic visual features we propose a two-step generative approach, where local descriptions are sampled and used as conditions to generate visual features. The generated features are then aggregated and used together with real features to train a joint classifier. In order to evaluate the proposed method, we introduce a new dataset for zero-shot scene recognition with multi-semantic annotations. Experimental results on the proposed dataset and SUN Attribute dataset illustrate the effectiveness of the proposed method. http://refbase.cvc.uab.es/show.php?record=3465
Kai Wang, Luis Herranz, Anjan Dutta, & Joost Van de Weijer. (2020). Bookworm continual learning: beyond zero-shot learning and continual learning. In Workshop TASK-CV 2020. Abstract: We propose bookworm continual learning (BCL), a flexible setting where unseen classes can be inferred via a semantic model, and the visual model can be updated continually. Thus BCL generalizes both continual learning (CL) and zero-shot learning (ZSL). We also propose the bidirectional imagination (BImag) framework to address BCL where features of both past and future classes are generated. We observe that conditioning the feature generator on attributes can actually harm the continual learning ability, and propose two variants (joint class-attribute conditioning and asymmetric generation) to alleviate this problem. http://refbase.cvc.uab.es/show.php?record=3466
Debora Gil, & Guillermo Torres. (2020). A multi-shape loss function with adaptive class balancing for the segmentation of lung structures. In 34th International Congress and Exhibition on Computer Assisted Radiology & Surgery. http://refbase.cvc.uab.es/show.php?record=3472
Debora Gil, Oriol Ramos Terrades, & Raquel Perez. (2020). Topological Radiomics (TOPiomics): Early Detection of Genetic Abnormalities in Cancer Treatment Evolution. In Women in Geometry and Topology. http://refbase.cvc.uab.es/show.php?record=3473
Debora Gil, Katerine Diaz, Carles Sanchez, & Aura Hernandez-Sabate. (2020). Early Screening of SARS-CoV-2 by Intelligent Analysis of X-Ray Images. Abstract: Future SARS-CoV-2 virus outbreaks (COVID-XX) might occur in the coming years. However, the pathology in humans is so recent that many clinical aspects, like early detection of complications, side effects after recovery or early screening, are currently unknown. In spite of the number of cases of COVID-19, its rapid spread, which has put many sanitary systems on the edge of collapse, has hindered proper collection and analysis of the data related to COVID-19 clinical aspects. We describe an interdisciplinary initiative that integrates clinical research with image diagnostics and the use of new technologies such as artificial intelligence and radiomics, with the aim of clarifying some of the open questions about SARS-CoV-2. The whole initiative addresses 3 main points: 1) collection of standardized data including images, clinical data and analytics; 2) COVID-19 screening for its early diagnosis at primary care centers; 3) definition of radiomic signatures of COVID-19 evolution and associated pathologies for the early treatment of complications. In particular, in this paper we present a general overview of the project, the experimental design and first results of X-ray COVID-19 detection using a classic approach based on HoG and feature selection. Our experiments include a comparison to some recent methods for COVID-19 screening in X-Ray and an exploratory analysis of the feasibility of X-Ray COVID-19 screening. Results show that classic approaches can outperform deep-learning methods in this experimental setting, indicate the feasibility of early COVID-19 screening, and show that non-COVID infiltration is the group of patients most similar to COVID-19 in terms of radiological description of X-ray. Therefore, an efficient COVID-19 screening should be complemented with other clinical data to better discriminate these cases. http://refbase.cvc.uab.es/show.php?record=3474
Oriol Ramos Terrades, Albert Berenguel, & Debora Gil. (2020). A flexible outlier detector based on a topology given by graph communities. Abstract: Outlier, or anomaly, detection is essential for optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data cleaning process but a key topic in many fields such as fraudulent document detection, medical applications and assisted diagnosis systems, or the detection of security threats. In contrast to population-based methods, neighborhood-based local approaches are simple, flexible methods that have the potential to perform well in small sample size unbalanced problems. However, a main concern of local approaches is the impact that the computation of each sample neighborhood has on the method performance. Most approaches use a distance in the feature space to define a single neighborhood that requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space considered as a topological manifold. Topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. This way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine tuning. The extensive experiments on real-world data sets show that our approach overall outperforms both local and global strategies in multi- and single-view settings. http://refbase.cvc.uab.es/show.php?record=3475
