Diego Cheda, Daniel Ponsa, & Antonio Lopez. (2012). Monocular Egomotion Estimation based on Image Matching. In 1st International Conference on Pattern Recognition Applications and Methods (pp. 425–430).
|
Diego Cheda, Daniel Ponsa, & Antonio Lopez. (2012). Monocular Depth-based Background Estimation. In 7th International Conference on Computer Vision Theory and Applications (pp. 323–328).
Abstract: In this paper, we address the problem of reconstructing the background of a scene from a video sequence with occluding objects. The images are taken by hand-held cameras. Our method composes the background by selecting the appropriate pixels from previously aligned input images. To do that, we minimize a cost function that penalizes the deviations from the following assumptions: background represents objects whose distance to the camera is maximal, and background objects are stationary. Distance information is roughly obtained by a supervised learning approach that allows us to distinguish between close and distant image regions. Moving foreground objects are filtered out by using stationariness and motion boundary constancy measurements. The cost function is minimized by a graph cuts method. We demonstrate the applicability of our approach to recover an occlusion-free background in a set of sequences.
|
Diego Cheda, Daniel Ponsa, & Antonio Lopez. (2012). Pedestrian Candidates Generation using Monocular Cues. In IEEE Intelligent Vehicles Symposium (pp. 7–12). IEEE Xplore.
Abstract: Common techniques for pedestrian candidates generation (e.g., sliding window approaches) are based on an exhaustive search over the image. This implies that the number of windows produced is huge, which translates into a significant time consumption in the classification stage. In this paper, we propose a method that significantly reduces the number of windows to be considered by a classifier. Our method is a monocular one that exploits geometric and depth information available on single images. Both representations of the world are fused together to generate pedestrian candidates based on an underlying model which is focused only on objects standing vertically on the ground plane and having certain height, according with their depths on the scene. We evaluate our algorithm on a challenging dataset and demonstrate its application for pedestrian detection, where a considerable reduction in the number of candidate windows is reached.
Keywords: pedestrian detection
|
Diego Porres. (2021). Discriminator Synthesis: On reusing the other half of Generative Adversarial Networks. In Machine Learning for Creativity and Design, NeurIPS Workshop.
Abstract: Generative Adversarial Networks have long since revolutionized the world of computer vision and, tied to it, the world of art. Arduous efforts have gone into fully utilizing and stabilizing training so that outputs of the Generator network have the highest possible fidelity, but little has gone into using the Discriminator after training is complete. In this work, we propose to use the latter and show a way to use the features it has learned from the training dataset to both alter an image and generate one from scratch. We name this method Discriminator Dreaming, and the full code can be found at this https URL.
|
Diego Velazquez. (2023). Towards Robustness in Computer-based Image Understanding (Jordi Gonzalez, Josep M. Gonfaus, & Pau Rodriguez, Eds.). Ph.D. thesis, IMPRIMA.
Abstract: This thesis embarks on an exploratory journey into robustness in deep learning, with a keen focus on the intertwining facets of generalization, explainability, and edge cases within the realm of computer vision. In deep learning, robustness epitomizes a model’s resilience and flexibility, grounded on its capacity to generalize across diverse data distributions, explain its predictions transparently, and navigate the intricacies of edge cases effectively. The challenges associated with robust generalization are multifaceted, encompassing the model’s performance on unseen data and its defense against out-of-distribution data and adversarial attacks. Bridging this gap, the potential of Embedding Propagation (EP) for improving out-of-distribution generalization is explored. EP is depicted as a powerful tool facilitating manifold smoothing, which in turn fortifies the model’s robustness against adversarial onslaughts and bolsters performance in few-shot and self-/semi-supervised learning scenarios. In the labyrinth of deep learning models, the path to robustness often intersects with explainability. As model complexity increases, so does the urgency to decipher their decision-making processes. Acknowledging this, the thesis introduces a robust framework for evaluating and comparing various counterfactual explanation methods, echoing the imperative of explanation quality over quantity and spotlighting the intricacies of diversifying explanations. Simultaneously, the deep learning landscape is fraught with edge cases – anomalies in the form of small objects or rare instances in object detection tasks that defy the norm. Confronting this, the thesis presents an extension of the DETR (DEtection TRansformer) model to enhance small object detection. The devised DETR-FP, embedding the Feature Pyramid technique, demonstrates improvement in small object detection accuracy, albeit facing challenges like high computational costs. With the emergence of foundation models in mind, the thesis unveils EarthView, the largest scale remote sensing dataset to date, built for the self-supervised learning of a robust foundational model for remote sensing. Collectively, these studies contribute to the grand narrative of robustness in deep learning, weaving together the strands of generalization, explainability, and edge case performance. Through these methodological advancements and novel datasets, the thesis calls for continued exploration, innovation, and refinement to fortify the bastion of robust computer vision.
|
Diego Velazquez, Josep M. Gonfaus, Pau Rodriguez, Xavier Roca, Seiichi Ozawa, & Jordi Gonzalez. (2021). Logo Detection With No Priors. ACCESS - IEEE Access, 9, 106998–107011.
Abstract: In recent years, top referred methods on object detection like R-CNN have implemented this task as a combination of proposal region generation and supervised classification on the proposed bounding boxes. Although this pipeline has achieved state-of-the-art results in multiple datasets, it has inherent limitations that make object detection a very complex and inefficient task in computational terms. Instead of considering this standard strategy, in this paper we enhance Detection Transformers (DETR) which tackles object detection as a set-prediction problem directly in an end-to-end fully differentiable pipeline without requiring priors. In particular, we incorporate Feature Pyramids (FP) to the DETR architecture and demonstrate the effectiveness of the resulting DETR-FP approach on improving logo detection results thanks to the improved detection of small logos. So, without requiring any domain specific prior to be fed to the model, DETR-FP obtains competitive results on the OpenLogo and MS-COCO datasets offering a relative improvement of up to 30%, when compared to a Faster R-CNN baseline which strongly depends on hand-designed priors.
|
Diego Velazquez, Pau Rodriguez, Alexandre Lacoste, Issam H. Laradji, Xavier Roca, & Jordi Gonzalez. (2023). Evaluating Counterfactual Explainers. TMLR - Transactions on Machine Learning Research.
Abstract: Explainability methods have been widely used to provide insight into the decisions made by statistical models, thus facilitating their adoption in various domains within the industry. Counterfactual explanation methods aim to improve our understanding of a model by perturbing samples in a way that would alter its response in an unexpected manner. This information is helpful for users and for machine learning practitioners to understand and improve their models. Given the value provided by counterfactual explanations, there is a growing interest in the research community to investigate and propose new methods. However, we identify two issues that could hinder the progress in this field. (1) Existing metrics do not accurately reflect the value of an explainability method for the users. (2) Comparisons between methods are usually performed with datasets like CelebA, where images are annotated with attributes that do not fully describe them and with subjective attributes such as "Attractive". In this work, we address these problems by proposing an evaluation method with a principled metric to evaluate and compare different counterfactual explanation methods. The evaluation method is based on a synthetic dataset where images are fully described by their annotated attributes. As a result, we are able to perform a fair comparison of multiple explainability methods in the recent literature, obtaining insights about their performance. We make the code public for the benefit of the research community.
Keywords: Explainability; Counterfactuals; XAI
|
Diego Velazquez, Pau Rodriguez, Josep M. Gonfaus, Xavier Roca, & Jordi Gonzalez. (2022). A Closer Look at Embedding Propagation for Manifold Smoothing. JMLR - Journal of Machine Learning Research, 23(252), 1–27.
Abstract: Supervised training of neural networks requires a large amount of manually annotated data and the resulting networks tend to be sensitive to out-of-distribution (OOD) data. Self- and semi-supervised training schemes reduce the amount of annotated data required during the training process. However, OOD generalization remains a major challenge for most methods. Strategies that promote smoother decision boundaries play an important role in out-of-distribution generalization. For example, embedding propagation (EP) for manifold smoothing has recently been shown to considerably improve the OOD performance for few-shot classification. EP achieves smoother class manifolds by building a graph from sample embeddings and propagating information through the nodes in an unsupervised manner. In this work, we extend the original EP paper providing additional evidence and experiments showing that it attains smoother class embedding manifolds and improves results in settings beyond few-shot classification. Concretely, we show that EP improves the robustness of neural networks against multiple adversarial attacks as well as semi- and self-supervised learning performance.
Keywords: Regularization; semi-supervised learning; self-supervised learning; adversarial robustness; few-shot classification
|
Dimosthenis Karatzas. (2008). Detecting Gradients in Text Images Using the Hough Transform. In Proceedings of the 8th International Workshop on Document Analysis Systems (pp. 245–252).
|
Dimosthenis Karatzas, Faisal Shafait, Seiichi Uchida, Masakazu Iwamura, Lluis Gomez, Sergi Robles, et al. (2013). ICDAR 2013 Robust Reading Competition. In 12th International Conference on Document Analysis and Recognition (pp. 1484–1493).
Abstract: This report presents the final results of the ICDAR 2013 Robust Reading Competition. The competition is structured in three Challenges addressing text extraction in different application domains, namely born-digital images, real scene images and real-scene videos. The Challenges are organised around specific tasks covering text localisation, text segmentation and word recognition. The competition took place in the first quarter of 2013, and received a total of 42 submissions over the different tasks offered. This report describes the datasets and ground truth specification, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.
|
Dimosthenis Karatzas, Lluis Gomez, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, et al. (2015). ICDAR 2015 Competition on Robust Reading. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 1156–1160).
|
Dimosthenis Karatzas, Lluis Gomez, & Marçal Rusiñol. (2017). The Robust Reading Competition Annotation and Evaluation Platform. In 1st International Workshop on Open Services and Tools for Document Analysis.
Abstract: The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become the de facto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation of data, and to provide online and offline performance evaluation and analysis services.
|
Dimosthenis Karatzas, Lluis Gomez, Marçal Rusiñol, & Anguelos Nicolaou. (2018). The Robust Reading Competition Annotation and Evaluation Platform. In 13th IAPR International Workshop on Document Analysis Systems (pp. 61–66).
Abstract: The ICDAR Robust Reading Competition (RRC), initiated in 2003 and reestablished in 2011, has become the de facto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation of data, and to provide online and offline performance evaluation and analysis services.
|
Dimosthenis Karatzas, Marçal Rusiñol, Coen Antens, & Miquel Ferrer. (2008). Segmentation Robust to the Vignette Effect for Machine Vision Systems. In 19th International Conference on Pattern Recognition.
Abstract: The vignette effect (radial fall-off) is commonly encountered in images obtained through certain image acquisition setups and can seriously hinder automatic analysis processes. In this paper we present a fast and efficient method for dealing with vignetting in the context of object segmentation in an existing industrial inspection setup. The vignette effect is modelled here as a circular, non-linear gradient. The method estimates the gradient parameters and employs them to perform segmentation. Segmentation results on a variety of images indicate that the presented method is able to successfully tackle the vignette effect.
|
Dimosthenis Karatzas, Sergi Robles, Joan Mas, Farshad Nourbakhsh, & Partha Pratim Roy. (2011). ICDAR 2011 Robust Reading Competition – Challenge 1: Reading Text in Born-Digital Images (Web and Email). In 11th International Conference on Document Analysis and Recognition (pp. 1485–1490).
Abstract: This paper presents the results of the first Challenge of ICDAR 2011 Robust Reading Competition. Challenge 1 is focused on the extraction of text from born-digital images, specifically from images found in Web pages and emails. The challenge was organized in terms of three tasks that look at different stages of the process: text localization, text segmentation and word recognition. In this paper we present the results of the challenge for all three tasks, and make an open call for continuous participation outside the context of ICDAR 2011.
|