|
Mohamed Ali Souibgui, & Y. Kessentini. (2022). DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3), 1180–1191.
Abstract: Documents often exhibit various forms of degradation, which make them hard to read and substantially deteriorate the performance of an OCR system. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) that uses conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this practice has not been studied within the context of generative adversarial deep networks. We demonstrate that, in different tasks (document clean-up, binarization, deblurring and watermark removal), DE-GAN can produce a high-quality enhanced version of the degraded document. In addition, our approach provides consistent improvements compared to state-of-the-art methods on the widely used DIBCO 2013, DIBCO 2017 and H-DIBCO 2018 datasets, demonstrating its ability to restore a degraded document image to its ideal condition. The results obtained on a wide variety of degradations reveal the flexibility of the proposed model to be exploited in other document enhancement problems.
|
|
|
Saiping Zhang, Luis Herranz, Marta Mrak, Marc Gorriz Blanch, Shuai Wan, & Fuzheng Yang. (2022). DCNGAN: A Deformable Convolution-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video. In 47th International Conference on Acoustics, Speech, and Signal Processing.
Abstract: In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient at aligning frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
|
|
|
Adria Molina, Pau Riba, Lluis Gomez, Oriol Ramos Terrades, & Josep Llados. (2021). Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach. In 16th International Conference on Document Analysis and Recognition (Vol. 12822, pp. 306–320). LNCS.
Abstract: This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate date estimation as a retrieval task, where, given a query, the retrieved images are ranked in terms of their estimated date similarity: the closer their embedded representations, the closer their dates. In contrast to traditional models that design a neural network to learn a classifier or a regressor, we propose a learning objective based on the nDCG ranking metric. We have experimentally evaluated the performance of the method on two different tasks, date estimation and date-sensitive image retrieval, using the public DEW database, outperforming the baseline methods.
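Illustrative note (not part of the publication record): the nDCG ranking metric named in the abstract above can be sketched as follows. This is a minimal sketch of the metric only, using the common linear-gain form; it is not the paper's learning objective. The `date_relevance` helper and its `max_gap` parameter are hypothetical, introduced purely to show how date closeness could be turned into graded relevance.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    # Rank 0 is the top result; its discount is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def date_relevance(query_year, retrieved_years, max_gap=50):
    """Hypothetical graded relevance: 1.0 for the same year, 0.0 at max_gap years or more."""
    return [max(0.0, 1.0 - abs(query_year - y) / max_gap) for y in retrieved_years]


if __name__ == "__main__":
    # Years of the retrieved photos, in the order the system ranked them.
    scores = date_relevance(1950, [1955, 1990, 1948])
    print(round(ndcg(scores), 3))
```

A ranking that places photos with closer dates first scores nearer to 1.0 under this sketch, which is the property a retrieval formulation of date estimation rewards.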
|
|
|
P. Andreeva, Maya Dimitrova, & Petia Radeva. (2004). Data Mining Learning Models and Algorithms for Medical Applications. In 18 Conference Systems for Automation of Engineering and Research (SEAR 2004).
|
|
|
Antonio Lopez, David Vazquez, & Gabriel Villalonga. (2018). Data for Training Models, Domain Adaptation. In Intelligent Vehicles. Enabling Technologies and Future Developments (pp. 395–436).
Abstract: Simulation can enable several developments in the field of intelligent vehicles. This chapter is divided into three main subsections. The first one deals with driving simulators. The continuous improvement of hardware performance is a well-known fact that is allowing the development of more complex driving simulators. The immersion in the simulation scene is increased by high fidelity feedback to the driver. In the second subsection, traffic simulation is explained as well as how it can be used for intelligent transport systems. Finally, it is rather clear that sensor-based perception and action must be based on data-driven algorithms. Simulation could provide data to train and test algorithms that are afterwards implemented in vehicles. These tools are explained in the third subsection.
Keywords: Driving simulator; hardware; software; interface; traffic simulation; macroscopic simulation; microscopic simulation; virtual data; training data
|
|
|
Debora Gil, Antonio Esteban Lansaque, Sebastian Stefaniga, Mihail Gaianu, & Carles Sanchez. (2019). Data Augmentation from Sketch. In International Workshop on Uncertainty for Safe Utilization of Machine Learning in Medical Imaging (Vol. 11840, pp. 155–162). LNCS.
Abstract: State-of-the-art machine learning methods need huge amounts of data with unambiguous annotations for their training. In the context of medical imaging this is, in general, a very difficult task due to limited access to clinical data, the time required for manual annotations and variability across experts. Simulated data could serve for data augmentation provided that its appearance is comparable to the actual appearance of intra-operative acquisitions. Generative Adversarial Networks (GANs) are a powerful tool for artistic style transfer, but lack a criterion for selecting epochs that also ensures preservation of intra-operative content.
We propose a multi-objective optimization strategy for a selection of cycleGAN epochs ensuring a mapping between virtual images and the intra-operative domain preserving anatomical content. Our approach has been applied to simulate intra-operative bronchoscopic videos and chest CT scans from virtual sketches generated using simple graphical primitives.
Keywords: Data augmentation; cycleGANs; Multi-objective optimization
|
|
|
Albert Clapes, Tinne Tuytelaars, & Sergio Escalera. (2017). Darwintrees for action recognition. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
|
|
|
Sudeep Katakol, Luis Herranz, Fei Yang, & Marta Mrak. (2021). DANICE: Domain adaptation without forgetting in neural image compression. In Conference on Computer Vision and Pattern Recognition Workshops (pp. 1921–1925).
Abstract: Neural image compression (NIC) is a new coding paradigm where coding capabilities are captured by deep models learned from data. This data-driven nature enables new potential functionalities. In this paper, we study the adaptability of codecs to custom domains of interest. We show that NIC codecs are transferable and that they can be adapted with relatively few target domain images. However, naive adaptation interferes with the solution optimized for the original source domain, resulting in forgetting the original coding capabilities in that domain, and may even break the compatibility with previously encoded bitstreams. Addressing these problems, we propose Codec Adaptation without Forgetting (CAwF), a framework that can avoid these problems by adding a small amount of custom parameters, where the source codec remains embedded and unchanged during the adaptation process. Experiments demonstrate its effectiveness and provide useful insights on the characteristics of catastrophic interference in NIC.
|
|
|
Aleksandr Setkov, Fabio Martinez Carillo, Michele Gouiffes, Christian Jacquemin, Maria Vanrell, & Ramon Baldrich. (2015). DAcImPro: A Novel Database of Acquired Image Projections and Its Application to Object Recognition. In Advances in Visual Computing. Proceedings of 11th International Symposium, ISVC 2015 Part II (Vol. 9475, pp. 463–473). LNCS. Springer International Publishing.
Abstract: Projector-camera systems are designed to improve projection quality by comparing original images with their captured projections, which is usually complicated due to high photometric and geometric variations. Many research works address this problem using their own test data, which makes it extremely difficult to compare different proposals. This paper has two main contributions. Firstly, we introduce a new database of acquired image projections (DAcImPro) that, by covering photometric and geometric conditions and providing data for ground-truth computation, can serve to evaluate different algorithms in projector-camera systems. Secondly, a new object recognition scenario from acquired projections is presented, which could be of great interest in domains such as home video projections and public presentations. We show that the task is more challenging than the classical recognition problem and thus requires additional pre-processing, such as color compensation or projection area selection.
Keywords: Projector-camera systems; Feature descriptors; Object recognition
|
|
|
Jiaolong Xu, Sebastian Ramos, David Vazquez, & Antonio Lopez. (2013). DA-DPM Pedestrian Detection. In ICCV Workshop on Reconstruction meets Recognition.
Keywords: Domain Adaptation; Pedestrian Detection
|
|
|
Patricia Suarez, Angel Sappa, Boris X. Vintimilla, & Riad I. Hammoud. (2021). Cycle Generative Adversarial Network: Towards A Low-Cost Vegetation Index Estimation. In 28th IEEE International Conference on Image Processing (pp. 19–22).
Abstract: This paper presents a novel unsupervised approach to estimate the Normalized Difference Vegetation Index (NDVI). The NDVI is obtained as the ratio between information from the visible and near infrared spectral bands; in the current work, the NDVI is estimated just from an image of the visible spectrum through a Cyclic Generative Adversarial Network (CyclicGAN). This unsupervised architecture learns to estimate the NDVI index by means of an image translation between the red channel of a given RGB image and the NDVI unpaired index’s image. The translation is obtained by means of a ResNET architecture and a multiple loss function. Experimental results obtained with this unsupervised scheme show the validity of the implemented model. Additionally, comparisons with the state of the art approaches are provided showing improvements with the proposed approach.
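Illustrative note (not part of the publication record): the abstract above describes the NDVI as a ratio between visible and near-infrared information. The standard definition is (NIR - Red) / (NIR + Red), sketched below in plain Python. This shows only the conventional index that requires a near-infrared band; it is not the paper's CycleGAN-based estimator, which works from the visible spectrum alone. The function names and the choice to return 0.0 for empty pixels are assumptions made for this sketch.

```python
def ndvi(nir, red):
    """Standard NDVI for one pixel: (NIR - Red) / (NIR + Red), in [-1, 1]."""
    total = nir + red
    if total == 0:
        # Assumption for this sketch: report 0.0 where both bands are zero.
        return 0.0
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply ndvi pixel-wise to two equally sized 2-D lists of reflectances."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


if __name__ == "__main__":
    nir = [[0.6, 0.2], [0.0, 0.3]]
    red = [[0.2, 0.2], [0.0, 0.9]]
    for row in ndvi_image(nir, red):
        print(row)
```

Values near 1 indicate strong near-infrared reflectance relative to red, as in dense vegetation, while values at or below 0 indicate little or none.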
|
|
|
Debora Gil, Aura Hernandez-Sabate, David Castells, & Jordi Carrabina. (2017). CYBERH: Cyber-Physical Systems in Health for Personalized Assistance. In International Symposium on Symbolic and Numeric Algorithms for Scientific Computing.
Abstract: Assistance systems for e-Health applications have specific requirements that demand new methods for data gathering, analysis and modeling able to deal with SmallData:
1) systems should dynamically collect data from both the environment and the user to issue personalized recommendations; 2) data analysis should be able to tackle a limited number of samples prone to include non-informative data and possibly evolving in time due to changes in patient condition; 3) algorithms should run in real time with possibly limited computational resources and fluctuating internet access.
Electronic medical devices (and CyberPhysical devices in general) can enhance the process of data gathering and analysis in several ways: (i) acquiring data from multiple sensors simultaneously instead of single magnitudes; (ii) filtering data; (iii) providing real-time implementation conditions by isolating tasks in individual processors of multiprocessor Systems-on-Chip (MPSoC) platforms; and (iv) combining information through sensor fusion techniques.
Our approach focuses on both aspects of the complementary role of CyberPhysical devices and the analysis of SmallData in the process of building personalized models for e-Health applications. In particular, we will address the design of Cyber-Physical Systems in Health for Personalized Assistance (CyberHealth) in two specific application cases: 1) a Smart Assisted Driving System (SADs) for dynamic assessment of the driving capabilities of people with Mild Cognitive Impairment (MCI); 2) an Intelligent Operating Room (iOR) for improving the yield of bronchoscopic interventions for in-vivo lung cancer diagnosis.
|
|
|
Marçal Rusiñol, Lluis Pere de las Heras, Joan Mas, Oriol Ramos Terrades, Dimosthenis Karatzas, Anjan Dutta, et al. (2012). CVC-UAB's participation in the Flowchart Recognition Task of CLEF-IP 2012. In Conference and Labs of the Evaluation Forum.
|
|
|
Alicia Fornes, Anjan Dutta, Albert Gordo, & Josep Llados. (2012). CVC-MUSCIMA: A Ground-Truth of Handwritten Music Score Images for Writer Identification and Staff Removal. IJDAR - International Journal on Document Analysis and Recognition, 15(3), 243–251.
Abstract: The analysis of music scores has been an active research field in the last decades. However, there are no publicly available databases of handwritten music scores for the research community. In this paper we present the CVC-MUSCIMA database and ground-truth of handwritten music score images. The dataset consists of 1,000 music sheets written by 50 different musicians. It has been especially designed for writer identification and staff removal tasks. In addition to the description of the dataset, ground-truth, partitioning and evaluation metrics, we also provide some baseline results to ease the comparison between different approaches.
JCR: 0.405
Keywords: Music scores; Handwritten documents; Writer identification; Staff removal; Performance evaluation; Graphics recognition; Ground truths
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Sergi Robles, & Gemma Sanchez. (2015). CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing tool. IJDAR - International Journal on Document Analysis and Recognition, 18(1), 15–30.
Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is long experience with structural methods for the analysis and information extraction of multiple types of documents. Yet the lack of conveniently annotated, freely accessible databases has hindered progress in some areas, such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated with the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that allows this sort of information to be specified in a natural manner. The tool is designed for general-purpose groundtruthing: it allows users to define their own object classes and properties, offers multiple labeling options, supports cooperative work, and provides user and version control. Finally, we have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both the CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.
|
|