toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author L. Rothacker; Marçal Rusiñol; Josep Llados; G.A. Fink edit  url
openurl 
  Title A Two-stage Approach to Segmentation-Free Query-by-example Word Spotting Type Journal
  Year 2014 Publication Manuscript Cultures Abbreviated Journal  
  Volume 7 Issue Pages 47-58  
  Keywords  
  Abstract With the ongoing progress in digitization, huge document collections and archives have become available to a broad audience. Scanned document images can be transmitted electronically and studied simultaneously throughout the world. While this is very beneficial, it is often impossible to perform automated searches on these document collections. Optical character recognition usually fails when it comes to handwritten or historic documents. In order to address the need for exploring document collections rapidly, researchers are working on word spotting. In query-by-example word spotting scenarios, the user selects an exemplary occurrence of the query word in a document image. The word spotting system then retrieves all regions in the collection that are visually similar to the given example of the query word. The best matching regions are presented to the user and no actual transcription is required.
An important property of a word spotting system is the computational speed with which queries can be executed. In our previous work, we presented a relatively slow but high-precision method. In the present work, we will extend this baseline system to an integrated two-stage approach. In a coarse-grained first stage, we will filter document images efficiently in order to identify regions that are likely to contain the query word. In the fine-grained second stage, these regions will be analyzed with our previously presented high-precision method. Finally, we will report recognition results and query times for the well-known George Washington
benchmark in our evaluation. We achieve state-of-the-art recognition results while the query times can be reduced to 50% in comparison with our baseline.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference  
  Notes DAG; 600.061; 600.077 Approved no  
  Call Number Admin @ si @ Serial 3190  
Permanent link to this record
 

 
Author Cristhian A. Aguilera-Carrasco; C. Aguilera; Angel Sappa edit   pdf
doi  openurl
  Title Melamine Faced Panels Defect Classification beyond the Visible Spectrum Type Journal Article
  Year 2018 Publication Sensors Abbreviated Journal SENS  
  Volume 18 Issue 11 Pages 1-10  
  Keywords industrial application; infrared; machine learning  
  Abstract In this work, we explore the use of images from different spectral bands to classify defects in melamine faced panels, which could appear through the production process. Through experimental evaluation, we evaluate the use of images from the visible (VS), near-infrared (NIR), and long wavelength infrared (LWIR), to classify the defects using a feature descriptor learning approach together with a support vector machine classifier. Two descriptors were evaluated, Extended Local Binary Patterns (E-LBP) and SURF using a Bag of Words (BoW) representation. The evaluation was carried on with an image set obtained during this work, which contained five different defect categories that currently occurs in the industry. Results show that using images from beyond the visual spectrum helps to improve classification performance in contrast with a single visible spectrum solution.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference  
  Notes MSIAU; 600.122 Approved no  
  Call Number Admin @ si @ AAS2018 Serial 3191  
Permanent link to this record
 

 
Author Xavier Soria; Angel Sappa edit   pdf
openurl 
  Title Improving Edge Detection in RGB Images by Adding NIR Channel Type Conference Article
  Year 2018 Publication 14th IEEE International Conference on Signal Image Technology & Internet Based System Abbreviated Journal  
  Volume Issue Pages  
  Keywords Edge detection; Contour detection; VGG; CNN; RGB-NIR; Near infrared images  
  Abstract The edge detection is yet a critical problem in many computer vision and image processing tasks. The manuscript presents an Holistically-Nested Edge Detection based approach to study the inclusion of Near-Infrared in the Visible spectrum
images. To do so, a Single Sensor based dataset has been acquired in the range of 400nm to 1100nm wavelength spectral band. Prominent results have been obtained even when the ground truth (annotated edge-map) is based in the visible wavelength spectrum.
 
  Address Las Palmas de Gran Canaria; November 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU; 600.122 Approved no  
  Call Number Admin @ si @ SoS2018 Serial 3192  
Permanent link to this record
 

 
Author Jorge Charco; Boris X. Vintimilla; Angel Sappa edit   pdf
openurl 
  Title Deep learning based camera pose estimation in multi-view environment Type Conference Article
  Year 2018 Publication 14th IEEE International Conference on Signal Image Technology & Internet Based System Abbreviated Journal  
  Volume Issue Pages  
  Keywords Deep learning; Camera pose estimation; Multiview environment; Siamese architecture  
  Abstract This paper proposes to use a deep learning network architecture for relative camera pose estimation on a multi-view environment. The proposed network is a variant architecture of AlexNet to use as regressor for prediction the relative translation and rotation as output. The proposed approach is trained from
scratch on a large data set that takes as input a pair of imagesfrom the same scene. This new architecture is compared with a previous approach using standard metrics, obtaining better results on the relative camera pose.
 
  Address Las Palmas de Gran Canaria; November 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference SITIS  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ CVS2018 Serial 3194  
Permanent link to this record
 

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud edit   pdf
doi  openurl
  Title Near InfraRed Imagery Colorization Type Conference Article
  Year 2018 Publication 25th International Conference on Image Processing Abbreviated Journal  
  Volume Issue Pages 2237 - 2241  
  Keywords Convolutional Neural Networks (CNN), Generative Adversarial Network (GAN), Infrared Imagery colorization  
  Abstract This paper proposes a stacked conditional Generative Adversarial Network-based method for Near InfraRed (NIR) imagery colorization. We propose a variant architecture of Generative Adversarial Network (GAN) that uses multiple
loss functions over a conditional probabilistic generative model. We show that this new architecture/loss-function yields better generalization and representation of the generated colored IR images. The proposed approach is evaluated on a large test dataset and compared to recent state of the art methods using standard metrics.
 
  Address Athens; Greece; October 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference ICIP  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018b Serial 3195  
Permanent link to this record
 

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla edit   pdf
url  openurl
  Title Vegetation Index Estimation from Monospectral Images Type Conference Article
  Year 2018 Publication 15th International Conference on Images Analysis and Recognition Abbreviated Journal  
  Volume 10882 Issue Pages 353-362  
  Keywords  
  Abstract This paper proposes a novel approach to estimate Normalized Difference Vegetation Index (NDVI) from just the red channel of a RGB image. The NDVI index is defined as the ratio of the difference of the red and infrared radiances over their sum. In other words, information from the red channel of a RGB image and the corresponding infrared spectral band are required for its computation. In the current work the NDVI index is estimated just from the red channel by training a Conditional Generative Adversarial Network (CGAN). The architecture proposed for the generative network consists of a single level structure, which combines at the final layer results from convolutional operations together with the given red channel with Gaussian noise to enhance
details, resulting in a sharp NDVI image. Then, the discriminative model
estimates the probability that the NDVI generated index came from the training dataset, rather than the index automatically generated. Experimental results with a large set of real images are provided showing that a Conditional GAN single level model represents an acceptable approach to estimate NDVI index.
 
  Address Povoa de Varzim; Portugal; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference ICIAR  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018c Serial 3196  
Permanent link to this record
 

 
Author Patricia Suarez; Angel Sappa; Boris X. Vintimilla; Riad I. Hammoud edit   pdf
doi  openurl
  Title Deep Learning based Single Image Dehazing Type Conference Article
  Year 2018 Publication 31st IEEE Conference on Computer Vision and Pattern Recognition Workhsop Abbreviated Journal  
  Volume Issue Pages 1250 - 12507  
  Keywords Gallium nitride; Atmospheric modeling; Generators; Generative adversarial networks; Convergence; Image color analysis  
  Abstract This paper proposes a novel approach to remove haze degradations in RGB images using a stacked conditional Generative Adversarial Network (GAN). It employs a triplet of GAN to remove the haze on each color channel independently.
A multiple loss functions scheme, applied over a conditional probabilistic model, is proposed. The proposed GAN architecture learns to remove the haze, using as conditioned entrance, the images with haze from which the clear
images will be obtained. Such formulation ensures a fast model training convergence and a homogeneous model generalization. Experiments showed that the proposed method generates high-quality clear images.
 
  Address Salt Lake City; USA; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference CVPRW  
  Notes MSIAU; 600.086; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ SSV2018d Serial 3197  
Permanent link to this record
 

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera edit  doi
openurl 
  Title Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine Type Journal Article
  Year 2018 Publication Entropy Abbreviated Journal ENTROPY  
  Volume 20 Issue 11 Pages 809  
  Keywords hand sign language; deep learning; restricted Boltzmann machine (RBM); multi-modal; profoundly deaf; noisy image  
  Abstract In this paper, a deep learning approach, Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how RBM, as a deep generative model, is capable of generating the distribution of the input data for an enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered in the model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used and the hand of these cropped images are detected using Convolutional Neural Network (CNN). After that, three types of the detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all and part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposal against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on Massey University Gesture Dataset 2012, American Sign Language (ASL). and Fingerspelling Dataset from the University of Surrey’s Center for Vision, Speech and Signal Processing, NYU, and ASL Fingerspelling A datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ RKE2018 Serial 3198  
Permanent link to this record
 

 
Author Md Mostafa Kamal Sarker; Hatem A. Rashwan; Farhan Akram; Vivek Kumar Singh; Syeda Furruka Banu; Forhad U H Chowdhury; Kabir Ahmed Choudhury; Sylvie Chambon; Petia Radeva; Domenec Puig; Mohamed Abdel-Nasser edit   pdf
url  openurl
  Title SLSNet: Skin lesion segmentation using a lightweight generative adversarial network Type Journal Article
  Year 2021 Publication Expert Systems With Applications Abbreviated Journal ESWA  
  Volume 183 Issue Pages 115433  
  Keywords  
  Abstract The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly, the presence of hair, inconspicuous lesion edges and low contrast in dermoscopic images, and variability in the color, texture and shapes of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computational time and memory. Consequently, running such segmentation algorithms requires a powerful GPU and high bandwidth memory, which are not available in dermoscopy devices. Thus, this article aims to achieve precise skin lesion segmentation with minimum resources: a lightweight, efficient generative adversarial network (GAN) model called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model. The 1-D kernel factorized network reduces the computational cost of 2D filtering. The position and channel attention modules enhance the discriminative ability between the lesion and non-lesion feature representations in spatial and channel dimensions, respectively. A multiscale block is also used to aggregate the coarse-to-fine features of input skin images and reduce the effect of the artifacts. SLSNet is evaluated on two publicly available datasets: ISBI 2017 and the ISIC 2018. Although SLSNet has only 2.35 million parameters, the experimental results demonstrate that it achieves segmentation results on a par with the state-of-the-art skin lesion segmentation methods with an accuracy of 97.61%, and Dice and Jaccard similarity coefficients of 90.63% and 81.98%, respectively. SLSNet can run at more than 110 frames per second (FPS) in a single GTX1080Ti GPU, which is faster than well-known deep learning-based image segmentation models, such as FCN. Therefore, SLSNet can be used for practical dermoscopic applications.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ SRA2021 Serial 3633  
Permanent link to this record
 

 
Author Giacomo Magnifico; Beata Megyesi; Mohamed Ali Souibgui; Jialuo Chen; Alicia Fornes edit   pdf
url  openurl
  Title Lost in Transcription of Graphic Signs in Ciphers Type Conference Article
  Year 2022 Publication International Conference on Historical Cryptology (HistoCrypt 2022) Abbreviated Journal  
  Volume Issue Pages 153-158  
  Keywords transcription of ciphers; hand-written text recognition of symbols; graphic signs  
  Abstract Hand-written Text Recognition techniques with the aim to automatically identify and transcribe hand-written text have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.  
  Address Amsterdam, Netherlands, June 20-22, 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference HystoCrypt  
  Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no  
  Call Number Admin @ si @ MBS2022 Serial 3731  
Permanent link to this record
 

 
Author Pau Riba; Lutz Goldmann; Oriol Ramos Terrades; Diede Rusticus; Alicia Fornes; Josep Llados edit  doi
openurl 
  Title Table detection in business document images by message passing networks Type Journal Article
  Year 2022 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 127 Issue Pages 108641  
  Keywords  
  Abstract Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, there is information about the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition has gained interest in document image analysis, in particular, with the absence of rule lines and unknown information about rows and columns. However, business documents usually contain sensitive contents limiting the amount of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images which do not require the raw content of the document. Hence, the sensitive content can be previously removed and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain are business documents. We have carefully validated our approach in two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.  
  Address July 2022  
  Corporate Author Thesis  
  Publisher Elsevier Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference  
  Notes DAG; 600.162; 600.121 Approved no  
  Call Number Admin @ si @ RGR2022 Serial 3729  
Permanent link to this record
 

 
Author Meysam Madadi; Sergio Escalera; Alex Carruesco Llorens; Carlos Andujar; Xavier Baro; Jordi Gonzalez edit   pdf
url  doi
openurl 
  Title Top-down model fitting for hand pose recovery in sequences of depth images Type Journal Article
  Year 2018 Publication Image and Vision Computing Abbreviated Journal IMAVIS  
  Volume 79 Issue Pages 63-75  
  Keywords  
  Abstract State-of-the-art approaches on hand pose estimation from depth images have reported promising results under quite controlled considerations. In this paper we propose a two-step pipeline for recovering the hand pose from a sequence of depth images. The pipeline has been designed to deal with images taken from any viewpoint and exhibiting a high degree of finger occlusion. In a first step we initialize the hand pose using a part-based model, fitting a set of hand components in the depth images. In a second step we consider temporal data and estimate the parameters of a trained bilinear model consisting of shape and trajectory bases. We evaluate our approach on a new created synthetic hand dataset along with NYU and MSRA real datasets. Results demonstrate that the proposed method outperforms the most recent pose recovering approaches, including those based on CNNs.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference  
  Notes HUPBA; 600.098 Approved no  
  Call Number Admin @ si @ MEC2018 Serial 3203  
Permanent link to this record
 

 
Author Marc Oliu; Javier Selva; Sergio Escalera edit   pdf
url  openurl
  Title Folded Recurrent Neural Networks for Future Video Prediction Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11218 Issue Pages 745-761  
  Keywords  
  Abstract Future video prediction is an ill-posed Computer Vision problem that recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how with this topology only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving an insight to the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state of the art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times less memory usage and computational cost than the best scored approach.  
  Address Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference ECCV  
  Notes HUPBA; no menciona Approved no  
  Call Number Admin @ si @ OSE2018 Serial 3204  
Permanent link to this record
 

 
Author Ciprian Corneanu; Meysam Madadi; Sergio Escalera edit   pdf
url  openurl
  Title Deep Structure Inference Network for Facial Action Unit Recognition Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11216 Issue Pages 309-324  
  Keywords Computer Vision; Machine Learning; Deep Learning; Facial Expression Analysis; Facial Action Units; Structure Inference  
  Abstract Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for general facial expression analysis. Recently, efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between AUs. We propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and replicating a message passing algorithm between classes similar to a graphical model inference approach in later stages. We show that by training the model end-to-end with increased supervision we improve state-of-the-art by 5.3% and 8.2% performance on BP4D and DISFA datasets, respectively.  
  Address Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference ECCV  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ CME2018 Serial 3205  
Permanent link to this record
 

 
Author Mohamed Ilyes Lakhal; Albert Clapes; Sergio Escalera; Oswald Lanz; Andrea Cavallaro edit   pdf
url  openurl
  Title Residual Stacked RNNs for Action Recognition Type Conference Article
  Year 2018 Publication 9th International Workshop on Human Behavior Understanding Abbreviated Journal  
  Volume Issue Pages 534-548  
  Keywords Action recognition; Deep residual learning; Two-stream RNN  
  Abstract Action recognition pipelines that use Recurrent Neural Networks (RNN) are currently 5–10% less accurate than Convolutional Neural Networks (CNN). While most works that use RNNs employ a 2D CNN on each frame to extract descriptors for action recognition, we extract spatiotemporal features from a 3D CNN and then learn the temporal relationship of these descriptors through a stacked residual recurrent neural network (Res-RNN). We introduce for the first time residual learning to counter the degradation problem in multi-layer RNNs, which have been successful for temporal aggregation in two-stream action recognition pipelines. Finally, we use a late fusion strategy to combine RGB and optical flow data of the two-stream Res-RNN. Experimental results show that the proposed pipeline achieves competitive results on UCF-101 and state of-the-art results for RNN-like architectures on the challenging HMDB-51 dataset.  
  Address Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN (up) Medium  
  Area Expedition Conference ECCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ LCE2018b Serial 3206  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: