toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author ChunYang; Xu Cheng Yin; Hong Yu; Dimosthenis Karatzas; Yu Cao edit  doi
isbn  openurl
  Title ICDAR2017 Robust Reading Challenge on Text Extraction from Biomedical Literature Figures (DeTEXT) Type Conference Article
  Year 2017 Publication 14th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue (up) Pages 1444-1447  
  Keywords  
  Abstract Hundreds of millions of figures are available in the biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information and understanding biomedical documents. Unlike images in the open domain, biomedical figures present a variety of unique challenges. For example, biomedical figures typically have complex layouts, small font sizes, short text, specific text, complex symbols and irregular text arrangements. This paper presents the final results of the ICDAR 2017 Competition on Text Extraction from Biomedical Literature Figures (ICDAR2017 DeTEXT Competition), which aims at extracting (detecting and recognizing) text from biomedical literature figures. Similar to text extraction from scene images and web pictures, ICDAR2017 DeTEXT Competition includes three major tasks, i.e., text detection, cropped word recognition and end-to-end text recognition. Here, we describe in detail the data set, tasks, evaluation protocols and participants of this competition, and report the performance of the participating methods.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-5386-3586-5 Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ YCY2017 Serial 3098  
Permanent link to this record
 

 
Author Ivet Rafegas edit  isbn
openurl 
  Title Color in Visual Recognition: from flat to deep representations and some biological parallelisms Type Book Whole
  Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue (up) Pages  
  Keywords  
  Abstract Visual recognition is one of the main problems in computer vision that attempts to solve image understanding by deciding what objects are in images. This problem can be computationally solved by using relevant sets of visual features, such as edges, corners, color or more complex object parts. This thesis contributes to how color features have to be represented for recognition tasks.

Image features can be extracted following two different approaches. A first approach is defining handcrafted descriptors of images which is then followed by a learning scheme to classify the content (named flat schemes in Kruger et al. (2013). In this approach, perceptual considerations are habitually used to define efficient color features. Here we propose a new flat color descriptor based on the extension of color channels to boost the representation of spatio-chromatic contrast that surpasses state-of-the-art approaches. However, flat schemes present a lack of generality far away from the capabilities of biological systems. A second approach proposes evolving these flat schemes into a hierarchical process, like in the visual cortex. This includes an automatic process to learn optimal features. These deep schemes, and more specifically Convolutional Neural Networks (CNNs), have shown an impressive performance to solve various vision problems. However, there is a lack of understanding about the internal representation obtained, as a result of automatic learning. In this thesis we propose a new methodology to explore the internal representation of trained CNNs by defining the Neuron Feature as a visualization of the intrinsic features encoded in each individual neuron. Additionally, and inspired by physiological techniques, we propose to compute different neuron selectivity indexes (e.g., color, class, orientation or symmetry, amongst others) to label and classify the full CNN neuron population to understand learned representations.

Finally, using the proposed methodology, we show an in-depth study on how color is represented on a specific CNN, trained for object recognition, that competes with primate representational abilities (Cadieu et al (2014)). We found several parallelisms with biological visual systems: (a) a significant number of color selectivity neurons throughout all the layers; (b) an opponent and low frequency representation of color oriented edges and a higher sampling of frequency selectivity in brightness than in color in 1st layer like in V1; (c) a higher sampling of color hue in the second layer aligned to observed hue maps in V2; (d) a strong color and shape entanglement in all layers from basic features in shallower layers (V1 and V2) to object and background shapes in deeper layers (V4 and IT); and (e) a strong correlation between neuron color selectivities and color dataset bias.
 
  Address November 2017  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Maria Vanrell  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-945373-7-0 Medium  
  Area Expedition Conference  
  Notes CIC Approved no  
  Call Number Admin @ si @ Raf2017 Serial 3100  
Permanent link to this record
 

 
Author Lluis Gomez; Marçal Rusiñol; Ali Furkan Biten; Dimosthenis Karatzas edit   pdf
openurl 
  Title Subtitulació automàtica d'imatges. Estat de l'art i limitacions en el context arxivístic Type Conference Article
  Year 2018 Publication Jornades Imatge i Recerca Abbreviated Journal  
  Volume Issue (up) Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference JIR  
  Notes DAG; 600.084; 600.135; 601.338; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GRB2018 Serial 3173  
Permanent link to this record
 

 
Author Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue (up) Pages 97-102  
  Keywords Robust Reading; End-to-end Systems; CNN; Utility Meters  
  Abstract In this paper we present a segmentation-free system for reading text in natural scenes. A CNN architecture is trained in an end-to-end manner, and is able to directly output readings without any explicit text localization step. In order to validate our proposal, we focus on the specific case of reading utility meters. We present our results in a large dataset of images acquired by different users and devices, so text appears in any location, with different sizes, fonts and lengths, and the images present several distortions such as
dirt, illumination highlights or blur.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GRK2018 Serial 3102  
Permanent link to this record
 

 
Author Dimosthenis Karatzas; Lluis Gomez; Marçal Rusiñol; Anguelos Nicolaou edit   pdf
url  openurl
  Title The Robust Reading Competition Annotation and Evaluation Platform Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue (up) Pages 61-66  
  Keywords  
  Abstract The ICDAR Robust Reading Competition (RRC), initiated in 2003 and reestablished in 2011, has become the defacto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous
effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the
Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation of data, and to provide online and offline performance evaluation and analysis services.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.121 Approved no  
  Call Number KGR2018 Serial 3103  
Permanent link to this record
 

 
Author David Aldavert; Marçal Rusiñol edit   pdf
doi  openurl
  Title Manuscript text line detection and segmentation using second-order derivatives analysis Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue (up) Pages 293 - 298  
  Keywords text line detection; text line segmentation; text region detection; second-order derivatives  
  Abstract In this paper, we explore the use of second-order derivatives to detect text lines on handwritten document images. Taking advantage that the second derivative gives a minimum response when a dark linear element over a
bright background has the same orientation as the filter, we use this operator to create a map with the local orientation and strength of putative text lines in the document. Then, we detect line segments by selecting and merging the filter responses that have a similar orientation and scale. Finally, text lines are found by merging the segments that are within the same text region. The proposed segmentation algorithm, is learning-free while showing a performance similar to the state of the art methods in publicly available datasets.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.129; 302.065; 600.121 Approved no  
  Call Number Admin @ si @ AlR2018a Serial 3104  
Permanent link to this record
 

 
Author David Aldavert; Marçal Rusiñol edit   pdf
doi  openurl
  Title Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue (up) Pages 223 - 228  
  Keywords Word Spotting; Bag of Visual Words; Synthetic Codebook; Semantic Information  
  Abstract Word-spotting methods based on the Bag-ofVisual-Words framework have demonstrated a good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for
large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in
this paper we present a database agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also allows to easily incorporate semantic
information in the codebook generation. So, the proposed method is able to determine which set of codewords have a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains a state-of-the-art performance while having a more compact representation.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.129; 600.121 Approved no  
  Call Number Admin @ si @ AlR2018b Serial 3105  
Permanent link to this record
 

 
Author V. Poulain d'Andecy; Emmanuel Hartmann; Marçal Rusiñol edit   pdf
doi  openurl
  Title Field Extraction by hybrid incremental and a-priori structural templates Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue (up) Pages 251 - 256  
  Keywords Layout Analysis; information extraction; incremental learning  
  Abstract In this paper, we present an incremental framework for extracting information fields from administrative documents. First, we demonstrate some limits of the existing state-of-the-art methods such as the delay of the system efficiency. This is a concern in industrial context when we have only few samples of each document class. Based on this analysis, we propose a hybrid system combining incremental learning by means of itf-df statistics and a-priori generic
models. We report in the experimental section our results obtained with a dataset of real invoices.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.129; 600.121 Approved no  
  Call Number Admin @ si @ PHR2018 Serial 3106  
Permanent link to this record
 

 
Author Felipe Codevilla; Matthias Muller; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy edit   pdf
doi  openurl
  Title End-to-end Driving via Conditional Imitation Learning Type Conference Article
  Year 2018 Publication IEEE International Conference on Robotics and Automation Abbreviated Journal  
  Volume Issue (up) Pages 4693 - 4700  
  Keywords  
  Abstract Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at this https URL  
  Address Brisbane; Australia; May 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICRA  
  Notes ADAS; 600.116; 600.124; 600.118 Approved no  
  Call Number Admin @ si @ CML2018 Serial 3108  
Permanent link to this record
 

 
Author Marc Bolaños; Alvaro Peris; Francisco Casacuberta; Sergi Solera; Petia Radeva edit   pdf
url  openurl
  Title Egocentric video description based on temporally-linked sequences Type Journal Article
  Year 2018 Publication Journal of Visual Communication and Image Representation Abbreviated Journal JVCIR  
  Volume 50 Issue (up) Pages 205-216  
  Keywords egocentric vision; video description; deep learning; multi-modal learning  
  Abstract Egocentric vision consists in acquiring images along the day from a first person point-of-view using wearable cameras. The automatic analysis of this information allows to discover daily patterns for improving the quality of life of the user. A natural topic that arises in egocentric vision is storytelling, that is, how to understand and tell the story relying behind the pictures.
In this paper, we tackle storytelling as an egocentric sequences description problem. We propose a novel methodology that exploits information from temporally neighboring events, matching precisely the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting on a multi-input attention recurrent network. We also release the EDUB-SegDesc dataset. This is the first dataset for egocentric image sequences description, consisting of 1,339 events with 3,991 descriptions, from 55 days acquired by 11 people. Finally, we prove that our proposal outperforms classical attentional encoder-decoder methods for video description.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ BPC2018 Serial 3109  
Permanent link to this record
 

 
Author Stefan Schurischuster; Beatriz Remeseiro; Petia Radeva; Martin Kampel edit  url
openurl 
  Title A Preliminary Study of Image Analysis for Parasite Detection on Honey Bees Type Conference Article
  Year 2018 Publication 15th International Conference on Image Analysis and Recognition Abbreviated Journal  
  Volume 10882 Issue (up) Pages 465-473  
  Keywords  
  Abstract Varroa destructor is a parasite harming bee colonies. As the worldwide bee population is in danger, beekeepers as well as researchers are looking for methods to monitor the health of bee hives. In this context, we present a preliminary study to detect parasites on bee videos by means of image analysis and machine learning techniques. For this purpose, each video frame is analyzed individually to extract bee image patches, which are then processed to compute image descriptors and finally classified into mite and no mite bees. The experimental results demonstrated the adequacy of the proposed method, which will be a perfect stepping stone for a further bee monitoring system.  
  Address Povoa de Varzim; Portugal; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICIAR  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ SRR2018a Serial 3110  
Permanent link to this record
 

 
Author Stefan Lonn; Petia Radeva; Mariella Dimiccoli edit  openurl
  Title A picture is worth a thousand words but how to organize thousands of pictures? Type Miscellaneous
  Year 2018 Publication Arxiv Abbreviated Journal  
  Volume Issue (up) Pages  
  Keywords  
  Abstract We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to a tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which typically are pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and because smartphones are ubiquitous, they present a larger variability compared to pictures captured by a digital camera. To solve the need of organizing large smartphone photo collections automatically, we propose here a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach successfully estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neuronal Networks. To validate our approach, we ensemble and make public a large dataset of more than 8,000 smartphone pictures from 10 persons. Experimental results demonstrate better user satisfaction with respect to state of the art solutions in terms of organization.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ LRD2018 Serial 3111  
Permanent link to this record
 

 
Author Md.Mostafa Kamal Sarker; , Hatem A. Rashwan; Farhan Akram; Syeda Furruka Banu; Adel Saleh; Vivek Kumar Singh; Forhad U. H. Chowdhury; Saddam Abdulwahab; Santiago Romani; Petia Radeva; Domenec Puig edit   pdf
url  openurl
  Title SLSDeep: Skin Lesion Segmentation Based on Dilated Residual and Pyramid Pooling Networks. Type Conference Article
  Year 2018 Publication 21st International Conference on Medical Image Computing & Computer Assisted Intervention Abbreviated Journal  
  Volume 2 Issue (up) Pages 21-29  
  Keywords  
  Abstract Skin lesion segmentation (SLS) in dermoscopic images is a crucial task for automated diagnosis of melanoma. In this paper, we present a robust deep learning SLS model, so-called SLSDeep, which is represented as an encoder-decoder network. The encoder network is constructed by dilated residual layers, in turn, a pyramid pooling network followed by three convolution layers is used for the decoder. Unlike the traditional methods employing a cross-entropy loss, we investigated a loss function by combining both Negative Log Likelihood (NLL) and End Point Error (EPE) to accurately segment the melanoma regions with sharp boundaries. The robustness of the proposed model was evaluated on two public databases: ISBI 2016 and 2017 for skin lesion analysis towards melanoma detection challenge. The proposed model outperforms the state-of-the-art methods in terms of segmentation accuracy. Moreover, it is capable to segment more than 100 images of size 384x384 per second on a recent GPU.  
  Address Granada; Espanya; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference MICCAI  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ SRA2018 Serial 3112  
Permanent link to this record
 

 
Author Md.Mostafa Kamal Sarker; Mohammed Jabreel; , Hatem A. Rashwan; Syeda Furruka Banu; Petia Radeva; Domenec Puig edit   pdf
doi  openurl
  Title CuisineNet: Food Attributes Classification using Multi-scale Convolution Network Type Conference Article
  Year 2018 Publication 21st International Conference of the Catalan Association for Artificial Intelligence Abbreviated Journal  
  Volume Issue (up) Pages 365-372  
  Keywords  
  Abstract Diversity of food and its attributes represents the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying food culture of people around the world and its flavor by classifying two main food attributes, cuisine and flavor. A deep learning model based on multi-scale convotuional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel size is also used for weighting the features results from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to multi labeled classes for multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, so-called Yummly48K, extracted from the popular food website, Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on validation and test set which outperforming the state-of-the-art models.  
  Address Roses; catalonia; October 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CCIA  
  Notes MILAB; no menciona Approved no  
  Call Number Admin @ si @ SJR2018 Serial 3113  
Permanent link to this record
 

 
Author Ivet Rafegas; Maria Vanrell edit   pdf
url  doi
openurl 
  Title Color encoding in biologically-inspired convolutional neural networks Type Journal Article
  Year 2018 Publication Vision Research Abbreviated Journal VR  
  Volume 151 Issue (up) Pages 7-17  
  Keywords Color coding; Computer vision; Deep learning; Convolutional neural networks  
  Abstract Convolutional Neural Networks have been proposed as suitable frameworks to model biological vision. Some of these artificial networks showed representational properties that rival primate performances in object recognition. In this paper we explore how color is encoded in a trained artificial network. It is performed by estimating a color selectivity index for each neuron, which allows us to describe the neuron activity to a color input stimuli. The index allows us to classify whether they are color selective or not and if they are of a single or double color. We have determined that all five convolutional layers of the network have a large number of color selective neurons. Color opponency clearly emerges in the first layer, presenting 4 main axes (Black-White, Red-Cyan, Blue-Yellow and Magenta-Green), but this is reduced and rotated as we go deeper into the network. In layer 2 we find a denser hue sampling of color neurons and opponency is reduced almost to one new main axis, the Bluish-Orangish coinciding with the dataset bias. In layers 3, 4 and 5 color neurons are similar amongst themselves, presenting different type of neurons that detect specific colored objects (e.g., orangish faces), specific surrounds (e.g., blue sky) or specific colored or contrasted object-surround configurations (e.g. blue blob in a green surround). Overall, our work concludes that color and shape representation are successively entangled through all the layers of the studied network, revealing certain parallelisms with the reported evidences in primate brains that can provide useful insight into intermediate hierarchical spatio-chromatic representations.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes CIC; 600.051; 600.087 Approved no  
  Call Number Admin @ si @RaV2018 Serial 3114  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: