Home | << 1 2 3 4 5 6 7 8 9 10 >> [11–11] |
Records | |||||
---|---|---|---|---|---|
Author | Suman Ghosh; Ernest Valveny | ||||
Title | R-PHOC: Segmentation-Free Word Spotting using CNN | Type | Conference Article | ||
Year | 2017 | Publication | 14th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Convolutional neural network; Image segmentation; Artificial neural network; Nearest neighbor search | ||||
Abstract | arXiv:1707.01294
This paper proposes a region based convolutional neural network for segmentation-free word spotting. Our network takes as input an image and a set of word candidate bound- ing boxes and embeds all bounding boxes into an embedding space, where word spotting can be casted as a simple nearest neighbour search between the query representation and each of the candidate bounding boxes. We make use of PHOC embedding as it has previously achieved significant success in segmentation- based word spotting. Word candidates are generated using a simple procedure based on grouping connected components using some spatial constraints. Experiments show that R-PHOC which operates on images directly can improve the current state-of- the-art in the standard GW dataset and performs as good as PHOCNET in some cases designed for segmentation based word spotting. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GhV2017a | Serial | 3079 | ||
Permanent link to this record | |||||
Author | Suman Ghosh; Ernest Valveny | ||||
Title | Visual attention models for scene text recognition | Type | Conference Article | ||
Year | 2017 | Publication | 14th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | arXiv:1706.01487
In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on a LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors are derived from an intermediate convolutional layer corresponding to different areas of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | Admin @ si @ GhV2017b | Serial | 3080 | ||
Permanent link to this record | |||||
Author | Konstantia Georgouli; Katerine Diaz; Jesus Martinez del Rincon; Anastasios Koidis | ||||
Title | Building generic, easily-updatable chemometric models with harmonisation and augmentation features: The case of FTIR vegetable oils classification | Type | Conference Article | ||
Year | 2017 | Publication | 3rd Ιnternational Conference Metrology Promoting Standardization and Harmonization in Food and Nutrition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Thessaloniki; Greece; October 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | IMEKOFOODS | ||
Notes | ADAS; 600.118 | Approved | no | ||
Call Number | Admin @ si @ GDM2017 | Serial | 3081 | ||
Permanent link to this record | |||||
Author | Ivet Rafegas | ||||
Title | Color in Visual Recognition: from flat to deep representations and some biological parallelisms | Type | Book Whole | ||
Year | 2017 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Visual recognition is one of the main problems in computer vision that attempts to solve image understanding by deciding what objects are in images. This problem can be computationally solved by using relevant sets of visual features, such as edges, corners, color or more complex object parts. This thesis contributes to how color features have to be represented for recognition tasks.
Image features can be extracted following two different approaches. A first approach is defining handcrafted descriptors of images which is then followed by a learning scheme to classify the content (named flat schemes in Kruger et al. (2013). In this approach, perceptual considerations are habitually used to define efficient color features. Here we propose a new flat color descriptor based on the extension of color channels to boost the representation of spatio-chromatic contrast that surpasses state-of-the-art approaches. However, flat schemes present a lack of generality far away from the capabilities of biological systems. A second approach proposes evolving these flat schemes into a hierarchical process, like in the visual cortex. This includes an automatic process to learn optimal features. These deep schemes, and more specifically Convolutional Neural Networks (CNNs), have shown an impressive performance to solve various vision problems. However, there is a lack of understanding about the internal representation obtained, as a result of automatic learning. In this thesis we propose a new methodology to explore the internal representation of trained CNNs by defining the Neuron Feature as a visualization of the intrinsic features encoded in each individual neuron. Additionally, and inspired by physiological techniques, we propose to compute different neuron selectivity indexes (e.g., color, class, orientation or symmetry, amongst others) to label and classify the full CNN neuron population to understand learned representations. Finally, using the proposed methodology, we show an in-depth study on how color is represented on a specific CNN, trained for object recognition, that competes with primate representational abilities (Cadieu et al (2014)). We found several parallelisms with biological visual systems: (a) a significant number of color selectivity neurons throughout all the layers; (b) an opponent and low frequency representation of color oriented edges and a higher sampling of frequency selectivity in brightness than in color in 1st layer like in V1; (c) a higher sampling of color hue in the second layer aligned to observed hue maps in V2; (d) a strong color and shape entanglement in all layers from basic features in shallower layers (V1 and V2) to object and background shapes in deeper layers (V4 and IT); and (e) a strong correlation between neuron color selectivities and color dataset bias. |
||||
Address | November 2017 | ||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Maria Vanrell | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-945373-7-0 | Medium | ||
Area | Expedition | Conference | |||
Notes | CIC | Approved | no | ||
Call Number | Admin @ si @ Raf2017 | Serial | 3100 | ||
Permanent link to this record | |||||
Author | Pau Rodriguez; Guillem Cucurull; Jordi Gonzalez; Josep M. Gonfaus; Kamal Nasrollahi; Thomas B. Moeslund; Xavier Roca | ||||
Title | Deep Pain: Exploiting Long Short-Term Memory Networks for Facial Expression Classification | Type | Journal Article | ||
Year | 2017 | Publication | IEEE Transactions on cybernetics | Abbreviated Journal | Cyber |
Volume | Issue | Pages | 1-11 | ||
Keywords | |||||
Abstract | Pain is an unpleasant feeling that has been shown to be an important factor for the recovery of patients. Since this is costly in human resources and difficult to do objectively, there is the need for automatic systems to measure it. In this paper, contrary to current state-of-the-art techniques in pain assessment, which are based on facial features only, we suggest that the performance can be enhanced by feeding the raw frames to deep learning models, outperforming the latest state-of-the-art results while also directly facing the problem of imbalanced data. As a baseline, our approach first uses convolutional neural networks (CNNs) to learn facial features from VGG_Faces, which are then linked to a long short-term memory to exploit the temporal relation between video frames. We further compare the performances of using the so popular schema based on the canonically normalized appearance versus taking into account the whole image. As a result, we outperform current state-of-the-art area under the curve performance in the UNBC-McMaster Shoulder Pain Expression Archive Database. In addition, to evaluate the generalization properties of our proposed methodology on facial motion recognition, we also report competitive results in the Cohn Kanade+ facial expression database. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE; 600.119; 600.098 | Approved | no | ||
Call Number | Admin @ si @ RCG2017a | Serial | 2926 | ||
Permanent link to this record | |||||
Author | F. Javier Sanchez; Jorge Bernal; Cristina Sanchez Montes; Cristina Rodriguez de Miguel; Gloria Fernandez Esparrach | ||||
Title | Bright spot regions segmentation and classification for specular highlights detection in colonoscopy videos | Type | Journal Article | ||
Year | 2017 | Publication | Machine Vision and Applications | Abbreviated Journal | MVAP |
Volume | Issue | Pages | 1-20 | ||
Keywords | Specular highlights; bright spot regions segmentation; region classification; colonoscopy | ||||
Abstract | A novel specular highlights detection method in colonoscopy videos is presented. The method is based on a model of appearance dening specular
highlights as bright spots which are highly contrasted with respect to adjacent regions. Our approach proposes two stages; segmentation, and then classication of bright spot regions. The former denes a set of candidate regions obtained through a region growing process with local maxima as initial region seeds. This process creates a tree structure which keeps track, at each growing iteration, of the region frontier contrast; nal regions provided depend on restrictions over contrast value. Non-specular regions are ltered through a classication stage performed by a linear SVM classier using model-based features from each region. We introduce a new validation database with more than 25; 000 regions along with their corresponding pixel-wise annotations. We perform a comparative study against other approaches. Results show that our method is superior to other approaches, with our segmented regions being closer to actual specular regions in the image. Finally, we also present how our methodology can also be used to obtain an accurate prediction of polyp histology. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MV; 600.096; 600.175 | Approved | no | ||
Call Number | Admin @ si @ SBS2017 | Serial | 2975 | ||
Permanent link to this record | |||||
Author | Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun | ||||
Title | CARLA: An Open Urban Driving Simulator | Type | Conference Article | ||
Year | 2017 | Publication | 1st Annual Conference on Robot Learning. Proceedings of Machine Learning | Abbreviated Journal | |
Volume | 78 | Issue | Pages | 1-16 | |
Keywords | Autonomous driving; sensorimotor control; simulation | ||||
Abstract | We introduce CARLA, an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites and environmental conditions. We use CARLA to study the performance of three approaches to autonomous driving: a classic modular pipeline, an endto-end
model trained via imitation learning, and an end-to-end model trained via reinforcement learning. The approaches are evaluated in controlled scenarios of increasing difficulty, and their performance is examined via metrics provided by CARLA, illustrating the platform’s utility for autonomous driving research. |
||||
Address | Mountain View; CA; USA; November 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CORL | ||
Notes | ADAS; 600.085; 600.118 | Approved | no | ||
Call Number | Admin @ si @ DRC2017 | Serial | 2988 | ||
Permanent link to this record | |||||
Author | Sergio Escalera; Vassilis Athitsos; Isabelle Guyon | ||||
Title | Challenges in Multi-modal Gesture Recognition | Type | Book Chapter | ||
Year | 2017 | Publication | Abbreviated Journal | ||
Volume | Issue | Pages | 1-60 | ||
Keywords | Gesture recognition; Time series analysis; Multimodal data analysis; Computer vision; Pattern recognition; Wearable sensors; Infrared cameras; Kinect TMTM | ||||
Abstract | This paper surveys the state of the art on multimodal gesture recognition and introduces the JMLR special topic on gesture recognition 2011–2015. We began right at the start of the Kinect TMTM revolution when inexpensive infrared cameras providing image depth recordings became available. We published papers using this technology and other more conventional methods, including regular video cameras, to record data, thus providing a good overview of uses of machine learning and computer vision using multimodal data in this area of application. Notably, we organized a series of challenges and made available several datasets we recorded for that purpose, including tens of thousands of videos, which are available to conduct further research. We also overview recent state of the art works on gesture recognition based on a proposed taxonomy for gesture recognition, discussing challenges and future lines of research. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ EAG2017 | Serial | 3008 | ||
Permanent link to this record | |||||
Author | Laura Igual; Santiago Segui | ||||
Title | Introduction to Data Science – A Python Approach to Concepts, Techniques and Applications. Undergraduate Topics in Computer Science | Type | Book Whole | ||
Year | 2017 | Publication | Abbreviated Journal | ||
Volume | Issue | Pages | 1-215 | ||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | 978-3-319-50016-4 | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-319-50016-4 | Medium | ||
Area | Expedition | Conference | |||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ IgS2017 | Serial | 3027 | ||
Permanent link to this record | |||||
Author | Sergio Alloza; Flavio Escribano; Sergi Delgado; Ciprian Corneanu; Sergio Escalera | ||||
Title | XBadges. Identifying and training soft skills with commercial video games Improving persistence, risk taking & spatial reasoning with commercial video games and facial and emotional recognition system | Type | Conference Article | ||
Year | 2017 | Publication | 4th Congreso de la Sociedad Española para las Ciencias del Videojuego | Abbreviated Journal | |
Volume | 1957 | Issue | Pages | 13-28 | |
Keywords | Video Games; Soft Skills; Training; Skilling Development; Emotions; Cognitive Abilities; Flappy Bird; Pacman; Tetris | ||||
Abstract | XBadges is a research project based on the hypothesis that commercial video games (nonserious games) can train soft skills. We measure persistence, patial reasoning and risk taking before and after subjects paticipate in controlled game playing sessions.
In addition, we have developed an automatic facial expression recognition system capable of inferring their emotions while playing, allowing us to study the role of emotions in soft skills acquisition. We have used Flappy Bird, Pacman and Tetris for assessing changes in persistence, risk taking and spatial reasoning respectively. Results show how playing Tetris significantly improves spatial reasoning and how playing Pacman significantly improves prudence in certain areas of behavior. As for emotions, they reveal that being concentrated helps to improve performance and skills acquisition. Frustration is also shown as a key element. With the results obtained we are able to glimpse multiple applications in areas which need soft skills development. |
||||
Address | Barcelona; June 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | COSECIVI; CEUR-WS | ||
Notes | HUPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ AED2017 | Serial | 3065 | ||
Permanent link to this record | |||||
Author | Arnau Baro; Pau Riba; Jorge Calvo-Zaragoza; Alicia Fornes | ||||
Title | Optical Music Recognition by Recurrent Neural Networks | Type | Conference Article | ||
Year | 2017 | Publication | 14th IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 25-26 | ||
Keywords | Optical Music Recognition; Recurrent Neural Network; Long Short-Term Memory | ||||
Abstract | Optical Music Recognition is the task of transcribing a music score into a machine readable format. Many music scores are written in a single staff, and therefore, they could be treated as a sequence. Therefore, this work explores the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks for reading the music score sequentially, where the LSTM helps in keeping the context. For training, we have used a synthetic dataset of more than 40000 images, labeled at primitive level | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.097; 601.302; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BRC2017 | Serial | 3056 | ||
Permanent link to this record | |||||
Author | Adria Rico; Alicia Fornes | ||||
Title | Camera-based Optical Music Recognition using a Convolutional Neural Network | Type | Conference Article | ||
Year | 2017 | Publication | 12th IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 27-28 | ||
Keywords | optical music recognition; document analysis; convolutional neural network; deep learning | ||||
Abstract | Optical Music Recognition (OMR) consists in recognizing images of music scores. Contrary to expectation, the current OMR systems usually fail when recognizing images of scores captured by digital cameras and smartphones. In this work, we propose a camera-based OMR system based on Convolutional Neural Networks, showing promising preliminary results | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | GREC | ||
Notes | DAG;600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ RiF2017 | Serial | 3059 | ||
Permanent link to this record | |||||
Author | Quentin Angermann; Jorge Bernal; Cristina Sanchez Montes; Gloria Fernandez Esparrach; Xavier Gray; Olivier Romain; F. Javier Sanchez; Aymeric Histace | ||||
Title | Towards Real-Time Polyp Detection in Colonoscopy Videos: Adapting Still Frame-Based Methodologies for Video Sequences Analysis | Type | Conference Article | ||
Year | 2017 | Publication | 4th International Workshop on Computer Assisted and Robotic Endoscopy | Abbreviated Journal | |
Volume | Issue | Pages | 29-41 | ||
Keywords | Polyp detection; colonoscopy; real time; spatio temporal coherence | ||||
Abstract | Colorectal cancer is the second cause of cancer death in United States: precursor lesions (polyps) detection is key for patient survival. Though colonoscopy is the gold standard screening tool, some polyps are still missed. Several computational systems have been proposed but none of them are used in the clinical room mainly due to computational constraints. Besides, most of them are built over still frame databases, decreasing their performance on video analysis due to the lack of output stability and not coping with associated variability on image quality and polyp appearance. We propose a strategy to adapt these methods to video analysis by adding a spatio-temporal stability module and studying a combination of features to capture polyp appearance variability. We validate our strategy, incorporated on a real-time detection method, on a public video database. Resulting method detects all
polyps under real time constraints, increasing its performance due to our adaptation strategy. |
||||
Address | Quebec; Canada; September 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CARE | ||
Notes | MV; 600.096; 600.075 | Approved | no | ||
Call Number | Admin @ si @ ABS2017b | Serial | 2977 | ||
Permanent link to this record | |||||
Author | Pau Riba; Anjan Dutta; Josep Llados; Alicia Fornes | ||||
Title | Graph-based deep learning for graphics classification | Type | Conference Article | ||
Year | 2017 | Publication | 14th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 29-30 | ||
Keywords | |||||
Abstract | Graph-based representations are a common way to deal with graphics recognition problems. However, previous works were mainly focused on developing learning-free techniques. The success of deep learning frameworks have proved that learning is a powerful tool to solve many problems, however it is not straightforward to extend these methodologies to non euclidean data such as graphs. On the other hand, graphs are a good representational structure for graphical entities. In this work, we present some deep learning techniques that have been proposed in the literature for graph-based representations and
we show how they can be used in graphics recognition problems |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.097; 601.302; 600.121 | Approved | no | ||
Call Number | Admin @ si @ RDL2017b | Serial | 3058 | ||
Permanent link to this record | |||||
Author | Sounak Dey; Anjan Dutta; Josep Llados; Alicia Fornes; Umapada Pal | ||||
Title | Shallow Neural Network Model for Hand-drawn Symbol Recognition in Multi-Writer Scenario | Type | Conference Article | ||
Year | 2017 | Publication | 14th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 31-32 | ||
Keywords | |||||
Abstract | One of the main challenges in hand drawn symbol recognition is the variability among symbols because of the different writer styles. In this paper, we present and discuss some results recognizing hand-drawn symbols with a shallow neural network. A neural network model inspired from the LeNet architecture has been used to achieve state-of-the-art results with
very less training data, which is very unlikely to the data hungry deep neural network. From the results, it has become evident that the neural network architectures can efficiently describe and recognize hand drawn symbols from different writers and can model the inter author aberration |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ DDL2017 | Serial | 3057 | ||
Permanent link to this record |