|   | 
Details
   web
Records
Author Pichao Wang; Wanqing Li; Philip Ogunbona; Jun Wan; Sergio Escalera
Title (up) RGB-D-based Human Motion Recognition with Deep Learning: A Survey Type Journal Article
Year 2018 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU
Volume 171 Issue Pages 118-139
Keywords Human motion recognition; RGB-D data; Deep learning; Survey
Abstract Human motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with the development in artificial intelligence, deep learning techniques have gained remarkable success in computer vision. In particular, convolutional neural networks (CNN) have achieved great success for image-based tasks, and recurrent neural networks (RNN) are renowned for sequence-based problems. Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data. In this paper, a detailed overview of recent advances in RGB-D-based motion recognition is presented. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based and RGB+D-based. As a survey focused on the application of deep learning to RGB-D-based motion recognition, we explicitly discuss the advantages and limitations of existing techniques. Particularly, we highlighted the methods of encoding spatial-temporal-structural information inherent in video sequence, and discuss potential directions for future research.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no proj Approved no
Call Number Admin @ si @ WLO2018 Serial 3123
Permanent link to this record
 

 
Author Xialei Liu; Marc Masana; Luis Herranz; Joost Van de Weijer; Antonio Lopez; Andrew Bagdanov
Title (up) Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting Type Conference Article
Year 2018 Publication 24th International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages 2262-2268
Keywords
Abstract In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of
a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and
Stanford-40 datasets demonstrate that we significantly improve the results of standard elastic weight consolidation, and that we obtain competitive results when compared to the state-of-the-art in lifelong learning without forgetting.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes LAMP; ADAS; 601.305; 601.109; 600.124; 600.106; 602.200; 600.120; 600.118 Approved no
Call Number Admin @ si @ LMH2018 Serial 3160
Permanent link to this record
 

 
Author Sangheeta Roy; Palaiahnakote Shivakumara; Namita Jain; Vijeta Khare; Anjan Dutta; Umapada Pal; Tong Lu
Title (up) Rough-Fuzzy based Scene Categorization for Text Detection and Recognition in Video Type Journal Article
Year 2018 Publication Pattern Recognition Abbreviated Journal PR
Volume 80 Issue Pages 64-82
Keywords Rough set; Fuzzy set; Video categorization; Scene image classification; Video text detection; Video text recognition
Abstract Scene image or video understanding is a challenging task especially when number of video types increases drastically with high variations in background and foreground. This paper proposes a new method for categorizing scene videos into different classes, namely, Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, for the performance improvement of text detection and recognition, which is an effective approach for scene image or video understanding. For this purpose, at first, we present a new combination of rough and fuzzy concept to study irregular shapes of edge components in input scene videos, which helps to classify edge components into several groups. Next, the proposed method explores gradient direction information of each pixel in each edge component group to extract stroke based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. Features of intra and inter planes of groups are then concatenated to get a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms the existing state-of-the-art methods, at the same time, the performances of text detection and recognition methods are also improved significantly due to categorization.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.097; 600.121 Approved no
Call Number Admin @ si @ RSJ2018 Serial 3096
Permanent link to this record
 

 
Author Fahad Shahbaz Khan; Joost Van de Weijer; Muhammad Anwer Rao; Andrew Bagdanov; Michael Felsberg; Jorma
Title (up) Scale coding bag of deep features for human attribute and action recognition Type Journal Article
Year 2018 Publication Machine Vision and Applications Abbreviated Journal MVAP
Volume 29 Issue 1 Pages 55-71
Keywords Action recognition; Attribute recognition; Bag of deep features
Abstract Most approaches to human attribute and action recognition in still images are based on image representation in which multi-scale local features are pooled across scale into a single, scale-invariant encoding. Both in bag-of-words and the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a bag of deep features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state of the art.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.068; 600.079; 600.106; 600.120 Approved no
Call Number Admin @ si @ KWR2018 Serial 3107
Permanent link to this record
 

 
Author Oscar Argudo; Marc Comino; Antonio Chica; Carlos Andujar; Felipe Lumbreras
Title (up) Segmentation of aerial images for plausible detail synthesis Type Journal Article
Year 2018 Publication Computers & Graphics Abbreviated Journal CG
Volume 71 Issue Pages 23-34
Keywords Terrain editing; Detail synthesis; Vegetation synthesis; Terrain rendering; Image segmentation
Abstract The visual enrichment of digital terrain models with plausible synthetic detail requires the segmentation of aerial images into a suitable collection of categories. In this paper we present a complete pipeline for segmenting high-resolution aerial images into a user-defined set of categories distinguishing e.g. terrain, sand, snow, water, and different types of vegetation. This segmentation-for-synthesis problem implies that per-pixel categories must be established according to the algorithms chosen for rendering the synthetic detail. This precludes the definition of a universal set of labels and hinders the construction of large training sets. Since artists might choose to add new categories on the fly, the whole pipeline must be robust against unbalanced datasets, and fast on both training and inference. Under these constraints, we analyze the contribution of common per-pixel descriptors, and compare the performance of state-of-the-art supervised learning algorithms. We report the findings of two user studies. The first one was conducted to analyze human accuracy when manually labeling aerial images. The second user study compares detailed terrains built using different segmentation strategies, including official land cover maps. These studies demonstrate that our approach can be used to turn digital elevation models into fully-featured, detailed terrains with minimal authoring efforts.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0097-8493 ISBN Medium
Area Expedition Conference
Notes MSIAU; 600.086; 600.118 Approved no
Call Number Admin @ si @ ACC2018 Serial 3147
Permanent link to this record
 

 
Author Sounak Dey; Anjan Dutta; Juan Ignacio Toledo; Suman Ghosh; Josep Llados; Umapada Pal
Title (up) SigNet: Convolutional Siamese Network for Writer Independent Offline Signature Verification Type Miscellaneous
Year 2018 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Offline signature verification is one of the most challenging tasks in biometrics and document forensics. Unlike other verification problems, it needs to model minute but critical details between genuine and forged signatures, because a skilled falsification might often resembles the real signature with small deformation. This verification task is even harder in writer independent scenarios which is undeniably fiscal for realistic cases. In this paper, we model an offline writer independent signature verification task with a convolutional Siamese network. Siamese networks are twin networks with shared weights, which can be trained to learn a feature space where similar observations are placed in proximity. This is achieved by exposing the network to a pair of similar and dissimilar observations and minimizing the Euclidean distance between similar pairs while simultaneously maximizing it between dissimilar pairs. Experiments conducted on cross-domain datasets emphasize the capability of our network to model forgery in different languages (scripts) and handwriting styles. Moreover, our designed Siamese network, named SigNet, exceeds the state-of-the-art results on most of the benchmark signature datasets, which paves the way for further research in this direction.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.097; 600.121 Approved no
Call Number Admin @ si @ DDT2018 Serial 3085
Permanent link to this record
 

 
Author Lluis Gomez; Andres Mafla; Marçal Rusiñol; Dimosthenis Karatzas
Title (up) Single Shot Scene Text Retrieval Type Conference Article
Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal
Volume 11218 Issue Pages 728-744
Keywords Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposals Networks; PHOC
Abstract Textual information found in scene images provides high level semantic information about the image and its context and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single shot CNN architecture that predicts at the same time bounding boxes and a compact text representation of the words in them. In this way, the text based image retrieval task can be casted as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image
database. Our experiments demonstrate that the proposed architecture
outperforms previous state-of-the-art while it offers a significant increase
in processing speed.
Address Munich; September 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCV
Notes DAG; 600.084; 601.338; 600.121; 600.129 Approved no
Call Number Admin @ si @ GMR2018 Serial 3143
Permanent link to this record
 

 
Author Md. Mostafa Kamal Sarker; Hatem A. Rashwan; Farhan Akram; Syeda Furruka Banu; Adel Saleh; Vivek Kumar Singh; Forhad U. H. Chowdhury; Saddam Abdulwahab; Santiago Romani; Petia Radeva; Domenec Puig
Title (up) SLSDeep: Skin Lesion Segmentation Based on Dilated Residual and Pyramid Pooling Networks. Type Conference Article
Year 2018 Publication 21st International Conference on Medical Image Computing & Computer Assisted Intervention Abbreviated Journal
Volume 2 Issue Pages 21-29
Keywords
Abstract Skin lesion segmentation (SLS) in dermoscopic images is a crucial task for automated diagnosis of melanoma. In this paper, we present a robust deep learning SLS model, so-called SLSDeep, which is represented as an encoder-decoder network. The encoder network is constructed by dilated residual layers, in turn, a pyramid pooling network followed by three convolution layers is used for the decoder. Unlike the traditional methods employing a cross-entropy loss, we investigated a loss function by combining both Negative Log Likelihood (NLL) and End Point Error (EPE) to accurately segment the melanoma regions with sharp boundaries. The robustness of the proposed model was evaluated on two public databases: ISBI 2016 and 2017 for skin lesion analysis towards melanoma detection challenge. The proposed model outperforms the state-of-the-art methods in terms of segmentation accuracy. Moreover, it is capable to segment more than 100 images of size 384x384 per second on a recent GPU.
Address Granada; Espanya; September 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MICCAI
Notes MILAB; no proj Approved no
Call Number Admin @ si @ SRA2018 Serial 3112
Permanent link to this record
 

 
Author Dena Bazazian; Dimosthenis Karatzas; Andrew Bagdanov
Title (up) Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images Type Conference Article
Year 2018 Publication International Workshop on Egocentric Perception, Interaction and Computing at ECCV Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Word spotting in natural scene images has many applications in scene understanding and visual assistance. We propose Soft-PHOC, an intermediate representation of images based on character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We show how to use our descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm. This is based on the Hough Transform over character attribute maps followed by scoring using Dynamic Time Warping (DTW). We evaluate our results on ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.
Address Munich; Alemanya; September 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECCVW
Notes DAG; 600.129; 600.121; Approved no
Call Number Admin @ si @ BKB2018b Serial 3174
Permanent link to this record
 

 
Author Anjan Dutta; Hichem Sahbi
Title (up) Stochastic Graphlet Embedding Type Journal Article
Year 2018 Publication IEEE Transactions on Neural Networks and Learning Systems Abbreviated Journal TNNLS
Volume Issue Pages 1-14
Keywords Stochastic graphlets; Graph embedding; Graph classification; Graph hashing; Betweenness centrality
Abstract Graph-based methods are known to be successful in many machine learning and pattern classification tasks. These methods consider semi-structured data as graphs where nodes correspond to primitives (parts, interest points, segments,
etc.) and edges characterize the relationships between these primitives. However, these non-vectorial graph data cannot be straightforwardly plugged into off-the-shelf machine learning algorithms without a preliminary step of – explicit/implicit –graph vectorization and embedding. This embedding process
should be resilient to intra-class graph variations while being highly discriminant. In this paper, we propose a novel high-order stochastic graphlet embedding (SGE) that maps graphs into vector spaces. Our main contribution includes a new stochastic search procedure that efficiently parses a given graph and extracts/samples unlimitedly high-order graphlets. We consider
these graphlets, with increasing orders, to model local primitives as well as their increasingly complex interactions. In order to build our graph representation, we measure the distribution of these graphlets into a given graph, using particular hash functions that efficiently assign sampled graphlets into isomorphic sets with a very low probability of collision. When
combined with maximum margin classifiers, these graphlet-based representations have positive impact on the performance of pattern comparison and recognition as corroborated through extensive experiments using standard benchmark databases.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 602.167; 602.168; 600.097; 600.121 Approved no
Call Number Admin @ si @ DuS2018 Serial 3225
Permanent link to this record
 

 
Author Domicele Jonauskaite; Nele Dael; C. Alejandro Parraga; Laetitia Chevre; Alejandro Garcia Sanchez; Christine Mohr
Title (up) Stripping #The Dress: The importance of contextual information on inter-individual differences in colour perception Type Journal Article
Year 2018 Publication Psychological Research Abbreviated Journal PSYCHO R
Volume Issue Pages 1-15
Keywords
Abstract In 2015, a picture of a Dress (henceforth the Dress) triggered popular and scientific interest; some reported seeing the Dress in white and gold (W&G) and others in blue and black (B&B). We aimed to describe the phenomenon and investigate the role of contextualization. Few days after the Dress had appeared on the Internet, we projected it to 240 students on two large screens in the classroom. Participants reported seeing the Dress in B&B (48%), W&G (38%), or blue and brown (B&Br; 7%). Amongst numerous socio-demographic variables, we only observed that W&G viewers were most likely to have always seen the Dress as W&G. In the laboratory, we tested how much contextual information is necessary for the phenomenon to occur. Fifty-seven participants selected colours most precisely matching predominant colours of parts or the full Dress. We presented, in this order, small squares (a), vertical strips (b), and the full Dress (c). We found that (1) B&B, B&Br, and W&G viewers had selected colours differing in lightness and chroma levels for contextualized images only (b, c conditions) and hue for fully contextualized condition only (c) and (2) B&B viewers selected colours most closely matching displayed colours of the Dress. Thus, the Dress phenomenon emerges due to inter-individual differences in subjectively perceived lightness, chroma, and hue, at least when all aspects of the picture need to be integrated. Our results support the previous conclusions that contextual information is key to colour perception; it should be important to understand how this actually happens.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes NEUROBIT; no proj Approved no
Call Number Admin @ si @ JDP2018 Serial 3149
Permanent link to this record
 

 
Author Thanh Nam Le; Muhammad Muzzamil Luqman; Anjan Dutta; Pierre Heroux; Christophe Rigaud; Clement Guerin; Pasquale Foggia; Jean Christophe Burie; Jean Marc Ogier; Josep Llados; Sebastien Adam
Title (up) Subgraph spotting in graph representations of comic book images Type Journal Article
Year 2018 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 112 Issue Pages 118-124
Keywords Attributed graph; Region adjacency graph; Graph matching; Graph isomorphism; Subgraph isomorphism; Subgraph spotting; Graph indexing; Graph retrieval; Query by example; Dataset and comic book images
Abstract Graph-based representations are the most powerful data structures for extracting, representing and preserving the structural information of underlying data. Subgraph spotting is an interesting research problem, especially for studying and investigating the structural information based content-based image retrieval (CBIR) and query by example (QBE) in image databases. In this paper we address the problem of lack of freely available ground-truthed datasets for subgraph spotting and present a new dataset for subgraph spotting in graph representations of comic book images (SSGCI) with its ground-truth and evaluation protocol. Experimental results of two state-of-the-art methods of subgraph spotting are presented on the new SSGCI dataset.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.097; 600.121 Approved no
Call Number Admin @ si @ LLD2018 Serial 3150
Permanent link to this record
 

 
Author Lluis Gomez; Marçal Rusiñol; Ali Furkan Biten; Dimosthenis Karatzas
Title (up) Subtitulació automàtica d'imatges. Estat de l'art i limitacions en el context arxivístic Type Conference Article
Year 2018 Publication Jornades Imatge i Recerca Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference JIR
Notes DAG; 600.084; 600.135; 601.338; 600.121; 600.129 Approved no
Call Number Admin @ si @ GRB2018 Serial 3173
Permanent link to this record
 

 
Author David Aldavert; Marçal Rusiñol
Title (up) Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting Type Conference Article
Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 223 - 228
Keywords Word Spotting; Bag of Visual Words; Synthetic Codebook; Semantic Information
Abstract Word-spotting methods based on the Bag-ofVisual-Words framework have demonstrated a good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for
large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in
this paper we present a database agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also allows to easily incorporate semantic
information in the codebook generation. So, the proposed method is able to determine which set of codewords have a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains a state-of-the-art performance while having a more compact representation.
Address Viena; Austria; April 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 600.084; 600.129; 600.121 Approved no
Call Number Admin @ si @ AlR2018b Serial 3105
Permanent link to this record
 

 
Author Cesar de Souza; Adrien Gaidon; Eleonora Vig; Antonio Lopez
Title (up) System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture Type Patent
Year 2018 Publication US9946933B2 Abbreviated Journal
Volume Issue Pages
Keywords US9946933B2
Abstract A computer-implemented video classification method and system are disclosed. The method includes receiving an input video including a sequence of frames. At least one transformation of the input video is generated, each transformation including a sequence of frames. For the input video and each transformation, local descriptors are extracted from the respective sequence of frames. The local descriptors of the input video and each transformation are aggregated to form an aggregated feature vector with a first set of processing layers learned using unsupervised learning. An output classification value is generated for the input video, based on the aggregated feature vector with a second set of processing layers learned using supervised learning.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ SGV2018 Serial 3255
Permanent link to this record