Records | |||||
---|---|---|---|---|---|
Author | Lei Kang; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol | ||||
Title | Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture | Type | Journal Article | ||
Year | 2021 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 112 | Issue | Pages | 107790 | |
Keywords | |||||
Abstract | Sequence-to-sequence models have recently become very popular for tackling handwritten word recognition problems. However, how to effectively integrate an external language model into such a recognizer is still a challenging problem. The main challenge faced when training a language model is to deal with the language model corpus which is usually different to the one used for training the handwritten word recognition system. Thus, the bias between both word corpora leads to incorrectness on the transcriptions, providing similar or even worse performances on the recognition task. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model to a sequence-to-sequence architecture. Moreover, it provides suggestions from an external language knowledge, as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the sequence-to-sequence recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose the importance of the information provided by the language model. On the other hand, the external language model has the ability to adapt itself to the training corpus and even learn the most common errors produced from the recognizer. Finally, by conducting comprehensive experiments, the Candidate Fusion proves to outperform the state-of-the-art language models for handwritten word recognition tasks. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.140; 601.302; 601.312; 600.121 | Approved | no | ||
Call Number | Admin @ si @ KRV2021 | Serial | 3343 | ||
Permanent link to this record | |||||
Author | Ruben Tito; Dimosthenis Karatzas; Ernest Valveny | ||||
Title | Hierarchical multimodal transformers for Multi-Page DocVQA | Type | Journal Article | ||
Year | 2023 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 144 | Issue | Pages | 109834 | |
Keywords | |||||
Abstract | Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.155; 600.121 | Approved | no | ||
Call Number | Admin @ si @ TKV2023 | Serial | 3825 | ||
Permanent link to this record | |||||
Author | Pau Riba; Lutz Goldmann; Oriol Ramos Terrades; Diede Rusticus; Alicia Fornes; Josep Llados | ||||
Title | Table detection in business document images by message passing networks | Type | Journal Article | ||
Year | 2022 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 127 | Issue | Pages | 108641 | |
Keywords | |||||
Abstract | Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, there is information about the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition has gained interest in document image analysis, in particular, with the absence of rule lines and unknown information about rows and columns. However, business documents usually contain sensitive contents limiting the amount of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images which do not require the raw content of the document. Hence, the sensitive content can be previously removed and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain are business documents. We have carefully validated our approach in two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches. | ||||
Address | July 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.162; 600.121 | Approved | no | ||
Call Number | Admin @ si @ RGR2022 | Serial | 3729 | ||
Permanent link to this record | |||||
Author | Anjan Dutta; Josep Llados; Horst Bunke; Umapada Pal | ||||
Title | Product graph-based higher order contextual similarities for inexact subgraph matching | Type | Journal Article | ||
Year | 2018 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 76 | Issue | Pages | 596-611 | |
Keywords | |||||
Abstract | Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched. Pairwise measurements usually consider local attributes but disregard contextual information involved in graph structures. We address this issue by proposing contextual similarities between pairs of nodes. This is done by considering the tensor product graph (TPG) of two graphs to be matched, where each node is an ordered pair of nodes of the operand graphs. Contextual similarities between a pair of nodes are computed by accumulating weighted walks (normalized pairwise similarities) terminating at the corresponding paired node in TPG. Once the contextual similarities are obtained, we formulate subgraph matching as a node and edge selection problem in TPG. We use contextual similarities to construct an objective function and optimize it with a linear programming approach. Since random walk formulation through TPG takes into account higher order information, it is not a surprise that we obtain more reliable similarities and better discrimination among the nodes and edges. Experimental results shown on synthetic as well as real benchmarks illustrate that higher order contextual similarities increase discriminating power and allow one to find approximate solutions to the subgraph matching problem. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 602.167; 600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ DLB2018 | Serial | 3083 | ||
Permanent link to this record | |||||
Author | Marçal Rusiñol; David Aldavert; Ricardo Toledo; Josep Llados | ||||
Title | Efficient segmentation-free keyword spotting in historical document collections | Type | Journal Article | ||
Year | 2015 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 48 | Issue | 2 | Pages | 545–555 |
Keywords | Historical documents; Keyword spotting; Segmentation-free; Dense SIFT features; Latent semantic analysis; Product quantization | ||||
Abstract | In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-by-example paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the latent semantic analysis technique and compressing the descriptors with the product quantization method, we are able to efficiently index the document information both in terms of memory and time. The proposed method is evaluated using four different collections of historical documents achieving good performances on both handwritten and typewritten scenarios. The yielded performances outperform the recent state-of-the-art keyword spotting approaches. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; ADAS; 600.076; 600.077; 600.061; 601.223; 602.006; 600.055 | Approved | no | ||
Call Number | Admin @ si @ RAT2015a | Serial | 2544 | ||
Permanent link to this record | |||||
Author | Jun Wan; Sergio Escalera; Francisco Perales; Josef Kittler | ||||
Title | Articulated Motion and Deformable Objects | Type | Journal Article | ||
Year | 2018 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 79 | Issue | Pages | 55-64 | |
Keywords | |||||
Abstract | This guest editorial introduces the twenty two papers accepted for this Special Issue on Articulated Motion and Deformable Objects (AMDO). They are grouped into four main categories within the field of AMDO: human motion analysis (action/gesture), human pose estimation, deformable shape segmentation, and face analysis. For each of the four topics, a survey of the recent developments in the field is presented. The accepted papers are briefly introduced in the context of this survey. They contribute novel methods, algorithms with improved performance as measured on benchmarking datasets, as well as two new datasets for hand action detection and human posture analysis. The special issue should be of high relevance to the reader interested in AMDO recognition and promote future research directions in the field. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ WEP2018 | Serial | 3126 | ||
Permanent link to this record | |||||
Author | Meysam Madadi; Hugo Bertiche; Sergio Escalera | ||||
Title | SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 106 | Issue | Pages | 107472 | |
Keywords | Deep learning; 3D Human pose; Body shape; SMPL; Denoising autoencoder; Volumetric stack hourglass | ||||
Abstract | In this paper we propose to embed SMPL within a deep-based model to accurately estimate 3D pose and shape from a still RGB image. We use CNN-based 3D joint predictions as an intermediate representation to regress SMPL pose and shape parameters. Later, 3D joints are reconstructed again in the SMPL output. This module can be seen as an autoencoder where the encoder is a deep neural network and the decoder is SMPL model. We refer to this as SMPL reverse (SMPLR). By implementing SMPLR as an encoder-decoder we avoid the need of complex constraints on pose and shape. Furthermore, given that in-the-wild datasets usually lack accurate 3D annotations, it is desirable to lift 2D joints to 3D without pairing 3D annotations with RGB images. Therefore, we also propose a denoising autoencoder (DAE) module between CNN and SMPLR, able to lift 2D joints to 3D and partially recover from structured error. We evaluate our method on SURREAL and Human3.6M datasets, showing improvement over SMPL-based state-of-the-art alternatives by about 4 and 12 mm, respectively. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ MBE2020 | Serial | 3439 | ||
Permanent link to this record | |||||
Author | Miguel Angel Bautista; Sergio Escalera; Oriol Pujol | ||||
Title | On the Design of an ECOC-Compliant Genetic Algorithm | Type | Journal Article | ||
Year | 2014 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 47 | Issue | 2 | Pages | 865-884 |
Keywords | |||||
Abstract | Genetic Algorithms (GA) have been previously applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly take into account the properties of the ECOC matrix. As a result the considered search space is unnecessarily large. In this paper, a novel Genetic strategy to optimize the ECOC coding step is presented. This novel strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and lets the algorithm converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The novel methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, the analysis of the results done in terms of performance, code length and number of Support Vectors shows that the optimization process is able to find very efficient codes, in terms of the trade-off between classification performance and the number of classifiers. Finally, classification performance per dichotomizer results shows that the novel proposal is able to obtain similar or even better results while defining a more compact number of dichotomies and SVs compared to state-of-the-art approaches. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; MILAB | Approved | no | ||
Call Number | Admin @ si @ BEP2013 | Serial | 2254 | ||
Permanent link to this record | |||||
Author | Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera | ||||
Title | A Genetic-based Subspace Analysis Method for Improving Error-Correcting Output Coding | Type | Journal Article | ||
Year | 2013 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 46 | Issue | 10 | Pages | 2830-2839 |
Keywords | Error Correcting Output Codes; Evolutionary computation; Multiclass classification; Feature subspace; Ensemble classification | ||||
Abstract | Two key factors affecting the performance of Error Correcting Output Codes (ECOC) in multiclass classification problems are the independence of binary classifiers and the problem-dependent coding design. In this paper, we propose an evolutionary algorithm-based approach to the design of an application-dependent codematrix in the ECOC framework. The central idea of this work is to design a three-dimensional codematrix, where the third dimension is the feature space of the problem domain. In order to do that, we consider the feature space in the design process of the codematrix with the aim of improving the independence and accuracy of binary classifiers. The proposed method takes advantage of some basic concepts of ensemble classification, such as diversity of classifiers, and also benefits from the evolutionary approach for optimizing the three-dimensional codematrix, taking into account the problem domain. We provide a set of experimental results using a set of benchmark datasets from the UCI Machine Learning Repository, as well as two real multiclass Computer Vision problems. Both sets of experiments are conducted using two different base learners: Neural Networks and Decision Trees. The results show that the proposed method increases the classification accuracy in comparison with the state-of-the-art ECOC coding techniques. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | HuPBA; MILAB | Approved | no | ||
Call Number | Admin @ si @ BGE2013a | Serial | 2247 | ||
Permanent link to this record | |||||
Author | Debora Gil; Aura Hernandez-Sabate; Mireia Brunat; Steven Jansen; Jordi Martinez-Vilalta | ||||
Title | Structure-preserving smoothing of biomedical images | Type | Journal Article | ||
Year | 2011 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 44 | Issue | 9 | Pages | 1842-1851 |
Keywords | Non-linear smoothing; Differential geometry; Anatomical structures; segmentation; Cardiac magnetic resonance; Computerized tomography | ||||
Abstract | Smoothing of biomedical images should preserve gray-level transitions between adjacent tissues, while restoring contours consistent with anatomical structures. Anisotropic diffusion operators are based on image appearance discontinuities (either local or contextual) and might fail at weak inter-tissue transitions. Meanwhile, the output of block-wise and morphological operations is prone to present a block structure due to the shape and size of the considered pixel neighborhood. In this contribution, we use differential geometry concepts to define a diffusion operator that restricts to image consistent level-sets. In this manner, the final state is a non-uniform intensity image presenting homogeneous inter-tissue transitions along anatomical structures, while smoothing intra-structure texture. Experiments on different types of medical images (magnetic resonance, computerized tomography) illustrate its benefit on a further process (such as segmentation) of images. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | IAM; ADAS | Approved | no | ||
Call Number | IAM @ iam @ GHB2011 | Serial | 1526 | ||
Permanent link to this record | |||||
Author | Ignasi Rius; Jordi Gonzalez; Javier Varona; Xavier Roca | ||||
Title | Action-specific motion prior for efficient bayesian 3D human body tracking | Type | Journal Article | ||
Year | 2009 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 42 | Issue | 11 | Pages | 2907–2921 |
Keywords | |||||
Abstract | In this paper, we aim to reconstruct the 3D motion parameters of a human body model from the known 2D positions of a reduced set of joints in the image plane. Towards this end, an action-specific motion model is trained from a database of real motion-captured performances. The learnt motion model is used within a particle filtering framework as a priori knowledge on human motion. First, our dynamic model guides the particles according to similar situations previously learnt. Then, the solution space is constrained so only feasible human postures are accepted as valid solutions at each time step. As a result, we are able to track the 3D configuration of the full human body from several cycles of walking motion sequences using only the 2D positions of a very reduced set of joints from lateral or frontal viewpoints. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | ISE @ ise @ RGV2009 | Serial | 1159 | ||
Permanent link to this record | |||||
Author | Parichehr Behjati; Pau Rodriguez; Carles Fernandez; Isabelle Hupont; Armin Mehri; Jordi Gonzalez | ||||
Title | Single image super-resolution based on directional variance attention network | Type | Journal Article | ||
Year | 2023 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 133 | Issue | Pages | 108997 | |
Keywords | |||||
Abstract | Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raise computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. In order to address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts most relevant features from the network for improving the final output and preventing information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art in several datasets, while maintaining relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ BPF2023 | Serial | 3861 | ||
Permanent link to this record | |||||
Author | Ivan Huerta; Marco Pedersoli; Jordi Gonzalez; Alberto Sanfeliu | ||||
Title | Combining where and what in change detection for unsupervised foreground learning in surveillance | Type | Journal Article | ||
Year | 2015 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 48 | Issue | 3 | Pages | 709-719 |
Keywords | Object detection; Unsupervised learning; Motion segmentation; Latent variables; Support vector machine; Multiple appearance models; Video surveillance | ||||
Abstract | Change detection is the most important task for video surveillance analytics such as foreground and anomaly detection. Current foreground detectors learn models from annotated images since the goal is to generate a robust foreground model able to detect changes in all possible scenarios. Unfortunately, manual labelling is very expensive. Most advanced supervised learning techniques based on generic object detection datasets currently exhibit very poor performance when applied to surveillance datasets because of the unconstrained nature of such environments in terms of types and appearances of objects. In this paper, we take advantage of change detection for training multiple foreground detectors in an unsupervised manner. We use statistical learning techniques which exploit the use of latent parameters for selecting the best foreground model parameters for a given scenario. In essence, the main novelty of our proposed approach is to combine the where (motion segmentation) and what (learning procedure) in change detection in an unsupervised way for improving the specificity and generalization power of foreground detectors at the same time. We propose a framework based on latent support vector machines that, given a noisy initialization based on motion cues, learns the correct position, aspect ratio, and appearance of all moving objects in a particular scene. Specificity is achieved by learning the particular change detections of a given scenario, and generalization is guaranteed since our method can be applied to any possible scene and foreground object, as demonstrated in the experimental results outperforming the state-of-the-art. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE; 600.063; 600.078 | Approved | no | ||
Call Number | Admin @ si @ HPG2015 | Serial | 2589 | ||
Permanent link to this record | |||||
Author | Marco Pedersoli; Andrea Vedaldi; Jordi Gonzalez; Xavier Roca | ||||
Title | A coarse-to-fine approach for fast deformable object detection | Type | Journal Article | ||
Year | 2015 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 48 | Issue | 5 | Pages | 1844-1853 |
Keywords | |||||
Abstract | We present a method that can dramatically accelerate object detection with part based models. The method is based on the observation that the cost of detection is likely to be dominated by the cost of matching each part to the image, and not by the cost of computing the optimal configuration of the parts as commonly assumed. Therefore accelerating detection requires minimizing the number of part-to-image comparisons. To this end we propose a multiple-resolutions hierarchical part based model and a corresponding coarse-to-fine inference procedure that recursively eliminates from the search space unpromising part placements. The method yields a ten-fold speedup over the standard dynamic programming approach and is complementary to the cascade-of-parts approach of [9]. Compared to the latter, our method does not have parameters to be determined empirically, which simplifies its use during the training of the model. Most importantly, the two techniques can be combined to obtain a very significant speedup, of two orders of magnitude in some cases. We evaluate our method extensively on the PASCAL VOC and INRIA datasets, demonstrating a very high increase in the detection speed with little degradation of the accuracy. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE; 600.078; 602.005; 605.001; 302.012 | Approved | no | ||
Call Number | Admin @ si @ PVG2015 | Serial | 2628 | ||
Permanent link to this record | |||||
Author | Pau Rodriguez; Guillem Cucurull; Josep M. Gonfaus; Xavier Roca; Jordi Gonzalez | ||||
Title | Age and gender recognition in the wild with deep attention | Type | Journal Article | ||
Year | 2017 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 72 | Issue | Pages | 563-571 | |
Keywords | Age recognition; Gender recognition; Deep neural networks; Attention mechanisms | ||||
Abstract | Face analysis in images in the wild still poses a challenge for automatic age and gender recognition tasks, mainly due to their high variability in resolution, deformation, and occlusion. Although the performance has highly increased thanks to Convolutional Neural Networks (CNNs), it is still far from optimal when compared to other image recognition tasks, mainly because of the high sensitiveness of CNNs to facial variations. In this paper, inspired by biology and the recent success of attention mechanisms on visual question answering and fine-grained recognition, we propose a novel feedforward attention mechanism that is able to discover the most informative and reliable parts of a given face for improving age and gender classification. In particular, given a downsampled facial image, the proposed model is trained based on a novel end-to-end learning framework to extract the most discriminative patches from the original high-resolution image. Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks shows that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE; 600.098; 602.133; 600.119 | Approved | no | ||
Call Number | Admin @ si @ RCG2017b | Serial | 2962 | ||
Permanent link to this record |