Records
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
Year 2023 Publication (up) Pattern Recognition Abbreviated Journal PR
Volume 144 Issue Pages 109834
Keywords
Abstract Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.155; 600.121 Approved no
Call Number Admin @ si @ TKV2023 Serial 3825
Permanent link to this record
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
Year 2023 Publication (up) Pattern Recognition Abbreviated Journal PR
Volume 139 Issue Pages 109419
Keywords
Abstract Multimodal learning from document data has achieved great success lately, as it allows pre-training semantically meaningful features as a prior for a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ BMC2023 Serial 3826
Permanent link to this record
 

 
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Hierarchical multimodal transformers for Multipage DocVQA Type Journal Article
Year 2023 Publication (up) Pattern Recognition Abbreviated Journal PR
Volume 144 Issue Pages 109834
Keywords
Abstract Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method, Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods used in image classification, which focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022), our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and the decoder then uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ TKV2023 Serial 3836
Permanent link to this record
 

 
Author Parichehr Behjati; Pau Rodriguez; Carles Fernandez; Isabelle Hupont; Armin Mehri; Jordi Gonzalez
Title Single image super-resolution based on directional variance attention network Type Journal Article
Year 2023 Publication (up) Pattern Recognition Abbreviated Journal PR
Volume 133 Issue Pages 108997
Keywords
Abstract Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raises computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. In order to address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts the most relevant features from the network to improve the final output and prevent information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art on several datasets, while maintaining a relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ BPF2023 Serial 3861
Permanent link to this record
 

 
Author Xavier Soria; Angel Sappa; Patricio Humanante; Arash Akbarinia
Title Dense extreme inception network for edge detection Type Journal Article
Year 2023 Publication (up) Pattern Recognition Abbreviated Journal PR
Volume 139 Issue Pages 109461
Keywords
Abstract Edge detection is the basis of many computer vision applications. State of the art predominantly relies on deep learning with two decisive factors: dataset content and network architecture. Most of the publicly available datasets are not curated for edge detection tasks. Here, we address this limitation. First, we argue that edges, contours and boundaries, despite their overlaps, are three distinct visual features requiring separate benchmark datasets. To this end, we present a new dataset of edges. Second, we propose a novel architecture, termed Dense Extreme Inception Network for Edge Detection (DexiNed), that can be trained from scratch without any pre-trained weights. DexiNed outperforms other algorithms in the presented dataset. It also generalizes well to other datasets without any fine-tuning. The higher quality of DexiNed is also perceptually evident thanks to the sharper and finer edges it outputs.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MSIAU Approved no
Call Number Admin @ si @ SSH2023 Serial 3982
Permanent link to this record
 

 
Author Antonio Lopez; W. Niessen; Joan Serrat; K. Nicolay; B. Ter Haar Romeny; Juan J. Villanueva; M. Viergever
Title New improvements in the multiscale analysis of trabecular bone patterns Type Book Chapter
Year 2000 Publication (up) Pattern Recognition and Applications Abbreviated Journal
Volume Issue Pages 251-260
Keywords
Abstract
Address
Corporate Author Thesis
Publisher IOS Press Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Serial 3418
Permanent link to this record
 

 
Author Xavier Roca; Jordi Vitria; Maria Vanrell; Juan J. Villanueva
Title Visual behaviours for binocular navigation with autonomous systems. Type Miscellaneous
Year 2000 Publication (up) Pattern Recognition and Applications, IOS Press, 134–143. Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;ISE;CIC;MV Approved no
Call Number BCNPCL @ bcnpcl @ RVV2000 Serial 245
Permanent link to this record
 

 
Author Antonio Lopez; W. Niessen; Joan Serrat; K. Nicolay; Bart M. Ter Haar Romeny; Juan J. Villanueva; M. Viergever
Title New improvements in the multiscale analysis of trabecular bone patterns. Type Miscellaneous
Year 2000 Publication (up) Pattern Recognition and Applications, IOS Press, 251–260. Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number ADAS @ adas @ LNS2000 Serial 332
Permanent link to this record
 

 
Author Daniel Ponsa; A.F. Sole; Antonio Lopez; Cristina Cañero; Petia Radeva; Jordi Vitria
Title Regularized EM. Type Miscellaneous
Year 2000 Publication (up) Pattern Recognition and Applications, IOS Press, 69–77. Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes invisible;ADAS;OR;MILAB;MV Approved no
Call Number ADAS @ adas @ PSL2000 Serial 336
Permanent link to this record
 

 
Author Javier Varona; A. Pujol; Juan J. Villanueva
Title Visual Tracking in Application Domains. Type Miscellaneous
Year 2000 Publication (up) Pattern Recognition and Applications, IOS Press, 99–106. Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number ISE @ ise @ VPV2000 Serial 333
Permanent link to this record
 

 
Author Quan-sen Sun; Zhong Jin; Pheng-ann Heng; De-shen Xia
Title A novel feature fusion method based on partial least squares regression Type Book Chapter
Year 2005 Publication (up) Pattern Recognition and Data Mining, Lecture Notes in Computer Science, 3686: 268–277 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Bath (United Kingdom)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ SJH2005 Serial 626
Permanent link to this record
 

 
Author Agnes Borras; Josep Llados
Title Object Image Retrieval by Shape Content in Complex Scenes Using Geometric Constraints Type Book Chapter
Year 2005 Publication (up) Pattern Recognition And Image Analysis Abbreviated Journal LNCS
Volume 3522 Issue Pages 325–332
Keywords
Abstract This paper presents an image retrieval system based on 2D shape information. Query shape objects and database images are represented by polygonal approximations of their contours. Afterwards they are encoded, using geometric features, in terms of predefined structures. Shapes are then located in database images by a voting procedure on the spatial domain. Then an alignment matching provides a probability value to rank the database image in the retrieval result. The method allows detecting a query object in database images even when they contain complex scenes. The shape matching also tolerates partial occlusions and affine transformations such as translation, rotation or scaling.
Address Estoril (Portugal)
Corporate Author Thesis
Publisher Springer Link Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; Approved no
Call Number DAG @ dag @ BoL2005; IAM @ iam @ BoL2005 Serial 556
Permanent link to this record
 

 
Author V. Kober; Mikhail Mozerov; J. Alvarez-Borrego; I.A. Ovseyevich
Title Adaptive Correlation Filters for Pattern Recognition Type Journal
Year 2006 Publication (up) Pattern Recognition and Image Analysis Abbreviated Journal
Volume 16 Issue 3 Pages 425-431
Keywords Pattern recognition, Correlation filters, Adaptive filters
Abstract Adaptive correlation filters based on synthetic discriminant functions (SDFs) for reliable pattern recognition are proposed. A given value of discrimination capability can be achieved by adapting an SDF filter to the input scene. This can be done by iterative training. Computer simulation results obtained with the proposed filters are compared with those of various correlation filters in terms of recognition performance.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number ISE @ ise @ KMA2006a Serial 673
Permanent link to this record
 

 
Author Mikhail Mozerov; Ariel Amato; Xavier Roca; Jordi Gonzalez
Title Solving the Multi Object Occlusion Problem in a Multiple Camera Tracking System Type Journal
Year 2009 Publication (up) Pattern Recognition and Image Analysis Abbreviated Journal
Volume 19 Issue 1 Pages 165-171
Keywords
Abstract An efficient method to overcome adverse effects of occlusion upon object tracking is presented. The method is based on matching paths of objects in time and solves a complex occlusion-caused problem of merging separate segments of the same path.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1054-6618 ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number ISE @ ise @ MAR2009a Serial 1160
Permanent link to this record
 

 
Author E. Talavera; Mariella Dimiccoli; Marc Bolaños; Maedeh Aghaei; Petia Radeva
Title Regularized Clustering for Egocentric Video Segmentation Type Book Chapter
Year 2015 Publication (up) Pattern Recognition and Image Analysis Abbreviated Journal
Volume Issue Pages 327-336
Keywords Temporal video segmentation ; Egocentric videos ; Clustering
Abstract In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering (AC) within an energy-minimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that serves to disambiguate the decisions of both techniques and to complete the segmentation, taking into account the temporal continuity of video frames. We present experiments over egocentric sets of more than 13,000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Address
Corporate Author Thesis
Publisher Springer International Publishing Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-319-19390-8 Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ TDB2015a Serial 2781
Permanent link to this record