toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Carme Julia; Angel Sappa; Felipe Lumbreras; Joan Serrat; Antonio Lopez edit   pdf
doi  openurl
  Title Rank Estimation in 3D Multibody Motion Segmentation Type Journal Article
  Year 2008 Publication Electronic Letters Abbreviated Journal  
  Volume 44 Issue 4 Pages (down) 279-280  
  Keywords  
  Abstract A novel technique for rank estimation in 3D multibody motion segmentation is proposed. It is based on the study of the frequency spectra of moving rigid objects and does not use or assume a prior knowledge of the objects contained in the scene (i.e. number of objects and motion). The significance of rank estimation on multibody motion segmentation results is shown by using two motion segmentation algorithms over both synthetic and real data.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number ADAS @ adas @ JSL2008a Serial 939  
Permanent link to this record
 

 
Author Jean-Pascal Jacob; Mariella Dimiccoli; Lionel Moisan edit   pdf
doi  openurl
  Title Active skeleton for bacteria modeling Type Journal Article
  Year 2016 Publication Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization Abbreviated Journal CMBBE  
  Volume 5 Issue 4 Pages (down) 274-286  
  Keywords Bacteria modelling; medial axis; active contours; active skeleton; shape contraints  
  Abstract The investigation of spatio-temporal dynamics of bacterial cells and their molecular components requires automated image analysis tools to track cell shape properties and molecular component locations inside the cells. In the study of bacteria aging, the molecular components of interest are protein aggregates accumulated near bacteria boundaries. This particular location makes very ambiguous the correspondence between aggregates and cells, since computing accurately bacteria boundaries in phase-contrast time-lapse imaging is a challenging task. This paper proposes an active skeleton formulation for bacteria modeling which provides several advantages: an easy computation of shape properties (perimeter, length, thickness, orientation), an improved boundary accuracy in noisy images, and a natural bacteria-centered coordinate system that permits the intrinsic location of molecular components inside the cell. Starting from an initial skeleton estimate, the medial axis of the bacterium is obtained by minimizing an energy function which incorporates bacteria shape constraints. Experimental results on biological images and comparative evaluation of the performances validate the proposed approach for modeling cigar-shaped bacteria like Escherichia coli. The Image-J plugin of the proposed method can be found online at this http URL  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ JDM2016 Serial 2711  
Permanent link to this record
 

 
Author Jean-Pascal Jacob; Mariella Dimiccoli; L. Moisan edit   pdf
url  openurl
  Title Active skeleton for bacteria modelling Type Journal Article
  Year 2017 Publication Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization Abbreviated Journal CMBBE  
  Volume 5 Issue 4 Pages (down) 274-286  
  Keywords  
  Abstract The investigation of spatio-temporal dynamics of bacterial cells and their molecular components requires automated image analysis tools to track cell shape properties and molecular component locations inside the cells. In the study of bacteria aging, the molecular components of interest are protein aggregates accumulated near bacteria boundaries. This particular location makes very ambiguous the correspondence between aggregates and cells, since computing accurately bacteria boundaries in phase-contrast time-lapse imaging is a challenging task. This paper proposes an active skeleton formulation for bacteria modelling which provides several advantages: an easy computation of shape properties (perimeter, length, thickness and orientation), an improved boundary accuracy in noisy images and a natural bacteria-centred coordinate system that permits the intrinsic location of molecular components inside the cell. Starting from an initial skeleton estimate, the medial axis of the bacterium is obtained by minimising an energy function which incorporates bacteria shape constraints. Experimental results on biological images and comparative evaluation of the performances validate the proposed approach for modelling cigar-shaped bacteria like Escherichia coli. The Image-J plugin of the proposed method can be found online at http://fluobactracker.inrialpes.fr.  
  Address  
  Corporate Author Thesis  
  Publisher Taylor & Francis Group Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; Approved no  
  Call Number Admin @ si @JDM2017 Serial 2784  
Permanent link to this record
 

 
Author Francesc Carreras; Jaume Garcia; Debora Gil; Sandra Pujadas; Chi ho Lion; R.Suarez-Arias; R.Leta; Xavier Alomar; Manuel Ballester; Guillem Pons-Llados edit  url
doi  openurl
  Title Left ventricular torsion and longitudinal shortening: two fundamental components of myocardial mechanics assessed by tagged cine-MRI in normal subjects Type Journal Article
  Year 2012 Publication International Journal of Cardiovascular Imaging Abbreviated Journal IJCI  
  Volume 28 Issue 2 Pages (down) 273-284  
  Keywords Magnetic resonance imaging (MRI); Tagging MRI; Cardiac mechanics; Ventricular torsion  
  Abstract Cardiac magnetic resonance imaging (Cardiac MRI) has become a gold standard diagnostic technique for the assessment of cardiac mechanics, allowing the non-invasive calculation of left ventric- ular long axis longitudinal shortening (LVLS) and absolute myocardial torsion (AMT) between basal and apical left ventricular slices, a movement directly related to the helicoidal anatomic disposition of the myocardial fibers. The aim of this study is to determine AMT and LVLS behaviour and normal values from a group of healthy subjects. A group of 21 healthy volunteers (15 males) (age: 23–55 y.o., mean:30.7 ± 7.5) were prospectively included in an obser- vational study by Cardiac MRI. Left ventricular rotation (degrees) was calculated by custom-made software (Harmonic Phase Flow) in consecutive LV short axis planes tagged cine-MRI sequences. AMT was determined from the difference between basal and apical planes LV rotations. LVLS (%) was determined from the LV longitudinal and horizontal axis cine-MRI images. All the 21 cases studied were interpretable, although in three cases the value of the LV apical rotation could not be determined. The mean rotation of the basal and apical planes at end-systole were -3.71° ± 0.84° and 6.73° ± 1.69° (n:18) respectively, resulting in a LV mean AMT of 10.48° ± 1.63° (n:18). End-systolic mean LVLS was 19.07 ± 2.71%. Cardiac MRI allows for the calculation of AMT and LVLS, fundamental functional components of the ventricular twist mechanics conditioned, in turn, by the anatomical helical layout of the myocardial fibers. These values provide complementary information about systolic ventricular function in relation to the traditional parameters used in daily practice.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Netherlands Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1569-5794 ISBN Medium  
  Area Expedition Conference  
  Notes IAM; Approved no  
  Call Number IAM @ iam @ CGG2012 Serial 1496  
Permanent link to this record
 

 
Author David Berga; Xavier Otazu edit   pdf
url  openurl
  Title Modeling Bottom-Up and Top-Down Attention with a Neurodynamic Model of V1 Type Journal Article
  Year 2020 Publication Neurocomputing Abbreviated Journal NEUCOM  
  Volume 417 Issue Pages (down) 270-289  
  Keywords  
  Abstract Previous studies suggested that lateral interactions of V1 cells are responsible, among other visual effects, of bottom-up visual attention (alternatively named visual salience or saliency). Our objective is to mimic these connections with a neurodynamic network of firing-rate neurons in order to predict visual attention. Early visual subcortical processes (i.e. retinal and thalamic) are functionally simulated. An implementation of the cortical magnification function is included to define the retinotopical projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return, oculomotor and selection mechanisms), are also proposed to predict attention in Free-Viewing and Visual Search tasks. Results show that our model outpeforms other biologically inspired models of saliency prediction while predicting visual saccade sequences with the same model. We also show how temporal and spatial characteristics of saccade amplitude and inhibition of return can improve prediction of saccades, as well as how distinct search strategies (in terms of feature-selective or category-specific inhibition) can predict attention at distinct image contexts.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes NEUROBIT Approved no  
  Call Number Admin @ si @ BeO2020c Serial 3444  
Permanent link to this record
 

 
Author Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal edit   pdf
url  doi
openurl 
  Title Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts Type Journal Article
  Year 2021 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 24 Issue Pages (down) 269–281  
  Keywords  
  Abstract Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.121; 600.140; 110.312 Approved no  
  Call Number Admin @ si @ BRL2021b Serial 3574  
Permanent link to this record
 

 
Author C. Butakoff; Simone Balocco; F.M. Sukno; C. Hoogendoorn; C. Tobon-Gomez; G. Avegliano; A.F. Frangi edit   pdf
doi  openurl
  Title Left-ventricular Epi- and Endocardium Extraction from 3D Ultrasound Images Using an Automatically Constructed 3D ASM Type Journal Article
  Year 2016 Publication Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization Abbreviated Journal CMBBE  
  Volume 4 Issue 5 Pages (down) 265-280  
  Keywords ASM; cardiac segmentation; statistical model; shape model; 3D ultrasound; cardiac segmentation  
  Abstract In this paper, we propose an automatic method for constructing an active shape model (ASM) to segment the complete cardiac left ventricle in 3D ultrasound (3DUS) images, which avoids costly manual landmarking. The automatic construction of the ASM has already been addressed in the literature; however, the direct application of these methods to 3DUS is hampered by a high level of noise and artefacts. Therefore, we propose to construct the ASM by fusing the multidetector computed tomography data, to learn the shape, with the artificially generated 3DUS, in order to learn the neighbourhood of the boundaries. Our artificial images were generated by two approaches: a faster one that does not take into account the geometry of the transducer, and a more comprehensive one, implemented in Field II toolbox. The segmentation accuracy of our ASM was evaluated on 20 patients with left-ventricular asynchrony, demonstrating plausibility of the approach.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2168-1163 ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ BBS2016 Serial 2449  
Permanent link to this record
 

 
Author Maria Vanrell; Jordi Vitria; Xavier Roca edit  doi
openurl 
  Title A multidimensional scaling approach to explore the behavior of a texture perception algorithm. Type Journal Article
  Year 1997 Publication Machine Vision and Applications Abbreviated Journal  
  Volume 9 Issue Pages (down) 262–271  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes OR;ISE;CIC;MV Approved no  
  Call Number BCNPCL @ bcnpcl @ VVR1997 Serial 35  
Permanent link to this record
 

 
Author Cristina Sanchez Montes; F. Javier Sanchez; Jorge Bernal; Henry Cordova; Maria Lopez Ceron; Miriam Cuatrecasas; Cristina Rodriguez de Miguel; Ana Garcia Rodriguez; Rodrigo Garces Duran; Maria Pellise; Josep Llach; Gloria Fernandez Esparrach edit   pdf
doi  openurl
  Title Computer-aided Prediction of Polyp Histology on White-Light Colonoscopy using Surface Pattern Analysis Type Journal Article
  Year 2019 Publication Endoscopy Abbreviated Journal END  
  Volume 51 Issue 3 Pages (down) 261-265  
  Keywords  
  Abstract Background and study aims: To evaluate a new computational histology prediction system based on colorectal polyp textural surface patterns using high definition white light images.
Patients and methods: Textural elements (textons) were characterized according to their contrast with respect to the surface, shape and number of bifurcations, assuming that dysplastic polyps are associated with highly contrasted, large tubular patterns with some degree of bifurcation. Computer-aided diagnosis (CAD) was compared with pathological diagnosis and the diagnosis by the endoscopists using Kudo and NICE classification.
Results: Images of 225 polyps were evaluated (142 dysplastic and 83 non-dysplastic). CAD system correctly classified 205 (91.1%) polyps, 131/142 (92.3%) dysplastic and 74/83 (89.2%) non-dysplastic. For the subgroup of 100 diminutive (<5 mm) polyps, CAD correctly classified 87 (87%) polyps, 43/50 (86%) dysplastic and 44/50 (88%) non-dysplastic. There were not statistically significant differences in polyp histology prediction based on CAD system and on endoscopist assessment.
Conclusion: A computer vision system based on the characterization of the polyp surface in the white light accurately predicts colorectal polyp histology.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MV; 600.096; 600.119; 600.075 Approved no  
  Call Number Admin @ si @ SSB2019 Serial 3164  
Permanent link to this record
 

 
Author Marta Diez-Ferrer; Arturo Morales; Rosa Lopez Lisbona; Noelia Cubero; Cristian Tebe; Susana Padrones; Samantha Aso; Jordi Dorca; Debora Gil; Antoni Rosell edit  url
openurl 
  Title Ultrathin Bronchoscopy with and without Virtual Bronchoscopic Navigation: Influence of Segmentation on Diagnostic Yield Type Journal Article
  Year 2019 Publication Respiration Abbreviated Journal RES  
  Volume 97 Issue 3 Pages (down) 252-258  
  Keywords Lung cancer; Peripheral lung lesion; Diagnosis; Bronchoscopy; Ultrathin bronchoscopy; Virtual bronchoscopic navigation  
  Abstract Background: Bronchoscopy is a safe technique for diagnosing peripheral pulmonary lesions (PPLs), and virtual bronchoscopic navigation (VBN) helps guide the bronchoscope to PPLs. Objectives: We aimed to compare the diagnostic yield of VBN-guided and unguided ultrathin bronchoscopy (UTB) and explore clinical and technical factors associated with better results. We developed a diagnostic algorithm for deciding whether to use VBN to reach PPLs or choose an alternative diagnostic approach. Methods: We compared diagnostic yield between VBN-UTB (prospective cases) and unguided UTB (historical controls) and analyzed the VBN-UTB subgroup to identify clinical and technical variables that could predict the success of VBN-UTB. Results: Fifty-five cases and 110 controls were included. The overall diagnostic yield did not differ between the VBN-guided and unguided arms (47 and 40%, respectively; p = 0.354). Although the yield was slightly higher for PPLs ≤20 mm in the VBN-UTB arm, the difference was not significant (p = 0.069). No other clinical characteristics were associated with a higher yield in a subgroup analysis, but an 85% diagnostic yield was observed when segmentation was optimal and the PPL was endobronchial (vs. 30% when segmentation was suboptimal and 20% when segmentation was optimal but the PPL was extrabronchial). Conclusions: VBN-guided UTB is not superior to unguided UTB. A greater impact of VBN-guided over unguided UTB is highly dependent on both segmentation quality and an endobronchial location of the PPL. Segmentation quality should be considered before starting a procedure, when an alternative technique that may improve yield can be chosen, saving time and resources.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM; 600.145; 600.139 Approved no  
  Call Number Admin @ si @ DML2019 Serial 3134  
Permanent link to this record
 

 
Author Eloi Puertas; Sergio Escalera; Oriol Pujol edit   pdf
url  doi
openurl 
  Title Generalized Multi-scale Stacked Sequential Learning for Multi-class Classification Type Journal Article
  Year 2015 Publication Pattern Analysis and Applications Abbreviated Journal PAA  
  Volume 18 Issue 2 Pages (down) 247-261  
  Keywords Stacked sequential learning; Multi-scale; Error-correct output codes (ECOC); Contextual classification  
  Abstract In many classification problems, neighbor data labels have inherent sequential relationships. Sequential learning algorithms take benefit of these relationships in order to improve generalization. In this paper, we revise the multi-scale sequential learning approach (MSSL) for applying it in the multi-class case (MMSSL). We introduce the error-correcting output codesframework in the MSSL classifiers and propose a formulation for calculating confidence maps from the margins of the base classifiers. In addition, we propose a MMSSL compression approach which reduces the number of features in the extended data set without a loss in performance. The proposed methods are tested on several databases, showing significant performance improvement compared to classical approaches.  
  Address  
  Corporate Author Thesis  
  Publisher Springer-Verlag Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1433-7541 ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA;MILAB Approved no  
  Call Number Admin @ si @ PEP2013 Serial 2251  
Permanent link to this record
 

 
Author Fernando Vilariño; Panagiota Spyridonos; Fosca De Iorio; Jordi Vitria; Fernando Azpiroz; Petia Radeva edit   pdf
doi  openurl
  Title Intestinal Motility Assessment With Video Capsule Endoscopy: Automatic Annotation of Phasic Intestinal Contractions Type Journal Article
  Year 2010 Publication IEEE Transactions on Medical Imaging Abbreviated Journal TMI  
  Volume 29 Issue 2 Pages (down) 246-259  
  Keywords  
  Abstract Intestinal motility assessment with video capsule endoscopy arises as a novel and challenging clinical fieldwork. This technique is based on the analysis of the patterns of intestinal contractions shown in a video provided by an ingestible capsule with a wireless micro-camera. The manual labeling of all the motility events requires large amount of time for offline screening in search of findings with low prevalence, which turns this procedure currently unpractical. In this paper, we propose a machine learning system to automatically detect the phasic intestinal contractions in video capsule endoscopy, driving a useful but not feasible clinical routine into a feasible clinical procedure. Our proposal is based on a sequential design which involves the analysis of textural, color, and blob features together with SVM classifiers. Our approach tackles the reduction of the imbalance rate of data and allows the inclusion of domain knowledge as new stages in the cascade. We present a detailed analysis, both in a quantitative and a qualitative way, by providing several measures of performance and the assessment study of interobserver variability. Our system performs at 70% of sensitivity for individual detection, whilst obtaining equivalent patterns to those of the experts for density of contractions.  
  Address  
  Corporate Author IEEE Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 0278-0062 ISBN Medium  
  Area 800 Expedition Conference  
  Notes MILAB;MV;OR;SIAI Approved no  
  Call Number BCNPCL @ bcnpcl @ VSD2010; IAM @ iam @ VSI2010 Serial 1281  
Permanent link to this record
 

 
Author Alicia Fornes; Josep Llados; Gemma Sanchez; Xavier Otazu; Horst Bunke edit  doi
openurl 
  Title A Combination of Features for Symbol-Independent Writer Identification in Old Music Scores Type Journal Article
  Year 2010 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 13 Issue 4 Pages (down) 243-259  
  Keywords  
  Abstract The aim of writer identification is determining the writer of a piece of handwriting from a set of writers. In this paper, we present an architecture for writer identification in old handwritten music scores. Even though an important amount of music compositions contain handwritten text, the aim of our work is to use only music notation to determine the author. The main contribution is therefore the use of features extracted from graphical alphabets. Our proposal consists in combining the identification results of two different approaches, based on line and textural features. The steps of the ensemble architecture are the following. First of all, the music sheet is preprocessed for removing the staff lines. Then, music lines and texture images are generated for computing line features and textural features. Finally, the classification results are combined for identifying the writer. The proposed method has been tested on a database of old music scores from the seventeenth to nineteenth centuries, achieving a recognition rate of about 92% with 20 writers.  
  Address  
  Corporate Author Thesis  
  Publisher Springer-Verlag Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1433-2833 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; CAT;CIC Approved no  
  Call Number FLS2010b Serial 1319  
Permanent link to this record
 

 
Author Alicia Fornes; Anjan Dutta; Albert Gordo; Josep Llados edit   pdf
doi  openurl
  Title CVC-MUSCIMA: A Ground-Truth of Handwritten Music Score Images for Writer Identification and Staff Removal Type Journal Article
  Year 2012 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 15 Issue 3 Pages (down) 243-251  
  Keywords Music scores; Handwritten documents; Writer identification; Staff removal; Performance evaluation; Graphics recognition; Ground truths  
  Abstract 0,405JCR
The analysis of music scores has been an active research field in the last decades. However, there are no publicly available databases of handwritten music scores for the research community. In this paper we present the CVC-MUSCIMA database and ground-truth of handwritten music score images. The dataset consists of 1,000 music sheets written by 50 different musicians. It has been especially designed for writer identification and staff removal tasks. In addition to the description of the dataset, ground-truth, partitioning and evaluation metrics, we also provide some base-line results for easing the comparison between different approaches.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1433-2833 ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ FDG2012 Serial 2129  
Permanent link to this record
 

 
Author Lluis Gomez; Ali Furkan Biten; Ruben Tito; Andres Mafla; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Multimodal grid features and cell pointers for scene text visual question answering Type Journal Article
  Year 2021 Publication Pattern Recognition Letters Abbreviated Journal PRL  
  Volume 150 Issue Pages (down) 242-249  
  Keywords  
  Abstract This paper presents a new model for the task of scene text visual question answering. In this task questions about a given image can only be answered by reading and understanding scene text. Current state of the art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on an single attention mechanism that attends to multi-modal features conditioned to the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance in two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data is made available through this link.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.084; 600.121 Approved no  
  Call Number Admin @ si @ GBT2021 Serial 3620  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: