|   | 
Details
   web
Records
Author (down) Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts Type Journal Article
Year 2021 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume 24 Issue Pages 269–281
Keywords
Abstract Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.140; 110.312 Approved no
Call Number Admin @ si @ BRL2021b Serial 3574
Permanent link to this record
 

 
Author (down) Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title Graph-Based Deep Generative Modelling for Document Layout Generation Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 12917 Issue Pages 525-537
Keywords
Abstract One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, specially in digital mailroom applications. It is also the first graph-based approach for document layout generation task experimented on administrative document images, in this case, invoices.
Address Lausanne; Suissa; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.140; 110.312 Approved no
Call Number Admin @ si @ BRL2021 Serial 3676
Permanent link to this record
 

 
Author (down) Sangheeta Roy; Palaiahnakote Shivakumara; Namita Jain; Vijeta Khare; Anjan Dutta; Umapada Pal; Tong Lu
Title Rough-Fuzzy based Scene Categorization for Text Detection and Recognition in Video Type Journal Article
Year 2018 Publication Pattern Recognition Abbreviated Journal PR
Volume 80 Issue Pages 64-82
Keywords Rough set; Fuzzy set; Video categorization; Scene image classification; Video text detection; Video text recognition
Abstract Scene image or video understanding is a challenging task especially when number of video types increases drastically with high variations in background and foreground. This paper proposes a new method for categorizing scene videos into different classes, namely, Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, for the performance improvement of text detection and recognition, which is an effective approach for scene image or video understanding. For this purpose, at first, we present a new combination of rough and fuzzy concept to study irregular shapes of edge components in input scene videos, which helps to classify edge components into several groups. Next, the proposed method explores gradient direction information of each pixel in each edge component group to extract stroke based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. Features of intra and inter planes of groups are then concatenated to get a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms the existing state-of-the-art methods, at the same time, the performances of text detection and recognition methods are also improved significantly due to categorization.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.097; 600.121 Approved no
Call Number Admin @ si @ RSJ2018 Serial 3096
Permanent link to this record
 

 
Author (down) Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
Title RoadText-1K: Text Detection and Recognition Dataset for Driving Videos Type Conference Article
Year 2020 Publication IEEE International Conference on Robotics and Automation Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new ”RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection,
recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/
projects/cvit-projects/roadtext-1k
Address Paris; Francia; ???
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICRA
Notes DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ RMG2020 Serial 3400
Permanent link to this record
 

 
Author (down) Sandra Pujades;Francesc Carreras;Manuel Ballester; Jaume Garcia; Debora Gil
Title A Normalized Parametric Domain for the Analysis of the Left Ventricular Function Type Conference Article
Year 2008 Publication Proceedings of the Third International Conference on Computer Vision Theory and Applications (VISAPP’08) Abbreviated Journal
Volume 1 Issue Pages 267-274
Keywords Helical Ventricular Myocardial Band; Myocardial Fiber; Tagged Magnetic Resonance; HARP; Optical Flow Variational Framework; Gabor Filters; B-Splines.
Abstract Impairment of left ventricular (LV) contractility due to cardiovascular diseases is reflected in LV motion patterns. The mechanics of any muscle strongly depends on the spatial orientation of its muscular fibers since the motion that the muscle undergoes mainly takes place along the fiber. The helical ventricular myocardial band (HVMB) concept describes the myocardial muscle as a unique muscular band that twists in space in a non homogeneous fashion. The 3D anisotropy of the ventricular band fibers suggests a regional analysis of the heart motion. Computation of normality models of such motion can help in the detection and localization of any cardiac disorder. In this paper we introduce, for the first time, a normalized parametric domain that allows comparison of the left ventricle motion across patients. We address, both, extraction of the LV motion from Tagged Magnetic Resonance images, as well as, defining a mapping of the LV to a common normalized domain. Extraction of normality motion patterns from 17 healthy volunteers shows the clinical potential of our LV parametrization.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; Approved no
Call Number IAM @ iam @ GGP2008 Serial 1627
Permanent link to this record
 

 
Author (down) Sandra Jimenez; Xavier Otazu; Valero Laparra; Jesus Malo
Title Chromatic induction and contrast masking: similar models, different goals? Type Conference Article
Year 2013 Publication Human Vision and Electronic Imaging XVIII Abbreviated Journal
Volume 8651 Issue Pages
Keywords
Abstract Normalization of signals coming from linear sensors is an ubiquitous mechanism of neural adaptation.1 Local interaction between sensors tuned to a particular feature at certain spatial position and neighbor sensors explains a wide range of psychophysical facts including (1) masking of spatial patterns, (2) non-linearities of motion sensors, (3) adaptation of color perception, (4) brightness and chromatic induction, and (5) image quality assessment. Although the above models have formal and qualitative similarities, it does not necessarily mean that the mechanisms involved are pursuing the same statistical goal. For instance, in the case of chromatic mechanisms (disregarding spatial information), different parameters in the normalization give rise to optimal discrimination or adaptation, and different non-linearities may give rise to error minimization or component independence. In the case of spatial sensors (disregarding color information), a number of studies have pointed out the benefits of masking in statistical independence terms. However, such statistical analysis has not been performed for spatio-chromatic induction models where chromatic perception depends on spatial configuration. In this work we investigate whether successful spatio-chromatic induction models,6 increase component independence similarly as previously reported for masking models. Mutual information analysis suggests that seeking an efficient chromatic representation may explain the prevalence of induction effects in spatially simple images. © (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Address San Francisco CA; USA; February 2013
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference HVEI
Notes CIC Approved no
Call Number Admin @ si @ JOL2013 Serial 2240
Permanent link to this record
 

 
Author (down) Salvatore Tabbone; Oriol Ramos Terrades; S. Barrat
Title Histogram of radon transform. A useful descriptor for shape retrieval Type Conference Article
Year 2008 Publication 19th International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages 1-4
Keywords
Abstract
Address Tampa, Florida
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes DAG Approved no
Call Number Admin @ si @ TRB2008 Serial 1876
Permanent link to this record
 

 
Author (down) Salvatore Tabbone; Oriol Ramos Terrades
Title An Overview of Symbol Recognition Type Book Chapter
Year 2014 Publication Handbook of Document Image Processing and Recognition Abbreviated Journal
Volume D Issue Pages 523-551
Keywords Pattern recognition; Shape descriptors; Structural descriptors; Symbolrecognition; Symbol spotting
Abstract According to the Cambridge Dictionaries Online, a symbol is a sign, shape, or object that is used to represent something else. Symbol recognition is a subfield of general pattern recognition problems that focuses on identifying, detecting, and recognizing symbols in technical drawings, maps, or miscellaneous documents such as logos and musical scores. This chapter aims at providing the reader an overview of the different existing ways of describing and recognizing symbols and how the field has evolved to attain a certain degree of maturity.
Address
Corporate Author Thesis
Publisher Springer London Place of Publication Editor D. Doermann; K. Tombre
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-0-85729-858-4 Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ TaT2014 Serial 2489
Permanent link to this record
 

 
Author (down) Salvatore Tabbone; Josep Llados
Title A Propos de la Reconnaissance de Documents Graphiques: Synthese et Perspectives Type Conference Article
Year 2007 Publication Traitement et Analyse de l’Information: Methodes et Applications Abbreviated Journal
Volume Issue Pages 247–258
Keywords
Abstract
Address Hammamet (Tunis)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference TAIMA’07
Notes DAG Approved no
Call Number DAG @ dag @ TaL2007 Serial 890
Permanent link to this record
 

 
Author (down) Salim Jouili; Salvatore Tabbone; Ernest Valveny
Title Comparing Graph Similarity Measures for Graphical Recognition Type Book Chapter
Year 2010 Publication Graphics Recognition. Achievements, Challenges, and Evolution. 8th International Workshop, GREC 2009. Selected Papers Abbreviated Journal
Volume 6020 Issue Pages 37-48
Keywords
Abstract In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used including line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique.
Address
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-13727-3 Medium
Area Expedition Conference GREC
Notes DAG Approved no
Call Number Admin @ si @ JTV2010 Serial 2404
Permanent link to this record
 

 
Author (down) Salim Jouili; Salvatore Tabbone; Ernest Valveny
Title Evaluation of graph matching measures for documents retrieval Type Conference Article
Year 2009 Publication In proceedings of 8th IAPR International Workshop on Graphics Recognition Abbreviated Journal
Volume Issue Pages 13–21
Keywords Graph Matching; Graph retrieval; structural representation; Performance Evaluation
Abstract In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used which include line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each grahp distance measure depends on the kind of data and the graph representation technique.
Address La Rochelle, France
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-13727-3 Medium
Area Expedition Conference GREC
Notes DAG Approved no
Call Number DAG @ dag @ JTV2009a Serial 1230
Permanent link to this record
 

 
Author (down) Salim Jouili; Salvatore Tabbone; Ernest Valveny
Title Comparing Graph Similarity Measures for Graphical Recognition. Type Conference Article
Year 2009 Publication 8th IAPR International Workshop on Graphics Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used including line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique.
Address La Rochelle; France; July 2009
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference GREC
Notes DAG Approved no
Call Number DAG @ dag @ JTV2009 Serial 1442
Permanent link to this record
 

 
Author (down) Saiping Zhang; Luis Herranz; Marta Mrak; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang
Title DCNGAN: A Deformable Convolution-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video Type Conference Article
Year 2022 Publication 47th International Conference on Acoustics, Speech, and Signal Processing Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
Address Virtual; May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICASSP
Notes MACO; 600.161; 601.379 Approved no
Call Number Admin @ si @ ZHM2022a Serial 3765
Permanent link to this record
 

 
Author (down) Saiping Zhang, Luis Herranz, Marta Mrak, Marc Gorriz Blanch, Shuai Wan, Fuzheng Yang
Title PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation-and Attention-based Network Type Miscellaneous
Year 2022 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for enhancing perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. Then extracted features are fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module which leverages the corresponding QP information. In this way, a single model can be used to enhance adaptively to various QPs without requiring multiple models specific for every QP value, while having similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MACO; no proj Approved no
Call Number Admin @ si @ ZHM2022b Serial 3819
Permanent link to this record
 

 
Author (down) Sagnik Das; Hassan Ahmed Sial; Ke Ma; Ramon Baldrich; Maria Vanrell; Dimitris Samaras
Title Intrinsic Decomposition of Document Images In-the-Wild Type Conference Article
Year 2020 Publication 31st British Machine Vision Conference Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Automatic document content processing is affected by artifacts caused by the shape
of the paper, non-uniform and diverse color of lighting conditions. Fully-supervised
methods on real data are impossible due to the large amount of data needed. Hence, the
current state of the art deep learning models are trained on fully or partially synthetic images. However, document shadow or shading removal results still suffer because: (a) prior methods rely on uniformity of local color statistics, which limit their application on real-scenarios with complex document shapes and textures and; (b) synthetic or hybrid datasets with non-realistic, simulated lighting conditions are used to train the models. In this paper we tackle these problems with our two main contributions. First, a physically constrained learning-based method that directly estimates document reflectance based on intrinsic image formation which generalizes to challenging illumination conditions. Second, a new dataset that clearly improves previous synthetic ones, by adding a large range of realistic shading and diverse multi-illuminant conditions, uniquely customized to deal with documents in-the-wild. The proposed architecture works in two steps. First, a white balancing module neutralizes the color of the illumination on the input image. Based on the proposed multi-illuminant dataset we achieve a good white-balancing in really difficult conditions. Second, the shading separation module accurately disentangles the shading and paper material in a self-supervised manner where only the synthetic texture is used as a weak training signal (obviating the need for very costly ground truth with disentangled versions of shading and reflectance). The proposed approach leads to significant generalization of document reflectance estimation in real scenes with challenging illumination. We extensively evaluate on the real benchmark datasets available for intrinsic image decomposition and document shadow removal tasks. Our reflectance estimation scheme, when used as a pre-processing step of an OCR pipeline, shows a 21% improvement of character error rate (CER), thus, proving the practical applicability. The data and code will be available at: https://github.com/cvlab-stonybrook/DocIIW.
Address Virtual; September 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference BMVC
Notes CIC; 600.087; 600.140; 600.118 Approved no
Call Number Admin @ si @ DSM2020 Serial 3461
Permanent link to this record