Records | |||||
---|---|---|---|---|---|
Author | Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal | ||||
Title | Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts | Type | Journal Article | ||
Year | 2021 | Publication | International Journal on Document Analysis and Recognition | Abbreviated Journal | IJDAR |
Volume | 24 | Issue | Pages | 269–281 | |
Keywords | |||||
Abstract | Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121; 600.140; 110.312 | Approved | no | ||
Call Number | Admin @ si @ BRL2021b | Serial | 3574 | ||
Permanent link to this record | |||||
Author | Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal | ||||
Title | Graph-Based Deep Generative Modelling for Document Layout Generation | Type | Conference Article | ||
Year | 2021 | Publication | 16th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | 12917 | Issue | Pages | 525-537 | |
Keywords | |||||
Abstract | One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, especially in digital mailroom applications. It is also the first graph-based approach for document layout generation task experimented on administrative document images, in this case, invoices. | ||||
Address | Lausanne; Switzerland; September 2021 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121; 600.140; 110.312 | Approved | no | ||
Call Number | Admin @ si @ BRL2021 | Serial | 3676 | ||
Permanent link to this record | |||||
Author | Sangheeta Roy; Palaiahnakote Shivakumara; Namita Jain; Vijeta Khare; Anjan Dutta; Umapada Pal; Tong Lu | ||||
Title | Rough-Fuzzy based Scene Categorization for Text Detection and Recognition in Video | Type | Journal Article | ||
Year | 2018 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 80 | Issue | Pages | 64-82 | |
Keywords | Rough set; Fuzzy set; Video categorization; Scene image classification; Video text detection; Video text recognition | ||||
Abstract | Scene image or video understanding is a challenging task especially when the number of video types increases drastically with high variations in background and foreground. This paper proposes a new method for categorizing scene videos into different classes, namely, Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, for the performance improvement of text detection and recognition, which is an effective approach for scene image or video understanding. For this purpose, at first, we present a new combination of rough and fuzzy concept to study irregular shapes of edge components in input scene videos, which helps to classify edge components into several groups. Next, the proposed method explores gradient direction information of each pixel in each edge component group to extract stroke based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. Features of intra and inter planes of groups are then concatenated to get a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms the existing state-of-the-art methods, at the same time, the performances of text detection and recognition methods are also improved significantly due to categorization. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ RSJ2018 | Serial | 3096 | ||
Permanent link to this record | |||||
Author | Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar | ||||
Title | RoadText-1K: Text Detection and Recognition Dataset for Driving Videos | Type | Conference Article | ||
Year | 2020 | Publication | IEEE International Conference on Robotics and Automation | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new "RoadText-1K" dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtext-1k | ||||
Address | Paris; France; ??? | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICRA | ||
Notes | DAG; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ RMG2020 | Serial | 3400 | ||
Permanent link to this record | |||||
Author | Sandra Pujades; Francesc Carreras; Manuel Ballester; Jaume Garcia; Debora Gil | ||||
Title | A Normalized Parametric Domain for the Analysis of the Left Ventricular Function | Type | Conference Article | ||
Year | 2008 | Publication | Proceedings of the Third International Conference on Computer Vision Theory and Applications (VISAPP’08) | Abbreviated Journal | |
Volume | 1 | Issue | Pages | 267-274 | |
Keywords | Helical Ventricular Myocardial Band; Myocardial Fiber; Tagged Magnetic Resonance; HARP; Optical Flow Variational Framework; Gabor Filters; B-Splines. | ||||
Abstract | Impairment of left ventricular (LV) contractility due to cardiovascular diseases is reflected in LV motion patterns. The mechanics of any muscle strongly depends on the spatial orientation of its muscular fibers since the motion that the muscle undergoes mainly takes place along the fiber. The helical ventricular myocardial band (HVMB) concept describes the myocardial muscle as a unique muscular band that twists in space in a non-homogeneous fashion. The 3D anisotropy of the ventricular band fibers suggests a regional analysis of the heart motion. Computation of normality models of such motion can help in the detection and localization of any cardiac disorder. In this paper we introduce, for the first time, a normalized parametric domain that allows comparison of the left ventricle motion across patients. We address both the extraction of the LV motion from Tagged Magnetic Resonance images and the definition of a mapping of the LV to a common normalized domain. Extraction of normality motion patterns from 17 healthy volunteers shows the clinical potential of our LV parametrization. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM; | Approved | no | ||
Call Number | IAM @ iam @ GGP2008 | Serial | 1627 | ||
Permanent link to this record | |||||
Author | Sandra Jimenez; Xavier Otazu; Valero Laparra; Jesus Malo | ||||
Title | Chromatic induction and contrast masking: similar models, different goals? | Type | Conference Article | ||
Year | 2013 | Publication | Human Vision and Electronic Imaging XVIII | Abbreviated Journal | |
Volume | 8651 | Issue | Pages | ||
Keywords | |||||
Abstract | Normalization of signals coming from linear sensors is a ubiquitous mechanism of neural adaptation. Local interaction between sensors tuned to a particular feature at certain spatial position and neighbor sensors explains a wide range of psychophysical facts including (1) masking of spatial patterns, (2) non-linearities of motion sensors, (3) adaptation of color perception, (4) brightness and chromatic induction, and (5) image quality assessment. Although the above models have formal and qualitative similarities, it does not necessarily mean that the mechanisms involved are pursuing the same statistical goal. For instance, in the case of chromatic mechanisms (disregarding spatial information), different parameters in the normalization give rise to optimal discrimination or adaptation, and different non-linearities may give rise to error minimization or component independence. In the case of spatial sensors (disregarding color information), a number of studies have pointed out the benefits of masking in statistical independence terms. However, such statistical analysis has not been performed for spatio-chromatic induction models where chromatic perception depends on spatial configuration. In this work we investigate whether successful spatio-chromatic induction models increase component independence similarly as previously reported for masking models. Mutual information analysis suggests that seeking an efficient chromatic representation may explain the prevalence of induction effects in spatially simple images. © (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only. | ||||
Address | San Francisco CA; USA; February 2013 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | HVEI | ||
Notes | CIC | Approved | no | ||
Call Number | Admin @ si @ JOL2013 | Serial | 2240 | ||
Permanent link to this record | |||||
Author | Salvatore Tabbone; Oriol Ramos Terrades; S. Barrat | ||||
Title | Histogram of radon transform. A useful descriptor for shape retrieval | Type | Conference Article | ||
Year | 2008 | Publication | 19th International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1-4 | ||
Keywords | |||||
Abstract | |||||
Address | Tampa, Florida | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ TRB2008 | Serial | 1876 | ||
Permanent link to this record | |||||
Author | Salvatore Tabbone; Oriol Ramos Terrades | ||||
Title | An Overview of Symbol Recognition | Type | Book Chapter | ||
Year | 2014 | Publication | Handbook of Document Image Processing and Recognition | Abbreviated Journal | |
Volume | D | Issue | Pages | 523-551 | |
Keywords | Pattern recognition; Shape descriptors; Structural descriptors; Symbol recognition; Symbol spotting | ||||
Abstract | According to the Cambridge Dictionaries Online, a symbol is a sign, shape, or object that is used to represent something else. Symbol recognition is a subfield of general pattern recognition problems that focuses on identifying, detecting, and recognizing symbols in technical drawings, maps, or miscellaneous documents such as logos and musical scores. This chapter aims at providing the reader an overview of the different existing ways of describing and recognizing symbols and how the field has evolved to attain a certain degree of maturity. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer London | Place of Publication | Editor | D. Doermann; K. Tombre | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-0-85729-858-4 | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.077 | Approved | no | ||
Call Number | Admin @ si @ TaT2014 | Serial | 2489 | ||
Permanent link to this record | |||||
Author | Salvatore Tabbone; Josep Llados | ||||
Title | A Propos de la Reconnaissance de Documents Graphiques: Synthese et Perspectives | Type | Conference Article | ||
Year | 2007 | Publication | Traitement et Analyse de l’Information: Methodes et Applications | Abbreviated Journal | |
Volume | Issue | Pages | 247–258 | ||
Keywords | |||||
Abstract | |||||
Address | Hammamet (Tunis) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | TAIMA’07 | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ TaL2007 | Serial | 890 | ||
Permanent link to this record | |||||
Author | Salim Jouili; Salvatore Tabbone; Ernest Valveny | ||||
Title | Comparing Graph Similarity Measures for Graphical Recognition | Type | Book Chapter | ||
Year | 2010 | Publication | Graphics Recognition. Achievements, Challenges, and Evolution. 8th International Workshop, GREC 2009. Selected Papers | Abbreviated Journal | |
Volume | 6020 | Issue | Pages | 37-48 | |
Keywords | |||||
Abstract | In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kinds of documents are used including line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer Berlin Heidelberg | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | 0302-9743 | ISBN | 978-3-642-13727-3 | Medium | |
Area | Expedition | Conference | GREC | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ JTV2010 | Serial | 2404 | ||
Permanent link to this record | |||||
Author | Salim Jouili; Salvatore Tabbone; Ernest Valveny | ||||
Title | Evaluation of graph matching measures for documents retrieval | Type | Conference Article | ||
Year | 2009 | Publication | In proceedings of 8th IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 13–21 | ||
Keywords | Graph Matching; Graph retrieval; structural representation; Performance Evaluation | ||||
Abstract | In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kinds of documents are used which include line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique. | ||||
Address | La Rochelle, France | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0302-9743 | ISBN | 978-3-642-13727-3 | Medium | |
Area | Expedition | Conference | GREC | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ JTV2009a | Serial | 1230 | ||
Permanent link to this record | |||||
Author | Salim Jouili; Salvatore Tabbone; Ernest Valveny | ||||
Title | Comparing Graph Similarity Measures for Graphical Recognition. | Type | Conference Article | ||
Year | 2009 | Publication | 8th IAPR International Workshop on Graphics Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kinds of documents are used including line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique. | ||||
Address | La Rochelle; France; July 2009 | ||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | GREC | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ JTV2009 | Serial | 1442 | ||
Permanent link to this record | |||||
Author | Saiping Zhang; Luis Herranz; Marta Mrak; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang | ||||
Title | DCNGAN: A Deformable Convolution-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video | Type | Conference Article | ||
Year | 2022 | Publication | 47th International Conference on Acoustics, Speech, and Signal Processing | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms. | ||||
Address | Virtual; May 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICASSP | ||
Notes | MACO; 600.161; 601.379 | Approved | no | ||
Call Number | Admin @ si @ ZHM2022a | Serial | 3765 | ||
Permanent link to this record | |||||
Author | Saiping Zhang; Luis Herranz; Marta Mrak; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang | ||||
Title | PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation- and Attention-based Network | Type | Miscellaneous | ||
Year | 2022 | Publication | arXiv | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for enhancing perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. Then extracted features are fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module which leverages the corresponding QP information. In this way, a single model can be used to enhance adaptively to various QPs without requiring multiple models specific for every QP value, while having similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MACO; no proj | Approved | no | ||
Call Number | Admin @ si @ ZHM2022b | Serial | 3819 | ||
Permanent link to this record | |||||
Author | Sagnik Das; Hassan Ahmed Sial; Ke Ma; Ramon Baldrich; Maria Vanrell; Dimitris Samaras | ||||
Title | Intrinsic Decomposition of Document Images In-the-Wild | Type | Conference Article | ||
Year | 2020 | Publication | 31st British Machine Vision Conference | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Automatic document content processing is affected by artifacts caused by the shape of the paper, non-uniform and diverse color of lighting conditions. Fully-supervised methods on real data are impossible due to the large amount of data needed. Hence, the current state of the art deep learning models are trained on fully or partially synthetic images. However, document shadow or shading removal results still suffer because: (a) prior methods rely on uniformity of local color statistics, which limits their application in real scenarios with complex document shapes and textures; and (b) synthetic or hybrid datasets with non-realistic, simulated lighting conditions are used to train the models. In this paper we tackle these problems with our two main contributions. First, a physically constrained learning-based method that directly estimates document reflectance based on intrinsic image formation which generalizes to challenging illumination conditions. Second, a new dataset that clearly improves previous synthetic ones, by adding a large range of realistic shading and diverse multi-illuminant conditions, uniquely customized to deal with documents in-the-wild. The proposed architecture works in two steps. First, a white balancing module neutralizes the color of the illumination on the input image. Based on the proposed multi-illuminant dataset we achieve a good white-balancing in really difficult conditions. Second, the shading separation module accurately disentangles the shading and paper material in a self-supervised manner where only the synthetic texture is used as a weak training signal (obviating the need for very costly ground truth with disentangled versions of shading and reflectance). The proposed approach leads to significant generalization of document reflectance estimation in real scenes with challenging illumination. We extensively evaluate on the real benchmark datasets available for intrinsic image decomposition and document shadow removal tasks. Our reflectance estimation scheme, when used as a pre-processing step of an OCR pipeline, shows a 21% improvement of character error rate (CER), thus, proving the practical applicability. The data and code will be available at: https://github.com/cvlab-stonybrook/DocIIW. | ||||
Address | Virtual; September 2020 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | BMVC | ||
Notes | CIC; 600.087; 600.140; 600.118 | Approved | no | ||
Call Number | Admin @ si @ DSM2020 | Serial | 3461 | ||
Permanent link to this record |