Publicacions CVC -- Query Results

[21–30] << 31 32 33 34 35 36 37 38 39 40 >> [41–50]

Details

Records
Author	Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title	Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts			Type	Journal Article
Year	2021	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
Volume	24	Issue		Pages	269–281
Keywords
Abstract	Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.140; 110.312			Approved	no
Call Number	Admin @ si @ BRL2021b			Serial	3574
Permanent link to this record



Author	Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
Title	Graph-Based Deep Generative Modelling for Document Layout Generation			Type	Conference Article
Year	2021	Publication	16th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume	12917	Issue		Pages	525-537
Keywords
Abstract	One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, specially in digital mailroom applications. It is also the first graph-based approach for document layout generation task experimented on administrative document images, in this case, invoices.
Address	Lausanne; Suissa; September 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.140; 110.312			Approved	no
Call Number	Admin @ si @ BRL2021			Serial	3676
Permanent link to this record



Author	Sangheeta Roy; Palaiahnakote Shivakumara; Namita Jain; Vijeta Khare; Anjan Dutta; Umapada Pal; Tong Lu
Title	Rough-Fuzzy based Scene Categorization for Text Detection and Recognition in Video			Type	Journal Article
Year	2018	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	80	Issue		Pages	64-82
Keywords	Rough set; Fuzzy set; Video categorization; Scene image classification; Video text detection; Video text recognition
Abstract	Scene image or video understanding is a challenging task especially when number of video types increases drastically with high variations in background and foreground. This paper proposes a new method for categorizing scene videos into different classes, namely, Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, for the performance improvement of text detection and recognition, which is an effective approach for scene image or video understanding. For this purpose, at first, we present a new combination of rough and fuzzy concept to study irregular shapes of edge components in input scene videos, which helps to classify edge components into several groups. Next, the proposed method explores gradient direction information of each pixel in each edge component group to extract stroke based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. Features of intra and inter planes of groups are then concatenated to get a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms the existing state-of-the-art methods, at the same time, the performances of text detection and recognition methods are also improved significantly due to categorization.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.097; 600.121			Approved	no
Call Number	Admin @ si @ RSJ2018			Serial	3096
Permanent link to this record



Author	Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
Title	RoadText-1K: Text Detection and Recognition Dataset for Driving Videos			Type	Conference Article
Year	2020	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new ”RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection, recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/ projects/cvit-projects/roadtext-1k
Address	Paris; Francia; ???
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICRA
Notes	DAG; 600.121; 600.129			Approved	no
Call Number	Admin @ si @ RMG2020			Serial	3400
Permanent link to this record



Author	Sandra Pujades;Francesc Carreras;Manuel Ballester; Jaume Garcia; Debora Gil
Title	A Normalized Parametric Domain for the Analysis of the Left Ventricular Function			Type	Conference Article
Year	2008	Publication	Proceedings of the Third International Conference on Computer Vision Theory and Applications (VISAPP’08)	Abbreviated Journal
Volume	1	Issue		Pages	267-274
Keywords	Helical Ventricular Myocardial Band; Myocardial Fiber; Tagged Magnetic Resonance; HARP; Optical Flow Variational Framework; Gabor Filters; B-Splines.
Abstract	Impairment of left ventricular (LV) contractility due to cardiovascular diseases is reflected in LV motion patterns. The mechanics of any muscle strongly depends on the spatial orientation of its muscular fibers since the motion that the muscle undergoes mainly takes place along the fiber. The helical ventricular myocardial band (HVMB) concept describes the myocardial muscle as a unique muscular band that twists in space in a non homogeneous fashion. The 3D anisotropy of the ventricular band fibers suggests a regional analysis of the heart motion. Computation of normality models of such motion can help in the detection and localization of any cardiac disorder. In this paper we introduce, for the first time, a normalized parametric domain that allows comparison of the left ventricle motion across patients. We address, both, extraction of the LV motion from Tagged Magnetic Resonance images, as well as, defining a mapping of the LV to a common normalized domain. Extraction of normality motion patterns from 17 healthy volunteers shows the clinical potential of our LV parametrization.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	IAM;			Approved	no
Call Number	IAM @ iam @ GGP2008			Serial	1627
Permanent link to this record



Author	Sandra Jimenez; Xavier Otazu; Valero Laparra; Jesus Malo
Title	Chromatic induction and contrast masking: similar models, different goals?			Type	Conference Article
Year	2013	Publication	Human Vision and Electronic Imaging XVIII	Abbreviated Journal
Volume	8651	Issue		Pages
Keywords
Abstract	Normalization of signals coming from linear sensors is an ubiquitous mechanism of neural adaptation.1 Local interaction between sensors tuned to a particular feature at certain spatial position and neighbor sensors explains a wide range of psychophysical facts including (1) masking of spatial patterns, (2) non-linearities of motion sensors, (3) adaptation of color perception, (4) brightness and chromatic induction, and (5) image quality assessment. Although the above models have formal and qualitative similarities, it does not necessarily mean that the mechanisms involved are pursuing the same statistical goal. For instance, in the case of chromatic mechanisms (disregarding spatial information), different parameters in the normalization give rise to optimal discrimination or adaptation, and different non-linearities may give rise to error minimization or component independence. In the case of spatial sensors (disregarding color information), a number of studies have pointed out the benefits of masking in statistical independence terms. However, such statistical analysis has not been performed for spatio-chromatic induction models where chromatic perception depends on spatial configuration. In this work we investigate whether successful spatio-chromatic induction models,6 increase component independence similarly as previously reported for masking models. Mutual information analysis suggests that seeking an efficient chromatic representation may explain the prevalence of induction effects in spatially simple images. © (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Address	San Francisco CA; USA; February 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	HVEI
Notes	CIC			Approved	no
Call Number	Admin @ si @ JOL2013			Serial	2240
Permanent link to this record



Author	Salvatore Tabbone; Oriol Ramos Terrades; S. Barrat
Title	Histogram of radon transform. A useful descriptor for shape retrieval			Type	Conference Article
Year	2008	Publication	19th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	1-4
Keywords
Abstract
Address	Tampa, Florida
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	DAG			Approved	no
Call Number	Admin @ si @ TRB2008			Serial	1876
Permanent link to this record



Author	Salvatore Tabbone; Oriol Ramos Terrades
Title	An Overview of Symbol Recognition			Type	Book Chapter
Year	2014	Publication	Handbook of Document Image Processing and Recognition	Abbreviated Journal
Volume	D	Issue		Pages	523-551
Keywords	Pattern recognition; Shape descriptors; Structural descriptors; Symbolrecognition; Symbol spotting
Abstract	According to the Cambridge Dictionaries Online, a symbol is a sign, shape, or object that is used to represent something else. Symbol recognition is a subfield of general pattern recognition problems that focuses on identifying, detecting, and recognizing symbols in technical drawings, maps, or miscellaneous documents such as logos and musical scores. This chapter aims at providing the reader an overview of the different existing ways of describing and recognizing symbols and how the field has evolved to attain a certain degree of maturity.
Address
Corporate Author				Thesis
Publisher	Springer London	Place of Publication		Editor	D. Doermann; K. Tombre
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-0-85729-858-4	Medium
Area		Expedition		Conference
Notes	DAG; 600.077			Approved	no
Call Number	Admin @ si @ TaT2014			Serial	2489
Permanent link to this record



Author	Salvatore Tabbone; Josep Llados
Title	A Propos de la Reconnaissance de Documents Graphiques: Synthese et Perspectives			Type	Conference Article
Year	2007	Publication	Traitement et Analyse de l’Information: Methodes et Applications	Abbreviated Journal
Volume		Issue		Pages	247–258
Keywords
Abstract
Address	Hammamet (Tunis)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	TAIMA’07
Notes	DAG			Approved	no
Call Number	DAG @ dag @ TaL2007			Serial	890
Permanent link to this record



Author	Salim Jouili; Salvatore Tabbone; Ernest Valveny
Title	Comparing Graph Similarity Measures for Graphical Recognition			Type	Book Chapter
Year	2010	Publication	Graphics Recognition. Achievements, Challenges, and Evolution. 8th International Workshop, GREC 2009. Selected Papers	Abbreviated Journal
Volume	6020	Issue		Pages	37-48
Keywords
Abstract	In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used including line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique.
Address
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-642-13727-3	Medium
Area		Expedition		Conference	GREC
Notes	DAG			Approved	no
Call Number	Admin @ si @ JTV2010			Serial	2404
Permanent link to this record



Author	Salim Jouili; Salvatore Tabbone; Ernest Valveny
Title	Evaluation of graph matching measures for documents retrieval			Type	Conference Article
Year	2009	Publication	In proceedings of 8th IAPR International Workshop on Graphics Recognition	Abbreviated Journal
Volume		Issue		Pages	13–21
Keywords	Graph Matching; Graph retrieval; structural representation; Performance Evaluation
Abstract	In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used which include line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each grahp distance measure depends on the kind of data and the graph representation technique.
Address	La Rochelle, France
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	0302-9743	ISBN	978-3-642-13727-3	Medium
Area		Expedition		Conference	GREC
Notes	DAG			Approved	no
Call Number	DAG @ dag @ JTV2009a			Serial	1230
Permanent link to this record



Author	Salim Jouili; Salvatore Tabbone; Ernest Valveny
Title	Comparing Graph Similarity Measures for Graphical Recognition.			Type	Conference Article
Year	2009	Publication	8th IAPR International Workshop on Graphics Recognition	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used including line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique.
Address	La Rochelle; France; July 2009
Corporate Author				Thesis
Publisher	Springer	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	GREC
Notes	DAG			Approved	no
Call Number	DAG @ dag @ JTV2009			Serial	1442
Permanent link to this record



Author	Saiping Zhang; Luis Herranz; Marta Mrak; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang
Title	DCNGAN: A Deformable Convolution-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video			Type	Conference Article
Year	2022	Publication	47th International Conference on Acoustics, Speech, and Signal Processing	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
Address	Virtual; May 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICASSP
Notes	MACO; 600.161; 601.379			Approved	no
Call Number	Admin @ si @ ZHM2022a			Serial	3765
Permanent link to this record



Author	Saiping Zhang, Luis Herranz, Marta Mrak, Marc Gorriz Blanch, Shuai Wan, Fuzheng Yang
Title	PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation-and Attention-based Network			Type	Miscellaneous
Year	2022	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for enhancing perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. Then extracted features are fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module which leverages the corresponding QP information. In this way, a single model can be used to enhance adaptively to various QPs without requiring multiple models specific for every QP value, while having similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MACO; no proj			Approved	no
Call Number	Admin @ si @ ZHM2022b			Serial	3819
Permanent link to this record



Author	Sagnik Das; Hassan Ahmed Sial; Ke Ma; Ramon Baldrich; Maria Vanrell; Dimitris Samaras
Title	Intrinsic Decomposition of Document Images In-the-Wild			Type	Conference Article
Year	2020	Publication	31st British Machine Vision Conference	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Automatic document content processing is affected by artifacts caused by the shape of the paper, non-uniform and diverse color of lighting conditions. Fully-supervised methods on real data are impossible due to the large amount of data needed. Hence, the current state of the art deep learning models are trained on fully or partially synthetic images. However, document shadow or shading removal results still suffer because: (a) prior methods rely on uniformity of local color statistics, which limit their application on real-scenarios with complex document shapes and textures and; (b) synthetic or hybrid datasets with non-realistic, simulated lighting conditions are used to train the models. In this paper we tackle these problems with our two main contributions. First, a physically constrained learning-based method that directly estimates document reflectance based on intrinsic image formation which generalizes to challenging illumination conditions. Second, a new dataset that clearly improves previous synthetic ones, by adding a large range of realistic shading and diverse multi-illuminant conditions, uniquely customized to deal with documents in-the-wild. The proposed architecture works in two steps. First, a white balancing module neutralizes the color of the illumination on the input image. Based on the proposed multi-illuminant dataset we achieve a good white-balancing in really difficult conditions. Second, the shading separation module accurately disentangles the shading and paper material in a self-supervised manner where only the synthetic texture is used as a weak training signal (obviating the need for very costly ground truth with disentangled versions of shading and reflectance). The proposed approach leads to significant generalization of document reflectance estimation in real scenes with challenging illumination. We extensively evaluate on the real benchmark datasets available for intrinsic image decomposition and document shadow removal tasks. Our reflectance estimation scheme, when used as a pre-processing step of an OCR pipeline, shows a 21% improvement of character error rate (CER), thus, proving the practical applicability. The data and code will be available at: https://github.com/cvlab-stonybrook/DocIIW.
Address	Virtual; September 2020
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	BMVC
Notes	CIC; 600.087; 600.140; 600.118			Approved	no
Call Number	Admin @ si @ DSM2020			Serial	3461
Permanent link to this record