Publicacions CVC -- Query Results

Records
Author	Mohammad Rouhani
Title	3D Data Fitting and Tracking for Real Time Applications			Type	Report
Year	2009	Publication	CVC Technical Report	Abbreviated Journal
Volume	138	Issue	138	Pages
Keywords
Abstract
Address	Barcelona, Spain
Corporate Author				Thesis	Master's thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	invisible;ADAS			Approved	no
Call Number	Admin @ si @ Rou2009			Serial	1150



Author	Josep M. Gonfaus; Marco Pedersoli; Jordi Gonzalez; Andrea Vedaldi; Xavier Roca
Title	Factorized appearances for object detection			Type	Journal Article
Year	2015	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
Volume	138	Issue		Pages	92–101
Keywords	Object recognition; Deformable part models; Learning and sharing parts; Discovering discriminative parts
Abstract	Deformable object models capture variations in an object’s appearance that can be represented as image deformations. Other effects such as out-of-plane rotations, three-dimensional articulations, and self-occlusions are often captured by considering a mixture of deformable models, one per object aspect. A more scalable approach is instead to represent the variations at the level of the object parts, applying the concept of a mixture locally. Combining a few part variations can in fact cheaply generate a large number of global appearances. A limited version of this idea was proposed by Yang and Ramanan [1] for human pose detection. In this paper we apply it to the task of generic object category detection and extend it in several ways. First, we propose a model for the relationship between part appearances that is more general than the tree of Yang and Ramanan [1] and more suitable for generic categories. Second, we treat part locations as well as their appearance as latent variables, so that training needs only the object bounding boxes rather than part annotations. Third, we modify the weakly-supervised learning of Felzenszwalb et al. and Girshick et al. [2], [3] to handle a significantly more complex latent structure. Our model is evaluated on standard object detection benchmarks and is found to improve over existing approaches, yielding state-of-the-art results for several object categories.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ISE; 600.063; 600.078			Approved	no
Call Number	Admin @ si @ GPG2015			Serial	2705



Author	Muhammad Anwer Rao; Fahad Shahbaz Khan; Joost Van de Weijer; Matthieu Molinier; Jorma Laaksonen
Title	Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification			Type	Journal Article
Year	2018	Publication	ISPRS Journal of Photogrammetry and Remote Sensing	Abbreviated Journal	ISPRS J
Volume	138	Issue		Pages	74-85
Keywords	Remote sensing; Deep learning; Scene classification; Local Binary Patterns; Texture analysis
Abstract	Designing powerful, discriminative texture features that are robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distributions of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input, with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Local Binary Patterns (LBP) encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit LBP-based texture information, provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories and the recently introduced large-scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to the standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Furthermore, our final combination leads to consistent improvement over the state-of-the-art for remote sensing scene
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.109; 600.106; 600.120			Approved	no
Call Number	Admin @ si @ RKW2018			Serial	3158



Author	Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
Title	VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification			Type	Journal Article
Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	139	Issue		Pages	109419
Keywords
Abstract	Multimodal learning from document data has achieved great success lately, as it allows semantically meaningful features to be pre-trained as a prior for a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	ISSN 0031-3203	ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.140; 600.121			Approved	no
Call Number	Admin @ si @ BMC2023			Serial	3826



Author	Xavier Soria; Angel Sappa; Patricio Humanante; Arash Akbarinia
Title	Dense extreme inception network for edge detection			Type	Journal Article
Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	139	Issue		Pages	109461
Keywords
Abstract	Edge detection is the basis of many computer vision applications. The state of the art predominantly relies on deep learning, with two decisive factors: dataset content and network architecture. Most of the publicly available datasets are not curated for edge detection tasks. Here, we address this limitation. First, we argue that edges, contours and boundaries, despite their overlaps, are three distinct visual features requiring separate benchmark datasets. To this end, we present a new dataset of edges. Second, we propose a novel architecture, termed Dense Extreme Inception Network for Edge Detection (DexiNed), that can be trained from scratch without any pre-trained weights. DexiNed outperforms other algorithms on the presented dataset. It also generalizes well to other datasets without any fine-tuning. The higher quality of DexiNed is also perceptually evident thanks to the sharper and finer edges it outputs.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MSIAU			Approved	no
Call Number	Admin @ si @ SSH2023			Serial	3982



Author	Jorge Bernal
Title	Use of Projection and Back-projection Methods in Bidimensional Computed Tomography Image Reconstruction			Type	Report
Year	2009	Publication	CVC Technical Report	Abbreviated Journal
Volume	141	Issue		Pages
Keywords	Projection; Back-projection; CT scan; Euclidean geometry; Radon transform
Abstract	One of the biggest drawbacks of CT scanners is their associated cost, in both memory and time. This project studies several methods that simulate their functioning in a more feasible way, from an industrial point of view. The main group of techniques used is known as 'back-projection'. The underlying idea is to simulate the X-ray emission in CT scans by lines that cross the image we want to reconstruct. In the first part of this document, Euclidean geometry is used to address the tasks of projection and back-projection. Analysis of the results shows that this approach does not lead to a perfect reconstruction, and it also has problems related to running time and memory cost. For this reason, the second part of the document introduces the 'Filtered Back-projection' method to improve the results. Filtered Back-projection methods rely on mathematical transforms (Fourier, Radon) to provide more accurate results that can be obtained in much less time. The main reason for these better results is the filtering step applied before back-projection, which avoids errors caused by high frequencies. As a result of this project, two implementations (one for each approach) were developed in order to compare their performance.
Address
Corporate Author	Computer Vision Center			Thesis	Master's thesis
Publisher		Place of Publication	Barcelona, Spain	Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area	800	Expedition		Conference
Notes	MV;			Approved	no
Call Number	IAM @ iam @ Ber2009			Serial	1693



Author	Albert Andaluz
Title	LV Contour Segmentation in TMR images using Semantic Description of Tissue and Prior Knowledge Correction			Type	Report
Year	2009	Publication	CVC Technical Report	Abbreviated Journal
Volume	142	Issue		Pages
Keywords	Active Contour Models; Snakes; Active Shape Models; Deformable Templates; Left Ventricle Segmentation; Generalized Orthogonal Procrustes Analysis; Harmonic Phase Flow; Principal Component Analysis; Tagged Magnetic Resonance
Abstract	The diagnosis of Left Ventricle (LV) pathologies is related to regional wall motion analysis. Health indicator scores such as rotation and torsion are useful for diagnosing LV function. However, this requires proper identification of LV segments. On the one hand, manual segmentation is robust, but it is slow and requires medical expertise. On the other hand, the tag pattern in Tagged Magnetic Resonance (TMR) sequences is a problem for the automatic segmentation of the LV boundaries. Consequently, we propose a method based on the classical formulation of parametric Snakes, combined with Active Shape Models. Our semantic definition of the LV is tagged tissue that experiences motion in the systolic cycle. This defines two energy potentials for the Snake convergence. Additionally, the mean shape corrects excessive deviation from the anatomical shape. We have validated our approach on 15 healthy volunteers and two short-axis cuts. For this, we compared the automatic segmentations with manual shapes outlined by medical experts. We also explored the accuracy of clinical scores computed using automatic contours. The results show minor divergence between the automatic approximation and the manual segmentations, as well as robust computation of clinical scores in all cases. From this we conclude that the proposed method is a promising support tool for clinical analysis.
Address
Corporate Author				Thesis	Master's thesis
Publisher		Place of Publication	Bellaterra 08193, Barcelona, Spain	Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	IAM;			Approved	no
Call Number	IAM @ iam @ And2009			Serial	1667



Author	Jaume Gibert
Title	Learning structural representations and graph matching paradigms in the context of object recognition			Type	Report
Year	2009	Publication	CVC Technical Report	Abbreviated Journal
Volume	143	Issue		Pages
Keywords
Abstract
Address
Corporate Author	Computer Vision Center			Thesis	Master's thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ Gib2009			Serial	2397



Author	Jose Carlos Rubio
Title	Graph matching based on graphical models with application to vehicle tracking and classification at night			Type	Report
Year	2009	Publication	CVC Technical Report	Abbreviated Journal
Volume	144	Issue		Pages
Keywords
Abstract
Address
Corporate Author	Computer Vision Center			Thesis	Master's thesis
Publisher		Place of Publication	Bellaterra, Barcelona	Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	CIC			Approved	no
Call Number	Admin @ si @ Rub2009			Serial	2398



Author	Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund
Title	Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification			Type	Journal Article
Year	2022	Publication	Automation in Construction	Abbreviated Journal	AC
Volume	144	Issue		Pages	104614
Keywords	Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection
Abstract	A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated non-locally at different scales through a lightweight vision transformer, and a smaller set of tokens is produced by a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
Address	Dec 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA			Approved	no
Call Number	Admin @ si @ BME2022c			Serial	3780



Author	Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title	Hierarchical multimodal transformers for Multi-Page DocVQA			Type	Journal Article
Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	144	Issue		Pages	109834
Keywords
Abstract	Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	ISSN 0031-3203	ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.155; 600.121			Approved	no
Call Number	Admin @ si @ TKV2023			Serial	3825



Author	Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title	Hierarchical multimodal transformers for Multipage DocVQA			Type	Journal Article
Year	2023	Publication	Pattern Recognition	Abbreviated Journal	PR
Volume	144	Issue		Pages	109834
Keywords
Abstract	Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method, Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods used in image classification, which focus on different semantic granularities (He et al., 2021) or different subtasks (Zhou et al., 2022), our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ TKV2023			Serial	3836



Author	Farshad Nourbakhsh
Title	Colour logo recognition			Type	Report
Year	2009	Publication	CVC Technical Report	Abbreviated Journal
Volume	145	Issue		Pages
Keywords
Abstract
Address
Corporate Author	Computer Vision Center			Thesis	Master's thesis
Publisher		Place of Publication	Bellaterra, Barcelona	Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ Nou2009			Serial	2399



Author	Katerine Diaz; Francesc J. Ferri; Aura Hernandez-Sabate
Title	An overview of incremental feature extraction methods based on linear subspaces			Type	Journal Article
Year	2018	Publication	Knowledge-Based Systems	Abbreviated Journal	KBS
Volume	145	Issue		Pages	219-235
Keywords
Abstract	With the massive explosion of machine learning in our day-to-day life, incremental and adaptive learning has become a major topic, crucial for keeping classification models and their corresponding feature extraction processes up to date and improving them. This paper presents a categorized overview of incremental feature extraction based on linear subspace methods, which aim at incorporating new information into the already acquired knowledge without accessing previous data. Specifically, this paper focuses on linear dimensionality reduction methods with orthogonal matrix constraints based on a global loss function, due to the extensive use of their batch approaches compared with other linear alternatives. Thus, we cover the approaches derived from Principal Component Analysis, Linear Discriminant Analysis and Discriminative Common Vector methods. For each basic method, its incremental approaches are differentiated according to the subspace model and matrix decomposition involved in the updating process. Besides this categorization, several updating strategies are distinguished according to the amount of data used to update and to whether a static or dynamic number of classes is considered. Moreover, the specific role of the size/dimension ratio in each method is considered. Finally, computational complexity, experimental setup and the accuracy rates according to published results are compiled and analyzed, and an empirical evaluation is done to compare the best approach of each kind.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	0950-7051	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS; 600.118			Approved	no
Call Number	Admin @ si @ DFH2018			Serial	3090



Author	Enric Sala
Title	Off-line person-dependent signature verification			Type	Report
Year	2009	Publication	CVC Technical Report	Abbreviated Journal
Volume	146	Issue		Pages
Keywords
Abstract
Address
Corporate Author	Computer Vision Center			Thesis	Master's thesis
Publisher		Place of Publication	Bellaterra, Barcelona	Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes				Approved	no
Call Number	Admin @ si @ Sal2009			Serial	2400