Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	256–270 of 3413 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

[1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30]

List View

Citations

Details

	Records
	Author	Henry Velesaca; Patricia Suarez; Angel Sappa; Dario Carpio; Rafael E. Rivadeneira; Angel Sanchez
	Title	Review on Common Techniques for Urban Environment Video Analytics			Type	Conference Article
	Year	2022	Publication	Anais do III Workshop Brasileiro de Cidades Inteligentes	Abbreviated Journal
	Volume		Issue		Pages	107-118
	Keywords	Video Analytics; Review; Urban Environments; Smart Cities
	Abstract	This work compiles the different computer vision-based approaches from the state-of-the-art intended for video analytics in urban environments. The manuscript groups the different approaches according to the typical modules present in video analysis, including image preprocessing, object detection, classification, and tracking. This proposed pipeline serves as a basic guide to representing these most representative approaches in this topic of video analysis that will be addressed in this work. Furthermore, the manuscript is not intended to be an exhaustive review of the most advanced approaches, but only a list of common techniques proposed to address recurring problems in this field.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	WBCI
	Notes	MSIAU; 601.349			Approved	no
	Call Number	Admin @ si @ VSS2022			Serial	3773
Permanent link to this record



	Author	Angel Morera; Angel Sanchez; A. Belen Moreno; Angel Sappa; Jose F. Velez
	Title	SSD vs. YOLO for Detection of Outdoor Urban Advertising Panels under Multiple Variabilities			Type	Journal Article
	Year	2020	Publication	Sensors	Abbreviated Journal	SENS
	Volume	20	Issue	16	Pages	4587
	Keywords
	Abstract	This work compares Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO) deep neural networks for the outdoor advertisement panel detection problem by handling multiple and combined variabilities in the scenes. Publicity panel detection in images offers important advantages both in the real world as well as in the virtual one. For example, applications like Google Street View can be used for Internet publicity and when detecting these ads panels in images, it could be possible to replace the publicity appearing inside the panels by another from a funding company. In our experiments, both SSD and YOLO detectors have produced acceptable results under variable sizes of panels, illumination conditions, viewing perspectives, partial occlusion of panels, complex background and multiple panels in scenes. Due to the difficulty of finding annotated images for the considered problem, we created our own dataset for conducting the experiments. The major strength of the SSD model was the almost elimination of False Positive (FP) cases, situation that is preferable when the publicity contained inside the panel is analyzed after detecting them. On the other side, YOLO produced better panel localization results detecting a higher number of True Positive (TP) panels with a higher accuracy. Finally, a comparison of the two analyzed object detection models with different types of semantic segmentation networks and using the same evaluation metrics is also included.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MSIAU; 600.130; 601.349; 600.122			Approved	no
	Call Number	Admin @ si @ MSM2020			Serial	3452
Permanent link to this record



	Author	Jorge Bernal; F. Javier Sanchez; Fernando Vilariño
	Title	A Region Segmentation Method for Colonoscopy Images Using a Model of Polyp Appearance			Type	Conference Article
	Year	2011	Publication	5th Iberian Conference on Pattern Recognition and Image Analysis	Abbreviated Journal
	Volume	6669	Issue		Pages	134-143
	Keywords	Colonoscopy, Polyp Detection, Region Merging, Region Segmentation.
	Abstract	This work aims at the segmentation of colonoscopy images into a minimum number of informative regions. Our method performs in a way such, if a polyp is present in the image, it will be exclusively and totally contained in a single region. This result can be used in later stages to classify regions as polyp-containing candidates. The output of the algorithm also defines which regions can be considered as non-informative. The algorithm starts with a high number of initial regions and merges them taking into account the model of polyp appearance obtained from available data. The results show that our segmentations of polyp regions are more accurate than state-of-the-art methods.
	Address	Las Palmas de Gran Canaria, June 2011
	Corporate Author	SpringerLink			Thesis
	Publisher		Place of Publication		Editor	Vitrià, Jordi and Sanches, João and Hernández, Mario
	Language		Summary Language		Original Title
	Series Editor		Series Title	Lecture Notes in Computer Science	Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-642-21256-7	Medium
	Area	800	Expedition		Conference	IbPRIA
	Notes	MV;SIAI			Approved	no
	Call Number	IAM @ iam @ BSV2011c			Serial	1696
Permanent link to this record



	Author	Jorge Bernal; F. Javier Sanchez; Fernando Vilariño
	Title	Towards Automatic Polyp Detection with a Polyp Appearance Model			Type	Journal Article
	Year	2012	Publication	Pattern Recognition	Abbreviated Journal	PR
	Volume	45	Issue	9	Pages	3166-3182
	Keywords	Colonoscopy,PolypDetection,RegionSegmentation,SA-DOVA descriptot
	Abstract	This work aims at the automatic polyp detection by using a model of polyp appearance in the context of the analysis of colonoscopy videos. Our method consists of three stages: region segmentation, region description and region classification. The performance of our region segmentation method guarantees that if a polyp is present in the image, it will be exclusively and totally contained in a single region. The output of the algorithm also defines which regions can be considered as non-informative. We define as our region descriptor the novel Sector Accumulation-Depth of Valleys Accumulation (SA-DOVA), which provides a necessary but not sufficient condition for the polyp presence. Finally, we classify our segmented regions according to the maximal values of the SA-DOVA descriptor. Our preliminary classification results are promising, especially when classifying those parts of the image that do not contain a polyp inside.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	0031-3203	ISBN		Medium
	Area	800	Expedition		Conference	IbPRIA
	Notes	MV;SIAI			Approved	no
	Call Number	Admin @ si @ BSV2012; IAM @ iam			Serial	1997
Permanent link to this record



	Author	Jorge Bernal; F. Javier Sanchez; Fernando Vilariño
	Title	Depth of Valleys Accumulation Algorithm for Object Detection			Type	Conference Article
	Year	2011	Publication	14th Congrès Català en Intel·ligencia Artificial	Abbreviated Journal
	Volume	1	Issue	1	Pages	71-80
	Keywords	Object Recognition, Object Region Identification, Image Analysis, Image Processing
	Abstract	This work aims at detecting in which regions the objects in the image are by using information about the intensity of valleys, which appear to surround ob- jects in images where the source of light is in the line of direction than the camera. We present our depth of valleys accumulation method, which consists of two stages: first, the definition of the depth of valleys image which combines the output of a ridges and valleys detector with the morphological gradient to measure how deep is a point inside a valley and second, an algorithm that denotes points of the image as interior to objects those which are inside complete or incomplete boundaries in the depth of valleys image. To evaluate the performance of our method we have tested it on several application domains. Our results on object region identification are promising, specially in the field of polyp detection in colonoscopy videos, and we also show its applicability in different areas.
	Address	Lleida
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-1-60750-841-0	Medium
	Area	800	Expedition		Conference	CCIA
	Notes	MV;SIAI			Approved	no
	Call Number	IAM @ iam @ BSV2011b			Serial	1699
Permanent link to this record



	Author	Minesh Mathew; Lluis Gomez; Dimosthenis Karatzas; C.V. Jawahar
	Title	Asking questions on handwritten document collections			Type	Journal Article
	Year	2021	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
	Volume	24	Issue		Pages	235-249
	Keywords
	Abstract	This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate results of the proposed approach on two new datasets: (i) HW-SQuAD: a synthetic, handwritten document image counterpart of SQuAD1.0 dataset and (ii) BenthamQA: a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach which uses text recognized from the images using an OCR. Datasets presented in this work are available to download at docvqa.org.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ MGK2021			Serial	3621
Permanent link to this record



	Author	Cristina Palmero; Albert Clapes; Chris Bahnsen; Andreas Møgelmose; Thomas B. Moeslund; Sergio Escalera
	Title	Multi-modal RGB-Depth-Thermal Human Body Segmentation			Type	Journal Article
	Year	2016	Publication	International Journal of Computer Vision	Abbreviated Journal	IJCV
	Volume	118	Issue	2	Pages	217-239
	Keywords	Human body segmentation; RGB ; Depth Thermal
	Abstract	This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB–depth–thermal dataset along with a multi-modal segmentation baseline. The several modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, defines a partitioning of the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extractions, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian mixture models for the probabilistic modeling and Random Forest for the stacked learning, is superior to other state-of-the-art methods, obtaining an overlap above 75 % on the novel dataset when compared to the manually annotated ground-truth of human segmentations.
	Address
	Corporate Author				Thesis
	Publisher	Springer US	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA;MILAB;			Approved	no
	Call Number	Admin @ si @ PCB2016			Serial	2767
Permanent link to this record



	Author	Pau Cano; Alvaro Caravaca; Debora Gil; Eva Musulen
	Title	Diagnosis of Helicobacter pylori using AutoEncoders for the Detection of Anomalous Staining Patterns in Immunohistochemistry Images			Type	Miscellaneous
	Year	2023	Publication	Arxiv	Abbreviated Journal
	Volume		Issue		Pages	107241
	Keywords
	Abstract	This work addresses the detection of Helicobacter pylori a bacterium classified since 1994 as class 1 carcinogen to humans. By its highest specificity and sensitivity, the preferred diagnosis technique is the analysis of histological images with immunohistochemical staining, a process in which certain stained antibodies bind to antigens of the biological element of interest. This analysis is a time demanding task, which is currently done by an expert pathologist that visually inspects the digitized samples. We propose to use autoencoders to learn latent patterns of healthy tissue and detect H. pylori as an anomaly in image staining. Unlike existing classification approaches, an autoencoder is able to learn patterns in an unsupervised manner (without the need of image annotations) with high performance. In particular, our model has an overall 91% of accuracy with 86\% sensitivity, 96% specificity and 0.97 AUC in the detection of H. pylori.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ CCG2023			Serial	3855
Permanent link to this record



	Author	Sergio Escalera; Ralf Herbrich
	Title	The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations			Type	Book Whole
	Year	2020	Publication	The Springer Series on Challenges in Machine Learning	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor	Sergio Escalera; Ralf Hebrick
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	2520-1328	ISBN	978-3-030-29134-1	Medium
	Area		Expedition		Conference
	Notes	HuPBA; no menciona			Approved	no
	Call Number	Admin @ si @ HeE2020			Serial	3328
Permanent link to this record



	Author	Sounak Dey
	Title	Mapping between Images and Conceptual Spaces: Sketch-based Image Retrieval			Type	Book Whole
	Year	2020	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This thesis presents several contributions to the literature of sketch based image retrieval (SBIR). In SBIR the first challenge we face is how to map two different domains to common space for effective retrieval of images, while tackling the different levels of abstraction people use to express their notion of objects around while sketching. To this extent we first propose a cross-modal learning framework that maps both sketches and text into a joint embedding space invariant to depictive style, while preserving semantics. Then we have also investigated different query types possible to encompass people's dilema in sketching certain world objects. For this we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities into a common embedding space, which is then further aligned with the image feature space. This permits encoding the object-based features and its alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set. Finally, we explore the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognises two important yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended. We also in this dissertation pave the path to the future direction of research in this domain.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Josep Llados;Umapada Pal
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-121011-8-8	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Dey20			Serial	3480
Permanent link to this record



	Author	Ivan Huerta
	Title	Foreground Object Segmentation and Shadow Detection for Video Sequences in Uncontrolled Environments			Type	Book Whole
	Year	2010	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This Thesis is mainly divided in two parts. The first one presents a study of motion segmentation problems. Based on this study, a novel algorithm for mobile-object segmentation from a static background scene is also presented. This approach is demonstrated robust and accurate under most of the common problems in motion segmentation. The second one tackles the problem of shadows in depth. Firstly, a bottom-up approach based on a chromatic shadow detector is presented to deal with umbra shadows. Secondly, a top-down approach based on a tracking system has been developed in order to enhance the chromatic shadow detection. In our first contribution, a case analysis of motion segmentation problems is presented by taking into account the problems associated with different cues, namely colour, edge and intensity. Our second contribution is a hybrid architecture which handles the main problems observed in such a case analysis, by fusing (i) the knowledge from these three cues and (ii) a temporal difference algorithm. On the one hand, we enhance the colour and edge models to solve both global/local illumination changes (shadows and highlights) and camouflage in intensity. In addition, local information is exploited to cope with a very challenging problem such as the camouflage in chroma. On the other hand, the intensity cue is also applied when colour and edge cues are not available, such as when beyond the dynamic range. Additionally, temporal difference is included to segment motion when these three cues are not available, such as that background not visible during the training period. Lastly, the approach is enhanced for allowing ghost detection. As a result, our approach obtains very accurate and robust motion segmentation in both indoor and outdoor scenarios, as quantitatively and qualitatively demonstrated in the experimental results, by comparing our approach with most best-known state-of-the-art approaches. Motion Segmentation has to deal with shadows to avoid distortions when detecting moving objects. Most segmentation approaches dealing with shadow detection are typically restricted to penumbra shadows. Therefore, such techniques cannot cope well with umbra shadows. Consequently, umbra shadows are usually detected as part of moving objects. Firstly, a bottom-up approach for detection and removal of chromatic moving shadows in surveillance scenarios is proposed. Secondly, a top-down approach based on kalman filters to detect and track shadows has been developed in order to enhance the chromatic shadow detection. In the Bottom-up part, the shadow detection approach applies a novel technique based on gradient and colour models for separating chromatic moving shadows from moving objects. Well-known colour and gradient models are extended and improved into an invariant colour cone model and an invariant gradient model, respectively, to perform automatic segmentation while detecting potential shadows. Hereafter, the regions corresponding to potential shadows are grouped by considering ”a bluish effect” and an edge partitioning. Lastly, (i) temporal similarities between local gradient structures and (ii) spatial similarities between chrominance angle and brightness distortions are analysed for all potential shadow regions in order to finally identify umbra shadows. In the top-down process, after detection of objects and shadows both are tracked using Kalman filters, in order to enhance the chromatic shadow detection, when it fails to detect a shadow. Firstly, this implies a data association between the blobs (foreground and shadow) and Kalman filters. Secondly, an event analysis of the different data association cases is performed, and occlusion handling is managed by a Probabilistic Appearance Model (PAM). Based on this association, temporal consistency is looked for the association between foregrounds and shadows and their respective Kalman Filters. From this association several cases are studied, as a result lost chromatic shadows are correctly detected. Finally, the tracking results are used as feedback to improve the shadow and object detection. Unlike other approaches, our method does not make any a-priori assumptions about camera location, surface geometries, surface textures, shapes and types of shadows, objects, and background. Experimental results show the performance and accuracy of our approach in different shadowed materials and illumination conditions.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Jordi Gonzalez;Xavier Roca
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-937261-3-3	Medium
	Area		Expedition		Conference
	Notes				Approved	no
	Call Number	ISE @ ise @ Hue2010			Serial	1332
Permanent link to this record



	Author	Ariel Amato
	Title	Environment-Independent Moving Cast Shadow Suppression in Video Surveillance			Type	Book Whole
	Year	2012	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This thesis is devoted to moving shadows detection and suppression. Shadows could be defined as the parts of the scene that are not directly illuminated by a light source due to obstructing object or objects. Often, moving shadows in images sequences are undesirable since they could cause degradation of the expected results during processing of images for object detection, segmentation, scene surveillance or similar purposes. In this thesis first moving shadow detection methods are exhaustively overviewed. Beside the mentioned methods from literature and to compensate their limitations a new moving shadow detection method is proposed. It requires no prior knowledge about the scene, nor is it restricted to assumptions about specific scene structures. Furthermore, the technique can detect both achromatic and chromatic shadows even in the presence of camouflage that occurs when foreground regions are very similar in color to shadowed regions. The method exploits local color constancy properties due to reflectance suppression over shadowed regions. To detect shadowed regions in a scene the values of the background image are divided by values of the current frame in the RGB color space. In the thesis how this luminance ratio can be used to identify segments with low gradient constancy is shown, which in turn distinguish shadows from foreground. Experimental results on a collection of publicly available datasets illustrate the superior performance of the proposed method compared with the most sophisticated state-of-the-art shadow detection algorithms. These results show that the proposed approach is robust and accurate over a broad range of shadow types and challenging video conditions.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Mikhail Mozerov;Jordi Gonzalez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ISE			Approved	no
	Call Number	Admin @ si @ Ama2012			Serial	2201
Permanent link to this record



	Author	Cristhian A. Aguilera-Carrasco
	Title	Evaluation of feature detectors and descriptors in VISIBLE-LWIR cross-spectral imaging			Type	Report
	Year	2014	Publication	CVC Technical Report	Abbreviated Journal
	Volume	177	Issue		Pages
	Keywords	Multi-spectral; Cross-spectral; Visible-LWIR imaging; Multimodal.
	Abstract	This thesis evaluates the performance of different state-of-art feature detectors and descriptors algorithms in the Visible-LWIR cross-spectral scenario. The focus is to determine if current detector and descriptor algorithms can be used to match features between the LWIR spectrum and the visible spectrum in applications such as, visual odometry, object recognition, image registration and stereo vision. An outdoor cross-spectral dataset was created to evaluate the suitability of the different algorithms. The results show that the tested algorithms are not suitable to the task of matching features across different spectra. The repeatability ratio was smaller than the 30 percent in the best case and in general matched features were not accurate located. Additionally, these results also suggest that is necessary to create new algorithms that take into account the nature of the different spectra, describing characteristics that exist in both spectra such as discontinuities.
	Address
	Corporate Author				Thesis	Master's thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.076			Approved	no
	Call Number	Admin @ si @Agu2014			Serial	2526
Permanent link to this record



	Author	Diego Velazquez
	Title	Towards Robustness in Computer-based Image Understanding			Type	Book Whole
	Year	2023	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This thesis embarks on an exploratory journey into robustness in deep learning, with a keen focus on the intertwining facets of generalization, explainability, and edge cases within the realm of computer vision. In deep learning, robustness epitomizes a model’s resilience and flexibility, grounded on its capacity to generalize across diverse data distributions, explain its predictions transparently, and navigate the intricacies of edge cases effectively. The challenges associated with robust generalization are multifaceted, encompassing the model’s performance on unseen data and its defense against out-of-distribution data and adversarial attacks. Bridging this gap, the potential of Embedding Propagation (EP) for improving out-of-distribution generalization is explored. EP is depicted as a powerful tool facilitating manifold smoothing, which in turn fortifies the model’s robustness against adversarial onslaughts and bolsters performance in few-shot and self-/semi-supervised learning scenarios. In the labyrinth of deep learning models, the path to robustness often intersects with explainability. As model complexity increases, so does the urgency to decipher their decision-making processes. Acknowledging this, the thesis introduces a robust framework for evaluating and comparing various counterfactual explanation methods, echoing the imperative of explanation quality over quantity and spotlighting the intricacies of diversifying explanations. Simultaneously, the deep learning landscape is fraught with edge cases – anomalies in the form of small objects or rare instances in object detection tasks that defy the norm. Confronting this, the thesis presents an extension of the DETR (DEtection TRansformer) model to enhance small object detection. The devised DETR-FP, embedding the Feature Pyramid technique, demonstrating improvement in small objects detection accuracy, albeit facing challenges like high computational costs. With emergence of foundation models in mind, the thesis unveils EarthView, the largest scale remote sensing dataset to date, built for the self-supervised learning of a robust foundational model for remote sensing. Collectively, these studies contribute to the grand narrative of robustness in deep learning, weaving together the strands of generalization, explainability, and edge case performance. Through these methodological advancements and novel datasets, the thesis calls for continued exploration, innovation, and refinement to fortify the bastion of robust computer vision.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Jordi Gonzalez;Josep M. Gonfaus;Pau Rodriguez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-81-126409-5-3	Medium
	Area		Expedition		Conference
	Notes	ISE			Approved	no
	Call Number	Admin @ si @ Vel2023			Serial	3965
Permanent link to this record



	Author	Lluis Gomez
	Title	Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding			Type	Book Whole
	Year	2016	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	This thesis addresses the problem of automatic scene text understanding in unconstrained conditions. In particular, we tackle the tasks of multi-language and arbitrary-oriented text detection, tracking, and script identification in natural scenes. For this we have developed a set of generic methods that build on top of the basic observation that text has always certain key visual and structural characteristics that are independent of the language or script in which it is written. Text instances in any language or script are always formed as groups of similar atomic parts, being them either individual characters, small stroke parts, or even whole words in the case of cursive text. This holistic (sumof-parts) and recursive perspective has lead us to explore different variants of the “segmentation and grouping” paradigm of computer vision. Scene text detection methodologies are usually based in classification of individual regions or patches, using a priory knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organization through which text emerges as a perceptually significant group of atomic objects. In this thesis, we argue that the text detection problem must be posed as the detection of meaningful groups of regions. We address the problem of text detection in natural scenes from a hierarchical perspective, making explicit use of the recursive nature of text, aiming directly to the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such an hierarchy introducing a feature space designed to produce text group hypothese with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based in perceptual organization. Within this generic framework, we design a text-specific object proposals algorithm that, contrary to existing generic object proposals methods, aims directly to the detection of text regions groupings. For this, we abandon the rigid definition of “what is text” of traditional specialized text detectors, and move towards more fuzzy perspective of grouping-based object proposals methods. Then, we present a hybrid algorithm for detection and tracking of scene text where the notion of region groupings plays also a central role. By leveraging the structural arrangement of text group components between consecutive frames we can improve the overall tracking performance of the system. Finally, since our generic detection framework is inherently designed for multi-language environments, we focus on the problem of script identification in order to build a multi-language end-toend reading system. Facing this problem with state of the art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed size as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher		Place of Publication		Editor	Dimosthenis Karatzas
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Gom2016			Serial	2891
Permanent link to this record