Records
Author | Henry Velesaca; Patricia Suarez; Angel Sappa; Dario Carpio; Rafael E. Rivadeneira; Angel Sanchez | ||||
Title | Review on Common Techniques for Urban Environment Video Analytics | Type | Conference Article | ||
Year | 2022 | Publication | Anais do III Workshop Brasileiro de Cidades Inteligentes | Abbreviated Journal | |
Volume | Issue | Pages | 107-118 | ||
Keywords | Video Analytics; Review; Urban Environments; Smart Cities | ||||
Abstract | This work compiles the different computer-vision-based approaches from the state of the art intended for video analytics in urban environments. The manuscript groups the approaches according to the typical modules present in video analysis, including image preprocessing, object detection, classification, and tracking. The proposed pipeline serves as a basic guide to the most representative approaches addressed in this work. Furthermore, the manuscript is not intended to be an exhaustive review of the most advanced approaches, but only a list of common techniques proposed to address recurring problems in this field.
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WBCI | ||
Notes | MSIAU; 601.349 | Approved | no | ||
Call Number | Admin @ si @ VSS2022 | Serial | 3773 | ||
Permanent link to this record | |||||
Author | Angel Morera; Angel Sanchez; A. Belen Moreno; Angel Sappa; Jose F. Velez | ||||
Title | SSD vs. YOLO for Detection of Outdoor Urban Advertising Panels under Multiple Variabilities | Type | Journal Article | ||
Year | 2020 | Publication | Sensors | Abbreviated Journal | SENS |
Volume | 20 | Issue | 16 | Pages | 4587 |
Keywords | |||||
Abstract | This work compares the Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO) deep neural networks on the outdoor advertisement panel detection problem, handling multiple and combined variabilities in the scenes. Publicity panel detection in images offers important advantages both in the real world and in the virtual one. For example, applications like Google Street View can be used for Internet publicity: once ad panels are detected in images, the publicity appearing inside them could be replaced with that of a funding company. In our experiments, both the SSD and YOLO detectors produced acceptable results under variable panel sizes, illumination conditions, viewing perspectives, partial occlusion of panels, complex backgrounds, and multiple panels in scenes. Due to the difficulty of finding annotated images for the considered problem, we created our own dataset for conducting the experiments. The major strength of the SSD model was the near elimination of False Positive (FP) cases, a situation that is preferable when the publicity contained inside the panels is analyzed after detecting them. On the other hand, YOLO produced better panel localization results, detecting a higher number of True Positive (TP) panels with higher accuracy. Finally, a comparison of the two analyzed object detection models with different types of semantic segmentation networks, using the same evaluation metrics, is also included.
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU; 600.130; 601.349; 600.122 | Approved | no | ||
Call Number | Admin @ si @ MSM2020 | Serial | 3452 | ||
Permanent link to this record | |||||
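The SSD/YOLO comparison above rests on counting True Positives and False Positives by matching detections to ground-truth boxes at an IoU threshold. A minimal, illustrative sketch of that bookkeeping (function names are mine, not from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def match_detections(pred_boxes, gt_boxes, iou_thr=0.5):
    """Greedy matching: a prediction is a TP if it overlaps an unmatched
    ground-truth box above the threshold, otherwise an FP; unmatched
    ground truth counts as FN."""
    matched = set()
    tp = fp = 0
    for p in pred_boxes:
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gt_boxes):
            if j in matched:
                continue
            o = iou(p, g)
            if o >= best_iou:
                best_j, best_iou = j, o
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gt_boxes) - len(matched)
    return tp, fp, fn
```

For example, one well-localized panel and one spurious detection against two ground-truth panels yield one TP, one FP, and one FN.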
Author | Jorge Bernal; F. Javier Sanchez; Fernando Vilariño | ||||
Title | A Region Segmentation Method for Colonoscopy Images Using a Model of Polyp Appearance | Type | Conference Article | ||
Year | 2011 | Publication | 5th Iberian Conference on Pattern Recognition and Image Analysis | Abbreviated Journal | |
Volume | 6669 | Issue | Pages | 134-143 | |
Keywords | Colonoscopy; Polyp Detection; Region Merging; Region Segmentation
Abstract | This work aims at the segmentation of colonoscopy images into a minimum number of informative regions. Our method performs in such a way that, if a polyp is present in the image, it will be exclusively and totally contained in a single region. This result can be used in later stages to classify regions as polyp-containing candidates. The output of the algorithm also defines which regions can be considered non-informative. The algorithm starts with a high number of initial regions and merges them taking into account a model of polyp appearance obtained from the available data. The results show that our segmentations of polyp regions are more accurate than those of state-of-the-art methods.
Address | Las Palmas de Gran Canaria, June 2011 | ||||
Corporate Author | SpringerLink | Thesis | |||
Publisher | Place of Publication | Editor | Vitrià, Jordi and Sanches, João and Hernández, Mario | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Lecture Notes in Computer Science | Abbreviated Series Title | LNCS | |
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-642-21256-7 | Medium | ||
Area | 800 | Expedition | Conference | IbPRIA | |
Notes | MV;SIAI | Approved | no | ||
Call Number | IAM @ iam @ BSV2011c | Serial | 1696 | ||
Permanent link to this record | |||||
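The segmentation described above starts from many initial regions and merges neighbors guided by an appearance model. A hedged sketch of that merging stage using union-find (the adjacency structure, similarity callback, and threshold are illustrative, not the paper's actual model):

```python
def merge_regions(adjacency, similarity, threshold):
    """Greedy region merging: union neighboring region pairs whose
    similarity (under some appearance model) exceeds a threshold.
    Union-find with path halving tracks the surviving regions."""
    parent = list(range(len(adjacency)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, neighbors in enumerate(adjacency):
        for j in neighbors:
            if similarity(i, j) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two regions
    return [find(i) for i in range(len(adjacency))]
```

With three regions where only the first pair is similar, the output labels collapse regions 0 and 1 while region 2 survives on its own.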
Author | Jorge Bernal; F. Javier Sanchez; Fernando Vilariño | ||||
Title | Towards Automatic Polyp Detection with a Polyp Appearance Model | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 9 | Pages | 3166-3182 |
Keywords | Colonoscopy; Polyp Detection; Region Segmentation; SA-DOVA descriptor
Abstract | This work aims at automatic polyp detection using a model of polyp appearance in the context of the analysis of colonoscopy videos. Our method consists of three stages: region segmentation, region description, and region classification. The performance of our region segmentation method guarantees that, if a polyp is present in the image, it will be exclusively and totally contained in a single region. The output of the algorithm also defines which regions can be considered non-informative. As our region descriptor we define the novel Sector Accumulation-Depth of Valleys Accumulation (SA-DOVA) descriptor, which provides a necessary but not sufficient condition for polyp presence. Finally, we classify our segmented regions according to the maximal values of the SA-DOVA descriptor. Our preliminary classification results are promising, especially when classifying those parts of the image that do not contain a polyp.
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | 800 | Expedition | Conference | IbPRIA | |
Notes | MV;SIAI | Approved | no | ||
Call Number | Admin @ si @ BSV2012; IAM @ iam | Serial | 1997 | ||
Permanent link to this record | |||||
Author | Jorge Bernal; F. Javier Sanchez; Fernando Vilariño | ||||
Title | Depth of Valleys Accumulation Algorithm for Object Detection | Type | Conference Article | ||
Year | 2011 | Publication | 14th Congrès Català en Intel·ligencia Artificial | Abbreviated Journal | |
Volume | 1 | Issue | 1 | Pages | 71-80 |
Keywords | Object Recognition; Object Region Identification; Image Analysis; Image Processing
Abstract | This work aims at detecting the regions of an image in which objects lie, using information about the intensity of valleys, which appear to surround objects in images where the source of light is in the same direction as the camera. We present our depth of valleys accumulation method, which consists of two stages: first, the definition of the depth of valleys image, which combines the output of a ridges and valleys detector with the morphological gradient to measure how deep a point is inside a valley; and second, an algorithm that labels as interior to objects those points which lie inside complete or incomplete boundaries in the depth of valleys image. To evaluate the performance of our method we have tested it on several application domains. Our results on object region identification are promising, especially in the field of polyp detection in colonoscopy videos, and we also show its applicability in other areas.
Address | Lleida | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-60750-841-0 | Medium | ||
Area | 800 | Expedition | Conference | CCIA | |
Notes | MV;SIAI | Approved | no | ||
Call Number | IAM @ iam @ BSV2011b | Serial | 1699 | ||
Permanent link to this record | |||||
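The first stage above combines a valley detector's output with the morphological gradient. A minimal sketch of that combination, assuming the valley-energy map is given (the ridges-and-valleys detector itself is not reproduced, and the window-based gradient here is a naive stand-in for a proper structuring-element implementation):

```python
import numpy as np

def morph_gradient(img, k=3):
    """Morphological gradient: grey dilation minus grey erosion
    computed with a naive k x k sliding window."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    dil = np.zeros((h, w))
    ero = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + k, j:j + k]
            dil[i, j] = win.max()
            ero[i, j] = win.min()
    return dil - ero

def depth_of_valleys(valley_energy, img):
    """Weight each valley pixel by the morphological gradient, so the
    map reflects how deep each point sits inside a valley."""
    return valley_energy * morph_gradient(img)
```

On a 3x3 image with a single dark center pixel, every window sees both extremes, so the gradient (and the resulting map, with unit valley energy) is 1 everywhere.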
Author | Minesh Mathew; Lluis Gomez; Dimosthenis Karatzas; C.V. Jawahar | ||||
Title | Asking questions on handwritten document collections | Type | Journal Article | ||
Year | 2021 | Publication | International Journal on Document Analysis and Recognition | Abbreviated Journal | IJDAR |
Volume | 24 | Issue | Pages | 235-249 | |
Keywords | |||||
Abstract | This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate the proposed approach on two new datasets: (i) HW-SQuAD, a synthetic, handwritten document image counterpart of the SQuAD1.0 dataset, and (ii) BenthamQA, a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach which uses text recognized from the images using OCR. The datasets presented in this work are available to download at docvqa.org.
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | Admin @ si @ MGK2021 | Serial | 3621 | ||
Permanent link to this record | |||||
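Once queries and word-image snippets live in a common embedding sub-space, retrieval reduces to nearest-neighbor search. A hedged sketch of that last step by cosine similarity (the embedding network itself is assumed given; all names here are illustrative):

```python
import numpy as np

def cosine_retrieve(query_vec, snippet_vecs, top_k=3):
    """Rank snippet embeddings by cosine similarity to the query
    embedding; both are assumed to come from the same learned
    sub-space. Returns (indices, scores) in descending order."""
    q = query_vec / np.linalg.norm(query_vec)
    s = snippet_vecs / np.linalg.norm(snippet_vecs, axis=1, keepdims=True)
    sims = s @ q                       # cosine similarity per snippet
    order = np.argsort(-sims)[:top_k]  # best matches first
    return order, sims[order]
```

A query aligned with the first snippet and partially aligned with the third ranks them 0, 2, 1.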
Author | Cristina Palmero; Albert Clapes; Chris Bahnsen; Andreas Møgelmose; Thomas B. Moeslund; Sergio Escalera | ||||
Title | Multi-modal RGB-Depth-Thermal Human Body Segmentation | Type | Journal Article | ||
Year | 2016 | Publication | International Journal of Computer Vision | Abbreviated Journal | IJCV |
Volume | 118 | Issue | 2 | Pages | 217-239 |
Keywords | Human body segmentation; RGB; Depth; Thermal
Abstract | This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB-depth-thermal dataset along with a multi-modal segmentation baseline. The modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, defines a partitioning of the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extraction methods, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian mixture models for the probabilistic modeling and a Random Forest for the stacked learning, is superior to other state-of-the-art methods, obtaining an overlap above 75% on the novel dataset when compared to the manually annotated ground truth of human segmentations.
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer US | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA;MILAB; | Approved | no | ||
Call Number | Admin @ si @ PCB2016 | Serial | 2767 | ||
Permanent link to this record | |||||
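The fusion step above stacks per-cell likelihoods from each modality into a feature vector for a second-stage learner. A simplified sketch of that idea, where a single diagonal Gaussian per cell stands in for the paper's Gaussian mixtures and the Random Forest stage is omitted (all names and shapes are illustrative):

```python
import numpy as np

def fit_cell_gaussian(samples):
    """Fit a diagonal Gaussian to a cell's descriptors for one modality
    (a single Gaussian stands in for the paper's mixture models)."""
    mu = samples.mean(axis=0)
    var = samples.var(axis=0) + 1e-6  # regularize to avoid zero variance
    return mu, var

def log_likelihood(x, mu, var):
    """Diagonal-Gaussian log-density of descriptor x."""
    return float(-0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var)))

def stacked_features(x, models):
    """Stack per-modality log-likelihoods of one cell into the feature
    vector that a second-stage learner (Random Forest in the paper)
    would fuse. x[m] is the cell's descriptor in modality m."""
    return np.array([log_likelihood(x[m], *models[m]) for m in range(len(models))])
```

A descriptor near a modality's training mean scores a higher log-likelihood than a far one, which is exactly the signal the stacked learner exploits.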
Author | Pau Cano; Alvaro Caravaca; Debora Gil; Eva Musulen | ||||
Title | Diagnosis of Helicobacter pylori using AutoEncoders for the Detection of Anomalous Staining Patterns in Immunohistochemistry Images | Type | Miscellaneous | ||
Year | 2023 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | 107241 | ||
Keywords | |||||
Abstract | This work addresses the detection of Helicobacter pylori, a bacterium classified since 1994 as a class 1 carcinogen to humans. Owing to its high specificity and sensitivity, the preferred diagnosis technique is the analysis of histological images with immunohistochemical staining, a process in which certain stained antibodies bind to antigens of the biological element of interest. This analysis is a time-demanding task, currently done by an expert pathologist who visually inspects the digitized samples. We propose to use autoencoders to learn latent patterns of healthy tissue and to detect H. pylori as an anomaly in image staining. Unlike existing classification approaches, an autoencoder is able to learn patterns in an unsupervised manner (without the need for image annotations) with high performance. In particular, our model achieves an overall accuracy of 91%, with 86% sensitivity, 96% specificity, and 0.97 AUC in the detection of H. pylori.
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM | Approved | no | ||
Call Number | Admin @ si @ CCG2023 | Serial | 3855 | ||
Permanent link to this record | |||||
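The anomaly-detection idea above is that a model trained only on healthy tissue reconstructs anomalous staining poorly. A hedged sketch of that mechanism, where a linear autoencoder fitted via SVD (equivalent to PCA) stands in for the paper's autoencoder, and the threshold is illustrative:

```python
import numpy as np

def fit_linear_autoencoder(X, n_components=2):
    """Fit a linear autoencoder on healthy-tissue feature vectors via
    SVD; the top right-singular vectors act as tied encoder/decoder
    weights (a linear stand-in for the paper's autoencoder)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    W = vt[:n_components]
    return mu, W

def reconstruction_error(x, mu, W):
    """Encode, decode, and measure the residual; anomalies (e.g.
    H. pylori staining) fall outside the healthy subspace and
    reconstruct poorly."""
    code = W @ (x - mu)
    recon = mu + W.T @ code
    return float(np.linalg.norm(x - recon))

def is_anomalous(x, mu, W, threshold):
    return reconstruction_error(x, mu, W) > threshold
```

Training points lying in a plane reconstruct almost exactly, while a point off the plane leaves a large residual and is flagged.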
Author | Sergio Escalera; Ralf Herbrich | ||||
Title | The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations | Type | Book Whole | ||
Year | 2020 | Publication | The Springer Series on Challenges in Machine Learning | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition track follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the areas of robotics, health, computer vision, natural language processing, systems, and physics. Competitions have become an integral part of advancing the state of the art in artificial intelligence (AI). They exhibit one important difference from benchmarks: competitions test a system end-to-end rather than evaluating only a single component, and they assess the practicability of an algorithmic solution in addition to assessing feasibility.
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | Sergio Escalera; Ralf Herbrich | 
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 2520-1328 | ISBN | 978-3-030-29134-1 | Medium | |
Area | Expedition | Conference | |||
Notes | HuPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ HeE2020 | Serial | 3328 | ||
Permanent link to this record | |||||
Author | Sounak Dey | ||||
Title | Mapping between Images and Conceptual Spaces: Sketch-based Image Retrieval | Type | Book Whole | ||
Year | 2020 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This thesis presents several contributions to the literature of sketch-based image retrieval (SBIR). In SBIR the first challenge we face is how to map two different domains to a common space for effective retrieval of images, while tackling the different levels of abstraction people use to express their notion of the objects around them while sketching. To this end we first propose a cross-modal learning framework that maps both sketches and text into a joint embedding space invariant to depictive style, while preserving semantics. We have also investigated the different query types possible, to encompass people's dilemma in sketching certain real-world objects. For this we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities in a common embedding space, which is then further aligned with the image feature space. This permits encoding the object-based features and their alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set.
Finally, we explore the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We advance prior art by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognises two important yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap between amateur sketch and photo, and (ii) the necessity of moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended. In this dissertation we also pave the path for future research directions in this domain.
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Josep Llados;Umapada Pal | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-121011-8-8 | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | Admin @ si @ Dey20 | Serial | 3480 | ||
Permanent link to this record | |||||
Author | Ivan Huerta | ||||
Title | Foreground Object Segmentation and Shadow Detection for Video Sequences in Uncontrolled Environments | Type | Book Whole | ||
Year | 2010 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This thesis is mainly divided in two parts. The first presents a study of motion segmentation problems; based on this study, a novel algorithm for mobile-object segmentation from a static background scene is presented, which is shown to be robust and accurate under most of the common problems in motion segmentation. The second tackles the problem of shadows in depth: first, a bottom-up approach based on a chromatic shadow detector is presented to deal with umbra shadows; second, a top-down approach based on a tracking system is developed to enhance the chromatic shadow detection.
In our first contribution, a case analysis of motion segmentation problems is presented by taking into account the problems associated with different cues, namely colour, edge, and intensity. Our second contribution is a hybrid architecture which handles the main problems observed in this case analysis by fusing (i) the knowledge from these three cues and (ii) a temporal difference algorithm. On the one hand, we enhance the colour and edge models to solve both global/local illumination changes (shadows and highlights) and camouflage in intensity; in addition, local information is exploited to cope with the very challenging problem of camouflage in chroma. On the other hand, the intensity cue is applied when the colour and edge cues are not available, such as when values fall beyond the dynamic range. Additionally, temporal difference is included to segment motion when none of these three cues is available, such as when the background was not visible during the training period. Lastly, the approach is enhanced to allow ghost detection. As a result, our approach obtains very accurate and robust motion segmentation in both indoor and outdoor scenarios, as demonstrated quantitatively and qualitatively in the experimental results by comparing our approach with the best-known state-of-the-art approaches. Motion segmentation has to deal with shadows to avoid distortions when detecting moving objects.
Most segmentation approaches dealing with shadow detection are typically restricted to penumbra shadows; such techniques cannot cope well with umbra shadows, which are consequently usually detected as part of moving objects. Firstly, a bottom-up approach for the detection and removal of chromatic moving shadows in surveillance scenarios is proposed. Secondly, a top-down approach based on Kalman filters to detect and track shadows is developed to enhance the chromatic shadow detection. In the bottom-up part, the shadow detection approach applies a novel technique based on gradient and colour models for separating chromatic moving shadows from moving objects. Well-known colour and gradient models are extended and improved into an invariant colour cone model and an invariant gradient model, respectively, to perform automatic segmentation while detecting potential shadows. The regions corresponding to potential shadows are then grouped by considering "a bluish effect" and an edge partitioning. Lastly, (i) temporal similarities between local gradient structures and (ii) spatial similarities between chrominance angle and brightness distortions are analysed for all potential shadow regions in order to finally identify umbra shadows. In the top-down process, after detection, objects and shadows are both tracked using Kalman filters in order to enhance the chromatic shadow detection when it fails to detect a shadow. This first implies a data association between the blobs (foreground and shadow) and the Kalman filters. An event analysis of the different data association cases is then performed, and occlusion handling is managed by a Probabilistic Appearance Model (PAM). Based on this association, temporal consistency is sought between foregrounds and shadows and their respective Kalman filters; several cases are studied, and as a result lost chromatic shadows are correctly detected.
Finally, the tracking results are used as feedback to improve the shadow and object detection. Unlike other approaches, our method does not make any a priori assumptions about camera location, surface geometries, surface textures, or the shapes and types of shadows, objects, and background. Experimental results show the performance and accuracy of our approach for different shadowed materials and illumination conditions.
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Jordi Gonzalez;Xavier Roca | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-937261-3-3 | Medium | ||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | ISE @ ise @ Hue2010 | Serial | 1332 | ||
Permanent link to this record | |||||
Author | Ariel Amato | ||||
Title | Environment-Independent Moving Cast Shadow Suppression in Video Surveillance | Type | Book Whole | ||
Year | 2012 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This thesis is devoted to moving shadow detection and suppression. Shadows can be defined as the parts of the scene that are not directly illuminated by a light source due to an obstructing object or objects. Moving shadows in image sequences are often undesirable, since they can degrade the expected results when processing images for object detection, segmentation, scene surveillance, or similar purposes. In this thesis, moving shadow detection methods are first exhaustively reviewed. To compensate for the limitations of the methods in the literature, a new moving shadow detection method is proposed. It requires no prior knowledge about the scene, nor is it restricted to assumptions about specific scene structures. Furthermore, the technique can detect both achromatic and chromatic shadows, even in the presence of camouflage, which occurs when foreground regions are very similar in colour to shadowed regions. The method exploits local colour constancy properties due to reflectance suppression over shadowed regions. To detect shadowed regions in a scene, the values of the background image are divided by the values of the current frame in the RGB colour space. The thesis shows how this luminance ratio can be used to identify segments with low gradient constancy, which in turn distinguishes shadows from foreground. Experimental results on a collection of publicly available datasets illustrate the superior performance of the proposed method compared with the most sophisticated state-of-the-art shadow detection algorithms. These results show that the proposed approach is robust and accurate over a broad range of shadow types and challenging video conditions.
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Mikhail Mozerov;Jordi Gonzalez | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ Ama2012 | Serial | 2201 | ||
Permanent link to this record | |||||
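The core cue above is the per-channel background-to-frame ratio: under a shadow the ratio rises by a similar factor in all three RGB channels (local colour constancy), whereas a foreground object changes the channels unevenly. A hedged sketch of that test (the thresholds and the channel-agreement criterion are illustrative simplifications, not the thesis's full method):

```python
import numpy as np

def luminance_ratio(background, frame, eps=1e-6):
    """Per-channel ratio of background to current frame; shadowed
    pixels are darker than the background, so the ratio exceeds 1."""
    return background.astype(float) / (frame.astype(float) + eps)

def shadow_mask(background, frame, ratio_lo=1.2, ratio_hi=4.0, chroma_tol=0.2):
    """Flag pixels darkened by a similar factor in all three channels
    (colour constancy) within a plausible shadow-darkening range."""
    r = luminance_ratio(background, frame)          # H x W x 3
    mean_r = r.mean(axis=2, keepdims=True)          # mean ratio per pixel
    chroma_dev = np.abs(r - mean_r).max(axis=2)     # channel disagreement
    m = mean_r[..., 0]
    return (m > ratio_lo) & (m < ratio_hi) & (chroma_dev < chroma_tol * m)
```

A uniformly halved pixel is flagged as shadow, a pixel darkened in only one channel is treated as foreground, and an unchanged pixel is background.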
Author | Cristhian A. Aguilera-Carrasco | ||||
Title | Evaluation of feature detectors and descriptors in VISIBLE-LWIR cross-spectral imaging | Type | Report | ||
Year | 2014 | Publication | CVC Technical Report | Abbreviated Journal | |
Volume | 177 | Issue | Pages | ||
Keywords | Multi-spectral; Cross-spectral; Visible-LWIR imaging; Multimodal. | ||||
Abstract | This thesis evaluates the performance of different state-of-the-art feature detector and descriptor algorithms in the visible-LWIR cross-spectral scenario. The focus is on determining whether current detector and descriptor algorithms can be used to match features between the LWIR spectrum and the visible spectrum in applications such as visual odometry, object recognition, image registration, and stereo vision. An outdoor cross-spectral dataset was created to evaluate the suitability of the different algorithms. The results show that the tested algorithms are not suited to the task of matching features across different spectra: the repeatability ratio was below 30 percent in the best case, and in general matched features were not accurately located. These results also suggest that it is necessary to create new algorithms that take into account the nature of the different spectra, describing characteristics that exist in both, such as discontinuities.
Address | |||||
Corporate Author | Thesis | Master's thesis | |||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.076 | Approved | no | ||
Call Number | Admin @ si @ Agu2014 | Serial | 2526 | 
Permanent link to this record | |||||
Author | Diego Velazquez | ||||
Title | Towards Robustness in Computer-based Image Understanding | Type | Book Whole | ||
Year | 2023 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This thesis embarks on an exploratory journey into robustness in deep learning, with a keen focus on the intertwining facets of generalization, explainability, and edge cases within the realm of computer vision. In deep learning, robustness epitomizes a model's resilience and flexibility, grounded in its capacity to generalize across diverse data distributions, explain its predictions transparently, and navigate the intricacies of edge cases effectively. The challenges associated with robust generalization are multifaceted, encompassing the model's performance on unseen data and its defense against out-of-distribution data and adversarial attacks. Bridging this gap, the potential of Embedding Propagation (EP) for improving out-of-distribution generalization is explored. EP is depicted as a powerful tool facilitating manifold smoothing, which in turn fortifies the model's robustness against adversarial attacks and bolsters performance in few-shot and self-/semi-supervised learning scenarios. In the labyrinth of deep learning models, the path to robustness often intersects with explainability. As model complexity increases, so does the urgency to decipher their decision-making processes. Acknowledging this, the thesis introduces a robust framework for evaluating and comparing various counterfactual explanation methods, echoing the imperative of explanation quality over quantity and spotlighting the intricacies of diversifying explanations. Simultaneously, the deep learning landscape is fraught with edge cases: anomalies in the form of small objects or rare instances in object detection tasks that defy the norm. Confronting this, the thesis presents an extension of the DETR (DEtection TRansformer) model to enhance small-object detection. The devised DETR-FP, embedding the Feature Pyramid technique, demonstrates improved small-object detection accuracy, albeit facing challenges such as high computational cost.
With the emergence of foundation models in mind, the thesis unveils EarthView, the largest-scale remote sensing dataset to date, built for the self-supervised learning of a robust foundational model for remote sensing. Collectively, these studies contribute to the grand narrative of robustness in deep learning, weaving together the strands of generalization, explainability, and edge-case performance. Through these methodological advancements and novel datasets, the thesis calls for continued exploration, innovation, and refinement to fortify the bastion of robust computer vision.
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | IMPRIMA | Place of Publication | Editor | Jordi Gonzalez;Josep M. Gonfaus;Pau Rodriguez | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-81-126409-5-3 | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ Vel2023 | Serial | 3965 | ||
Permanent link to this record | |||||
Author | Lluis Gomez | ||||
Title | Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding | Type | Book Whole | ||
Year | 2016 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | This thesis addresses the problem of automatic scene text understanding in unconstrained conditions. In particular, we tackle the tasks of multi-language and arbitrarily oriented text detection, tracking, and script identification in natural scenes.
For this we have developed a set of generic methods that build on top of the basic observation that text always has certain key visual and structural characteristics that are independent of the language or script in which it is written. Text instances in any language or script are always formed as groups of similar atomic parts, be they individual characters, small stroke parts, or even whole words in the case of cursive text. This holistic (sum-of-parts) and recursive perspective has led us to explore different variants of the “segmentation and grouping” paradigm of computer vision. Scene text detection methodologies are usually based on classification of individual regions or patches, using a priori knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organization, through which text emerges as a perceptually significant group of atomic objects. In this thesis, we argue that the text detection problem must be posed as the detection of meaningful groups of regions. We address the problem of text detection in natural scenes from a hierarchical perspective, making explicit use of the recursive nature of text, aiming directly at the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such a hierarchy, introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based on perceptual organization. Within this generic framework, we design a text-specific object proposals algorithm that, contrary to existing generic object proposals methods, aims directly at the detection of text region groupings. 
For this, we abandon the rigid definition of “what is text” used by traditional specialized text detectors, and move towards the fuzzier perspective of grouping-based object proposals methods. Then, we present a hybrid algorithm for detection and tracking of scene text in which the notion of region groupings also plays a central role. By leveraging the structural arrangement of text group components between consecutive frames, we can improve the overall tracking performance of the system. Finally, since our generic detection framework is inherently designed for multi-language environments, we focus on the problem of script identification in order to build a multi-language end-to-end reading system. Facing this problem with state-of-the-art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratios. Instead of resizing input images to a fixed size, as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve the discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-part representations and their relative importance in a patch-based classification scheme. |
||||
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Place of Publication | Editor | Dimosthenis Karatzas | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ Gom2016 | Serial | 2891 | ||
Permanent link to this record |