Records | |||||
---|---|---|---|---|---|
Author | Vacit Oguz Yazici; Joost Van de Weijer; Longlong Yu | ||||
Title | Visual Transformers with Primal Object Queries for Multi-Label Image Classification | Type | Conference Article | ||
Year | 2022 | Publication | 26th International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Multi-label image classification is about predicting a set of class labels that can be considered as orderless sequential data. Transformers process the sequential data as a whole, therefore they are inherently good at set prediction. The first vision-based transformer model, which was proposed for the object detection task, introduced the concept of object queries. Object queries are learnable positional encodings that are used by attention modules in decoder layers to decode the object classes or bounding boxes using the regions of interest in an image. However, inputting the same set of object queries to different decoder layers hinders the training: it results in lower performance and delays convergence. In this paper, we propose the usage of primal object queries that are only provided at the start of the transformer decoder stack. In addition, we improve the mixup technique proposed for multi-label classification. The proposed transformer model with primal object queries improves the state-of-the-art class-wise F1 metric by 2.1% and 1.8%, and speeds up the convergence by 79.0% and 38.6% on MS-COCO and NUS-WIDE datasets respectively. | ||||
Address | Montreal; Quebec; Canada; August 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | LAMP; 600.147; 601.309 | Approved | no | ||
Call Number | Admin @ si @ YWY2022 | Serial | 3786 | ||
Permanent link to this record | |||||
Author | Ayan Banerjee; Palaiahnakote Shivakumara; Parikshit Acharya; Umapada Pal; Josep Llados | ||||
Title | TWD: A New Deep E2E Model for Text Watermark Detection in Video Images | Type | Conference Article | ||
Year | 2022 | Publication | 26th International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Deep learning; U-Net; FCENet; Scene text detection; Video text detection; Watermark text detection | ||||
Abstract | Text watermark detection in video images is challenging because text watermark characteristics are different from caption and scene texts in the video images. Developing a successful model for detecting text watermark, caption, and scene texts is an open challenge. This study aims at developing a new Deep End-to-End model for Text Watermark Detection (TWD), caption and scene text in video images. To standardize non-uniform contrast, quality, and resolution, we explore the U-Net3+ model for enhancing poor quality text without affecting high-quality text. Similarly, to address the challenges of arbitrary orientation, text shapes and complex background, we explore Stacked Hourglass Encoded Fourier Contour Embedding Network (SFCENet) by feeding the output of the U-Net3+ model as input. Furthermore, the proposed work integrates enhancement and detection models as an end-to-end model for detecting multi-type text in video images. To validate the proposed model, we create our own dataset (named TW-866), which provides video images containing text watermark, caption (subtitles), as well as scene text. The proposed model is also evaluated on standard natural scene text detection datasets, namely, ICDAR 2019 MLT, CTW1500, Total-Text, and DAST1500. The results show that the proposed method outperforms the existing methods. To the best of our knowledge, this is the first work on text watermark detection in video images. | ||||
Address | Montreal; Quebec; Canada; August 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPR | ||
Notes | DAG; | Approved | no | ||
Call Number | Admin @ si @ BSA2022 | Serial | 3788 | ||
Permanent link to this record | |||||
Author | Ahmed M. A. Salih; Ilaria Boscolo Galazzo; Federica Cruciani; Lorenza Brusini; Petia Radeva | ||||
Title | Investigating Explainable Artificial Intelligence for MRI-based Classification of Dementia: a New Stability Criterion for Explainable Methods | Type | Conference Article | ||
Year | 2022 | Publication | 29th IEEE International Conference on Image Processing | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Image processing; Stability criteria; Machine learning; Robustness; Alzheimer's disease; Monitoring | ||||
Abstract | Individuals diagnosed with Mild Cognitive Impairment (MCI) have shown an increased risk of developing Alzheimer’s Disease (AD). As such, early identification of dementia represents a key prognostic element, though hampered by complex disease patterns. Increasing efforts have focused on Machine Learning (ML) to build accurate classification models relying on a multitude of clinical/imaging variables. However, ML itself does not provide sensible explanations related to the model mechanism and feature contribution. Explainable Artificial Intelligence (XAI) represents the enabling technology in this framework, making it possible to understand ML outcomes and derive human-understandable explanations. In this study, we aimed at exploring ML combined with MRI-based features and XAI to solve this classification problem and interpret the outcome. In particular, we propose a new method to assess the robustness of feature rankings provided by XAI methods, especially when multicollinearity exists. Our findings indicate that our method was able to disentangle the list of the informative features underlying dementia, with important implications for aiding personalized monitoring plans. | ||||
Address | Bordeaux; France; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICIP | ||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ SBC2022 | Serial | 3789 | ||
Permanent link to this record | |||||
Author | Chengyi Zou; Shuai Wan; Marta Mrak; Marc Gorriz Blanch; Luis Herranz; Tiannan Ji | ||||
Title | Towards Lightweight Neural Network-based Chroma Intra Prediction for Video Coding | Type | Conference Article | ||
Year | 2022 | Publication | 29th IEEE International Conference on Image Processing | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Video coding; Quantization (signal); Computational modeling; Neural networks; Predictive models; Video compression; Syntactics | ||||
Abstract | In video compression the luma channel can be useful for predicting chroma channels (Cb, Cr), as has been demonstrated with the Cross-Component Linear Model (CCLM) used in Versatile Video Coding (VVC) standard. More recently, it has been shown that neural networks can even better capture the relationship among different channels. In this paper, a new attention-based neural network is proposed for cross-component intra prediction. With the goal to simplify neural network design, the new framework consists of four branches: boundary branch and luma branch for extracting features from reference samples, attention branch for fusing the first two branches, and prediction branch for computing the predicted chroma samples. The proposed scheme is integrated into VVC test model together with one additional binary block-level syntax flag which indicates whether a given block makes use of the proposed method. Experimental results demonstrate 0.31%/2.36%/2.00% BD-rate reductions on Y/Cb/Cr components, respectively, on top of the VVC Test Model (VTM) 7.0 which uses CCLM. | ||||
Address | Bordeaux; France; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICIP | ||
Notes | MACO | Approved | no | ||
Call Number | Admin @ si @ ZWM2022 | Serial | 3790 | ||
Permanent link to this record | |||||
Author | Yaxing Wang; Joost Van de Weijer; Lu Yu; Shangling Jui | ||||
Title | Distilling GANs with Style-Mixed Triplets for X2I Translation with Limited Data | Type | Conference Article | ||
Year | 2022 | Publication | 10th International Conference on Learning Representations | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Conditional image synthesis is an integral part of many X2I translation systems, including image-to-image, text-to-image and audio-to-image translation systems. Training these large systems generally requires huge amounts of training data. Therefore, we investigate knowledge distillation to transfer knowledge from a high-quality unconditioned generative model (e.g., StyleGAN) to conditioned synthetic image generation modules in a variety of systems. To initialize the conditional and reference branch (from an unconditional GAN) we exploit the style mixing characteristics of high-quality GANs to generate an infinite supply of style-mixed triplets to perform the knowledge distillation. Extensive experimental results in a number of image generation tasks (i.e., image-to-image, semantic segmentation-to-image, text-to-image and audio-to-image) demonstrate qualitatively and quantitatively that our method successfully transfers knowledge to the synthetic image generation modules, resulting in more realistic images than previous methods as confirmed by a significant drop in the FID. | ||||
Address | Virtual | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICLR | ||
Notes | LAMP; 600.147 | Approved | no | ||
Call Number | Admin @ si @ WWY2022 | Serial | 3791 | ||
Permanent link to this record | |||||
Author | Kai Wang; Fei Yang; Joost Van de Weijer | ||||
Title | Attention Distillation: self-supervised vision transformer students need more guidance | Type | Conference Article | ||
Year | 2022 | Publication | 33rd British Machine Vision Conference | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Self-supervised learning has been widely applied to train high-quality vision transformers. Unleashing their excellent performance on memory- and compute-constrained devices is therefore an important research topic. However, how to distill knowledge from one self-supervised ViT to another has not yet been explored. Moreover, the existing self-supervised knowledge distillation (SSKD) methods, which focus on ConvNet-based architectures, are suboptimal for ViT knowledge distillation. In this paper, we study knowledge distillation of self-supervised vision transformers (ViT-SSKD). We show that directly distilling information from the crucial attention mechanism from teacher to student can significantly narrow the performance gap between both. In experiments on ImageNet-Subset and ImageNet-1K, we show that our method AttnDistill outperforms existing self-supervised knowledge distillation (SSKD) methods and achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods learning from scratch (with the ViT-S model). We are also the first to apply the tiny ViT-T model on self-supervised learning. Moreover, AttnDistill is independent of self-supervised learning algorithms, so it can be adapted to ViT-based SSL methods to improve the performance in future research. | ||||
Address | London; UK; November 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | BMVC | ||
Notes | LAMP; 600.147 | Approved | no | ||
Call Number | Admin @ si @ WYW2022 | Serial | 3793 | ||
Permanent link to this record | |||||
Author | Kai Wang; Chenshen Wu; Andrew Bagdanov; Xialei Liu; Shiqi Yang; Shangling Jui; Joost Van de Weijer | ||||
Title | Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification | Type | Conference Article | ||
Year | 2022 | Publication | 33rd British Machine Vision Conference | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Lifelong object re-identification incrementally learns from a stream of re-identification tasks. The objective is to learn a representation that can be applied to all tasks and that generalizes to previously unseen re-identification tasks. The main challenge is that at inference time the representation must generalize to previously unseen identities. To address this problem, we apply continual meta metric learning to lifelong object re-identification. To prevent forgetting of previous tasks, we use knowledge distillation and explore the roles of positive and negative pairs. Based on our observation that the distillation and metric losses are antagonistic, we propose to remove positive pairs from distillation to robustify model updates. Our method, called Distillation without Positive Pairs (DwoPP), is evaluated on extensive intra-domain experiments on person and vehicle re-identification datasets, as well as inter-domain experiments on the LReID benchmark. Our experiments demonstrate that DwoPP significantly outperforms the state-of-the-art. | ||||
Address | London; UK; November 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | BMVC | ||
Notes | LAMP; 600.147 | Approved | no | ||
Call Number | Admin @ si @ WWB2022 | Serial | 3794 | ||
Permanent link to this record | |||||
Author | Vishwesh Pillai; Pranav Mehar; Manisha Das; Deep Gupta; Petia Radeva | ||||
Title | Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty | Type | Conference Article | ||
Year | 2022 | Publication | IEEE International Conference on Signal Processing and Communications | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | The problem of food image recognition is an essential one in today’s context because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person’s diet. To automate this process, several models are available to recognize food images. Due to a considerable number of unique food dishes and various cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers, with the analysis of epistemic uncertainty are used to switch between the classifiers. However, the accuracy of the predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that helps to increase the accuracy of epistemic uncertainty predictions. The performance of the proposed method is demonstrated using several experiments performed on the MAFood-121 dataset. The experimental results validate the proposal performance and show that the proposed threshold criteria help to increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples. | ||||
Address | Bangalore; India; July 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SPCOM | ||
Notes | MILAB; not mentioned | Approved | no | ||
Call Number | Admin @ si @ PMD2022 | Serial | 3796 | ||
Permanent link to this record | |||||
Author | Javier Rodenas; Bhalaji Nagarajan; Marc Bolaños; Petia Radeva | ||||
Title | Learning Multi-Subset of Classes for Fine-Grained Food Recognition | Type | Conference Article | ||
Year | 2022 | Publication | 7th International Workshop on Multimedia Assisted Dietary Management | Abbreviated Journal | |
Volume | Issue | Pages | 17–26 | ||
Keywords | |||||
Abstract | Food image recognition is a complex computer vision task, because of the large number of fine-grained food classes. Fine-grained recognition tasks focus on learning subtle discriminative details to distinguish similar classes. In this paper, we introduce a new method to improve the classification of classes that are more difficult to discriminate based on Multi-Subsets learning. Using a pre-trained network, we organize classes in multiple subsets using a clustering technique. Later, we embed these subsets in a multi-head model structure. This structure has three distinguishable parts. First, we use several shared blocks to learn the generalized representation of the data. Second, we use multiple specialized blocks focusing on specific subsets that are difficult to distinguish. Lastly, we use a fully connected layer to weight the different subsets in an end-to-end manner by combining the neuron outputs. We validated our proposed method using two recent state-of-the-art vision transformers on three public food recognition datasets. Our method was successful in learning the confused classes better and we outperformed the state-of-the-art on the three datasets. | ||||
Address | Lisboa; Portugal; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | MADiMa | ||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ RNB2022 | Serial | 3797 | ||
Permanent link to this record | |||||
Author | Silvio Giancola; Anthony Cioppa; Adrien Deliege; Floriane Magera; Vladimir Somers; Le Kang; Xin Zhou; Olivier Barnich; Christophe De Vleeschouwer; Alexandre Alahi; Bernard Ghanem; Marc Van Droogenbroeck; Abdulrahman Darwish; Adrien Maglo; Albert Clapes; Andreas Luyts; Andrei Boiarov; Artur Xarles; Astrid Orcesi; Avijit Shah; Baoyu Fan; Bharath Comandur; Chen Chen; Chen Zhang; Chen Zhao; Chengzhi Lin; Cheuk-Yiu Chan; Chun Chuen Hui; Dengjie Li; Fan Yang; Fan Liang; Fang Da; Feng Yan; Fufu Yu; Guanshuo Wang; H. Anthony Chan; He Zhu; Hongwei Kan; Jiaming Chu; Jianming Hu; Jianyang Gu; Jin Chen; Joao V. B. Soares; Jonas Theiner; Jorge De Corte; Jose Henrique Brito; Jun Zhang; Junjie Li; Junwei Liang; Leqi Shen; Lin Ma; Lingchi Chen; Miguel Santos Marques; Mike Azatov; Nikita Kasatkin; Ning Wang; Qiong Jia; Quoc Cuong Pham; Ralph Ewerth; Ran Song; Rengang Li; Rikke Gade; Ruben Debien; Runze Zhang; Sangrok Lee; Sergio Escalera; Shan Jiang; Shigeyuki Odashima; Shimin Chen; Shoichi Masui; Shouhong Ding; Sin-wai Chan; Siyu Chen; Tallal El-Shabrawy; Tao He; Thomas B. Moeslund; Wan-Chi Siu; Wei Zhang; Wei Li; Xiangwei Wang; Xiao Tan; Xiaochuan Li; Xiaolin Wei; Xiaoqing Ye; Xing Liu; Xinying Wang; Yandong Guo; Yaqian Zhao; Yi Yu; Yingying Li; Yue He; Yujie Zhong; Zhenhua Guo; Zhiheng Li | ||||
Title | SoccerNet 2022 Challenges Results | Type | Conference Article | ||
Year | 2022 | Publication | 5th International ACM Workshop on Multimedia Content Analysis in Sports | Abbreviated Journal | |
Volume | Issue | Pages | 75-86 | ||
Keywords | |||||
Abstract | The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams. Compared to last year's challenges, tasks (1-2) had their evaluation metrics redefined to consider tighter temporal accuracies, and tasks (3-6) were novel, including their underlying data and annotations. More information on the tasks, challenges and leaderboards is available on this https URL. Baselines and development kits are available on this https URL. | ||||
Address | Lisboa; Portugal; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ACMW | ||
Notes | HUPBA; not mentioned | Approved | no | ||
Call Number | Admin @ si @ GCD2022 | Serial | 3801 | ||
Permanent link to this record | |||||
Author | Patricia Suarez; Dario Carpio; Angel Sappa; Henry Velesaca | ||||
Title | Transformer based Image Dehazing | Type | Conference Article | ||
Year | 2022 | Publication | 16th IEEE International Conference on Signal Image Technology & Internet Based System | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | atmospheric light; brightness component; computational cost; dehazing quality; haze-free image | ||||
Abstract | This paper presents a novel approach to remove non-homogeneous haze from real images. The proposed method consists mainly of image feature extraction, haze removal, and image reconstruction. To accomplish this challenging task, we propose an architecture based on transformers, which have been recently introduced and have shown great potential in different computer vision tasks. Our model is based on SwinIR, a transformer-based image restoration architecture, but modifies the deep feature extraction module and the depth level of the model, and applies a combined loss function that improves styling and adapts the model for the removal of the non-homogeneous haze present in images. The obtained results prove to be superior to those obtained by state-of-the-art models. | ||||
Address | Dijon; France; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SITIS | ||
Notes | MSIAU; no proj | Approved | no | ||
Call Number | Admin @ si @ SCS2022 | Serial | 3803 | ||
Permanent link to this record | |||||
Author | Angel Sappa; Patricia Suarez; Henry Velesaca; Dario Carpio | ||||
Title | Domain Adaptation in Image Dehazing: Exploring the Usage of Images from Virtual Scenarios | Type | Conference Article | ||
Year | 2022 | Publication | 16th International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing | Abbreviated Journal | |
Volume | Issue | Pages | 85-92 | ||
Keywords | Domain adaptation; Synthetic hazed dataset; Dehazing | ||||
Abstract | This work presents a novel domain adaptation strategy for deep learning-based approaches to solve the image dehazing problem. Firstly, a large set of synthetic images is generated by using a realistic 3D graphic simulator; these synthetic images contain different densities of haze, which are used for training the model that is later adapted to any real scenario. The adaptation process requires just a few images to fine-tune the model parameters. The proposed strategy allows overcoming the limitation of training a given model with few images. In other words, the proposed strategy implements the adaptation of a haze removal model trained with synthetic images to real scenarios. It should be noted that it is quite difficult, if not impossible, to have large sets of pairs of real-world images (with and without haze) to train dehazing algorithms in a supervised way. Experimental results are provided showing the validity of the proposed domain adaptation strategy. | ||||
Address | Lisboa; Portugal; July 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CGVCVIP | ||
Notes | MSIAU; no proj | Approved | no | ||
Call Number | Admin @ si @ SSV2022 | Serial | 3804 | ||
Permanent link to this record | |||||
Author | Michael Teutsch; Angel Sappa; Riad I. Hammoud | ||||
Title | Cross-Spectral Image Processing | Type | Book Chapter | ||
Year | 2022 | Publication | Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 23-34 | ||
Keywords | |||||
Abstract | Although this book is on IR computer vision and its main focus lies on IR image and video processing and analysis, special attention is dedicated to cross-spectral image processing due to the increasing number of publications and applications in this domain. In these cross-spectral frameworks, IR information is used together with information from other spectral bands to tackle some specific problems by developing more robust solutions. Tasks considered for cross-spectral processing are for instance dehazing, segmentation, vegetation index estimation, or face recognition. This increasing number of applications is motivated by cross- and multi-spectral camera setups already available on the market, for example in smartphones, remote sensing multispectral cameras, or multi-spectral cameras for automotive systems or drones. In this chapter, different cross-spectral image processing techniques will be reviewed together with possible applications. Initially, image registration approaches for the cross-spectral case are reviewed: the registration stage is the first image processing task, which is needed to align images acquired by different sensors within the same reference coordinate system. Then, recent cross-spectral image colorization approaches, which are intended to colorize infrared images for different applications, are presented. Finally, the cross-spectral image enhancement problem is tackled by including guided super resolution techniques, image dehazing approaches, cross-spectral filtering and edge detection. Figure 3.1 illustrates cross-spectral image processing stages as well as their possible connections. Table 3.1 presents some of the available public cross-spectral datasets generally used as reference data to evaluate cross-spectral image registration, colorization, enhancement, or exploitation results. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | SLCV | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-031-00698-2 | Medium | ||
Area | Expedition | Conference | |||
Notes | MSIAU; MACO | Approved | no | ||
Call Number | Admin @ si @ TSH2022b | Serial | 3805 | ||
Permanent link to this record | |||||
Author | Michael Teutsch; Angel Sappa; Riad I. Hammoud | ||||
Title | Detection, Classification, and Tracking | Type | Book Chapter | ||
Year | 2022 | Publication | Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 35-58 | ||
Keywords | |||||
Abstract | Automatic image and video exploitation or content analysis is a technique to extract higher-level information from a scene such as objects, behavior, (inter-)actions, environment, or even weather conditions. The relevant information is assumed to be contained in the two-dimensional signal provided in an image (width and height in pixels) or the three-dimensional signal provided in a video (width, height, and time). But also intermediate-level information such as object classes [196], locations [197], or motion [198] can help applications to fulfill certain tasks such as intelligent compression [199], video summarization [200], or video retrieval [201]. Usually, videos with their temporal dimension are a richer source of data compared to single images [202] and thus certain video content can be extracted from videos only such as object motion or object behavior. Often, machine learning or nowadays deep learning techniques are utilized to model prior knowledge about object or scene appearance using labeled training samples [203, 204]. After a learning phase, these models are then applied in real world applications, which is called inference. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | SLCV | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-031-00698-2 | Medium | ||
Area | Expedition | Conference | |||
Notes | MSIAU; MACO | Approved | no | ||
Call Number | Admin @ si @ TSH2022c | Serial | 3806 | ||
Permanent link to this record | |||||
Author | Michael Teutsch; Angel Sappa; Riad I. Hammoud | ||||
Title | Image and Video Enhancement | Type | Book Chapter | ||
Year | 2022 | Publication | Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 9-21 | ||
Keywords | |||||
Abstract | Image and video enhancement aims at improving the signal quality relative to imaging artifacts such as noise and blur or atmospheric perturbations such as turbulence and haze. It is usually performed in order to assist humans in analyzing image and video content or simply to present humans visually appealing images and videos. However, image and video enhancement can also be used as a preprocessing technique to ease the task and thus improve the performance of subsequent automatic image content analysis algorithms: preceding dehazing can improve object detection as shown by [23] or explicit turbulence modeling can improve moving object detection as discussed by [24]. But it remains an open question whether image and video enhancement should rather be performed explicitly as a preprocessing step or implicitly for example by feeding affected images directly to a neural network for image content analysis like object detection [25]. Especially for real-time video processing at low latency it can be better to handle image perturbation implicitly in order to minimize the processing time of an algorithm. This can be achieved by making algorithms for image content analysis robust or even invariant to perturbations such as noise or blur. Additionally, mistakes of an individual preprocessing module can obviously affect the quality of the entire processing pipeline. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | SLCV | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU; MACO | Approved | no | ||
Call Number | Admin @ si @ TSH2022a | Serial | 3807 | ||
Permanent link to this record |