Home | << 1 2 3 4 5 6 7 8 9 10 >> |
Records | |||||
---|---|---|---|---|---|
Author | Kai Wang; Fei Yang; Joost Van de Weijer | ||||
Title | Attention Distillation: self-supervised vision transformer students need more guidance | Type | Conference Article | ||
Year | 2022 | Publication | 33rd British Machine Vision Conference | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Self-supervised learning has been widely applied to train high-quality vision transformers. Unleashing their excellent performance on memory and compute constraint devices is therefore an important research topic. However, how to distill knowledge from one self-supervised ViT to another has not yet been explored. Moreover, the existing self-supervised knowledge distillation (SSKD) methods focus on ConvNet based architectures are suboptimal for ViT knowledge distillation. In this paper, we study knowledge distillation of self-supervised vision transformers (ViT-SSKD). We show that directly distilling information from the crucial attention mechanism from teacher to student can significantly narrow the performance gap between both. In experiments on ImageNet-Subset and ImageNet-1K, we show that our method AttnDistill outperforms existing self-supervised knowledge distillation (SSKD) methods and achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods learning from scratch (with the ViT-S model). We are also the first to apply the tiny ViT-T model on self-supervised learning. Moreover, AttnDistill is independent of self-supervised learning algorithms, it can be adapted to ViT based SSL methods to improve the performance in future research. | ||||
Address | London; UK; November 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | BMVC | ||
Notes | LAMP; 600.147 | Approved | no | ||
Call Number | Admin @ si @ WYW2022 | Serial | 3793 | ||
Permanent link to this record | |||||
Author | Kai Wang; Chenshen Wu; Andrew Bagdanov; Xialei Liu; Shiqi Yang; Shangling Jui; Joost Van de Weijer | ||||
Title | Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification | Type | Conference Article | ||
Year | 2022 | Publication | 33rd British Machine Vision Conference | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Lifelong object re-identification incrementally learns from a stream of re-identification tasks. The objective is to learn a representation that can be applied to all tasks and that generalizes to previously unseen re-identification tasks. The main challenge is that at inference time the representation must generalize to previously unseen identities. To address this problem, we apply continual meta metric learning to lifelong object re-identification. To prevent forgetting of previous tasks, we use knowledge distillation and explore the roles of positive and negative pairs. Based on our observation that the distillation and metric losses are antagonistic, we propose to remove positive pairs from distillation to robustify model updates. Our method, called Distillation without Positive Pairs (DwoPP), is evaluated on extensive intra-domain experiments on person and vehicle re-identification datasets, as well as inter-domain experiments on the LReID benchmark. Our experiments demonstrate that DwoPP significantly outperforms the state-of-the-art. | ||||
Address | London; UK; November 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | BMVC | ||
Notes | LAMP; 600.147 | Approved | no | ||
Call Number | Admin @ si @ WWB2022 | Serial | 3794 | ||
Permanent link to this record | |||||
Author | Vishwesh Pillai; Pranav Mehar; Manisha Das; Deep Gupta; Petia Radeva | ||||
Title | Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty | Type | Conference Article | ||
Year | 2022 | Publication | IEEE International Conference on Signal Processing and Communications | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | The problem of food image recognition is an essential one in today’s context because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person’s diet. To automate this process, several models are available to recognize food images. Due to a considerable number of unique food dishes and various cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers, with the analysis of epistemic uncertainty are used to switch between the classifiers. However, the accuracy of the predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that helps to increase the accuracy of epistemic uncertainty predictions. The performance of the proposed method is demonstrated using several experiments performed on the MAFood-121 dataset. The experimental results validate the proposal performance and show that the proposed threshold criteria help to increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples. | ||||
Address | Bangalore; India; July 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SPCOM | ||
Notes | MILAB; no menciona | Approved | no | ||
Call Number | Admin @ si @ PMD2022 | Serial | 3796 | ||
Permanent link to this record | |||||
Author | Javier Rodenas; Bhalaji Nagarajan; Marc Bolaños; Petia Radeva | ||||
Title | Learning Multi-Subset of Classes for Fine-Grained Food Recognition | Type | Conference Article | ||
Year | 2022 | Publication | 7th International Workshop on Multimedia Assisted Dietary Management | Abbreviated Journal | |
Volume | Issue | Pages | 17–26 | ||
Keywords | |||||
Abstract | Food image recognition is a complex computer vision task, because of the large number of fine-grained food classes. Fine-grained recognition tasks focus on learning subtle discriminative details to distinguish similar classes. In this paper, we introduce a new method to improve the classification of classes that are more difficult to discriminate based on Multi-Subsets learning. Using a pre-trained network, we organize classes in multiple subsets using a clustering technique. Later, we embed these subsets in a multi-head model structure. This structure has three distinguishable parts. First, we use several shared blocks to learn the generalized representation of the data. Second, we use multiple specialized blocks focusing on specific subsets that are difficult to distinguish. Lastly, we use a fully connected layer to weight the different subsets in an end-to-end manner by combining the neuron outputs. We validated our proposed method using two recent state-of-the-art vision transformers on three public food recognition datasets. Our method was successful in learning the confused classes better and we outperformed the state-of-the-art on the three datasets. | ||||
Address | Lisboa; Portugal; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | MADiMa | ||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ RNB2022 | Serial | 3797 | ||
Permanent link to this record | |||||
Author | Silvio Giancola; Anthony Cioppa; Adrien Deliege; Floriane Magera; Vladimir Somers; Le Kang; Xin Zhou; Olivier Barnich; Christophe De Vleeschouwer; Alexandre Alahi; Bernard Ghanem; Marc Van Droogenbroeck; Abdulrahman Darwish; Adrien Maglo; Albert Clapes; Andreas Luyts; Andrei Boiarov; Artur Xarles; Astrid Orcesi; Avijit Shah; Baoyu Fan; Bharath Comandur; Chen Chen; Chen Zhang; Chen Zhao; Chengzhi Lin; Cheuk-Yiu Chan; Chun Chuen Hui; Dengjie Li; Fan Yang; Fan Liang; Fang Da; Feng Yan; Fufu Yu; Guanshuo Wang; H. Anthony Chan; He Zhu; Hongwei Kan; Jiaming Chu; Jianming Hu; Jianyang Gu; Jin Chen; Joao V. B. Soares; Jonas Theiner; Jorge De Corte; Jose Henrique Brito; Jun Zhang; Junjie Li; Junwei Liang; Leqi Shen; Lin Ma; Lingchi Chen; Miguel Santos Marques; Mike Azatov; Nikita Kasatkin; Ning Wang; Qiong Jia; Quoc Cuong Pham; Ralph Ewerth; Ran Song; Rengang Li; Rikke Gade; Ruben Debien; Runze Zhang; Sangrok Lee; Sergio Escalera; Shan Jiang; Shigeyuki Odashima; Shimin Chen; Shoichi Masui; Shouhong Ding; Sin-wai Chan; Siyu Chen; Tallal El-Shabrawy; Tao He; Thomas B. Moeslund; Wan-Chi Siu; Wei Zhang; Wei Li; Xiangwei Wang; Xiao Tan; Xiaochuan Li; Xiaolin Wei; Xiaoqing Ye; Xing Liu; Xinying Wang; Yandong Guo; Yaqian Zhao; Yi Yu; Yingying Li; Yue He; Yujie Zhong; Zhenhua Guo; Zhiheng Li | ||||
Title | SoccerNet 2022 Challenges Results | Type | Conference Article | ||
Year | 2022 | Publication | 5th International ACM Workshop on Multimedia Content Analysis in Sports | Abbreviated Journal | |
Volume | Issue | Pages | 75-86 | ||
Keywords | |||||
Abstract | The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams. Compared to last year's challenges, tasks (1-2) had their evaluation metrics redefined to consider tighter temporal accuracies, and tasks (3-6) were novel, including their underlying data and annotations. More information on the tasks, challenges and leaderboards are available on this https URL. Baselines and development kits are available on this https URL. | ||||
Address | Lisboa; Portugal; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ACMW | ||
Notes | HUPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ GCD2022 | Serial | 3801 | ||
Permanent link to this record | |||||
Author | Patricia Suarez; Dario Carpio; Angel Sappa; Henry Velesaca | ||||
Title | Transformer based Image Dehazing | Type | Conference Article | ||
Year | 2022 | Publication | 16th IEEE International Conference on Signal Image Technology & Internet Based System | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | atmospheric light; brightness component; computational cost; dehazing quality; haze-free image | ||||
Abstract | This paper presents a novel approach to remove non homogeneous haze from real images. The proposed method consists mainly of image feature extraction, haze removal, and image reconstruction. To accomplish this challenging task, we propose an architecture based on transformers, which have been recently introduced and have shown great potential in different computer vision tasks. Our model is based on the SwinIR an image restoration architecture based on a transformer, but by modifying the deep feature extraction module, the depth level of the model, and by applying a combined loss function that improves styling and adapts the model for the non-homogeneous haze removal present in images. The obtained results prove to be superior to those obtained by state-of-the-art models. | ||||
Address | Dijon; France; October 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SITIS | ||
Notes | MSIAU; no proj | Approved | no | ||
Call Number | Admin @ si @ SCS2022 | Serial | 3803 | ||
Permanent link to this record | |||||
Author | Angel Sappa; Patricia Suarez; Henry Velesaca; Dario Carpio | ||||
Title | Domain Adaptation in Image Dehazing: Exploring the Usage of Images from Virtual Scenarios | Type | Conference Article | ||
Year | 2022 | Publication | 16th International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing | Abbreviated Journal | |
Volume | Issue | Pages | 85-92 | ||
Keywords | Domain adaptation; Synthetic hazed dataset; Dehazing | ||||
Abstract | This work presents a novel domain adaptation strategy for deep learning-based approaches to solve the image dehazing
problem. Firstly, a large set of synthetic images is generated by using a realistic 3D graphic simulator; these synthetic images contain different densities of haze, which are used for training the model that is later adapted to any real scenario. The adaptation process requires just a few images to fine-tune the model parameters. The proposed strategy allows overcoming the limitation of training a given model with few images. In other words, the proposed strategy implements the adaptation of a haze removal model trained with synthetic images to real scenarios. It should be noticed that it is quite difficult, if not impossible, to have large sets of pairs of real-world images (with and without haze) to train in a supervised way dehazing algorithms. Experimental results are provided showing the validity of the proposed domain adaptation strategy. |
||||
Address | Lisboa; Portugal; July 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CGVCVIP | ||
Notes | MSIAU; no proj | Approved | no | ||
Call Number | Admin @ si @ SSV2022 | Serial | 3804 | ||
Permanent link to this record | |||||
Author | Michael Teutsch; Angel Sappa; Riad I. Hammoud | ||||
Title | Cross-Spectral Image Processing | Type | Book Chapter | ||
Year | 2022 | Publication | Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 23-34 | ||
Keywords | |||||
Abstract | Although this book is on IR computer vision and its main focus lies on IR image and video processing and analysis, a special attention is dedicated to cross-spectral image processing due to the increasing number of publications and applications in this domain. In these cross-spectral frameworks, IR information is used together with information from other spectral bands to tackle some specific problems by developing more robust solutions. Tasks considered for cross-spectral processing are for instance dehazing, segmentation, vegetation index estimation, or face recognition. This increasing number of applications is motivated by cross- and multi-spectral camera setups available already on the market like for example smartphones, remote sensing multispectral cameras, or multi-spectral cameras for automotive systems or drones. In this chapter, different cross-spectral image processing techniques will be reviewed together with possible applications. Initially, image registration approaches for the cross-spectral case are reviewed: the registration stage is the first image processing task, which is needed to align images acquired by different sensors within the same reference coordinate system. Then, recent cross-spectral image colorization approaches, which are intended to colorize infrared images for different applications are presented. Finally, the cross-spectral image enhancement problem is tackled by including guided super resolution techniques, image dehazing approaches, cross-spectral filtering and edge detection. Figure 3.1 illustrates cross-spectral image processing stages as well as their possible connections. Table 3.1 presents some of the available public cross-spectral datasets generally used as reference data to evaluate cross-spectral image registration, colorization, enhancement, or exploitation results. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | SLCV | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-031-00698-2 | Medium | ||
Area | Expedition | Conference | |||
Notes | MSIAU; MACO | Approved | no | ||
Call Number | Admin @ si @ TSH2022b | Serial | 3805 | ||
Permanent link to this record | |||||
Author | Michael Teutsch; Angel Sappa; Riad I. Hammoud | ||||
Title | Detection, Classification, and Tracking | Type | Book Chapter | ||
Year | 2022 | Publication | Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 35-58 | ||
Keywords | |||||
Abstract | Automatic image and video exploitation or content analysis is a technique to extract higher-level information from a scene such as objects, behavior, (inter-)actions, environment, or even weather conditions. The relevant information is assumed to be contained in the two-dimensional signal provided in an image (width and height in pixels) or the three-dimensional signal provided in a video (width, height, and time). But also intermediate-level information such as object classes [196], locations [197], or motion [198] can help applications to fulfill certain tasks such as intelligent compression [199], video summarization [200], or video retrieval [201]. Usually, videos with their temporal dimension are a richer source of data compared to single images [202] and thus certain video content can be extracted from videos only such as object motion or object behavior. Often, machine learning or nowadays deep learning techniques are utilized to model prior knowledge about object or scene appearance using labeled training samples [203, 204]. After a learning phase, these models are then applied in real world applications, which is called inference. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | SLCV | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-3-031-00698-2 | Medium | ||
Area | Expedition | Conference | |||
Notes | MSIAU; MACO | Approved | no | ||
Call Number | Admin @ si @ TSH2022c | Serial | 3806 | ||
Permanent link to this record | |||||
Author | Michael Teutsch; Angel Sappa; Riad I. Hammoud | ||||
Title | Image and Video Enhancement | Type | Book Chapter | ||
Year | 2022 | Publication | Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 9-21 | ||
Keywords | |||||
Abstract | Image and video enhancement aims at improving the signal quality relative to imaging artifacts such as noise and blur or atmospheric perturbations such as turbulence and haze. It is usually performed in order to assist humans in analyzing image and video content or simply to present humans visually appealing images and videos. However, image and video enhancement can also be used as a preprocessing technique to ease the task and thus improve the performance of subsequent automatic image content analysis algorithms: preceding dehazing can improve object detection as shown by [23] or explicit turbulence modeling can improve moving object detection as discussed by [24]. But it remains an open question whether image and video enhancement should rather be performed explicitly as a preprocessing step or implicitly for example by feeding affected images directly to a neural network for image content analysis like object detection [25]. Especially for real-time video processing at low latency it can be better to handle image perturbation implicitly in order to minimize the processing time of an algorithm. This can be achieved by making algorithms for image content analysis robust or even invariant to perturbations such as noise or blur. Additionally, mistakes of an individual preprocessing module can obviously affect the quality of the entire processing pipeline. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | SLCV | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU; MACO | Approved | no | ||
Call Number | Admin @ si @ TSH2022a | Serial | 3807 | ||
Permanent link to this record | |||||
Author | Alex Falcon; Swathikiran Sudhakaran; Giuseppe Serra; Sergio Escalera; Oswald Lanz | ||||
Title | Relevance-based Margin for Contrastively-trained Video Retrieval Models | Type | Conference Article | ||
Year | 2022 | Publication | ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval | Abbreviated Journal | |
Volume | Issue | Pages | 146-157 | ||
Keywords | |||||
Abstract | Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access in private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space by putting similar items close and dissimilar items far. This framework leads to competitive recall rates, as they solely focus on the rank of the groundtruth items. Yet, assessing the quality of the ranking list is of utmost importance when considering intelligent retrieval systems, since multiple items may share similar semantics, hence a high relevance. Moreover, the aforementioned framework uses a fixed margin to separate similar and dissimilar items, treating all non-groundtruth items as equally irrelevant. In this paper we propose to use a variable margin: we argue that varying the margin used during training based on how much relevant an item is to a given query, i.e. a relevance-based margin, easily improves the quality of the ranking lists measured through nDCG and mAP. We demonstrate the advantages of our technique using different models on EPIC-Kitchens-100 and YouCook2. We show that even if we carefully tuned the fixed margin, our technique (which does not have the margin as a hyper-parameter) would still achieve better performance. Finally, extensive ablation studies and qualitative analysis support the robustness of our approach. Code will be released at \urlhttps://github.com/aranciokov/RelevanceMargin-ICMR22. | ||||
Address | Newwark, NJ, USA, 27 June 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICMR | ||
Notes | HuPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ FSS2022 | Serial | 3808 | ||
Permanent link to this record | |||||
Author | Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer | ||||
Title | Local Prediction Aggregation: A Frustratingly Easy Source-free Domain Adaptation Method | Type | Miscellaneous | ||
Year | 2022 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method. Code is available in this https URL. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.147 | Approved | no | ||
Call Number | Admin @ si @ YWW2022b | Serial | 3815 | ||
Permanent link to this record | |||||
Author | Ali Furkan Biten; Ruben Tito; Lluis Gomez; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | OCR-IDL: OCR Annotations for Industry Document Library Dataset | Type | Conference Article | ||
Year | 2022 | Publication | ECCV Workshop on Text in Everything | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Pretraining has proven successful in Document Intelligence tasks where deluge of documents are used to pretrain the models only later to be finetuned on downstream tasks. One of the problems of the pretraining approaches is the inconsistent usage of pretraining data with different OCR engines leading to incomparable results between models. In other words, it is not obvious whether the performance gain is coming from diverse usage of amount of data and distinct OCR engines or from the proposed models. To remedy the problem, we make public the OCR annotations for IDL documents using commercial OCR engine given their superior performance over open source OCR models. The contributed dataset (OCR-IDL) has an estimated monetary value over 20K US$. It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence. All of our data and its collection process with the annotations can be found in this https URL. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | DAG; no proj | Approved | no | ||
Call Number | Admin @ si @ BTG2022 | Serial | 3817 | ||
Permanent link to this record | |||||
Author | Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer | ||||
Title | One Ring to Bring Them All: Towards Open-Set Recognition under Domain Shift | Type | Miscellaneous | ||
Year | 2022 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | In this paper, we investigate model adaptation under domain and category shift, where the final goal is to achieve
(SF-UNDA), which addresses the situation where there exist both domain and category shifts between source and target domains. Under the SF-UNDA setting, the model cannot access source data anymore during target adaptation, which aims to address data privacy concerns. We propose a novel training scheme to learn a ( +1)-way classifier to predict the source classes and the unknown class, where samples of only known source categories are available for training. Furthermore, for target adaptation, we simply adopt a weighted entropy minimization to adapt the source pretrained model to the unlabeled target domain without source data. In experiments, we show: After source training, the resulting source model can get excellent performance for ; After target adaptation, our method surpasses current UNDA approaches which demand source data during adaptation. The versatility to several different tasks strongly proves the efficacy and generalization ability of our method. When augmented with a closed-set domain adaptation approach during target adaptation, our source-free method further outperforms the current state-of-the-art UNDA method by 2.5%, 7.2% and 13% on Office-31, Office-Home and VisDA respectively. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; no proj | Approved | no | ||
Call Number | Admin @ si @ YWW2022c | Serial | 3818 | ||
Permanent link to this record | |||||
Author | Saiping Zhang, Luis Herranz, Marta Mrak, Marc Gorriz Blanch, Shuai Wan, Fuzheng Yang | ||||
Title | PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation-and Attention-based Network | Type | Miscellaneous | ||
Year | 2022 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for enhancing perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. Then extracted features are fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module which leverages the corresponding QP information. In this way, a single model can be used to enhance adaptively to various QPs without requiring multiple models specific for every QP value, while having similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MACO; no proj | Approved | no | ||
Call Number | Admin @ si @ ZHM2022b | Serial | 3819 | ||
Permanent link to this record |