toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Mingyi Yang; Luis Herranz; Fei Yang; Luka Murn; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang; Marta Mrak edit  url
doi  openurl
  Title (up) Semantic Preprocessor for Image Compression for Machines Type Conference Article
  Year 2023 Publication IEEE International Conference on Acoustics, Speech and Signal Processing Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Visual content is being increasingly transmitted and consumed by machines rather than humans to perform automated content analysis tasks. In this paper, we propose an image preprocessor that optimizes the input image for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. To achieve a better trade-off between the accuracy of the machine analysis task and bitrate, we propose leveraging pre-extracted semantic information to improve the preprocessor’s ability to accurately identify and filter out task-irrelevant information. Furthermore, we propose a two-part loss function to optimize the preprocessor, consisted of a rate-task performance loss and a semantic distillation loss, which helps the reconstructed image obtain more information that contributes to the accuracy of the task. Experiments show that the proposed preprocessor can save up to 48.83% bitrate compared with the method without the preprocessor, and save up to 36.24% bitrate compared to existing preprocessors for machine vision.  
  Address Rodhes Islands; Greece; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICASSP  
  Notes MACO; LAMP Approved no  
  Call Number Admin @ si @ YHY2023 Serial 3912  
Permanent link to this record
 

 
Author Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas edit  url
openurl 
  Title (up) Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages 1940-1948  
  Keywords  
  Abstract Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.  
  Address Washington; USA; February 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference AAAI  
  Notes DAG Approved no  
  Call Number Admin @ si @ NBM2023 Serial 3860  
Permanent link to this record
 

 
Author Parichehr Behjati; Pau Rodriguez; Carles Fernandez; Isabelle Hupont; Armin Mehri; Jordi Gonzalez edit  url
openurl 
  Title (up) Single image super-resolution based on directional variance attention network Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 133 Issue Pages 108997  
  Keywords  
  Abstract Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raise computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. In order to address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts most relevant features from the network for improving the final output and preventing information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art in several datasets, while maintaining relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ BPF2023 Serial 3861  
Permanent link to this record
 

 
Author Anthony Cioppa; Silvio Giancola; Vladimir Somers; Floriane Magera; Xin Zhou; Hassan Mkhallati; Adrien Deliège; Jan Held; Carlos Hinojosa; Amir M. Mansourian; Pierre Miralles; Olivier Barnich; Christophe De Vleeschouwer; Alexandre Alahi; Bernard Ghanem; Marc Van Droogenbroeck; Abdullah Kamal; Adrien Maglo; Albert Clapes; Amr Abdelaziz; Artur Xarles; Astrid Orcesi; Atom Scott; Bin Liu; Byoungkwon Lim; Chen Chen; Fabian Deuser; Feng Yan; Fufu Yu; Gal Shitrit; Guanshuo Wang; Gyusik Choi; Hankyul Kim; Hao Guo; Hasby Fahrudin; Hidenari Koguchi; Håkan Ardo; Ibrahim Salah; Ido Yerushalmy; Iftikar Muhammad; Ikuma Uchida; Ishay Beery; Jaonary Rabarisoa; Jeongae Lee; Jiajun Fu; Jianqin Yin; Jinghang Xu; Jongho Nang; Julien Denize; Junjie Li; Junpei Zhang; Juntae Kim; Kamil Synowiec; Kenji Kobayashi; Kexin Zhang; Konrad Habel; Kota Nakajima; Licheng Jiao; Lin Ma; Lizhi Wang; Luping Wang; Menglong Li; Mengying Zhou; Mohamed Nasr; Mohamed Abdelwahed; Mykola Liashuha; Nikolay Falaleev; Norbert Oswald; Qiong Jia; Quoc-Cuong Pham; Ran Song; Romain Herault; Rui Peng; Ruilong Chen; Ruixuan Liu; Ruslan Baikulov; Ryuto Fukushima; Sergio Escalera; Seungcheon Lee; Shimin Chen; Shouhong Ding; Taiga Someya; Thomas B. Moeslund; Tianjiao Li; Wei Shen; Wei Zhang; Wei Li; Wei Dai; Weixin Luo; Wending Zhao; Wenjie Zhang; Xinquan Yang; Yanbiao Ma; Yeeun Joo; Yingsen Zeng; Yiyang Gan; Yongqiang Zhu; Yujie Zhong; Zheng Ruan; Zhiheng Li; Zhijian Huang; Ziyu Meng edit   pdf
url  openurl
  Title (up) SoccerNet 2023 Challenges Results Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards are available on this https URL. Baselines and development kits can be found on this https URL.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ CGS2023 Serial 3991  
Permanent link to this record
 

 
Author Armin Mehri; Parichehr Behjati; Dario Carpio; Angel Sappa edit  url
doi  openurl
  Title (up) SRFormer: Efficient Yet Powerful Transformer Network for Single Image Super Resolution Type Journal Article
  Year 2023 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 11 Issue Pages  
  Keywords  
  Abstract Recent breakthroughs in single image super resolution have investigated the potential of deep Convolutional Neural Networks (CNNs) to improve performance. However, CNNs based models suffer from their limited fields and their inability to adapt to the input content. Recently, Transformer based models were presented, which demonstrated major performance gains in Natural Language Processing and Vision tasks while mitigating the drawbacks of CNNs. Nevertheless, Transformer computational complexity can increase quadratically for high-resolution images, and the fact that it ignores the original structures of the image by converting them to the 1D structure can make it problematic to capture the local context information and adapt it for real-time applications. In this paper, we present, SRFormer, an efficient yet powerful Transformer-based architecture, by making several key designs in the building of Transformer blocks and Transformer layers that allow us to consider the original structure of the image (i.e., 2D structure) while capturing both local and global dependencies without raising computational demands or memory consumption. We also present a Gated Multi-Layer Perceptron (MLP) Feature Fusion module to aggregate the features of different stages of Transformer blocks by focusing on inter-spatial relationships while adding minor computational costs to the network. We have conducted extensive experiments on several super-resolution benchmark datasets to evaluate our approach. SRFormer demonstrates superior performance compared to state-of-the-art methods from both Transformer and Convolutional networks, with an improvement margin of 0.1∼0.53dB . Furthermore, while SRFormer has almost the same model size, it outperforms SwinIR by 0.47% and inference time by half the time of SwinIR. The code will be available on GitHub.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ MBC2023 Serial 3887  
Permanent link to this record
 

 
Author Senmao Li; Joost van de Weijer; Taihang Hu; Fahad Shahbaz Khan; Qibin Hou; Yaxing Wang; Jian Yang edit   pdf
url  openurl
  Title (up) StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) Unsatisfying results for selected regions, and unexpected changes in nonselected regions. (2) They require careful text prompt editing where the prompt should include all visual objects in the input image. To address this, we propose two improvements: (1) Only optimizing the input of the value linear network in the cross-attention layers, is sufficiently powerful to reconstruct a real image. (2) We propose attention regularization to preserve the object-like attention maps after editing, enabling us to obtain accurate style editing without invoking significant structural changes. We further improve the editing technique which is used for the unconditional branch of classifier-free guidance, as well as the conditional one as used by P2P. Extensive experimental prompt-editing results on a variety of images, demonstrate qualitatively and quantitatively that our method has superior editing capabilities than existing and concurrent works.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ LWH2023 Serial 3870  
Permanent link to this record
 

 
Author Hao Fang; Ajian Liu; Jun Wan; Sergio Escalera; Hugo Jair Escalante; Zhen Lei edit  url
doi  openurl
  Title (up) Surveillance Face Presentation Attack Detection Challenge Type Conference Article
  Year 2023 Publication Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal  
  Volume Issue Pages 6360-6370  
  Keywords  
  Abstract Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, most of the studies lacked consideration of long-distance scenarios. Specifically, compared with FAS in traditional scenes such as phone unlocking, face payment, and self-service security inspection, FAS in long-distance such as station squares, parks, and self-service supermarkets are equally important, but it has not been sufficiently explored yet. In order to fill this gap in the FAS community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask). SuHiFiMask contains 10,195 videos from 101 subjects of different age groups, which are collected by 7 mainstream surveillance cameras. Based on this dataset and protocol-3 for evaluating the robustness of the algorithm under quality changes, we organized a face presentation attack detection challenge in surveillance scenarios. It attracted 180 teams for the development phase with a total of 37 teams qualifying for the final round. The organization team re-verified and re-ran the submitted code and used the results as the final ranking. In this paper, we present an overview of the challenge, including an introduction to the dataset used, the definition of the protocol, the evaluation metrics, and the announcement of the competition results. Finally, we present the top-ranked algorithms and the research ideas provided by the competition for attack detection in long-range surveillance scenarios.  
  Address Vancouver; Canada; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes HuPBA Approved no  
  Call Number Admin @ si @ FLW2023 Serial 3917  
Permanent link to this record
 

 
Author Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal edit  url
openurl 
  Title (up) SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14187 Issue Pages 307–325  
  Keywords  
  Abstract Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing for their understanding. In this paper, we present a unified transformer encoder-decoder architecture for en-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed query selection for anchor initialization in the decoder. Later on, it performs a dot product between the obtained query embeddings and the pixel embedding map (coming from the encoder) for semantic reasoning. Extensive experimentation on competitive benchmarks like PubLayNet, PRIMA, Historical Japanese (HJ), and TableBank demonstrate that our model with SwinL backbone achieves better segmentation performance than the existing state-of-the-art approaches with the average precision of 93.72, 54.39, 84.65 and 98.04 respectively under one billion parameters. The code is made publicly available at: github.com/ayanban011/SwinDocSegmenter .  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ BBL2023 Serial 3893  
Permanent link to this record
 

 
Author Jose Luis Gomez edit  openurl
  Title (up) Synth-to-real semi-supervised learning for visual tasks Type Book Whole
  Year 2023 Publication Going beyond Classification Problems for the Continual Learning of Deep Neural Networks Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The curse of data labeling is a costly bottleneck in supervised deep learning, where large amounts of labeled data are needed to train intelligent systems. In onboard perception for autonomous driving, this cost corresponds to the labeling of raw data from sensors such as cameras, LiDARs, RADARs, etc. Therefore, synthetic data with automatically generated ground truth (labels) has aroused as a reliable alternative for training onboard perception models.
However, synthetic data commonly suffers from synth-to-real domain shift, i.e., models trained on the synthetic domain do not show their achievable accuracy when performing in the real world. This shift needs to be addressed by techniques falling in the realm of domain adaptation (DA).
The semi-supervised learning (SSL) paradigm can be followed to address DA. In this case, a model is trained using source data with labels (here synthetic) and leverages minimal knowledge from target data (here the real world) to generate pseudo-labels. These pseudo-labels help the training process to reduce the gap between the source and the target domains. In general, we can assume accessing both, pseudo-labels and a few amounts of human-provided labels for the target-domain data. However, the most interesting and challenging setting consists in assuming that we do not have human-provided labels at all. This setting is known as unsupervised domain adaptation (UDA). This PhD focuses on applying SSL to the UDA setting, for onboard visual tasks related to autonomous driving. We start by addressing the synth-to-real UDA problem on onboard vision-based object detection (pedestrians and cars), a critical task for autonomous driving and driving assistance. In particular, we propose to apply an SSL technique known as co-training, which we adapt to work with deep models that process a multi-modal input. The multi-modality consists of the visual appearance of the images (RGB) and their monocular depth estimation. The synthetic data we use as the source domain contains both, object bounding boxes and depth information. This prior knowledge is the
starting point for the co-training technique, which iteratively labels unlabeled real-world data and uses such pseudolabels (here bounding boxes with an assigned object class) to progressively improve the labeling results. Along this
process, two models collaborate to automatically label the images, in a way that one model compensates for the errors of the other, so avoiding error drift. While this automatic labeling process is done offline, the resulting pseudolabels can be used to train object detection models that must perform in real-time onboard a vehicle. We show that multi-modal co-training improves the labeling results compared to single-modal co-training, remaining competitive compared to human labeling.
Given the success of co-training in the context of object detection, we have also adapted this technique to a more crucial and challenging visual task, namely, onboard semantic segmentation. In fact, providing labels for a single image
can take from 30 to 90 minutes for a human labeler, depending on the content of the image. Thus, developing automatic labeling techniques for this visual task is of great interest to the automotive industry. In particular, the new co-training framework addresses synth-to-real UDA by an initial stage of self-training. Intermediate models arising from this stage are used to start the co-training procedure, for which we have elaborated an accurate collaboration policy between the two models performing the automatic labeling. Moreover, our co-training seamlessly leverages datasets from different synthetic domains. In addition, the co-training procedure is agnostic to the loss function used to train the semantic segmentation models which perform the automatic labeling. We achieve state-of-the-art results on publicly available benchmark datasets, again, remaining competitive compared to human labeling.
Finally, on the ground of our previous experience, we have designed and implemented a new SSL technique for UDA in the context of visual semantic segmentation. In this case, we mimic the labeling methodology followed by human labelers. In particular, rather than labeling full images at a time, categories of semantic classes are defined and only those are labeled in a labeling pass. In fact, different human labelers can become specialists in labeling different categories. Afterward, these per-category-labeled layers are combined to provide fully labeled images. Our technique is inspired by this methodology since we perform synth-to-real UDA per category, using the self-training stage previously developed as part of our co-training framework. The pseudo-labels obtained for each category are finally
fused to obtain fully automatically labeled images. In this context, we have also contributed to the development of a new photo-realistic synthetic dataset based on path-tracing rendering. Our new SSL technique seamlessly leverages publicly available synthetic datasets as well as this new one to obtain state-of-the-art results on synth-to-real UDA for semantic segmentation. We show that the new dataset allows us to reach better labeling accuracy than previously existing datasets, at the same time that it complements well them when combined. Moreover, we also show that the new human-inspired SSL technique outperforms co-training.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Gom2023 Serial 3961  
Permanent link to this record
 

 
Author Damian Sojka; Yuyang Liu; Dipam Goswami; Sebastian Cygert; Bartłomiej Twardowski; Joost van de Weijer edit   pdf
url  openurl
  Title (up) Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The goal of the challenge is to develop a test-time adaptation (TTA) method, which could adapt the model to gradually changing domains in video sequences for semantic segmentation task. It is based on a synthetic driving video dataset – SHIFT. The source model is trained on images taken during daytime in clear weather. Domain changes at test-time are mainly caused by varying weather conditions and times of day. The TTA methods are evaluated in each image sequence (video) separately, meaning the model is reset to the source model state before the next sequence. Images come one by one and a prediction has to be made at the arrival of each frame. Each sequence is composed of 401 images and starts with the source domain, then gradually drifts to a different one (changing weather or time of day) until the middle of the sequence. In the second half of the sequence, the domain gradually shifts back to the source one. Ground truth data is available only for the validation split of the SHIFT dataset, in which there are only six sequences that start and end with the source domain. We conduct an analysis specifically on those sequences. Ground truth data for test split, on which the developed TTA methods are evaluated for leader board ranking, are not publicly available.
The proposed solution secured a 3rd place in a challenge and received an innovation award. Contrary to the solutions that scored better, we did not use any external pretrained models or specialized data augmentations, to keep the solutions as general as possible. We have focused on analyzing the distributional shift and developing a method that could adapt to changing data dynamics and generalize across different scenarios.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ SLG2023 Serial 3993  
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas edit  url
openurl 
  Title (up) Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages  
  Keywords Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning  
  Abstract In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference AAAI  
  Notes DAG Approved no  
  Call Number Admin @ si @ SBM2023 Serial 3848  
Permanent link to this record
 

 
Author Matej Kristan; Jiri Matas; Martin Danelljan; Michael Felsberg; Hyung Jin Chang; Luka Cehovin Zajc; Alan Lukezic; Ondrej Drbohlav; Zhongqun Zhang; Khanh-Tung Tran; Xuan-Son Vu; Johanna Bjorklund; Christoph Mayer; Yushan Zhang; Lei Ke; Jie Zhao; Gustavo Fernandez; Noor Al-Shakarji; Dong An; Michael Arens; Stefan Becker; Goutam Bhat; Sebastian Bullinger; Antoni B. Chan; Shijie Chang; Hanyuan Chen; Xin Chen; Yan Chen; Zhenyu Chen; Yangming Cheng; Yutao Cui; Chunyuan Deng; Jiahua Dong; Matteo Dunnhofer; Wei Feng; Jianlong Fu; Jie Gao; Ruize Han; Zeqi Hao; Jun-Yan He; Keji He; Zhenyu He; Xiantao Hu; Kaer Huang; Yuqing Huang; Yi Jiang; Ben Kang; Jin-Peng Lan; Hyungjun Lee; Chenyang Li; Jiahao Li; Ning Li; Wangkai Li; Xiaodi Li; Xin Li; Pengyu Liu; Yue Liu; Huchuan Lu; Bin Luo; Ping Luo; Yinchao Ma; Deshui Miao; Christian Micheloni; Kannappan Palaniappan; Hancheol Park; Matthieu Paul; HouWen Peng; Zekun Qian; Gani Rahmon; Norbert Scherer-Negenborn; Pengcheng Shao; Wooksu Shin; Elham Soltani Kazemi; Tianhui Song; Rainer Stiefelhagen; Rui Sun; Chuanming Tang; Zhangyong Tang; Imad Eddine Toubal; Jack Valmadre; Joost van de Weijer; Luc Van Gool; Jash Vira; Stephane Vujasinovic; Cheng Wan; Jia Wan; Dong Wang; Fei Wang; Feifan Wang; He Wang; Limin Wang; Song Wang; Yaowei Wang; Zhepeng Wang; Gangshan Wu; Jiannan Wu; Qiangqiang Wu; Xiaojun Wu; Anqi Xiao; Jinxia Xie; Chenlong Xu; Min Xu; Tianyang Xu; Yuanyou Xu; Bin Yan; Dawei Yang; Ming-Hsuan Yang; Tianyu Yang; Yi Yang; Zongxin Yang; Xuanwu Yin; Fisher Yu; Hongyuan Yu; Qianjin Yu; Weichen Yu; YongSheng Yuan; Zehuan Yuan; Jianlin Zhang; Lu Zhang; Tianzhu Zhang; Guodongfang Zhao; Shaochuan Zhao; Yaozong Zheng; Bineng Zhong; Jiawen Zhu; Xuefeng Zhu; Yueting Zhuang; ChengAo Zong; Kunlong Zuo edit   pdf
url  openurl
  Title (up) The First Visual Object Tracking Segmentation VOTS2023 Challenge Results Type Conference Article
  Year 2023 Publication Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops Abbreviated Journal  
  Volume Issue Pages 1796-1818  
  Keywords  
  Abstract The Visual Object Tracking Segmentation VOTS2023 challenge is the eleventh annual tracker benchmarking activity of the VOT initiative. This challenge is the first to merge short-term and long-term as well as single-target and multiple-target tracking with segmentation masks as the only target location specification. A new dataset was created; the ground truth has been withheld to prevent overfitting. New performance measures and evaluation protocols have been created along with a new toolkit and an evaluation server. Results of the presented 47 trackers indicate that modern tracking frameworks are well-suited to deal with convergence of short-term and long-term tracking and that multiple and single target tracking can be considered a single problem. A leaderboard, with participating trackers details, the source code, the datasets, and the evaluation kit are publicly available at the challenge website\footnote https://www.votchallenge.net/vots2023/.  
  Address Paris; France; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCVW  
  Notes LAMP Approved no  
  Call Number Admin @ si @ KMD2023 Serial 3939  
Permanent link to this record
 

 
Author Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla; Chenyang Wang; Junjun Jiang; Xianming Liu; Zhiwei Zhong; Dai Bin; Li Ruodi; Li Shengye edit  url
doi  openurl
  Title (up) Thermal Image Super-Resolution Challenge Results-PBVS 2023 Type Conference Article
  Year 2023 Publication Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal  
  Volume Issue Pages 470-478  
  Keywords  
  Abstract This paper presents the results of two tracks from the fourth Thermal Image Super-Resolution (TISR) challenge, held at the Perception Beyond the Visible Spectrum (PBVS) 2023 workshop. Track-1 uses the same thermal image dataset as previous challenges, with 951 training images and 50 validation images at each resolution. In this track, two evaluations were conducted: the first consists of generating a SR image from a HR thermal noisy image downsampled by four, and the second consists of generating a SR image from a mid-resolution image and compare it with its semi-registered HR image (acquired with another camera). The results of Track-1 outperformed those from last year’s challenge. On the other hand, Track-2 uses a new acquired dataset consisting of 160 registered visible and thermal images of the same scenario for training and 30 validation images. This year, more than 150 teams participated in the challenge tracks, demonstrating the community’s ongoing interest in this topic.  
  Address Vancouver; Canada; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ RSV2023 Serial 3914  
Permanent link to this record
 

 
Author Xavier Soria; Yachuan Li; Mohammad Rouhani; Angel Sappa edit   pdf
url  openurl
  Title (up) Tiny and Efficient Model for the Edge Detection Generalization Type Conference Article
  Year 2023 Publication Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Most high-level computer vision tasks rely on low-level image operations as their initial processes. Operations such as edge detection, image enhancement, and super-resolution, provide the foundations for higher level image analysis. In this work we address the edge detection considering three main objectives: simplicity, efficiency, and generalization since current state-of-the-art (SOTA) edge detection models are increased in complexity for better accuracy. To achieve this, we present Tiny and Efficient Edge Detector (TEED), a light convolutional neural network with only 58K parameters, less than 0:2% of the state-of-the-art models. Training on the BIPED dataset takes less than 30 minutes, with each epoch requiring less than 5 minutes. Our proposed model is easy to train and it quickly converges within very first few epochs, while the predicted edge-maps are crisp and of high quality. Additionally, we propose a new dataset to test the generalization of edge detection, which comprises samples from popular images used in edge detection and image segmentation. The source code is available in https://github.com/xavysp/TEED.  
  Address Paris; France; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCVW  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SLR2023 Serial 3941  
Permanent link to this record
 

 
Author Armin Mehri; Parichehr Behjati; Angel Sappa edit  url
openurl 
  Title (up) TnTViT-G: Transformer in Transformer Network for Guidance Super Resolution Type Journal Article
  Year 2023 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 11 Issue Pages 11529-11540  
  Keywords  
  Abstract Image Super Resolution is a potential approach that can improve the image quality of low-resolution optical sensors, leading to improved performance in various industrial applications. It is important to emphasize that most state-of-the-art super resolution algorithms often use a single channel of input data for training and inference. However, this practice ignores the fact that the cost of acquiring high-resolution images in various spectral domains can differ a lot from one another. In this paper, we attempt to exploit complementary information from a low-cost channel (visible image) to increase the image quality of an expensive channel (infrared image). We propose a dual stream Transformer-based super resolution approach that uses the visible image as a guide to super-resolve another spectral band image. To this end, we introduce Transformer in Transformer network for Guidance super resolution, named TnTViT-G, an efficient and effective method that extracts the features of input images via different streams and fuses them together at various stages. In addition, unlike other guidance super resolution approaches, TnTViT-G is not limited to a fixed upsample size and it can generate super-resolved images of any size. Extensive experiments on various datasets show that the proposed model outperforms other state-of-the-art super resolution approaches. TnTViT-G surpasses state-of-the-art methods by up to 0.19∼2.3dB , while it is memory efficient.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ MBS2023 Serial 3876  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: