Records |
Author |
Hao Fang; Ajian Liu; Jun Wan; Sergio Escalera; Hugo Jair Escalante; Zhen Lei |
Title |
Surveillance Face Presentation Attack Detection Challenge |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
6360-6370 |
Keywords |
|
Abstract |
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, most studies have not considered long-distance scenarios. Compared with FAS in traditional settings such as phone unlocking, face payment, and self-service security inspection, FAS in long-distance scenarios such as station squares, parks, and self-service supermarkets is equally important but has not yet been sufficiently explored. To fill this gap in the FAS community, we collect a large-scale Surveillance High-Fidelity Mask dataset (SuHiFiMask). SuHiFiMask contains 10,195 videos from 101 subjects of different age groups, collected with 7 mainstream surveillance cameras. Based on this dataset and Protocol 3, which evaluates the robustness of algorithms under quality changes, we organized a face presentation attack detection challenge in surveillance scenarios. It attracted 180 teams in the development phase, of which 37 teams qualified for the final round. The organizing team re-verified and re-ran the submitted code and used the results as the final ranking. In this paper, we present an overview of the challenge, including an introduction to the dataset used, the definition of the protocol, the evaluation metrics, and the announcement of the competition results. Finally, we present the top-ranked algorithms and the research directions the competition suggests for attack detection in long-range surveillance scenarios. |
Address |
Vancouver; Canada; June 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPRW |
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ FLW2023 |
Serial |
3917 |
Permanent link to this record |
|
|
|
Author |
Galadrielle Humblot-Renaux; Sergio Escalera; Thomas B. Moeslund |
Title |
Beyond AUROC & co. for evaluating out-of-distribution detection performance |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
3880-3889 |
Keywords |
|
Abstract |
While there has been a growing research interest in developing out-of-distribution (OOD) detection methods, there has been comparably little discussion around how these methods should be evaluated. Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs. In this work, we take a closer look at the go-to metrics for evaluating OOD detection, and question the approach of exclusively reducing OOD detection to a binary classification task with little consideration for the detection threshold. We illustrate the limitations of current metrics (AUROC & its friends) and propose a new metric, the Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between ID and OOD samples. Scripts and data are available at https://github.com/glhr/beyond-auroc |
Address |
Vancouver; Canada; June 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPRW |
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ HEM2023 |
Serial |
3918 |
Permanent link to this record |
|
|
|
Author |
Dong Wang; Jia Guo; Qiqi Shao; Haochi He; Zhian Chen; Chuanbao Xiao; Ajian Liu; Sergio Escalera; Hugo Jair Escalante; Zhen Lei; Jun Wan; Jiankang Deng |
Title |
Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
6379-6390 |
Keywords |
|
Abstract |
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems. Despite substantial advancements, the generalization of existing approaches to real-world applications remains challenging. This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets, which often leads to overfitting during training or saturation during testing. In terms of quantity, the number of spoof subjects is a critical determinant. Most datasets comprise fewer than 2,000 subjects. With regard to diversity, the majority of datasets consist of spoof samples collected in controlled environments using repetitive, mechanical processes. This data collection methodology results in homogenized samples and a dearth of scenario diversity. To address these shortcomings, we introduce the Wild Face Anti-Spoofing (WFAS) dataset, a large-scale, diverse FAS dataset collected in unconstrained settings. Our dataset encompasses 853,729 images of 321,751 spoof subjects and 529,571 images of 148,169 live subjects, representing a substantial increase in quantity. Moreover, our dataset incorporates spoof data obtained from the internet, spanning a wide array of scenarios and various commercial sensors, including 17 presentation attacks (PAs) that encompass both 2D and 3D forms. This novel data collection strategy markedly enhances FAS data diversity. Leveraging the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR 2023 workshop. Additionally, we meticulously evaluate representative methods using Protocol 1 and Protocol 2 (Unknown-Type). Through an in-depth examination of the challenge outcomes and benchmark baselines, we provide insightful analyses and propose potential avenues for future research. The dataset is released under InsightFace. |
Address |
Vancouver; Canada; June 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPRW |
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ WGS2023 |
Serial |
3919 |
Permanent link to this record |
|
|
|
Author |
Senmao Li; Joost Van de Weijer; Yaxing Wang; Fahad Shahbaz Khan; Meiqin Liu; Jian Yang |
Title |
3D-Aware Multi-Class Image-to-Image Translation with NeRFs |
Type |
Conference Article |
Year |
2023 |
Publication |
36th IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
12652-12662 |
Keywords |
|
Abstract |
Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However, no prior works investigate 3D-aware GANs for 3D-consistent multiclass image-to-image (3D-aware I2I) translation. Naively using 2D-I2I translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass I2I translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture that preserves view-consistency, we construct a 3D-aware I2I translation system. To further reduce the view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constraint, and a relative regularization loss. In extensive experiments on two datasets, quantitative and qualitative results demonstrate that we successfully perform 3D-aware I2I translation with multi-view consistency. Code is available in 3DI2I. |
Address |
Vancouver; Canada; June 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPR |
Notes |
LAMP |
Approved |
no |
Call Number |
Admin @ si @ LWW2023b |
Serial |
3920 |
Permanent link to this record |
|
|
|
Author |
Hugo Bertiche; Niloy J Mitra; Kuldeep Kulkarni; Chun Hao Paul Huang; Tuanfeng Y Wang; Meysam Madadi; Sergio Escalera; Duygu Ceylan |
Title |
Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images |
Type |
Conference Article |
Year |
2023 |
Publication |
36th IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
459-468 |
Keywords |
|
Abstract |
Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We investigate the problem in the context of dressed humans in the wind. At the core of our method is a novel cyclic neural network that produces looping cinemagraphs for the target loop duration. To circumvent the problem of collecting real data, we demonstrate that it is possible, by working in the image normal space, to learn garment motion dynamics on synthetic data and generalize to real data. We evaluate our method on both synthetic and real data and demonstrate that it is possible to create compelling and plausible cinemagraphs from single RGB images. |
Address |
Vancouver; Canada; June 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPR |
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ BMK2023 |
Serial |
3921 |
Permanent link to this record |
|
|
|
Author |
Yael Tudela; Ana Garcia Rodriguez; Gloria Fernandez Esparrach; Jorge Bernal |
Title |
Towards Fine-Grained Polyp Segmentation and Classification |
Type |
Conference Article |
Year |
2023 |
Publication |
Workshop on Clinical Image-Based Procedures |
Abbreviated Journal |
|
Volume |
14242 |
Issue |
|
Pages |
32-42 |
Keywords |
Medical image segmentation; Colorectal Cancer; Vision Transformer; Classification |
Abstract |
Colorectal cancer is one of the main causes of cancer death worldwide. Colonoscopy is the gold standard screening tool as it allows lesion detection and removal during the same procedure. During the last decades, several efforts have been made to develop CAD systems to assist clinicians in lesion detection and classification. Regarding the latter, and in order to be used in the exploration room as part of resect-and-discard or leave-in-situ strategies, these systems must correctly identify all the different lesion types. This is a challenging task, as the data used to train these systems presents great inter-class similarity, high class imbalance, and low representation of clinically relevant histology classes such as serrated sessile adenomas.
In this paper, a new polyp segmentation and classification method, Swin-Expand, is introduced. Based on Swin-Transformer, it uses a simple and lightweight decoder. The performance of this method has been assessed on a novel dataset comprising 1126 high-definition images representing the three main histological classes. Results show a clear improvement in both segmentation and classification performance, also achieving competitive results when tested on public datasets. These results confirm that both the method and the data are important to obtain more accurate polyp representations. |
Address |
Vancouver; October 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
MICCAIW |
Notes |
ISE |
Approved |
no |
Call Number |
Admin @ si @ TGF2023 |
Serial |
3837 |
Permanent link to this record |
|
|
|
Author |
Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar |
Title |
Watching the News: Towards VideoQA Models that can Read |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. Current VideoQA approaches ignore the textual information present in the video. Instead, we argue that textual information is complementary to the action and provides essential contextualisation cues to the reasoning process. To this end, we propose a novel VideoQA task that requires reading and understanding the text in the video. To explore this direction, we focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the "NewsVideoQA" dataset that comprises more than 8,600 QA pairs on 3,000+ news videos obtained from diverse news channels from around the world. We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods. |
Address |
Waikoloa; Hawai; USA; January 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ JMK2023 |
Serial |
3899 |
Permanent link to this record |
|
|
|
Author |
Marcos V Conde; Florin Vasluianu; Javier Vazquez; Radu Timofte |
Title |
Perceptual image enhancement for smartphone real-time applications |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
1848-1858 |
Keywords |
|
Abstract |
Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones. Our experiments show that, with far fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images in under 1 second on mid-level commercial smartphones. |
Address |
Waikoloa; Hawai; USA; January 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
MACO; CIC |
Approved |
no |
Call Number |
Admin @ si @ CVV2023 |
Serial |
3900 |
Permanent link to this record |
|
|
|
Author |
Dipam Goswami; J Schuster; Joost Van de Weijer; Didier Stricker |
Title |
Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
3195-3204 |
Keywords |
|
Abstract |
|
Address |
Waikoloa; Hawai; USA; January 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACV |
Notes |
LAMP |
Approved |
no |
Call Number |
Admin @ si @ GSW2023 |
Serial |
3901 |
Permanent link to this record |
|
|
|
Author |
Mickael Cormier; Andreas Specker; Julio C. S. Jacques; Lucas Florin; Jurgen Metzler; Thomas B. Moeslund; Kamal Nasrollahi; Sergio Escalera; Jurgen Beyerer |
Title |
UPAR Challenge: Pedestrian Attribute Recognition and Attribute-based Person Retrieval – Dataset, Design, and Results |
Type |
Conference Article |
Year |
2023 |
Publication |
2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
166-175 |
Keywords |
|
Abstract |
In civilian video security monitoring, retrieving and tracking a person of interest often rely on witness testimony and their appearance description. Deployed systems rely on a large amount of annotated training data and are expected to show consistent performance in diverse areas and generalize well between diverse settings w.r.t. different viewpoints, illumination, resolution, occlusions, and poses for indoor and outdoor scenes. However, for such generalization, the system would require a large amount of varied annotated data for training and evaluation. The WACV 2023 Pedestrian Attribute Recognition and Attribute-based Person Retrieval Challenge (UPAR-Challenge) aimed to spotlight the problem of domain gaps in a real-world surveillance context and highlight the challenges and limitations of existing methods. The UPAR dataset, composed of 40 important binary attributes over 12 attribute categories across four datasets, was extended with data captured from a low-flying UAV from the P-DESTRE dataset. To this aim, 0.6M additional annotations were manually labeled and validated. Each track evaluated the robustness of the competing methods to domain shifts by training on limited data from a specific domain and evaluating on data from unseen domains. The challenge attracted 41 registered participants, but only one team managed to outperform the baseline on one track, emphasizing the task's difficulty. This work describes the challenge design, the adopted dataset, the obtained results, as well as future directions on the topic. |
Address |
Waikoloa; Hawai; USA; January 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
WACVW |
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ CSJ2023 |
Serial |
3902 |
Permanent link to this record |
|
|
|
Author |
Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas |
Title |
Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
Volume |
37 |
Issue |
2 |
Pages |
1940-1948 |
Keywords |
|
Abstract |
Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context, allowing us to explore the limits of the model in adjusting captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results in significantly improved models. Furthermore, we verify that a model pre-trained on Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model. |
Address |
Washington; USA; February 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
AAAI |
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ NBM2023 |
Serial |
3860 |
Permanent link to this record |