Author Lei Li; Fuping Wu; Sihan Wang; Xinzhe Luo; Carlos Martin-Isla; Shuwei Zhai; Jianpeng Zhang; Yanfei Liu; Zhen Zhang; Markus J. Ankenbrand; Haochuan Jiang; Xiaoran Zhang; Linhong Wang; Tewodros Weldebirhan Arega; Elif Altunok; Zhou Zhao; Feiyan Li; Jun Ma; Xiaoping Yang; Elodie Puybareau; Ilkay Oksuz; Stephanie Bricq; Weisheng Li; Kumaradevan Punithakumar; Sotirios A. Tsaftaris; Laura M. Schreiber; Mingjing Yang; Guocai Liu; Yong Xia; Guotai Wang; Sergio Escalera; Xiahai Zhuang
  Title MyoPS: A benchmark of myocardial pathology segmentation combining three-sequence cardiac magnetic resonance images Type Journal Article
  Year 2023 Publication Medical Image Analysis Abbreviated Journal MIA  
  Volume 87 Issue Pages 102808  
  Keywords  
  Abstract Assessment of myocardial viability is essential in diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on the myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge, in conjunction with MICCAI 2020. Note that MyoPS refers to both myocardial pathology segmentation and the challenge in this paper. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles and explore the potential of solutions, as well as to provide a benchmark for future research. The average Dice scores of submitted algorithms were and for myocardial scars and edema, respectively. We conclude that while promising results have been reported, the research is still in the early stage, and more in-depth exploration is needed before a successful application to the clinics. MyoPS data and evaluation tool continue to be publicly available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ LWW2023a Serial 3878  
Permanent link to this record
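The record above reports the challenge's Dice scores for scar and edema segmentation. As a worked illustration of that metric only (not code from the challenge), here is a minimal Dice computation over integer-label masks, assuming NumPy; the label ids used below are hypothetical.

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    """Dice coefficient for one label in two integer-valued segmentation masks."""
    p = (pred == label)
    g = (gt == label)
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # both masks empty for this label: perfect agreement by convention
    return 2.0 * np.logical_and(p, g).sum() / denom

# Example with hypothetical label ids (e.g. 1 = scar, 2 = edema).
pred = np.random.randint(0, 3, size=(5, 256, 256))
gt = np.random.randint(0, 3, size=(5, 256, 256))
print({lbl: dice_score(pred, gt, lbl) for lbl in (1, 2)})
```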
 

 
Author Armin Mehri; Parichehr Behjati; Angel Sappa
  Title TnTViT-G: Transformer in Transformer Network for Guidance Super Resolution Type Journal Article
  Year 2023 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 11 Issue Pages 11529-11540  
  Keywords  
  Abstract Image Super Resolution is a potential approach that can improve the image quality of low-resolution optical sensors, leading to improved performance in various industrial applications. It is important to emphasize that most state-of-the-art super resolution algorithms often use a single channel of input data for training and inference. However, this practice ignores the fact that the cost of acquiring high-resolution images in various spectral domains can differ a lot from one another. In this paper, we attempt to exploit complementary information from a low-cost channel (visible image) to increase the image quality of an expensive channel (infrared image). We propose a dual stream Transformer-based super resolution approach that uses the visible image as a guide to super-resolve another spectral band image. To this end, we introduce Transformer in Transformer network for Guidance super resolution, named TnTViT-G, an efficient and effective method that extracts the features of input images via different streams and fuses them together at various stages. In addition, unlike other guidance super resolution approaches, TnTViT-G is not limited to a fixed upsample size and it can generate super-resolved images of any size. Extensive experiments on various datasets show that the proposed model outperforms other state-of-the-art super resolution approaches. TnTViT-G surpasses state-of-the-art methods by up to 0.19∼2.3 dB, while it is memory efficient.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ MBS2023 Serial 3876  
Permanent link to this record
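To illustrate the guided, dual-stream idea described in the abstract above, here is a minimal sketch assuming PyTorch; the module name, layer sizes, and fusion step are simplified stand-ins, not the TnTViT-G architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGuidedSR(nn.Module):
    """Toy two-stream guided super-resolution: infrared target + visible guide."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.ir_stream = nn.Conv2d(1, ch, 3, padding=1)   # target (infrared) stream
        self.vis_stream = nn.Conv2d(3, ch, 3, padding=1)  # guidance (visible) stream
        self.fuse = nn.Conv2d(2 * ch, ch, 1)               # fuse the two streams
        self.head = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, ir_lr: torch.Tensor, vis_hr: torch.Tensor, scale: int) -> torch.Tensor:
        # Upsample the low-resolution infrared input to any requested size.
        ir_up = F.interpolate(ir_lr, scale_factor=scale, mode="bilinear", align_corners=False)
        f = torch.cat([self.ir_stream(ir_up), self.vis_stream(vis_hr)], dim=1)
        return ir_up + self.head(self.fuse(f))  # residual prediction

model = ToyGuidedSR()
ir = torch.rand(1, 1, 32, 32)
vis = torch.rand(1, 3, 64, 64)            # guide at the target resolution
print(model(ir, vis, scale=2).shape)      # torch.Size([1, 1, 64, 64])
```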
 

 
Author Chengyi Zou; Shuai Wan; Tiannan Ji; Marc Gorriz Blanch; Marta Mrak; Luis Herranz
  Title Chroma Intra Prediction with Lightweight Attention-Based Neural Networks Type Journal Article
  Year 2023 Publication IEEE Transactions on Circuits and Systems for Video Technology Abbreviated Journal TCSVT  
  Volume 34 Issue 1 Pages 549 - 560  
  Keywords  
  Abstract Neural networks can be successfully used for cross-component prediction in video coding. In particular, attention-based architectures are suitable for chroma intra prediction using luma information because of their capability to model relations between different channels. However, the complexity of such methods is still very high and should be further reduced, especially for decoding. In this paper, a cost-effective attention-based neural network is designed for chroma intra prediction. Moreover, with the goal of further improving coding performance, a novel approach is introduced to utilize more boundary information effectively. In addition to improving prediction, a simplification methodology is also proposed to reduce inference complexity by simplifying convolutions. The proposed schemes are integrated into the H.266/Versatile Video Coding (VVC) pipeline, and only one additional binary block-level syntax flag is introduced to indicate whether a given block makes use of the proposed method. Experimental results demonstrate that the proposed scheme achieves up to −0.46%/−2.29%/−2.17% BD-rate reduction on the Y/Cb/Cr components, respectively, compared with the H.266/VVC anchor. Reductions in the encoding and decoding complexity of up to 22% and 61%, respectively, are achieved by the proposed scheme with respect to the previous attention-based chroma intra prediction method while maintaining coding performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MACO; LAMP Approved no  
  Call Number Admin @ si @ ZWJ2023 Serial 3875  
Permanent link to this record
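A rough sketch of attention-based cross-component prediction in the spirit of the abstract above, assuming PyTorch; all module names, dimensions, and the use of a generic multi-head attention layer are hypothetical and much simpler than the paper's network.

```python
import torch
import torch.nn as nn

class ToyLumaToChromaAttention(nn.Module):
    """Toy cross-component predictor: attend from co-located luma samples over
    boundary reference samples to predict the two chroma components."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.luma_embed = nn.Linear(1, dim)
        self.boundary_embed = nn.Linear(1, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)
        self.out = nn.Linear(dim, 2)  # Cb and Cr

    def forward(self, luma: torch.Tensor, boundary: torch.Tensor) -> torch.Tensor:
        # luma: (B, N, 1) co-located luma samples; boundary: (B, M, 1) reference samples.
        q = self.luma_embed(luma)
        kv = self.boundary_embed(boundary)
        ctx, _ = self.attn(q, kv, kv)
        return self.out(ctx)  # (B, N, 2) predicted chroma samples

block = ToyLumaToChromaAttention()
luma = torch.rand(1, 64, 1)        # an 8x8 block, flattened
boundary = torch.rand(1, 17, 1)    # top + left reference line
print(block(luma, boundary).shape)  # torch.Size([1, 64, 2])
```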
 

 
Author Shiqi Yang; Yaxing Wang; Luis Herranz; Shangling Jui; Joost Van de Weijer
  Title Casting a BAIT for offline and online source-free domain adaptation Type Journal Article
  Year 2023 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU  
  Volume 234 Issue Pages 103747  
  Keywords  
  Abstract We address the source-free domain adaptation (SFDA) problem, where only the source model is available during adaptation to the target domain. We consider two settings: the offline setting, where all target data can be visited multiple times (epochs) to arrive at a prediction for each target sample, and the online setting, where the target data need to be classified directly upon arrival. Inspired by diverse-classifier-based domain adaptation methods, in this paper we introduce a second classifier while keeping the original classifier head fixed. When adapting to the target domain, the additional classifier, initialized from the source classifier, is expected to find misclassified features. Next, when updating the feature extractor, those features will be pushed towards the right side of the source decision boundary, thus achieving source-free domain adaptation. Experimental results show that the proposed method achieves competitive results for offline SFDA on several benchmark datasets compared with existing DA and SFDA methods, and that it surpasses other SFDA methods by a large margin under the online source-free domain adaptation setting.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; MACO Approved no  
  Call Number Admin @ si @ YWH2023 Serial 3874  
Permanent link to this record
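A toy sketch of the two-classifier idea from the abstract above (a fixed source head plus an adapted second head), assuming PyTorch; the agreement loss below is a simplified stand-in for the paper's actual objectives, and all layer sizes are hypothetical.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = nn.Sequential(nn.Linear(64, 32), nn.ReLU())   # source feature extractor
cls_anchor = nn.Linear(32, 10)                        # source classifier head, kept fixed
cls_extra = copy.deepcopy(cls_anchor)                 # second classifier, adapted on target

for p in cls_anchor.parameters():
    p.requires_grad = False

opt = torch.optim.SGD(list(feat.parameters()) + list(cls_extra.parameters()), lr=1e-3)

x_target = torch.rand(16, 64)                         # unlabeled target batch
p1 = F.softmax(cls_anchor(feat(x_target)), dim=1)
p2 = F.softmax(cls_extra(feat(x_target)), dim=1)

# Toy objective: encourage the two heads to agree on target features (a stand-in for
# the paper's losses), nudging features toward the source decision boundary's right side.
loss = F.mse_loss(p2, p1)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```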
 

 
Author Kunal Biswas; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Michael Blumenstein; Josep Llados
  Title Classification of aesthetic natural scene images using statistical and semantic features Type Journal Article
  Year 2023 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume 82 Issue 9 Pages 13507-13532  
  Keywords  
  Abstract Aesthetic image analysis is essential for improving the performance of multimedia image retrieval systems, especially from a repository of social media and multimedia content stored on mobile devices. This paper presents a novel method for classifying aesthetic natural scene images by studying the naturalness of image content using statistical features, and reading text in the images using semantic features. Unlike existing methods that focus only on image quality with human information, the proposed approach focuses on image features as well as text-based semantic features without human intervention to reduce the gap between subjectivity and objectivity in the classification. The aesthetic classes considered in this work are (i) Very Pleasant, (ii) Pleasant, (iii) Normal and (iv) Unpleasant. The naturalness is represented by features of focus, defocus, perceived brightness, perceived contrast, blurriness and noisiness, while semantics are represented by text recognition, description of the images and labels of images, profile pictures, and banner images. Furthermore, a deep learning model is proposed in a novel way to fuse statistical and semantic features for the classification of aesthetic natural scene images. Experiments on our own dataset and the standard datasets demonstrate that the proposed approach achieves 92.74%, 88.67% and 83.22% average classification rates on our own dataset, AVA dataset and CUHKPQ dataset, respectively. Furthermore, a comparative study of the proposed model with the existing methods shows that the proposed method is effective for the classification of aesthetic social media images.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ BSP2023 Serial 3873  
Permanent link to this record
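A minimal sketch of fusing a statistical (naturalness) feature vector with a semantic (text-derived) embedding for the four aesthetic classes named above, assuming PyTorch; the feature dimensions and fusion head are hypothetical, not the paper's model.

```python
import torch
import torch.nn as nn

CLASSES = ["Very Pleasant", "Pleasant", "Normal", "Unpleasant"]

class ToyFusionClassifier(nn.Module):
    """Concatenate statistical (naturalness) and semantic (text) features, then classify."""
    def __init__(self, stat_dim: int = 6, sem_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(stat_dim + sem_dim, 64), nn.ReLU(),
            nn.Linear(64, len(CLASSES)),
        )

    def forward(self, stat_feat: torch.Tensor, sem_feat: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([stat_feat, sem_feat], dim=1))

# stat_feat could hold e.g. focus, defocus, brightness, contrast, blurriness, noisiness scores.
model = ToyFusionClassifier()
logits = model(torch.rand(4, 6), torch.rand(4, 128))
print(logits.shape)  # torch.Size([4, 4])
```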
 

 
Author Hao Fang; Ajian Liu; Jun Wan; Sergio Escalera; Chenxu Zhao; Xu Zhang; Stan Z Li; Zhen Lei
  Title Surveillance Face Anti-spoofing Type Journal Article
  Year 2024 Publication IEEE Transactions on Information Forensics and Security Abbreviated Journal TIFS  
  Volume 19 Issue Pages 1535-1546  
  Keywords  
  Abstract Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ FLW2024 Serial 3869  
Permanent link to this record
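A small sketch of a quality-invariance contrastive objective in the spirit of the CQIL description above, assuming PyTorch; this is a generic InfoNCE-style pair loss over embeddings of the same face at two image qualities, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def quality_invariance_contrastive(z_hq: torch.Tensor, z_lq: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: each high-quality embedding should match its degraded pair."""
    z_hq = F.normalize(z_hq, dim=1)
    z_lq = F.normalize(z_lq, dim=1)
    logits = z_hq @ z_lq.t() / tau           # (B, B) similarity matrix
    targets = torch.arange(z_hq.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

z_hq = torch.randn(8, 64)   # embeddings of original-quality face crops
z_lq = torch.randn(8, 64)   # embeddings of the same faces after quality degradation
print(float(quality_invariance_contrastive(z_hq, z_lq)))
```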
 

 
Author Wenwen Fu; Zhihong An; Wendong Huang; Haoran Sun; Wenjuan Gong; Jordi Gonzalez
  Title A Spatio-Temporal Spotting Network with Sliding Windows for Micro-Expression Detection Type Journal Article
  Year 2023 Publication Electronics Abbreviated Journal ELEC  
  Volume 12 Issue 18 Pages 3947  
  Keywords micro-expression spotting; sliding window; key frame extraction  
  Abstract Micro-expressions reveal underlying emotions and are widely applied in political psychology, lie detection, law enforcement and medical care. Micro-expression spotting aims to detect the temporal locations of facial expressions from video sequences and is a crucial task in micro-expression recognition. In this study, the problem of micro-expression spotting is formulated as micro-expression classification per frame. We propose an effective spotting model with sliding windows called the spatio-temporal spotting network. The method involves a sliding window detection mechanism, combines the spatial features from the local key frames and the global temporal features and performs micro-expression spotting. The experiments are conducted on the CAS(ME)2 database and the SAMM Long Videos database, and the results demonstrate that the proposed method outperforms the state-of-the-art method by 30.58% for the CAS(ME)2 and 23.98% for the SAMM Long Videos according to overall F-scores.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ FAH2023 Serial 3864  
Permanent link to this record
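A minimal sliding-window spotting sketch matching the per-frame classification framing above, assuming NumPy; the smoothing window, threshold, and synthetic scores are illustrative only, not the paper's spotting network.

```python
import numpy as np

def sliding_window_spotting(frame_scores: np.ndarray, win: int = 9, thr: float = 0.5):
    """Smooth per-frame micro-expression scores with a sliding window, then return
    (start, end) frame intervals whose smoothed score exceeds a threshold."""
    kernel = np.ones(win) / win
    smoothed = np.convolve(frame_scores, kernel, mode="same")
    active = smoothed > thr
    intervals, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(active) - 1))
    return intervals

scores = np.clip(np.random.rand(300) * 0.4, 0, 1)
scores[120:135] = 0.9                      # a synthetic spotted segment
print(sliding_window_spotting(scores))
```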
 

 
Author Wenjuan Gong; Yue Zhang; Wei Wang; Peng Cheng; Jordi Gonzalez
  Title Meta-MMFNet: Meta-learning-based Multi-model Fusion Network for Micro-expression Recognition Type Journal Article
  Year 2023 Publication ACM Transactions on Multimedia Computing, Communications, and Applications Abbreviated Journal TMCCA  
  Volume 20 Issue 2 Pages 1–20  
  Keywords  
  Abstract Despite its wide applications in criminal investigations and clinical communication with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and the problem of imbalanced classes. In this study, we proposed a meta-learning-based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame-difference and optical-flow features were fused, deep features were extracted from the fused feature, and finally, within the meta-learning-based framework, a weighted-sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ GZW2023 Serial 3862  
Permanent link to this record
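A toy weighted-sum model-fusion sketch in the spirit of the abstract above, assuming PyTorch; the two branches (frame difference and optical flow) are represented only by their output probabilities, and the learnable weighting is a simplified stand-in for the meta-learning pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyWeightedSumFusion(nn.Module):
    """Fuse class probabilities from several branches with learnable convex weights."""
    def __init__(self, n_branches: int = 2):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_branches))  # softmax -> convex combination

    def forward(self, probs: torch.Tensor) -> torch.Tensor:
        # probs: (n_branches, B, n_classes) per-branch class probabilities.
        weights = F.softmax(self.w, dim=0).view(-1, 1, 1)
        return (weights * probs).sum(dim=0)

fusion = ToyWeightedSumFusion()
p_framediff = torch.softmax(torch.randn(4, 3), dim=1)  # frame-difference branch
p_optflow = torch.softmax(torch.randn(4, 3), dim=1)    # optical-flow branch
print(fusion(torch.stack([p_framediff, p_optflow])).shape)  # torch.Size([4, 3])
```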
 

 
Author Parichehr Behjati; Pau Rodriguez; Carles Fernandez; Isabelle Hupont; Armin Mehri; Jordi Gonzalez
  Title Single image super-resolution based on directional variance attention network Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 133 Issue Pages 108997  
  Keywords  
  Abstract Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raise computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. In order to address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts most relevant features from the network for improving the final output and preventing information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art in several datasets, while maintaining relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ BPF2023 Serial 3861  
Permanent link to this record
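A rough sketch of a channel attention driven by variance statistics computed along the two spatial directions, loosely following the directional variance attention (DiVA) description above, assuming PyTorch; the module is a guess at the idea, not the published block.

```python
import torch
import torch.nn as nn

class ToyDirectionalVarianceAttention(nn.Module):
    """Re-weight channels from per-channel variances taken along the H and W directions."""
    def __init__(self, channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels), nn.ReLU(),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W). Variance along width and along height, pooled to (B, C) each.
        var_w = x.var(dim=3).mean(dim=2)   # variance over W, averaged over H
        var_h = x.var(dim=2).mean(dim=2)   # variance over H, averaged over W
        attn = self.mlp(torch.cat([var_w, var_h], dim=1))   # (B, C) channel weights
        return x * attn.view(x.size(0), -1, 1, 1)

x = torch.rand(2, 16, 32, 32)
print(ToyDirectionalVarianceAttention(16)(x).shape)  # torch.Size([2, 16, 32, 32])
```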
 

 
Author M. Altillawi; S. Li; S.M. Prakhya; Z. Liu; Joan Serrat
  Title Implicit Learning of Scene Geometry From Poses for Global Localization Type Journal Article
  Year 2024 Publication IEEE Robotics and Automation Letters Abbreviated Journal ROBOTAUTOMLET  
  Volume 9 Issue 2 Pages 955-962  
  Keywords Localization; Localization and mapping; Deep learning for visual perception; Visual learning  
  Abstract Global visual localization estimates the absolute pose of a camera using a single image, in a previously mapped area. Obtaining the pose from a single image enables many robotics and augmented/virtual reality applications. Inspired by latest advances in deep learning, many existing approaches directly learn and regress 6 DoF pose from an input image. However, these methods do not fully utilize the underlying scene geometry for pose regression. The challenge in monocular relocalization is the minimal availability of supervised training data, which is just the corresponding 6 DoF poses of the images. In this letter, we propose to utilize these minimal available labels (i.e., poses) to learn the underlying 3D geometry of the scene and use the geometry to estimate the 6 DoF camera pose. We present a learning method that uses these pose labels and rigid alignment to learn two 3D geometric representations (X, Y, Z coordinates) of the scene, one in camera coordinate frame and the other in global coordinate frame. Given a single image, it estimates these two 3D scene representations, which are then aligned to estimate a pose that matches the pose label. This formulation allows for the active inclusion of additional learning constraints to minimize 3D alignment errors between the two 3D scene representations, and 2D re-projection errors between the 3D global scene representation and 2D image pixels, resulting in improved localization accuracy. During inference, our model estimates the 3D scene geometry in camera and global frames and aligns them rigidly to obtain pose in real-time. We evaluate our work on three common visual localization datasets, conduct ablation studies, and show that our method exceeds state-of-the-art regression methods' pose accuracy on all datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2377-3766 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Serial 3857  
Permanent link to this record
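The method above rigidly aligns a camera-frame point set to a global-frame point set to recover pose. As background, here is a standard Kabsch alignment sketch (the classical algorithm, not the paper's learned pipeline), assuming NumPy.

```python
import numpy as np

def rigid_align(p_cam: np.ndarray, p_world: np.ndarray):
    """Kabsch: find R, t such that p_world ≈ R @ p_cam + t, from Nx3 correspondences."""
    mu_c, mu_w = p_cam.mean(axis=0), p_world.mean(axis=0)
    H = (p_cam - mu_c).T @ (p_world - mu_w)       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # fix a possible reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t

rng = np.random.default_rng(0)
p_cam = rng.normal(size=(100, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
p_world = p_cam @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = rigid_align(p_cam, p_world)
print(np.allclose(R, R_true, atol=1e-6), np.round(t, 3))
```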
 

 
Author Jose Elias Yauri; M. Lagos; H. Vega-Huerta; P. de-la-Cruz; G.L.E Maquen-Niño; E. Condor-Tinoco
  Title Detection of Epileptic Seizures Based-on Channel Fusion and Transformer Network in EEG Recordings Type Journal Article
  Year 2023 Publication International Journal of Advanced Computer Science and Applications Abbreviated Journal IJACSA  
  Volume 14 Issue 5 Pages 1067-1074  
  Keywords Epilepsy; epilepsy detection; EEG; EEG channel fusion; convolutional neural network; self-attention  
  Abstract According to the World Health Organization, epilepsy affects more than 50 million people in the world, and 80% of them live in developing countries. Therefore, epilepsy has become a major public health issue for many governments and deserves to be addressed. Epilepsy is characterized by uncontrollable seizures due to sudden abnormal functioning of the brain. Recurrence of epileptic seizures changes people's lives and interferes with their daily activities. Although epilepsy has no cure, it can be mitigated with an appropriate diagnosis and medication. Usually, epilepsy diagnosis is based on the analysis of an electroencephalogram (EEG) of the patient. However, searching for seizure patterns in a multichannel EEG recording is a visually demanding and time-consuming task, even for experienced neurologists. Despite the recent progress in automatic recognition of epilepsy, the multichannel nature of EEG recordings still challenges current methods. In this work, a new method to detect epilepsy in multichannel EEG recordings is proposed. First, the method uses convolutions to perform channel fusion, and next, a self-attention network extracts temporal features to classify between interictal and ictal epilepsy states. The method was validated on the public CHB-MIT dataset using k-fold cross-validation and achieved 99.74% specificity and 99.15% sensitivity, surpassing current approaches.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM Approved no  
  Call Number Admin @ si @ Serial 3856  
Permanent link to this record
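A minimal sketch of convolutional channel fusion followed by a transformer encoder for interictal/ictal classification, following the general pipeline described above, assuming PyTorch; the channel count, window length, and layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class ToyEEGSeizureNet(nn.Module):
    """Toy EEG classifier: conv-based channel fusion + self-attention over time."""
    def __init__(self, n_channels: int = 23, d_model: int = 64):
        super().__init__()
        self.fuse = nn.Conv1d(n_channels, d_model, kernel_size=7, padding=3)  # channel fusion
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.cls = nn.Linear(d_model, 2)   # interictal vs. ictal

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # eeg: (B, n_channels, T) raw multichannel recording window.
        x = self.fuse(eeg).transpose(1, 2)   # (B, T, d_model)
        x = self.encoder(x).mean(dim=1)      # temporal pooling
        return self.cls(x)

model = ToyEEGSeizureNet()
print(model(torch.randn(2, 23, 512)).shape)  # torch.Size([2, 2])
```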
 

 
Author Akshita Gupta; Sanath Narayan; Salman Khan; Fahad Shahbaz Khan; Ling Shao; Joost Van de Weijer
  Title Generative Multi-Label Zero-Shot Learning Type Journal Article
  Year 2023 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 45 Issue 12 Pages 14611-14624  
  Keywords Generalized zero-shot learning; Multi-label classification; Zero-shot object detection; Feature synthesis  
  Abstract Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features from GANs is still unexplored in the context of zero-shot setting. When multiple objects occur jointly in a single image, a critical question is how to effectively fuse multi-class information. In this work, we introduce different fusion approaches at the attribute-level, feature-level and cross-level (across attribute and feature-levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Our cross-level fusion-based generative approach outperforms the state-of-the-art on three zero-shot benchmarks: NUS-WIDE, Open Images and MS COCO. Furthermore, we show the generalization capabilities of our fusion approach in the zero-shot detection task on MS COCO, achieving favorable performance against existing methods.  
  Address December 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; PID2021-128178OB-I00 Approved no  
  Call Number Admin @ si @ Serial 3853  
Permanent link to this record
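A toy attribute-level fusion sketch for multi-label feature synthesis, loosely following the abstract above, assuming PyTorch; the generator, fusion rule (averaging the label embeddings), and all dimensions are simplified placeholders, not the paper's GAN.

```python
import torch
import torch.nn as nn

class ToyAttributeFusionGenerator(nn.Module):
    """Toy attribute-level fusion: average the embeddings of an image's labels,
    then map noise + fused attribute to a synthetic multi-label visual feature."""
    def __init__(self, attr_dim: int = 300, noise_dim: int = 64, feat_dim: int = 512):
        super().__init__()
        self.noise_dim = noise_dim
        self.gen = nn.Sequential(
            nn.Linear(attr_dim + noise_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim),
        )

    def forward(self, class_embeds: torch.Tensor, label_mask: torch.Tensor) -> torch.Tensor:
        # class_embeds: (n_classes, attr_dim); label_mask: (B, n_classes) multi-hot labels.
        weights = label_mask / label_mask.sum(dim=1, keepdim=True).clamp(min=1)
        fused_attr = weights @ class_embeds                       # (B, attr_dim)
        noise = torch.randn(fused_attr.size(0), self.noise_dim)
        return self.gen(torch.cat([fused_attr, noise], dim=1))    # synthetic feature

gen = ToyAttributeFusionGenerator()
embeds = torch.randn(80, 300)                    # one attribute embedding per class
labels = torch.zeros(4, 80); labels[:, :3] = 1   # images carrying the first three labels
print(gen(embeds, labels).shape)                 # torch.Size([4, 512])
```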
 

 
Author G. Gasbarri; Matias Bilkis; E. Roda Salichs; J. Calsamiglia
  Title Sequential hypothesis testing for continuously-monitored quantum systems Type Journal Article
  Year 2024 Publication Quantum Abbreviated Journal  
  Volume 8 Issue 1289 Pages  
  Keywords  
  Abstract We consider a quantum system that is being continuously monitored, giving rise to a measurement signal. From such a stream of data, information needs to be inferred about the underlying system's dynamics. Here we focus on hypothesis testing problems and put forward the usage of sequential strategies where the signal is analyzed in real time, allowing the experiment to be concluded as soon as the underlying hypothesis can be identified with a certified prescribed success probability. We analyze the performance of sequential tests by studying the stopping-time behavior, showing a considerable advantage over currently-used strategies based on a fixed predetermined measurement time.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes xxxx Approved no  
  Call Number Admin @ si @ GBR2024 Serial 3847  
Permanent link to this record
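For intuition on sequential stopping rules, here is a classical Wald SPRT sketch on a simulated Gaussian measurement stream, assuming NumPy; the paper's continuously-monitored quantum setting is more involved than this textbook toy.

```python
import numpy as np

def sprt_gaussian(stream, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05, beta=0.05):
    """Wald's SPRT: decide between H0 (mean mu0) and H1 (mean mu1) from a sample stream,
    stopping as soon as the log-likelihood ratio leaves the (B, A) interval."""
    A, B = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))
    llr = 0.0
    for n, x in enumerate(stream, start=1):
        # Per-sample Gaussian log-likelihood ratio log f1(x) / f0(x).
        llr += (x - mu0) ** 2 / (2 * sigma**2) - (x - mu1) ** 2 / (2 * sigma**2)
        if llr >= A:
            return "H1", n
        if llr <= B:
            return "H0", n
    return "undecided", n

rng = np.random.default_rng(1)
samples = rng.normal(loc=1.0, scale=1.0, size=10_000)  # data actually drawn under H1
print(sprt_gaussian(iter(samples)))                     # e.g. ('H1', small stopping time)
```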
 

 
Author Bhalaji Nagarajan; Marc Bolaños; Eduardo Aguilar; Petia Radeva
  Title Deep ensemble-based hard sample mining for food recognition Type Journal Article
  Year 2023 Publication Journal of Visual Communication and Image Representation Abbreviated Journal JVCIR  
  Volume 95 Issue Pages 103905  
  Keywords  
  Abstract Deep neural networks represent a compelling technique to tackle complex real-world problems, but are over-parameterized and often suffer from over- or under-confident estimates. Deep ensembles have shown better parameter estimations and often provide reliable uncertainty estimates that contribute to the robustness of the results. In this work, we propose a new metric to identify samples that are hard to classify. Our metric is defined as coincidence score for deep ensembles which measures the agreement of its individual models. The main hypothesis we rely on is that deep learning algorithms learn the low-loss samples better compared to large-loss samples. In order to compensate for this, we use controlled over-sampling on the identified ”hard” samples using proper data augmentation schemes to enable the models to learn those samples better. We validate the proposed metric using two public food datasets on different backbone architectures and show the improvements compared to the conventional deep neural network training using different performance metrics.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ NBA2023 Serial 3844  
Permanent link to this record
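A small sketch of an ensemble agreement ("coincidence") score used to flag hard samples for over-sampling, assuming NumPy; the exact score definition and the over-sampling policy in the paper may differ.

```python
import numpy as np

def coincidence_scores(member_preds: np.ndarray) -> np.ndarray:
    """member_preds: (n_models, n_samples) predicted class ids.
    Returns, per sample, the fraction of models agreeing with the majority vote."""
    n_models, n_samples = member_preds.shape
    scores = np.empty(n_samples)
    for j in range(n_samples):
        _, counts = np.unique(member_preds[:, j], return_counts=True)
        scores[j] = counts.max() / n_models
    return scores

preds = np.array([[0, 1, 2, 2],
                  [0, 1, 0, 2],
                  [0, 2, 1, 2]])          # 3 models, 4 samples
scores = coincidence_scores(preds)
hard = np.where(scores < 1.0)[0]          # low-agreement samples to over-sample
print(scores, hard)
```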
 

 
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
  Title Hierarchical multimodal transformers for Multipage DocVQA Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 144 Issue 109834 Pages  
  Keywords  
  Abstract Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method, Hi-VT5, that overcomes the limitations of current methods in processing long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification, our method is a hierarchical transformer architecture in which the encoder learns to summarize the most relevant information of every page and the decoder then uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ TKV2023 Serial 3836  
Permanent link to this record
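A toy hierarchical encoder-decoder sketch of the page-summarization idea described above, assuming PyTorch; it uses generic transformer layers rather than T5, and the vocabulary, summary-token count, and layer sizes are hypothetical, so it only illustrates the hierarchy, not Hi-VT5 itself.

```python
import torch
import torch.nn as nn

class ToyHierarchicalDocVQA(nn.Module):
    """Toy hierarchy: encode each page into a few summary tokens, then let a
    decoder attend over the concatenated page summaries to produce the answer."""
    def __init__(self, vocab: int = 1000, d: int = 64, n_summary: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        enc_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.page_encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.summary_tokens = nn.Parameter(torch.randn(n_summary, d))
        dec_layer = nn.TransformerDecoderLayer(d, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=1)
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, pages: torch.Tensor, answer_in: torch.Tensor) -> torch.Tensor:
        # pages: (B, n_pages, seq_len) token ids; answer_in: (B, ans_len) decoder input ids.
        B, P, L = pages.shape
        tok = self.embed(pages).view(B * P, L, -1)
        summ = self.summary_tokens.unsqueeze(0).expand(B * P, -1, -1)
        enc = self.page_encoder(torch.cat([summ, tok], dim=1))[:, : summ.size(1)]
        memory = enc.reshape(B, -1, enc.size(-1))      # (B, n_pages * n_summary, d)
        dec = self.decoder(self.embed(answer_in), memory)
        return self.lm_head(dec)

model = ToyHierarchicalDocVQA()
pages = torch.randint(0, 1000, (2, 3, 50))     # 2 documents, 3 pages, 50 tokens per page
answer_in = torch.randint(0, 1000, (2, 7))
print(model(pages, answer_in).shape)           # torch.Size([2, 7, 1000])
```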