Home | << 1 2 3 4 5 6 7 8 9 10 >> [11–11] |
Records | |||||
---|---|---|---|---|---|
Author | Pau Cano; Debora Gil; Eva Musulen | ||||
Title | Towards automatic detection of helicobacter pylori in histological samples of gastric tissue | Type | Conference Article | ||
Year | 2023 | Publication | IEEE International Symposium on Biomedical Imaging | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Cartagena de Indias; Colombia; April 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ISBI | ||
Notes | IAM | Approved | no | ||
Call Number | Admin @ si @ CGM2023 | Serial | 3953 | ||
Permanent link to this record | |||||
Author | Antonio Carta; Andrea Cossu; Vincenzo Lomonaco; Davide Bacciu; Joost Van de Weijer | ||||
Title | Projected Latent Distillation for Data-Agnostic Consolidation in Distributed Continual Learning | Type | Miscellaneous | ||
Year | 2023 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Distributed learning on the edge often comprises self-centered devices (SCD) which learn local tasks independently and are unwilling to contribute to the performance of other SDCs. How do we achieve forward transfer at zero cost for the single SCDs? We formalize this problem as a Distributed Continual Learning scenario, where SCD adapt to local tasks and a CL model consolidates the knowledge from the resulting stream of models without looking at the SCD's private data. Unfortunately, current CL methods are not directly applicable to this scenario. We propose Data-Agnostic Consolidation (DAC), a novel double knowledge distillation method that consolidates the stream of SC models without using the original data. DAC performs distillation in the latent space via a novel Projected Latent Distillation loss. Experimental results show that DAC enables forward transfer between SCDs and reaches state-of-the-art accuracy on Split CIFAR100, CORe50 and Split TinyImageNet, both in reharsal-free and distributed CL scenarios. Somewhat surprisingly, even a single out-of-distribution image is sufficient as the only source of data during consolidation. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ CCL2023 | Serial | 3871 | ||
Permanent link to this record | |||||
Author | Pau Cano; Alvaro Caravaca; Debora Gil; Eva Musulen | ||||
Title | Diagnosis of Helicobacter pylori using AutoEncoders for the Detection of Anomalous Staining Patterns in Immunohistochemistry Images | Type | Miscellaneous | ||
Year | 2023 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | 107241 | ||
Keywords | |||||
Abstract | This work addresses the detection of Helicobacter pylori a bacterium classified since 1994 as class 1 carcinogen to humans. By its highest specificity and sensitivity, the preferred diagnosis technique is the analysis of histological images with immunohistochemical staining, a process in which certain stained antibodies bind to antigens of the biological element of interest. This analysis is a time demanding task, which is currently done by an expert pathologist that visually inspects the digitized samples.
We propose to use autoencoders to learn latent patterns of healthy tissue and detect H. pylori as an anomaly in image staining. Unlike existing classification approaches, an autoencoder is able to learn patterns in an unsupervised manner (without the need of image annotations) with high performance. In particular, our model has an overall 91% of accuracy with 86\% sensitivity, 96% specificity and 0.97 AUC in the detection of H. pylori. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM | Approved | no | ||
Call Number | Admin @ si @ CCG2023 | Serial | 3855 | ||
Permanent link to this record | |||||
Author | P. Canals; Simone Balocco; O. Diaz; J. Li; A. Garcia Tornel; M. Olive Gadea; M. Ribo | ||||
Title | A fully automatic method for vascular tortuosity feature extraction in the supra-aortic region: unraveling possibilities in stroke treatment planning | Type | Journal Article | ||
Year | 2023 | Publication | Computerized Medical Imaging and Graphics | Abbreviated Journal | CMIG |
Volume | 104 | Issue | 102170 | Pages | |
Keywords | Artificial intelligence; Deep learning; Stroke; Thrombectomy; Vascular feature extraction; Vascular tortuosity | ||||
Abstract | Vascular tortuosity of supra-aortic vessels is widely considered one of the main reasons for failure and delays in endovascular treatment of large vessel occlusion in patients with acute ischemic stroke. Characterization of tortuosity is a challenging task due to the lack of objective, robust and effective analysis tools. We present a fully automatic method for arterial segmentation, vessel labelling and tortuosity feature extraction applied to the supra-aortic region. A sample of 566 computed tomography angiography scans from acute ischemic stroke patients (aged 74.8 ± 12.9, 51.0% females) were used for training, validation and testing of a segmentation module based on a U-Net architecture (162 cases) and a vessel labelling module powered by a graph U-Net (566 cases). Successively, 30 cases were processed for testing of a tortuosity feature extraction module. Measurements obtained through automatic processing were compared to manual annotations from two observers for a thorough validation of the method. The proposed feature extraction method presented similar performance to the inter-rater variability observed in the measurement of 33 geometrical and morphological features of the arterial anatomy in the supra-aortic region. This system will contribute to the development of more complex models to advance the treatment of stroke by adding immediate automation, objectivity, repeatability and robustness to the vascular tortuosity characterization of patients. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ CBD2023 | Serial | 4005 | ||
Permanent link to this record | |||||
Author | Roger Max Calle Quispe; Maya Aghaei Gavari; Eduardo Aguilar Torres | ||||
Title | Towards real-time accurate safety helmets detection through a deep learning-based method | Type | Journal | ||
Year | 2023 | Publication | Ingeniare. Revista chilena de ingenieria | Abbreviated Journal | |
Volume | 31 | Issue | 12 | Pages | |
Keywords | |||||
Abstract | Occupational safety is a fundamental activity in industries and revolves around the management of the necessary controls that must be present to mitigate occupational risks. These controls include verifying the use of Personal Protection Equipment (PPE). Within PPE, safety helmets are vital to reducing severe or fatal consequences caused by head injuries. This problem has been addressed recently by various research based on deep learning to detect the usage of safety helmets by the present people in the industrial field.
These works have achieved promising results for safety helmet detection using object detection methods from the YOLO family. In this work, we propose to analyze the performance of Scaled-YOLOv4, a novel model of the YOLO family that has yet to be previously studied for this problem. The performance of the Scaled-YOLOv4 is evaluated on two public databases, carefully selected among the previously proposed datasets for the occupational safety framework. We demonstrate the superiority of Scaled-YOLOv4 in terms of mAP and Fl-score concerning the previous works for both databases. Further, we summarize the currently available datasets for safety helmet detection purposes and discuss their suitability. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ CAA2023 | Serial | 3846 | ||
Permanent link to this record | |||||
Author | Kunal Biswas; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Michel Blumenstein; Josep Llados | ||||
Title | Classification of aesthetic natural scene images using statistical and semantic features | Type | Journal Article | ||
Year | 2023 | Publication | Multimedia Tools and Applications | Abbreviated Journal | MTAP |
Volume | 82 | Issue | 9 | Pages | 13507-13532 |
Keywords | |||||
Abstract | Aesthetic image analysis is essential for improving the performance of multimedia image retrieval systems, especially from a repository of social media and multimedia content stored on mobile devices. This paper presents a novel method for classifying aesthetic natural scene images by studying the naturalness of image content using statistical features, and reading text in the images using semantic features. Unlike existing methods that focus only on image quality with human information, the proposed approach focuses on image features as well as text-based semantic features without human intervention to reduce the gap between subjectivity and objectivity in the classification. The aesthetic classes considered in this work are (i) Very Pleasant, (ii) Pleasant, (iii) Normal and (iv) Unpleasant. The naturalness is represented by features of focus, defocus, perceived brightness, perceived contrast, blurriness and noisiness, while semantics are represented by text recognition, description of the images and labels of images, profile pictures, and banner images. Furthermore, a deep learning model is proposed in a novel way to fuse statistical and semantic features for the classification of aesthetic natural scene images. Experiments on our own dataset and the standard datasets demonstrate that the proposed approach achieves 92.74%, 88.67% and 83.22% average classification rates on our own dataset, AVA dataset and CUHKPQ dataset, respectively. Furthermore, a comparative study of the proposed model with the existing methods shows that the proposed method is effective for the classification of aesthetic social media images. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ BSP2023 | Serial | 3873 | ||
Permanent link to this record | |||||
Author | Juan Borrego-Carazo; Carles Sanchez; David Castells; Jordi Carrabina; Debora Gil | ||||
Title | BronchoPose: an analysis of data and model configuration for vision-based bronchoscopy pose estimation | Type | Journal Article | ||
Year | 2023 | Publication | Computer Methods and Programs in Biomedicine | Abbreviated Journal | CMPB |
Volume | 228 | Issue | Pages | 107241 | |
Keywords | Videobronchoscopy guiding; Deep learning; Architecture optimization; Datasets; Standardized evaluation framework; Pose estimation | ||||
Abstract | Vision-based bronchoscopy (VB) models require the registration of the virtual lung model with the frames from the video bronchoscopy to provide effective guidance during the biopsy. The registration can be achieved by either tracking the position and orientation of the bronchoscopy camera or by calibrating its deviation from the pose (position and orientation) simulated in the virtual lung model. Recent advances in neural networks and temporal image processing have provided new opportunities for guided bronchoscopy. However, such progress has been hindered by the lack of comparative experimental conditions.
In the present paper, we share a novel synthetic dataset allowing for a fair comparison of methods. Moreover, this paper investigates several neural network architectures for the learning of temporal information at different levels of subject personalization. In order to improve orientation measurement, we also present a standardized comparison framework and a novel metric for camera orientation learning. Results on the dataset show that the proposed metric and architectures, as well as the standardized conditions, provide notable improvements to current state-of-the-art camera pose estimation in video bronchoscopy. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM; | Approved | no | ||
Call Number | Admin @ si @ BSC2023 | Serial | 3702 | ||
Permanent link to this record | |||||
Author | Asma Bensalah; Antonio Parziale; Giuseppe De Gregorio; Angelo Marcelli; Alicia Fornes; Josep Llados | ||||
Title | I Can’t Believe It’s Not Better: In-air Movement for Alzheimer Handwriting Synthetic Generation | Type | Conference Article | ||
Year | 2023 | Publication | 21st International Graphonomics Conference | Abbreviated Journal | |
Volume | Issue | Pages | 136–148 | ||
Keywords | |||||
Abstract | During recent years, there here has been a boom in terms of deep learning use for handwriting analysis and recognition. One main application for handwriting analysis is early detection and diagnosis in the health field. Unfortunately, most real case problems still suffer a scarcity of data, which makes difficult the use of deep learning-based models. To alleviate this problem, some works resort to synthetic data generation. Lately, more works are directed towards guided data synthetic generation, a generation that uses the domain and data knowledge to generate realistic data that can be useful to train deep learning models. In this work, we combine the domain knowledge about the Alzheimer’s disease for handwriting and use it for a more guided data generation. Concretely, we have explored the use of in-air movements for synthetic data generation. | ||||
Address | Evora; Portugal; October 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | IGS | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ BPG2023 | Serial | 3838 | ||
Permanent link to this record | |||||
Author | Parichehr Behjati; Pau Rodriguez; Carles Fernandez; Isabelle Hupont; Armin Mehri; Jordi Gonzalez | ||||
Title | Single image super-resolution based on directional variance attention network | Type | Journal Article | ||
Year | 2023 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 133 | Issue | Pages | 108997 | |
Keywords | |||||
Abstract | Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raise computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. In order to address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts most relevant features from the network for improving the final output and preventing information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art in several datasets, while maintaining relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ BPF2023 | Serial | 3861 | ||
Permanent link to this record | |||||
Author | Iban Berganzo-Besga; Hector A. Orengo; Felipe Lumbreras; Aftab Alam; Rosie Campbell; Petrus J Gerrits; Jonas Gregorio de Souza; Afifa Khan; Maria Suarez Moreno; Jack Tomaney; Rebecca C Roberts; Cameron A Petrie | ||||
Title | Curriculum learning-based strategy for low-density archaeological mound detection from historical maps in India and Pakistan | Type | Journal Article | ||
Year | 2023 | Publication | Scientific Reports | Abbreviated Journal | ScR |
Volume | 13 | Issue | Pages | 11257 | |
Keywords | |||||
Abstract | This paper presents two algorithms for the large-scale automatic detection and instance segmentation of potential archaeological mounds on historical maps. Historical maps present a unique source of information for the reconstruction of ancient landscapes. The last 100 years have seen unprecedented landscape modifications with the introduction and large-scale implementation of mechanised agriculture, channel-based irrigation schemes, and urban expansion to name but a few. Historical maps offer a window onto disappearing landscapes where many historical and archaeological elements that no longer exist today are depicted. The algorithms focus on the detection and shape extraction of mound features with high probability of being archaeological settlements, mounds being one of the most commonly documented archaeological features to be found in the Survey of India historical map series, although not necessarily recognised as such at the time of surveying. Mound features with high archaeological potential are most commonly depicted through hachures or contour-equivalent form-lines, therefore, an algorithm has been designed to detect each of those features. Our proposed approach addresses two of the most common issues in archaeological automated survey, the low-density of archaeological features to be detected, and the small amount of training data available. It has been applied to all types of maps available of the historic 1″ to 1-mile series, thus increasing the complexity of the detection. Moreover, the inclusion of synthetic data, along with a Curriculum Learning strategy, has allowed the algorithm to better understand what the mound features look like. Likewise, a series of filters based on topographic setting, form, and size have been applied to improve the accuracy of the models. The resulting algorithms have a recall value of 52.61% and a precision of 82.31% for the hachure mounds, and a recall value of 70.80% and a precision of 70.29% for the form-line mounds, which allowed the detection of nearly 6000 mound features over an area of 470,500 km2, the largest such approach to have ever been applied. If we restrict our focus to the maps most similar to those used in the algorithm training, we reach recall values greater than 60% and precision values greater than 90%. This approach has shown the potential to implement an adaptive algorithm that allows, after a small amount of retraining with data detected from a new map, a better general mound feature detection in the same map. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ BOL2023 | Serial | 3976 | ||
Permanent link to this record | |||||
Author | Gisel Bastidas-Guacho; Patricio Moreno; Boris X. Vintimilla; Angel Sappa | ||||
Title | Application on the Loop of Multimodal Image Fusion: Trends on Deep-Learning Based Approaches | Type | Conference Article | ||
Year | 2023 | Publication | 13th International Conference on Pattern Recognition Systems | Abbreviated Journal | |
Volume | 14234 | Issue | Pages | 25–36 | |
Keywords | |||||
Abstract | Multimodal image fusion allows the combination of information from different modalities, which is useful for tasks such as object detection, edge detection, and tracking, to name a few. Using the fused representation for applications results in better task performance. There are several image fusion approaches, which have been summarized in surveys. However, the existing surveys focus on image fusion approaches where the application on the loop of multimodal image fusion is not considered. On the contrary, this study summarizes deep learning-based multimodal image fusion for computer vision (e.g., object detection) and image processing applications (e.g., semantic segmentation), that is, approaches where the application module leverages the multimodal fusion process to enhance the final result. Firstly, we introduce image fusion and the existing general frameworks for image fusion tasks such as multifocus, multiexposure and multimodal. Then, we describe the multimodal image fusion approaches. Next, we review the state-of-the-art deep learning multimodal image fusion approaches for vision applications. Finally, we conclude our survey with the trends of task-driven multimodal image fusion. | ||||
Address | Guayaquil; Ecuador; July 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICPRS | ||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ BMV2023 | Serial | 3932 | ||
Permanent link to this record | |||||
Author | Hugo Bertiche; Niloy J Mitra; Kuldeep Kulkarni; Chun Hao Paul Huang; Tuanfeng Y Wang; Meysam Madadi; Sergio Escalera; Duygu Ceylan | ||||
Title | Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images | Type | Conference Article | ||
Year | 2023 | Publication | 36th IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 459-468 | ||
Keywords | |||||
Abstract | Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We investigate the problem in the context of dressed humans under the wind. At the core of our method is a novel cyclic neural network that produces looping cinemagraphs for the target loop duration. To circumvent the problem of collecting real data, we demonstrate that it is possible, by working in the image normal space, to learn garment motion dynamics on synthetic data and generalize to real data. We evaluate our method on both synthetic and real data and demonstrate that it is possible to create compelling and plausible cinemagraphs from single RGB images. | ||||
Address | Vancouver; Canada; June 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ BMK2023 | Serial | 3921 | ||
Permanent link to this record | |||||
Author | Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades | ||||
Title | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | Type | Journal Article | ||
Year | 2023 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 139 | Issue | Pages | 109419 | |
Keywords | |||||
Abstract | Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space}. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISSN 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.140; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BMC2023 | Serial | 3826 | ||
Permanent link to this record | |||||
Author | Sonia Baeza; Debora Gil; Carles Sanchez; Guillermo Torres; Ignasi Garcia Olive; Ignasi Guasch; Samuel Garcia Reina; Felipe Andreo; Jose Luis Mate; Jose Luis Vercher; Antonio Rosell | ||||
Title | Biopsia virtual radiomica para el diagnóstico histológico de nódulos pulmonares – Resultados intermedios del proyecto Radiolung | Type | Conference Article | ||
Year | 2023 | Publication | SEPAR | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Pòster | ||||
Address | Granada; Spain; June 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | SEPAR | ||
Notes | IAM | Approved | no | ||
Call Number | Admin @ si @ BGS2023 | Serial | 3951 | ||
Permanent link to this record | |||||
Author | Joakim Bruslund Haurum; Sergio Escalera; Graham W. Taylor; Thomas B. | ||||
Title | Which Tokens to Use? Investigating Token Reduction in Vision Transformers | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model, the reduction patterns of pruning-based methods significantly differ from fixed radial patterns, and the reduction patterns of pruning-based methods are correlated across classification datasets. Finally we report that the similarity of reduction patterns is a moderate-to-strong proxy for model performance. Project page at https://vap.aau.dk/tokens. | ||||
Address | Paris; France; October 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICCVW | ||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ BET2023 | Serial | 3940 | ||
Permanent link to this record |