toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Hugo Bertiche; Niloy J Mitra; Kuldeep Kulkarni; Chun Hao Paul Huang; Tuanfeng Y Wang; Meysam Madadi; Sergio Escalera; Duygu Ceylan edit  url
doi  openurl
  Title Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images Type Conference Article
  Year 2023 Publication 36th IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 459-468  
  Keywords  
  Abstract Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We investigate the problem in the context of dressed humans under the wind. At the core of our method is a novel cyclic neural network that produces looping cinemagraphs for the target loop duration. To circumvent the problem of collecting real data, we demonstrate that it is possible, by working in the image normal space, to learn garment motion dynamics on synthetic data and generalize to real data. We evaluate our method on both synthetic and real data and demonstrate that it is possible to create compelling and plausible cinemagraphs from single RGB images.  
  Address Vancouver; Canada; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) CVPR  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ BMK2023 Serial 3921  
Permanent link to this record
 

 
Author Albin Soutif; Antonio Carta; Joost Van de Weijer edit   pdf
url  openurl
  Title Improving Online Continual Learning Performance and Stability with Temporal Ensembles Type Conference Article
  Year 2023 Publication 2nd Conference on Lifelong Learning Agents Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) arXiv:2205.13452 showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspirations from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.  
  Address Montreal; Canada; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) COLLAS  
  Notes LAMP Approved no  
  Call Number Admin @ si @ SCW2023 Serial 3922  
Permanent link to this record
 

 
Author Stepan Simsa; Michal Uricar; Milan Sulc; Yash Patel; Ahmed Hamdi; Matej Kocian; Matyas Skalicky; Jiri Matas; Antoine Doucet; Mickael Coustaty; Dimosthenis Karatzas edit  url
doi  openurl
  Title Overview of DocILE 2023: Document Information Localization and Extraction Type Conference Article
  Year 2023 Publication International Conference of the Cross-Language Evaluation Forum for European Languages Abbreviated Journal  
  Volume 14163 Issue Pages 276–293  
  Keywords Information Extraction; Computer Vision; Natural Language Processing; Optical Character Recognition; Document Understanding  
  Abstract This paper provides an overview of the DocILE 2023 Competition, its tasks, participant submissions, the competition results and possible future research directions. This first edition of the competition focused on two Information Extraction tasks, Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR). Both of these tasks require detection of pre-defined categories of information in business documents. The second task additionally requires correctly grouping the information into tuples, capturing the structure laid out in the document. The competition used the recently published DocILE dataset and benchmark that stays open to new submissions. The diversity of the participant solutions indicates the potential of the dataset as the submissions included pure Computer Vision, pure Natural Language Processing, as well as multi-modal solutions and utilized all of the parts of the dataset, including the annotated, synthetic and unlabeled subsets.  
  Address Thessaloniki; Greece; September 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) CLEF  
  Notes DAG Approved no  
  Call Number Admin @ si @ SUS2023a Serial 3924  
Permanent link to this record
 

 
Author Mohamed Ramzy Ibrahim; Robert Benavente; Daniel Ponsa; Felipe Lumbreras edit  url
openurl 
  Title Unveiling the Influence of Image Super-Resolution on Aerial Scene Classification Type Conference Article
  Year 2023 Publication Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications Abbreviated Journal  
  Volume 14469 Issue Pages 214–228  
  Keywords  
  Abstract Deep learning has made significant advances in recent years, and as a result, it is now in a stage where it can achieve outstanding results in tasks requiring visual understanding of scenes. However, its performance tends to decline when dealing with low-quality images. The advent of super-resolution (SR) techniques has started to have an impact on the field of remote sensing by enabling the restoration of fine details and enhancing image quality, which could help to increase performance in other vision tasks. However, in previous works, contradictory results for scene visual understanding were achieved when SR techniques were applied. In this paper, we present an experimental study on the impact of SR on enhancing aerial scene classification. Through the analysis of different state-of-the-art SR algorithms, including traditional methods and deep learning-based approaches, we unveil the transformative potential of SR in overcoming the limitations of low-resolution (LR) aerial imagery. By enhancing spatial resolution, more fine details are captured, opening the door for an improvement in scene understanding. We also discuss the effect of different image scales on the quality of SR and its effect on aerial scene classification. Our experimental work demonstrates the significant impact of SR on enhancing aerial scene classification compared to LR images, opening new avenues for improved remote sensing applications.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) CIARP  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ IBP2023 Serial 4008  
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez edit  openurl
  Title Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules Type Conference Article
  Year 2023 Publication 37th International Congress and Exhibition is organized by Computer Assisted Radiology and Surgery Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Pòster  
  Address Munich; Germany; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) CARS  
  Notes IAM Approved no  
  Call Number Admin @ si @ TGR2023a Serial 3950  
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas edit  url
openurl 
  Title Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages  
  Keywords Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning  
  Abstract In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) AAAI  
  Notes DAG Approved no  
  Call Number Admin @ si @ SBM2023 Serial 3848  
Permanent link to this record
 

 
Author Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas edit  url
openurl 
  Title Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages 1940-1948  
  Keywords  
  Abstract Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.  
  Address Washington; USA; February 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) AAAI  
  Notes DAG Approved no  
  Call Number Admin @ si @ NBM2023 Serial 3860  
Permanent link to this record
 

 
Author Pau Cano; Alvaro Caravaca; Debora Gil; Eva Musulen edit   pdf
url  openurl
  Title Diagnosis of Helicobacter pylori using AutoEncoders for the Detection of Anomalous Staining Patterns in Immunohistochemistry Images Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages 107241  
  Keywords  
  Abstract This work addresses the detection of Helicobacter pylori a bacterium classified since 1994 as class 1 carcinogen to humans. By its highest specificity and sensitivity, the preferred diagnosis technique is the analysis of histological images with immunohistochemical staining, a process in which certain stained antibodies bind to antigens of the biological element of interest. This analysis is a time demanding task, which is currently done by an expert pathologist that visually inspects the digitized samples.
We propose to use autoencoders to learn latent patterns of healthy tissue and detect H. pylori as an anomaly in image staining. Unlike existing classification approaches, an autoencoder is able to learn patterns in an unsupervised manner (without the need of image annotations) with high performance. In particular, our model has an overall 91% of accuracy with 86\% sensitivity, 96% specificity and 0.97 AUC in the detection of H. pylori.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes IAM Approved no  
  Call Number Admin @ si @ CCG2023 Serial 3855  
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Asma Bensalah; Jialuo Chen; Alicia Fornes; Michelle Waldispühl edit  url
doi  openurl
  Title A User Perspective on HTR methods for the Automatic Transcription of Rare Scripts: The Case of Codex Runicus Just Accepted Type Journal Article
  Year 2023 Publication ACM Journal on Computing and Cultural Heritage Abbreviated Journal JOCCH  
  Volume 15 Issue 4 Pages 1-18  
  Keywords  
  Abstract Recent breakthroughs in Artificial Intelligence, Deep Learning and Document Image Analysis and Recognition have significantly eased the creation of digital libraries and the transcription of historical documents. However, for documents in rare scripts with few labelled training data available, current Handwritten Text Recognition (HTR) systems are too constraint. Moreover, research on HTR often focuses on technical aspects only, and rarely puts emphasis on implementing software tools for scholars in Humanities. In this article, we describe, compare and analyse different transcription methods for rare scripts. We evaluate their performance in a real use case of a medieval manuscript written in the runic script (Codex Runicus) and discuss advantages and disadvantages of each method from the user perspective. From this exhaustive analysis and comparison with a fully manual transcription, we raise conclusions and provide recommendations to scholars interested in using automatic transcription tools.  
  Address  
  Corporate Author Thesis  
  Publisher ACM Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no  
  Call Number Admin @ si @ SBC2023 Serial 3732  
Permanent link to this record
 

 
Author Juan Borrego-Carazo; Carles Sanchez; David Castells; Jordi Carrabina; Debora Gil edit   pdf
doi  openurl
  Title BronchoPose: an analysis of data and model configuration for vision-based bronchoscopy pose estimation Type Journal Article
  Year 2023 Publication Computer Methods and Programs in Biomedicine Abbreviated Journal CMPB  
  Volume 228 Issue Pages 107241  
  Keywords Videobronchoscopy guiding; Deep learning; Architecture optimization; Datasets; Standardized evaluation framework; Pose estimation  
  Abstract Vision-based bronchoscopy (VB) models require the registration of the virtual lung model with the frames from the video bronchoscopy to provide effective guidance during the biopsy. The registration can be achieved by either tracking the position and orientation of the bronchoscopy camera or by calibrating its deviation from the pose (position and orientation) simulated in the virtual lung model. Recent advances in neural networks and temporal image processing have provided new opportunities for guided bronchoscopy. However, such progress has been hindered by the lack of comparative experimental conditions.
In the present paper, we share a novel synthetic dataset allowing for a fair comparison of methods. Moreover, this paper investigates several neural network architectures for the learning of temporal information at different levels of subject personalization. In order to improve orientation measurement, we also present a standardized comparison framework and a novel metric for camera orientation learning. Results on the dataset show that the proposed metric and architectures, as well as the standardized conditions, provide notable improvements to current state-of-the-art camera pose estimation in video bronchoscopy.
 
  Address  
  Corporate Author Thesis  
  Publisher Elsevier Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes IAM; Approved no  
  Call Number Admin @ si @ BSC2023 Serial 3702  
Permanent link to this record
 

 
Author Jose Luis Gomez; Gabriel Villalonga; Antonio Lopez edit  url
openurl 
  Title Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models Type Journal Article
  Year 2023 Publication Sensors – Special Issue on “Machine Learning for Autonomous Driving Perception and Prediction” Abbreviated Journal SENS  
  Volume 23 Issue 2 Pages 621  
  Keywords Domain adaptation; semi-supervised learning; Semantic segmentation; Autonomous driving  
  Abstract Semantic image segmentation is a central and challenging task in autonomous driving, addressed by training deep models. Since this training draws to a curse of human-based image labeling, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies to address an unsupervised domain adaptation (UDA) problem. In this paper, we propose a new co-training procedure for synth-to-real UDA of semantic
segmentation models. It consists of a self-training stage, which provides two domain-adapted models, and a model collaboration loop for the mutual improvement of these two models. These models are then used to provide the final semantic segmentation labels (pseudo-labels) for the real-world images. The overall
procedure treats the deep models as black boxes and drives their collaboration at the level of pseudo-labeled target images, i.e., neither modifying loss functions is required, nor explicit feature alignment. We test our proposal on standard synthetic and real-world datasets for on-board semantic segmentation. Our
procedure shows improvements ranging from ∼13 to ∼26 mIoU points over baselines, so establishing new state-of-the-art results.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes ADAS; no proj Approved no  
  Call Number Admin @ si @ GVL2023 Serial 3705  
Permanent link to this record
 

 
Author Reuben Dorent; Aaron Kujawa; Marina Ivory; Spyridon Bakas; Nikola Rieke; Samuel Joutard; Ben Glocker; Jorge Cardoso; Marc Modat; Kayhan Batmanghelich; Arseniy Belkov; Maria Baldeon Calisto; Jae Won Choi; Benoit M. Dawant; Hexin Dong; Sergio Escalera; Yubo Fan; Lasse Hansen; Mattias P. Heinrich; Smriti Joshi; Victoriya Kashtanova; Hyeon Gyu Kim; Satoshi Kondo; Christian N. Kruse; Susana K. Lai-Yuen; Hao Li; Han Liu; Buntheng Ly; Ipek Oguz; Hyungseob Shin; Boris Shirokikh; Zixian Su; Guotai Wang; Jianghao Wu; Yanwu Xu; Kai Yao; Li Zhang; Sebastien Ourselin, edit   pdf
url  doi
openurl 
  Title CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation Type Journal Article
  Year 2023 Publication Medical Image Analysis Abbreviated Journal MIA  
  Volume 83 Issue Pages 102628  
  Keywords Domain Adaptation; Segmen tation; Vestibular Schwnannoma  
  Abstract Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice – VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice – VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ DKI2023 Serial 3706  
Permanent link to this record
 

 
Author Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz edit   pdf
doi  openurl
  Title Gate-Shift-Fuse for Video Action Recognition Type Journal Article
  Year 2023 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 45 Issue 9 Pages 10913-10928  
  Keywords Action Recognition; Video Classification; Spatial Gating; Channel Fusion  
  Abstract Convolutional Neural Networks are the de facto models for image recognition. However 3D CNNs, the straight forward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity requiring large scale annotated datasets to train them in scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs. Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner. GSF leverages grouped spatial gating to decompose input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into an efficient and high performing spatio-temporal feature extractor, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks.  
  Address 1 Sept. 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes HUPBA; no menciona Approved no  
  Call Number Admin @ si @ SEL2023 Serial 3814  
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez edit  openurl
  Title Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules – Intermediate Results of the RadioLung Project Type Journal Article
  Year 2023 Publication International Journal of Computer Assisted Radiology and Surgery Abbreviated Journal IJCARS  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes IAM Approved no  
  Call Number Admin @ si @ TGM2023 Serial 3830  
Permanent link to this record
 

 
Author Kunal Biswas; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Michel Blumenstein; Josep Llados edit  url
openurl 
  Title Classification of aesthetic natural scene images using statistical and semantic features Type Journal Article
  Year 2023 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume 82 Issue 9 Pages 13507-13532  
  Keywords  
  Abstract Aesthetic image analysis is essential for improving the performance of multimedia image retrieval systems, especially from a repository of social media and multimedia content stored on mobile devices. This paper presents a novel method for classifying aesthetic natural scene images by studying the naturalness of image content using statistical features, and reading text in the images using semantic features. Unlike existing methods that focus only on image quality with human information, the proposed approach focuses on image features as well as text-based semantic features without human intervention to reduce the gap between subjectivity and objectivity in the classification. The aesthetic classes considered in this work are (i) Very Pleasant, (ii) Pleasant, (iii) Normal and (iv) Unpleasant. The naturalness is represented by features of focus, defocus, perceived brightness, perceived contrast, blurriness and noisiness, while semantics are represented by text recognition, description of the images and labels of images, profile pictures, and banner images. Furthermore, a deep learning model is proposed in a novel way to fuse statistical and semantic features for the classification of aesthetic natural scene images. Experiments on our own dataset and the standard datasets demonstrate that the proposed approach achieves 92.74%, 88.67% and 83.22% average classification rates on our own dataset, AVA dataset and CUHKPQ dataset, respectively. Furthermore, a comparative study of the proposed model with the existing methods shows that the proposed method is effective for the classification of aesthetic social media images.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down)  
  Notes DAG Approved no  
  Call Number Admin @ si @ BSP2023 Serial 3873  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: