Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	76–90 of 161 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

<< 1 2 3 4 5 6 7 8 9 10 >> [11–11]

List View

Citations

Details

	Records
	Author	Hugo Bertiche; Niloy J Mitra; Kuldeep Kulkarni; Chun Hao Paul Huang; Tuanfeng Y Wang; Meysam Madadi; Sergio Escalera; Duygu Ceylan
	Title	Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images			Type	Conference Article
	Year	2023	Publication	36th IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	459-468
	Keywords
	Abstract	Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We investigate the problem in the context of dressed humans under the wind. At the core of our method is a novel cyclic neural network that produces looping cinemagraphs for the target loop duration. To circumvent the problem of collecting real data, we demonstrate that it is possible, by working in the image normal space, to learn garment motion dynamics on synthetic data and generalize to real data. We evaluate our method on both synthetic and real data and demonstrate that it is possible to create compelling and plausible cinemagraphs from single RGB images.
	Address	Vancouver; Canada; June 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPR
	Notes	HUPBA			Approved	no
	Call Number	Admin @ si @ BMK2023			Serial	3921
Permanent link to this record



	Author	Albin Soutif; Antonio Carta; Joost Van de Weijer
	Title	Improving Online Continual Learning Performance and Stability with Temporal Ensembles			Type	Conference Article
	Year	2023	Publication	2nd Conference on Lifelong Learning Agents	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) arXiv:2205.13452 showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspirations from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.
	Address	Montreal; Canada; August 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	COLLAS
	Notes	LAMP			Approved	no
	Call Number	Admin @ si @ SCW2023			Serial	3922
Permanent link to this record



	Author	Stepan Simsa; Michal Uricar; Milan Sulc; Yash Patel; Ahmed Hamdi; Matej Kocian; Matyas Skalicky; Jiri Matas; Antoine Doucet; Mickael Coustaty; Dimosthenis Karatzas
	Title	Overview of DocILE 2023: Document Information Localization and Extraction			Type	Conference Article
	Year	2023	Publication	International Conference of the Cross-Language Evaluation Forum for European Languages	Abbreviated Journal
	Volume	14163	Issue		Pages	276–293
	Keywords	Information Extraction; Computer Vision; Natural Language Processing; Optical Character Recognition; Document Understanding
	Abstract	This paper provides an overview of the DocILE 2023 Competition, its tasks, participant submissions, the competition results and possible future research directions. This first edition of the competition focused on two Information Extraction tasks, Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR). Both of these tasks require detection of pre-defined categories of information in business documents. The second task additionally requires correctly grouping the information into tuples, capturing the structure laid out in the document. The competition used the recently published DocILE dataset and benchmark that stays open to new submissions. The diversity of the participant solutions indicates the potential of the dataset as the submissions included pure Computer Vision, pure Natural Language Processing, as well as multi-modal solutions and utilized all of the parts of the dataset, including the annotated, synthetic and unlabeled subsets.
	Address	Thessaloniki; Greece; September 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CLEF
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ SUS2023a			Serial	3924
Permanent link to this record



	Author	Mohamed Ramzy Ibrahim; Robert Benavente; Daniel Ponsa; Felipe Lumbreras
	Title	Unveiling the Influence of Image Super-Resolution on Aerial Scene Classification			Type	Conference Article
	Year	2023	Publication	Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications	Abbreviated Journal
	Volume	14469	Issue		Pages	214–228
	Keywords
	Abstract	Deep learning has made significant advances in recent years, and as a result, it is now in a stage where it can achieve outstanding results in tasks requiring visual understanding of scenes. However, its performance tends to decline when dealing with low-quality images. The advent of super-resolution (SR) techniques has started to have an impact on the field of remote sensing by enabling the restoration of fine details and enhancing image quality, which could help to increase performance in other vision tasks. However, in previous works, contradictory results for scene visual understanding were achieved when SR techniques were applied. In this paper, we present an experimental study on the impact of SR on enhancing aerial scene classification. Through the analysis of different state-of-the-art SR algorithms, including traditional methods and deep learning-based approaches, we unveil the transformative potential of SR in overcoming the limitations of low-resolution (LR) aerial imagery. By enhancing spatial resolution, more fine details are captured, opening the door for an improvement in scene understanding. We also discuss the effect of different image scales on the quality of SR and its effect on aerial scene classification. Our experimental work demonstrates the significant impact of SR on enhancing aerial scene classification compared to LR images, opening new avenues for improved remote sensing applications.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CIARP
	Notes	MSIAU			Approved	no
	Call Number	Admin @ si @ IBP2023			Serial	4008
Permanent link to this record



	Author	Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez
	Title	Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules			Type	Conference Article
	Year	2023	Publication	37th International Congress and Exhibition is organized by Computer Assisted Radiology and Surgery	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Pòster
	Address	Munich; Germany; June 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CARS
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ TGR2023a			Serial	3950
Permanent link to this record



	Author	Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas
	Title	Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement			Type	Conference Article
	Year	2023	Publication	Proceedings of the 37th AAAI Conference on Artificial Intelligence	Abbreviated Journal
	Volume	37	Issue	2	Pages
	Keywords	Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning
	Abstract	In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	AAAI
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ SBM2023			Serial	3848
Permanent link to this record



	Author	Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas
	Title	Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia			Type	Conference Article
	Year	2023	Publication	Proceedings of the 37th AAAI Conference on Artificial Intelligence	Abbreviated Journal
	Volume	37	Issue	2	Pages	1940-1948
	Keywords
	Abstract	Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.
	Address	Washington; USA; February 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	AAAI
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ NBM2023			Serial	3860
Permanent link to this record



	Author	Pau Cano; Alvaro Caravaca; Debora Gil; Eva Musulen
	Title	Diagnosis of Helicobacter pylori using AutoEncoders for the Detection of Anomalous Staining Patterns in Immunohistochemistry Images			Type	Miscellaneous
	Year	2023	Publication	Arxiv	Abbreviated Journal
	Volume		Issue		Pages	107241
	Keywords
	Abstract	This work addresses the detection of Helicobacter pylori a bacterium classified since 1994 as class 1 carcinogen to humans. By its highest specificity and sensitivity, the preferred diagnosis technique is the analysis of histological images with immunohistochemical staining, a process in which certain stained antibodies bind to antigens of the biological element of interest. This analysis is a time demanding task, which is currently done by an expert pathologist that visually inspects the digitized samples. We propose to use autoencoders to learn latent patterns of healthy tissue and detect H. pylori as an anomaly in image staining. Unlike existing classification approaches, an autoencoder is able to learn patterns in an unsupervised manner (without the need of image annotations) with high performance. In particular, our model has an overall 91% of accuracy with 86\% sensitivity, 96% specificity and 0.97 AUC in the detection of H. pylori.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ CCG2023			Serial	3855
Permanent link to this record



	Author	Mohamed Ali Souibgui; Asma Bensalah; Jialuo Chen; Alicia Fornes; Michelle Waldispühl
	Title	A User Perspective on HTR methods for the Automatic Transcription of Rare Scripts: The Case of Codex Runicus Just Accepted			Type	Journal Article
	Year	2023	Publication	ACM Journal on Computing and Cultural Heritage	Abbreviated Journal	JOCCH
	Volume	15	Issue	4	Pages	1-18
	Keywords
	Abstract	Recent breakthroughs in Artificial Intelligence, Deep Learning and Document Image Analysis and Recognition have significantly eased the creation of digital libraries and the transcription of historical documents. However, for documents in rare scripts with few labelled training data available, current Handwritten Text Recognition (HTR) systems are too constraint. Moreover, research on HTR often focuses on technical aspects only, and rarely puts emphasis on implementing software tools for scholars in Humanities. In this article, we describe, compare and analyse different transcription methods for rare scripts. We evaluate their performance in a real use case of a medieval manuscript written in the runic script (Codex Runicus) and discuss advantages and disadvantages of each method from the user perspective. From this exhaustive analysis and comparison with a fully manual transcription, we raise conclusions and provide recommendations to scholars interested in using automatic transcription tools.
	Address
	Corporate Author				Thesis
	Publisher	ACM	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
	Call Number	Admin @ si @ SBC2023			Serial	3732
Permanent link to this record



	Author	Juan Borrego-Carazo; Carles Sanchez; David Castells; Jordi Carrabina; Debora Gil
	Title	BronchoPose: an analysis of data and model configuration for vision-based bronchoscopy pose estimation			Type	Journal Article
	Year	2023	Publication	Computer Methods and Programs in Biomedicine	Abbreviated Journal	CMPB
	Volume	228	Issue		Pages	107241
	Keywords	Videobronchoscopy guiding; Deep learning; Architecture optimization; Datasets; Standardized evaluation framework; Pose estimation
	Abstract	Vision-based bronchoscopy (VB) models require the registration of the virtual lung model with the frames from the video bronchoscopy to provide effective guidance during the biopsy. The registration can be achieved by either tracking the position and orientation of the bronchoscopy camera or by calibrating its deviation from the pose (position and orientation) simulated in the virtual lung model. Recent advances in neural networks and temporal image processing have provided new opportunities for guided bronchoscopy. However, such progress has been hindered by the lack of comparative experimental conditions. In the present paper, we share a novel synthetic dataset allowing for a fair comparison of methods. Moreover, this paper investigates several neural network architectures for the learning of temporal information at different levels of subject personalization. In order to improve orientation measurement, we also present a standardized comparison framework and a novel metric for camera orientation learning. Results on the dataset show that the proposed metric and architectures, as well as the standardized conditions, provide notable improvements to current state-of-the-art camera pose estimation in video bronchoscopy.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM;			Approved	no
	Call Number	Admin @ si @ BSC2023			Serial	3702
Permanent link to this record



	Author	Jose Luis Gomez; Gabriel Villalonga; Antonio Lopez
	Title	Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models			Type	Journal Article
	Year	2023	Publication	Sensors – Special Issue on “Machine Learning for Autonomous Driving Perception and Prediction”	Abbreviated Journal	SENS
	Volume	23	Issue	2	Pages	621
	Keywords	Domain adaptation; semi-supervised learning; Semantic segmentation; Autonomous driving
	Abstract	Semantic image segmentation is a central and challenging task in autonomous driving, addressed by training deep models. Since this training draws to a curse of human-based image labeling, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies to address an unsupervised domain adaptation (UDA) problem. In this paper, we propose a new co-training procedure for synth-to-real UDA of semantic segmentation models. It consists of a self-training stage, which provides two domain-adapted models, and a model collaboration loop for the mutual improvement of these two models. These models are then used to provide the final semantic segmentation labels (pseudo-labels) for the real-world images. The overall procedure treats the deep models as black boxes and drives their collaboration at the level of pseudo-labeled target images, i.e., neither modifying loss functions is required, nor explicit feature alignment. We test our proposal on standard synthetic and real-world datasets for on-board semantic segmentation. Our procedure shows improvements ranging from ∼13 to ∼26 mIoU points over baselines, so establishing new state-of-the-art results.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; no proj			Approved	no
	Call Number	Admin @ si @ GVL2023			Serial	3705
Permanent link to this record



	Author	Reuben Dorent; Aaron Kujawa; Marina Ivory; Spyridon Bakas; Nikola Rieke; Samuel Joutard; Ben Glocker; Jorge Cardoso; Marc Modat; Kayhan Batmanghelich; Arseniy Belkov; Maria Baldeon Calisto; Jae Won Choi; Benoit M. Dawant; Hexin Dong; Sergio Escalera; Yubo Fan; Lasse Hansen; Mattias P. Heinrich; Smriti Joshi; Victoriya Kashtanova; Hyeon Gyu Kim; Satoshi Kondo; Christian N. Kruse; Susana K. Lai-Yuen; Hao Li; Han Liu; Buntheng Ly; Ipek Oguz; Hyungseob Shin; Boris Shirokikh; Zixian Su; Guotai Wang; Jianghao Wu; Yanwu Xu; Kai Yao; Li Zhang; Sebastien Ourselin,
	Title	CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation			Type	Journal Article
	Year	2023	Publication	Medical Image Analysis	Abbreviated Journal	MIA
	Volume	83	Issue		Pages	102628
	Keywords	Domain Adaptation; Segmen tation; Vestibular Schwnannoma
	Abstract	Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice – VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice – VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA			Approved	no
	Call Number	Admin @ si @ DKI2023			Serial	3706
Permanent link to this record



	Author	Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz
	Title	Gate-Shift-Fuse for Video Action Recognition			Type	Journal Article
	Year	2023	Publication	IEEE Transactions on Pattern Analysis and Machine Intelligence	Abbreviated Journal	TPAMI
	Volume	45	Issue	9	Pages	10913-10928
	Keywords	Action Recognition; Video Classification; Spatial Gating; Channel Fusion
	Abstract	Convolutional Neural Networks are the de facto models for image recognition. However 3D CNNs, the straight forward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity requiring large scale annotated datasets to train them in scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs. Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner. GSF leverages grouped spatial gating to decompose input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into an efficient and high performing spatio-temporal feature extractor, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks.
	Address	1 Sept. 2023
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no menciona			Approved	no
	Call Number	Admin @ si @ SEL2023			Serial	3814
Permanent link to this record



	Author	Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez
	Title	Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules – Intermediate Results of the RadioLung Project			Type	Journal Article
	Year	2023	Publication	International Journal of Computer Assisted Radiology and Surgery	Abbreviated Journal	IJCARS
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ TGM2023			Serial	3830
Permanent link to this record



	Author	Kunal Biswas; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Michel Blumenstein; Josep Llados
	Title	Classification of aesthetic natural scene images using statistical and semantic features			Type	Journal Article
	Year	2023	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	82	Issue	9	Pages	13507-13532
	Keywords
	Abstract	Aesthetic image analysis is essential for improving the performance of multimedia image retrieval systems, especially from a repository of social media and multimedia content stored on mobile devices. This paper presents a novel method for classifying aesthetic natural scene images by studying the naturalness of image content using statistical features, and reading text in the images using semantic features. Unlike existing methods that focus only on image quality with human information, the proposed approach focuses on image features as well as text-based semantic features without human intervention to reduce the gap between subjectivity and objectivity in the classification. The aesthetic classes considered in this work are (i) Very Pleasant, (ii) Pleasant, (iii) Normal and (iv) Unpleasant. The naturalness is represented by features of focus, defocus, perceived brightness, perceived contrast, blurriness and noisiness, while semantics are represented by text recognition, description of the images and labels of images, profile pictures, and banner images. Furthermore, a deep learning model is proposed in a novel way to fuse statistical and semantic features for the classification of aesthetic natural scene images. Experiments on our own dataset and the standard datasets demonstrate that the proposed approach achieves 92.74%, 88.67% and 83.22% average classification rates on our own dataset, AVA dataset and CUHKPQ dataset, respectively. Furthermore, a comparative study of the proposed model with the existing methods shows that the proposed method is effective for the classification of aesthetic social media images.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ BSP2023			Serial	3873
Permanent link to this record