Publicacions CVC -- Query Results

Records
Author	Jose Elias Yauri; Aura Hernandez-Sabate; Pau Folch; Debora Gil
Title	Mental Workload Detection Based on EEG Analysis			Type	Conference Article
Year	2021	Publication	Artificial Intelligence Research and Development. Proceedings 23rd International Conference of the Catalan Association for Artificial Intelligence.	Abbreviated Journal
Volume	339	Issue		Pages	268-277
Keywords	Cognitive states; Mental workload; EEG analysis; Neural Networks.
Abstract	The study of mental workload is essential for human work efficiency, health and accident prevention, since workload compromises both performance and awareness. Although workload has been widely studied using several physiological measures, minimising the sensor network as much as possible remains both a challenge and a requirement. Electroencephalogram (EEG) signals have shown a high correlation to specific cognitive and mental states like workload. However, there is not enough evidence in the literature to validate how well models generalize to new subjects performing tasks of a workload similar to the ones included during the model’s training. In this paper we propose a binary neural network to classify EEG features across different mental workloads. Two workloads, low and medium, are induced using two variants of the N-Back Test. The proposed model was validated on a dataset collected from 16 subjects and showed a high level of generalization capability: the model reported an average recall of 81.81% in a leave-one-subject-out evaluation.
Address	Virtual; October 20-22 2021
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CCIA
Notes	IAM; 600.139; 600.118; 600.145			Approved	no
Call Number	Admin @ si @			Serial	3723



Author	Giuseppe De Gregorio; Sanket Biswas; Mohamed Ali Souibgui; Asma Bensalah; Josep Llados; Alicia Fornes; Angelo Marcelli
Title	A Few Shot Multi-representation Approach for N-Gram Spotting in Historical Manuscripts			Type	Conference Article
Year	2022	Publication	Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022)	Abbreviated Journal
Volume	13639	Issue		Pages	3-12
Keywords	N-gram spotting; Few-shot learning; Multimodal understanding; Historical handwritten collections
Abstract	Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-gram) that requires a small amount of labelled training data. We show that recognition of important n-grams could reduce the system’s dependency on vocabulary. In this case, an out-of-vocabulary (OOV) word in an input handwritten line image could be a sequence of n-grams that belong to the lexicon. An extensive experimental evaluation of our proposed multi-representation approach was carried out on a subset of Bentham’s historical manuscript collections, obtaining promising results in this direction.
Address	December 04 – 07, 2022; Hyderabad, India
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICFHR
Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
Call Number	Admin @ si @ GBS2022			Serial	3733



Author	Arnau Baro; Carles Badal; Pau Torras; Alicia Fornes
Title	Handwritten Historical Music Recognition through Sequence-to-Sequence with Attention Mechanism			Type	Conference Article
Year	2022	Publication	3rd International Workshop on Reading Music Systems (WoRMS2021)	Abbreviated Journal
Volume		Issue		Pages	55-59
Keywords	Optical Music Recognition; Digits; Image Classification
Abstract	Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks.
Address	July 23, 2021, Alicante (Spain)
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WoRMS
Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
Call Number	Admin @ si @ BBT2022			Serial	3734



Author	Pau Torras; Arnau Baro; Alicia Fornes; Lei Kang
Title	Improving Handwritten Music Recognition through Language Model Integration			Type	Conference Article
Year	2022	Publication	4th International Workshop on Reading Music Systems (WoRMS2022)	Abbreviated Journal
Volume		Issue		Pages	42-46
Keywords	optical music recognition; historical sources; diversity; music theory; digital humanities
Abstract	Handwritten Music Recognition, especially in the historical domain, is an inherently challenging endeavour; paper degradation artefacts and the ambiguous nature of handwriting make recognising such scores an error-prone process, even for the current state-of-the-art Sequence to Sequence models. In this work we propose a way of reducing the production of statistically implausible output sequences by fusing a Language Model into a recognition Sequence to Sequence model. The idea is to leverage visually-conditioned and context-conditioned output distributions in order to automatically find and correct any mistakes that would otherwise break context significantly. We have found this approach to improve recognition results to 25.15 SER (%) from a previous best of 31.79 SER (%) in the literature.
Address	November 18, 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WoRMS
Notes	DAG; 600.121; 600.162; 602.230			Approved	no
Call Number	Admin @ si @ TBF2022			Serial	3735



Author	Mohamed Ali Souibgui; Alicia Fornes; Yousri Kessentini; Beata Megyesi
Title	Few shots are all you need: A progressive learning approach for low resource handwritten text recognition			Type	Journal Article
Year	2022	Publication	Pattern Recognition Letters	Abbreviated Journal	PRL
Volume	160	Issue		Pages	43-49
Keywords
Abstract	Handwritten text recognition in low resource scenarios, such as manuscripts with rare alphabets, is a challenging problem. In this paper, we propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation process, by requiring only a few images of each alphabet symbol. The method consists of detecting all the symbols of a given alphabet in a textline image and decoding the obtained similarity scores to the final sequence of transcribed symbols. Our model is first pretrained on synthetic line images generated from an alphabet, which could differ from the alphabet of the target domain. A second training step is then applied to reduce the gap between the source and the target data. Since this retraining would require annotation of thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach that automatically assigns pseudo-labels to the unlabeled data. The evaluation on different datasets shows that our model can lead to competitive results with a significant reduction in human effort. The code will be publicly available in the following repository: https://github.com/dali92002/HTRbyMatching
Address
Corporate Author				Thesis
Publisher	Elsevier	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.162; 602.230			Approved	no
Call Number	Admin @ si @ SFK2022			Serial	3736



Author	Joana Maria Pujadas-Mora; Alicia Fornes; Oriol Ramos Terrades; Josep Llados; Jialuo Chen; Miquel Valls-Figols; Anna Cabre
Title	The Barcelona Historical Marriage Database and the Baix Llobregat Demographic Database. From Algorithms for Handwriting Recognition to Individual-Level Demographic and Socioeconomic Data			Type	Journal
Year	2022	Publication	Historical Life Course Studies	Abbreviated Journal	HLCS
Volume	12	Issue		Pages	99-132
Keywords	Individual demographic databases; Computer vision; Record linkage; Social mobility; Inequality; Migration; Word spotting; Handwriting recognition; Local censuses; Marriage Licences
Abstract	The Barcelona Historical Marriage Database (BHMD) gathers records of the more than 600,000 marriages celebrated in the Diocese of Barcelona and their taxation registered in Barcelona Cathedral's so-called Marriage Licenses Books for the long period 1451–1905. The BALL Demographic Database brings together the individual information recorded in the population registers, censuses and fiscal censuses of the main municipalities of the county of Baix Llobregat (Barcelona). In this ongoing collection, 263,786 individual observations dating from the period between 1828 and 1965 had been assembled by December 2020. The two databases started as part of different interdisciplinary research projects at the crossroads of Historical Demography and Computer Vision. Their construction uses artificial intelligence and computer vision methods such as Handwriting Recognition to reduce the time of execution. However, their current state still requires some human intervention, which explains the implemented crowdsourcing and game sourcing experiences. Moreover, knowledge graph techniques have allowed the application of advanced record linkage to link the same individuals and families across time and space. Finally, we discuss the main research lines in historical demography developed so far using both databases.
Address	June 23, 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
Call Number	Admin @ si @ PFR2022			Serial	3737



Author	Asma Bensalah; Alicia Fornes; Cristina Carmona-Duarte; Josep Llados
Title	Easing Automatic Neurorehabilitation via Classification and Smoothness Analysis			Type	Conference Article
Year	2022	Publication	Intertwining Graphonomics with Human Movements. 20th International Conference of the International Graphonomics Society, IGS 2022	Abbreviated Journal
Volume	13424	Issue		Pages	336-348
Keywords	Neurorehabilitation; Upper-limb; Movement classification; Movement smoothness; Deep learning; Jerk
Abstract	Assessing the quality of movements for post-stroke patients during the rehabilitation phase is vital given that there is no standard stroke rehabilitation plan for all patients. In fact, the plan depends basically on the patient’s functional independence and progress along the rehabilitation sessions. To tackle this challenge and make neurorehabilitation more agile, we propose an automatic assessment pipeline that starts by recognising patients’ movements by means of a shallow deep learning architecture, then measures the movement quality using the jerk measure and related measures. A particularity of this work is that the dataset used is clinically relevant, since it represents movements inspired by Fugl-Meyer, a common upper-limb clinical assessment scale for stroke patients. We show that it is possible to detect the contrast between healthy subjects’ and patients’ movements in terms of smoothness, besides reaching conclusions about the patients’ progress during the rehabilitation sessions that correspond to the clinicians’ findings about each case.
Address	June 7-9, 2022, Las Palmas de Gran Canaria, Spain
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	IGS
Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
Call Number	Admin @ si @ BFC2022			Serial	3738



Author	Alicia Fornes; Asma Bensalah; Cristina Carmona-Duarte; Jialuo Chen; Miguel A. Ferrer; Andreas Fischer; Josep Llados; Cristina Martin; Eloy Opisso; Rejean Plamondon; Anna Scius-Bertrand; Josep Maria Tormos
Title	The RPM3D Project: 3D Kinematics for Remote Patient Monitoring			Type	Conference Article
Year	2022	Publication	Intertwining Graphonomics with Human Movements. 20th International Conference of the International Graphonomics Society, IGS 2022	Abbreviated Journal
Volume	13424	Issue		Pages	217-226
Keywords	Healthcare applications; Kinematic; Theory of Rapid Human Movements; Human activity recognition; Stroke rehabilitation; 3D kinematics
Abstract	This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real case scenario for stroke rehabilitation at the Guttmann Institute (https://www.guttmann.com/en/) (neurorehabilitation hospital), showing promising results. Our work could have a great impact in remote healthcare applications, improving the medical efficiency and reducing the healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases.
Address	June 7-9, 2022, Las Palmas de Gran Canaria, Spain
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	IGS
Notes	DAG; 600.121; 600.162; 602.230; 600.140			Approved	no
Call Number	Admin @ si @ FBC2022			Serial	3739



Author	Arnau Baro; Pau Riba; Alicia Fornes
Title	Musigraph: Optical Music Recognition Through Object Detection and Graph Neural Network			Type	Conference Article
Year	2022	Publication	Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022)	Abbreviated Journal
Volume	13639	Issue		Pages	171-184
Keywords	Object detection; Optical music recognition; Graph neural network
Abstract	During the last decades, the performance of optical music recognition has been steadily improving. However, despite the 2-dimensional nature of music notation (e.g. notes have rhythm and pitch), most works treat musical scores as a sequence of symbols in one dimension, which makes their recognition still a challenge. Thus, in this work we explore the use of graph neural networks for musical score recognition: first, because graphs are suited for n-dimensional representations, and second, because the combination of graphs with deep learning has shown great performance in similar applications. Our methodology consists of three steps. First, we detect each isolated/atomic symbol (those that cannot be decomposed into further graphical primitives) and the primitives that form a musical symbol. Then, we build the graph, taking the notehead as the root node and, as leaves, those primitives or symbols that modify the note’s rhythm (stem, beam, flag) or pitch (flat, sharp, natural). Finally, the graph is translated into a human-readable character sequence for a final transcription and evaluation. Our method has been tested on more than five thousand measures, showing promising results.
Address	December 04 – 07, 2022; Hyderabad, India
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title	LNCS
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICFHR
Notes	DAG; 600.162; 600.140; 602.230			Approved	no
Call Number	Admin @ si @ BRF2022b			Serial	3740



Author	Carlos Boned Riera; Oriol Ramos Terrades
Title	Discriminative Neural Variational Model for Unbalanced Classification Tasks in Knowledge Graph			Type	Conference Article
Year	2022	Publication	26th International Conference on Pattern Recognition	Abbreviated Journal
Volume		Issue		Pages	2186-2191
Keywords	Measurement; Couplings; Semantics; Ear; Benchmark testing; Data models; Pattern recognition
Abstract	Nowadays the paradigm of link discovery problems has shown significant improvements on Knowledge Graphs. However, method performance is harmed by the unbalanced nature of this classification problem, since many methods are easily biased towards not finding proper links. In this paper we present a discriminative neural variational auto-encoder model, called DNVAE from now on, in which we have introduced latent variables to serve as embedding vectors. As a result, the learnt generative model better approximates the underlying distribution and, at the same time, better differentiates the types of relations in the knowledge graph. We have evaluated this approach on a benchmark knowledge graph and on census records. Results on this last data set are quite impressive, since we reach the highest possible score in the evaluation metrics. However, further experiments are still needed to evaluate more deeply the performance of the method in more challenging tasks.
Address	Montreal; Quebec; Canada; August 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICPR
Notes	DAG; 600.121; 600.162			Approved	no
Call Number	Admin @ si @ BoR2022			Serial	3741



Author	Patricia Suarez; Angel Sappa
Title	A Generative Model for Guided Thermal Image Super-Resolution			Type	Conference Article
Year	2024	Publication	19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	This paper presents a novel approach for thermal super-resolution based on a fusion prior, low-resolution thermal image and H brightness channel of the corresponding visible spectrum image. The method combines bicubic interpolation of the ×8 scale target image with the brightness component. To enhance the guidance process, the original RGB image is converted to HSV, and the brightness channel is extracted. Bicubic interpolation is then applied to the low-resolution thermal image, resulting in a Bicubic-Brightness channel blend. This luminance-bicubic fusion is used as an input image to help the training process. With this fused image, the cyclic adversarial generative network obtains high-resolution thermal image results. Experimental evaluations show that the proposed approach significantly improves spatial resolution and pixel intensity levels compared to other state-of-the-art techniques, making it a promising method to obtain high-resolution thermal images.
Address	Rome; Italy; February 2024
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISAPP
Notes	MSIAU			Approved	no
Call Number	Admin @ si @ SuS2024			Serial	4002



Author	Hector Laria Mantecon; Kai Wang; Joost Van de Weijer; Bogdan Raducanu; Kai Wang
Title	NeRF-Diffusion for 3D-Consistent Face Generation and Editing			Type	Conference Article
Year	2024	Publication	19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Generating high-fidelity 3D-aware images without 3D supervision is a valuable capability in various applications. Current methods based on NeRF features, SDF information, or triplane features have limited variation after training. To address this, we propose a novel approach that combines pretrained models for shape and content generation. Our method leverages a pretrained Neural Radiance Field as a shape prior and a diffusion model for content generation. By conditioning the diffusion model with 3D features, we enhance its ability to generate novel views with 3D awareness. We introduce a consistency token shared between the NeRF module and the diffusion model to maintain 3D consistency during sampling. Moreover, our framework allows for text editing of 3D-aware image generation, enabling users to modify the style over 3D views while preserving semantic content. Our contributions include incorporating 3D awareness into a text-to-image model, addressing identity consistency in 3D view synthesis, and enabling text editing of 3D-aware image generation. We provide detailed explanations, including the shape prior based on the NeRF model and the content generation process using the diffusion model. We also discuss challenges such as shape consistency and sampling saturation. Experimental results demonstrate the effectiveness and visual quality of our approach.
Address	Rome; Italy; February 2024
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	VISAPP
Notes	LAMP			Approved	no
Call Number	Admin @ si @ LWW2024			Serial	4003



Author	Penny Tarling; Mauricio Cantor; Albert Clapes; Sergio Escalera
Title	Deep learning with self-supervision and uncertainty regularization to count fish in underwater images			Type	Journal Article
Year	2022	Publication	PloS One	Abbreviated Journal	Plos
Volume	17	Issue	5	Pages	e0267759
Keywords
Abstract	Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive but created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored to count animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework through testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals thereby contributing effective methods to assess natural populations from the ever-increasing visual data.
Address
Corporate Author				Thesis
Publisher	Public Library of Science	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HuPBA			Approved	no
Call Number	Admin @ si @ TCC2022			Serial	3743



Author	Yecong Wan; Yuanshuo Cheng; Mingwen Shao; Jordi Gonzalez
Title	Image rain removal and illumination enhancement done in one go			Type	Journal Article
Year	2022	Publication	Knowledge-Based Systems	Abbreviated Journal	KBS
Volume	252	Issue		Pages	109244
Keywords
Abstract	Rain removal plays an important role in the restoration of degraded images. Recently, CNN-based methods have achieved remarkable success. However, these approaches neglect that the appearance of real-world rain is often accompanied by low light conditions, which further degrade the image quality, thereby hindering the restoration task. Therefore, it is indispensable to jointly remove the rain and enhance illumination for real-world rain image restoration. To this end, we propose a novel spatially-adaptive network, dubbed SANet, which can remove the rain and enhance illumination in one go with the guidance of a degradation mask. Meanwhile, to fully utilize negative samples, a contrastive loss is proposed to preserve more natural textures and consistent illumination. In addition, we present a new synthetic dataset, named DarkRain, to boost the development of rain image restoration algorithms in practical scenarios. DarkRain not only contains different degrees of rain, but also considers different lighting conditions, and more realistically simulates real-world rainfall scenarios. SANet is extensively evaluated on the proposed dataset and attains new state-of-the-art performance against other combining methods. Moreover, after a simple transformation, our SANet surpasses the existing state-of-the-art algorithms in both rain removal and low-light image enhancement.
Address	Sept 2022
Corporate Author				Thesis
Publisher	Elsevier	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	ISE; 600.157; 600.168			Approved	no
Call Number	Admin @ si @ WCS2022			Serial	3744



Author	Lu Yu; Xialei Liu; Joost Van de Weijer
Title	Self-Training for Class-Incremental Semantic Segmentation			Type	Journal Article
Year	2022	Publication	IEEE Transactions on Neural Networks and Learning Systems	Abbreviated Journal	TNNLS
Volume		Issue		Pages
Keywords	Class-incremental learning; Self-training; Semantic segmentation.
Abstract	In class-incremental semantic segmentation, we have no access to the labeled data of previous tasks. Therefore, when incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned knowledge. To address this problem, we propose to apply a self-training approach that leverages unlabeled data, which is used for rehearsal of previous knowledge. Specifically, we first learn a temporary model for the current task, and then pseudo labels for the unlabeled data are computed by fusing information from the old model of the previous task and the current temporary model. In addition, conflict reduction is proposed to resolve the conflicts of pseudo labels generated from both the old and temporary models. We show that maximizing self-entropy can further improve results by smoothing the overconfident predictions. Interestingly, in the experiments, we show that the auxiliary data can be different from the training data and that even general-purpose but diverse auxiliary data can lead to large performance gains. The experiments demonstrate state-of-the-art results: a relative gain of up to 114% on Pascal-VOC 2012 and 8.5% on the more challenging ADE20K compared to previous state-of-the-art methods.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.147; 611.008;			Approved	no
Call Number	Admin @ si @ YLW2022			Serial	3745