Publicacions CVC -- Query Results

[211–220] << 221 222 223 224 225 226 227 228 >>

Details

Records
Author	Pau Torras; Arnau Baro; Alicia Fornes; Lei Kang
Title	Improving Handwritten Music Recognition through Language Model Integration			Type	Conference Article
Year	2022	Publication	4th International Workshop on Reading Music Systems (WoRMS2022)	Abbreviated Journal
Volume		Issue		Pages	42-46
Keywords	optical music recognition; historical sources; diversity; music theory; digital humanities
Abstract	Handwritten Music Recognition, especially in the historical domain, is an inherently challenging endeavour; paper degradation artefacts and the ambiguous nature of handwriting make recognising such scores an error-prone process, even for the current state-of-the-art Sequence to Sequence models. In this work we propose a way of reducing the production of statistically implausible output sequences by fusing a Language Model into a recognition Sequence to Sequence model. The idea is leveraging visually-conditioned and context-conditioned output distributions in order to automatically find and correct any mistakes that would otherwise break context significantly. We have found this approach to improve recognition results to 25.15 SER (%) from a previous best of 31.79 SER (%) in the literature.
Address	November 18, 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	WoRMS
Notes	DAG; 600.121; 600.162; 602.230			Approved	no
Call Number	Admin @ si @ TBF2022			Serial	3735
Permanent link to this record



Author	Arnau Baro
Title	Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods			Type	Book Whole
Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	The transcription of sheet music into some machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process very time-consuming and prone to errors. Consequently, automatic transcription systems for musical documents represent interesting tools. Document analysis is the subject that deals with the extraction and processing of documents through image and pattern recognition. It is a branch of computer vision. Taking music scores as source, the field devoted to address this task is known as Optical Music Recognition (OMR). Typically, an OMR system takes an image of a music score and automatically extracts its content into some symbolic structure such as MEI or MusicXML. In this dissertation, we have investigated different methods for recognizing a single staff section (e.g. scores for violin, flute, etc.), much in the same way as most text recognition research focuses on recognizing words appearing in a given line image. These methods are based in two different methodologies. On the one hand, we present two methods based on Recurrent Neural Networks, in particular, the Long Short-Term Memory Neural Network. On the other hand, a method based on Sequence to Sequence models is detailed. Music context is needed to improve the OMR results, just like language models and dictionaries help in handwriting recognition. For example, syntactical rules and grammars could be easily defined to cope with the ambiguities in the rhythm. In music theory, for example, the time signature defines the amount of beats per bar unit. Thus, in the second part of this dissertation, different methodologies have been investigated to improve the OMR recognition. We have explored three different methods: (a) a graphic tree-structure representation, Dendrograms, that joins, at each level, its primitives following a set of rules, (b) the incorporation of Language Models to model the probability of a sequence of tokens, and (c) graph neural networks to analyze the music scores to avoid meaningless relationships between music primitives. Finally, to train all these methodologies, and given the method-specificity of the datasets in the literature, we have created four different music datasets. Two of them are synthetic with a modern or old handwritten appearance, whereas the other two are real handwritten scores, being one of them modern and the other old.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	IMPRIMA	Place of Publication		Editor	Alicia Fornes
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-124793-8-6	Medium
Area		Expedition		Conference
Notes	DAG;			Approved	no
Call Number	Admin @ si @ Bar2022			Serial	3754
Permanent link to this record



Author	Ali Furkan Biten
Title	A Bitter-Sweet Symphony on Vision and Language: Bias and World Knowledge			Type	Book Whole
Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Vision and Language are broadly regarded as cornerstones of intelligence. Even though language and vision have different aims – language having the purpose of communication, transmission of information and vision having the purpose of constructing mental representations around us to navigate and interact with objects – they cooperate and depend on one another in many tasks we perform effortlessly. This reliance is actively being studied in various Computer Vision tasks, e.g. image captioning, visual question answering, image-sentence retrieval, phrase grounding, just to name a few. All of these tasks share the inherent difficulty of the aligning the two modalities, while being robust to language priors and various biases existing in the datasets. One of the ultimate goal for vision and language research is to be able to inject world knowledge while getting rid of the biases that come with the datasets. In this thesis, we mainly focus on two vision and language tasks, namely Image Captioning and Scene-Text Visual Question Answering (STVQA). In both domains, we start by defining a new task that requires the utilization of world knowledge and in both tasks, we find that the models commonly employed are prone to biases that exist in the data. Concretely, we introduce new tasks and discover several problems that impede performance at each level and provide remedies or possible solutions in each chapter: i) We define a new task to move beyond Image Captioning to Image Interpretation that can utilize Named Entities in the form of world knowledge. ii) We study the object hallucination problem in classic Image Captioning systems and develop an architecture-agnostic solution. iii) We define a sub-task of Visual Question Answering that requires reading the text in the image (STVQA), where we highlight the limitations of current models. iv) We propose an architecture for the STVQA task that can point to the answer in the image and show how to combine it with classic VQA models. v) We show how far language can get us in STVQA and discover yet another bias which causes the models to disregard the image while doing Visual Question Answering.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	IMPRIMA	Place of Publication		Editor	Dimosthenis Karatzas;Lluis Gomez
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-124793-5-5	Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ Bit2022			Serial	3755
Permanent link to this record



Author	Andres Mafla
Title	Leveraging Scene Text Information for Image Interpretation			Type	Book Whole
Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Until recently, most computer vision models remained illiterate, largely ignoring the semantically rich and explicit information contained in scene text. Recent progress in scene text detection and recognition has recently allowed exploring its role in a diverse set of open computer vision problems, e.g. image classification, image-text retrieval, image captioning, and visual question answering to name a few. The explicit semantics of scene text closely requires specific modeling similar to language. However, scene text is a particular signal that has to be interpreted according to a comprehensive perspective that encapsulates all the visual cues in an image. Incorporating this information is a straightforward task for humans, but if we are unfamiliar with a language or scripture, achieving a complete world understanding is impossible (e.a. visiting a foreign country with a different alphabet). Despite the importance of scene text, modeling it requires considering the several ways in which scene text interacts with an image, processing and fusing an additional modality. In this thesis, we mainly focus on two tasks, scene text-based fine-grained image classification, and cross-modal retrieval. In both studied tasks we identify existing limitations in current approaches and propose plausible solutions. Concretely, in each chapter: i) We define a compact way to embed scene text that generalizes to unseen words at training time while performing in real-time. ii) We incorporate the previously learned scene text embedding to create an image-level descriptor that overcomes optical character recognition (OCR) errors which is well-suited to the fine-grained image classification task. iii) We design a region-level reasoning network that learns the interaction through semantics among salient visual regions and scene text instances. iv) We employ scene text information in image-text matching and introduce the Scene Text Aware Cross-Modal retrieval StacMR task. We gather a dataset that incorporates scene text and design a model suited for the newly studied modality. v) We identify the drawbacks of current retrieval metrics in cross-modal retrieval. An image captioning metric is proposed as a way of better evaluating semantics in retrieved results. Ample experimentation shows that incorporating such semantics into a model yields better semantic results while requiring significantly less data to converge.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	IMPRIMA	Place of Publication		Editor	Dimosthenis Karatzas;Lluis Gomez
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-124793-6-2	Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ Maf2022			Serial	3756
Permanent link to this record



Author	Mohamed Ali Souibgui
Title	Document Image Enhancement and Recognition in Low Resource Scenarios: Application to Ciphers and Handwritten Text			Type	Book Whole
Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this thesis, we propose different contributions with the goal of enhancing and recognizing historical handwritten document images, especially the ones with rare scripts, such as cipher documents. In the first part, some effective end-to-end models for Document Image Enhancement (DIE) using deep learning models were presented. First, Generative Adversarial Networks (cGAN) for different tasks (document clean-up, binarization, deblurring, and watermark removal) were explored. Next, we further improve the results by recovering the degraded document images into a clean and readable form by integrating a text recognizer into the cGAN model to promote the generated document image to be more readable. Afterward, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The second part of the thesis addresses Handwritten Text Recognition (HTR) in low resource scenarios, i.e. when only few labeled training data is available. We propose novel methods for recognizing ciphers with rare scripts. First, a few-shot object detection based method was proposed. Then, we incorporate a progressive learning strategy that automatically assignspseudo-labels to a set of unlabeled data to reduce the human labor of annotating few pages while maintaining the good performance of the model. Secondly, a data generation technique based on Bayesian Program Learning (BPL) is proposed to overcome the lack of data in such rare scripts. Thirdly, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE). This latter self-supervised model is designed to tackle two tasks, text recognition and document image enhancement. The proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time, it requires substantially fewer data samples to converge. In the third part of the thesis, we analyze, from the user perspective, the usage of HTR systems in low resource scenarios. This contrasts with the usual research on HTR, which often focuses on technical aspects only and rarely devotes efforts on implementing software tools for scholars in Humanities.
Address
Corporate Author				Thesis	Ph.D. thesis
Publisher	IMPRIMA	Place of Publication		Editor	Alicia Fornes;Yousri Kessentini
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN	978-84-124793-8-6	Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ Sou2022			Serial	3757
Permanent link to this record



Author	Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
Title	Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation			Type	Conference Article
Year	2022	Publication	36th Conference on Neural Information Processing Systems	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method.
Address	Virtual; November 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	NEURIPS
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ YWW2022a			Serial	3792
Permanent link to this record



Author	Hector Laria Mantecon; Yaxing Wang; Joost Van de Weijer; Bogdan Raducanu
Title	Transferring Unconditional to Conditional GANs With Hyper-Modulation			Type	Conference Article
Year	2022	Publication	IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	GANs have matured in recent years and are able to generate high-resolution, realistic images. However, the computational resources and the data required for the training of high-quality GANs are enormous, and the study of transfer learning of these models is therefore an urgent topic. Many of the available high-quality pretrained GANs are unconditional (like StyleGAN). For many applications, however, conditional GANs are preferable, because they provide more control over the generation process, despite often suffering more training difficulties. Therefore, in this paper, we focus on transferring from high-quality pretrained unconditional GANs to conditional GANs. This requires architectural adaptation of the pretrained GAN to perform the conditioning. To this end, we propose hyper-modulated generative networks that allow for shared and complementary supervision. To prevent the additional weights of the hypernetwork to overfit, with subsequent mode collapse on small target domains, we introduce a self-initialization procedure that does not require any real data to initialize the hypernetwork parameters. To further improve the sample efficiency of the transfer, we apply contrastive learning in the discriminator, which effectively works on very limited batch sizes. In extensive experiments, we validate the efficiency of the hypernetworks, self-initialization and contrastive loss for knowledge transfer on standard benchmarks.
Address	New Orleans; USA; June 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	CVPRW
Notes	LAMP; 600.147; 602.200			Approved	no
Call Number	LWW2022a			Serial	3785
Permanent link to this record



Author	Yaxing Wang; Joost Van de Weijer; Lu Yu; Shangling Jui
Title	Distilling GANs with Style-Mixed Triplets for X2I Translation with Limited Data			Type	Conference Article
Year	2022	Publication	10th International Conference on Learning Representations	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Conditional image synthesis is an integral part of many X2I translation systems, including image-to-image, text-to-image and audio-to-image translation systems. Training these large systems generally requires huge amounts of training data. Therefore, we investigate knowledge distillation to transfer knowledge from a high-quality unconditioned generative model (e.g., StyleGAN) to a conditioned synthetic image generation modules in a variety of systems. To initialize the conditional and reference branch (from a unconditional GAN) we exploit the style mixing characteristics of high-quality GANs to generate an infinite supply of style-mixed triplets to perform the knowledge distillation. Extensive experimental results in a number of image generation tasks (i.e., image-to-image, semantic segmentation-to-image, text-to-image and audio-to-image) demonstrate qualitatively and quantitatively that our method successfully transfers knowledge to the synthetic image generation modules, resulting in more realistic images than previous methods as confirmed by a significant drop in the FID.
Address	Virtual
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	ICLR
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ WWY2022			Serial	3791
Permanent link to this record



Author	Kai Wang; Fei Yang; Joost Van de Weijer
Title	Attention Distillation: self-supervised vision transformer students need more guidance			Type	Conference Article
Year	2022	Publication	33rd British Machine Vision Conference	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Self-supervised learning has been widely applied to train high-quality vision transformers. Unleashing their excellent performance on memory and compute constraint devices is therefore an important research topic. However, how to distill knowledge from one self-supervised ViT to another has not yet been explored. Moreover, the existing self-supervised knowledge distillation (SSKD) methods focus on ConvNet based architectures are suboptimal for ViT knowledge distillation. In this paper, we study knowledge distillation of self-supervised vision transformers (ViT-SSKD). We show that directly distilling information from the crucial attention mechanism from teacher to student can significantly narrow the performance gap between both. In experiments on ImageNet-Subset and ImageNet-1K, we show that our method AttnDistill outperforms existing self-supervised knowledge distillation (SSKD) methods and achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods learning from scratch (with the ViT-S model). We are also the first to apply the tiny ViT-T model on self-supervised learning. Moreover, AttnDistill is independent of self-supervised learning algorithms, it can be adapted to ViT based SSL methods to improve the performance in future research.
Address	London; UK; November 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	BMVC
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ WYW2022			Serial	3793
Permanent link to this record



Author	Kai Wang; Chenshen Wu; Andrew Bagdanov; Xialei Liu; Shiqi Yang; Shangling Jui; Joost Van de Weijer
Title	Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification			Type	Conference Article
Year	2022	Publication	33rd British Machine Vision Conference	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Lifelong object re-identification incrementally learns from a stream of re-identification tasks. The objective is to learn a representation that can be applied to all tasks and that generalizes to previously unseen re-identification tasks. The main challenge is that at inference time the representation must generalize to previously unseen identities. To address this problem, we apply continual meta metric learning to lifelong object re-identification. To prevent forgetting of previous tasks, we use knowledge distillation and explore the roles of positive and negative pairs. Based on our observation that the distillation and metric losses are antagonistic, we propose to remove positive pairs from distillation to robustify model updates. Our method, called Distillation without Positive Pairs (DwoPP), is evaluated on extensive intra-domain experiments on person and vehicle re-identification datasets, as well as inter-domain experiments on the LReID benchmark. Our experiments demonstrate that DwoPP significantly outperforms the state-of-the-art.
Address	London; UK; November 2022
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference	BMVC
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ WWB2022			Serial	3794
Permanent link to this record



Author	Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez
Title	Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules – Intermediate Results of the RadioLung Project			Type	Journal Article
Year	2023	Publication	International Journal of Computer Assisted Radiology and Surgery	Abbreviated Journal	IJCARS
Volume		Issue		Pages
Keywords
Abstract
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	IAM			Approved	no
Call Number	Admin @ si @ TGM2023			Serial	3830
Permanent link to this record



Author	Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
Title	Local Prediction Aggregation: A Frustratingly Easy Source-free Domain Adaptation Method			Type	Miscellaneous
Year	2022	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method. Code is available in this https URL.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; 600.147			Approved	no
Call Number	Admin @ si @ YWW2022b			Serial	3815
Permanent link to this record



Author	Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
Title	One Ring to Bring Them All: Towards Open-Set Recognition under Domain Shift			Type	Miscellaneous
Year	2022	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper, we investigate model adaptation under domain and category shift, where the final goal is to achieve (SF-UNDA), which addresses the situation where there exist both domain and category shifts between source and target domains. Under the SF-UNDA setting, the model cannot access source data anymore during target adaptation, which aims to address data privacy concerns. We propose a novel training scheme to learn a ( +1)-way classifier to predict the source classes and the unknown class, where samples of only known source categories are available for training. Furthermore, for target adaptation, we simply adopt a weighted entropy minimization to adapt the source pretrained model to the unlabeled target domain without source data. In experiments, we show: After source training, the resulting source model can get excellent performance for ; After target adaptation, our method surpasses current UNDA approaches which demand source data during adaptation. The versatility to several different tasks strongly proves the efficacy and generalization ability of our method. When augmented with a closed-set domain adaptation approach during target adaptation, our source-free method further outperforms the current state-of-the-art UNDA method by 2.5%, 7.2% and 13% on Office-31, Office-Home and VisDA respectively.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	LAMP; no proj			Approved	no
Call Number	Admin @ si @ YWW2022c			Serial	3818
Permanent link to this record



Author	Saiping Zhang, Luis Herranz, Marta Mrak, Marc Gorriz Blanch, Shuai Wan, Fuzheng Yang
Title	PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation-and Attention-based Network			Type	Miscellaneous
Year	2022	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for enhancing perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. Then extracted features are fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module which leverages the corresponding QP information. In this way, a single model can be used to enhance adaptively to various QPs without requiring multiple models specific for every QP value, while having similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	MACO; no proj			Approved	no
Call Number	Admin @ si @ ZHM2022b			Serial	3819
Permanent link to this record



Author	Ruben Ballester; Xavier Arnal Clemente; Carles Casacuberta; Meysam Madadi; Ciprian Corneanu
Title	Towards explaining the generalization gap in neural networks using topological data analysis			Type	Miscellaneous
Year	2022	Publication	Arxiv	Abbreviated Journal
Volume		Issue		Pages
Keywords
Abstract	Understanding how neural networks generalize on unseen data is crucial for designing more robust and reliable models. In this paper, we study the generalization gap of neural networks using methods from topological data analysis. For this purpose, we compute homological persistence diagrams of weighted graphs constructed from neuron activation correlations after a training phase, aiming to capture patterns that are linked to the generalization capacity of the network. We compare the usefulness of different numerical summaries from persistence diagrams and show that a combination of some of them can accurately predict and partially explain the generalization gap without the need of a test set. Evaluation on two computer vision recognition tasks (CIFAR10 and SVHN) shows competitive generalization gap prediction when compared against state-of-the-art methods.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN		ISBN		Medium
Area		Expedition		Conference
Notes	HUPBA; no menciona			Approved	no
Call Number	Admin @ si @ BAC2022			Serial	3821
Permanent link to this record