Robert Benavente, Ernest Valveny, Jaume Garcia, Agata Lapedriza, Miquel Ferrer, & Gemma Sanchez. (2008). Una experiencia de adaptacion al EEES de las asignaturas de programacion en Ingenieria Informatica.
|
Robert Benavente, Francesc Tous, Ramon Baldrich, & Maria Vanrell. (2002). Statistical Modelling of a Colour Naming Space.
|
Robert Benavente, & Maria Vanrell. (2007). Parametrizacion del Espacio de Categorias de Color.
|
Robert Benavente, & Maria Vanrell. (2004). Fuzzy Colour Naming Based on Sigmoid Membership Functions.
|
Robert Benavente, Ramon Baldrich, M.C. Olive, & Maria Vanrell. (2000). Colour Naming Considering the Colour Variability Problem.
|
Ruben Ballester, Carles Casacuberta, & Sergio Escalera. (2023). Decorrelating neurons using persistence.
Abstract: We propose a novel way to improve the generalisation capacity of deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We provide an extensive set of experiments to validate the effectiveness of our terms, showing that they outperform popular ones. Also, we demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms, suggesting that redundancies play a significant role in artificial neural networks, as evidenced by some studies in neuroscience for real networks. We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms that consider the whole set of neurons and that can be applied to a feedforward architecture in any deep learning task such as classification, data generation, or regression.
|
Ruben Ballester, Xavier Arnal Clemente, Carles Casacuberta, Meysam Madadi, & Ciprian Corneanu. (2022). Towards explaining the generalization gap in neural networks using topological data analysis.
Abstract: Understanding how neural networks generalize on unseen data is crucial for designing more robust and reliable models. In this paper, we study the generalization gap of neural networks using methods from topological data analysis. For this purpose, we compute homological persistence diagrams of weighted graphs constructed from neuron activation correlations after a training phase, aiming to capture patterns that are linked to the generalization capacity of the network. We compare the usefulness of different numerical summaries from persistence diagrams and show that a combination of some of them can accurately predict and partially explain the generalization gap without the need for a test set. Evaluation on two computer vision recognition tasks (CIFAR10 and SVHN) shows competitive generalization gap prediction when compared against state-of-the-art methods.
|
Ruben Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, et al. (2023). Privacy-Aware Document Visual Question Answering.
Abstract: Document Visual Question Answering (DocVQA) is a fast-growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state-of-the-art multi-modal LLMs used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large-scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, a behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through either of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
|
S. Garcia, Dani Rowe, Jordi Gonzalez, & Juan J. Villanueva. (2005). Articulated Object Modelling Using Neural Gas Networks.
|
S. Gonzalez, & A. Martinez. (1997). Fundamentos de la Vision aplicada a la Robotica Autonoma.
|
Saiping Zhang, L. H., Marta Mrak, Marc Gorriz Blanch, Shuai Wan, & Fuzheng Yang. (2022). PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation- and Attention-based Network.
Abstract: In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which can be beneficial for enhancing perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. Then extracted features are fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module which leverages the corresponding QP information. In this way, a single model can be used to enhance adaptively to various QPs without requiring multiple models specific for every QP value, while having similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms.
|
Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, et al. (2023). StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing.
Abstract: A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. Existing methods either finetune the model, or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) Unsatisfying results for selected regions, and unexpected changes in nonselected regions. (2) They require careful text prompt editing where the prompt should include all visual objects in the input image. To address this, we propose two improvements: (1) Only optimizing the input of the value linear network in the cross-attention layers is sufficiently powerful to reconstruct a real image. (2) We propose attention regularization to preserve the object-like attention maps after editing, enabling us to obtain accurate style editing without invoking significant structural changes. We further improve the editing technique which is used for the unconditional branch of classifier-free guidance, as well as the conditional one as used by P2P. Extensive experimental prompt-editing results on a variety of images demonstrate qualitatively and quantitatively that our method has editing capabilities superior to those of existing and concurrent works.
|
Sergio Escalera. (2008). Coding and Decoding Design of ECOCs for Multi-Class Pattern and Object Recognition.
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2006). ECOC-ONE: A novel coding and decoding strategy.
|
Sergio Escalera, Oriol Pujol, & Petia Radeva. (2006). Boosted Landmarks of Contextual Descriptors and Forest-ECOC: a novel framework to detect and classify objects in cluttered scenes.
|