|
Hongxing Gao. (2015). Focused Structural Document Image Retrieval in Digital Mailroom Applications (Josep Llados, Dimosthenis Karatzas, & Marçal Rusiñol, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios, such as searching for full-page matches or retrieving the counterparts of specific document areas, focusing on their structural similarity or letting their visual resemblance play a dominant role. Based on the spatial indexing technique, we propose to search the collection for matches of local key-region pairs carrying both structural and visual information, and we present a scheme for adjusting the relative contribution of structural and visual similarity.
Based on the fact that the structure of documents is tightly linked with the distance among their elements, we first introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, even without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results than state-of-the-art methods while employing far fewer key-regions.
We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram, where the inclusion relationship is encoded. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation. The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves a remarkable improvement over the conventional BoW and spatial pyramidal BoW methods.
To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structural and visual information, from the collection. We introduce spatial indexing techniques to the document retrieval community to speed up the structural relationship computation for key-region pairs. We first test the proposed framework in a full-page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves a notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as specific partial areas of interest in the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that the proposed generic framework obtains nearly perfect precision on both types of focused queries while being the first framework able to tackle structure-focused queries, setting a new state of the art in the field.
In addition, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two-step implementation. We first compute tentative localizations of the query and subsequently employ them to divide the matched key-region pairs into several groups; line verification is then performed within each group while more precise bounding boxes are computed. We demonstrate that, compared with the standard approach (based on RANSAC), the proposed line verification generally achieves much higher recall with a slight loss in precision on specific queries.
|
|
|
Wenjuan Gong. (2013). 3D Motion Data aided Human Action Recognition and Pose Estimation (Jordi Gonzalez, & Xavier Roca, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: In this work, we explore human action recognition and pose estimation problems. Unlike traditional works that learn from 2D images or video sequences and their annotated output, we seek to solve the problems with additional 3D motion capture information, which helps to fill the gap between 2D image features and human interpretations.
We first compare two different schools of approaches commonly used for 3D pose estimation from 2D pose configuration: modeling and learning methods. Based on the experimental results and the requirements of our problems, we settled on a learning method for pose estimation in the subsequent approaches. We then establish a framework by adding a module that detects 2D pose configuration from images with varied backgrounds, which widely extends the applicability of the approach. We also seek to directly estimate 3D poses from image features, instead of estimating 2D poses as an intermediate module. We explore a robust input feature which, combined with the proposed distance measure, provides a solution for noisy or corrupted inputs. We further utilize the above method to estimate weak poses, which are a concise representation of the original poses obtained through dimensionality reduction techniques, from image features. The weak pose space is where we calculate the vocabulary and label action types using a bag of words pipeline. Temporal information of an action is taken into consideration by treating several consecutive frames as a single unit for computing vocabulary and histogram assignments.
|
|
|
Cristhian A. Aguilera-Carrasco, C. Aguilera, & Angel Sappa. (2018). Melamine Faced Panels Defect Classification beyond the Visible Spectrum. SENS - Sensors, 18(11), 1–10.
Abstract: In this work, we explore the use of images from different spectral bands to classify defects in melamine faced panels, which can appear during the production process. We experimentally evaluate the use of images from the visible (VS), near-infrared (NIR), and long-wavelength infrared (LWIR) bands to classify the defects using a feature descriptor learning approach together with a support vector machine classifier. Two descriptors were evaluated, Extended Local Binary Patterns (E-LBP) and SURF, using a Bag of Words (BoW) representation. The evaluation was carried out on an image set obtained during this work, which contained five different defect categories that currently occur in the industry. Results show that using images from beyond the visible spectrum helps to improve classification performance compared with a solution based on the visible spectrum alone.
Keywords: industrial application; infrared; machine learning
|
|
|
Valeriya Khan, Sebastian Cygert, Bartlomiej Twardowski, & Tomasz Trzcinski. (2023). Looking Through the Past: Better Knowledge Retention for Generative Replay in Continual Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (pp. 3496–3500).
Abstract: In this work, we improve generative replay in a continual learning setting. We notice that in VAE-based generative replay, the generated features are quite far from the original ones when mapped to the latent space. Therefore, we propose modifications that allow the model to learn and generate complex data. More specifically, we incorporate distillation in latent space between the current and previous models to reduce feature drift. Additionally, latent matching between the reconstruction and the original data is proposed to improve the alignment of generated features. Further, based on the observation that the reconstructions are better for preserving knowledge, we add the cycling of generations through the previously trained model to make them closer to the original data. Our method outperforms other generative replay methods in various scenarios.
|
|
|
Santi Puch, Irina Sanchez, Aura Hernandez-Sabate, Gemma Piella, & Vesna Prckovska. (2018). Global Planar Convolutions for Improved Context Aggregation in Brain Tumor Segmentation. In International MICCAI Brainlesion Workshop (Vol. 11384, pp. 393–405). LNCS.
Abstract: In this work, we introduce the Global Planar Convolution (GPC) module as a building block for fully-convolutional networks that aggregates global information and, therefore, enhances the context perception capabilities of segmentation networks in the context of brain tumor segmentation. We implement two baseline architectures (3D UNet and a residual version of 3D UNet, ResUNet) and present a novel architecture based on these two architectures, ContextNet, that includes the proposed Global Planar Convolution module. We show that the addition of such a module eliminates the need to build networks with several representation levels, which tend to be over-parametrized and to exhibit slow rates of convergence. Furthermore, we provide a visual demonstration of the behavior of GPC modules via visualization of intermediate representations. Finally, we participate in the 2018 edition of the BraTS challenge with our best performing models, which are based on ContextNet, and report the evaluation scores on the validation and the test sets of the challenge.
Keywords: Brain tumors; 3D fully-convolutional CNN; Magnetic resonance imaging; Global planar convolution
|
|
|
Filip Szatkowski, Mateusz Pyla, Marcin Przewięzlikowski, Sebastian Cygert, Bartłomiej Twardowski, & Tomasz Trzcinski. (2023). Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-Free Continual Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (pp. 3512–3517).
Abstract: In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy, aiming to prevent forgetting. KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks. Our analysis reveals that this issue originates from substantial representation shifts in the teacher network when dealing with out-of-distribution data. This causes large errors in the KD loss component, leading to performance degradation in CIL. Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that concurrently updates the teacher and the main model during incremental training. Our method seamlessly integrates with KD-based CIL approaches and allows for consistent enhancement of their performance across multiple exemplar-free CIL benchmarks.
|
|
|
Pierluigi Casale, Oriol Pujol, Petia Radeva, & Jordi Vitria. (2009). A First Approach to Activity Recognition Using Topic Models. In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 74–82).
Abstract: In this work, we present a first approach to activity pattern discovery by means of topic models. Using motion data collected with a wearable device we prototyped, TheBadge, we analyse raw accelerometer data using Latent Dirichlet Allocation (LDA), a particular instantiation of topic models. Results show that for particular values of the parameters necessary for applying LDA to a continuous dataset, good accuracies in activity classification can be achieved.
|
|
|
Jaume Garcia, Debora Gil, Joel Barajas, Francesc Carreras, Sandra Pujades, & Petia Radeva. (2006). Characterization of ventricular torsion in healthy subjects using Gabor filters and a variational framework. In Proc. Computers in Cardiology (pp. 877–880).
Abstract: In this work, we present a fully automated method for tissue deformation estimation in tagged magnetic resonance images (TMRI). Gabor filter banks, tuned independently for each left ventricle level, provide optimally filtered complex images whose phase remains constant along the cardiac cycle. This fact can be thought of as the brightness constancy condition required by classical optical flow (OF) methods. Pairs of these filtered sequences, together with a variational formulation, are used in a second step to obtain dense continuous deformation maps that we call Harmonic Phase Flow. This method has been used to determine reference values of ventricular torsion (VT) in a set of 8 healthy volunteers. The results encourage the use of VT as a useful parameter for ventricular function assessment in clinical routine.
|
|
|
Oscar Camara, Estanislao Oubel, Gemma Piella, Simone Balocco, Mathieu De Craene, & Alejandro F. Frangi. (2009). Multi-sequence Registration of Cine, Tagged and Delay-Enhancement MRI with Shift Correction and Steerable Pyramid-Based Detagging. In 5th International Conference on Functional Imaging and Modeling of the Heart (Vol. 5528, pp. 330–338). LNCS. Springer Berlin Heidelberg.
Abstract: In this work, we present a registration framework for cardiac cine MRI (cMRI), tagged MRI (tMRI) and delay-enhancement MRI (deMRI), in which the two main obstacles to an accurate alignment between these images have been taken into account: the presence of tags in tMRI and respiration artifacts in all sequences. A steerable pyramid image decomposition has been used for detagging purposes since it is suitable for extracting high-order oriented structures by directional adaptive filtering. Shift correction of cMRI is achieved by first maximizing the similarity between the Long Axis and Short Axis cMRI. Subsequently, these shift-corrected images are used as target images in a rigid registration procedure with their corresponding tMRI/deMRI in order to correct their shift. The proposed registration framework has been evaluated in 840 registration tests, considerably improving the alignment of the MR images (mean RMS error of 2.04 mm vs. 5.44 mm).
|
|
|
Rada Deeb, Joost Van de Weijer, Damien Muselet, Mathieu Hebert, & Alain Tremeau. (2019). Deep spectral reflectance and illuminant estimation from self-interreflections. JOSA A - Journal of the Optical Society of America A, 31(1), 105–114.
Abstract: In this work, we propose a convolutional neural network based approach to estimate the spectral reflectance of a surface and spectral power distribution of light from a single RGB image of a V-shaped surface. Interreflections happening in a concave surface lead to gradients of RGB values over its area. These gradients carry a lot of information concerning the physical properties of the surface and the illuminant. Our network is trained with only simulated data constructed using a physics-based interreflection model. Coupling interreflection effects with deep learning helps to retrieve the spectral reflectance under an unknown light and to estimate spectral power distribution of this light as well. In addition, it is more robust to the presence of image noise than classical approaches. Our results show that the proposed approach outperforms state-of-the-art learning-based approaches on simulated data. In addition, it gives better results on real data compared to other interreflection-based approaches.
|
|
|
Mariella Dimiccoli, Jean-Pascal Jacob, & Lionel Moisan. (2016). Particle detection and tracking in fluorescence time-lapse imaging: a contrario approach. MVAP - Journal of Machine Vision and Applications, 27, 511–527.
Abstract: In this work, we propose a probabilistic approach for the detection and tracking of particles in biological images. In the presence of very noisy, poor-quality data, particles and trajectories can be characterized by an a-contrario model, which estimates the probability of observing the structures of interest in random data. This approach, first introduced in the modeling of human visual perception and then successfully applied in many image processing tasks, leads to algorithms that require neither a prior learning stage nor tedious parameter tuning, and that are very robust to noise. Comparative evaluations against a well-established baseline show that the proposed approach outperforms the state of the art.
Keywords: particle detection; particle tracking; a-contrario approach; time-lapse fluorescence imaging
|
|
|
Josep Famadas, Meysam Madadi, Cristina Palmero, & Sergio Escalera. (2020). Generative Video Face Reenactment by AUs and Gaze Regularization. In 15th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 444–451).
Abstract: In this work, we propose an encoder-decoder-like architecture to perform face reenactment in image sequences. Our goal is to transfer the training subject identity to a given test subject. We regularize face reenactment by facial action unit intensity and 3D gaze vector regression. In this way, we force the network to transfer subtle facial expressions and eye dynamics, providing a more lifelike result. The proposed encoder-decoder receives as input the previous sequence frame stacked with the current frame image of facial landmarks. Thus, the generated frames benefit from appearance and geometry, while keeping temporal coherence for the generated sequence. At the test stage, a new target subject with the facial performance of the source subject and the appearance of the training subject is reenacted. Principal component analysis is applied to project the test subject geometry to the closest training subject geometry before reenactment. Evaluation of our proposal shows faster convergence, and more accurate and realistic results in comparison to other architectures without action units and gaze regularization.
|
|
|
Yasuko Sugito, Trevor Canham, Javier Vazquez, & Marcelo Bertalmio. (2021). A Study of Objective Quality Metrics for HLG-Based HDR/WCG Image Coding. SMPTE - SMPTE Motion Imaging Journal, 53–65.
Abstract: In this work, we study the suitability of high dynamic range, wide color gamut (HDR/WCG) objective quality metrics to assess the perceived deterioration of compressed images encoded using the hybrid log-gamma (HLG) method, which is the standard for HDR television. Several image quality metrics have been developed to deal specifically with HDR content, although in previous work we showed that the best results (i.e., better matches to the opinion of human expert observers) are obtained by an HDR metric that consists simply of applying a given standard dynamic range metric, called visual information fidelity (VIF), directly to HLG-encoded images. However, all these HDR metrics ignore the chroma components in their calculations, that is, they consider only the luminance channel. For this reason, in the current work, we conduct subjective evaluation experiments in a professional setting using compressed HDR/WCG images encoded with HLG and analyze the ability of the best HDR metric to detect perceivable distortions in the chroma components, as well as the suitability of popular color metrics (including ΔITPR, which supports parameters for HLG) to correlate with the opinion scores. Our first contribution is to show that there is a need to consider the chroma components in HDR metrics, as there are color distortions that subjects perceive but that the best HDR metric fails to detect. Our second contribution is the surprising result that VIF, which utilizes only the luminance channel, correlates much better with the subjective evaluation scores than the metrics investigated that do consider the color components.
|
|
|
Enric Marti, Ferran Poveda, Antoni Gurgui, & Debora Gil. (2011). Aprendizaje Basado en Proyectos en Ingeniería Informática. Resultados y reflexiones de seis años de experiencia.
Abstract: In this workshop, six years of experience with Project Based Learning (PBL) in Computer Graphics, a Computer Engineering course at the Autonomous University of Barcelona (UAB), is presented. We use a Moodle environment suited to managing the documentation generated in PBL. The course is organized by means of two alternative routes: a classic itinerary of lectures and test-based evaluation, and another with PBL. For the PBL itinerary, we explain the organization into teams, homework tutoring and monitoring, and the evaluation guidelines for students. We provide some of the work done by students, and the results of assessment surveys completed by students during these years. We report the evolution of our PBL itinerary in terms of both organization and student surveys.
The workshop aims to discuss the advantages and disadvantages of using these active methodologies in technical degrees such as computer engineering, in order to debate the most suitable way of organizing PBL and assessing students' learning.
|
|
|
C. Alejandro Parraga, Robert Benavente, & Maria Vanrell. (2010). Towards a general model of colour categorization which considers context. PER - Perception. ECVP Abstract Supplement, 39, 86.
Abstract: In two previous experiments [Parraga et al, 2009 J. of Im. Sci. and Tech 53(3) 031106; Benavente et al, 2009 Perception 38 ECVP Supplement, 36] the boundaries of basic colour categories were measured.
In the first experiment, samples were presented in isolation (ie on a dark background) and boundaries were measured using a yes/no paradigm. In the second, subjects adjusted the chromaticity of a sample presented on a random Mondrian background to find the boundary between pairs of adjacent colours.
Results from these experiments showed significant differences, but it was not possible to conclude whether this discrepancy was due to the absence/presence of a colourful background or to the differences in the paradigms used. In this work, we settle this question by repeating the first experiment (ie samples presented on a dark background) using the second paradigm. A comparison of results shows that although boundary locations are very similar, boundaries measured in context are significantly different (more diffuse) than those measured in isolation (confirmed by a Student’s t-test analysis of the statistical distributions of the subjects’ answers). In addition, we completed the mapping of colour name space by measuring the boundaries between chromatic colours and the achromatic centre. With these results we completed our parametric fuzzy-sets model of colour naming space.
|
|