|
Jose Antonio Rodriguez. (2006). Pen-based Interfaces and Recognition: Application to Proofreading Interpretation.
|
|
|
David Aldavert. (2006). Visual Simultaneous Localization and Mapping.
|
|
|
David Geronimo. (2006). Model Features and Horizon Line Estimation for Pedestrian Detection in Advanced Driver Assistance Systems. Master's thesis.
|
|
|
Jorge Bernal. (2009). Use of Projection and Back-projection Methods in Bidimensional Computed Tomography Image Reconstruction (Vol. 141). Master's thesis, Barcelona, Spain.
Abstract: One of the biggest drawbacks of CT scanners is their cost, in both memory and time. This project studies several methods that simulate their operation in a more feasible way from an industrial point of view.
The main group of techniques used is known as 'back-projection'. The underlying idea is to simulate the X-ray emission of a CT scan with lines that cross the image to be reconstructed.
In the first part of this document, Euclidean geometry is used to address the tasks of projection and back-projection. Analysis of the results shows that this approach does not yield a perfect reconstruction, and it also has problems with running time and memory cost. For this reason, the second part of the document introduces the 'Filtered Back-projection' method to improve the results.
Filtered Back-projection methods rely on mathematical transforms (Fourier, Radon) to provide more accurate results in much less time. The main reason for the improvement is a filtering step applied before back-projection, which avoids errors caused by high frequencies.
As a result of this project, two implementations (one for each approach) have been developed in order to compare their performance.
Keywords: Projection, Back-projection, CT scan, Euclidean geometry, Radon transform
|
|
|
J.R. Serra, & J.B. Subirana. (1996). Extraccion de estructuras interesantes en imagenes.
|
|
|
Patricia Marquez. (2010). Conditions Ensuring Accuracy of Local Optical Flow Schemes (Vol. 157). Master's thesis, Bellaterra 08193, Barcelona, Spain.
Abstract: Accurate computation of optical flow is a key point in many image processing fields: detection of anomalous and unpredicted agents (such as pedestrians, bikers or cars) in urban scenes, or pathology discrimination in medical imaging sequences, to mention just two. These kinds of sequences present two main difficulties for standard optical flow techniques. On the one hand, variability in acquisition conditions (illuminance, medical imaging modality, ...) forces an alternative representation for images fulfilling the brightness constancy constraint. On the other hand, current variational schemes produce oversmoothed fields unable to properly model discontinuous behaviours such as collisions or functionless pathological areas. This master project explores the abilities and limitations of local and global optical flow approaches. The master student will put special emphasis on the underlying theoretical grounds in order to design a variational framework combining the theoretical advantages of the considered techniques. In particular, an optical flow method based on Gabor phase tracking (developed in the group for medical imaging) will be generalized to urban scenes.
|
|
|
Dani Rowe. (2008). Towards Robust Multiple-Target Tracking in Unconstrained Human-Populated Environments.
|
|
|
Carme Julia. (2008). Missing Data Matrix Factorization Addressing the Structure from Motion Problem.
|
|
|
Jaime Lopez-Krahe, Josep Llados, & Enric Marti. (2000). Architectural Floor Plan Analysis (Robert B. Fisher, Ed.). University of Edinburgh.
|
|
|
Bojana Gajic, & Ramon Baldrich. (2018). Cross-domain fashion image retrieval. In CVPR 2018 Workshop on Women in Computer Vision (WiCV 2018, 4th Edition) (pp. 19500–19502).
Abstract: Cross-domain image retrieval is a challenging task that implies matching images from one domain to their pairs from another domain. In this paper we focus on fashion image retrieval, which involves matching an image of a fashion item taken by users to the images of the same item taken in controlled conditions, usually by a professional photographer. When facing this problem, we have different products at train and test time, and we use triplet loss to train the network. We stress the importance of proper training of a simple architecture, as well as adapting general models to the specific task.
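As an illustration of the triplet loss mentioned in the abstract above, the following is a minimal sketch of the standard hinge formulation on plain Python lists. The Euclidean distance, the margin value and the function names are assumptions made for this example; the abstract does not specify the exact variant or the network used in the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge triplet loss (margin value is an assumption).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: a user photo (anchor), the shop photo of
# the same item (positive) and the shop photo of another item (negative).
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
negative = [1.0, 0.0]

satisfied = triplet_loss(anchor, positive, negative)  # triplet already well ordered
violated = triplet_loss(anchor, negative, positive)   # roles swapped, ordering violated
print(satisfied, violated)
```

In a retrieval setting, a loss of this kind pulls images of the same product together across domains and pushes different products apart, which is why it can be trained even when the test products are never seen during training.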
|
|
|
Zhaocheng Liu, Luis Herranz, Fei Yang, Saiping Zhang, Shuai Wan, Marta Mrak, et al. (2022). Slimmable Video Codec. In CVPR 2022 Workshop and Challenge on Learned Image Compression (CLIC 2022, 5th Edition) (pp. 1742–1746).
Abstract: Neural video compression has emerged as a novel paradigm combining trainable multilayer neural networks and machine learning, achieving competitive rate-distortion (RD) performances, but still remaining impractical due to heavy neural architectures, with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dynamically adjust their model capacity to gracefully reduce the memory and computation requirements, without harming RD performance. In this paper we propose a slimmable video codec (SlimVC), by integrating a slimmable temporal entropy model in a slimmable autoencoder. Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency, all being important requirements for practical video compression.
|
|
|
Kai Wang, Xialei Liu, Andrew Bagdanov, Luis Herranz, Shangling Jui, & Joost Van de Weijer. (2022). Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition. In CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition) (pp. 3728–3738).
Abstract: In this paper we consider the problem of incremental meta-learning in which classes are presented incrementally in discrete tasks. We propose Episodic Replay Distillation (ERD), which mixes classes from the current task with exemplars from previous tasks when sampling episodes for meta-learning. To allow the training to benefit from as large a variety of classes as possible, which leads to more generalizable feature representations, we propose the cross-task meta loss. Furthermore, we propose episodic replay distillation that also exploits exemplars for improved knowledge distillation. Experiments on four datasets demonstrate that ERD surpasses the state-of-the-art. In particular, on the more challenging one-shot, long task sequence scenarios, we reduce the gap between Incremental Meta-Learning and the joint-training upper bound from 3.5% / 10.1% / 13.4% / 11.7% with the current state-of-the-art to 2.6% / 2.9% / 5.0% / 0.2% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100 / CUB, respectively.
Keywords: Training; Computer vision; Image recognition; Upper bound; Conferences; Pattern recognition; Task analysis
|
|
|
Alex Gomez-Villa, Bartlomiej Twardowski, Lu Yu, Andrew Bagdanov, & Joost Van de Weijer. (2022). Continually Learning Self-Supervised Representations With Projected Functional Regularization. In CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition) (pp. 3866–3876).
Abstract: Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally – they are, in fact, mostly used only as a pre-training phase over IID data. In this work we investigate self-supervised methods in continual learning regimes without any replay mechanism. We show that naive functional regularization, also known as feature distillation, leads to lower plasticity and limits continual learning performance. Instead, we propose Projected Functional Regularization in which a separate temporal projection network ensures that the newly learned feature space preserves information of the previous one, while at the same time allowing for the learning of new features. This prevents forgetting while maintaining the plasticity of the learner. Comparison with other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in different scenarios and on multiple datasets.
Keywords: Computer vision; Conferences; Self-supervised learning; Image representation; Pattern recognition
|
|
|
Bojana Gajic, Ariel Amato, Ramon Baldrich, Joost Van de Weijer, & Carlo Gatta. (2022). Area Under the ROC Curve Maximization for Metric Learning. In CVPR 2022 Workshop on Efficient Deep Learning for Computer Vision (ECV 2022, 5th Edition).
Abstract: Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (which is a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work that proved that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximated, derivable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves comparable performance to more complex, domain specific, state-of-the-art methods for vehicle re-identification.
Keywords: Training; Computer vision; Conferences; Area measurement; Benchmark testing; Pattern recognition
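The abstract above does not spell out how the area under the ROC curve is relaxed. As a hedged illustration only, the sketch below uses one common choice: the exact AUC counts the fraction of (positive, negative) pairs that are ranked correctly, and the step function in that count is replaced by a sigmoid so the quantity becomes smooth. The function names, the temperature parameter and the sample scores are assumptions for this example, not the paper's formulation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def exact_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def soft_auc(pos_scores, neg_scores, temperature=0.1):
    """Smooth surrogate: the step function is replaced by a sigmoid.

    `temperature` (an assumed hyperparameter) controls how closely the
    sigmoid approximates the step.
    """
    total = sum(
        sigmoid((p - n) / temperature)
        for p in pos_scores
        for n in neg_scores
    )
    return total / (len(pos_scores) * len(neg_scores))


def auc_loss(pos_scores, neg_scores, temperature=0.1):
    """Loss to minimize: one minus the smooth AUC surrogate."""
    return 1.0 - soft_auc(pos_scores, neg_scores, temperature)


# Hypothetical similarity scores for matching and non-matching pairs.
pos = [0.9, 0.2]
neg = [0.5, 0.1]
print(exact_auc(pos, neg), soft_auc(pos, neg), auc_loss(pos, neg))
```

In a metric learning setting the scores would be similarities between embedding pairs, and the surrogate would be written in a framework with automatic differentiation so that gradients flow back to the embedding network.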
|
|
|
Mohamed Ramzy Ibrahim, Robert Benavente, Felipe Lumbreras, & Daniel Ponsa. (2022). 3DRRDB: Super Resolution of Multiple Remote Sensing Images using 3D Residual in Residual Dense Blocks. In CVPR 2022 Workshop on IEEE Perception Beyond the Visible Spectrum workshop series (PBVS, 18th Edition).
Abstract: The rapid advancement of Deep Convolutional Neural Networks helped in solving many remote sensing problems, especially the problems of super-resolution. However, most state-of-the-art methods focus more on Single Image Super-Resolution neglecting Multi-Image Super-Resolution. In this work, a new proposed 3D Residual in Residual Dense Blocks model (3DRRDB) focuses on remote sensing Multi-Image Super-Resolution for two different single spectral bands. The proposed 3DRRDB model explores the idea of 3D convolution layers in deeply connected Dense Blocks and the effect of local and global residual connections with residual scaling in Multi-Image Super-Resolution. The model tested on the Proba-V challenge dataset shows a significant improvement above the current state-of-the-art models scoring a Corrected Peak Signal to Noise Ratio (cPSNR) of 48.79 dB and 50.83 dB for Near Infrared (NIR) and RED Bands respectively. Moreover, the proposed 3DRRDB model scores a Corrected Structural Similarity Index Measure (cSSIM) of 0.9865 and 0.9909 for NIR and RED bands respectively.
Keywords: Training; Solid modeling; Three-dimensional displays; PSNR; Convolution; Superresolution; Pattern recognition
|
|