Marco Pedersoli, Jordi Gonzalez, & Juan J. Villanueva. (2009). High-Speed Human Detection Using a Multiresolution Cascade of Histograms of Oriented Gradients. In 4th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 5524). LNCS. Springer Berlin Heidelberg.
Abstract: This paper presents a new method for human detection based on a multiresolution cascade of Histograms of Oriented Gradients (HOG) that can greatly reduce the computational cost of the detection search without affecting accuracy. The method consists of a cascade of sliding window detectors. Each detector is a Support Vector Machine (SVM) composed of features at a different resolution, from coarse for the first level to fine for the last one.
Considering that the spatial stride of the sliding window search is affected by the HOG feature size, unlike previous methods based on Adaboost cascades, we can adopt a spatial stride inversely proportional to the feature resolution. As a result, the speed-up of the cascade is due not only to the low number of features that need to be computed in the first levels, but also to the lower number of detection windows that need to be evaluated.
Experimental results show that our method achieves a detection rate comparable with the state of the art while speeding up the detection search by 10-20 times, depending on the cascade configuration.
|
Marco Pedersoli, Jordi Gonzalez, Xu Hu, & Xavier Roca. (2014). Toward Real-Time Pedestrian Detection Based on a Deformable Template Model. TITS - IEEE Transactions on Intelligent Transportation Systems, 15(1), 355–364.
Abstract: Most advanced driving assistance systems already include pedestrian detection systems. Unfortunately, there is still a tradeoff between precision and real-time operation. For reliable detection, an excellent precision-recall tradeoff is needed to detect as many pedestrians as possible while, at the same time, avoiding too many false alarms; in addition, very fast computation is needed for fast reactions to dangerous situations. Recently, novel approaches based on deformable templates have been proposed; these show reasonable detection performance, although they are computationally too expensive for real-time performance. In this paper, we present a system for pedestrian detection based on a hierarchical multiresolution part-based model. The proposed system is able to achieve state-of-the-art detection accuracy due to the local deformations of the parts while exhibiting a speedup of more than one order of magnitude due to a fast coarse-to-fine inference technique. Moreover, our system explicitly infers the level of resolution available so that the detection of small examples is feasible at a greatly reduced computational cost. We conclude this contribution by showing that a graphics processing unit-optimized implementation of our proposed system is suitable for real-time pedestrian detection in terms of both accuracy and speed.
|
Marcos V Conde, Florin Vasluianu, Javier Vazquez, & Radu Timofte. (2023). Perceptual image enhancement for smartphone real-time applications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 1848–1858).
Abstract: Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with a focus on deploying it on smartphones. Our experiments show that, with far fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images in under 1 second on mid-level commercial smartphones.
|
Marcos V Conde, Javier Vazquez, Michael S Brown, & Radu Timofte. (2024). NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement. In 38th AAAI Conference on Artificial Intelligence.
Abstract: 3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet not so memory-efficient, as storing multiple 3D LUTs is required. For this reason and other implementation limitations, their use on mobile devices is less popular. In this work, we propose a Neural Implicit LUT (NILUT), an implicitly defined continuous 3D color transformation parameterized by a neural network. We show that NILUTs are capable of accurately emulating real 3D LUTs. Moreover, a NILUT can be extended to incorporate multiple styles into a single network with the ability to blend styles implicitly. Our novel approach is memory-efficient, controllable and can complement previous methods, including learned ISPs.
|
Margarita Torre, Beatriz Remeseiro, Petia Radeva, & Fernando Martinez. (2020). DeepNEM: Deep Network Energy-Minimization for Agricultural Field Segmentation. JSTAEOR - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 726–737.
Abstract: One of the main characteristics of agricultural fields is that the appearance of different crops and their growth status in an aerial image is varied, with a wide range of radiometric values and a high level of variability. The extraction of these fields and their monitoring are activities that require a high level of human intervention. In this article, we propose a novel automatic algorithm, named deep network energy-minimization (DeepNEM), to extract agricultural fields in aerial images. The model-guided process selects the most relevant image clues extracted by a deep network, completes them, and finally generates regions that represent the agricultural fields under a minimization scheme. DeepNEM has been tested over a broad range of fields in terms of size, shape, and content. Different measures were used to compare DeepNEM with other methods, and to show that it represents an improved approach to achieving a high-quality segmentation of agricultural fields. Furthermore, this article also presents a new public dataset composed of 1200 images with their parcel boundary annotations.
|
Maria Alberich-Carramiñana, Guillem Alenya, Juan Andrade, E. Martinez, & Carme Torras. (2006). Affine Epipolar Direction from Two Views of a Planar Contour. In Proceedings of the Advanced Concepts for Intelligent Vision Systems Conference, LNCS 4179: 944–955.
|
Maria del Camp Davesa. (2011). Human action categorization in image sequences (Vol. 169). Master's thesis.
|
Maria Elena Meza de Luna, Juan Ramon Terven Salinas, Bogdan Raducanu, & Joaquin Salas. (2019). A Social-Aware Assistant to support individuals with visual impairments during social interaction: A systematic requirements analysis. IJHC - International Journal of Human-Computer Studies, 122, 50–60.
Abstract: Visual impairment affects the normal course of activities in everyday life, including mobility, education, employment, and social interaction. Most of the existing technical solutions devoted to empowering visually impaired people are in the areas of navigation (obstacle avoidance), access to printed information, and object recognition. Less effort has been dedicated so far to developing solutions to support social interactions. In this paper, we introduce a Social-Aware Assistant (SAA) that provides visually impaired people with cues to enhance their face-to-face conversations. The system consists of a perceptive component (represented by smartglasses with an embedded video camera) and a feedback component (represented by a haptic belt). When the vision system detects a head nod, the belt vibrates, thus suggesting that the user replicate (mirror) the gesture. In our experiments, sighted persons interacted with blind people wearing the SAA. We instructed the former to mirror the nods according to the vibratory signal, while the latter interacted naturally. After the face-to-face conversation, the participants were interviewed about their experience with this new technological assistant. With the data collected during the experiment, we have assessed quantitatively and qualitatively the device's usefulness and user satisfaction.
|
Maria Elena Meza-de-Luna, Juan Ramon Terven Salinas, Bogdan Raducanu, & Joaquin Salas. (2016). Assessing the Influence of Mirroring on the Perception of Professional Competence using Wearable Technology. TAC - IEEE Transactions on Affective Computing, 9(2), 161–175.
Abstract: Nonverbal communication is an intrinsic part of daily face-to-face meetings. A frequently observed behavior during social interactions is mirroring, in which one person tends to mimic the attitude of the counterpart. This paper shows that a computer vision system could be used to predict the perception of competence in dyadic interactions through the automatic detection of mirroring events. To prove our hypothesis, we developed: (1) A social assistant for mirroring detection, using a wearable device which includes a video camera and (2) an automatic classifier for the perception of competence, using the number of nodding gestures and mirroring events as predictors. For our study, we used a mixed-method approach in an experimental design where 48 participants acting as customers interacted with a confederated psychologist. We found that the number of nods or mirroring events has a significant influence on the perception of competence. Our results suggest that: (1) Customer mirroring is a better predictor than psychologist mirroring; (2) the number of psychologist’s nods is a better predictor than the number of customer’s nods; (3) except for the psychologist mirroring, the computer vision algorithm we used worked about equally well whether it was acquiring images from wearable smartglasses or fixed cameras.
Keywords: Mirroring; Nodding; Competence; Perception; Wearable Technology
|
Maria Ines Torres, Javier Mikel Olaso, Cesar Montenegro, Riberto Santana, A. Vazquez, Raquel Justo, et al. (2019). The EMPATHIC project: mid-term achievements. In 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments (pp. 629–638).
Authors: Maria Ines Torres, Javier Mikel Olaso, César Montenegro, Riberto Santana, A. Vázquez, Raquel Justo, J. A. Lozano, Stephan Schlögl, Gérard Chollet, Nazim Dugan, M. Irvine, N. Glackin, C. Pickard, Anna Esposito, Gennaro Cordasco, Alda Troncone, Dijana Petrovska-Delacrétaz, Aymen Mtibaa, Mohamed Amine Hmani, M. S. Korsnes, L. J. Martinussen, Sergio Escalera, C. Palmero Cantariño, Olivier Deroo, O. Gordeeva, Jofre Tenorio-Laranga, E. Gonzalez-Fraile, Begoña Fernández-Ruanova, A. Gonzalez-Pinto
|
Maria Oliver, Gloria Haro, Mariella Dimiccoli, Baptiste Mazin, & Coloma Ballester. (2016). A computational model of amodal completion. In SIAM Conference on Imaging Science.
Abstract: This paper presents a computational model to recover the most likely interpretation of the 3D scene structure from a planar image, where some objects may occlude others. The estimated scene interpretation is obtained by integrating some global and local cues and provides both the complete disoccluded objects that form the scene and their ordering according to depth. Our method first computes several distal scenes which are compatible with the proximal planar image. To compute these different hypothesized scenes, we propose a perceptually inspired object disocclusion method, which works by minimizing Euler's elastica as well as by incorporating the relatability of partially occluded contours and the convexity of the disoccluded objects. Then, to estimate the preferred scene, we rely on a Bayesian model and define probabilities taking into account the global complexity of the objects in the hypothesized scenes as well as the effort of bringing these objects into their relative position in the planar image, which is also measured by an Euler's elastica-based quantity. The model is illustrated with numerical experiments on both synthetic and real images, showing the ability of our model to reconstruct the occluded objects and the preferred perceptual order among them. We also present results on images of the Berkeley dataset with provided figure-ground ground-truth labeling.
|
Maria Salamo, Inmaculada Rodriguez, Maite Lopez, Anna Puig, Simone Balocco, & Mariona Taule. (2016). Recurso docente para la atención de la diversidad en el aula mediante la predicción de notas. ReVision.
Abstract: Since the introduction of the European Higher Education Area (EHEA) across degree programmes, the need has become evident for mechanisms that address diversity in the classroom by assessing automatically and giving rapid feedback to both students and teaching staff on students' progress in a course. This article presents an evaluation of the prediction accuracy of GRADEFORESEER, a teaching resource for grade prediction based on machine learning techniques that makes it possible to assess students' progress and estimate their final grade at the end of the course. The resource is complemented by a user interface for teaching staff that can be used on different software platforms (operating systems) and in any degree course that uses continuous assessment. In addition to describing the resource, this article presents the results obtained by applying the prediction system to four courses from different disciplines: Programming I (PI) and Software Design (DSW) from the degree in Computer Engineering, Information and Communication Technologies (TIC) from the degree in Linguistics, and Fundamentals of Technology (FDT) from the degree in Information and Documentation, all taught at the University of Barcelona. Predictive ability was evaluated in binary form (pass or fail) and according to a range criterion (fail, pass, good, or excellent), with better predictions obtained for the binary evaluation.
Keywords: Machine learning; Grade prediction system; Teaching tool
|
Maria Salamo, & Sergio Escalera. (2011). Increasing Retrieval Quality in Conversational Recommenders. TKDE - IEEE Transactions on Knowledge and Data Engineering, 99, 1.
Abstract: A major task of research in conversational recommender systems is personalization. Critiquing is a common and powerful form of feedback, where a user can express her feature preferences by applying a series of directional critiques over the recommendations instead of providing specific preference values. Incremental Critiquing is a conversational recommender system that uses critiquing as feedback to efficiently personalize products. The expectation is that in each cycle the system retrieves the products that best satisfy the user’s soft product preferences from a minimal information input. In this paper, we present a novel technique that increases retrieval quality based on a combination of compatibility and similarity scores. Under the hypothesis that a user learns during the recommendation process, we propose two novel exponential reinforcement learning approaches for compatibility that take into account both the instant at which the user makes a critique and the number of satisfied critiques. Moreover, we consider that the impact of features on the similarity differs according to the preferences manifested by the user. We propose a global weighting approach that uses a common weight for nearest cases in order to focus on groups of relevant products. We show that our methodology significantly improves recommendation efficiency in four data sets of different sizes in terms of session length in comparison with state-of-the-art approaches. Moreover, our recommender shows higher robustness against noisy user data when compared to classical approaches.
Notes: IF JCR CCIA 2.286 2009 24/103; JCR Impact Factor 2010: 1.851
|
Maria Salamo, Sergio Escalera, & Petia Radeva. (2009). Quality Enhancement based on Reinforcement Learning and Feature Weighting for a Critiquing-Based Recommender. In 8th International Conference on Case-Based Reasoning (Vol. 5650, 298–312). LNCS. Springer Berlin Heidelberg.
Abstract: Personalizing the product recommendation task is a major focus of research in the area of conversational recommender systems. Conversational case-based recommender systems help users to navigate through product spaces, alternately making product suggestions and eliciting users' feedback. Critiquing is a common form of feedback, and the incremental critiquing-based recommender system has shown its efficiency in personalizing products based primarily on a quality measure. This quality measure influences the recommendation process and is obtained by combining compatibility and similarity scores. In this paper, we describe new compatibility strategies based on reinforcement learning and a new feature weighting technique based on the user’s history of critiques. Moreover, we show that our methodology can significantly improve recommendation efficiency in comparison with state-of-the-art approaches.
|
Maria Vanrell. (1997). Exploring the space of behaviour of a texture perception algorithm.
|