|
Laura Lopez-Fuentes, Joost Van de Weijer, Manuel Gonzalez-Hidalgo, Harald Skinnemoen, & Andrew Bagdanov. (2018). Review on computer vision techniques in emergency situations. MTAP - Multimedia Tools and Applications, 77(13), 17069–17107.
Abstract: In emergency situations, actions that save lives and limit the impact of hazards are crucial. In order to act, situational awareness is needed to decide what to do. Geolocalized photos and video of the situations as they evolve can be crucial in better understanding them and making decisions faster. Cameras are almost everywhere these days, either in terms of smartphones, installed CCTV cameras, UAVs or others. However, this poses challenges in big data and information overflow. Moreover, most of the time there are no disasters at any given location, so humans aiming to detect sudden situations may not be as alert as needed at any point in time. Consequently, computer vision tools can be an excellent decision support. The number of emergencies where computer vision tools have been considered or used is very wide, and there is a great overlap across related emergency research. Researchers tend to focus on state-of-the-art systems that cover the same emergency as they are studying, obviating important research in other fields. In order to unveil this overlap, the survey is divided along four main axes: the types of emergencies that have been studied in computer vision, the objective that the algorithms can address, the type of hardware needed and the algorithms used. Therefore, this review provides a broad overview of the progress of computer vision covering all sorts of emergencies.
Keywords: Emergency management; Computer vision; Decision makers; Situational awareness; Critical situation
|
|
|
Laura Lopez-Fuentes, Alessandro Farasin, Harald Skinnemoen, & Paolo Garza. (2018). Deep Learning models for passability detection of flooded roads. In MediaEval 2018 Multimedia Benchmark Workshop (Vol. 2283).
Abstract: In this paper we study and compare several approaches to detect floods and evidence for passability of roads by conventional means in Twitter. We focus on tweets containing both visual information (a picture shared by the user) and metadata, a combination of text and related extra information intrinsic to the Twitter API. This work has been done in the context of the MediaEval 2018 Multimedia Satellite Task.
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, Aura Hernandez-Sabate, Marçal Rusiñol, & Francesc J. Ferri. (2018). Fast Kernel Generalized Discriminative Common Vectors for Feature Extraction. JMIV - Journal of Mathematical Imaging and Vision, 60(4), 512–524.
Abstract: This paper presents a supervised subspace learning method called Kernel Generalized Discriminative Common Vectors (KGDCV), as a novel extension of the known Discriminative Common Vectors method with Kernels. Our method combines the advantages of kernel methods to model complex data and solve nonlinear problems with moderate computational complexity, with the better generalization properties of generalized approaches for large dimensional data. This attractive combination makes KGDCV especially suited for feature extraction and classification in computer vision, image processing and pattern recognition applications. Two different approaches to this generalization are proposed, a first one based on the kernel trick (KT) and a second one based on the nonlinear projection trick (NPT) for even higher efficiency. Both methodologies have been validated on four different image datasets containing faces, objects and handwritten digits, and compared against well known non-linear state-of-the-art methods. Results show better discriminant properties than other generalized approaches, both linear and kernel. In addition, the KGDCV-NPT approach presents a considerable computational gain, without compromising the accuracy of the model.
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, Aura Hernandez-Sabate, & Debora Gil. (2018). Continuous head pose estimation using manifold subspace embedding and multivariate regression. ACCESS - IEEE Access, 6, 18325–18334.
Abstract: In this paper, a continuous head pose estimation system is proposed to estimate yaw and pitch head angles from raw facial images. Our approach is based on manifold learning-based methods, due to their promising generalization properties shown for face modelling from images. The method combines histograms of oriented gradients, generalized discriminative common vectors and continuous local regression to achieve successful performance. Our proposal was tested on multiple standard face datasets, as well as in a realistic scenario. Results show a considerable performance improvement and a higher consistency of our model in comparison with other state-of-the-art methods, with angular errors varying between 9 and 17 degrees.
Keywords: Head Pose estimation; HOG features; Generalized Discriminative Common Vectors; B-splines; Multiple linear regression
|
|
|
Katerine Diaz, Francesc J. Ferri, & Aura Hernandez-Sabate. (2018). An overview of incremental feature extraction methods based on linear subspaces. KBS - Knowledge-Based Systems, 145, 219–235.
Abstract: With the massive explosion of machine learning in our day-to-day life, incremental and adaptive learning has become a major topic, crucial to keep up-to-date and improve classification models and their corresponding feature extraction processes. This paper presents a categorized overview of incremental feature extraction based on linear subspace methods which aim at incorporating new information to the already acquired knowledge without accessing previous data. Specifically, this paper focuses on those linear dimensionality reduction methods with orthogonal matrix constraints based on global loss function, due to the extensive use of their batch approaches versus other linear alternatives. Thus, we cover the approaches derived from Principal Components Analysis, Linear Discriminative Analysis and Discriminative Common Vector methods. For each basic method, its incremental approaches are differentiated according to the subspace model and matrix decomposition involved in the updating process. Besides this categorization, several updating strategies are distinguished according to the amount of data used to update and to the fact of considering a static or dynamic number of classes. Moreover, the specific role of the size/dimension ratio in each method is considered. Finally, computational complexity, experimental setup and the accuracy rates according to published results are compiled and analyzed, and an empirical evaluation is done to compare the best approach of each kind.
|
|
|
Jun Wan, Sergio Escalera, Francisco Perales, & Josef Kittler. (2018). Articulated Motion and Deformable Objects. PR - Pattern Recognition, 79, 55–64.
Abstract: This guest editorial introduces the twenty-two papers accepted for this Special Issue on Articulated Motion and Deformable Objects (AMDO). They are grouped into four main categories within the field of AMDO: human motion analysis (action/gesture), human pose estimation, deformable shape segmentation, and face analysis. For each of the four topics, a survey of the recent developments in the field is presented. The accepted papers are briefly introduced in the context of this survey. They contribute novel methods, algorithms with improved performance as measured on benchmarking datasets, as well as two new datasets for hand action detection and human posture analysis. The special issue should be of high relevance to the reader interested in AMDO recognition and promote future research directions in the field.
|
|
|
Julio C. S. Jacques Junior, Xavier Baro, & Sergio Escalera. (2018). Exploiting feature representations through similarity learning, post-ranking and ranking aggregation for person re-identification. IMAVIS - Image and Vision Computing, 79, 76–85.
Abstract: Person re-identification has received special attention by the human analysis community in the last few years. To address the challenges in this field, many researchers have proposed different strategies, which basically exploit either cross-view invariant features or cross-view robust metrics. In this work, we propose to exploit a post-ranking approach and combine different feature representations through ranking aggregation. Spatial information, which potentially benefits the person matching, is represented using a 2D body model, from which color and texture information are extracted and combined. We also consider background/foreground information, automatically extracted via Deep Decompositional Network, and the usage of Convolutional Neural Network (CNN) features. To describe the matching between images we use the polynomial feature map, also taking into account local and global information. The Discriminant Context Information Analysis based post-ranking approach is used to improve initial ranking lists. Finally, the Stuart ranking aggregation method is employed to combine complementary ranking lists obtained from different feature representations. Experimental results demonstrated that we improve the state-of-the-art on VIPeR and PRID450s datasets, achieving 67.21% and 75.64% on top-1 rank recognition rate, respectively, as well as obtaining competitive results on CUHK01 dataset.
|
|
|
Jose M. Armingol, Jorge Alfonso, Nourdine Aliane, Miguel Clavijo, Sergio Campos-Cordobes, Arturo de la Escalera, et al. (2018). Environmental Perception for Intelligent Vehicles. In Intelligent Vehicles. Enabling Technologies and Future Developments (pp. 23–101).
Abstract: Environmental perception represents, because of its complexity, a challenge for Intelligent Transport Systems due to the great variety of situations and different elements that can happen in road environments and that must be faced by these systems. In connection with this, so far there are a variety of solutions as regards sensors and methods, so the results of precision, complexity, cost, or computational load obtained by these works are different. In this chapter some systems based on computer vision and laser techniques are presented. Fusion methods are also introduced in order to provide advanced and reliable perception systems.
Keywords: Computer vision; laser techniques; data fusion; advanced driver assistance systems; traffic monitoring systems; intelligent vehicles
|
|
|
Jorge Charco, Boris X. Vintimilla, & Angel Sappa. (2018). Deep learning based camera pose estimation in multi-view environment. In 14th IEEE International Conference on Signal Image Technology & Internet Based System.
Abstract: This paper proposes to use a deep learning network architecture for relative camera pose estimation in a multi-view environment. The proposed network is a variant of the AlexNet architecture, used as a regressor to predict the relative translation and rotation as output. The proposed approach is trained from scratch on a large data set that takes as input a pair of images from the same scene. This new architecture is compared with a previous approach using standard metrics, obtaining better results on the relative camera pose.
Keywords: Deep learning; Camera pose estimation; Multiview environment; Siamese architecture
|
|
|
Jorge Bernal, Aymeric Histace, Marc Masana, Quentin Angermann, Cristina Sanchez Montes, Cristina Rodriguez de Miguel, et al. (2018). Polyp Detection Benchmark in Colonoscopy Videos using GTCreator: A Novel Fully Configurable Tool for Easy and Fast Annotation of Image Databases. In 32nd International Congress and Exhibition on Computer Assisted Radiology & Surgery.
|
|
|
Jon Almazan, Bojana Gajic, Naila Murray, & Diane Larlus. (2018). Re-ID done right: towards good practices for person re-identification.
Abstract: Training a deep architecture using a ranking loss has become standard for the person re-identification task. Increasingly, these deep architectures include additional components that leverage part detections, attribute predictions, pose estimators and other auxiliary information, in order to more effectively localize and align discriminative image regions. In this paper we adopt a different approach and carefully design each component of a simple deep architecture and, critically, the strategy for training it effectively for person re-identification. We extensively evaluate each design choice, leading to a list of good practices for person re-identification. By following these practices, our approach outperforms the state of the art, including more complex methods with auxiliary components, by large margins on four benchmark datasets. We also provide a qualitative analysis of our trained representation which indicates that, while compact, it is able to capture information from localized and discriminative regions, in a manner akin to an implicit attention mechanism.
|
|
|
Joan Serrat, Felipe Lumbreras, & Idoia Ruiz. (2018). Learning to measure for preshipment garment sizing. MEASURE - Measurement, 130, 327–339.
Abstract: Clothing is still manually manufactured for the most part nowadays, resulting in discrepancies between nominal and real dimensions, and potentially ill-fitting garments. Hence, it is common in the apparel industry to manually perform measures at preshipment time. We present an automatic method to obtain such measures from a single image of a garment that speeds up this task. It is generic and extensible in the sense that it does not depend explicitly on the garment shape or type. Instead, it learns through a probabilistic graphical model to identify the different contour parts. Subsequently, a set of Lasso regressors, one per desired measure, can predict the actual values of the measures. We present results on a dataset of 130 images of jackets and 98 of pants, of varying sizes and styles, obtaining 1.17 and 1.22 cm of mean absolute error, respectively.
Keywords: Apparel; Computer vision; Structured prediction; Regression
|
|
|
Jianzhu Guo, Zhen Lei, Jun Wan, Egils Avots, Noushin Hajarolasvadi, Boris Knyazev, et al. (2018). Dominant and Complementary Emotion Recognition from Still Images of Faces. ACCESS - IEEE Access, 6, 26391–26403.
Abstract: Emotion recognition has a key role in affective computing. Recently, fine-grained emotion analysis, such as compound facial expression of emotions, has attracted high interest from researchers working on affective computing. A compound facial emotion includes dominant and complementary emotions (e.g., happily-disgusted and sadly-fearful), which is more detailed than the seven classical facial emotions (e.g., happy, disgust, and so on). Current studies on compound emotions are limited to using data sets with a limited number of categories and unbalanced data distributions, with labels obtained automatically by machine learning-based algorithms, which could lead to inaccuracies. To address these problems, we released the iCV-MEFED data set, which includes 50 classes of compound emotions and labels assessed by psychologists. The task is challenging due to high similarities of compound facial emotions from different categories. In addition, we have organized a challenge based on the proposed iCV-MEFED data set, held at FG workshop 2017. In this paper, we analyze the top three winning methods and perform further detailed experiments on the proposed data set. Experiments indicate that pairs of compound emotions (e.g., surprisingly-happy vs happily-surprised) are more difficult to recognize than the seven basic emotions. However, we hope the proposed data set can help to pave the way for further research on compound facial emotion recognition.
|
|
|
Jialuo Chen, Pau Riba, Alicia Fornes, Juan Mas, Josep Llados, & Joana Maria Pujadas-Mora. (2018). Word-Hunter: A Gamesourcing Experience to Validate the Transcription of Historical Manuscripts. In 16th International Conference on Frontiers in Handwriting Recognition (pp. 528–533).
Abstract: Nowadays, there are still many handwritten historical documents in archives waiting to be transcribed and indexed. Since manual transcription is tedious and time consuming, automatic transcription seems the path to follow. However, the performance of current handwriting recognition techniques is not perfect, so a manual validation is mandatory. Crowdsourcing is a good strategy for manual validation; however, it is a tedious task. In this paper we analyze experiences based on gamification in order to propose and design a gamesourcing framework that increases the interest of users. Then, we describe and analyze our experience when validating the automatic transcription using the gamesourcing application. Moreover, thanks to the combination of clustering and handwriting recognition techniques, we can speed up the validation while maintaining the performance.
Keywords: Crowdsourcing; Gamification; Handwritten documents; Performance evaluation
|
|
|
Jelena Gorbova, Egils Avots, Iiris Lusi, Mark Fishel, Sergio Escalera, & Gholamreza Anbarjafari. (2018). Integrating Vision and Language for First Impression Personality Analysis. MULTIMEDIA - IEEE Multimedia, 25(2), 24–33.
Abstract: The authors present a novel methodology for analyzing integrated audiovisual signals and language to assess a person's personality. An evaluation of their proposed multimodal method using a job candidate screening system that predicted five personality traits from a short video demonstrates the method's effectiveness.
|
|