|
Arash Akbarinia, & C. Alejandro Parraga. (2018). Feedback and Surround Modulated Boundary Detection. IJCV - International Journal of Computer Vision, 126(12), 1367–1380.
Abstract: Edges are key components of any visual scene to the extent that we can recognise objects merely by their silhouettes. The human visual system captures edge information through neurons in the visual cortex that are sensitive to both intensity discontinuities and particular orientations. The “classical approach” assumes that these cells are only responsive to the stimulus present within their receptive fields; however, recent studies demonstrate that surrounding regions and inter-areal feedback connections influence their responses significantly. In this work we propose a biologically-inspired edge detection model in which orientation selective neurons are represented through the first derivative of a Gaussian function resembling double-opponent cells in the primary visual cortex (V1). In our model we account for four kinds of receptive field surround, i.e. full, far, iso- and orthogonal-orientation, whose contributions are contrast-dependent. The output signal from V1 is pooled in its perpendicular direction by larger V2 neurons employing a contrast-variant centre-surround kernel. We further introduce a feedback connection from higher-level visual areas to the lower ones. The results of our model on three benchmark datasets show a big improvement compared to the current non-learning and biologically-inspired state-of-the-art algorithms while being competitive to the learning-based methods.
Keywords: Boundary detection; Surround modulation; Biologically-inspired vision
|
|
|
Marçal Rusiñol, J. Chazalon, & Katerine Diaz. (2018). Augmented Songbook: an Augmented Reality Educational Application for Raising Music Awareness. MTAP - Multimedia Tools and Applications, 77(11), 13773–13798.
Abstract: This paper presents the development of an Augmented Reality mobile application which aims at introducing young children to abstract concepts of music. Such concepts are, for instance, the musical notation or the idea of rhythm. Recent studies in Augmented Reality for education suggest that such technologies have multiple benefits for students, including younger ones. As mobile document image acquisition and processing gains maturity on mobile platforms, we explore how it is possible to build a markerless and real-time application to augment the physical documents with didactic animations and interactive virtual content. Given a standard image processing pipeline, we compare the performance of different local descriptors at two key stages of the process. Results suggest alternatives to the SIFT local descriptors, regarding result quality and computational efficiency, both for document model identification and perspective transform estimation. All experiments are performed on an original and public dataset we introduce here.
Keywords: Augmented reality; Document image matching; Educational applications
|
|
|
Laura Lopez-Fuentes, Joost Van de Weijer, Manuel Gonzalez-Hidalgo, Harald Skinnemoen, & Andrew Bagdanov. (2018). Review on computer vision techniques in emergency situations. MTAP - Multimedia Tools and Applications, 77(13), 17069–17107.
Abstract: In emergency situations, actions that save lives and limit the impact of hazards are crucial. In order to act, situational awareness is needed to decide what to do. Geolocalized photos and video of the situations as they evolve can be crucial in better understanding them and making decisions faster. Cameras are almost everywhere these days, either in terms of smartphones, installed CCTV cameras, UAVs or others. However, this poses challenges in big data and information overflow. Moreover, most of the time there are no disasters at any given location, so humans aiming to detect sudden situations may not be as alert as needed at any point in time. Consequently, computer vision tools can be an excellent decision support. The number of emergencies where computer vision tools have been considered or used is very wide, and there is a great overlap across related emergency research. Researchers tend to focus on state-of-the-art systems that cover the same emergency as they are studying, obviating important research in other fields. In order to unveil this overlap, the survey is divided along four main axes: the types of emergencies that have been studied in computer vision, the objective that the algorithms can address, the type of hardware needed and the algorithms used. Therefore, this review provides a broad overview of the progress of computer vision covering all sorts of emergencies.
Keywords: Emergency management; Computer vision; Decision makers; Situational awareness; Critical situation
|
|
|
Katerine Diaz, Konstantia Georgouli, Anastasios Koidis, & Jesus Martinez del Rincon. (2017). Incremental model learning for spectroscopy-based food analysis. CILS - Chemometrics and Intelligent Laboratory Systems, 167, 123–131.
Abstract: In this paper we propose the use of incremental learning for creating and improving multivariate analysis models in the field of chemometrics of spectral data. As main advantages, our proposed incremental subspace-based learning allows creating models faster, progressively improving previously created models and sharing them between laboratories and institutions without requiring transferring or disclosing individual spectra samples. In particular, our approach allows improving the generalization and adaptability of previously generated models with a few new spectral samples to be applicable to real-world situations. The potential of our approach is demonstrated using vegetable oil type identification based on spectroscopic data as a case study. Results show how incremental models maintain the accuracy of batch learning methodologies while reducing their computational cost and handicaps.
Keywords: Incremental model learning; IGDCV technique; Subspace based learning; Identification; Vegetable oils; FT-IR spectroscopy
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, & Aura Hernandez-Sabate. (2017). Decremental generalized discriminative common vectors applied to images classification. KBS - Knowledge-Based Systems, 131, 46–57.
Abstract: In this paper, a novel decremental subspace-based learning method called Decremental Generalized Discriminative Common Vectors method (DGDCV) is presented. The method makes use of the concept of decremental learning, which we introduce in the field of supervised feature extraction and classification. By efficiently removing unnecessary data and/or classes from a knowledge base, our methodology is able to update the model without recalculating the full projection or accessing the previously processed training data, while retaining the previously acquired knowledge. The proposed method has been validated in 6 standard face recognition datasets, showing a considerable computational gain without compromising the accuracy of the model.
Keywords: Decremental learning; Generalized Discriminative Common Vectors; Feature extraction; Linear subspace methods; Classification
|
|
|
Fatemeh Noroozi, Marina Marjanovic, Angelina Njegus, Sergio Escalera, & Gholamreza Anbarjafari. (2019). Audio-Visual Emotion Recognition in Video Clips. TAC - IEEE Transactions on Affective Computing, 10(1), 60–75.
Abstract: This paper presents a multimodal emotion recognition system, which is based on the analysis of audio and visual cues. From the audio channel, Mel-Frequency Cepstral Coefficients, Filter Bank Energies and prosodic features are extracted. For the visual part, two strategies are considered. First, facial landmarks’ geometric relations, i.e. distances and angles, are computed. Second, we summarize each emotional video into a reduced set of key-frames, which are taught to visually discriminate between the emotions. In order to do so, a convolutional neural network is applied to key-frames summarizing videos. Finally, confidence outputs of all the classifiers from all the modalities are used to define a new feature space to be learned for final emotion label prediction, in a late fusion/stacking fashion. The experiments conducted on the SAVEE, eNTERFACE’05, and RML databases show significant performance improvements by our proposed system in comparison to current alternatives, defining the current state-of-the-art in all three databases.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, Sergio Escalera, Huamin Ren, Thomas B. Moeslund, & Elham Etemad. (2017). Locality Regularized Group Sparse Coding for Action Recognition. CVIU - Computer Vision and Image Understanding, 158, 106–114.
Abstract: Bag of visual words (BoVW) models are widely utilized in image/video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantage of locality coding, such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate that our methodology outperforms the state-of-the-art on the considered datasets.
Keywords: Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition
|
|
|
Miguel Angel Bautista, Oriol Pujol, Fernando De la Torre, & Sergio Escalera. (2018). Error-Correcting Factorization. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 2388–2401.
Abstract: Error Correcting Output Codes (ECOC) is a successful technique in multi-class classification, which is a core problem in Pattern Recognition and Machine Learning. A major advantage of ECOC over other methods is that the multi-class problem is decoupled into a set of binary problems that are solved independently. However, literature defines a general error-correcting capability for ECOCs without analyzing how it distributes among classes, hindering a deeper analysis of pair-wise error-correction. To address these limitations this paper proposes an Error-Correcting Factorization (ECF) method, our contribution is three fold: (I) We propose a novel representation of the error-correction capability, called the design matrix, that enables us to build an ECOC on the basis of allocating correction to pairs of classes. (II) We derive the optimal code length of an ECOC using rank properties of the design matrix. (III) ECF is formulated as a discrete optimization problem, and a relaxed solution is found using an efficient constrained block coordinate descent approach. (IV) Enabled by the flexibility introduced with the design matrix we propose to allocate the error-correction on classes that are prone to confusion. Experimental results in several databases show that when allocating the error-correction to confusable classes ECF outperforms state-of-the-art approaches.
|
|
|
I. Sorodoc, S. Pezzelle, A. Herbelot, Mariella Dimiccoli, & R. Bernardi. (2018). Learning quantification from images: A structured neural architecture. NLE - Natural Language Engineering, 24(3), 363–392.
Abstract: Major advances have recently been made in merging language and vision representations. Most tasks considered so far have confined themselves to the processing of objects and lexicalised relations amongst objects (content words). We know, however, that humans (even pre-school children) can abstract over raw multimodal data to perform certain types of higher level reasoning, expressed in natural language by function words. A case in point is given by their ability to learn quantifiers, i.e. expressions like few, some and all. From formal semantics and cognitive linguistics, we know that quantifiers are relations over sets which, as a simplification, we can see as proportions. For instance, in most fish are red, most encodes the proportion of fish which are red fish. In this paper, we study how well current neural network strategies model such relations. We propose a task where, given an image and a query expressed by an object–property pair, the system must return a quantifier expressing which proportions of the queried object have the queried property. Our contributions are twofold. First, we show that the best performance on this task involves coupling state-of-the-art attention mechanisms with a network architecture mirroring the logical structure assigned to quantifiers by classic linguistic formalisation. Second, we introduce a new balanced dataset of image scenarios associated with quantification queries, which we hope will foster further research in this area.
|
|
|
Maedeh Aghaei, Mariella Dimiccoli, C. Canton-Ferrer, & Petia Radeva. (2018). Towards social pattern characterization from egocentric photo-streams. CVIU - Computer Vision and Image Understanding, 171, 104–117.
Abstract: Following the increasingly popular trend of social interaction analysis in egocentric vision, this article presents a comprehensive pipeline for automatic social pattern characterization of a wearable photo-camera user. The proposed framework relies merely on the visual analysis of egocentric photo-streams and consists of three major steps. The first step is to detect social interactions of the user where the impact of several social signals on the task is explored. The detected social events are inspected in the second step for categorization into different social meetings. These two steps act at event-level where each potential social event is modeled as a multi-dimensional time-series, whose dimensions correspond to a set of relevant features for each task; finally, LSTM is employed to classify the time-series. The last step of the framework is to characterize social patterns of the user. Our goal is to quantify the duration, the diversity and the frequency of the user social relations in various social situations. This goal is achieved by the discovery of recurrences of the same people across the whole set of social events related to the user. Experimental evaluation over EgoSocialStyle – the proposed dataset in this work, and EGO-GROUP demonstrates promising results on the task of social pattern characterization from egocentric photo-streams.
Keywords: Social pattern characterization; Social signal extraction; Lifelogging; Convolutional and recurrent neural networks
|
|
|
Mireia Forns-Nadal, Federico Sem, Anna Mane, Laura Igual, Dani Guinart, & Oscar Vilarroya. (2017). Increased Nucleus Accumbens Volume in First-Episode Psychosis. PRN - Psychiatry Research-Neuroimaging, 263, 57–60.
Abstract: Nucleus accumbens has been reported as a key structure in the neurobiology of schizophrenia. Studies analyzing structural abnormalities have shown conflicting results, possibly related to confounding factors. We investigated the nucleus accumbens volume using manual delimitation in first-episode psychosis (FEP) controlling for age, cannabis use and medication. Thirty-one FEP subjects who were naive or minimally exposed to antipsychotics and a control group were MRI scanned and clinically assessed from baseline to 6 months of follow-up. FEP showed increased relative and total accumbens volumes. Clinical correlations with negative symptoms, duration of untreated psychosis and cannabis use were not significant.
|
|
|
Debora Gil, Rosa Maria Ortiz, Carles Sanchez, & Antoni Rosell. (2018). Objective endoscopic measurements of central airway stenosis. A pilot study. RES - Respiration, 95, 63–69.
Abstract: Endoscopic estimation of the degree of stenosis in central airway obstruction is subjective and highly variable. Objective: To determine the benefits of using SENSA (System for Endoscopic Stenosis Assessment), an image-based computational software, for obtaining objective stenosis index (SI) measurements among a group of expert bronchoscopists and general pulmonologists. Methods: A total of 7 expert bronchoscopists and 7 general pulmonologists were enrolled to validate SENSA usage. The SI obtained by the physicians and by SENSA were compared with a reference SI to set their precision in SI computation. We used SENSA to efficiently obtain this reference SI in 11 selected cases of benign stenosis. A Web platform with three user-friendly microtasks was designed to gather the data. The users had to visually estimate the SI from videos with and without contours of the normal and the obstructed area provided by SENSA. The users were able to modify the SENSA contours to define the reference SI using morphometric bronchoscopy. Results: Visual SI estimation accuracy was associated with neither bronchoscopic experience (p = 0.71) nor the contours of the normal and the obstructed area provided by the system (p = 0.13). The precision of the SI by SENSA was 97.7% (95% CI: 92.4-103.7), which is significantly better than the precision of the SI by visual estimation (p < 0.001), with an improvement by at least 15%. Conclusion: SENSA provides objective SI measurements with a precision of up to 99.5%, which can be calculated from any bronchoscope using an affordable scalable interface. Providing normal and obstructed contours on bronchoscopic videos does not improve physicians' visual estimation of the SI.
Keywords: Bronchoscopy; Tracheal stenosis; Airway stenosis; Computer-assisted analysis
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, Aura Hernandez-Sabate, Marçal Rusiñol, & Francesc J. Ferri. (2018). Fast Kernel Generalized Discriminative Common Vectors for Feature Extraction. JMIV - Journal of Mathematical Imaging and Vision, 60(4), 512–524.
Abstract: This paper presents a supervised subspace learning method called Kernel Generalized Discriminative Common Vectors (KGDCV), as a novel extension of the known Discriminative Common Vectors method with Kernels. Our method combines the advantages of kernel methods to model complex data and solve nonlinear problems with moderate computational complexity, with the better generalization properties of generalized approaches for large dimensional data. This attractive combination makes KGDCV especially suited for feature extraction and classification in computer vision, image processing and pattern recognition applications. Two different approaches to this generalization are proposed: a first one based on the kernel trick (KT) and a second one based on the nonlinear projection trick (NPT) for even higher efficiency. Both methodologies have been validated on four different image datasets containing faces, objects and handwritten digits, and compared against well-known non-linear state-of-the-art methods. Results show better discriminant properties than other generalized approaches, both linear and kernel. In addition, the KGDCV-NPT approach presents a considerable computational gain, without compromising the accuracy of the model.
|
|
|
Huamin Ren, Nattiya Kanhabua, Andreas Mogelmose, Weifeng Liu, Kaustubh Kulkarni, Sergio Escalera, et al. (2018). Back-dropout Transfer Learning for Action Recognition. IETCV - IET Computer Vision, 12(4), 484–491.
Abstract: Transfer learning aims at adapting a model learned from a source dataset to a target dataset. It is a beneficial approach especially when annotating on the target dataset is expensive or infeasible. Transfer learning has demonstrated its powerful learning capabilities in various vision tasks. Despite transfer learning being a promising approach, it is still an open question how to adapt the model learned from the source dataset to the target dataset. One big challenge is to prevent the impact of category bias on classification performance. Dataset bias exists when two images from the same category, but from different datasets, are not classified as the same. To address this problem, a transfer learning algorithm has been proposed, called negative back-dropout transfer learning (NB-TL), which utilizes images that have been misclassified and further performs a back-dropout strategy on them to penalize errors. Experimental results demonstrate the effectiveness of the proposed algorithm. In particular, the authors evaluate the performance of the proposed NB-TL algorithm on the UCF 101 action recognition dataset, achieving 88.9% recognition rate.
Keywords: Learning (artificial intelligence); Pattern Recognition
|
|
|
Mark Philip Philipsen, Jacob Velling Dueholm, Anders Jorgensen, Sergio Escalera, & Thomas B. Moeslund. (2018). Organ Segmentation in Poultry Viscera Using RGB-D. SENS - Sensors, 18(1), 117.
Abstract: We present a pattern recognition framework for semantic segmentation of visual structures, that is, multi-class labelling at pixel level, and apply it to the task of segmenting organs in the eviscerated viscera from slaughtered poultry in RGB-D images. This is a step towards replacing the current strenuous manual inspection at poultry processing plants. Features are extracted from feature maps such as activation maps from a convolutional neural network (CNN). A random forest classifier assigns class probabilities, which are further refined by utilizing context in a conditional random field. The presented method is compatible with both 2D and 3D features, which allows us to explore the value of adding 3D and CNN-derived features. The dataset consists of 604 RGB-D images showing 151 unique sets of eviscerated viscera from four different perspectives. A mean Jaccard index of 78.11% is achieved across the four classes of organs by using features derived from 2D, 3D and a CNN, compared to 74.28% using only basic 2D image features.
Keywords: semantic segmentation; RGB-D; random forest; conditional random field; 2D; 3D; CNN
|
|