|
Marçal Rusiñol, David Aldavert, Ricardo Toledo and Josep Llados. 2015. Towards Query-by-Speech Handwritten Keyword Spotting. 13th International Conference on Document Analysis and Recognition ICDAR2015, 501–505.
Abstract: In this paper, we present a new querying paradigm for handwritten keyword spotting. We propose to represent handwritten word images by both visual and audio representations, enabling a query-by-speech keyword spotting system. The two representations are merged together and projected to a common sub-space in the training phase. Given a spoken query, this transform makes it possible to retrieve word instances that were only represented by the visual modality. In addition, the same method can be used backwards at no additional cost to produce a handwritten text-to-speech system. We present our first results on this new querying mechanism using synthetic voices over the George Washington dataset.
|
|
|
Marc Masana, Idoia Ruiz, Joan Serrat, Joost Van de Weijer and Antonio Lopez. 2018. Metric Learning for Novelty and Anomaly Detection. 29th British Machine Vision Conference.
Abstract: When neural networks process images which do not resemble the distribution seen during training, so-called out-of-distribution images, they often make wrong predictions, and do so too confidently. The capability to detect out-of-distribution images is therefore crucial for many real-world applications. We divide out-of-distribution detection into novelty detection (images of classes which are not in the training set but are related to those) and anomaly detection (images with classes which are unrelated to the training set). By related we mean they contain the same type of objects, like digits in MNIST and SVHN. Most existing work has focused on anomaly detection, and has addressed this problem considering networks trained with the cross-entropy loss. In contrast, we propose to use metric learning, which does not have the drawback of the softmax layer (inherent to cross-entropy methods), which forces the network to divide its prediction power over the learned classes. We perform extensive experiments and evaluate both novelty and anomaly detection, including in a relevant application such as traffic sign recognition, obtaining comparable or better results than previous works.
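The detection idea in this abstract can be illustrated with a minimal sketch. It assumes the embedding vectors have already been produced by some metric-learned network, and it scores a sample by its distance to the nearest class mean. That scoring rule, the threshold, and all names below are illustrative assumptions, not necessarily the procedure used in the paper.

```python
import math

def class_means(embeddings, labels):
    """Average the embedding vectors of each training class."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def ood_score(vec, means):
    """Distance to the closest class mean; larger means less familiar."""
    return min(math.dist(vec, m) for m in means.values())

def is_out_of_distribution(vec, means, threshold):
    """Flag a sample whose score exceeds a (hypothetical) tuned threshold."""
    return ood_score(vec, means) > threshold

# Toy 2-D embeddings standing in for the output of a trained network.
train = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
labels = [0, 0, 1, 1]
means = class_means(train, labels)
```

Unlike a softmax output, the distance score is not forced to sum to one over the known classes, so a sample far from every class mean can receive a low-familiarity score for all of them.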
|
|
|
Marcelo D. Pistarelli, Angel Sappa and Ricardo Toledo. 2013. Multispectral Stereo Image Correspondence. 15th International Conference on Computer Analysis of Images and Patterns. Springer Berlin Heidelberg, 217–224. (LNCS.)
Abstract: This paper presents a novel multispectral stereo image correspondence approach. It is evaluated using a stereo rig constructed with a visible spectrum camera and a long wave infrared spectrum camera. The novelty of the proposed approach lies in the use of Hough space as a correspondence search domain. In this way it avoids searching for correspondence in the original multispectral image domains, where information is weakly correlated, and a common domain is used instead. The proposed approach is intended to be used in outdoor urban scenarios, where images contain a large number of edges. These edges are used as distinctive characteristics for the matching in the Hough space. Experimental results are provided showing the validity of the proposed approach.
|
|
|
Miguel Oliveira, Angel Sappa and V. Santos. 2012. Color Correction using 3D Gaussian Mixture Models. 9th International Conference on Image Analysis and Recognition. Springer Berlin Heidelberg, 97–106. (LNCS.)
Abstract: The current paper proposes a novel color correction approach based on a probabilistic segmentation framework using 3D Gaussian Mixture Models. Regions are used to compute local color correction functions, which are then combined to obtain the final corrected image. The proposed approach is evaluated using both a recently published metric and two large data sets composed of seventy images. The evaluation is performed by comparing our algorithm with eight well-known color correction algorithms. Results show that the proposed approach is the highest scoring color correction method. Also, the proposed single step 3D color space probabilistic segmentation reduces processing time over similar approaches.
|
|
|
Miguel Oliveira, Angel Sappa and V. Santos. 2012. Color Correction for Onboard Multi-camera Systems using 3D Gaussian Mixture Models. IEEE Intelligent Vehicles Symposium. IEEE Xplore, 299–303.
Abstract: The current paper proposes a novel color correction approach for onboard multi-camera systems. It works by segmenting the given images into several regions. A probabilistic segmentation framework, using 3D Gaussian Mixture Models, is proposed. Regions are used to compute local color correction functions, which are then combined to obtain the final corrected image. An image data set of road scenarios is used to establish a performance comparison of the proposed method with seven other well-known color correction algorithms. Results show that the proposed approach is the highest scoring color correction method. Also, the proposed single step 3D color space probabilistic segmentation reduces processing time over similar approaches.
|
|
|
Miguel Oliveira, Angel Sappa and V. Santos. 2011. Unsupervised Local Color Correction for Coarsely Registered Images. IEEE conference on Computer Vision and Pattern Recognition, 201–208.
Abstract: The current paper proposes a new parametric local color correction technique. Initially, several color transfer functions are computed from the output of the mean shift color segmentation algorithm. Secondly, color influence maps are calculated. Finally, the contribution of every color transfer function is merged using the weights from the color influence maps. The proposed approach is compared with both global and local color correction approaches. Results show that our method outperforms the technique ranked first in a recent performance evaluation on this topic. Moreover, the proposed approach is computed in about one tenth of the time.
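The final merging step described in this abstract can be sketched for a single pixel as follows. The sketch assumes each region's color transfer function is a channel-wise gain and offset and that the influence-map values at the pixel are given as plain weights. Both are illustrative assumptions, since the abstract does not specify the parametric form, and all function names are hypothetical.

```python
def make_gain_offset(gain, offset):
    """Hypothetical per-region transfer function: channel-wise gain and offset."""
    return lambda p: tuple(gain[c] * p[c] + offset[c] for c in range(3))

def blend_corrections(pixel, transfer_fns, weights):
    """Merge per-region corrections of one RGB pixel using influence weights."""
    total = sum(weights)
    if total == 0:
        return tuple(pixel)  # no region influences this pixel; leave it unchanged
    out = [0.0, 0.0, 0.0]
    for fn, w in zip(transfer_fns, weights):
        corrected = fn(pixel)
        for c in range(3):
            out[c] += (w / total) * corrected[c]  # normalized weighted sum
    return tuple(out)

# Two toy regions: one brightens, one shifts the color balance.
brighten = make_gain_offset((1.2, 1.2, 1.2), (0.0, 0.0, 0.0))
warm = make_gain_offset((1.0, 1.0, 1.0), (10.0, 0.0, -10.0))
result = blend_corrections((100.0, 100.0, 100.0), [brighten, warm], [3.0, 1.0])
```

Normalizing the weights per pixel keeps the blend a convex combination of the individual corrections, so a pixel dominated by one region follows that region's transfer function almost exactly.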
|
|
|
Miguel Oliveira, L. Seabra Lopes, G. Hyun Lim, S. Hamidreza Kasaei, Angel Sappa and A. Tom. 2015. Concurrent Learning of Visual Codebooks and Object Categories in Open-ended Domains. International Conference on Intelligent Robots and Systems, 2488–2495.
Abstract: In open-ended domains, robots must continuously learn new object categories. When the training sets are created offline, it is not possible to ensure their representativeness with respect to the object categories and features the system will find when operating online. In the Bag of Words model, visual codebooks are constructed from training sets created offline. This might lead to non-discriminative visual words and, as a consequence, to poor recognition performance. This paper proposes a visual object recognition system which concurrently learns, in an incremental and online fashion, both the visual object category representations and the codebook words used to encode them. The codebook is defined using Gaussian Mixture Models which are updated using new object views. The approach has similarities with the human visual object recognition system: evidence suggests that the development of recognition capabilities occurs on multiple levels and is sustained over long periods of time. Results show that the proposed system with concurrent learning of object categories and codebooks is capable of learning more categories, requiring fewer examples, and with similar accuracies, when compared to the classical Bag of Words approach using offline constructed codebooks.
Keywords: Visual Learning; Computer Vision; Autonomous Agents
|
|
|
Miguel Oliveira, V. Santos and Angel Sappa. 2012. Short term path planning using a multiple hypothesis evaluation approach for an autonomous driving competition. IEEE 4th Workshop on Planning, Perception and Navigation for Intelligent Vehicles.
|
|
|
Miguel Oliveira, Victor Santos, Angel Sappa and P. Dias. 2015. Scene Representations for Autonomous Driving: an approach based on polygonal primitives. 2nd Iberian Robotics Conference ROBOT2015, 503–515.
Abstract: In this paper, we present a novel methodology to compute a 3D scene representation. The algorithm uses macro scale polygonal primitives to model the scene. This means that the representation of the scene is given as a list of large scale polygons that describe the geometric structure of the environment. Results show that the approach is capable of producing accurate descriptions of the scene. In addition, the algorithm is very efficient when compared to other techniques.
Keywords: Scene reconstruction; Point cloud; Autonomous vehicles
|
|
|
Mohammad Rouhani and Angel Sappa. 2009. A Novel Approach to Geometric Fitting of Implicit Quadrics. 8th International Conference on Advanced Concepts for Intelligent Vision Systems. Springer Berlin Heidelberg, 121–132. (LNCS.)
Abstract: This paper presents a novel approach for estimating the geometric distance from a given point to the corresponding implicit quadric curve/surface. The proposed estimation is based on the height of a tetrahedron, which is used as a coarse but reliable estimation of the real distance. The estimated distance is then used for finding the best set of quadric parameters by means of the Levenberg-Marquardt algorithm, which is a common framework in other geometric fitting approaches. Comparisons of the proposed approach with previous ones are provided to show improvements both in CPU time and in the accuracy of the obtained results.
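The role of Levenberg-Marquardt in geometric fitting can be sketched on the simplest quadric, a circle. This sketch swaps in the exact point-to-circle distance as the residual, because the tetrahedron-based estimate is not specified in the abstract. The helper names, damping schedule, and starting guess are all assumptions made for illustration.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def residuals(params, pts):
    """Signed geometric distance from each point to the circle (a, b, r)."""
    a, b, r = params
    return [math.hypot(x - a, y - b) - r for x, y in pts]

def jacobian(params, pts):
    """Partial derivatives of each residual with respect to a, b and r."""
    a, b, _ = params
    rows = []
    for x, y in pts:
        d = math.hypot(x - a, y - b) or 1e-12  # guard against a point at the center
        rows.append([-(x - a) / d, -(y - b) / d, -1.0])
    return rows

def fit_circle_lm(pts, init, iters=100, lam=1e-3):
    """Minimize the sum of squared distances with damped Gauss-Newton steps."""
    params = list(init)
    cost = sum(e * e for e in residuals(params, pts))
    for _ in range(iters):
        res, J = residuals(params, pts), jacobian(params, pts)
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(3)] for i in range(3)]
        g = [sum(row[i] * e for row, e in zip(J, res)) for i in range(3)]
        for i in range(3):
            JtJ[i][i] *= 1.0 + lam  # Marquardt-style diagonal damping
        step = solve3(JtJ, [-gi for gi in g])
        if max(abs(s) for s in step) < 1e-12:
            break  # converged: the update no longer changes the parameters
        trial = [p + s for p, s in zip(params, step)]
        trial_cost = sum(e * e for e in residuals(trial, pts))
        if trial_cost < cost:
            params, cost, lam = trial, trial_cost, lam * 0.1  # accept, trust more
        else:
            lam *= 10.0  # reject, damp harder
    return params

# Noise-free samples of a circle centered at (2, -1) with radius 3.
points = [(2 + 3 * math.cos(k * math.pi / 4), -1 + 3 * math.sin(k * math.pi / 4))
          for k in range(8)]
a, b, r = fit_circle_lm(points, init=(0.0, 0.0, 1.0))
```

Replacing `residuals` and `jacobian` with a cheaper distance estimate is where an approach like the one in the abstract would plug in. The optimization loop itself stays the same.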
|
|