|
Ferran Diego, Daniel Ponsa, Joan Serrat, & Antonio Lopez. (2010). Vehicle geolocalization based on video synchronization. In 13th Annual International Conference on Intelligent Transportation Systems (1511–1516).
Abstract: TC8.6
This paper proposes a novel method for estimating the geospatial localization of a vehicle. It uses as input a georeferenced video sequence recorded by a forward-facing camera attached to the windscreen. The core of the proposed method is an on-line video synchronization which finds, in the georeferenced video sequence, the frame corresponding to the one currently recorded by the camera on a second drive along the same track. Once the corresponding frame in the georeferenced video sequence is found, we transfer its geospatial information to the current frame. The key advantages of this method are: 1) the increase of the update rate and the geospatial accuracy with regard to a standard low-cost GPS and 2) the ability to localize a vehicle even when a GPS is not available or is not reliable enough, as in certain urban areas. Experimental results for an urban environment are presented, showing an average relative accuracy of 1.5 meters.
Keywords: video alignment
|
|
|
Ferran Diego, Jose Manuel Alvarez, Joan Serrat, & Antonio Lopez. (2010). Vision-based road detection via on-line video registration. In 13th Annual International Conference on Intelligent Transportation Systems (1135–1140).
Abstract: TB6.2
Road segmentation is an essential functionality for supporting advanced driver assistance systems (ADAS) such as road following and vehicle and pedestrian detection. Significant efforts have been made to solve this task using vision-based techniques. The major challenge is to deal with lighting variations and the presence of objects on the road surface. In this paper, we propose a new road detection method to infer the areas of the image depicting road surfaces without performing any image segmentation. The idea is to first segment, manually or semi-automatically, the road region in a traffic-free reference video recorded on a first drive, and then to transfer these regions, in an on-line manner, to the frames of a second video sequence acquired later on a second drive along the same road. This is possible because we are able to automatically align the two videos in time and space, that is, to synchronize them and warp each frame of the first video to its corresponding frame in the second one. The geometric transform can thus transfer the road region to the present frame on-line. To reduce the effect of the varying lighting conditions present in outdoor scenarios, our approach incorporates a shadowless feature space which represents an image in an illuminant-invariant feature space. Furthermore, we propose a dynamic background subtraction algorithm which removes the regions containing vehicles in the observed frames that lie within the transferred road region.
Keywords: video alignment; road detection
|
|
|
Sergio Vera, Miguel Angel Gonzalez Ballester, & Debora Gil. (2012). A medial map capturing the essential geometry of organs. In ISBI Workshop on Open Source Medical Image Analysis software (1691–1694). IEEE.
Abstract: Medial representations are powerful tools for describing and parameterizing the volumetric shape of anatomical structures. Accurate computation of one-pixel-wide medial surfaces is mandatory. Those surfaces must represent faithfully the geometry of the volume. Although morphological methods produce excellent results in 2D, their complexity and quality drops across dimensions, due to a more complex description of pixel neighborhoods. This paper introduces a continuous operator for accurate and efficient computation of medial structures of arbitrary dimension. Our experiments show its higher performance for medical imaging applications in terms of simplicity of medial structures and capability for reconstructing the anatomical volume.
Keywords: Medial Surface Representation, Volume Reconstruction, Geometry, Image reconstruction, Liver, Manifolds, Shape, Surface morphology, Surface reconstruction
|
|
|
Sergio Escalera, Oriol Pujol, Eric Laciar, Jordi Vitria, Esther Pueyo, & Petia Radeva. (2008). Coronary Damage Classification of Patients with the Chagas Disease with Error-Correcting Output Codes. In Intelligent Systems, 4th International IEEE Conference, 6–8 September 2008. (Vol. 2, 12–17).
Abstract: Chagas' disease is endemic in all Latin America, affecting millions of people in the continent. In order to diagnose and treat Chagas' disease, it is important to detect and measure the coronary damage of the patient. In this paper, we analyze and categorize patients into different groups based on the coronary damage produced by the disease. Based on the features of the heart cycle extracted using high resolution ECG, a multi-class scheme of error-correcting output codes (ECOC) is formulated and successfully applied. The results show that the proposed scheme obtains significant performance improvements compared to previous works and state-of-the-art ECOC designs.
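Note: as a generic illustration of how an ECOC scheme assigns a class, the following minimal sketch decodes the outputs of several binary classifiers by picking the codeword at the smallest Hamming distance. The code matrix, the group labels, and the function names are hypothetical and are not taken from the paper, whose ECOC design and ECG features are not reproduced here.

```python
# Generic ECOC decoding sketch (hypothetical code matrix, not the paper's design).
# Each class has a codeword of +1/-1 entries, one entry per binary classifier.

def hamming(a, b):
    """Number of positions in which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(codebook, outputs):
    """Return the label whose codeword is closest to the classifier outputs."""
    return min(codebook, key=lambda label: hamming(codebook[label], outputs))

# Hypothetical 4-class, 6-classifier code with pairwise Hamming distance 4,
# so a single wrong binary decision can still be corrected.
CODEBOOK = {
    "group_A": (+1, +1, +1, -1, -1, -1),
    "group_B": (+1, -1, -1, +1, +1, -1),
    "group_C": (-1, +1, -1, +1, -1, +1),
    "group_D": (-1, -1, +1, -1, +1, +1),
}

# The codeword of group_B with its third bit flipped by a classifier error.
noisy_outputs = (+1, -1, +1, +1, +1, -1)
predicted = ecoc_decode(CODEBOOK, noisy_outputs)
```

Because every pair of codewords in this toy matrix differs in four positions, the single flipped bit still decodes to `group_B`.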
|
|
|
Kamal Nasrollahi, Sergio Escalera, P. Rasti, Gholamreza Anbarjafari, Xavier Baro, Hugo Jair Escalante, et al. (2015). Deep Learning based Super-Resolution for Improved Action Recognition. In 5th International Conference on Image Processing Theory, Tools and Applications IPTA2015 (pp. 67–72).
Abstract: Action recognition systems mostly work with videos of proper quality and resolution. Even the most challenging benchmark databases for action recognition hardly include low-resolution videos from, e.g., surveillance cameras. In videos recorded by such cameras, due to the distance between people and cameras, people are pictured very small and hence challenge action recognition algorithms. Simple upsampling methods, like bicubic interpolation, cannot retrieve all the detailed information that can help the recognition. To deal with this problem, in this paper we combine results of bicubic interpolation with results of a state-of-the-art deep learning-based super-resolution algorithm, through an alpha-blending approach. The experimental results obtained on a down-sampled version of a large subset of the Hollywood2 benchmark database show the importance of the proposed system in increasing the recognition rate of a state-of-the-art action recognition system for handling low-resolution videos.
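Note: the alpha-blending step mentioned in the abstract can be sketched generically as a per-pixel convex combination of two upsampled images. The function name, the nested-list image representation, and the value of `alpha` below are assumptions made for illustration; the abstract does not state the weight or the implementation used by the authors.

```python
# Generic alpha-blending sketch (illustrative; weight and data layout are assumed).

def alpha_blend(img_a, img_b, alpha):
    """Blend two equally sized grayscale images given as lists of rows.

    Each output pixel is alpha * a + (1 - alpha) * b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same shape")
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]

# Hypothetical 1x2 images standing in for the bicubic and super-resolved results.
bicubic = [[0.0, 100.0]]
super_resolved = [[200.0, 100.0]]
blended = alpha_blend(bicubic, super_resolved, alpha=0.25)
```

With `alpha=0.25` the result leans toward the second image; pixels on which both inputs agree are left unchanged.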
|
|
|
Joan Arnedo-Moreno, D. Bañeres, Xavier Baro, S. Caballe, S. Guerrero, L. Porta, et al. (2014). ValID: A trust-based virtual assessment system. In 6th International Conference on Intelligent Networking and Collaborative Systems (pp. 328–335).
Abstract: Even though online education is a very important pillar of lifelong education, institutions are still reluctant to commit to a fully online educational model. In the end, they keep relying on on-site assessment systems, mainly because fully virtual alternatives do not have the deserved social recognition or credibility. Thus, the design of virtual assessment systems that are able to provide effective proof of student authenticity and authorship and the integrity of the activities in a scalable and cost-efficient manner would be very helpful. This paper presents ValID, a virtual assessment approach based on a continuous trust level evaluation between students and the institution. The current trust level serves as the main mechanism to dynamically decide which kind of controls a given student should be subjected to, across different courses in a degree. The main goal is to provide a fair trade-off between security, scalability and cost, while maintaining the perceived quality of the educational model.
|
|
|
Hugo Jair Escalante, Isabelle Guyon, Sergio Escalera, Julio C. S. Jacques Junior, Xavier Baro, Evelyne Viegas, et al. (2017). Design of an Explainable Machine Learning Challenge for Video Interviews. In International Joint Conference on Neural Networks.
Abstract: This paper reviews and discusses research advances on “explainable machine learning” in computer vision. We focus on a particular area of the “Looking at People” (LAP) thematic domain: first impressions and personality analysis. Our aim is to make the computational intelligence and computer vision communities aware of the importance of developing explanatory mechanisms for computer-assisted decision making applications, such as automating recruitment. Judgments based on personality traits are being made routinely by human resource departments to evaluate the candidates' capacity of social insertion and their potential of career growth. However, inferring personality traits and, in general, the process by which we humans form a first impression of people, is highly subjective and may be biased. Previous studies have demonstrated that learning machines can learn to mimic human decisions. In this paper, we go one step further and formulate the problem of explaining the decisions of the models as a means of identifying what visual aspects are important, understanding how they relate to decisions suggested, and possibly gaining insight into undesirable negative biases. We design a new challenge on explainability of learning machines for first impressions analysis. We describe the setting, scenario, evaluation metrics and preliminary outcomes of the competition. To the best of our knowledge, this is the first effort in terms of challenges for explainability in computer vision. In addition, our challenge design comprises several other quantitative and qualitative elements of novelty, including a “coopetition” setting, which combines competition and collaboration.
|
|
|
M. Ivasic-Kos, M. Pobar, & Jordi Gonzalez. (2019). Active Player Detection in Handball Videos Using Optical Flow and STIPs Based Measures. In 13th International Conference on Signal Processing and Communication Systems.
Abstract: In handball videos recorded during training, multiple players are present in the scene at the same time. Although they all might move and interact, not all players contribute to the currently relevant exercise or practice the given handball techniques. The goal of this experiment is to automatically determine the players in training footage who perform the given handball techniques and are therefore considered active. It is a very challenging task for which a precise object detector is needed that can handle cluttered scenes with poor illumination, with many players present at different sizes and distances from the camera, partially occluded and moving fast. To determine which of the detected players are active, additional information is needed about the level of player activity. Since many handball actions are characterized by considerable changes in speed, position, and variations in the player's appearance, we propose using spatio-temporal interest points (STIPs) and optical flow (OF). Therefore, we propose an active player detection method combining the YOLO object detector and two activity measures based on STIPs and OF. The performance of the proposed method and activity measures is evaluated on a custom handball video dataset acquired during handball training lessons.
|
|
|
Francisco Jose Perales, Juan J. Villanueva, & Yuhua Luo. (1991). An automatic two-camera human motion perception system based on biomechanical model matching. In IEEE International Conference on Systems, Man and Cybernetics (Vol. 2, pp. 856–858).
|
|
|
Felipe Codevilla, Matthias Muller, Antonio Lopez, Vladlen Koltun, & Alexey Dosovitskiy. (2018). End-to-end Driving via Conditional Imitation Learning. In IEEE International Conference on Robotics and Automation (pp. 4693–4700).
Abstract: Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands.
|
|
|
German Ros, J. Guerrero, Angel Sappa, & Antonio Lopez. (2013). VSLAM pose initialization via Lie groups and Lie algebras optimization. In Proceedings of IEEE International Conference on Robotics and Automation (pp. 5740–5747).
Abstract: We present a novel technique for estimating initial 3D poses in the context of localization and Visual SLAM problems. The presented approach can deal with noise, outliers and a large amount of input data and still performs in real time on a standard CPU. Our method produces solutions with an accuracy comparable to those produced by RANSAC but can be much faster when the percentage of outliers is high or for large amounts of input data. In the current work we propose to formulate the pose estimation as an optimization problem on Lie groups, considering their manifold structure as well as their associated Lie algebras. This allows us to perform a fast and simple optimization while preserving all the constraints imposed by the Lie group SE(3). Additionally, we present several key design concepts related to the cost function and its Jacobian, aspects that are critical for the good performance of the algorithm.
Keywords: SLAM
|
|
|
Carlos Boned Riera, & Oriol Ramos Terrades. (2022). Discriminative Neural Variational Model for Unbalanced Classification Tasks in Knowledge Graph. In 26th International Conference on Pattern Recognition (pp. 2186–2191).
Abstract: Nowadays the paradigm of link discovery problems has shown significant improvements on Knowledge Graphs. However, method performance is harmed by the unbalanced nature of this classification problem, since many methods are easily biased toward not finding proper links. In this paper we present a discriminative neural variational auto-encoder model, called DNVAE from now on, in which we have introduced latent variables to serve as embedding vectors. As a result, the learnt generative model better approximates the underlying distribution and, at the same time, better differentiates the types of relations in the knowledge graph. We have evaluated this approach on a benchmark knowledge graph and on Census records. Results on this last data set are quite impressive, since we reach the highest possible score in the evaluation metrics. However, further experiments are still needed to evaluate the performance of the method more deeply on more challenging tasks.
Keywords: Measurement; Couplings; Semantics; Ear; Benchmark testing; Data models; Pattern recognition
|
|
|
Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornes, Josep Llados, et al. (2022). DocEnTr: An End-to-End Document Image Enhancement Transformer. In 26th International Conference on Pattern Recognition (pp. 1699–1705).
Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. The experiments conducted show the superiority of the proposed model over state-of-the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
Keywords: Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
|
|
|
Mohamed Ali Souibgui, Alicia Fornes, Y. Kessentini, & C. Tudor. (2021). A Few-shot Learning Approach for Historical Encoded Manuscript Recognition. In 25th International Conference on Pattern Recognition (pp. 5413–5420).
Abstract: Encoded (or ciphered) manuscripts are a special type of historical document that contains encrypted text. The automatic recognition of this kind of document is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpora for training and 3) touching symbols make the symbol segmentation difficult and complex. To overcome these difficulties, we propose a novel method for handwritten cipher recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if a few labeled pages with the same alphabet are used for fine-tuning, our method surpasses existing unsupervised and supervised HTR methods for cipher recognition.
|
|
|
Xialei Liu, Marc Masana, Luis Herranz, Joost Van de Weijer, Antonio Lopez, & Andrew Bagdanov. (2018). Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting. In 24th International Conference on Pattern Recognition (pp. 2262–2268).
Abstract: In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and Stanford-40 datasets demonstrate that we significantly improve the results of standard elastic weight consolidation, and that we obtain competitive results when compared to the state-of-the-art in lifelong learning without forgetting.
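Note: for readers unfamiliar with Elastic Weight Consolidation, the sketch below computes the standard EWC quadratic penalty under the diagonal Fisher assumption mentioned in the abstract. It does not implement the rotation of parameter space proposed in the paper. The function name, the regularization weight `lam`, and the numeric values are hypothetical.

```python
# Standard EWC penalty with a diagonal Fisher approximation (illustrative sketch;
# the paper's parameter-space rotation is NOT implemented here).

def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """Return (lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2.

    theta       -- current parameters while learning the new task
    theta_star  -- parameters learned on the previous task
    fisher_diag -- diagonal Fisher information estimates, one per parameter
    lam         -- regularization strength (assumed hyperparameter)
    """
    if not (len(theta) == len(theta_star) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for f, t, ts in zip(fisher_diag, theta, theta_star)
    )

# Hypothetical values: the first parameter matters more to the old task
# (larger Fisher entry), so moving it is penalized more per unit of change.
penalty = ewc_penalty(
    theta=[1.0, 2.0],
    theta_star=[0.0, 0.0],
    fisher_diag=[2.0, 0.5],
    lam=1.0,
)
```

This penalty is added to the new task's loss, so parameters with large Fisher entries stay close to their previous values.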
|
|