|
Andreas Fischer, Volkmar Frinken, Horst Bunke, & Ching Y. Suen. (2013). Improving HMM-Based Keyword Spotting with Character Language Models. In 12th International Conference on Document Analysis and Recognition (pp. 506–510).
Abstract: Facing high error rates and slow recognition speed for full text transcription of unconstrained handwriting images, keyword spotting is a promising alternative to locate specific search terms within scanned document images. We have previously proposed a learning-based method for keyword spotting using character hidden Markov models that showed a high performance when compared with traditional template image matching. In the lexicon-free approach pursued, only the text appearance was taken into account for recognition. In this paper, we integrate character n-gram language models into the spotting system in order to provide an additional language context. On the modern IAM database as well as the historical George Washington database, we demonstrate that character language models significantly improve the spotting performance.
|
|
|
Andreas Møgelmose, Chris Bahnsen, Thomas B. Moeslund, Albert Clapes, & Sergio Escalera. (2013). Tri-modal Person Re-identification with RGB, Depth and Thermal Features. In 9th IEEE Workshop on Perception Beyond the Visible Spectrum, Computer Vision and Pattern Recognition (pp. 301–307).
Abstract: Person re-identification is about recognizing people who have passed by a sensor earlier. Previous work is mainly based on RGB data, but in this work we for the first time present a system where we combine RGB, depth, and thermal data for re-identification purposes. First, from each of the three modalities, we obtain some particular features: from RGB data, we model color information from different regions of the body, from depth data, we compute different soft body biometrics, and from thermal data, we extract local structural information. Then, the three information types are combined in a joined classifier. The tri-modal system is evaluated on a new RGB-D-T dataset, showing successful results in re-identification scenarios.
|
|
|
Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, & Jianfei Cai. (2017). WordFences: Text Localization and Recognition. In 24th International Conference on Image Processing.
|
|
|
Andres Mafla, Rafael S. Rezende, Lluis Gomez, Diana Larlus, & Dimosthenis Karatzas. (2021). StacMR: Scene-Text Aware Cross-Modal Retrieval. In IEEE Winter Conference on Applications of Computer Vision (pp. 2219–2229).
|
|
|
Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, & Dimosthenis Karatzas. (2021). Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval. In IEEE Winter Conference on Applications of Computer Vision (pp. 4022–4032).
|
|
|
Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, & Dimosthenis Karatzas. (2020). Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features. In IEEE Winter Conference on Applications of Computer Vision.
Abstract: Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval.
|
|
|
Andres Traumann, Sergio Escalera, & Gholamreza Anbarjafari. (2015). A New Retexturing Method for Virtual Fitting Room Using Kinect 2 Camera. In 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 75–79).
|
|
|
Andrew Nolan, Daniel Serrano, Aura Hernandez-Sabate, Daniel Ponsa, & Antonio Lopez. (2013). Obstacle mapping module for quadrotors on outdoor Search and Rescue operations. In International Micro Air Vehicle Conference and Flight Competition.
Abstract: Obstacle avoidance remains a challenging task for Micro Aerial Vehicles (MAV), due to their limited payload capacity to carry advanced sensors. Unlike larger vehicles, MAV can only carry light weight sensors, for instance a camera, which is our main assumption in this work. We explore passive monocular depth estimation and propose a novel method Position Aided Depth Estimation (PADE). We analyse PADE performance and compare it against the extensively used Time To Collision (TTC). We evaluate the accuracy, robustness to noise and speed of three Optical Flow (OF) techniques, combined with both depth estimation methods. Our results show PADE is more accurate than TTC at depths between 0-12 meters and is less sensitive to noise. Our findings highlight the potential application of PADE for MAV to perform safe autonomous navigation in unknown and unstructured environments.
Keywords: UAV
|
|
|
Aneesh Rangnekar, Zachary Mulhollan, Anthony Vodacek, Matthew Hoffman, Angel Sappa, Erik Blasch, et al. (2022). Semi-Supervised Hyperspectral Object Detection Challenge Results – PBVS 2022. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 390–398).
Abstract: This paper summarizes the top contributions to the first semi-supervised hyperspectral object detection (SSHOD) challenge, which was organized as a part of the Perception Beyond the Visible Spectrum (PBVS) 2022 workshop at the Computer Vision and Pattern Recognition (CVPR) conference. The SSHODC challenge is a first-of-its-kind hyperspectral dataset with temporally contiguous frames collected from a university rooftop observing a 4-way vehicle intersection over a period of three days. The dataset contains a total of 2890 frames, captured at an average resolution of 1600 × 192 pixels, with 51 hyperspectral bands from 400nm to 900nm. SSHOD challenge uses 989 images as the training set, 605 images as validation set and 1296 images as the evaluation (test) set. Each set was acquired on a different day to maximize the variance in weather conditions. Labels are provided for 10% of the annotated data, hence formulating a semi-supervised learning task for the participants which is evaluated in terms of average precision over the entire set of classes, as well as individual moving object classes: namely vehicle, bus and bike. The challenge received participation registration from 38 individuals, with 8 participating in the validation phase and 3 participating in the test phase. This paper describes the dataset acquisition, with challenge formulation, proposed methods and qualitative and quantitative results.
Keywords: Training; Computer vision; Conferences; Training data; Object detection; Semisupervised learning; Transformers
|
|
|
Angel Morera, Angel Sanchez, Angel Sappa, & Jose F. Velez. (2019). Robust Detection of Outdoor Urban Advertising Panels in Static Images. In 18th International Conference on Practical Applications of Agents and Multi-Agent Systems (pp. 246–256).
Abstract: One interesting publicity application for Smart City environments is recognizing brand information contained in urban advertising panels. For such a purpose, a previous stage is to accurately detect and locate the position of these panels in images. This work presents an effective solution to this problem using a Single Shot Detector (SSD) based on a deep neural network architecture that minimizes the number of false detections under multiple variable conditions regarding the panels and the scene. Achieved experimental results using the Intersection over Union (IoU) accuracy metric make this proposal applicable in real complex urban images.
Keywords: Object detection; Urban ads panels; Deep learning; Single Shot Detector (SSD) architecture; Intersection over Union (IoU) metric; Augmented Reality
|
|
|
Angel Sappa, & Boris X. Vintimilla. (2006). Edge Point Linking by Means of Global and Local Schemes. In IEEE Int. Conf. on Signal-Image Technology and Internet-Based Systems, Hammamet, Tunisia, December 2006, pp. 551–560.
|
|
|
Angel Sappa, David Geronimo, Fadi Dornaika, & Antonio Lopez. (2006). Real Time Vehicle Pose Using On-Board Stereo Vision System. In International Conference on Image Analysis and Recognition (pp. 205–216).
Abstract: This paper presents a robust technique for real time estimation of both the camera’s position and orientation, referred to as pose. A commercial stereo vision system is used. Unlike previous approaches, it can be used either for urban or highway scenarios. The proposed technique consists of two stages. Initially, a compact 2D representation of the original 3D data points is computed. Then, a RANSAC based least squares approach is used for fitting a plane to the road. At the same time, the camera’s relative position and orientation are computed. The proposed technique is intended to be used on a driving assistance scheme for applications such as obstacle or pedestrian detection. Experimental results on urban environments with different road geometries are presented.
|
|
|
Angel Sappa, Fadi Dornaika, David Geronimo, & Antonio Lopez. (2008). Registration-based Moving Object Detection from a Moving Camera. In IROS2008 2nd Workshop on Perception, Planning and Navigation for Intelligent Vehicles (pp. 65–69).
Abstract: This paper presents a robust approach for detecting moving objects from on-board stereo vision systems. It relies on a feature point quaternion-based registration, which avoids common problems that appear when computationally expensive iterative-based algorithms are used on dynamic environments. The proposed approach consists of three stages. Initially, feature points are extracted and tracked through consecutive frames. Then, a RANSAC based approach is used for registering two 3D point sets with known correspondences by means of the quaternion method. Finally, the computed 3D rigid displacement is used to map two consecutive frames into the same coordinate system. Moving objects correspond to those areas with large registration errors. Experimental results, in different scenarios, show the viability of the proposed approach.
|
|
|
Angel Sappa, Fadi Dornaika, David Geronimo, & Antonio Lopez. (2007). Efficient On-Board Stereo Vision Pose Estimation. In Computer Aided Systems Theory, Selected paper from (Vol. 4739, pp. 1183–1190). LNCS.
Abstract: This paper presents an efficient technique for real time estimation of on-board stereo vision system pose. The whole process is performed in the Euclidean space and consists of two stages. Initially, a compact representation of the original 3D data points is computed. Then, a RANSAC based least squares approach is used for fitting a plane to the 3D road points. Fast RANSAC fitting is obtained by selecting points according to a probability distribution function that takes into account the density of points at a given depth. Finally, stereo camera position and orientation (pose) is computed relative to the road plane. The proposed technique is intended to be used on driver assistance systems for applications such as obstacle or pedestrian detection. A real time performance is reached. Experimental results on several environments and comparisons with a previous work are presented.
|
|
|
Angel Sappa, & M.A. Garcia. (2004). Hierarchical Clustering of 3D Objects and its Application to Minimum Distance Computation. In IEEE International Conference on Robotics & Automation, 5287–5292, New Orleans, LA (USA), ISBN: 0–7803–8232–3.
|
|