Dimosthenis Karatzas. (2008). Detecting Gradients in Text Images Using the Hough Transform. In Proceedings of the 8th IAPR International Workshop on Document Analysis Systems (pp. 245–252).
|
Alicia Fornes, Josep Llados, Gemma Sanchez, & Horst Bunke. (2008). Writer Identification in Old Handwritten Music Scores. In Proceedings of the 8th IAPR International Workshop on Document Analysis Systems (pp. 347–353).
|
Antonio Lopez, Cristina Cañero, Joan Serrat, J. Saludes, Felipe Lumbreras, & T. Graf. (2005). Detection of lane markings based on ridgeness and RANSAC.
|
Daniel Ponsa, Antonio Lopez, Joan Serrat, Felipe Lumbreras, & T. Graf. (2005). Multiple Vehicle 3D Tracking Using an Unscented Kalman Filter.
Keywords: vehicle detection
|
Daniel Ponsa, Antonio Lopez, Felipe Lumbreras, Joan Serrat, & T. Graf. (2005). 3D Vehicle Sensor based on Monocular Vision.
|
Mirko Arnold, Stephan Ameling, Anarta Ghosh, & Gerard Lacey. (2011). Quality Improvement of Endoscopy Videos. In Proceedings of the 8th IASTED International Conference on Biomedical Engineering (Vol. 723).
|
Partha Pratim Roy, Umapada Pal, & Josep Llados. (2008). Multi-oriented English Text Line Extraction using Background and Foreground Information. In Proceedings of the 8th IAPR International Workshop on Document Analysis Systems (pp. 315–322).
|
Marçal Rusiñol, & Josep Llados. (2008). Word and Symbol Spotting using Spatial Organization of Local Descriptors. In Proceedings of the 8th IAPR International Workshop on Document Analysis Systems (pp. 489–496).
|
T.O. Nguyen, Salvatore Tabbone, & Oriol Ramos Terrades. (2008). Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval. In Proceedings of the 8th IAPR International Workshop on Document Analysis Systems (pp. 191–197).
|
Christian Keilstrup Ingwersen, Artur Xarles, Albert Clapes, Meysam Madadi, Janus Nortoft Jensen, Morten Rieger Hannemose, et al. (2023). Video-based Skill Assessment for Golf: Estimating Golf Handicap. In Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports (pp. 31–39).
Abstract: Automated skill assessment in sports using video-based analysis holds great potential for revolutionizing coaching methodologies. This paper focuses on the problem of skill determination in golfers by leveraging deep learning models applied to a large database of video recordings of golf swings. We investigate different regression-, ranking-, and classification-based methods and compare them to a simple baseline approach. Performance is evaluated using the mean squared error (MSE) as well as the percentage of correctly ranked pairs based on the Kendall correlation. Our results demonstrate an improvement over the baseline, with a 35% lower mean squared error and 68% correctly ranked pairs. However, achieving fine-grained skill assessment remains challenging. This work contributes to the development of AI-driven coaching systems and advances the understanding of video-based skill determination in the context of golf.
|
Artur Xarles, Sergio Escalera, Thomas B. Moeslund, & Albert Clapes. (2023). ASTRA: An Action Spotting TRAnsformer for Soccer Videos. In Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports (pp. 93–102).
Abstract: In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.
|
Francisco Javier Orozco, F.A. Garcia, J.L. Arcos, & Jordi Gonzalez. (2007). Spatio-Temporal Reasoning for Reliable Facial Expression Interpretation. In Proceedings of the 5th International Conference on Computer Vision Systems.
|
David Geronimo, Angel Sappa, Antonio Lopez, & Daniel Ponsa. (2007). Adaptive Image Sampling and Windows Classification for On-board Pedestrian Detection. In Proceedings of the 5th International Conference on Computer Vision Systems.
Abstract: On-board pedestrian detection is at the frontier of the state of the art, since it implies processing outdoor scenarios from a mobile platform and searching for aspect-changing objects in cluttered urban environments. The most promising approaches include classifiers based on feature selection and machine learning. However, they use a large number of features, which compromises real-time performance. Thus, methods for running the classifiers on only a few image windows must be provided. In this paper we contribute to both aspects, proposing a camera pose estimation method for adaptive sparse image sampling, as well as a classifier for pedestrian detection based on Haar wavelets and edge orientation histograms as features and AdaBoost as the learning machine. Both proposals are compared with relevant approaches in the literature, showing comparable results while reducing processing time by a factor of four for the sampling task and by a factor of ten for classification.
Keywords: Pedestrian Detection
|
Maya Dimitrova, N. Kushmerick, Petia Radeva, & Juan J. Villanueva. (2003). User Assessment of a Visual Genre Classifier.
|
Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornes, Yousri Kessentini, et al. (2023). Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (Vol. 37).
Abstract: In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition (handwritten or scene text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training, without using labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit the limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state of the art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR
Keywords: Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning
|