|
Records |
Links |
|
Author |
Alicia Fornes; Volkmar Frinken; Andreas Fischer; Jon Almazan; G. Jackson; Horst Bunke |
|
|
Title |
A Keyword Spotting Approach Using Blurred Shape Model-Based Descriptors |
Type |
Conference Article |
|
Year |
2011 |
Publication |
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
83-90 |
|
|
Keywords |
|
|
|
Abstract |
The automatic processing of handwritten historical documents is considered a hard problem in pattern recognition. In addition to the challenges given by modern handwritten data, a lack of training data as well as effects caused by the degradation of documents can be observed. In this scenario, keyword spotting arises to be a viable solution to make documents amenable for searching and browsing. For this task we propose the adaptation of shape descriptors used in symbol recognition. By treating each word image as a shape, it can be represented using the Blurred Shape Model and the Deformable Blurred Shape Model. Experiments on the George Washington database demonstrate that this approach is able to outperform the commonly used Dynamic Time Warping approach. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
ACM |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4503-0916-5 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
HIP |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ FFF2011a |
Serial |
1823 |
|
Permanent link to this record |
|
|
|
|
Author |
Andreas Fischer; Volkmar Frinken; Alicia Fornes; Horst Bunke |
|
|
Title |
Transcription Alignment of Latin Manuscripts Using Hidden Markov Models |
Type |
Conference Article |
|
Year |
2011 |
Publication |
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
29-36 |
|
|
Keywords |
|
|
|
Abstract |
Transcriptions of historical documents are a valuable source for extracting labeled handwriting images that can be used for training recognition systems. In this paper, we introduce the Saint Gall database that includes images as well as the transcription of a Latin manuscript from the 9th century written in Carolingian script. Although the available transcription is of high quality for a human reader, the spelling of the words is not accurate when compared with the handwriting image. Hence, the transcription poses several challenges for alignment regarding, e.g., line breaks, abbreviations, and capitalization. We propose an alignment system based on character Hidden Markov Models that can cope with these challenges and efficiently aligns complete document pages. On the Saint Gall database, we demonstrate that a considerable alignment accuracy can be achieved, even with weakly trained character models. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
ACM |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
HIP |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ FFF2011b |
Serial |
1824 |
|
Permanent link to this record |
|
|
|
|
Author |
Cristina Palmero; Oleg V Komogortsev; Sergio Escalera; Sachin S Talathi |
|
|
Title |
Multi-Rate Sensor Fusion for Unconstrained Near-Eye Gaze Estimation |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 2023 Symposium on Eye Tracking Research and Applications |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1-8 |
|
|
Keywords |
|
|
|
Abstract |
The power requirements of video-oculography systems can be prohibitive for high-speed operation on portable devices. Recently, low-power alternatives such as photosensors have been evaluated, providing gaze estimates at high frequency with a trade-off in accuracy and robustness. Potentially, an approach combining slow/high-fidelity and fast/low-fidelity sensors should be able to exploit their complementarity to track fast eye motion accurately and robustly. To foster research on this topic, we introduce OpenSFEDS, a near-eye gaze estimation dataset containing approximately 2M synthetic camera-photosensor image pairs sampled at 500 Hz under varied appearance and camera position. We also formulate the task of sensor fusion for gaze estimation, proposing a deep learning framework consisting in appearance-based encoding and temporal eye-state dynamics. We evaluate several single- and multi-rate fusion baselines on OpenSFEDS, achieving 8.7% error decrease when tracking fast eye movements with a multi-rate approach vs. a gaze forecasting approach operating with a low-speed sensor alone. |
|
|
Address |
Tübingen; Germany; May 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ETRA |
|
|
Notes |
HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ PKE2023 |
Serial |
3923 |
|
Permanent link to this record |
|
|
|
|
Author |
Debora Gil; Jordi Gonzalez; Gemma Sanchez (eds) |
|
|
Title |
Computer Vision: Advances in Research and Development |
Type |
Book Whole |
|
Year |
2007 |
Publication |
Proceedings of the 2nd CVC International Workshop |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
UAB |
Place of Publication |
Bellaterra (Spain) |
Editor |
Debora Gil; Jordi Gonzalez; Gemma Sanchez |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
2 |
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-935251-4-9 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM; ISE; DAG |
Approved |
no |
|
|
Call Number |
IAM @ iam @ GGS2007 |
Serial |
1493 |
|
Permanent link to this record |
|
|
|
|
Author |
J. Filipe; Juan Andrade; J.L. Ferrier |
|
|
Title |
FAF 2005 |
Type |
Miscellaneous |
|
Year |
2005 |
Publication |
Proceedings of the 2nd International Conference on Informatics in Control, Automation and Robotics, INSTICC Press |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Barcelona (Spain) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
|
Approved |
no |
|
|
Call Number |
Admin @ si @ FAF2005 |
Serial |
609 |
|
Permanent link to this record |
|
|
|
|
Author |
A. Pujol; Jordi Vitria; Petia Radeva; Xavier Binefa; Robert Benavente; Ernest Valveny; Craig Von Land |
|
|
Title |
Real time pharmaceutical product recognition using color and shape indexing. |
Type |
Conference Article |
|
Year |
1999 |
Publication |
Proceedings of the 2nd International Workshop on European Scientific and Industrial Collaboration (WESIC'99), Promoting Advanced Technologies in Manufacturing. |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Wales |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
OR;MILAB;DAG;CIC;MV |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ PVR1999 |
Serial |
24 |
|
Permanent link to this record |
|
|
|
|
Author |
Siyang Song; Micol Spitale; Cheng Luo; German Barquero; Cristina Palmero; Sergio Escalera; Michel Valstar; Tobias Baur; Fabien Ringeval; Elisabeth Andre; Hatice Gunes |
|
|
Title |
REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 31st ACM International Conference on Multimedia |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
9620–9624 |
|
|
Keywords |
|
|
|
Abstract |
The Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the first benchmark test set for multi-modal information processing and to foster collaboration among the audio, visual, and audio-visual behaviour analysis and behaviour generation (a.k.a generative AI) communities, to compare the relative merits of the approaches to automatic appropriate facial reaction generation under different spontaneous dyadic interaction conditions. This paper presents: (i) the novelties, contributions and guidelines of the REACT2023 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of the baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation, respectively. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2023. |
|
|
Address |
Ottawa; Canada; October 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
MM |
|
|
Notes |
HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ SSL2023 |
Serial |
3931 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue |
2 |
Pages |
|
|
|
Keywords |
Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning |
|
|
Abstract |
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ SBM2023 |
Serial |
3848 |
|
Permanent link to this record |
|
|
|
|
Author |
Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue |
2 |
Pages |
1940-1948 |
|
|
Keywords |
|
|
|
Abstract |
Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results in significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model. |
|
|
Address |
Washington; USA; February 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ NBM2023 |
Serial |
3860 |
|
Permanent link to this record |
|
|
|
|
Author |
Maya Dimitrova; N. Kushmerick; Petia Radeva; Juan J. Villanueva |
|
|
Title |
User Assessment of a Visual Genre Classifier |
Type |
Miscellaneous |
|
Year |
2003 |
Publication |
Proceedings of the 3rd IASTED Int. Conference Visualization, Imaging and Image Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ DKR2003 |
Serial |
372 |
|
Permanent link to this record |
|
|
|
|
Author |
Francisco Javier Orozco; F.A. Garcia; J.L. Arcos; Jordi Gonzalez |
|
|
Title |
Spatio-Temporal Reasoning for Reliable Facial Expression Interpretation |
Type |
Conference Article |
|
Year |
2007 |
Publication |
Proceedings of the 5th International Conference on Computer Vision Systems |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Bielefeld University (Germany) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICVS |
|
|
Notes |
ISE |
Approved |
no |
|
|
Call Number |
ISE @ ise @ OGA2007 |
Serial |
772 |
|
Permanent link to this record |
|
|
|
|
Author |
David Geronimo; Angel Sappa; Antonio Lopez; Daniel Ponsa |
|
|
Title |
Adaptive Image Sampling and Windows Classification for On-board Pedestrian Detection |
Type |
Conference Article |
|
Year |
2007 |
Publication |
Proceedings of the 5th International Conference on Computer Vision Systems |
Abbreviated Journal |
ICVS |
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Pedestrian Detection |
|
|
Abstract |
On-board pedestrian detection is in the frontier of the state-of-the-art since it implies processing outdoor scenarios from a mobile platform and searching for aspect-changing objects in cluttered urban environments. Most promising approaches include the development of classifiers based on feature selection and machine learning. However, they use a large number of features which compromises real-time. Thus, methods for running the classifiers in only a few image windows must be provided. In this paper we contribute in both aspects, proposing a camera pose estimation method for adaptive sparse image sampling, as well as a classifier for pedestrian detection based on Haar wavelets and edge orientation histograms as features and AdaBoost as learning machine. Both proposals are compared with relevant approaches in the literature, showing comparable results but reducing processing time by four for the sampling tasks and by ten for the classification one. |
|
|
Address |
Bielefeld (Germany) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
ADAS |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ gsl2007a |
Serial |
786 |
|
Permanent link to this record |
|
|
|
|
Author |
Christian Keilstrup Ingwersen; Artur Xarles; Albert Clapes; Meysam Madadi; Janus Nortoft Jensen; Morten Rieger Hannemose; Anders Bjorholm Dahl; Sergio Escalera |
|
|
Title |
Video-based Skill Assessment for Golf: Estimating Golf Handicap |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
31-39 |
|
|
Keywords |
|
|
|
Abstract |
Automated skill assessment in sports using video-based analysis holds great potential for revolutionizing coaching methodologies. This paper focuses on the problem of skill determination in golfers by leveraging deep learning models applied to a large database of video recordings of golf swings. We investigate different regression, ranking and classification based methods and compare to a simple baseline approach. The performance is evaluated using mean squared error (MSE) as well as computing the percentages of correctly ranked pairs based on the Kendall correlation. Our results demonstrate an improvement over the baseline, with a 35% lower mean squared error and 68% correctly ranked pairs. However, achieving fine-grained skill assessment remains challenging. This work contributes to the development of AI-driven coaching systems and advances the understanding of video-based skill determination in the context of golf. |
|
|
Address |
Ottawa; Canada; October 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
MMSports |
|
|
Notes |
HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ KXC2023 |
Serial |
3929 |
|
Permanent link to this record |
|
|
|
|
Author |
Artur Xarles; Sergio Escalera; Thomas B. Moeslund; Albert Clapes |
|
|
Title |
ASTRA: An Action Spotting TRAnsformer for Soccer Videos |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
93–102 |
|
|
Keywords |
|
|
|
Abstract |
In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set. |
|
|
Address |
Ottawa; Canada; October 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
MMSports |
|
|
Notes |
HUPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ XEM2023 |
Serial |
3970 |
|
Permanent link to this record |
|
|
|
|
Author |
Partha Pratim Roy; Umapada Pal; Josep Llados |
|
|
Title |
Multi-oriented English Text Line Extraction using Background and Foreground Information |
Type |
Conference Article |
|
Year |
2008 |
Publication |
Proceedings of the 8th IAPR International Workshop on Document Analysis Systems |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
315–322 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Nara (Japan) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
DAS |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RPL2008b |
Serial |
1047 |
|
Permanent link to this record |