toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Arnau Baro edit  isbn
openurl 
  Title Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods Type Book Whole
  Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) The transcription of sheet music into some machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process
very time-consuming and prone to errors. Consequently, automatic transcription
systems for musical documents represent interesting tools.
Document analysis is the subject that deals with the extraction and processing
of documents through image and pattern recognition. It is a branch of computer
vision. Taking music scores as source, the field devoted to address this task is
known as Optical Music Recognition (OMR). Typically, an OMR system takes an
image of a music score and automatically extracts its content into some symbolic
structure such as MEI or MusicXML.
In this dissertation, we have investigated different methods for recognizing a
single staff section (e.g. scores for violin, flute, etc.), much in the same way as most text recognition research focuses on recognizing words appearing in a given line image. These methods are based in two different methodologies. On the one hand, we present two methods based on Recurrent Neural Networks, in particular, the
Long Short-Term Memory Neural Network. On the other hand, a method based on Sequence to Sequence models is detailed.
Music context is needed to improve the OMR results, just like language models
and dictionaries help in handwriting recognition. For example, syntactical rules
and grammars could be easily defined to cope with the ambiguities in the rhythm.
In music theory, for example, the time signature defines the amount of beats per
bar unit. Thus, in the second part of this dissertation, different methodologies
have been investigated to improve the OMR recognition. We have explored three
different methods: (a) a graphic tree-structure representation, Dendrograms, that
joins, at each level, its primitives following a set of rules, (b) the incorporation of Language Models to model the probability of a sequence of tokens, and (c) graph neural networks to analyze the music scores to avoid meaningless relationships between music primitives.
Finally, to train all these methodologies, and given the method-specificity of
the datasets in the literature, we have created four different music datasets. Two of them are synthetic with a modern or old handwritten appearance, whereas the
other two are real handwritten scores, being one of them modern and the other
old.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Alicia Fornes  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-124793-8-6 Medium  
  Area Expedition Conference  
  Notes DAG; Approved no  
  Call Number Admin @ si @ Bar2022 Serial 3754  
Permanent link to this record
 

 
Author Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching Type Conference Article
  Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages 1391-1400  
  Keywords Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning  
  Abstract (down) The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github. com/furkanbiten/ncsmetric and the model implementation at github. com/andrespmd/semanticadaptive_margin.  
  Address Virtual; Waikoloa; Hawai; USA; January 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.155; 302.105; Approved no  
  Call Number Admin @ si @ BMG2022 Serial 3663  
Permanent link to this record
 

 
Author Silvio Giancola; Anthony Cioppa; Adrien Deliege; Floriane Magera; Vladimir Somers; Le Kang; Xin Zhou; Olivier Barnich; Christophe De Vleeschouwer; Alexandre Alahi; Bernard Ghanem; Marc Van Droogenbroeck; Abdulrahman Darwish; Adrien Maglo; Albert Clapes; Andreas Luyts; Andrei Boiarov; Artur Xarles; Astrid Orcesi; Avijit Shah; Baoyu Fan; Bharath Comandur; Chen Chen; Chen Zhang; Chen Zhao; Chengzhi Lin; Cheuk-Yiu Chan; Chun Chuen Hui; Dengjie Li; Fan Yang; Fan Liang; Fang Da; Feng Yan; Fufu Yu; Guanshuo Wang; H. Anthony Chan; He Zhu; Hongwei Kan; Jiaming Chu; Jianming Hu; Jianyang Gu; Jin Chen; Joao V. B. Soares; Jonas Theiner; Jorge De Corte; Jose Henrique Brito; Jun Zhang; Junjie Li; Junwei Liang; Leqi Shen; Lin Ma; Lingchi Chen; Miguel Santos Marques; Mike Azatov; Nikita Kasatkin; Ning Wang; Qiong Jia; Quoc Cuong Pham; Ralph Ewerth; Ran Song; Rengang Li; Rikke Gade; Ruben Debien; Runze Zhang; Sangrok Lee; Sergio Escalera; Shan Jiang; Shigeyuki Odashima; Shimin Chen; Shoichi Masui; Shouhong Ding; Sin-wai Chan; Siyu Chen; Tallal El-Shabrawy; Tao He; Thomas B. Moeslund; Wan-Chi Siu; Wei Zhang; Wei Li; Xiangwei Wang; Xiao Tan; Xiaochuan Li; Xiaolin Wei; Xiaoqing Ye; Xing Liu; Xinying Wang; Yandong Guo; Yaqian Zhao; Yi Yu; Yingying Li; Yue He; Yujie Zhong; Zhenhua Guo; Zhiheng Li edit  url
doi  openurl
  Title SoccerNet 2022 Challenges Results Type Conference Article
  Year 2022 Publication 5th International ACM Workshop on Multimedia Content Analysis in Sports Abbreviated Journal  
  Volume Issue Pages 75-86  
  Keywords  
  Abstract (down) The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams. Compared to last year's challenges, tasks (1-2) had their evaluation metrics redefined to consider tighter temporal accuracies, and tasks (3-6) were novel, including their underlying data and annotations. More information on the tasks, challenges and leaderboards are available on this https URL. Baselines and development kits are available on this https URL.  
  Address Lisboa; Portugal; October 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ACMW  
  Notes HUPBA; no menciona Approved no  
  Call Number Admin @ si @ GCD2022 Serial 3801  
Permanent link to this record
 

 
Author Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund edit   pdf
url  doi
openurl 
  Title Multi-Task Classification of Sewer Pipe Defects and Properties Using a Cross-Task Graph Neural Network Decoder Type Conference Article
  Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages 2806-2817  
  Keywords Vision Systems; Applications Multi-Task Classification  
  Abstract (down) The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer
defect classification has been investigated for decades, little attention has been given to classifying sewer pipe properties such as water level, pipe material, and pipe shape, which are needed to evaluate the level of sewer pipe deterioration.
In this work we classify sewer pipe defects and properties concurrently and present a novel decoder-focused multi-task classification architecture Cross-Task Graph Neural Network (CT-GNN), which refines the disjointed per-task predictions using cross-task information. The CT-GNN architecture extends the traditional disjointed task-heads decoder, by utilizing a cross-task graph and unique class node embeddings. The cross-task graph can either be determined a priori based on the conditional probability between the task classes or determined dynamically using self-attention.
CT-GNN can be added to any backbone and trained end-toend at a small increase in the parameter count. We achieve state-of-the-art performance on all four classification tasks in the Sewer-ML dataset, improving defect classification and
water level classification by 5.3 and 8.0 percentage points, respectively. We also outperform the single task methods as well as other multi-task classification approaches while introducing 50 times fewer parameters than previous modelfocused approaches.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ BME2022 Serial 3638  
Permanent link to this record
 

 
Author Mohamed Ramzy Ibrahim; Robert Benavente; Felipe Lumbreras; Daniel Ponsa edit   pdf
doi  openurl
  Title 3DRRDB: Super Resolution of Multiple Remote Sensing Images using 3D Residual in Residual Dense Blocks Type Conference Article
  Year 2022 Publication CVPR 2022 Workshop on IEEE Perception Beyond the Visible Spectrum workshop series (PBVS, 18th Edition) Abbreviated Journal  
  Volume Issue Pages  
  Keywords Training; Solid modeling; Three-dimensional displays; PSNR; Convolution; Superresolution; Pattern recognition  
  Abstract (down) The rapid advancement of Deep Convolutional Neural Networks helped in solving many remote sensing problems, especially the problems of super-resolution. However, most state-of-the-art methods focus more on Single Image Super-Resolution neglecting Multi-Image Super-Resolution. In this work, a new proposed 3D Residual in Residual Dense Blocks model (3DRRDB) focuses on remote sensing Multi-Image Super-Resolution for two different single spectral bands. The proposed 3DRRDB model explores the idea of 3D convolution layers in deeply connected Dense Blocks and the effect of local and global residual connections with residual scaling in Multi-Image Super-Resolution. The model tested on the Proba-V challenge dataset shows a significant improvement above the current state-of-the-art models scoring a Corrected Peak Signal to Noise Ratio (cPSNR) of 48.79 dB and 50.83 dB for Near Infrared (NIR) and RED Bands respectively. Moreover, the proposed 3DRRDB model scores a Corrected Structural Similarity Index Measure (cSSIM) of 0.9865 and 0.9909 for NIR and RED bands respectively.  
  Address New Orleans, USA; 19 June 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes MSIAU; 600.130 Approved no  
  Call Number Admin @ si @ IBL2022 Serial 3693  
Permanent link to this record
 

 
Author Vishwesh Pillai; Pranav Mehar; Manisha Das; Deep Gupta; Petia Radeva edit  url
doi  openurl
  Title Integrated Hierarchical and Flat Classifiers for Food Image Classification using Epistemic Uncertainty Type Conference Article
  Year 2022 Publication IEEE International Conference on Signal Processing and Communications Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract (down) The problem of food image recognition is an essential one in today’s context because health conditions such as diabetes, obesity, and heart disease require constant monitoring of a person’s diet. To automate this process, several models are available to recognize food images. Due to a considerable number of unique food dishes and various cuisines, a traditional flat classifier ceases to perform well. To address this issue, prediction schemes consisting of both flat and hierarchical classifiers, with the analysis of epistemic uncertainty are used to switch between the classifiers. However, the accuracy of the predictions made using epistemic uncertainty data remains considerably low. Therefore, this paper presents a prediction scheme using three different threshold criteria that helps to increase the accuracy of epistemic uncertainty predictions. The performance of the proposed method is demonstrated using several experiments performed on the MAFood-121 dataset. The experimental results validate the proposal performance and show that the proposed threshold criteria help to increase the overall accuracy of the predictions by correctly classifying the uncertainty distribution of the samples.  
  Address Bangalore; India; July 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference SPCOM  
  Notes MILAB; no menciona Approved no  
  Call Number Admin @ si @ PMD2022 Serial 3796  
Permanent link to this record
 

 
Author Aitor Alvarez-Gila edit  openurl
  Title Self-supervised learning for image-to-image translation in the small data regime Type Book Whole
  Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords Computer vision; Neural networks; Self-supervised learning; Image-to-image mapping; Probabilistic programming  
  Abstract (down) The mass irruption of Deep Convolutional Neural Networks (CNNs) in computer vision since 2012 led to a dominance of the image understanding paradigm consisting in an end-to-end fully supervised learning workflow over large-scale annotated datasets. This approach proved to be extremely useful at solving a myriad of classic and new computer vision tasks with unprecedented performance —often, surpassing that of humans—, at the expense of vast amounts of human-labeled data, extensive computational resources and the disposal of all of our prior knowledge on the task at hand. Even though simple transfer learning methods, such as fine-tuning, have achieved remarkable impact, their success when the amount of labeled data in the target domain is small is limited. Furthermore, the non-static nature of data generation sources will often derive in data distribution shifts that degrade the performance of deployed models. As a consequence, there is a growing demand for methods that can exploit elements of prior knowledge and sources of information other than the manually generated ground truth annotations of the images during the network training process, so that they can adapt to new domains that constitute, if not a small data regime, at least a small labeled data regime. This thesis targets such few or no labeled data scenario in three distinct image-to-image mapping learning problems. It contributes with various approaches that leverage our previous knowledge of different elements of the image formation process: We first present a data-efficient framework for both defocus and motion blur detection, based on a model able to produce realistic synthetic local degradations. The framework comprises a self-supervised, a weakly-supervised and a semi-supervised instantiation, depending on the absence or availability and the nature of human annotations, and outperforms fully-supervised counterparts in a variety of settings. Our knowledge on color image formation is then used to gather input and target ground truth image pairs for the RGB to hyperspectral image reconstruction task. We make use of a CNN to tackle this problem, which, for the first time, allows us to exploit spatial context and achieve state-of-the-art results given a limited hyperspectral image set. In our last contribution to the subfield of data-efficient image-to-image transformation problems, we present the novel semi-supervised task of zero-pair cross-view semantic segmentation: we consider the case of relocation of the camera in an end-to-end trained and deployed monocular, fixed-view semantic segmentation system often found in industry. Under the assumption that we are allowed to obtain an additional set of synchronized but unlabeled image pairs of new scenes from both original and new camera poses, we present ZPCVNet, a model and training procedure that enables the production of dense semantic predictions in either source or target views at inference time. The lack of existing suitable public datasets to develop this approach led us to the creation of MVMO, a large-scale Multi-View, Multi-Object path-traced dataset with per-view semantic segmentation annotations. We expect MVMO to propel future research in the exciting under-developed fields of cross-view and multi-view semantic segmentation. Last, in a piece of applied research of direct application in the context of process monitoring of an Electric Arc Furnace (EAF) in a steelmaking plant, we also consider the problem of simultaneously estimating the temperature and spectral emissivity of distant hot emissive samples. To that end, we design our own capturing device, which integrates three point spectrometers covering a wide range of the Ultra-Violet, visible, and Infra-Red spectra and is capable of registering the radiance signal incoming from an 8cm diameter spot located up to 20m away. We then define a physically accurate radiative transfer model that comprises the effects of atmospheric absorbance, of the optical system transfer function, and of the sample temperature and spectral emissivity themselves. We solve this inverse problem without the need for annotated data using a probabilistic programming-based Bayesian approach, which yields full posterior distribution estimates of the involved variables that are consistent with laboratory-grade measurements.  
  Address Julu, 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Place of Publication Editor Joost Van de Weijer; Estibaliz Garrote  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ Alv2022 Serial 3716  
Permanent link to this record
 

 
Author Miquel Angel Piera; Jose Luis Muñoz; Debora Gil; Gonzalo Martin; Jordi Manzano edit  doi
openurl 
  Title A Socio-Technical Simulation Model for the Design of the Future Single Pilot Cockpit: An Opportunity to Improve Pilot Performance Type Journal Article
  Year 2022 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 10 Issue Pages 22330-22343  
  Keywords Human factors ; Performance evaluation ; Simulation; Sociotechnical systems ; System performance  
  Abstract (down) The future deployment of single pilot operations must be supported by new cockpit computer services. Such services require an adaptive context-aware integration of technical functionalities with the concurrent tasks that a pilot must deal with. Advanced artificial intelligence supporting services and improved communication capabilities are the key enabling technologies that will render future cockpits more integrated with the present digitalized air traffic management system. However, an issue in the integration of such technologies is the lack of socio-technical analysis in the design of these teaming mechanisms. A key factor in determining how and when a service support should be provided is the dynamic evolution of pilot workload. This paper investigates how the socio-technical model-based systems engineering approach paves the way for the design of a digital assistant framework by formalizing this workload. The model was validated in an Airbus A-320 cockpit simulator, and the results confirmed the degraded pilot behavioral model and the performance impact according to different contextual flight deck information. This study contributes to practical knowledge for designing human-machine task-sharing systems.  
  Address Feb 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM; Approved no  
  Call Number Admin @ si @ PMG2022 Serial 3697  
Permanent link to this record
 

 
Author Guillermo Torres; Sonia Baeza; Carles Sanchez; Ignasi Guasch; Antoni Rosell; Debora Gil edit  doi
openurl 
  Title An Intelligent Radiomic Approach for Lung Cancer Screening Type Journal Article
  Year 2022 Publication Applied Sciences Abbreviated Journal APPLSCI  
  Volume 12 Issue 3 Pages 1568  
  Keywords Lung cancer; Early diagnosis; Screening; Neural networks; Image embedding; Architecture optimization  
  Abstract (down) The efficiency of lung cancer screening for reducing mortality is hindered by the high rate of false positives. Artificial intelligence applied to radiomics could help to early discard benign cases from the analysis of CT scans. The available amount of data and the fact that benign cases are a minority, constitutes a main challenge for the successful use of state of the art methods (like deep learning), which can be biased, over-fitted and lack of clinical reproducibility. We present an hybrid approach combining the potential of radiomic features to characterize nodules in CT scans and the generalization of the feed forward networks. In order to obtain maximal reproducibility with minimal training data, we propose an embedding of nodules based on the statistical significance of radiomic features for malignancy detection. This representation space of lesions is the input to a feed
forward network, which architecture and hyperparameters are optimized using own-defined metrics of the diagnostic power of the whole system. Results of the best model on an independent set of patients achieve 100% of sensitivity and 83% of specificity (AUC = 0.94) for malignancy detection.
 
  Address Jan 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM; 600.139; 600.145 Approved no  
  Call Number Admin @ si @ TBS2022 Serial 3699  
Permanent link to this record
 

 
Author Aura Hernandez-Sabate; Jose Elias Yauri; Pau Folch; Miquel Angel Piera; Debora Gil edit  doi
openurl 
  Title Recognition of the Mental Workloads of Pilots in the Cockpit Using EEG Signals Type Journal Article
  Year 2022 Publication Applied Sciences Abbreviated Journal APPLSCI  
  Volume 12 Issue 5 Pages 2298  
  Keywords Cognitive states; Mental workload; EEG analysis; Neural networks; Multimodal data fusion  
  Abstract (down) The commercial flightdeck is a naturally multi-tasking work environment, one in which interruptions are frequent come in various forms, contributing in many cases to aviation incident reports. Automatic characterization of pilots’ workloads is essential to preventing these kind of incidents. In addition, minimizing the physiological sensor network as much as possible remains both a challenge and a requirement. Electroencephalogram (EEG) signals have shown high correlations with specific cognitive and mental states, such as workload. However, there is not enough evidence in the literature to validate how well models generalize in cases of new subjects performing tasks with workloads similar to the ones included during the model’s training. In this paper, we propose a convolutional neural network to classify EEG features across different mental workloads in a continuous performance task test that partly measures working memory and working memory capacity. Our model is valid at the general population level and it is able to transfer task learning to pilot mental workload recognition in a simulated operational environment.  
  Address February 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM; ADAS; 600.139; 600.145; 600.118 Approved no  
  Call Number Admin @ si @ HYF2022 Serial 3720  
Permanent link to this record
 

 
Author Jun Wan; Chi Lin; Longyin Wen; Yunan Li; Qiguang Miao; Sergio Escalera; Gholamreza Anbarjafari; Isabelle Guyon; Guodong Guo; Stan Z. Li edit   pdf
url  doi
openurl 
  Title ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition Type Journal Article
  Year 2022 Publication IEEE Transactions on Cybernetics Abbreviated Journal TCIBERN  
  Volume 52 Issue 5 Pages 3422-3433  
  Keywords  
  Abstract (down) The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams round the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of 8.1% (from 0.8917 to 0.9639) of CSR.  
  Address May 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA; no menciona Approved no  
  Call Number Admin @ si @ WLW2022 Serial 3522  
Permanent link to this record
 

 
Author Joana Maria Pujadas-Mora; Alicia Fornes; Oriol Ramos Terrades; Josep Llados; Jialuo Chen; Miquel Valls-Figols; Anna Cabre edit  doi
openurl 
  Title The Barcelona Historical Marriage Database and the Baix Llobregat Demographic Database. From Algorithms for Handwriting Recognition to Individual-Level Demographic and Socioeconomic Data Type Journal
  Year 2022 Publication Historical Life Course Studies Abbreviated Journal HLCS  
  Volume 12 Issue Pages 99-132  
  Keywords Individual demographic databases; Computer vision, Record linkage; Social mobility; Inequality; Migration; Word spotting; Handwriting recognition; Local censuses; Marriage Licences  
  Abstract (down) The Barcelona Historical Marriage Database (BHMD) gathers records of the more than 600,000 marriages celebrated in the Diocese of Barcelona and their taxation registered in Barcelona Cathedral's so-called Marriage Licenses Books for the long period 1451–1905 and the BALL Demographic Database brings together the individual information recorded in the population registers, censuses and fiscal censuses of the main municipalities of the county of Baix Llobregat (Barcelona). In this ongoing collection 263,786 individual observations have been assembled, dating from the period between 1828 and 1965 by December 2020. The two databases started as part of different interdisciplinary research projects at the crossroads of Historical Demography and Computer Vision. Their construction uses artificial intelligence and computer vision methods as Handwriting Recognition to reduce the time of execution. However, its current state still requires some human intervention which explains the implemented crowdsourcing and game sourcing experiences. Moreover, knowledge graph techniques have allowed the application of advanced record linkage to link the same individuals and families across time and space. Moreover, we will discuss the main research lines using both databases developed so far in historical demography.  
  Address June 23, 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no  
  Call Number Admin @ si @ PFR2022 Serial 3737  
Permanent link to this record
 

 
Author Lei Kang; Pau Riba; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas edit   file
url  doi
openurl 
  Title Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition Type Journal Article
  Year 2022 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 129 Issue Pages 108766  
  Keywords  
  Abstract (down) The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios.  
  Address Sept. 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.121; 600.162 Approved no  
  Call Number Admin @ si @ KRR2022 Serial 3556  
Permanent link to this record
 

 
Author Ayan Banerjee; Palaiahnakote Shivakumara; Parikshit Acharya; Umapada Pal; Josep Llados edit  url
doi  openurl
  Title TWD: A New Deep E2E Model for Text Watermark Detection in Video Images Type Conference Article
  Year 2022 Publication 26th International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords Deep learning; U-Net; FCENet; Scene text detection; Video text detection; Watermark text detection  
  Abstract (down) Text watermark detection in video images is challenging because text watermark characteristics are different from caption and scene texts in the video images. Developing a successful model for detecting text watermark, caption, and scene texts is an open challenge. This study aims at developing a new Deep End-to-End model for Text Watermark Detection (TWD), caption and scene text in video images. To standardize non-uniform contrast, quality, and resolution, we explore the U-Net3+ model for enhancing poor quality text without affecting high-quality text. Similarly, to address the challenges of arbitrary orientation, text shapes and complex background, we explore Stacked Hourglass Encoded Fourier Contour Embedding Network (SFCENet) by feeding the output of the U-Net3+ model as input. Furthermore, the proposed work integrates enhancement and detection models as an end-to-end model for detecting multi-type text in video images. To validate the proposed model, we create our own dataset (named TW-866), which provides video images containing text watermark, caption (subtitles), as well as scene text. The proposed model is also evaluated on standard natural scene text detection datasets, namely, ICDAR 2019 MLT, CTW1500, Total-Text, and DAST1500. The results show that the proposed method outperforms the existing methods. This is the first work on text watermark detection in video images to the best of our knowledge  
  Address Montreal; Quebec; Canada; August 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPR  
  Notes DAG; Approved no  
  Call Number Admin @ si @ BSA2022 Serial 3788  
Permanent link to this record
 

 
Author Pau Riba; Lutz Goldmann; Oriol Ramos Terrades; Diede Rusticus; Alicia Fornes; Josep Llados edit  doi
openurl 
  Title Table detection in business document images by message passing networks Type Journal Article
  Year 2022 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 127 Issue Pages 108641  
  Keywords  
  Abstract (down) Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, there is information about the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition has gained interest in document image analysis, in particular, with the absence of rule lines and unknown information about rows and columns. However, business documents usually contain sensitive contents limiting the amount of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images which do not require the raw content of the document. Hence, the sensitive content can be previously removed and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain are business documents. We have carefully validated our approach in two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.  
  Address July 2022  
  Corporate Author Thesis  
  Publisher Elsevier Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.162; 600.121 Approved no  
  Call Number Admin @ si @ RGR2022 Serial 3729  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: