Home | [121–130] << 131 132 133 134 135 136 137 138 139 140 >> [141–150] |
![]() |
Records | |||||
---|---|---|---|---|---|
Author | Diego Velazquez; Josep M. Gonfaus; Pau Rodriguez; Xavier Roca; Seiichi Ozawa; Jordi Gonzalez | ||||
Title | Logo Detection With No Priors | Type | Journal Article | ||
Year | 2021 | Publication | IEEE Access | Abbreviated Journal | ACCESS |
Volume | 9 | Issue | Pages | 106998-107011 | |
Keywords | |||||
Abstract ![]() |
In recent years, top referred methods on object detection like R-CNN have implemented this task as a combination of proposal region generation and supervised classification on the proposed bounding boxes. Although this pipeline has achieved state-of-the-art results in multiple datasets, it has inherent limitations that make object detection a very complex and inefficient task in computational terms. Instead of considering this standard strategy, in this paper we enhance Detection Transformers (DETR) which tackles object detection as a set-prediction problem directly in an end-to-end fully differentiable pipeline without requiring priors. In particular, we incorporate Feature Pyramids (FP) to the DETR architecture and demonstrate the effectiveness of the resulting DETR-FP approach on improving logo detection results thanks to the improved detection of small logos. So, without requiring any domain specific prior to be fed to the model, DETR-FP obtains competitive results on the OpenLogo and MS-COCO datasets offering a relative improvement of up to 30%, when compared to a Faster R-CNN baseline which strongly depends on hand-designed priors. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ VGR2021 | Serial | 3664 | ||
Permanent link to this record | |||||
Author | Mireia Sole; Joan Blanco; Debora Gil; G. Fonseka; Richard Frodsham; Francesca Vidal; Zaida Sarrate | ||||
Title | Noves perspectives en l estudi de la territorialitat cromosomica de cel·lules germinals masculines: estudis tridimensionals | Type | Journal | ||
Year | 2017 | Publication | Biologia de la Reproduccio | Abbreviated Journal | JBR |
Volume | 15 | Issue | Pages | 73-78 | |
Keywords | |||||
Abstract ![]() |
In somatic cells, chromosomes occupy specific nuclear regions called chromosome territories which are involved in the
maintenance and regulation of the genome. Preliminary data in male germ cells also suggest the importance of chromosome territoriality in cell functionality. Nevertheless, the specific characteristics of testicular tissue (presence of different cell types with different morphological characteristics, in different stages of development and with different ploidy) makes difficult to achieve conclusive results. In this study we have developed a methodology to approach the threedimensional study of all chromosome territories in male germ cells from C57BL/6J mice (Mus musculus). The method includes the following steps: i) Optimized cell fixation to obtain an optimal preservation of the three-dimensionality cell morphology, ii) Chromosome identification by FISH (Chromoprobe Multiprobe® OctoChrome™ Murine System; Cytocell) and confocal microscopy (TCS-SP5, Leica Microsystems), iii) Cell type identification by immunofluorescence iv) Image analysis using Matlab scripts, v) Numerical data extraction related to chromosome features, chromosome radial position and chromosome relative position. This methodology allows the unequivocally identification and the analysis of the chromosome territories of all spermatogenic stages. Results will provide information about the features that determine chromosomal position, preferred associations between chromosomes, and the relationship between chromosome positioning and genome regulation. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-697-3767-5 | Medium | ||
Area | Expedition | Conference | |||
Notes | IAM; 600.096; 600.145 | Approved | no | ||
Call Number | Admin @ si @ SBG2017c | Serial | 2961 | ||
Permanent link to this record | |||||
Author | Cesar Isaza; Joaquin Salas; Bogdan Raducanu | ||||
Title | Evaluation of Intrinsic Image Algorithms to Detect the Shadows Cast by Static Objects Outdoors | Type | Journal Article | ||
Year | 2012 | Publication | Sensors | Abbreviated Journal | SENS |
Volume | 12 | Issue | 10 | Pages | 13333-13348 |
Keywords | |||||
Abstract ![]() |
In some automatic scene analysis applications, the presence of shadows becomes a nuisance that is necessary to deal with. As a consequence, a preliminary stage in many computer vision algorithms is to attenuate their effect. In this paper, we focus our attention on the detection of shadows cast by static objects outdoors, as the scene is viewed for extended periods of time (days, weeks) from a fixed camera and considering daylight intervals where the main source of light is the sun. In this context, we report two contributions. First, we introduce the use of synthetic images for which ground truth can be generated automatically, avoiding the tedious effort of manual annotation. Secondly, we report a novel application of the intrinsic image concept to the automatic detection of shadows cast by static objects in outdoors. We make both a quantitative and a qualitative evaluation of several algorithms based on this image representation. For the quantitative evaluation, we used the synthetic data set, while for the qualitative evaluation we used both data sets. Our experimental results show that the evaluated methods can partially solve the problem of shadow detection. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | OR;MV | Approved | no | ||
Call Number | Admin @ si @ ISR2012b | Serial | 2173 | ||
Permanent link to this record | |||||
Author | S. Chanda; Umapada Pal; Oriol Ramos Terrades | ||||
Title | Word-Wise Thai and Roman Script Identification | Type | Journal | ||
Year | 2009 | Publication | ACM Transactions on Asian Language Information Processing | Abbreviated Journal | TALIP |
Volume | 8 | Issue | 3 | Pages | 1-21 |
Keywords | |||||
Abstract ![]() |
In some Thai documents, a single text line of a printed document page may contain words of both Thai and Roman scripts. For the Optical Character Recognition (OCR) of such a document page it is better to identify, at first, Thai and Roman script portions and then to use individual OCR systems of the respective scripts on these identified portions. In this article, an SVM-based method is proposed for identification of word-wise printed Roman and Thai scripts from a single line of a document page. Here, at first, the document is segmented into lines and then lines are segmented into character groups (words). In the proposed scheme, we identify the script of a character group combining different character features obtained from structural shape, profile behavior, component overlapping information, topological properties, and water reservoir concept, etc. Based on the experiment on 10,000 data (words) we obtained 99.62% script identification accuracy from the proposed scheme. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1530-0226 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ CPR2009f | Serial | 1869 | ||
Permanent link to this record | |||||
Author | Mikhail Mozerov; Joost Van de Weijer | ||||
Title | Accurate stereo matching by two step global optimization | Type | Journal Article | ||
Year | 2015 | Publication | IEEE Transactions on Image Processing | Abbreviated Journal | TIP |
Volume | 24 | Issue | 3 | Pages | 1153-1163 |
Keywords | |||||
Abstract ![]() |
In stereo matching cost filtering methods and energy minimization algorithms are considered as two different techniques. Due to their global extend energy minimization methods obtain good stereo matching results. However, they tend to fail in occluded regions, in which cost filtering approaches obtain better results. In this paper we intend to combine both approaches with the aim to improve overall stereo matching results. We show that a global optimization with a fully connected model can be solved by cost fil tering methods. Based on this observation we propose to perform stereo matching as a two-step energy minimization algorithm. We consider two MRF models: a fully connected model defined on the complete set of pixels in an image and a conventional locally connected model. We solve the energy minimization problem for the fully connected model, after which the marginal function of the solution is used as the unary potential in the locally connected MRF model. Experiments on the Middlebury stereo datasets show that the proposed method achieves state-of-the-arts results. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1057-7149 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE; LAMP; 600.079; 600.078 | Approved | no | ||
Call Number | Admin @ si @ MoW2015a | Serial | 2568 | ||
Permanent link to this record | |||||
Author | Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera | ||||
Title | Action Recognition by Pairwise Proximity Function Support Vector Machines with Dynamic Time Warping Kernels | Type | Conference Article | ||
Year | 2016 | Publication | 29th Canadian Conference on Artificial Intelligence | Abbreviated Journal | |
Volume | 9673 | Issue | Pages | 3-14 | |
Keywords | |||||
Abstract ![]() |
In the context of human action recognition using skeleton data, the 3D trajectories of joint points may be considered as multi-dimensional time series. The traditional recognition technique in the literature is based on time series dis(similarity) measures (such as Dynamic Time Warping). For these general dis(similarity) measures, k-nearest neighbor algorithms are a natural choice. However, k-NN classifiers are known to be sensitive to noise and outliers. In this paper, a new class of Support Vector Machine that is applicable to trajectory classification, such as action recognition, is developed by incorporating an efficient time-series distances measure into the kernel function. More specifically, the derivative of Dynamic Time Warping (DTW) distance measure is employed as the SVM kernel. In addition, the pairwise proximity learning strategy is utilized in order to make use of non-positive semi-definite (PSD) kernels in the SVM formulation. The recognition results of the proposed technique on two action recognition datasets demonstrates the ourperformance of our methodology compared to the state-of-the-art methods. Remarkably, we obtained 89 % accuracy on the well-known MSRAction3D dataset using only 3D trajectories of body joints obtained by Kinect | ||||
Address | Victoria; Canada; May 2016 | ||||
Corporate Author | Thesis | ||||
Publisher | Springer International Publishing | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | AI | ||
Notes | HuPBA;MILAB; | Approved | no | ||
Call Number | Admin @ si @ BGE2016b | Serial | 2770 | ||
Permanent link to this record | |||||
Author | Lichao Zhang | ||||
Title | Towards end-to-end Networks for Visual Tracking in RGB and TIR Videos | Type | Book Whole | ||
Year | 2019 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract ![]() |
In the current work, we identify several problems of current tracking systems. The lack of large-scale labeled datasets hampers the usage of deep learning, especially end-to-end training, for tracking in TIR images. Therefore, many methods for tracking on TIR data are still based on hand-crafted features. This situation also happens in multi-modal tracking, e.g. RGB-T tracking. Another reason, which hampers the development of RGB-T tracking, is that there exists little research on the fusion mechanisms for combining information from RGB and TIR modalities. One of the crucial components of most trackers is the update module. For the currently existing end-to-end tracking architecture, e.g, Siamese trackers, the online model update is still not taken into consideration at the training stage. They use no-update or a linear update strategy during the inference stage. While such a hand-crafted approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update.
To address the data-scarcity for TIR and RGB-T tracking, we use image-to-image translation to generate a large-scale synthetic TIR dataset. This dataset allows us to perform end-to-end training for TIR tracking. Furthermore, we investigate several fusion mechanisms for RGB-T tracking. The multi-modal trackers are also trained in an end-to-end manner on the synthetic data. To improve the standard online update, we pose the updating step as an optimization problem which can be solved by training a neural network. Our approach thereby reduces the hand-crafted components in the tracking pipeline and sets a further step in the direction of a complete end-to-end trained tracking network which also considers updating during optimization. |
||||
Address | November 2019 | ||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Joost Van de Weijer;Abel Gonzalez;Fahad Shahbaz Khan | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-1210011-1-9 | Medium | ||
Area | Expedition | Conference | |||
Notes | LAMP; 600.141; 600.120 | Approved | no | ||
Call Number | Admin @ si @ Zha2019 | Serial | 3393 | ||
Permanent link to this record | |||||
Author | Santiago Segui | ||||
Title | Contributions to the Diagnosis of Intestinal Motility by Automatic Image Analysis | Type | Book Whole | ||
Year | 2011 | Publication | PhD Thesis, Universitat de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract ![]() |
In the early twenty first century Given Imaging Ltd. presented wireless capsule endoscopy (WCE) as a new technological breakthrough that allowed the visualization of
the intestine by using a small, swallowed camera. This small size device was received with a high enthusiasm within the medical community, and until now, it is still one of the medical devices with the highest use growth rate. WCE can be used as a novel diagnostic tool that presents several clinical advantages, since it is non-invasive and at the same time it provides, for the first time, a full picture of the small bowel morphology, contents and dynamics. Since its appearance, the WCE has been used to detect several intestinal dysfunctions such as: polyps, ulcers and bleeding. However, the visual analysis of WCE videos presents an important drawback: the long time required by the physicians for proper video visualization. In this sense and regarding to this limitation, the development of computer aided systems is required for the extensive use of WCE in the medical community. The work presented in this thesis is a set of contributions for the automatic image analysis and computer-aided diagnosis of intestinal motility disorders using WCE. Until now, the diagnosis of small bowel motility dysfunctions was basically performed by invasive techniques such as the manometry test, which can only be conducted at some referral centers around the world owing to the complexity of the procedure and the medial expertise required in the interpretation of the results. Our contributions are divided in three main blocks: 1. Image analysis by computer vision techniques to detect events in the endoluminal WCE scene. Several methods have been proposed to detect visual events such as: intestinal contractions, intestinal content, tunnel and wrinkles; 2. Machine learning techniques for the analysis and the manipulation of the data from WCE. These methods have been proposed in order to overcome the problems that the analysis of WCE presents such as: video acquisition cost, unlabeled data and large number of data; 3. Two different systems for the computer-aided diagnosis of intestinal motility disorders using WCE. The first system presents a fully automatic method that aids at discriminating healthy subjects from patients with severe intestinal motor disorders like pseudo-obstruction or food intolerance. The second system presents another automatic method that models healthy subjects and discriminate them from mild intestinal motility patients. |
||||
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | Ediciones Graficas Rey | Place of Publication | Editor | Jordi Vitria | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB | Approved | no | ||
Call Number | Admin @ si @ Seg2011 | Serial | 1836 | ||
Permanent link to this record | |||||
Author | Daniel Marczak; Sebastian Cygert; Tomasz Trzcinski; Bartlomiej Twardowski | ||||
Title | Revisiting Supervision for Continual Representation Learning | Type | Miscellaneous | ||
Year | 2023 | Publication | Arxiv | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract ![]() |
In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, recent studies have highlighted the strengths of self-supervised continual representation learning. The improved transferability of representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | xxx | Approved | no | ||
Call Number | Admin @ si @ MCT2023 | Serial | 4013 | ||
Permanent link to this record | |||||
Author | Palaiahnakote Shivakumara; Anjan Dutta; Trung Quy Phan; Chew Lim Tan; Umapada Pal | ||||
Title | A Novel Mutual Nearest Neighbor based Symmetry for Text Frame Classification in Video | Type | Journal Article | ||
Year | 2011 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 44 | Issue | 8 | Pages | 1671-1683 |
Keywords | |||||
Abstract ![]() |
In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median moment with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max–Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called mutual nearest neighbor based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to know whether a block is a true text block or not. If a frame produces at least one true text block then it is considered as a text frame otherwise it is a non-text frame. Experimental results on different text and non-text datasets including two public datasets and our own created data show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in term of recall and precision at both the block and frame levels. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ SDP2011 | Serial | 1727 | ||
Permanent link to this record | |||||
Author | Trevor Canham; Javier Vazquez; Elise Mathieu; Marcelo Bertalmío | ||||
Title | Matching visual induction effects on screens of different size | Type | Journal Article | ||
Year | 2021 | Publication | Journal of Vision | Abbreviated Journal | JOV |
Volume | 21 | Issue | 6(10) | Pages | 1-22 |
Keywords | |||||
Abstract ![]() |
In the film industry, the same movie is expected to be watched on displays of vastly different sizes, from cinema screens to mobile phones. But visual induction, the perceptual phenomenon by which the appearance of a scene region is affected by its surroundings, will be different for the same image shown on two displays of different dimensions. This phenomenon presents a practical challenge for the preservation of the artistic intentions of filmmakers, because it can lead to shifts in image appearance between viewing destinations. In this work, we show that a neural field model based on the efficient representation principle is able to predict induction effects and how, by regularizing its associated energy functional, the model is still able to represent induction but is now invertible. From this finding, we propose a method to preprocess an image in a screen–size dependent way so that its perception, in terms of visual induction, may remain constant across displays of different size. The potential of the method is demonstrated through psychophysical experiments on synthetic images and qualitative examples on natural images. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | CIC | Approved | no | ||
Call Number | Admin @ si @ CVM2021 | Serial | 3595 | ||
Permanent link to this record | |||||
Author | Yaxing Wang; Salman Khan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan | ||||
Title | Semi-supervised Learning for Few-shot Image-to-Image Translation | Type | Conference Article | ||
Year | 2020 | Publication | 33rd IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract ![]() |
In the last few years, unpaired image-to-image translation has witnessed remarkable progress. Although the latest methods are able to generate realistic images, they crucially rely on a large number of labeled images. Recently, some methods have tackled the challenging setting of few-shot image-to-image translation, reducing the labeled data requirements for the target domain during inference. In this work, we go one step further and reduce the amount of required labeled data also from the source domain during training. To do so, we propose applying semi-supervised learning via a noise-tolerant pseudo-labeling procedure. We also apply a cycle consistency constraint to further exploit the information from unlabeled images, either from the same dataset or external. Additionally, we propose several structural modifications to facilitate the image translation task under these circumstances. Our semi-supervised method for few-shot image translation, called SEMIT, achieves excellent results on four different datasets using as little as 10% of the source labels, and matches the performance of the main fully-supervised competitor using only 20% labeled data. Our code and models are made public at: this https URL. | ||||
Address | Virtual; June 2020 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP; 600.120 | Approved | no | ||
Call Number | Admin @ si @ WKG2020 | Serial | 3486 | ||
Permanent link to this record | |||||
Author | Marc Serra; Olivier Penacchio; Robert Benavente; Maria Vanrell | ||||
Title | Names and Shades of Color for Intrinsic Image Estimation | Type | Conference Article | ||
Year | 2012 | Publication | 25th IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 278-285 | ||
Keywords | |||||
Abstract ![]() |
In the last years, intrinsic image decomposition has gained attention. Most of the state-of-the-art methods are based on the assumption that reflectance changes come along with strong image edges. Recently, user intervention in the recovery problem has proved to be a remarkable source of improvement. In this paper, we propose a novel approach that aims to overcome the shortcomings of pure edge-based methods by introducing strong surface descriptors, such as the color-name descriptor which introduces high-level considerations resembling top-down intervention. We also use a second surface descriptor, termed color-shade, which allows us to include physical considerations derived from the image formation model capturing gradual color surface variations. Both color cues are combined by means of a Markov Random Field. The method is quantitatively tested on the MIT ground truth dataset using different error metrics, achieving state-of-the-art performance. | ||||
Address | Providence, Rhode Island | ||||
Corporate Author | Thesis | ||||
Publisher | IEEE Xplore | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1063-6919 | ISBN | 978-1-4673-1226-4 | Medium | |
Area | Expedition | Conference | CVPR | ||
Notes | CIC | Approved | no | ||
Call Number | Admin @ si @ SPB2012 | Serial | 2026 | ||
Permanent link to this record | |||||
Author | Manuel Carbonell; Alicia Fornes; Mauricio Villegas; Josep Llados | ||||
Title | A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition Letters | Abbreviated Journal | PRL |
Volume | 136 | Issue | Pages | 219-227 | |
Keywords | |||||
Abstract ![]() |
In the last years, the consolidation of deep neural network architectures for information extraction in document images has brought big improvements in the performance of each of the tasks involved in this process, consisting of text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propose an end-to-end model that combines a one stage object detection network with branches for the recognition of text and named entities respectively in a way that shared features can be learned simultaneously from the training error of each of the tasks. By doing so the model jointly performs handwritten text detection, transcription, and named entity recognition at page level with a single feed forward step. We exhaustively evaluate our approach on different datasets, discussing its advantages and limitations compared to sequential approaches. The results show that the model is capable of benefiting from shared features by simultaneously solving interdependent tasks. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.140; 601.311; 600.121 | Approved | no | ||
Call Number | Admin @ si @ CFV2020 | Serial | 3451 | ||
Permanent link to this record | |||||
Author | Arnau Baro; Pau Riba; Alicia Fornes | ||||
Title | A Starting Point for Handwritten Music Recognition | Type | Conference Article | ||
Year | 2018 | Publication | 1st International Workshop on Reading Music Systems | Abbreviated Journal | |
Volume | Issue | Pages | 5-6 | ||
Keywords | Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVCMUSCIMA | ||||
Abstract ![]() |
In the last years, the interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores by using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community. | ||||
Address | Paris; France; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | WORMS | ||
Notes | DAG; 600.097; 601.302; 601.330; 600.121 | Approved | no | ||
Call Number | Admin @ si @ BRF2018 | Serial | 3223 | ||
Permanent link to this record |