Home | [21–30] << 31 32 33 34 35 36 37 38 39 40 >> [41–50] |
Records | |||||
---|---|---|---|---|---|
Author | Hannes Mueller; Andre Groeger; Jonathan Hersh; Andrea Matranga; Joan Serrat | ||||
Title | Monitoring war destruction from space using machine learning | Type | Journal Article | ||
Year | 2021 | Publication | Proceedings of the National Academy of Sciences of the United States of America | Abbreviated Journal | PNAS |
Volume | 118 | Issue | 23 | Pages | e2025400118 |
Keywords | |||||
Abstract | Existing data on building destruction in conflict zones rely on eyewitness reports or manual detection, which makes it generally scarce, incomplete, and potentially biased. This lack of reliable data imposes severe limitations for media reporting, humanitarian relief efforts, human-rights monitoring, reconstruction initiatives, and academic studies of violent conflict. This article introduces an automated method of measuring destruction in high-resolution satellite images using deep-learning techniques combined with label augmentation and spatial and temporal smoothing, which exploit the underlying spatial and temporal structure of destruction. As a proof of concept, we apply this method to the Syrian civil war and reconstruct the evolution of damage in major cities across the country. Our approach allows generating destruction data with unprecedented scope, resolution, and frequency—and makes use of the ever-higher frequency at which satellite imagery becomes available. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.118 | Approved | no | ||
Call Number | Admin @ si @ MGH2021 | Serial | 3584 | ||
Permanent link to this record | |||||
Author | Bhaskar Chakraborty; Jordi Gonzalez; Xavier Roca | ||||
Title | Large scale continuous visual event recognition using max-margin Hough transformation framework | Type | Journal Article | ||
Year | 2013 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | CVIU |
Volume | 117 | Issue | 10 | Pages | 1356–1368 |
Keywords | |||||
Abstract | In this paper we propose a novel method for continuous visual event recognition (CVER) on a large scale video dataset using max-margin Hough transformation framework. Due to high scalability, diverse real environmental state and wide scene variability direct application of action recognition/detection methods such as spatio-temporal interest point (STIP)-local feature based technique, on the whole dataset is practically infeasible. To address this problem, we apply a motion region extraction technique which is based on motion segmentation and region clustering to identify possible candidate “event of interest” as a preprocessing step. On these candidate regions a STIP detector is applied and local motion features are computed. For activity representation we use generalized Hough transform framework where each feature point casts a weighted vote for possible activity class centre. A max-margin frame work is applied to learn the feature codebook weight. For activity detection, peaks in the Hough voting space are taken into account and initial event hypothesis is generated using the spatio-temporal information of the participating STIPs. For event recognition a verification Support Vector Machine is used. An extensive evaluation on benchmark large scale video surveillance dataset (VIRAT) and as well on a small scale benchmark dataset (MSR) shows that the proposed method is applicable on a wide range of continuous visual event recognition applications having extremely challenging conditions. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1077-3142 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ CGR2013 | Serial | 2413 | ||
Permanent link to this record | |||||
Author | Bhaskar Chakraborty; Michael Holte; Thomas B. Moeslund; Jordi Gonzalez | ||||
Title | Selective Spatio-Temporal Interest Points | Type | Journal Article | ||
Year | 2012 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | CVIU |
Volume | 116 | Issue | 3 | Pages | 396-410 |
Keywords | |||||
Abstract | Recent progress in the field of human action recognition points towards the use of Spatio-TemporalInterestPoints (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH), validates our approach and show state-of-the-art performance. Due to the unavailability of ground truth action annotation data for the Multi-KTH dataset, we introduce an actor specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in the realistic scenario, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1077-3142 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ CHM2012 | Serial | 1806 | ||
Permanent link to this record | |||||
Author | Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu | ||||
Title | Low-dimensional and Comprehensive Color Texture Description | Type | Journal Article | ||
Year | 2012 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | CVIU |
Volume | 116 | Issue | I | Pages | 54-67 |
Keywords | |||||
Abstract | Image retrieval can be dealt by combining standard descriptors, such as those of MPEG-7, which are defined independently for each visual cue (e.g. SCD or CLD for Color, HTD for texture or EHD for edges).
A common problem is to combine similarities coming from descriptors representing different concepts in different spaces. In this paper we propose a color texture description that bypasses this problem from its inherent definition. It is based on a low dimensional space with 6 perceptual axes. Texture is described in a 3D space derived from a direct implementation of the original Julesz’s Texton theory and color is described in a 3D perceptual space. This early fusion through the blob concept in these two bounded spaces avoids the problem and allows us to derive a sparse color-texture descriptor that achieves similar performance compared to MPEG-7 in image retrieval. Moreover, our descriptor presents comprehensive qualities since it can also be applied either in segmentation or browsing: (a) a dense image representation is defined from the descriptor showing a reasonable performance in locating texture patterns included in complex images; and (b) a vocabulary of basic terms is derived to build an intermediate level descriptor in natural language improving browsing by bridging semantic gap |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1077-3142 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | CAT;CIC | Approved | no | ||
Call Number | Admin @ si @ ASV2012 | Serial | 1827 | ||
Permanent link to this record | |||||
Author | Jordi Gonzalez; Thomas B. Moeslund; Liang Wang | ||||
Title | Semantic Understanding of Human Behaviors in Image Sequences: From video-surveillance to video-hermeneutics | Type | Journal Article | ||
Year | 2012 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | CVIU |
Volume | 116 | Issue | 3 | Pages | 305–306 |
Keywords | |||||
Abstract | Purpose: Atheromatic plaque progression is affected, among others phenomena, by biomechanical, biochemical, and physiological factors. In this paper, the authors introduce a novel framework able to provide both morphological (vessel radius, plaque thickness, and type) and biomechanical (wall shear stress and Von Mises stress) indices of coronary arteries.Methods: First, the approach reconstructs the three-dimensional morphology of the vessel from intravascular ultrasound (IVUS) and Angiographic sequences, requiring minimal user interaction. Then, a computational pipeline allows to automatically assess fluid-dynamic and mechanical indices. Ten coronary arteries are analyzed illustrating the capabilities of the tool and confirming previous technical and clinical observations.Results: The relations between the arterial indices obtained by IVUS measurement and simulations have been quantitatively analyzed along the whole surface of the artery, extending the analysis of the coronary arteries shown in previous state of the art studies. Additionally, for the first time in the literature, the framework allows the computation of the membrane stresses using a simplified mechanical model of the arterial wall.Conclusions: Circumferentially (within a given frame), statistical analysis shows an inverse relation between the wall shear stress and the plaque thickness. At the global level (comparing a frame within the entire vessel), it is observed that heavy plaque accumulations are in general calcified and are located in the areas of the vessel having high wall shear stress. Finally, in their experiments the inverse proportionality between fluid and structural stresses is observed. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1077-3142 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ GMW2012 | Serial | 2005 | ||
Permanent link to this record | |||||
Author | Miquel Ferrer; Dimosthenis Karatzas; Ernest Valveny; I. Bardaji; Horst Bunke | ||||
Title | A Generic Framework for Median Graph Computation based on a Recursive Embedding Approach | Type | Journal Article | ||
Year | 2011 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | CVIU |
Volume | 115 | Issue | 7 | Pages | 919-928 |
Keywords | Median Graph, Graph Embedding, Graph Matching, Structural Pattern Recognition | ||||
Abstract | The median graph has been shown to be a good choice to obtain a represen- tative of a set of graphs. However, its computation is a complex problem. Recently, graph embedding into vector spaces has been proposed to obtain approximations of the median graph. The problem with such an approach is how to go from a point in the vector space back to a graph in the graph space. The main contribution of this paper is the generalization of this previ- ous method, proposing a generic recursive procedure that permits to recover the graph corresponding to a point in the vector space, introducing only the amount of approximation inherent to the use of graph matching algorithms. In order to evaluate the proposed method, we compare it with the set me- dian and with the other state-of-the-art embedding-based methods for the median graph computation. The experiments are carried out using four dif- ferent databases (one semi-artificial and three containing real-world data). Results show that with the proposed approach we can obtain better medi- ans, in terms of the sum of distances to the training graphs, than with the previous existing methods. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | IAM @ iam @ FKV2011 | Serial | 1831 | ||
Permanent link to this record | |||||
Author | David Geronimo; Angel Sappa; Daniel Ponsa; Antonio Lopez | ||||
Title | 2D-3D based on-board pedestrian detection system | Type | Journal Article | ||
Year | 2010 | Publication | Computer Vision and Image Understanding | Abbreviated Journal | CVIU |
Volume | 114 | Issue | 5 | Pages | 583–595 |
Keywords | Pedestrian detection; Advanced Driver Assistance Systems; Horizon line; Haar wavelets; Edge orientation histograms | ||||
Abstract | During the next decade, on-board pedestrian detection systems will play a key role in the challenge of increasing traffic safety. The main target of these systems, to detect pedestrians in urban scenarios, implies overcoming difficulties like processing outdoor scenes from a mobile platform and searching for aspect-changing objects in cluttered environments. This makes such systems combine techniques in the state-of-the-art Computer Vision. In this paper we present a three module system based on both 2D and 3D cues. The first module uses 3D information to estimate the road plane parameters and thus select a coherent set of regions of interest (ROIs) to be further analyzed. The second module uses Real AdaBoost and a combined set of Haar wavelets and edge orientation histograms to classify the incoming ROIs as pedestrian or non-pedestrian. The final module loops again with the 3D cue in order to verify the classified ROIs and with the 2D in order to refine the final results. According to the results, the integration of the proposed techniques gives rise to a promising system. | ||||
Address | Computer Vision and Image Understanding (Special Issue on Intelligent Vision Systems), Vol. 114(5):583-595 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1077-3142 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ADAS | Approved | no | ||
Call Number | ADAS @ adas @ GSP2010 | Serial | 1341 | ||
Permanent link to this record | |||||
Author | Fernando Vilariño; Debora Gil; Petia Radeva | ||||
Title | A Novel FLDA Formulation for Numerical Stability Analysis | Type | Book Chapter | ||
Year | 2004 | Publication | Recent Advances in Artificial Intelligence Research and Development | Abbreviated Journal | |
Volume | 113 | Issue | Pages | 77-84 | |
Keywords | Supervised Learning; Linear Discriminant Analysis; Numerical Stability; Computer Vision | ||||
Abstract | Fisher Linear Discriminant Analysis (FLDA) is one of the most popular techniques used in classification applying dimensional reduction. The numerical scheme involves the inversion of the within-class scatter matrix, which makes FLDA potentially ill-conditioned when it becomes singular. In this paper we present a novel explicit formulation of FLDA in terms of the eccentricity ratio and eigenvector orientations of the within-class scatter matrix. An analysis of this function will characterize those situations where FLDA response is not reliable because of numerical instability. This can solve common situations of poor classification performance in computer vision. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | IOS Press | Place of Publication | Editor | J. Vitrià, P. Radeva and I. Aguiló | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-58603-466-5 | Medium | ||
Area | Expedition | Conference | |||
Notes | MV;IAM;MILAB;SIAI | Approved | no | ||
Call Number | IAM @ iam @ VGR2004 | Serial | 1663 | ||
Permanent link to this record | |||||
Author | Thanh Nam Le; Muhammad Muzzamil Luqman; Anjan Dutta; Pierre Heroux; Christophe Rigaud; Clement Guerin; Pasquale Foggia; Jean Christophe Burie; Jean Marc Ogier; Josep Llados; Sebastien Adam | ||||
Title | Subgraph spotting in graph representations of comic book images | Type | Journal Article | ||
Year | 2018 | Publication | Pattern Recognition Letters | Abbreviated Journal | PRL |
Volume | 112 | Issue | Pages | 118-124 | |
Keywords | Attributed graph; Region adjacency graph; Graph matching; Graph isomorphism; Subgraph isomorphism; Subgraph spotting; Graph indexing; Graph retrieval; Query by example; Dataset and comic book images | ||||
Abstract | Graph-based representations are the most powerful data structures for extracting, representing and preserving the structural information of underlying data. Subgraph spotting is an interesting research problem, especially for studying and investigating the structural information based content-based image retrieval (CBIR) and query by example (QBE) in image databases. In this paper we address the problem of lack of freely available ground-truthed datasets for subgraph spotting and present a new dataset for subgraph spotting in graph representations of comic book images (SSGCI) with its ground-truth and evaluation protocol. Experimental results of two state-of-the-art methods of subgraph spotting are presented on the new SSGCI dataset. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ LLD2018 | Serial | 3150 | ||
Permanent link to this record | |||||
Author | Juan Jose Rubio; Takahiro Kashiwa; Teera Laiteerapong; Wenlong Deng; Kohei Nagai; Sergio Escalera; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger | ||||
Title | Multi-class structural damage segmentation using fully convolutional networks | Type | Journal Article | ||
Year | 2019 | Publication | Computers in Industry | Abbreviated Journal | COMPUTIND |
Volume | 112 | Issue | Pages | 103121 | |
Keywords | Bridge damage detection; Deep learning; Semantic segmentation | ||||
Abstract | Structural Health Monitoring (SHM) has benefited from computer vision and more recently, Deep Learning approaches, to accurately estimate the state of deterioration of infrastructure. In our work, we test Fully Convolutional Networks (FCNs) with a dataset of deck areas of bridges for damage segmentation. We create a dataset for delamination and rebar exposure that has been collected from inspection records of bridges in Niigata Prefecture, Japan. The dataset consists of 734 images with three labels per image, which makes it the largest dataset of images of bridge deck damage. This data allows us to estimate the performance of our method based on regions of agreement, which emulates the uncertainty of in-field inspections. We demonstrate the practicality of FCNs to perform automated semantic segmentation of surface damages. Our model achieves a mean accuracy of 89.7% for delamination and 78.4% for rebar exposure, and a weighted F1 score of 81.9%. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ RKL2019 | Serial | 3315 | ||
Permanent link to this record | |||||
Author | Lei Kang; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol | ||||
Title | Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture | Type | Journal Article | ||
Year | 2021 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 112 | Issue | Pages | 107790 | |
Keywords | |||||
Abstract | Sequence-to-sequence models have recently become very popular for tackling
handwritten word recognition problems. However, how to effectively integrate an external language model into such recognizer is still a challenging problem. The main challenge faced when training a language model is to deal with the language model corpus which is usually different to the one used for training the handwritten word recognition system. Thus, the bias between both word corpora leads to incorrectness on the transcriptions, providing similar or even worse performances on the recognition task. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model to a sequence-to-sequence architecture. Moreover, it provides suggestions from an external language knowledge, as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the sequence-to-sequence recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose the importance of the information provided by the language model. On the other hand, the external language model has the ability to adapt itself to the training corpus and even learn the most commonly errors produced from the recognizer. Finally, by conducting comprehensive experiments, the Candidate Fusion proves to outperform the state-of-the-art language models for handwritten word recognition tasks. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.140; 601.302; 601.312; 600.121 | Approved | no | ||
Call Number | Admin @ si @ KRV2021 | Serial | 3343 | ||
Permanent link to this record | |||||
Author | Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca | ||||
Title | Camera pose estimation in multi-view environments: From virtual scenarios to the real world | Type | Journal Article | ||
Year | 2021 | Publication | Image and Vision Computing | Abbreviated Journal | IVC |
Volume | 110 | Issue | Pages | 104182 | |
Keywords | |||||
Abstract | This paper presents a domain adaptation strategy to efficiently train network architectures for estimating the relative camera pose in multi-view scenarios. The network architectures are fed by a pair of simultaneously acquired images, hence in order to improve the accuracy of the solutions, and due to the lack of large datasets with pairs of overlapped images, a domain adaptation strategy is proposed. The domain adaptation strategy consists on transferring the knowledge learned from synthetic images to real-world scenarios. For this, the networks are firstly trained using pairs of synthetic images, which are captured at the same time by a pair of cameras in a virtual environment; and then, the learned weights of the networks are transferred to the real-world case, where the networks are retrained with a few real images. Different virtual 3D scenarios are generated to evaluate the relationship between the accuracy on the result and the similarity between virtual and real scenarios—similarity on both geometry of the objects contained in the scene as well as relative pose between camera and objects in the scene. Experimental results and comparisons are provided showing that the accuracy of all the evaluated networks for estimating the camera pose improves when the proposed domain adaptation strategy is used, highlighting the importance on the similarity between virtual-real scenarios. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU; 600.130; 600.122 | Approved | no | ||
Call Number | Admin @ si @ CSV2021 | Serial | 3577 | ||
Permanent link to this record | |||||
Author | Wenlong Deng; Yongli Mou; Takahiro Kashiwa; Sergio Escalera; Kohei Nagai; Kotaro Nakayama; Yutaka Matsuo; Helmut Prendinger | ||||
Title | Vision based Pixel-level Bridge Structural Damage Detection Using a Link ASPP Network | Type | Journal Article | ||
Year | 2020 | Publication | Automation in Construction | Abbreviated Journal | AC |
Volume | 110 | Issue | Pages | 102973 | |
Keywords | Semantic image segmentation; Deep learning | ||||
Abstract | Structural Health Monitoring (SHM) has greatly benefited from computer vision. Recently, deep learning approaches are widely used to accurately estimate the state of deterioration of infrastructure. In this work, we focus on the problem of bridge surface structural damage detection, such as delamination and rebar exposure. It is well known that the quality of a deep learning model is highly dependent on the quality of the training dataset. Bridge damage detection, our application domain, has the following main challenges: (i) labeling the damages requires knowledgeable civil engineering professionals, which makes it difficult to collect a large annotated dataset; (ii) the damage area could be very small, whereas the background area is large, which creates an unbalanced training environment; (iii) due to the difficulty to exactly determine the extension of the damage, there is often a variation among different labelers who perform pixel-wise labeling. In this paper, we propose a novel model for bridge structural damage detection to address the first two challenges. This paper follows the idea of an atrous spatial pyramid pooling (ASPP) module that is designed as a novel network for bridge damage detection. Further, we introduce the weight balanced Intersection over Union (IoU) loss function to achieve accurate segmentation on a highly unbalanced small dataset. The experimental results show that (i) the IoU loss function improves the overall performance of damage detection, as compared to cross entropy loss or focal loss, and (ii) the proposed model has a better ability to detect a minority class than other light segmentation networks. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ DMK2020 | Serial | 3314 | ||
Permanent link to this record | |||||
Author | Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | Real-time Lexicon-free Scene Text Retrieval | Type | Journal Article | ||
Year | 2021 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 110 | Issue | Pages | 107656 | |
Keywords | |||||
Abstract | In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121; 600.129; 601.338 | Approved | no | ||
Call Number | Admin @ si @ MTD2021 | Serial | 3493 | ||
Permanent link to this record | |||||
Author | Meysam Madadi; Hugo Bertiche; Sergio Escalera | ||||
Title | SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 106 | Issue | Pages | 107472 | |
Keywords | Deep learning; 3D Human pose; Body shape; SMPL; Denoising autoencoder; Volumetric stack hourglass | ||||
Abstract | In this paper we propose to embed SMPL within a deep-based model to accurately estimate 3D pose and shape from a still RGB image. We use CNN-based 3D joint predictions as an intermediate representation to regress SMPL pose and shape parameters. Later, 3D joints are reconstructed again in the SMPL output. This module can be seen as an autoencoder where the encoder is a deep neural network and the decoder is SMPL model. We refer to this as SMPL reverse (SMPLR). By implementing SMPLR as an encoder-decoder we avoid the need of complex constraints on pose and shape. Furthermore, given that in-the-wild datasets usually lack accurate 3D annotations, it is desirable to lift 2D joints to 3D without pairing 3D annotations with RGB images. Therefore, we also propose a denoising autoencoder (DAE) module between CNN and SMPLR, able to lift 2D joints to 3D and partially recover from structured error. We evaluate our method on SURREAL and Human3.6M datasets, showing improvement over SMPL-based state-of-the-art alternatives by about 4 and 12 mm, respectively. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ MBE2020 | Serial | 3439 | ||
Permanent link to this record |