Home | << 1 2 3 4 5 6 7 8 9 10 >> [11–11] |
Records | |||||
---|---|---|---|---|---|
Author | Maryam Asadi-Aghbolaghi; Albert Clapes; Marco Bellantonio; Hugo Jair Escalante; Victor Ponce; Xavier Baro; Isabelle Guyon; Shohreh Kasaei; Sergio Escalera | ||||
Title | A survey on deep learning based approaches for action and gesture recognition in image sequences | Type | Conference Article | ||
Year | 2017 | Publication | 12th IEEE International Conference on Automatic Face and Gesture Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | The interest in action and gesture recognition has grown considerably in the last years. In this paper, we present a survey on current deep learning methodologies for action and gesture recognition in image sequences. We introduce a taxonomy that summarizes important aspects of deep learning
for approaching both tasks. We review the details of the proposed architectures, fusion strategies, main datasets, and competitions. We summarize and discuss the main works proposed so far with particular interest on how they treat the temporal dimension of data, discussing their main features and identify opportunities and challenges for future research. |
||||
Address | Washington; USA; May 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | FG | ||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ ACB2017b | Serial | 2982 | ||
Permanent link to this record | |||||
Author | Bojana Gajic; Eduard Vazquez; Ramon Baldrich | ||||
Title | Evaluation of Deep Image Descriptors for Texture Retrieval | Type | Conference Article | ||
Year | 2017 | Publication | Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017) | Abbreviated Journal | |
Volume | Issue | Pages | 251-257 | ||
Keywords | Texture Representation; Texture Retrieval; Convolutional Neural Networks; Psychophysical Evaluation | ||||
Abstract | The increasing complexity learnt in the layers of a Convolutional Neural Network has proven to be of great help for the task of classification. The topic has received great attention in recently published literature.
Nonetheless, just a handful of works study low-level representations, commonly associated with lower layers. In this paper, we explore recent findings which conclude, counterintuitively, the last layer of the VGG convolutional network is the best to describe a low-level property such as texture. To shed some light on this issue, we are proposing a psychophysical experiment to evaluate the adequacy of different layers of the VGG network for texture retrieval. Results obtained suggest that, whereas the last convolutional layer is a good choice for a specific task of classification, it might not be the best choice as a texture descriptor, showing a very poor performance on texture retrieval. Intermediate layers show the best performance, showing a good combination of basic filters, as in the primary visual cortex, and also a degree of higher level information to describe more complex textures. |
||||
Address | Porto, Portugal; 27 February – 1 March 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | VISIGRAPP | ||
Notes | CIC; 600.087 | Approved | no | ||
Call Number | Admin @ si @ | Serial | 3710 | ||
Permanent link to this record | |||||
Author | Arash Akbarinia; Karl R. Gegenfurtner | ||||
Title | Metameric Mismatching in Natural and Artificial Reflectances | Type | Journal Article | ||
Year | 2017 | Publication | Journal of Vision | Abbreviated Journal | JV |
Volume | 17 | Issue | 10 | Pages | 390-390 |
Keywords | Metamer; colour perception; spectral discrimination; photoreceptors | ||||
Abstract | The human visual system and most digital cameras sample the continuous spectral power distribution through three classes of receptors. This implies that two distinct spectral reflectances can result in identical tristimulus values under one illuminant and differ under another – the problem of metamer mismatching. It is still debated how frequent this issue arises in the real world, using naturally occurring reflectance functions and common illuminants.
We gathered more than ten thousand spectral reflectance samples from various sources, covering a wide range of environments (e.g., flowers, plants, Munsell chips) and evaluated their responses under a number of natural and artificial source of lights. For each pair of reflectance functions, we estimated the perceived difference using the CIE-defined distance ΔE2000 metric in Lab color space. The degree of metamer mismatching depended on the lower threshold value l when two samples would be considered to lead to equal sensor excitations (ΔE < l), and on the higher threshold value h when they would be considered different. For example, for l=h=1, we found that 43.129 comparisons out of a total of 6×107 pairs would be considered metameric (1 in 104). For l=1 and h=5, this number reduced to 705 metameric pairs (2 in 106). Extreme metamers, for instance l=1 and h=10, were rare (22 pairs or 6 in 108), as were instances where the two members of a metameric pair would be assigned to different color categories. Not unexpectedly, we observed variations among different reflectance databases and illuminant spectra with more frequency under artificial illuminants than natural ones. Overall, our numbers are not very different from those obtained earlier (Foster et al, JOSA A, 2006). However, our results also show that the degree of metamerism is typically not very strong and that category switches hardly ever occur. |
||||
Address | Florida, USA; May 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | NEUROBIT; no menciona | Approved | no | ||
Call Number | Admin @ si @ AkG2017 | Serial | 2899 | ||
Permanent link to this record | |||||
Author | Pierdomenico Fiadino; Victor Ponce; Juan Antonio Torrero-Gonzalez; Marc Torrent-Moreno | ||||
Title | Call Detail Records for Human Mobility Studies: Taking Stock of the Situation in the “Always Connected Era" | Type | Conference Article | ||
Year | 2017 | Publication | Workshop on Big Data Analytics and Machine Learning for Data Communication Networks | Abbreviated Journal | |
Volume | Issue | Pages | 43-48 | ||
Keywords | mobile networks; call detail records; human mobility | ||||
Abstract | The exploitation of cellular network data for studying human mobility has been a popular research topic in the last decade. Indeed, mobile terminals could be considered ubiquitous sensors that allow the observation of human movements on large scale without the need of relying on non-scalable techniques, such as surveys, or dedicated and expensive monitoring infrastructures. In particular, Call Detail Records (CDRs), collected by operators for billing purposes,
have been extensively employed due to their rather large availability, compared to other types of cellular data (e.g., signaling). Despite the interest aroused around this topic, the research community has generally agreed about the scarcity of information provided by CDRs: the position of mobile terminals is logged when some kind of activity (calls, SMS, data connections) occurs, which translates in a picture of mobility somehow biased by the activity degree of users. By studying two datasets collected by a Nation-wide operator in 2014 and 2016, we show that the situation has drastically changed in terms of data volume and quality. The increase of flat data plans and the higher penetration of “ always connected” terminals have driven up the number of recorded CDRs, providing higher temporal accuracy for users’ locations. |
||||
Address | UCLA; USA; August 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-4503-5054-9 | Medium | ||
Area | Expedition | Conference | ACMW (SIGCOMM) | ||
Notes | HuPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ FPT2017 | Serial | 2980 | ||
Permanent link to this record | |||||
Author | Hugo Jair Escalante; Victor Ponce; Sergio Escalera; Xavier Baro; Alicia Morales-Reyes; Jose Martinez-Carranza | ||||
Title | Evolving weighting schemes for the Bag of Visual Words | Type | Journal Article | ||
Year | 2017 | Publication | Neural Computing and Applications | Abbreviated Journal | Neural Computing and Applications |
Volume | 28 | Issue | 5 | Pages | 925–939 |
Keywords | Bag of Visual Words; Bag of features; Genetic programming; Term-weighting schemes; Computer vision | ||||
Abstract | The Bag of Visual Words (BoVW) is an established representation in computer vision. Taking inspiration from text mining, this representation has proved
to be very effective in many domains. However, in most cases, standard term-weighting schemes are adopted (e.g.,term-frequency or TF-IDF). It remains open the question of whether alternative weighting schemes could boost the performance of methods based on BoVW. More importantly, it is unknown whether it is possible to automatically learn and determine effective weighting schemes from scratch. This paper brings some light into both of these unknowns. On the one hand, we report an evaluation of the most common weighting schemes used in text mining, but rarely used in computer vision tasks. Besides, we propose an evolutionary algorithm capable of automatically learning weighting schemes for computer vision problems. We report empirical results of an extensive study in several computer vision problems. Results show the usefulness of the proposed method. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | Springer | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA;MV; no menciona | Approved | no | ||
Call Number | Admin @ si @ EPE2017 | Serial | 2743 | ||
Permanent link to this record | |||||
Author | Patricia Suarez; Angel Sappa; Boris X. Vintimilla | ||||
Title | Cross-Spectral Image Patch Similarity using Convolutional Neural Network | Type | Conference Article | ||
Year | 2017 | Publication | IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | The ability to compare image regions (patches) has been the basis of many approaches to core computer vision problems, including object, texture and scene categorization. Hence, developing representations for image patches have been of interest in several works. The current work focuses on learning similarity between cross-spectral image patches with a 2 channel convolutional neural network (CNN) model. The proposed approach is an adaptation of a previous work, trying to obtain similar results than the state of the art but with a lowcost hardware. Hence, obtained results are compared with both
classical approaches, showing improvements, and a state of the art CNN based approach. |
||||
Address | San Sebastian; Spain; May 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECMSM | ||
Notes | ADAS; 600.086; 600.118 | Approved | no | ||
Call Number | Admin @ si @ SSV2017a | Serial | 2916 | ||
Permanent link to this record | |||||
Author | Dimosthenis Karatzas; Lluis Gomez; Marçal Rusiñol | ||||
Title | The Robust Reading Competition Annotation and Evaluation Platform | Type | Conference Article | ||
Year | 2017 | Publication | 1st International Workshop on Open Services and Tools for Document Analysis | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become the defacto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation
of data, and to provide online and offline performance evaluation and analysis services |
||||
Address | Kyoto; Japan; November 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR-OST | ||
Notes | DAG; 600.084; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ KGR2017 | Serial | 3063 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Veronica Romero; Arnau Baro; Juan Ignacio Toledo; Joan Andreu Sanchez; Enrique Vidal; Josep Llados | ||||
Title | ICDAR2017 Competition on Information Extraction in Historical Handwritten Records | Type | Conference Article | ||
Year | 2017 | Publication | 14th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1389-1394 | ||
Keywords | |||||
Abstract | The extraction of relevant information from historical handwritten document collections is one of the key steps in order to make these manuscripts available for access and searches. In this competition, the goal is to detect the named entities and assign each of them a semantic category, and therefore, to simulate the filling in of a knowledge database. This paper describes the dataset, the tasks, the evaluation metrics, the participants methods and the results. | ||||
Address | Kyoto; Japan; November 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.097; 601.225; 600.121 | Approved | no | ||
Call Number | Admin @ si @ FRB2017 | Serial | 3052 | ||
Permanent link to this record | |||||
Author | N. Nayef; F. Yin; I. Bizid; H .Choi; Y. Feng; Dimosthenis Karatzas; Z. Luo; Umapada Pal; Christophe Rigaud; J. Chazalon; W. Khlif; Muhammad Muzzamil Luqman; Jean-Christophe Burie; C.L. Liu; Jean-Marc Ogier | ||||
Title | ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification – RRC-MLT | Type | Conference Article | ||
Year | 2017 | Publication | 14th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1454-1459 | ||
Keywords | |||||
Abstract | Text detection and recognition in a natural environment are key components of many applications, ranging from business card digitization to shop indexation in a street. This competition aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text (MLT) in scene images, such as in contents gathered from the Internet media and in modern cities where multiple cultures live and communicate together. This competition is an extension of the Robust Reading Competition (RRC) which has been held since 2003 both in ICDAR and in an online context. The proposed competition is presented as a new challenge of the RRC. The dataset built for this challenge largely extends the previous RRC editions in many aspects: the multi-lingual text, the size of the dataset, the multi-oriented text, the wide variety of scenes. The dataset is comprised of 18,000 images which contain text belonging to 9 languages. The challenge is comprised of three tasks related to text detection and script classification. We have received a total of 16 participations from the research and industrial communities. This paper presents the dataset, the tasks and the findings of this RRC-MLT challenge. | ||||
Address | Kyoto; Japan; November 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-5386-3586-5 | Medium | ||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.121 | Approved | no | ||
Call Number | Admin @ si @ NYB2017 | Serial | 3097 | ||
Permanent link to this record | |||||
Author | Antonio Lopez; Jiaolong Xu; Jose Luis Gomez; David Vazquez; German Ros | ||||
Title | From Virtual to Real World Visual Perception using Domain Adaptation -- The DPM as Example | Type | Book Chapter | ||
Year | 2017 | Publication | Domain Adaptation in Computer Vision Applications | Abbreviated Journal | |
Volume | Issue | 13 | Pages | 243-258 | |
Keywords | Domain Adaptation | ||||
Abstract | Supervised learning tends to produce more accurate classifiers than unsupervised learning in general. This implies that training data is preferred with annotations. When addressing visual perception challenges, such as localizing certain object classes within an image, the learning of the involved classifiers turns out to be a practical bottleneck. The reason is that, at least, we have to frame object examples with bounding boxes in thousands of images. A priori, the more complex the model is regarding its number of parameters, the more annotated examples are required. This annotation task is performed by human oracles, which ends up in inaccuracies and errors in the annotations (aka ground truth) since the task is inherently very cumbersome and sometimes ambiguous. As an alternative we have pioneered the use of virtual worlds for collecting such annotations automatically and with high precision. However, since the models learned with virtual data must operate in the real world, we still need to perform domain adaptation (DA). In this chapter we revisit the DA of a deformable part-based model (DPM) as an exemplifying case of virtual- to-real-world DA. As a use case, we address the challenge of vehicle detection for driver assistance, using different publicly available virtual-world data. While doing so, we investigate questions such as: how does the domain gap behave due to virtual-vs-real data with respect to dominant object appearance per domain, as well as the role of photo-realism in the virtual world. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer | Place of Publication | Editor | Gabriela Csurka | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.085; 601.223; 600.076; 600.118 | Approved | no | ||
Call Number | ADAS @ adas @ LXG2017 | Serial | 2872 | ||
Permanent link to this record | |||||
Author | Antonio Lopez; Atsushi Imiya; Tomas Pajdla; Jose Manuel Alvarez | ||||
Title | Computer Vision in Vehicle Technology: Land, Sea & Air | Type | Book Whole | ||
Year | 2017 | Publication | Abbreviated Journal | ||
Volume | Issue | Pages | 161-163 | ||
Keywords | |||||
Abstract | Summary This chapter examines different vision-based commercial solutions for real-live problems related to vehicles. It is worth mentioning the recent astonishing performance of deep convolutional neural networks (DCNNs) in difficult visual tasks such as image classification, object recognition/localization/detection, and semantic segmentation. In fact,
different DCNN architectures are already being explored for low-level tasks such as optical flow and disparity computation, and higher level ones such as place recognition. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | John Wiley & Sons, Ltd | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-118-86807-2 | Medium | ||
Area | Expedition | Conference | |||
Notes | ADAS; 600.118 | Approved | no | ||
Call Number | Admin @ si @ LIP2017a | Serial | 2937 | ||
Permanent link to this record | |||||
Author | Meysam Madadi; Sergio Escalera; Alex Carruesco; Carlos Andujar; Xavier Baro; Jordi Gonzalez | ||||
Title | Occlusion Aware Hand Pose Recovery from Sequences of Depth Images | Type | Conference Article | ||
Year | 2017 | Publication | 12th IEEE International Conference on Automatic Face and Gesture Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | State-of-the-art approaches on hand pose estimation from depth images have reported promising results under quite controlled considerations. In this paper we propose a two-step pipeline for recovering the hand pose from a sequence of depth images. The pipeline has been designed to deal with images taken from any viewpoint and exhibiting a high degree of finger occlusion. In a first step we initialize the hand pose using a part-based model, fitting a set of hand components in the depth images. In a second step we consider temporal data and estimate the parameters of a trained bilinear model consisting of shape and trajectory bases. Results on a synthetic, highly-occluded dataset demonstrate that the proposed method outperforms most recent pose recovering approaches, including those based on CNNs. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | FG | ||
Notes | HUPBA; ISE; 602.143; 600.098; 600.119 | Approved | no | ||
Call Number | Admin @ si @ MEC2017 | Serial | 2970 | ||
Permanent link to this record | |||||
Author | Simon Jégou; Michal Drozdzal; David Vazquez; Adriana Romero; Yoshua Bengio | ||||
Title | The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation | Type | Conference Article | ||
Year | 2017 | Publication | IEEE Conference on Computer Vision and Pattern Recognition Workshops | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Semantic Segmentation | ||||
Abstract | State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.
Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module nor pretraining. Moreover, due to smart construction of the model, our approach has much less parameters than currently published best entries for these datasets. |
||||
Address | Honolulu; USA; July 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPRW | ||
Notes | MILAB; ADAS; 600.076; 600.085; 601.281 | Approved | no | ||
Call Number | ADAS @ adas @ JDV2016 | Serial | 2866 | ||
Permanent link to this record | |||||
Author | Xinhang Song; Luis Herranz; Shuqiang Jiang | ||||
Title | Depth CNNs for RGB-D Scene Recognition: Learning from Scratch Better than Transferring from RGB-CNNs | Type | Conference Article | ||
Year | 2017 | Publication | 31st AAAI Conference on Artificial Intelligence | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | RGB-D scene recognition; weakly supervised; fine tune; CNN | ||||
Abstract | Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural networks (CNN) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so often leverages RGB large datasets, by transferring pretrained RGB CNN models and fine-tuning with the target RGB-D dataset. However, we show that this approach has the limitation of hardly reaching bottom layers, which is key to learn modality-specific features. In contrast, we focus on the bottom layers, and propose an alternative strategy to learn depth features combining local weakly supervised training from patches followed by global fine tuning with images. This strategy is capable of learning very discriminative depth-specific features with limited depth images, without resorting to Places-CNN. In addition we propose a modified CNN architecture to further match the complexity of the model and the amount of data available. For RGB-D scene recognition, depth and RGB features are combined by projecting them in a common space and further leaning a multilayer classifier, which is jointly optimized in an end-to-end network. Our framework achieves state-of-the-art accuracy on NYU2 and SUN RGB-D in both depth only and combined RGB-D data. | ||||
Address | San Francisco CA; February 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | AAAI | ||
Notes | LAMP; 600.120 | Approved | no | ||
Call Number | Admin @ si @ SHJ2017 | Serial | 2967 | ||
Permanent link to this record | |||||
Author | Masakazu Iwamura; Naoyuki Morimoto; Keishi Tainaka; Dena Bazazian; Lluis Gomez; Dimosthenis Karatzas | ||||
Title | ICDAR2017 Robust Reading Challenge on Omnidirectional Video | Type | Conference Article | ||
Year | 2017 | Publication | 14th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Results of ICDAR 2017 Robust Reading Challenge on Omnidirectional Video are presented. This competition uses Downtown Osaka Scene Text (DOST) Dataset that was captured in Osaka, Japan with an omnidirectional camera. Hence, it consists of sequential images (videos) of different view angles. Regarding the sequential images as videos (video mode), two tasks of localisation and end-to-end recognition are prepared. Regarding them as a set of still images (still image mode), three tasks of localisation, cropped word recognition and end-to-end recognition are prepared. As the dataset has been captured in Japan, the dataset contains Japanese text but also include text consisting of alphanumeric characters (Latin text). Hence, a submitted result for each task is evaluated in three ways: using Japanese only ground truth (GT), using Latin only GT and using combined GTs of both. Finally, by the submission deadline, we have received two submissions in the text localisation task of the still image mode. We intend to continue the competition in the open mode. Expecting further submissions, in this report we provide baseline results in all the tasks in addition to the submissions from the community. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG; 600.084; 600.121 | Approved | no | ||
Call Number | Admin @ si @ IMT2017 | Serial | 3077 | ||
Permanent link to this record |