|
Records |
Links |
|
Author |
Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features |
Type |
Conference Article |
|
Year |
2020 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval. |
|
|
Address |
Aspen; Colorado; USA; March 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MDB2020 |
Serial |
3334 |
|
Permanent link to this record |
|
|
|
|
Author |
N. Nayef; F. Yin; I. Bizid; H .Choi; Y. Feng; Dimosthenis Karatzas; Z. Luo; Umapada Pal; Christophe Rigaud; J. Chazalon; W. Khlif; Muhammad Muzzamil Luqman; Jean-Christophe Burie; C.L. Liu; Jean-Marc Ogier |
|
|
Title |
ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification – RRC-MLT |
Type |
Conference Article |
|
Year |
2017 |
Publication |
14th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1454-1459 |
|
|
Keywords |
|
|
|
Abstract |
Text detection and recognition in a natural environment are key components of many applications, ranging from business card digitization to shop indexation in a street. This competition aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text (MLT) in scene images, such as in contents gathered from the Internet media and in modern cities where multiple cultures live and communicate together. This competition is an extension of the Robust Reading Competition (RRC) which has been held since 2003 both in ICDAR and in an online context. The proposed competition is presented as a new challenge of the RRC. The dataset built for this challenge largely extends the previous RRC editions in many aspects: the multi-lingual text, the size of the dataset, the multi-oriented text, the wide variety of scenes. The dataset is comprised of 18,000 images which contain text belonging to 9 languages. The challenge is comprised of three tasks related to text detection and script classification. We have received a total of 16 participations from the research and industrial communities. This paper presents the dataset, the tasks and the findings of this RRC-MLT challenge. |
|
|
Address |
Kyoto; Japan; November 2017 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-5386-3586-5 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ NYB2017 |
Serial |
3097 |
|
Permanent link to this record |
|
|
|
|
Author |
Suman Ghosh |
|
|
Title |
Word Spotting and Recognition in Images from Heterogeneous Sources A |
Type |
Book Whole |
|
Year |
2018 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Text is the most common way of information sharing from ages. With recent development of personal images databases and handwritten historic manuscripts the demand for algorithms to make these databases accessible for browsing and indexing are in rise. Enabling search or understanding large collection of manuscripts or image databases needs fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which works well when words are already segmented. However there is no trivial way to extend these for non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different ways of representation exist in literature, one uses a fixed length representation learned from cropped words and another a sequence of features of variable length. Throughout this thesis, we have studied both these representation for their suitability in segmentation free understanding of text. In the first part we are focused on segmentation free word spotting using a fixed length representation. We extended the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation free retrieval. In the second part of the thesis, we explore sequence based features and finally, we propose a unified solution where the same framework can generate both kind of representations. |
|
|
Address |
November 2018 |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Ernest Valveny |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-948531-0-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Gho2018 |
Serial |
3217 |
|
Permanent link to this record |
|
|
|
|
Author |
David Fernandez; Josep Llados; Alicia Fornes |
|
|
Title |
A graph-based approach for segmenting touching lines in historical handwritten documents |
Type |
Journal Article |
|
Year |
2014 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
|
|
Volume |
17 |
Issue |
3 |
Pages |
293-312 |
|
|
Keywords |
Text line segmentation; Handwritten documents; Document image processing; Historical document analysis |
|
|
Abstract |
Text line segmentation in handwritten documents is an important task in the recognition of historical documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines and different document structures, making line segmentation a difficult task. In this paper, we present a new approach for handwritten text line segmentation solving the problems of touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem. A graph is constructed using the skeleton of the image. Then, a path-finding algorithm is used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, the George Washington and the Barcelona Marriages Database. The proposed method outperforms the state-of-the-art considering the different types and difficulties of the benchmarking data. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1433-2833 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.056; 600.061; 602.006; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ FLF2014 |
Serial |
2459 |
|
Permanent link to this record |
|
|
|
|
Author |
Dena Bazazian; Raul Gomez; Anguelos Nicolaou; Lluis Gomez; Dimosthenis Karatzas; Andrew Bagdanov |
|
|
Title |
Improving Text Proposals for Scene Images with Fully Convolutional Networks |
Type |
Conference Article |
|
Year |
2016 |
Publication |
23rd International Conference on Pattern Recognition Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Text Proposals have emerged as a class-dependent version of object proposals – efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text
recognition. In this paper we propose an improvement over the original Text Proposals algorithm of [1], combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art. |
|
|
Address |
Cancun; Mexico; December 2016 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPRW |
|
|
Notes |
DAG; LAMP; 600.084 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BGN2016 |
Serial |
2823 |
|
Permanent link to this record |
|
|
|
|
Author |
Dena Bazazian |
|
|
Title |
Fully Convolutional Networks for Text Understanding in Scene Images |
Type |
Book Whole |
|
Year |
2018 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Text understanding in scene images has gained plenty of attention in the computer vision community and it is an important task in many applications as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated since an inaccurate localization can affect the recognition task.
The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results on deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved a significant performance on semantic segmentation and pixel level classification tasks. Therefore, we took benefit of the strengths of FCN approaches in order to detect text in natural scenes. In this thesis we have focused on two challenging tasks of scene text understanding which are Text Detection and Word Spotting. For the task of text detection, we have proposed an efficient text proposal technique in scene images. We have considered the Text Proposals method as the baseline which is an approach to reduce the search space of possible text regions in an image. In order to improve the Text Proposals method we combined it with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same level of accuracy and thus gaining a significant speed up. Our experiments demonstrate that this text proposal approach yields significantly higher recall rates than the line based text localization techniques, while also producing better-quality localization. We have also applied this technique on compressed images such as videos from wearable egocentric cameras. For the task of word spotting, we have introduced a novel mid-level word representation method. We have proposed a technique to create and exploit an intermediate representation of images based on text attributes which roughly correspond to character probability maps. Our representation extends the concept of Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks through an efficient text line proposal algorithm. To evaluate the detected text, we propose a novel line based evaluation along with the classic bounding box based approach. We test our method on incidental scene text images which comprises real-life scenarios such as urban scenes. The importance of incidental scene text images is due to the complexity of backgrounds, perspective, variety of script and language, short text and little linguistic context. All of these factors together makes the incidental scene text images challenging. |
|
|
Address |
November 2018 |
|
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
|
|
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Dimosthenis Karatzas;Andrew Bagdanov |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-84-948531-1-1 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Baz2018 |
Serial |
3220 |
|
Permanent link to this record |
|
|
|
|
Author |
Ayan Banerjee; Palaiahnakote Shivakumara; Parikshit Acharya; Umapada Pal; Josep Llados |
|
|
Title |
TWD: A New Deep E2E Model for Text Watermark Detection in Video Images |
Type |
Conference Article |
|
Year |
2022 |
Publication |
26th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Deep learning; U-Net; FCENet; Scene text detection; Video text detection; Watermark text detection |
|
|
Abstract |
Text watermark detection in video images is challenging because text watermark characteristics are different from caption and scene texts in the video images. Developing a successful model for detecting text watermark, caption, and scene texts is an open challenge. This study aims at developing a new Deep End-to-End model for Text Watermark Detection (TWD), caption and scene text in video images. To standardize non-uniform contrast, quality, and resolution, we explore the U-Net3+ model for enhancing poor quality text without affecting high-quality text. Similarly, to address the challenges of arbitrary orientation, text shapes and complex background, we explore Stacked Hourglass Encoded Fourier Contour Embedding Network (SFCENet) by feeding the output of the U-Net3+ model as input. Furthermore, the proposed work integrates enhancement and detection models as an end-to-end model for detecting multi-type text in video images. To validate the proposed model, we create our own dataset (named TW-866), which provides video images containing text watermark, caption (subtitles), as well as scene text. The proposed model is also evaluated on standard natural scene text detection datasets, namely, ICDAR 2019 MLT, CTW1500, Total-Text, and DAST1500. The results show that the proposed method outperforms the existing methods. This is the first work on text watermark detection in video images to the best of our knowledge |
|
|
Address |
Montreal; Quebec; Canada; August 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; |
Approved |
no |
|
|
Call Number |
Admin @ si @ BSA2022 |
Serial |
3788 |
|
Permanent link to this record |
|
|
|
|
Author |
Lluis Gomez; Andres Mafla; Marçal Rusiñol; Dimosthenis Karatzas |
|
|
Title |
Single Shot Scene Text Retrieval |
Type |
Conference Article |
|
Year |
2018 |
Publication |
15th European Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
11218 |
Issue |
|
Pages |
728-744 |
|
|
Keywords |
Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposals Networks; PHOC |
|
|
Abstract |
Textual information found in scene images provides high level semantic information about the image and its context and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single shot CNN architecture that predicts at the same time bounding boxes and a compact text representation of the words in them. In this way, the text based image retrieval task can be casted as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image
database. Our experiments demonstrate that the proposed architecture
outperforms previous state-of-the-art while it offers a significant increase
in processing speed. |
|
|
Address |
Munich; September 2018 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECCV |
|
|
Notes |
DAG; 600.084; 601.338; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GMR2018 |
Serial |
3143 |
|
Permanent link to this record |
|
|
|
|
Author |
Alicia Fornes; Veronica Romero; Arnau Baro; Juan Ignacio Toledo; Joan Andreu Sanchez; Enrique Vidal; Josep Llados |
|
|
Title |
ICDAR2017 Competition on Information Extraction in Historical Handwritten Records |
Type |
Conference Article |
|
Year |
2017 |
Publication |
14th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1389-1394 |
|
|
Keywords |
|
|
|
Abstract |
The extraction of relevant information from historical handwritten document collections is one of the key steps in order to make these manuscripts available for access and searches. In this competition, the goal is to detect the named entities and assign each of them a semantic category, and therefore, to simulate the filling in of a knowledge database. This paper describes the dataset, the tasks, the evaluation metrics, the participants methods and the results. |
|
|
Address |
Kyoto; Japan; November 2017 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.097; 601.225; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ FRB2017 |
Serial |
3052 |
|
Permanent link to this record |
|
|
|
|
Author |
Dimosthenis Karatzas; Lluis Gomez; Marçal Rusiñol |
|
|
Title |
The Robust Reading Competition Annotation and Evaluation Platform |
Type |
Conference Article |
|
Year |
2017 |
Publication |
1st International Workshop on Open Services and Tools for Document Analysis |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become the defacto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation
of data, and to provide online and offline performance evaluation and analysis services |
|
|
Address |
Kyoto; Japan; November 2017 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR-OST |
|
|
Notes |
DAG; 600.084; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KGR2017 |
Serial |
3063 |
|
Permanent link to this record |