|
Records |
Links |
|
Author |
Ilke Demir; Dena Bazazian; Adriana Romero; Viktoriia Sharmanska; Lyne P. Tchapmi |
|
|
Title |
WiCV 2018: The Fourth Women In Computer Vision Workshop |
Type |
Conference Article |
|
Year |
2018 |
Publication |
4th Women in Computer Vision Workshop |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1941-19412 |
|
|
Keywords |
Conferences; Computer vision; Industries; Object recognition; Engineering profession; Collaboration; Machine learning |
|
|
Abstract |
We present WiCV 2018 – Women in Computer Vision Workshop to increase the visibility and inclusion of women researchers in computer vision field, organized in conjunction with CVPR 2018. Computer vision and machine learning have made incredible progress over the past years, yet the number of female researchers is still low both in academia and industry. WiCV is organized to raise visibility of female researchers, to increase the collaboration,
and to provide mentorship and give opportunities to femaleidentifying junior researchers in the field. In its fourth year, we are proud to present the changes and improvements over the past years, summary of statistics for presenters and attendees, followed by expectations from future generations. |
|
|
Address |
Salt Lake City; USA; June 2018 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WiCV |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ DBR2018 |
Serial |
3222 |
|
Permanent link to this record |
|
|
|
|
Author |
Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue |
2 |
Pages |
1940-1948 |
|
|
Keywords |
|
|
|
Abstract |
Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model. |
|
|
Address |
Washington; USA; February 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ NBM2023 |
Serial |
3860 |
|
Permanent link to this record |
|
|
|
|
Author |
Albert Gordo; Florent Perronnin |
|
|
Title |
A Bag-of-Pages Approach to Unordered Multi-Page Document Classification |
Type |
Conference Article |
|
Year |
2010 |
Publication |
20th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1920–1923 |
|
|
Keywords |
|
|
|
Abstract |
We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system. |
|
|
Address |
Istanbul (Turkey) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1051-4651 |
ISBN |
978-1-4244-7542-1 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ GoP2010 |
Serial |
1480 |
|
Permanent link to this record |
|
|
|
|
Author |
Albert Gordo; Florent Perronnin; Ernest Valveny |
|
|
Title |
Large-scale document image retrieval and classification with runlength histograms and binary embeddings |
Type |
Journal Article |
|
Year |
2013 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
46 |
Issue |
7 |
Pages |
1898-1905 |
|
|
Keywords |
visual document descriptor; compression; large-scale; retrieval; classification |
|
|
Abstract |
We present a new document image descriptor based on multi-scale runlength
histograms. This descriptor does not rely on layout analysis and can be
computed efficiently. We show how this descriptor can achieve state-of-theart
results on two very different public datasets in classification and retrieval
tasks. Moreover, we show how we can compress and binarize these descriptors
to make them suitable for large-scale applications. We can achieve state-ofthe-
art results in classification using binary descriptors of as few as 16 to 64
bits. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Elsevier |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0031-3203 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.042; 600.045; 605.203 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GPV2013 |
Serial |
2306 |
|
Permanent link to this record |
|
|
|
|
Author |
Dena Bazazian; Dimosthenis Karatzas; Andrew Bagdanov |
|
|
Title |
Word Spotting in Scene Images based on Character Recognition |
Type |
Conference Article |
|
Year |
2018 |
Publication |
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1872-1874 |
|
|
Keywords |
|
|
|
Abstract |
In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images. |
|
|
Address |
Salt Lake City; USA; June 2018 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
CVPRW |
|
|
Notes |
DAG; 600.129; 600.121 |
Approved |
no |
|
|
Call Number |
BKB2018a |
Serial |
3179 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Sanket Biswas; Sana Khamekhem Jemni; Yousri Kessentini; Alicia Fornes; Josep Llados; Umapada Pal |
|
|
Title |
DocEnTr: An End-to-End Document Image Enhancement Transformer |
Type |
Conference Article |
|
Year |
2022 |
Publication |
26th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1699-1705 |
|
|
Keywords |
Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads |
|
|
Abstract |
Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR |
|
|
Address |
August 21-25, 2022 , Montréal Québec |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; 600.121; 600.162; 602.230; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SBJ2022 |
Serial |
3730 |
|
Permanent link to this record |
|
|
|
|
Author |
Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar |
|
|
Title |
InfographicVQA |
Type |
Conference Article |
|
Year |
2022 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1697-1706 |
|
|
Keywords |
Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages |
|
|
Abstract |
Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two Transformer-based strong baselines. Both the baselines yield unsatisfactory results compared to near perfect human performance on the dataset. The results suggest that VQA on infographics--images that are designed to communicate information quickly and clearly to human brain--is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa. org |
|
|
Address |
Virtual; Waikoloa; Hawai; USA; January 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.155 |
Approved |
no |
|
|
Call Number |
MBT2022 |
Serial |
3625 |
|
Permanent link to this record |
|
|
|
|
Author |
Palaiahnakote Shivakumara; Anjan Dutta; Trung Quy Phan; Chew Lim Tan; Umapada Pal |
|
|
Title |
A Novel Mutual Nearest Neighbor based Symmetry for Text Frame Classification in Video |
Type |
Journal Article |
|
Year |
2011 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
44 |
Issue |
8 |
Pages |
1671-1683 |
|
|
Keywords |
|
|
|
Abstract |
In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median moment with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max–Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called mutual nearest neighbor based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to know whether a block is a true text block or not. If a frame produces at least one true text block then it is considered as a text frame otherwise it is a non-text frame. Experimental results on different text and non-text datasets including two public datasets and our own created data show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in term of recall and precision at both the block and frame levels. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ SDP2011 |
Serial |
1727 |
|
Permanent link to this record |
|
|
|
|
Author |
Anjan Dutta; Jaume Gibert; Josep Llados; Horst Bunke; Umapada Pal |
|
|
Title |
Combination of Product Graph and Random Walk Kernel for Symbol Spotting in Graphical Documents |
Type |
Conference Article |
|
Year |
2012 |
Publication |
21st International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1663-1666 |
|
|
Keywords |
|
|
|
Abstract |
This paper explores the utilization of product graph for spotting symbols on graphical documents. Product graph is intended to find the candidate subgraphs or components in the input graph containing the paths similar to the query graph. The acute angle between two edges and their length ratio are considered as the node labels. In a second step, each of the candidate subgraphs in the input graph is assigned with a distance measure computed by a random walk kernel. Actually it is the minimum of the distances of the component to all the components of the model graph. This distance measure is then used to eliminate dissimilar components. The remaining neighboring components are grouped and the grouped zone is considered as a retrieval zone of a symbol similar to the queried one. The entire method works online, i.e., it doesn't need any preprocessing step. The present paper reports the initial results of the method, which are very encouraging. |
|
|
Address |
Tsukuba, Japan |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1051-4651 |
ISBN |
978-1-4673-2216-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ DGL2012 |
Serial |
2125 |
|
Permanent link to this record |
|
|
|
|
Author |
Veronica Romero; Alicia Fornes; Nicolas Serrano; Joan Andreu Sanchez; A.H. Toselli; Volkmar Frinken; E. Vidal; Josep Llados |
|
|
Title |
The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition |
Type |
Journal Article |
|
Year |
2013 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
46 |
Issue |
6 |
Pages |
1658-1669 |
|
|
Keywords |
|
|
|
Abstract |
Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demography studies and genealogical research. Automatic processing of historical documents, however, has mostly been focused on single works of literature and less on social records, which tend to have a distinct layout, structure, and vocabulary. Such information is usually collected by expert demographers that devote a lot of time to manually transcribe them. This paper presents a new database, compiled from a marriage license books collection, to support research in automatic handwriting recognition for historical documents containing social records. Marriage license books are documents that were used for centuries by ecclesiastical institutions to register marriage licenses. Books from this collection are handwritten and span nearly half a millennium until the beginning of the 20th century. In addition, a study is presented about the capability of state-of-the-art handwritten text recognition systems, when applied to the presented database. Baseline results are reported for reference in future studies. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Elsevier Science Inc. New York, NY, USA |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0031-3203 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.045; 602.006; 605.203 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RFS2013 |
Serial |
2298 |
|
Permanent link to this record |