## Records
- **Author:** Patricia Suarez; Angel Sappa; Dario Carpio; Henry Velesaca; Francisca Burgos; Patricia Urdiales
- **Title:** Deep Learning Based Shrimp Classification
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** 17th International Symposium on Visual Computing
- **Volume:** 13598
- **Pages:** 36–45
- **Keywords:** Pigmentation; Color space; Lightweight network
- **Abstract:** This work proposes a novel deep learning approach to classify shrimp (Penaeus vannamei) into the two pigmentation classes accepted in shrimp commerce. The main goal of this study is to support the shrimp industry in terms of pricing and processing. An efficient CNN architecture is proposed to perform image classification through a program that could be deployed either on mobile devices or on fixed supports along the shrimp supply chain. The proposed approach is a lightweight model that uses shrimp images in the HSV color space. A simple pipeline shows the main stages performed to determine a pattern that identifies the class to which shrimp belong based on their pigmentation. For the experiments, a database of shrimp images acquired with mobile devices of various brands and models has been used. The results obtained with images in the RGB and HSV color spaces demonstrate the effectiveness of the proposed model.
- **Conference:** ISVC
- **Notes:** MSIAU; no proj
- **Approved:** no
- **Call Number:** Admin @ si @ SAC2022
- **Serial:** 3772

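The record above describes a model that works on shrimp images converted to the HSV color space. As a generic illustration of that conversion step (not the authors' code), a per-pixel RGB-to-HSV conversion can be done with Python's standard `colorsys` module; the function names here are made up for the example:

```python
import colorsys

def rgb_pixel_to_hsv(r, g, b):
    """Convert one 8-bit RGB pixel to HSV, with H in degrees [0, 360)
    and S, V in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

def image_to_hsv(pixels):
    """Convert an image given as a list of (r, g, b) tuples."""
    return [rgb_pixel_to_hsv(r, g, b) for (r, g, b) in pixels]

print(rgb_pixel_to_hsv(255, 0, 0))  # pure red -> (0.0, 1.0, 1.0)
```

In practice a vision library such as OpenCV would do this per image; the point is only that hue and saturation separate pigmentation from brightness, which is presumably why the HSV representation is useful here.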
---

- **Author:** Henry Velesaca; Patricia Suarez; Angel Sappa; Dario Carpio; Rafael E. Rivadeneira; Angel Sanchez
- **Title:** Review on Common Techniques for Urban Environment Video Analytics
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** Anais do III Workshop Brasileiro de Cidades Inteligentes
- **Pages:** 107-118
- **Keywords:** Video Analytics; Review; Urban Environments; Smart Cities
- **Abstract:** This work compiles the computer vision-based approaches from the state of the art intended for video analytics in urban environments. The manuscript groups the approaches according to the typical modules of a video analysis pipeline: image preprocessing, object detection, classification, and tracking. This pipeline serves as a basic guide to the most representative approaches addressed in this work. The manuscript is not intended as an exhaustive review of the most advanced approaches, but as a survey of common techniques proposed to address recurring problems in this field.
- **Conference:** WBCI
- **Notes:** MSIAU; 601.349
- **Approved:** no
- **Call Number:** Admin @ si @ VSS2022
- **Serial:** 3773

---

- **Author:** Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla
- **Title:** Thermal Image Super-Resolution: A Novel Unsupervised Approach
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** International Joint Conference on Computer Vision, Imaging and Computer Graphics
- **Volume:** 1474
- **Pages:** 495–506
- **Abstract:** This paper proposes the use of a CycleGAN architecture for thermal image super-resolution under a transfer-domain strategy, where middle-resolution images from one camera are transferred to the higher-resolution domain of another camera. The proposed approach is trained with a large dataset acquired using three thermal cameras at different resolutions, following an unsupervised learning process. An additional loss function is proposed to improve on state-of-the-art results. Evaluations are performed following the first thermal image super-resolution challenge (PBVS-CVPR2020). A comparison with previous works shows that the proposed approach achieves the best results.
- **Conference:** VISIGRAPP
- **Notes:** MSIAU; 600.130
- **Approved:** no
- **Call Number:** Admin @ si @ RSV2022d
- **Serial:** 3776

---

- **Author:** Ajian Liu; Chenxu Zhao; Zitong Yu; Jun Wan; Anyang Su; Xing Liu; Zichang Tan; Sergio Escalera; Junliang Xing; Yanyan Liang; Guodong Guo; Zhen Lei; Stan Z. Li; Shenshen Du
- **Title:** Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection
- **Type:** Journal Article
- **Year:** 2022
- **Publication:** IEEE Transactions on Information Forensics and Security
- **Abbreviated Journal:** TIForensicSEC
- **Volume:** 17
- **Pages:** 2497-2507
- **Abstract:** Face presentation attack detection (PAD) is essential to secure face recognition systems, primarily against high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, sensor types, and total videos; 2) low-fidelity quality of facial masks. Basic deep models and remote photoplethysmography (rPPG) methods achieve acceptable performance on these benchmarks but are still far from the needs of practical scenarios. To bridge the gap to real-world applications, we introduce a large-scale High-Fidelity Mask dataset, namely HiFiMask. Specifically, a total of 54,600 videos are recorded from 75 subjects with 225 realistic masks by 7 new kinds of sensors. Along with the dataset, we propose a novel Contrastive Context-aware Learning (CCL) framework. CCL is a new training methodology for supervised PAD tasks that learns by accurately leveraging rich contexts (e.g., subjects, mask material, and lighting) among pairs of live faces and high-fidelity mask attacks. Extensive experimental evaluations on HiFiMask and three additional 3D mask datasets demonstrate the effectiveness of our method. The codes and dataset will be released soon.
- **Publisher:** IEEE
- **Notes:** HuPBA
- **Approved:** no
- **Call Number:** Admin @ si @ LZY2022
- **Serial:** 3778

---

- **Author:** Andrea Gemelli; Sanket Biswas; Enrico Civitelli; Josep Llados; Simone Marinai
- **Title:** Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** 17th European Conference on Computer Vision Workshops
- **Volume:** 13804
- **Pages:** 329–344
- **Abstract:** Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
- **Series:** LNCS
- **ISBN:** 978-3-031-25068-2
- **Conference:** ECCV-TiE
- **Notes:** DAG; 600.162; 600.140; 110.312
- **Approved:** no
- **Call Number:** Admin @ si @ GBC2022
- **Serial:** 3795

---

- **Author:** Carles Onielfa; Carles Casacuberta; Sergio Escalera
- **Title:** Influence in Social Networks Through Visual Analysis of Image Memes
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** Artificial Intelligence Research and Development
- **Volume:** 356
- **Pages:** 71-80
- **Abstract:** Memes evolve and mutate through their diffusion in social media. They have the potential to propagate ideas and, by extension, products. Many studies have focused on memes, but none so far, to our knowledge, on the users that post them, their relationships, and the reach of their influence. In this article, we define a meme influence graph together with suitable metrics to visualize and quantify influence between users who post memes, and we also describe a process to implement our definitions using a new approach to meme detection based on text-to-image area ratio and contrast. After applying our method to a set of users of the social media platform Instagram, we conclude that our metrics add information to already existing user characteristics.
- **Notes:** HuPBA; no menciona
- **Approved:** no
- **Call Number:** Admin @ si @ OCE2022
- **Serial:** 3799

---

- **Author:** Smriti Joshi; Richard Osuala; Carlos Martin-Isla; Victor M. Campello; Carla Sendra-Balcells; Karim Lekadir; Sergio Escalera
- **Title:** nn-UNet Training on CycleGAN-Translated Images for Cross-modal Domain Adaptation in Biomedical Imaging
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** International MICCAI Brainlesion Workshop
- **Volume:** 12963
- **Pages:** 540–551
- **Keywords:** Domain adaptation; Vestibular schwannoma (VS); Deep learning; nn-UNet; CycleGAN
- **Abstract:** In recent years, deep learning models have considerably advanced the performance of segmentation tasks on brain Magnetic Resonance Imaging (MRI). However, these models show a considerable performance drop when they are evaluated on unseen data from a different distribution. Since annotation is often a hard and costly task requiring expert supervision, it is necessary to develop ways in which existing models can be adapted to unseen domains without any additional labelled information. In this work, we explore one such technique, which extends the CycleGAN [2] architecture to generate label-preserving data in the target domain. The synthetic target-domain data is used to train the nn-UNet [3] framework for the task of multi-label segmentation. The experiments are conducted and evaluated on the dataset [1] provided in the 'Cross-Modality Domain Adaptation for Medical Image Segmentation' challenge [23] for segmentation of the vestibular schwannoma (VS) tumour and the cochlea on contrast-enhanced (ceT1) and high-resolution (hrT2) MRI scans. In the proposed approach, our model obtains Dice scores (DSC) of 0.73 and 0.49 for the tumour and cochlea, respectively, on the validation set. This indicates the applicability of the proposed technique to real-world problems where data may be obtained by different acquisition protocols, as in [1], where hrT2 images are a more reliable, safer, and lower-cost alternative to ceT1.
- **Series:** LNCS
- **Conference:** MICCAIW
- **Notes:** HUPBA; no menciona
- **Approved:** no
- **Call Number:** Admin @ si @ JOM2022
- **Serial:** 3800

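For reference, the Dice scores (DSC) reported in the abstract above measure the overlap between a predicted and a ground-truth binary mask, DSC = 2|A∩B| / (|A| + |B|). A minimal sketch of that metric (not the challenge's evaluation code):

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks,
    given as equal-length sequences of 0/1 values."""
    pred = list(pred)
    target = list(target)
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return 2.0 * intersection / total

print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.6666666666666666
```

A DSC of 1.0 means perfect overlap and 0.0 means none, so the reported 0.73 (tumour) and 0.49 (cochlea) can be read directly on that scale.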
---

- **Author:** Dustin Carrion Ojeda; Hong Chen; Adrian El Baz; Sergio Escalera; Chaoyu Guan; Isabelle Guyon; Ihsan Ullah; Xin Wang; Wenwu Zhu
- **Title:** NeurIPS'22 Cross-Domain MetaDL competition: Design and baseline results
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** Understanding Social Behavior in Dyadic and Small Group Interactions
- **Volume:** 191
- **Pages:** 24-37
- **Abstract:** We present the design and baseline results for a new challenge in the ChaLearn meta-learning series, accepted at NeurIPS'22, focusing on "cross-domain" meta-learning. Meta-learning aims to leverage experience gained from previous tasks to solve new tasks efficiently (i.e., with better performance, little training data, and/or modest computational resources). While previous challenges in the series focused on within-domain few-shot learning problems, with the aim of learning efficiently N-way k-shot tasks (i.e., N-class classification problems with k training examples), this competition challenges the participants to solve "any-way" and "any-shot" problems drawn from various domains (healthcare, ecology, biology, manufacturing, and others), chosen for their humanitarian and societal impact. To that end, we created Meta-Album, a meta-dataset of 40 image classification datasets from 10 domains, from which we carve out tasks with any number of "ways" (within the range 2-20) and any number of "shots" (within the range 1-20). The competition uses code submission, fully blind-tested on the CodaLab challenge platform. The code of the winners will be open-sourced, enabling the deployment of automated machine learning solutions for few-shot image classification across several domains.
- **Conference:** PMLR
- **Notes:** HUPBA; no menciona
- **Approved:** no
- **Call Number:** Admin @ si @ CCB2022
- **Serial:** 3802

---

- **Author:** Michael Teutsch; Angel Sappa; Riad I. Hammoud
- **Title:** Cross-Spectral Image Processing
- **Type:** Book Chapter
- **Year:** 2022
- **Publication:** Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision
- **Pages:** 23-34
- **Abstract:** Although this book is on IR computer vision and its main focus lies on IR image and video processing and analysis, special attention is dedicated to cross-spectral image processing due to the increasing number of publications and applications in this domain. In these cross-spectral frameworks, IR information is used together with information from other spectral bands to tackle specific problems by developing more robust solutions. Tasks considered for cross-spectral processing include dehazing, segmentation, vegetation index estimation, and face recognition. This increasing number of applications is motivated by cross- and multi-spectral camera setups already available on the market, for example in smartphones, remote sensing multispectral cameras, and multi-spectral cameras for automotive systems or drones. In this chapter, different cross-spectral image processing techniques are reviewed together with possible applications. Initially, image registration approaches for the cross-spectral case are reviewed: the registration stage is the first image processing task, needed to align images acquired by different sensors within the same reference coordinate system. Then, recent cross-spectral image colorization approaches, intended to colorize infrared images for different applications, are presented. Finally, the cross-spectral image enhancement problem is tackled, including guided super-resolution techniques, image dehazing approaches, cross-spectral filtering, and edge detection. Figure 3.1 illustrates cross-spectral image processing stages as well as their possible connections. Table 3.1 presents some of the available public cross-spectral datasets generally used as reference data to evaluate cross-spectral image registration, colorization, enhancement, or exploitation results.
- **Publisher:** Springer
- **Series:** SLCV
- **ISBN:** 978-3-031-00698-2
- **Notes:** MSIAU; MACO
- **Approved:** no
- **Call Number:** Admin @ si @ TSH2022b
- **Serial:** 3805

---

- **Author:** Michael Teutsch; Angel Sappa; Riad I. Hammoud
- **Title:** Detection, Classification, and Tracking
- **Type:** Book Chapter
- **Year:** 2022
- **Publication:** Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision
- **Pages:** 35-58
- **Abstract:** Automatic image and video exploitation or content analysis is a technique to extract higher-level information from a scene such as objects, behavior, (inter-)actions, environment, or even weather conditions. The relevant information is assumed to be contained in the two-dimensional signal provided in an image (width and height in pixels) or the three-dimensional signal provided in a video (width, height, and time). But also intermediate-level information such as object classes [196], locations [197], or motion [198] can help applications to fulfill certain tasks such as intelligent compression [199], video summarization [200], or video retrieval [201]. Usually, videos with their temporal dimension are a richer source of data compared to single images [202] and thus certain video content can be extracted from videos only, such as object motion or object behavior. Often, machine learning or nowadays deep learning techniques are utilized to model prior knowledge about object or scene appearance using labeled training samples [203, 204]. After a learning phase, these models are then applied in real-world applications, which is called inference.
- **Publisher:** Springer
- **Series:** SLCV
- **ISBN:** 978-3-031-00698-2
- **Notes:** MSIAU; MACO
- **Approved:** no
- **Call Number:** Admin @ si @ TSH2022c
- **Serial:** 3806

---

- **Author:** Michael Teutsch; Angel Sappa; Riad I. Hammoud
- **Title:** Image and Video Enhancement
- **Type:** Book Chapter
- **Year:** 2022
- **Publication:** Computer Vision in the Infrared Spectrum. Synthesis Lectures on Computer Vision
- **Pages:** 9-21
- **Abstract:** Image and video enhancement aims at improving the signal quality relative to imaging artifacts such as noise and blur or atmospheric perturbations such as turbulence and haze. It is usually performed in order to assist humans in analyzing image and video content or simply to present humans visually appealing images and videos. However, image and video enhancement can also be used as a preprocessing technique to ease the task and thus improve the performance of subsequent automatic image content analysis algorithms: preceding dehazing can improve object detection as shown by [23], or explicit turbulence modeling can improve moving object detection as discussed by [24]. But it remains an open question whether image and video enhancement should rather be performed explicitly as a preprocessing step or implicitly, for example by feeding affected images directly to a neural network for image content analysis like object detection [25]. Especially for real-time video processing at low latency it can be better to handle image perturbation implicitly in order to minimize the processing time of an algorithm. This can be achieved by making algorithms for image content analysis robust or even invariant to perturbations such as noise or blur. Additionally, mistakes of an individual preprocessing module can obviously affect the quality of the entire processing pipeline.
- **Publisher:** Springer
- **Series:** SLCV
- **Notes:** MSIAU; MACO
- **Approved:** no
- **Call Number:** Admin @ si @ TSH2022a
- **Serial:** 3807

---

- **Author:** Guillermo Torres; Debora Gil; Antoni Rosell; S. Mena; Carles Sanchez
- **Title:** Virtual Radiomics Biopsy for the Histological Diagnosis of Pulmonary Nodules – Intermediate Results of the RadioLung Project
- **Type:** Journal Article
- **Year:** 2023
- **Publication:** International Journal of Computer Assisted Radiology and Surgery
- **Abbreviated Journal:** IJCARS
- **Notes:** IAM
- **Approved:** no
- **Call Number:** Admin @ si @ TGM2023
- **Serial:** 3830

---

- **Author:** Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
- **Title:** Local Prediction Aggregation: A Frustratingly Easy Source-free Domain Adaptation Method
- **Type:** Miscellaneous
- **Year:** 2022
- **Publication:** arXiv
- **Abstract:** We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA, which further shows the generalization ability of our method. Code is available in this https URL.
- **Notes:** LAMP; 600.147
- **Approved:** no
- **Call Number:** Admin @ si @ YWW2022b
- **Serial:** 3815

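The prediction-consistency intuition described in the abstract above can be illustrated with a toy sketch: for each sample, find its nearest neighbors in feature space and measure how well its class predictions agree with theirs (a training loss would maximize this agreement). This is only an illustration of the idea, not the paper's implementation, and all function names are made up:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def neighbor_consistency(features, predictions, k=1):
    """Average agreement (dot product) between each sample's class
    prediction and the predictions of its k nearest neighbors in
    feature space. Higher means more locally consistent predictions."""
    n = len(features)
    total = 0.0
    for i in range(n):
        # Rank all other samples by feature-space similarity.
        sims = sorted(((cosine(features[i], features[j]), j)
                       for j in range(n) if j != i), reverse=True)
        for _, j in sims[:k]:
            total += sum(p * q for p, q in zip(predictions[i], predictions[j]))
    return total / (n * k)

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
preds = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(neighbor_consistency(feats, preds))  # 0.6666666666666666
```

In the toy example the two nearby samples agree with each other while the isolated one does not, so the score sits between 0 and 1; the paper's actual objective also pushes far-away features toward dissimilar predictions.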
---

- **Author:** Kunal Biswas; Palaiahnakote Shivakumara; Umapada Pal; Tong Lu; Michel Blumenstein; Josep Llados
- **Title:** Classification of aesthetic natural scene images using statistical and semantic features
- **Type:** Journal Article
- **Year:** 2023
- **Publication:** Multimedia Tools and Applications
- **Abbreviated Journal:** MTAP
- **Volume:** 82
- **Issue:** 9
- **Pages:** 13507-13532
- **Abstract:** Aesthetic image analysis is essential for improving the performance of multimedia image retrieval systems, especially from a repository of social media and multimedia content stored on mobile devices. This paper presents a novel method for classifying aesthetic natural scene images by studying the naturalness of image content using statistical features, and reading text in the images using semantic features. Unlike existing methods that focus only on image quality with human information, the proposed approach focuses on image features as well as text-based semantic features without human intervention to reduce the gap between subjectivity and objectivity in the classification. The aesthetic classes considered in this work are (i) Very Pleasant, (ii) Pleasant, (iii) Normal and (iv) Unpleasant. The naturalness is represented by features of focus, defocus, perceived brightness, perceived contrast, blurriness and noisiness, while semantics are represented by text recognition, description of the images and labels of images, profile pictures, and banner images. Furthermore, a deep learning model is proposed in a novel way to fuse statistical and semantic features for the classification of aesthetic natural scene images. Experiments on our own dataset and the standard datasets demonstrate that the proposed approach achieves 92.74%, 88.67% and 83.22% average classification rates on our own dataset, AVA dataset and CUHKPQ dataset, respectively. Furthermore, a comparative study of the proposed model with the existing methods shows that the proposed method is effective for the classification of aesthetic social media images.
- **Notes:** DAG
- **Approved:** no
- **Call Number:** Admin @ si @ BSP2023
- **Serial:** 3873

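Two of the simplest statistical naturalness features named in the abstract above, perceived brightness and perceived contrast, are commonly approximated by mean intensity and RMS contrast over grayscale pixel values. A generic sketch under that assumption (not the paper's feature extractor; the function names are invented):

```python
import math

def mean_brightness(gray):
    """Mean of grayscale intensities (0-255 scale)."""
    return sum(gray) / len(gray)

def rms_contrast(gray):
    """RMS contrast: standard deviation of intensities normalized to [0, 1]."""
    norm = [g / 255.0 for g in gray]
    mu = sum(norm) / len(norm)
    return math.sqrt(sum((g - mu) ** 2 for g in norm) / len(norm))

pixels = [0, 64, 128, 192, 255]
print(mean_brightness(pixels))  # 127.8
print(rms_contrast(pixels))
```

Such scalar features would then be concatenated with the semantic features before being fed to a classifier, as the abstract's fusion model suggests.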
---

- **Author:** Ali Furkan Biten; Ruben Tito; Lluis Gomez; Ernest Valveny; Dimosthenis Karatzas
- **Title:** OCR-IDL: OCR Annotations for Industry Document Library Dataset
- **Type:** Conference Article
- **Year:** 2022
- **Publication:** ECCV Workshop on Text in Everything
- **Abstract:** Pretraining has proven successful in Document Intelligence tasks, where a deluge of documents is used to pretrain models that are only later finetuned on downstream tasks. One problem with these pretraining approaches is the inconsistent use of pretraining data with different OCR engines, leading to incomparable results between models. In other words, it is not obvious whether the performance gain comes from the diverse amount of data and distinct OCR engines or from the proposed models. To remedy the problem, we make public the OCR annotations for IDL documents, produced using a commercial OCR engine given its superior performance over open-source OCR models. The contributed dataset (OCR-IDL) has an estimated monetary value of over 20K US$. It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence. All of our data and its collection process with the annotations can be found in this https URL.
- **Conference:** ECCV
- **Notes:** DAG; no proj
- **Approved:** no
- **Call Number:** Admin @ si @ BTG2022
- **Serial:** 3817