|
|
Author |
Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal |


|
|
Title |
SemiDocSeg: Harnessing Semi-Supervised Learning for Document Layout Analysis |
Type |
Journal Article |
|
Year |
2024 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
|
|
Volume  |
|
Issue |
|
Pages |
|
|
|
Keywords |
Document layout analysis; Semi-supervised learning; Co-Occurrence matrix; Instance segmentation; Swin transformer |
|
|
Abstract |
Document Layout Analysis (DLA) is the process of automatically identifying and categorizing the structural components (e.g., Text, Figure, Table) within a document in order to extract meaningful content and establish the page's layout structure. It is a crucial stage in document parsing and contributes to document comprehension. However, traditional DLA approaches often demand a large volume of labeled training data, and the labor-intensive task of generating high-quality annotated training data poses a substantial challenge. To address this challenge, we propose a semi-supervised setting that learns from a limited set of annotated categories, eliminating the need for exhaustive and expensive mask annotations. The proposed setting is expected to generalize to novel categories, as it learns the underlying positional information through a support set and class information through co-occurrence, both of which can be transferred from annotated categories to novel categories. First, we extract features from the input image and the support set with a shared multi-scale feature acquisition backbone. The extracted feature representation is then fed to the transformer encoder as a query. Next, a semantic embedding network placed before the decoder captures the underlying semantic relationships and similarities between different instances, enabling the model to make accurate predictions with only a limited amount of labeled data. Extensive experiments on competitive benchmarks such as PRIMA, DocLayNet, and Historical Japanese (HJ) demonstrate that this generalized setup achieves significant performance compared to the conventional supervised approach. |
|
|
Address |
June 2024 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ BBL2024a |
Serial |
4001 |
|
|
|
|
|
Author |
G.Thorvaldsen; Joana Maria Pujadas-Mora; T.Andersen; L.Eikvil; Josep Llados; Alicia Fornes; Anna Cabre |

|
|
Title |
A Tale of two Transcriptions |
Type |
Journal |
|
Year |
2015 |
Publication |
Historical Life Course Studies |
Abbreviated Journal |
|
|
|
Volume  |
2 |
Issue |
|
Pages |
1-19 |
|
|
Keywords |
Nominative Sources; Census; Vital Records; Computer Vision; Optical Character Recognition; Word Spotting |
|
|
Abstract |
non-indexed
This article explains how two projects implement semi-automated transcription routines: one for census sheets in Norway and one for marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books for the Barcelona area from 1451 to 1905, one of the world’s longest series of preserved vital records. Within the project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built: more than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional, as it is the 1891 census, recorded on one sheet per person. This format, together with the underlining of keywords for several variables, made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition remains a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere, the source material is available as scanned images on the Internet, opening up the possibility of further international cooperation on automating the transcription of historical source materials. As in projects that digitize printed materials, the optimal solution for handwritten sources is also likely to be a combination of manual transcription and machine-assisted recognition. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
2352-6343 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.077; 602.006 |
Approved |
no |
|
|
Call Number |
Admin @ si @ TPA2015 |
Serial |
2582 |
|
|
|
|
|
Author |
Carles Sanchez; Oriol Ramos Terrades; Patricia Marquez; Enric Marti; J.Roncaries; Debora Gil |

|
|
Title |
Automatic evaluation of practices in Moodle for Self Learning in Engineering |
Type |
Journal |
|
Year |
2015 |
Publication |
Journal of Technology and Science Education |
Abbreviated Journal |
JOTSE |
|
|
Volume  |
5 |
Issue |
2 |
Pages |
97-106 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
IAM; DAG; 600.075; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SRM2015 |
Serial |
2610 |
|
|
|
|
|
Author |
Marçal Rusiñol; R.Roset; Josep Llados; C.Montaner |

|
|
Title |
Automatic Index Generation of Digitized Map Series by Coordinate Extraction and Interpretation |
Type |
Journal |
|
Year |
2011 |
Publication |
e-Perimetron |
Abbreviated Journal |
ePER |
|
|
Volume  |
6 |
Issue |
4 |
Pages |
219-229 |
|
|
Keywords |
|
|
|
Abstract |
By means of computer vision algorithms, scanned images of maps are processed to extract relevant geographic information from printed coordinate pairs. This information is then transformed into georeferencing information for each individual map sheet, and the complete set is compiled to produce a graphical index sheet for the map series along with relevant metadata. The whole process is fully automated and trained to attain maximum effectiveness and throughput. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ RRL2011a |
Serial |
1765 |
|
|
|
|
|
Author |
Antonio Clavelli; Dimosthenis Karatzas; Josep Llados; Mario Ferraro; Giuseppe Boccignone |


|
|
Title |
Modelling task-dependent eye guidance to objects in pictures |
Type |
Journal Article |
|
Year |
2014 |
Publication |
Cognitive Computation |
Abbreviated Journal |
CoCom |
|
|
Volume  |
6 |
Issue |
3 |
Pages |
558-584 |
|
|
Keywords |
Visual attention; Gaze guidance; Value; Payoff; Stochastic fixation prediction |
|
|
Abstract |
5Y Impact Factor: 1.14 / 3rd (Computer Science, Artificial Intelligence)
We introduce a model of attentional eye guidance based on the rationale that the deployment of gaze should be considered in the context of a general action-perception loop relying on two strictly intertwined processes: sensory processing, which depends on the current gaze position, identifies the sources of information that are most valuable under the given task; motor processing links such information with the oculomotor act by sampling the next gaze position and thus performing the gaze shift. In such a framework, the choice of where to look next is task-dependent and oriented toward classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the payoff of gazing at certain image patches, or proto-objects, that provide a sparse representation of the scene objects. The different levels of the action-perception loop are represented in probabilistic form and eventually give rise to a stochastic process that generates the gaze sequence. In this way, the model also accounts for statistical properties of gaze shifts such as individual scan path variability. Results of the simulations are compared with experimental data derived both from publicly available datasets and from our own experiments. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer US |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1866-9956 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.056; 600.045; 605.203; 601.212; 600.077 |
Approved |
no |
|
|
Call Number |
Admin @ si @ CKL2014 |
Serial |
2419 |
|