Records
Author Souhail Bakkali; Sanket Biswas; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades; Josep Llados
Title TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language Type (up) Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, they rely on an extensive amount of document data to learn their pretext objectives in a "pre-train-then-fine-tune" paradigm and thus, suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on OCR engines to extract local positional information within a document page. Therefore, this hinders the model's generalizability, flexibility and robustness due to the lack of capturing global information within a document image. We introduce TransferDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives. TransferDoc learns richer semantic concepts by unifying language and visual representations, which enables the production of more transferable models. Besides, two novel downstream tasks have been introduced for a "closer-to-real" industrial evaluation scenario where TransferDoc outperforms other state-of-the-art approaches.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ BBM2023 Serial 3995
Permanent link to this record
 

 
Author Ruben Perez Tito; Khanh Nguyen; Marlon Tobaben; Raouf Kerkouche; Mohamed Ali Souibgui; Kangsoo Jung; Lei Kang; Ernest Valveny; Antti Honkela; Mario Fritz; Dimosthenis Karatzas
Title Privacy-Aware Document Visual Question Answering Type (up) Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ PNT2023 Serial 4012
Permanent link to this record
 

 
Author Daniel Marczak; Sebastian Cygert; Tomasz Trzcinski; Bartlomiej Twardowski
Title Revisiting Supervision for Continual Representation Learning Type (up) Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, recent studies have highlighted the strengths of self-supervised continual representation learning. The improved transferability of representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models, when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes xxx Approved no
Call Number Admin @ si @ MCT2023 Serial 4013
Permanent link to this record
 

 
Author Jose Luis Gomez; Manuel Silva; Antonio Seoane; Agnes Borras; Mario Noriega; German Ros; Jose Antonio Iglesias; Antonio Lopez
Title All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes Type (up) Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract We introduce UrbanSyn, a photorealistic dataset acquired through semi-procedurally generated synthetic urban driving scenarios. Developed using high-quality geometry and materials, UrbanSyn provides pixel-level ground truth, including depth, semantic segmentation, and instance segmentation with object bounding boxes and occlusion degree. It complements GTAV and Synscapes datasets to form what we coin as the 'Three Musketeers'. We demonstrate the value of the Three Musketeers in unsupervised domain adaptation for image semantic segmentation. Results on real-world datasets, Cityscapes, Mapillary Vistas, and BDD100K, establish new benchmarks, largely attributed to UrbanSyn. We make UrbanSyn openly and freely accessible (this http URL).
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ GSS2023 Serial 4015
Permanent link to this record
 

 
Author Mustafa Hajij; Mathilde Papillon; Florian Frantzen; Jens Agerberg; Ibrahem AlJabea; Ruben Ballester; Claudio Battiloro; Guillermo Bernardez; Tolga Birdal; Aiden Brent; Peter Chin; Sergio Escalera; Simone Fiorellino; Odin Hoff Gardaa; Gurusankar Gopalakrishnan; Devendra Govil; Josef Hoppe; Maneel Reddy Karri; Jude Khouja; Manuel Lecha; Neal Livesay; Jan Meibner; Soham Mukherjee; Alexander Nikitin; Theodore Papamarkou; Jaro Prilepok; Karthikeyan Natesan Ramamurthy; Paul Rosen; Aldo Guzman-Saenz; Alessandro Salatiello; Shreyas N. Samaga; Simone Scardapane; Michael T. Schaub; Luca Scofano; Indro Spinelli; Lev Telyatnikov; Quang Truong; Robin Walters; Maosheng Yang; Olga Zaghen; Ghada Zamzmi; Ali Zia; Nina Miolane
Title TopoX: A Suite of Python Packages for Machine Learning on Topological Domains Type (up) Miscellaneous
Year 2024 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at this https URL.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ HPF2024 Serial 4021
Permanent link to this record
 

 
Author German Barquero; Sergio Escalera; Cristina Palmero
Title Seamless Human Motion Composition with Blended Positional Encodings Type (up) Miscellaneous
Year 2024 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ BEP2024 Serial 4022
Permanent link to this record
 

 
Author Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal
Title GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation Type (up) Miscellaneous
Year 2024 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constrained devices. Knowledge distillation allows us to create small and more efficient models that retain much of the performance of their larger counterparts. Here we present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image. Here, we design a structured graph with nodes containing proposal-level features and edges representing the relationship between the different proposal regions. Also, to reduce text bias an adaptive node sampling strategy is designed to prune the weight distribution and put more weightage on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss by effectively capturing both local and global information concurrently. Extensive experimentation on competitive benchmarks demonstrates that the proposed framework outperforms the current state-of-the-art approaches. The code will be available at: this https URL.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ BBL2024b Serial 4023
Permanent link to this record
 

 
Author Petia Radeva; Jordi Vitria; Fernando Vilariño; Panagiota Spyridonos; Fernando Azpiroz; Juan Malagelada; Fosca de Iorio; Anna Accarino
Title Cascade analysis for intestinal contraction detection Type (up) Patent
Year 2009 Publication US 2009/0284589 A1 Abbreviated Journal USPO
Volume Issue Pages 1-25
Keywords
Abstract A method and system cascade analysis for intestinal contraction detection is provided by extracting from image frames captured in-vivo. The method and system also relate to the detection of turbid liquids in intestinal tracts, to automatic detection of video image frames taken in the gastrointestinal tract including a field of view obstructed by turbid media, and more particularly, to extraction of image data obstructed by turbid media.
Address
Corporate Author US Patent Office Thesis
Publisher US Patent Office Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; OR; MV;SIAI Approved no
Call Number IAM @ iam @ RVV2009 Serial 1700
Permanent link to this record
 

 
Author Panagiota Spyridonos; Fernando Vilariño; Jordi Vitria; Petia Radeva; Fernando Azpiroz; Juan Malagelada
Title Device, system and method for automatic detection of contractile activity in an image frame Type (up) Patent
Year 2011 Publication US 2011/0044515 A1 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract A device, system and method for automatic detection of contractile activity of a body lumen in an image frame is provided, wherein image frames during contractile activity are captured and/or image frames including contractile activity are automatically detected, such as through pattern recognition and/or feature extraction to trace image frames including contractions, e.g., with wrinkle patterns. A manual procedure of annotation of contractions, e.g. tonic contractions in capsule endoscopy, may consist of the visualization of the whole video by a specialist, and the labeling of the contraction frames. Embodiments of the present invention may be suitable for implementation in an in vivo imaging system.
Address Pearl Cohen Zedek Latzer, LLP, 1500 Broadway 12th Floor, New York (NY) 10036 (US)
Corporate Author US Patent Office Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MV;OR;MILAB;SIAI Approved no
Call Number IAM @ iam @ SVV2011 Serial 1701
Permanent link to this record
 

 
Author Fernando Vilariño; Panagiota Spyridonos; Petia Radeva; Jordi Vitria; Fernando Azpiroz; Juan Malagelada
Title Method for automatic classification of in vivo images Type (up) Patent
Year 2010 Publication US 2010/0046816 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract A method for automatically detecting a post-duodenal boundary in an image stream of the gastrointestinal (GI) tract. The image stream is sampled to obtain a reduced set of images for processing. The reduced set of images is filtered to remove non-valid frames or non-valid portions of frames, thereby generating a filtered set of valid images. A polar representation of the valid images is generated. Textural features of the polar representation are processed to detect the post-duodenal boundary of the GI tract.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area 800 Expedition Conference
Notes MV;OR;MILAB;SIAI Approved no
Call Number IAM @ iam @ VSR2010 Serial 1702
Permanent link to this record
 

 
Author Gerard Lacey; Fernando Vilariño
Title Endoscopy system with motion sensors Type (up) Patent
Year 2011 Publication US 2011/0032347 A1 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract An endoscopy system (1) comprises an endoscope (2) with a camera (3) at its tip. The endoscope extends through an endoscope guide (4) for guiding movement of the endoscope and for measurement of its movement as it enters the body. The guide (4) comprises a generally conical body (5) having a through passage (105) through which the endoscope (2) extends. A motion sensor comprises an optical transmitter (7) and a detector (8) mounted alongside the passage (105) to measure the insertion-withdrawal linear motion and also rotation of the endoscope by the endoscopist's hand. The system (1) also comprises a flexure controller (10) having wheels operated by the endoscopist. The camera (3), the motion sensor (7/8), and the flexure controller (10) are all connected to a processor (11) which feeds a display.
Address Jacobson Holman PLLC; 400 Seventh Street, N.W. Suite 600; Washington DC 20004 DC
Corporate Author USPTO Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area 800 Expedition Conference
Notes MV;SIAI Approved no
Call Number IAM @ iam @ LaV2011 Serial 1703
Permanent link to this record
 

 
Author Fernando Vilariño; Panagiota Spyridonos; Petia Radeva; Jordi Vitria; Fernando Azpiroz; Juan Malagelada
Title Device, system and method for measurement and analysis of contractile activity Type (up) Patent
Year 2009 Publication US 2009/0202117 A1 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract A method and system for determining intestinal dysfunction condition are provided by classifying and analyzing image frames captured in-vivo. The method and system also relate to the detection of contractile activity in intestinal tracts, to automatic detection of video image frames taken in the gastrointestinal tract including contractile activity, and more particularly to measurement and analysis of contractile activity of the GI tract based on image intensity of in vivo image data.
Address Pearl Cohen Zedek Latzer
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area 800 Expedition Conference
Notes MV;OR;MILAB;SIAI Approved no
Call Number IAM @ iam @ VSR2009 Serial 1704
Permanent link to this record
 

 
Author Michal Drozdzal; Petia Radeva; Santiago Segui; Laura Igual; Carolina Malagelada; Fernando Azpiroz; Jordi Vitria
Title System and Method for Improving a Discriminative Model Type (up) Patent
Year 2012 Publication US 61/450,886 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Given Imaging
Corporate Author US Patent Office Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; OR;MV Approved no
Call Number Admin @ si @ DRS2012a Serial 1896
Permanent link to this record
 

 
Author Francesc Tanarro Marquez; Pau Gratacos Marti; F. Javier Sanchez; Joan Ramon Jimenez Minguell; Coen Antens; Enric Sala i Esteva
Title A device for monitoring condition of a railway supply Type (up) Patent
Year 2012 Publication EP 2 404 777 A1 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The device is intended to monitor condition of a railway supply line when the supply line is in contact with a head of a pantograph of a vehicle in order to power said vehicle. The device includes a camera for monitoring parameters indicative of operating capability of said supply line. The device includes a reflective element, comprising a pattern, intended to be arranged onto the pantograph head. The camera is intended to be arranged on the vehicle (10) so as to register the pattern position regarding a vertical direction.
Address
Corporate Author ALSTOM Transport SA Thesis
Publisher European Patent Office Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MV Approved no
Call Number IAM @ iam @ MMS2012 Serial 1854
Permanent link to this record
 

 
Author Michal Drozdzal; Santiago Segui; Petia Radeva; Jordi Vitria; Laura Igual
Title System and Method for Displaying Motility Events in an in Vivo Image Stream Type (up) Patent
Year 2011 Publication US 61/592,786 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Given Imaging
Corporate Author US Patent Office Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB; OR;MV Approved no
Call Number Admin @ si @ DSR2011 Serial 1897
Permanent link to this record