Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, et al. (2023). TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language.
Abstract: The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, these strategies rely on an extensive amount of document data to learn their pretext objectives in a "pre-train-then-fine-tune" paradigm and thus suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on OCR engines to extract local positional information within a document page. This hinders the model's generalizability, flexibility and robustness, because global information within the document image is not captured. We introduce TransferDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives. TransferDoc learns richer semantic concepts by unifying language and visual representations, which enables the production of more transferable models. In addition, two novel downstream tasks are introduced for a "closer-to-real" industrial evaluation scenario, where TransferDoc outperforms other state-of-the-art approaches.
|
G.Blasco, Simone Balocco, J.Puig, J.Sanchez-Gonzalez, W.Ricart, J.Daunis-I-Estadella, et al. (2015). Carotid pulse wave velocity by magnetic resonance imaging is increased in middle-aged subjects with the metabolic syndrome. ICJI - International Journal of Cardiovascular Imaging, 31(3), 603–612.
Abstract: Arterial pulse wave velocity (PWV), an independent predictor of cardiovascular disease, physiologically increases with age; however, growing evidence suggests metabolic syndrome (MetS) accelerates this increase. Magnetic resonance imaging (MRI) enables reliable noninvasive assessment of arterial stiffness by measuring arterial PWV in specific vascular segments. We investigated the association between the presence of MetS and its components with carotid PWV (cPWV) in asymptomatic subjects without diabetes. We assessed cPWV by MRI in 61 individuals (mean age, 55.3 ± 14.1 years; median age, 55 years): 30 with MetS and 31 controls with similar age, sex, body mass index, and LDL-cholesterol levels. The study population was dichotomized by the median age. To remove the physiological association between PWV and age, unpaired t tests and multiple regression analyses were performed using the residuals of the regression between PWV and age. cPWV was higher in middle-aged subjects with MetS than in those without (p = 0.001), but no differences were found in elder subjects (p = 0.313). cPWV was associated with diastolic blood pressure (r = 0.276, p = 0.033) and waist circumference (r = 0.268, p = 0.038). The presence of MetS was associated with increased cPWV regardless of age, sex, blood pressure, and waist (p = 0.007). The MetS components contributing independently to an increased cPWV were hypertension (p = 0.018) and hypertriglyceridemia (p = 0.002). The presence of MetS is associated with an increased cPWV in middle-aged subjects. In particular, hypertension and hypertriglyceridemia may contribute to early progression of carotid stiffness.
Keywords: Metabolic syndrome; Arterial stiffness; Pulse wave velocity; Carotid artery; Magnetic resonance
|
C. Butakoff, Simone Balocco, F.M. Sukno, C. Hoogendoorn, C. Tobon-Gomez, G. Avegliano, et al. (2016). Left-ventricular Epi- and Endocardium Extraction from 3D Ultrasound Images Using an Automatically Constructed 3D ASM. CMBBE - Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 4(5), 265–280.
Abstract: In this paper, we propose an automatic method for constructing an active shape model (ASM) to segment the complete cardiac left ventricle in 3D ultrasound (3DUS) images, which avoids costly manual landmarking. The automatic construction of the ASM has already been addressed in the literature; however, the direct application of these methods to 3DUS is hampered by a high level of noise and artefacts. Therefore, we propose to construct the ASM by fusing the multidetector computed tomography data, to learn the shape, with the artificially generated 3DUS, in order to learn the neighbourhood of the boundaries. Our artificial images were generated by two approaches: a faster one that does not take into account the geometry of the transducer, and a more comprehensive one, implemented in Field II toolbox. The segmentation accuracy of our ASM was evaluated on 20 patients with left-ventricular asynchrony, demonstrating plausibility of the approach.
Keywords: ASM; cardiac segmentation; statistical model; shape model; 3D ultrasound
|
Arnau Baro, Carles Badal, Pau Torras, & Alicia Fornes. (2022). Handwritten Historical Music Recognition through Sequence-to-Sequence with Attention Mechanism. In 3rd International Workshop on Reading Music Systems (WoRMS2021) (pp. 55–59).
Abstract: Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks.
Keywords: Optical Music Recognition; Digits; Image Classification
|
Jean-Christophe Burie, J. Chazalon, M. Coustaty, S. Eskenazi, Muhammad Muzzamil Luqman, M. Mehri, et al. (2015). ICDAR2015 Competition on Smartphone Document Capture and OCR (SmartDoc). In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 1161–1165).
Abstract: Smartphones are enabling new ways of capture, hence arises the need for seamless and reliable acquisition and digitization of documents, in order to convert them to an editable, searchable and more human-readable format. Current state-of-the-art works lack databases and baseline benchmarks for digitizing mobile captured documents. We have organized a competition for mobile document capture and OCR in order to address this issue. The competition is structured into two independent challenges: smartphone document capture, and smartphone OCR. This report describes the datasets for both challenges along with their ground truth, details the performance evaluation protocols which we used, and presents the final results of the participating methods. In total, we received 13 submissions: 8 for challenge-1, and 5 for challenge-2.
|
Ruben Ballester, Carles Casacuberta, & Sergio Escalera. (2023). Decorrelating neurons using persistence.
Abstract: We propose a novel way to improve the generalisation capacity of deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We provide an extensive set of experiments to validate the effectiveness of our terms, showing that they outperform popular ones. Also, we demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms, suggesting that redundancies play a significant role in artificial neural networks, as evidenced by some studies in neuroscience for real networks. We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms that consider the whole set of neurons and that can be applied to a feedforward architecture in any deep learning task such as classification, data generation, or regression.
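Illustrative note: the graph construction mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the dissimilarity is assumed to be 1 - |Pearson correlation|, the function names `pearson` and `mst_edge_weights` are hypothetical, and the two regularisation terms themselves are not reproduced because the abstract does not define them. The sketch only computes the edge weights of a minimum spanning tree over the complete graph of neurons, using Prim's algorithm.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length activation vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant neuron counts as uncorrelated
    return cov / math.sqrt(vx * vy)


def mst_edge_weights(activations):
    """Edge weights of a minimum spanning tree of the complete neuron graph.

    activations: one list per neuron, holding its outputs over the same samples.
    Edge weight (assumed form): 1 - |correlation| between the two neurons.
    """
    n = len(activations)
    if n < 2:
        return []
    dist = [[1.0 - abs(pearson(activations[i], activations[j]))
             for j in range(n)] for i in range(n)]
    # Prim's algorithm, O(n^2), which suits a dense (complete) graph.
    in_tree = [False] * n
    in_tree[0] = True
    best = list(dist[0])  # cheapest known connection of each vertex to the tree
    weights = []
    for _ in range(n - 1):
        u = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        weights.append(best[u])
        in_tree[u] = True
        for j in range(n):
            if not in_tree[j] and dist[u][j] < best[j]:
                best[j] = dist[u][j]
    return weights
```

Small spanning-tree weights indicate groups of highly correlated neurons, so a penalty built from these weights targets only the strongest correlations rather than all pairs. How the weights are combined into a differentiable loss is left open here.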
|
Arnau Baro, Jialuo Chen, Alicia Fornes, & Beata Megyesi. (2019). Towards a generic unsupervised method for transcription of encoded manuscripts. In 3rd International Conference on Digital Access to Textual Cultural Heritage (pp. 73–78).
Abstract: Historical ciphers, a special type of manuscripts, contain encrypted information, important for the interpretation of our history. The first step towards decipherment is to transcribe the images, either manually or by automatic image processing techniques. Despite the improvements in handwritten text recognition (HTR) thanks to deep learning methodologies, the need for labelled training data is an important limitation. Given that ciphers often use symbol sets across various alphabets and unique symbols without any transcription scheme available, these supervised HTR techniques are not suitable to transcribe ciphers. In this paper we propose an unsupervised method for transcribing encrypted manuscripts based on clustering and label propagation, which has been successfully applied to community detection in networks. We analyze the performance on ciphers with various symbol sets, and discuss the advantages and drawbacks compared to supervised HTR methods.
|
Asma Bensalah, Jialuo Chen, Alicia Fornes, Cristina Carmona-Duarte, Josep Llados, & Miguel A. Ferrer. (2020). Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches. In International Workshop on Artificial Intelligence for Healthcare Applications (Vol. 12661, pp. 476–489).
Abstract: Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods. In addition, the difficulties increase in unconstrained rehabilitation scenarios, which are much closer to the real use cases. In particular, our aim is to design an upper-limb assessment pipeline for stroke patients using smartwatches. We focus on the HAR task, as it is the first part of the assessing pipeline. Our main target is to automatically detect and recognize four key movements inspired by the Fugl-Meyer assessment scale, which are performed in both constrained and unconstrained scenarios. In addition to the application protocol and dataset, we propose two detection and classification baseline methods. We believe that the proposed framework, dataset and baseline results will serve to foster this research field.
|
Simone Balocco, Francesco Ciompi, Juan Rigla, Xavier Carrillo, J. Mauri, & Petia Radeva. (2017). Intra-Coronary Stent localization In Intravascular Ultrasound Sequences, A Preliminary Study. In International workshop on Computing and Visualization for Intravascular Imaging and Computer Assisted Stenting (CVII-STENT). LNCS.
Abstract: An intraluminal coronary stent is a metal scaffold deployed in a stenotic artery during Percutaneous Coronary Intervention (PCI). Intravascular Ultrasound (IVUS) is a catheter-based imaging technique generally used for assessing the correct placement of the stent. All the approaches proposed so far for stent analysis have focused only on strut detection, while this paper proposes a novel approach to detect the boundaries and the position of the stent along the pullback. The pipeline of the method requires the identification of the stable frames of the sequence and the reliable detection of stent struts. Using this data, a measure of likelihood for a frame to contain a stent is computed. Then, a robust binary representation of the presence of the stent in the pullback is obtained by applying an iterative and multi-scale approximation of the signal to symbols using the SAX algorithm. Results obtained by comparing the automatic results with the manual annotations of two observers on 80 IVUS in-vivo sequences show that the method approaches the inter-observer variability scores.
|
Simone Balocco, Francesco Ciompi, Juan Rigla, Xavier Carrillo, J. Mauri, & Petia Radeva. (2019). Assessment of intracoronary stent location and extension in intravascular ultrasound sequences. MEDPHYS - Medical Physics, 46(2), 484–493.
Abstract: PURPOSE:
An intraluminal coronary stent is a metal scaffold deployed in a stenotic artery during percutaneous coronary intervention (PCI). In order to have an effective deployment, a stent should be optimally placed with regard to anatomical structures such as bifurcations and stenoses. Intravascular ultrasound (IVUS) is a catheter-based imaging technique generally used for PCI guiding and assessing the correct placement of the stent. A novel approach that automatically detects the boundaries and the position of the stent along the IVUS pullback is presented. Such a technique aims at optimizing the stent deployment.
METHODS:
The method requires the identification of the stable frames of the sequence and the reliable detection of stent struts. Using these data, a measure of likelihood for a frame to contain a stent is computed. Then, a robust binary representation of the presence of the stent in the pullback is obtained applying an iterative and multiscale quantization of the signal to symbols using the Symbolic Aggregate approXimation algorithm.
RESULTS:
The technique was extensively validated on a set of 103 IVUS sequences of in vivo coronary arteries containing metallic and bioabsorbable stents acquired through an international multicentric collaboration across five clinical centers. The method was able to detect the stent position with an overall F-measure of 86.4%, a Jaccard index score of 75% and a mean distance of 2.5 mm from manually annotated stent boundaries, and in bioabsorbable stents with an overall F-measure of 88.6%, a Jaccard score of 77.7% and a mean distance of 1.5 mm from manually annotated stent boundaries. Additionally, a map indicating the distance between the lumen and the stent along the pullback is created in order to show the angular sectors of the sequence in which the malapposition is present.
CONCLUSIONS:
Results obtained by comparing the automatic results with the manual annotations of two observers show that the method approaches the interobserver variability. Similar performances are obtained on both metallic and bioabsorbable stents, showing the flexibility and robustness of the method.
Keywords: IVUS; malapposition; stent; ultrasound
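Illustrative note: the METHODS paragraph above turns a per-frame likelihood signal into a binary stent/no-stent representation with Symbolic Aggregate approXimation (SAX). The Python sketch below shows plain single-scale SAX with a two-symbol alphabet, for which the Gaussian breakpoint is 0. It is an assumption-laden illustration, not the paper's iterative multiscale procedure; the function name `sax_binary` and the segment count are hypothetical.

```python
from statistics import mean, pstdev


def sax_binary(signal, n_segments):
    """Single-scale SAX with a two-symbol alphabet (0 or 1 per segment).

    Steps: z-normalise the signal, average it over n_segments roughly equal
    pieces (piecewise aggregate approximation), then emit 1 where the segment
    mean lies above the breakpoint 0 and 0 otherwise.
    """
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        z = [0.0] * n  # constant signal: nothing rises above the breakpoint
    else:
        z = [(x - mu) / sigma for x in signal]
    symbols = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        symbols.append(1 if mean(z[start:end]) > 0 else 0)
    return symbols
```

For example, `sax_binary([0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.85, 0.2, 0.1], 3)` returns `[0, 1, 0]`: only the middle third has a mean likelihood above the overall mean. A multiscale variant would repeat this with several segment counts and combine the results.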
|
Klaus Broelemann, Anjan Dutta, Xiaoyi Jiang, & Josep Llados. (2012). Hierarchical graph representation for symbol spotting in graphical document images. In Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshop (Vol. 7626, pp. 529–538). LNCS. Springer Berlin Heidelberg.
Abstract: Symbol spotting can be defined as locating a given query symbol in a large collection of graphical documents. In this paper we present a hierarchical graph representation for symbols. This representation allows graph matching methods to deal with low-level vectorization errors and, thus, to perform robust symbol spotting. To show the potential of this approach, we conduct an experiment with the SESYD dataset.
|
Klaus Broelemann, Anjan Dutta, Xiaoyi Jiang, & Josep Llados. (2013). Plausibility-Graphs for Symbol Spotting in Graphical Documents. In 10th IAPR International Workshop on Graphics Recognition.
Abstract: Graph representation of graphical documents often suffers from noise, viz. spurious nodes and spurious edges of the graph, their discontinuity, etc. In general these errors occur during the low-level image processing, viz. binarization, skeletonization, vectorization, etc. Hierarchical graph representation is a nice and efficient way to solve this kind of problem by hierarchically merging node-node and node-edge depending on the distance. But the creation of a hierarchical graph representing the graphical information often uses hard thresholds on the distance to create the hierarchical nodes (next state) of the lower nodes (or states) of a graph. As a result, the representation often loses useful information. This paper introduces plausibilities to the nodes of the hierarchical graph as a function of distance and proposes a modified algorithm for matching subgraphs of the hierarchical graphs. The plausibility-annotated nodes help to improve the performance of the matching algorithm on two hierarchical structures. To show the potential of this approach, we conduct an experiment with the SESYD dataset.
|
Klaus Broelemann, Anjan Dutta, Xiaoyi Jiang, & Josep Llados. (2014). Hierarchical Plausibility-Graphs for Symbol Spotting in Graphical Documents. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 25–37). LNCS. Springer Berlin Heidelberg.
Abstract: Graph representation of graphical documents often suffers from noise such as spurious nodes and edges, and their discontinuity. In general these errors occur during the low-level image processing viz. binarization, skeletonization, vectorization etc. Hierarchical graph representation is a nice and efficient way to solve this kind of problem by hierarchically merging node-node and node-edge depending on the distance. But the creation of hierarchical graph representing the graphical information often uses hard thresholds on the distance to create the hierarchical nodes (next state) of the lower nodes (or states) of a graph. As a result, the representation often loses useful information. This paper introduces plausibilities to the nodes of hierarchical graph as a function of distance and proposes a modified algorithm for matching subgraphs of the hierarchical graphs. The plausibility-annotated nodes help to improve the performance of the matching algorithm on two hierarchical structures. To show the potential of this approach, we conduct an experiment with the SESYD dataset.
|
Thierry Brouard, A. Delaplace, Muhammad Muzzamil Luqman, H. Cardot, & Jean-Yves Ramel. (2010). Design of Evolutionary Methods Applied to the Learning of Bayesian Network Structures. In Ahmed Rebai (Ed.), Bayesian Network (pp. 13–37). Sciyo.
|
Marc Bolaños, Mariella Dimiccoli, & Petia Radeva. (2017). Towards Storytelling from Visual Lifelogging: An Overview. THMS - IEEE Transactions on Human-Machine Systems, 47(1), 77–90.
Abstract: Visual lifelogging consists of acquiring images that capture the daily experiences of the user by wearing a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives; hence, they open up new opportunities for many potential applications in fields including healthcare, security, leisure and the quantified self. However, automatically building a story from a huge collection of unstructured egocentric data presents major challenges. This paper provides a thorough review of advances made so far in egocentric data analysis, and in view of the current state of the art, indicates new lines of research to move us towards storytelling from visual lifelogging.
|