|
Giuseppe De Gregorio, Sanket Biswas, Mohamed Ali Souibgui, Asma Bensalah, Josep Llados, Alicia Fornes, et al. (2022). A Few Shot Multi-representation Approach for N-Gram Spotting in Historical Manuscripts. In Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022) (Vol. 13639, pp. 3–12). LNCS.
Abstract: Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-gram) that requires a small amount of labelled training data. We exhibit that recognition of important n-grams could reduce the system’s dependency on vocabulary. In this case, an out-of-vocabulary (OOV) word in an input handwritten line image could be a sequence of n-grams that belong to the lexicon. An extensive experimental evaluation of our proposed multi-representation approach was carried out on a subset of Bentham’s historical manuscript collections to obtain some really promising results in this direction.
Keywords: N-gram spotting; Few-shot learning; Multimodal understanding; Historical handwritten collections
|
|
|
Hugo Bertiche, Meysam Madadi, & Sergio Escalera. (2022). Neural Cloth Simulation. ACMTGraph - ACM Transactions on Graphics, 41(6), 1–14.
Abstract: We present a general framework for the garment animation problem through unsupervised deep learning inspired in physically based simulation. Existing trends in the literature already explore this possibility. Nonetheless, these approaches do not handle cloth dynamics. Here, we propose the first methodology able to learn realistic cloth dynamics unsupervisedly, and henceforth, a general formulation for neural cloth simulation. The key to achieve this is to adapt an existing optimization scheme for motion from simulation based methodologies to deep learning. Then, analyzing the nature of the problem, we devise an architecture able to automatically disentangle static and dynamic cloth subspaces by design. We will show how this improves model performance. Additionally, this opens the possibility of a novel motion augmentation technique that greatly improves generalization. Finally, we show it also allows to control the level of motion in the predictions. This is a useful, never seen before, tool for artists. We provide of detailed analysis of the problem to establish the bases of neural cloth simulation and guide future research into the specifics of this domain.
ACM Transactions on GraphicsVolume 41Issue 6December 2022 Article No.: 220pp 1–
|
|
|
Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, & Thomas B. Moeslund. (2022). Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification. AC - Automation in Construction, 144, 104614.
Abstract: A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated at different scales non-locally through the use of a lightweight vision transformer, and a smaller set of tokens was produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points.
Keywords: Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection
|
|
|
Jose Carlos Rubio, Joan Serrat, & Antonio Lopez. (2012). Video Co-segmentation. In 11th Asian Conference on Computer Vision (Vol. 7725, pp. 13–24). LNCS. Springer Berlin Heidelberg.
Abstract: Segmentation of a single image is in general a highly underconstrained problem. A frequent approach to solve it is to somehow provide prior knowledge or constraints on how the objects of interest look like (in terms of their shape, size, color, location or structure). Image co-segmentation trades the need for such knowledge for something much easier to obtain, namely, additional images showing the object from other viewpoints. Now the segmentation problem is posed as one of differentiating the similar object regions in all the images from the more varying background. In this paper, for the first time, we extend this approach to video segmentation: given two or more video sequences showing the same object (or objects belonging to the same class) moving in a similar manner, we aim to outline its region in all the frames. In addition, the method works in an unsupervised manner, by learning to segment at testing time. We compare favorably with two state-of-the-art methods on video segmentation and report results on benchmark videos.
|
|
|
Katerine Diaz, Francesc J. Ferri, & W. Diaz. (2013). Fast Approximated Discriminative Common Vectors using rank-one SVD updates. In 20th International Conference On Neural Information Processing (Vol. 8228, pp. 368–375). LNCS. Springer Berlin Heidelberg.
Abstract: An efficient incremental approach to the discriminative common vector (DCV) method for dimensionality reduction and classification is presented. The proposal consists of a rank-one update along with an adaptive restriction on the rank of the null space which leads to an approximate but convenient solution. The algorithm can be implemented very efficiently in terms of matrix operations and space complexity, which enables its use in large-scale dynamic application domains. Deep comparative experimentation using publicly available high dimensional image datasets has been carried out in order to properly assess the proposed algorithm against several recent incremental formulations.
K. Diaz-Chito, F.J. Ferri, W. Diaz
|
|
|
Oriol Pujol. (2004). A semi-Supervised Statistical Framework and Generative Snakes for IVUS Analysis (Petia Radeva, Ed.). Ph.D. thesis, , .
|
|
|
A. Martinez, & Jordi Vitria. (1998). Learning mixture models with the EM algorithm and genetic algorithms.
|
|
|
Oriol Pujol. (1999). Model-based three dimensional interpolation of IVUS images.
|
|
|
David Guillamet. (1999). Reconeixement d´objectes en entorns poc controlats mitjançant metodes estadistics.
|
|
|
Robert Benavente. (1999). Dealing with colour variability: application to a colour naming task.
|
|
|
Jordi Vitria, Petia Radeva, X. Binefa, A. Pujol, Ernest Valveny, Robert Benavente, et al. (1999). Real time recognition of pharmaceutical products by subspace methods.
|
|
|
A. Martinez, & Jordi Vitria. (1996). Dimensionality reduction for face recognition.
|
|
|
X. Varona, & Juan J. Villanueva. (1997). Neural Networks for Early Vision..
|
|
|
Robert Benavente, & Maria Vanrell. (2001). A colour naming experiment.
|
|
|
Josep Llados. (1996). Interpretacio de dibuixos linials fets a ma alçada mitjançant isomorfisme entre subgrafs i transformacio de Hough.
|
|