|
Wenwen Yu, Mingyu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, et al. (2023). ICDAR 2023 Competition on Reading the Seal Title. In 17th International Conference on Document Analysis and Recognition (Vol. 14188, pp. 522–535). LNCS.
Abstract: Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants and received 135 submissions from academia and industry, including 28 participants and 72 submissions for Task 1, and 25 participants and 63 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology.
|
|
|
Marçal Rusiñol, Dimosthenis Karatzas, Andrew Bagdanov, & Josep Llados. (2012). Multipage Document Retrieval by Textual and Visual Representations. In 21st International Conference on Pattern Recognition (pp. 521–524).
Abstract: In this paper we present a multipage administrative document image retrieval system based on textual and visual representations of document pages. Individual pages are represented by textual or visual information using a bag-of-words framework. Different fusion strategies are evaluated which allow the system to perform multipage document retrieval on the basis of a single page retrieval system. Results are reported on a large dataset of document images sampled from a banking workflow.
|
|
|
Aura Hernandez-Sabate, Debora Gil, David Roche, Monica M. S. Matsumoto, & Sergio S. Furuie. (2011). Inferring the Performance of Medical Imaging Algorithms. In Pedro Real, Daniel Diaz-Pernil, Helena Molina-Abril, Ainhoa Berciano, & Walter Kropatsch (Eds.), 14th International Conference on Computer Analysis of Images and Patterns (Vol. 6854, pp. 520–528). LNCS. Berlin: Springer-Verlag Berlin Heidelberg.
Abstract: Evaluation of the performance and limitations of medical imaging algorithms is essential to estimate their impact in social, economic or clinical aspects. However, validation of medical imaging techniques is a challenging task due to the variety of imaging and clinical problems involved, as well as, the difficulties for systematically extracting a reliable solely ground truth. Although specific validation protocols are reported in any medical imaging paper, there are still two major concerns: definition of standardized methodologies transversal to all problems and generalization of conclusions to the whole clinical data set.
We claim that both issues would be fully solved if we had a statistical model relating ground truth and the output of computational imaging techniques. Such a statistical model could conclude to what extent the algorithm behaves like the ground truth from the analysis of a sampling of the validation data set. We present a statistical inference framework reporting the agreement and describing the relationship of two quantities. We show its transversality by applying it to validation of two different tasks: contour segmentation and landmark correspondence.
Keywords: Validation, Statistical Inference, Medical Imaging Algorithms.
|
|
|
Joan Mas, Jose Antonio Rodriguez, Dimosthenis Karatzas, Gemma Sanchez, & Josep Llados. (2008). HistoSketch: A Semi-Automatic Annotation Tool for Archival Documents. In Proceedings of the 8th International Workshop on Document Analysis Systems (pp. 517–524).
|
|
|
J. Poujol, Cristhian A. Aguilera-Carrasco, E. Danos, Boris X. Vintimilla, Ricardo Toledo, & Angel Sappa. (2015). Visible-Thermal Fusion based Monocular Visual Odometry. In 2nd Iberian Robotics Conference ROBOT2015 (Vol. 417, pp. 517–528). Springer International Publishing.
Abstract: The manuscript evaluates the performance of a monocular visual odometry approach when images from different spectra are considered, both independently and fused. The objective behind this evaluation is to analyze if classical approaches can be improved when the given images, which are from different spectra, are fused and represented in new domains. The images in these new domains should have some of the following properties: i) more robust to noisy data; ii) less sensitive to changes (e.g., lighting); iii) more rich in descriptive information, among others. In particular in the current work two different image fusion strategies are considered. Firstly, images from the visible and thermal spectrum are fused using a Discrete Wavelet Transform (DWT) approach. Secondly, a monochrome threshold strategy is considered. The obtained representations are evaluated under a visual odometry framework, highlighting their advantages and disadvantages, using different urban and semi-urban scenarios. Comparisons with both monocular-visible spectrum and monocular-infrared spectrum are also provided, showing the validity of the proposed approach.
Keywords: Monocular Visual Odometry; LWIR-RGB cross-spectral Imaging; Image Fusion.
|
|
|
Petia Radeva, & Enric Marti. (1995). An improved model of snakes for model-based segmentation. In Proceedings of Computer Analysis of Images and Patterns (pp. 515–520).
Abstract: The main advantage of segmentation by snakes consists in its ability to incorporate smoothness constraints on the detected shapes that can occur. Likewise, we propose to model snakes with other properties that reflect the information provided about the object of interest in a different extent. We consider different kinds of snakes, those searching for contours with a certain direction, those preserving an object’s model, those seeking for symmetry, those expanding open, etc. The availability of such a collection of snakes allows not only the more complete use of the knowledge about the segmented object, but also to solve some problems of the existing snakes. Our experiments on segmentation of facial features justify the usefulness of snakes with different properties.
|
|
|
Spencer Low, Oliver Nina, Angel Sappa, Erik Blasch, & Nathan Inkawhich. (2023). Multi-Modal Aerial View Image Challenge: Translation From Synthetic Aperture Radar to Electro-Optical Domain Results-PBVS 2023. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 515–523).
Abstract: This paper unveils the discoveries and outcomes of the inaugural iteration of the Multi-modal Aerial View Image Challenge (MAVIC) aimed at image translation. The primary objective of this competition is to stimulate research efforts towards the development of models capable of translating co-aligned images between multiple modalities. To accomplish the task of image translation, the competition utilizes images obtained from both synthetic aperture radar (SAR) and electro-optical (EO) sources. Specifically, the challenge centers on the translation from the SAR modality to the EO modality, an area of research that has garnered attention. The inaugural challenge demonstrates the feasibility of the task. The dataset utilized in this challenge is derived from the UNIfied COincident Optical and Radar for recognitioN (UNICORN) dataset. We introduce a new version of the UNICORN dataset that is focused on enabling the sensor translation task. Performance evaluation is conducted using a combination of measures to ensure high fidelity and high accuracy translations.
|
|
|
Raul Gomez, Lluis Gomez, Jaume Gibert, & Dimosthenis Karatzas. (2018). Learning to Learn from Web Data through Deep Semantic Embeddings. In 15th European Conference on Computer Vision Workshops (Vol. 11134, pp. 514–529). LNCS.
Abstract: In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thorough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text based image retrieval task, and we clearly outperform state of the art in the MIRFlickr dataset when training in the target data. Further we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed by Instagram images and their associated texts that can be used for fair comparison of image-text embeddings.
|
|
|
Joan Serrat, Ferran Diego, Jose Manuel Alvarez, & Felipe Lumbreras. (2007). Alignment of Videos Recorded from Moving Vehicles. In 14th International Conference on Image Analysis and Processing (pp. 512–517).
|
|
|
David Aldavert, Marçal Rusiñol, Ricardo Toledo, & Josep Llados. (2013). Integrating Visual and Textual Cues for Query-by-String Word Spotting. In 12th International Conference on Document Analysis and Recognition (pp. 511–515).
Abstract: In this paper, we present a word spotting framework that follows the query-by-string paradigm where word images are represented both by textual and visual representations. The textual representation is formulated in terms of character $n$-grams while the visual one is based on the bag-of-visual-words scheme. These two representations are merged together and projected to a sub-vector space. This transform allows to, given a textual query, retrieve word instances that were only represented by the visual modality. Moreover, this statistical representation can be used together with state-of-the-art indexation structures in order to deal with large-scale scenarios. The proposed method is evaluated using a collection of historical documents outperforming state-of-the-art performances.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, & Sergio Escalera. (2012). Error Correcting Output Codes for multiclass classification: Application to two image vision problems. In 16th symposium on Artificial Intelligence & Signal Processing (pp. 508–513). IEEE Xplore.
Abstract: Error-correcting output codes (ECOC) represents a powerful framework to deal with multiclass classification problems based on combining binary classifiers. The key factor affecting the performance of ECOC methods is the independence of binary classifiers, without which the ECOC method would be ineffective. In spite of its ability on classification of problems with relatively large number of classes, it has been applied in few real world problems. In this paper, we investigate the behavior of the ECOC approach on two image vision problems: logo recognition and shape classification using Decision Tree and AdaBoost as the base learners. The results show that the ECOC method can be used to improve the classification performance in comparison with the classical multiclass approaches.
|
|
|
Andreas Fischer, Volkmar Frinken, Horst Bunke, & Ching Y. Suen. (2013). Improving HMM-Based Keyword Spotting with Character Language Models. In 12th International Conference on Document Analysis and Recognition (pp. 506–510).
Abstract: Facing high error rates and slow recognition speed for full text transcription of unconstrained handwriting images, keyword spotting is a promising alternative to locate specific search terms within scanned document images. We have previously proposed a learning-based method for keyword spotting using character hidden Markov models that showed a high performance when compared with traditional template image matching. In the lexicon-free approach pursued, only the text appearance was taken into account for recognition. In this paper, we integrate character n-gram language models into the spotting system in order to provide an additional language context. On the modern IAM database as well as the historical George Washington database, we demonstrate that character language models significantly improve the spotting performance.
|
|
|
Miguel Oliveira, Victor Santos, Angel Sappa, & P. Dias. (2015). Scene Representations for Autonomous Driving: an approach based on polygonal primitives. In 2nd Iberian Robotics Conference ROBOT2015 (Vol. 417, pp. 503–515).
Abstract: In this paper, we present a novel methodology to compute a 3D scene representation. The algorithm uses macro scale polygonal primitives to model the scene. This means that the representation of the scene is given as a list of large scale polygons that describe the geometric structure of the environment. Results show that the approach is capable of producing accurate descriptions of the scene. In addition, the algorithm is very efficient when compared to other techniques.
Keywords: Scene reconstruction; Point cloud; Autonomous vehicles
|
|
|
Dani Rowe, Jordi Gonzalez, Ivan Huerta, & Juan J. Villanueva. (2007). On Reasoning over Tracking Events. In 15th Scandinavian Conference on Image Analysis (Vol. 4522, pp. 502–511). LNCS.
|
|
|
Ernest Valveny, Ricardo Toledo, Ramon Baldrich, & Enric Marti. (2002). Combining recognition-based and segmentation-based approaches for graphic symbol recognition using deformable template matching. In Proceedings of the Second IASTED International Conference Visualization, Imaging and Image Processing VIIP 2002 (pp. 502–507).
|
|