|
Saad Minhas, Aura Hernandez-Sabate, Shoaib Ehsan, Katerine Diaz, Ales Leonardis, Antonio Lopez, et al. (2016). LEE: A photorealistic Virtual Environment for Assessing Driver-Vehicle Interactions in Self-Driving Mode. In 14th European Conference on Computer Vision Workshops (Vol. 9915, pp. 894–900). LNCS.
Abstract: Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment with the aim of highlighting its utility for level 3 driving automation. In particular, an experiment is performed using the presented simulator to explore the influence of different variables on the transfer of control of the car after the system has been driving autonomously in a highway scenario. The results show that the speed of the car at the time when the system needs to transfer control to the human driver is critical.
Keywords: Simulation environment; Automated Driving; Driver-Vehicle interaction
|
|
|
S. Grau, Anna Puig, Sergio Escalera, Maria Salamo, & Oscar Amoros. (2013). Efficient complementary viewpoint selection in volume rendering. In 21st WSCG Conference on Computer Graphics.
Abstract: A major goal of visualization is to appropriately express knowledge of scientific data. Generally, gathering the visual information contained in volume data often requires a lot of expertise from the end user to set up the parameters of the visualization. One way of alleviating this problem is to provide the position of inner structures from different viewpoint locations to enhance the perception and construction of the mental image. To this end, traditional illustrations use two or three different views of the regions of interest. Similarly, with the aim of assisting users to easily place a good viewpoint location, this paper proposes an automatic and interactive method that locates different complementary viewpoints from a reference camera in volume datasets. Specifically, the proposed method combines the quantity of information each camera provides for each structure and the shape similarity of the projections of the remaining viewpoints based on Dynamic Time Warping. The selected complementary viewpoints allow a better understanding of the focused structure in several applications. Thus, the user interactively receives feedback based on several viewpoints that helps them understand the visual information. A live-user evaluation on different data sets shows good convergence to useful complementary viewpoints.
Keywords: Dual camera; Visualization; Interactive Interfaces; Dynamic Time Warping.
|
|
|
S. Grau, Ana Puig, Sergio Escalera, & Maria Salamo. (2013). Intelligent Interactive Volume Classification. In Pacific Graphics (Vol. 32, pp. 23–28).
Abstract: This paper defines an intelligent and interactive framework to classify multiple regions of interest from the original data on demand, without requiring any preprocessing or previous segmentation. The proposed approach is divided into three stages: visualization, training, and testing. First, users visualize and label some samples directly on slices of the volume. Training and testing are based on a framework of Error Correcting Output Codes and Adaboost classifiers that learn to classify each region the user has painted. Later, at the testing stage, each classifier is applied directly to the rest of the samples and combined to perform multi-class labeling, which is used in the final rendering. We also parallelized the training stage using a GPU-based implementation to obtain rapid interaction and classification.
|
|
|
S. Chanda, Oriol Ramos Terrades, & Umapada Pal. (2007). SVM Based Scheme for Thai and English Script Identification. In 9th International Conference on Document Analysis and Recognition (Vol. 1, pp. 551–555).
|
|
|
Ruth Aylett, Ginevra Castellano, Bogdan Raducanu, Ana Paiva, & Marc Hanheide. (2011). Long-term socially perceptive and interactive robot companions: challenges and future perspectives. In 13th International Conference on Multimodal Interaction (pp. 323–326). ACM.
Abstract: This paper gives a brief overview of the challenges of multi-modal perception and generation applied to robot companions located in human social environments. It reviews the current state of both perception and generation and the immediate technical challenges, then goes on to consider the extra issues raised by embodiment and social context. Finally, it briefly discusses the impact of systems that must function continually over months rather than just for a few hours.
Keywords: human-robot interaction, multimodal interaction, social robotics
|
|
|
Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, et al. (2019). ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard. In 15th International Conference on Document Analysis and Recognition (pp. 1577–1581).
Abstract: Chinese scene text reading is one of the most challenging problems in computer vision and has attracted great interest. Different from English text, Chinese has more than 6000 commonly used characters, and Chinese characters can be arranged in various layouts with numerous fonts. The Chinese signboards in street view are a good choice for Chinese scene text images since they have different backgrounds, fonts and layouts. We organized a competition called ICDAR2019-ReCTS, which mainly focuses on reading Chinese text on signboards. This report presents the final results of the competition. A large-scale dataset of 25,000 annotated signboard images, in which all the text lines and characters are annotated with locations and transcriptions, was released. Four tasks, namely character recognition, text line recognition, text line detection and end-to-end recognition, were set up. Besides, considering the Chinese text ambiguity issue, we proposed a multi ground truth (multi-GT) evaluation method to make evaluation fairer. The competition started on March 1, 2019 and ended on April 30, 2019. 262 submissions from 46 teams were received. Most of the participants came from universities, research institutes, and tech companies in China. There were also some participants from the United States, Australia, Singapore, and Korea. 21 teams submitted results for Task 1, 23 teams for Task 2, 24 teams for Task 3, and 13 teams for Task 4.
|
|
|
Rui Hua, Oriol Pujol, Francesco Ciompi, Marina Alberti, Simone Balocco, J. Mauri, et al. (2012). Stent Strut Detection by Classifying a Wide Set of IVUS Features. In Computer Assisted Stenting Workshop.
|
|
|
Ruben Tito, Minesh Mathew, C.V. Jawahar, Ernest Valveny, & Dimosthenis Karatzas. (2021). ICDAR 2021 Competition on Document Visual Question Answering. In 16th International Conference on Document Analysis and Recognition (pp. 635–649).
Abstract: In this report we present the results of the ICDAR 2021 edition of the Document Visual Question Answering Challenge. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced task on Infographics VQA. Infographics VQA is based on a new dataset of more than 5,000 infographics images and 30,000 question-answer pairs. The winning methods scored 0.6120 ANLS on the Infographics VQA task, 0.7743 ANLSL on the Document Collection VQA task and 0.8705 ANLS on Single Document VQA. We present a summary of the datasets used for each task, a description of each of the submitted methods, and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.
|
|
|
Ruben Tito, Dimosthenis Karatzas, & Ernest Valveny. (2021). Document Collection Visual Question Answering. In 16th International Conference on Document Analysis and Recognition (Vol. 12822, pp. 778–792). LNCS.
Abstract: Current tasks and methods in Document Understanding aim to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices) that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA), a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset we propose a new evaluation metric and baselines which provide further insights into the new dataset and task.
Keywords: Document collection; Visual Question Answering
|
|
|
Rosa Maria Ortiz, Debora Gil, Elisa Minchole, Marta Diez-Ferrer, & Noelia Cubero de Frutos. (2017). Classification of Confocal Endomicroscopy Patterns for Diagnosis of Lung Cancer. In 18th World Conference on Lung Cancer.
Abstract: Confocal Laser Endomicroscopy (CLE) is an emerging imaging technique that allows the in-vivo acquisition of cell patterns of potentially malignant lesions. Such patterns could discriminate between inflammatory and neoplastic lesions and, thus, serve as a first in-vivo biopsy to discard cases that do not actually require a cell biopsy.
The goal of this work is to explore whether CLE images obtained during videobronchoscopy contain enough visual information to discriminate between benign and malignant peripheral lesions for lung cancer diagnosis. To do so, we have performed a pilot comparative study with 12 patients (6 adenocarcinoma and 6 benign-inflammatory) using 2 different methods for CLE pattern analysis: visual analysis by 3 experts and a novel methodology that uses graph methods to find patterns in pre-trained feature spaces. Our preliminary results indicate that although visual analysis can only achieve 60.2% accuracy, the accuracy of the proposed unsupervised image pattern classification rises to 84.6%.
We conclude that the visual information in CLE images allows in-vivo detection of neoplastic lesions, and that graph structural analysis applied to deep-learning feature spaces can achieve competitive results.
|
|
|
Roberto Morales, Juan Quispe, & Eduardo Aguilar. (2023). Exploring multi-food detection using deep learning-based algorithms. In 13th International Conference on Pattern Recognition Systems (pp. 1–7).
Abstract: People are becoming increasingly concerned about their diet, whether for disease prevention, medical treatment or other purposes. In meals served in restaurants, schools or public canteens, it is not easy to identify the ingredients and/or the nutritional information they contain. Currently, technological solutions based on deep learning models have facilitated the recording and tracking of food consumed based on the recognition of the main dish present in an image. Considering that sometimes there may be multiple foods served on the same plate, food analysis should be treated as a multi-class object detection problem. EfficientDet and YOLOv5 are object detection algorithms that have demonstrated high mAP and real-time performance on general domain data. However, these models have not been evaluated and compared on public food datasets. Unlike general domain objects, foods have more challenging features inherent in their nature that increase the complexity of detection. In this work, we performed a performance evaluation of EfficientDet and YOLOv5 on three public food datasets: UNIMIB2016, UECFood256 and ChileanFood64. From the results obtained, it can be seen that YOLOv5 provides a significant improvement in terms of both mAP and response time compared to EfficientDet on all datasets. Furthermore, YOLOv5 outperforms the state-of-the-art on UECFood256, achieving an improvement of more than 4% in terms of mAP@.50 over the best reported result.
|
|
|
Robert Benavente, Gemma Sanchez, Ramon Baldrich, Maria Vanrell, & Josep Llados. (2000). Normalized colour segmentation for human appearance description. In 15th International Conference on Pattern Recognition (Vol. 3, pp. 637–641).
|
|
|
Robert Benavente, C. Alejandro Parraga, & Maria Vanrell. (2010). La influencia del contexto en la definicion de las fronteras entre las categorias cromaticas. In 9th Congreso Nacional del Color (pp. 92–95).
Abstract: In this paper we present the results of a colour categorization experiment in which the samples were presented on a multicoloured (Mondrian) background to simulate the effects of context. The results are compared with those of a previous experiment which, using a different paradigm, determined the boundaries without taking context into account. The analysis of the results shows that the boundaries obtained in the in-context experiment present less confusion than those obtained in the experiment without context.
Keywords: Colour categorization; Colour appearance; Influence of context; Mondrian patterns; Parametric models
|
|
|
Riccardo Del Chiaro, Bartlomiej Twardowski, Andrew Bagdanov, & Joost Van de Weijer. (2020). Recurrent attention to transient tasks for continual image captioning. In 34th Conference on Neural Information Processing Systems.
Abstract: Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now, surprisingly little attention has been focused on continual learning of recurrent models applied to problems like image captioning. In this paper we take a systematic look at continual learning of LSTM-based models for image captioning. We propose an attention-based approach that explicitly accommodates the transient nature of vocabularies in continual image captioning tasks -- i.e. that task vocabularies are not disjoint. We call our method Recurrent Attention to Transient Tasks (RATT), and also show how to adapt continual learning approaches based on weight regularization and knowledge distillation to recurrent continual learning problems. We apply our approaches to the incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30k datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones.
|
|
|
Ricardo Toledo, X. Orriols, Petia Radeva, X. Binefa, Jordi Vitria, Cristina Cañero, et al. (2000). Eigensnakes for vessel segmentation in angiography. In 15th International Conference on Pattern Recognition (Vol. 4, pp. 340–343).
|
|