Eduard Vazquez, Ramon Baldrich, Joost Van de Weijer, & Maria Vanrell. (2011). Describing Reflectances for Colour Segmentation Robust to Shadows, Highlights and Textures. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 917–930.
Abstract: The segmentation of a single material reflectance is a challenging problem due to the considerable variation in image measurements caused by the geometry of the object, shadows, and specularities. The combination of these effects has been modeled by the dichromatic reflection model. However, the application of the model to real-world images is limited due to unknown acquisition parameters and compression artifacts. In this paper, we present a robust model for the shape of a single material reflectance in histogram space. The method is based on a multilocal creaseness analysis of the histogram which results in a set of ridges representing the material reflectances. The segmentation method derived from these ridges is robust to both shadow, shading and specularities, and texture in real-world images. We further complete the method by incorporating prior knowledge from image statistics, and incorporate spatial coherence by using multiscale color contrast information. Results obtained show that our method clearly outperforms state-of-the-art segmentation methods on a widely used segmentation benchmark, having as a main characteristic its excellent performance in the presence of shadows and highlights at low computational cost.
|
Sounak Dey, Anjan Dutta, Suman Ghosh, Ernest Valveny, Josep Llados, & Umapada Pal. (2018). Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch. In 24th International Conference on Pattern Recognition (pp. 916–921).
Abstract: In this work we introduce a cross modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus the attention on the different objects of the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs the best in both single and multiple object image retrieval in standard datasets.
|
Javad Zolfaghari Bengar, Abel Gonzalez-Garcia, Gabriel Villalonga, Bogdan Raducanu, Hamed H. Aghdam, Mikhail Mozerov, et al. (2019). Temporal Coherence for Active Learning in Videos. In IEEE International Conference on Computer Vision Workshops (pp. 914–923).
Abstract: Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our active learning criterion is based on the estimated number of errors in terms of false positives and false negatives. The detections obtained by the object detector are used to define the nodes of a graph and tracked forward and backward to temporally link the nodes. Minimizing an energy function defined on this graphical model provides estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two datasets.
|
C. Sbert, & A.F. Sole. (2000). Stereo reconstruction of 3D curves. In 15 th International Conference on Pattern Recognition (Vol. 1, 912–915).
|
Jürgen Brauer, Wenjuan Gong, Jordi Gonzalez, & Michael Arens. (2011). On the Effect of Temporal Information on Monocular 3D Human Pose Estimation. In 2nd IEEE International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams (pp. 906–913).
Abstract: We address the task of estimating 3D human poses from monocular camera sequences. Many works make use of multiple consecutive frames for the estimation of a 3D pose in a frame. Although such an approach should ease the pose estimation task substantially since multiple consecutive frames allow to solve for 2D projection ambiguities in principle, it has not yet been investigated systematically how much we can improve the 3D pose estimates when using multiple consecutive frames opposed to single frame information. In this paper we analyze the difference in quality of 3D pose estimates based on different numbers of consecutive frames from which 2D pose estimates are available. We validate the use of temporal information on two major different approaches for human pose estimation – modeling and learning approaches. The results of our experiments show that both learning and modeling approaches benefit from using multiple frames opposed to single frame input but that the benefit is small when the 2D pose estimates show a high quality in terms of precision.
|
Jose Carlos Rubio, Joan Serrat, Antonio Lopez, & Daniel Ponsa. (2010). Multiple-target tracking for the intelligent headlights control. In 13th Annual International Conference on Intelligent Transportation Systems (903–910).
Abstract: TA7.4
Intelligent vehicle lighting systems aim at automatically regulating the headlights' beam to illuminate as much of the road ahead as possible while avoiding dazzling other drivers. A key component of such a system is computer vision software that is able to distinguish blobs due to vehicles' headlights and rear lights from those due to road lamps and reflective elements such as poles and traffic signs. In a previous work, we have devised a set of specialized supervised classifiers to make such decisions based on blob features related to its intensity and shape. Despite the overall good performance, there remain challenging that have yet to be solved: notably, faint and tiny blobs corresponding to quite distant vehicles. In fact, for such distant blobs, classification decisions can be taken after observing them during a few frames. Hence, incorporating tracking could improve the overall lighting system performance by enforcing the temporal consistency of the classifier decision. Accordingly, this paper focuses on the problem of constructing blob tracks, which is actually one of multiple-target tracking (MTT), but under two special conditions: We have to deal with frequent occlusions, as well as blob splits and merges. We approach it in a novel way by formulating the problem as a maximum a posteriori inference on a Markov random field. The qualitative (in video form) and quantitative evaluation of our new MTT method shows good tracking results. In addition, we will also see that the classification performance of the problematic blobs improves due to the proposed MTT algorithm.
Keywords: Intelligent Headlights
|
Bogdan Raducanu, & Jordi Vitria. (2008). Face Recognition by Artificial Vision Systems: A Cognitive Perspective. IJPRAI - International Journal of Pattern Recognition and Artificial Intelligence, 899–913.
|
Aura Hernandez-Sabate, Monica Mitiko, Sergio Shiguemi, & Debora Gil. (2010). A validation protocol for assessing cardiac phase retrieval in IntraVascular UltraSound. In Computing in Cardiology (Vol. 37, pp. 899–902). IEEE.
Abstract: A good reliable approach to cardiac triggering is of utmost importance in obtaining accurate quantitative results of atherosclerotic plaque burden from the analysis of IntraVascular UltraSound. Although, in the last years, there has been an increase in research of methods for retrospective gating, there is no general consensus in a validation protocol. Many methods are based on quality assessment of longitudinal cuts appearance and those reporting quantitative numbers do not follow a standard protocol. Such heterogeneity in validation protocols makes faithful comparison across methods a difficult task. We propose a validation protocol based on the variability of the retrieved cardiac phase and explore the capability of several quality measures for quantifying such variability. An ideal detector, suitable for its application in clinical practice, should produce stable phases. That is, it should always sample the same cardiac cycle fraction. In this context, one should measure the variability (variance) of a candidate sampling with respect a ground truth (reference) sampling, since the variance would indicate how spread we are aiming a target. In order to quantify the deviation between the sampling and the ground truth, we have considered two quality scores reported in the literature: signed distance to the closest reference sample and distance to the right of each reference sample. We have also considered the residuals of the regression line of reference against candidate sampling. The performance of the measures has been explored on a set of synthetic samplings covering different cardiac cycle fractions and variabilities. From our simulations, we conclude that the metrics related to distances are sensitive to the shift considered while the residuals are robust against fraction and variabilities as far as one can establish a pair-wise correspondence between candidate and reference. We will further investigate the impact of false positive and negative detections in experimental data.
|
Adrien Gaidon, Antonio Lopez, & Florent Perronnin. (2018). The Reasonable Effectiveness of Synthetic Visual Data. IJCV - International Journal of Computer Vision, 126(9), 899–901.
|
Saad Minhas, Aura Hernandez-Sabate, Shoaib Ehsan, Katerine Diaz, Ales Leonardis, Antonio Lopez, et al. (2016). LEE: A photorealistic Virtual Environment for Assessing Driver-Vehicle Interactions in Self-Driving Mode. In 14th European Conference on Computer Vision Workshops (Vol. 9915, pp. 894–900). LNCS.
Abstract: Photorealistic virtual environments are crucial for developing and testing automated driving systems in a safe way during trials. As commercially available simulators are expensive and bulky, this paper presents a low-cost, extendable, and easy-to-use (LEE) virtual environment with the aim to highlight its utility for level 3 driving automation. In particular, an experiment is performed using the presented simulator to explore the influence of different variables regarding control transfer of the car after the system was driving autonomously in a highway scenario. The results show that the speed of the car at the time when the system needs to transfer the control to the human driver is critical.
Keywords: Simulation environment; Automated Driving; Driver-Vehicle interaction
|
Hugo Jair Escalante, Heysem Kaya, Albert Ali Salah, Sergio Escalera, Yagmur Gucluturk, Umut Guçlu, et al. (2022). Modeling, Recognizing, and Explaining Apparent Personality from Videos. TAC - IEEE Transactions on Affective Computing, 13(2), 894–911.
Abstract: Explainability and interpretability are two critical aspects of decision support systems. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of apparent personality recognition. To the best of our knowledge, this is the first effort in this direction. We describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, evaluation protocol, proposed solutions and summarize the results of the challenge. We investigate the issue of bias in detail. Finally, derived from our study, we outline research opportunities that we foresee will be relevant in this area in the near future.
|
Mohammad Rouhani, & Angel Sappa. (2011). Implicit B-Spline Fitting Using the 3L Algorithm. In 18th IEEE International Conference on Image Processing (pp. 893–896).
|
O.F.Ahmad, Y.Mori, M.Misawa, S.Kudo, J.T.Anderson, & Jorge Bernal. (2021). Establishing key research questions for the implementation of artificial intelligence in colonoscopy: a modified Delphi method. END - Endoscopy, 53(9), 893–901.
Abstract: BACKGROUND : Artificial intelligence (AI) research in colonoscopy is progressing rapidly but widespread clinical implementation is not yet a reality. We aimed to identify the top implementation research priorities. METHODS : An established modified Delphi approach for research priority setting was used. Fifteen international experts, including endoscopists and translational computer scientists/engineers, from nine countries participated in an online survey over 9 months. Questions related to AI implementation in colonoscopy were generated as a long-list in the first round, and then scored in two subsequent rounds to identify the top 10 research questions. RESULTS : The top 10 ranked questions were categorized into five themes. Theme 1: clinical trial design/end points (4 questions), related to optimum trial designs for polyp detection and characterization, determining the optimal end points for evaluation of AI, and demonstrating impact on interval cancer rates. Theme 2: technological developments (3 questions), including improving detection of more challenging and advanced lesions, reduction of false-positive rates, and minimizing latency. Theme 3: clinical adoption/integration (1 question), concerning the effective combination of detection and characterization into one workflow. Theme 4: data access/annotation (1 question), concerning more efficient or automated data annotation methods to reduce the burden on human experts. Theme 5: regulatory approval (1 question), related to making regulatory approval processes more efficient. CONCLUSIONS : This is the first reported international research priority setting exercise for AI in colonoscopy. The study findings should be used as a framework to guide future research with key stakeholders to accelerate the clinical implementation of AI in endoscopy.
|
Misael Rosales, Petia Radeva, Oriol Rodriguez, & Debora Gil. (2005). Suppression of IVUS Image Rotation. A Kinematic Approach. In Monica Andres and Hernandez Petia and Santos A. and R. Frangi (Ed.), Functional Imaging and Modeling of the Heart (Vol. 3504, pp. 889–892). LNCS, 3504. Springer Berlin / Heidelberg.
Abstract: IntraVascular Ultrasound (IVUS) is an exploratory technique used in interventional procedures that shows cross section images of arteries and provides qualitative information about the causes and severity of the arterial lumen narrowing. Cross section analysis as well as visualization of plaque extension in a vessel segment during the catheter imaging pullback are the technique main advantages. However, IVUS sequence exhibits a periodic rotation artifact that makes difficult the longitudinal lesion inspection and hinders any segmentation algorithm. In this paper we propose a new kinematic method to estimate and remove the image rotation of IVUS images sequences. Results on several IVUS sequences show good results and prompt some of the clinical applications to vessel dynamics study, and relation to vessel pathology.
|
Marçal Rusiñol, Josep Llados, & Philippe Dosch. (2007). Camera-Based Graphical Symbol Detection. In 9th IEEE International Conference on Document Analysis and Recognition (Vol. 2, 884–888).
|