|
Anguelos Nicolaou, Sounak Dey, V. Christlein, A. Maier, & Dimosthenis Karatzas. (2018). Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings. In International Workshop on Reproducible Research in Pattern Recognition (Vol. 11455, pp. 71–82). LNCS.
Abstract: Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis of the ambiguity quantized distances introduce and provide bounds on the effect. We demonstrate that it can have a measurable effect in empirical data in state-of-the-art systems. We also approach the phenomenon from a computer security perspective and demonstrate how someone being evaluated by a third party can exploit this ambiguity and greatly outperform a random predictor without even access to the input data. We also suggest a simple solution making the performance metrics, which rely on ranking, totally deterministic and impervious to such exploits.
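The ambiguity described in this abstract can be illustrated with a small sketch. It is not the authors' code: the function names `average_precision` and `ap_bounds` are hypothetical, and the bounds are obtained simply by breaking ties in the most and least favourable way for a single query.

```python
def average_precision(relevance):
    """Average precision of a ranked list of booleans (True = relevant)."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ap_bounds(distances, relevant):
    """Return (pessimistic, optimistic) AP when equal distances are tied.

    Hypothetical helper: within each group of equal distances, the
    optimistic ranking puts relevant items first and the pessimistic
    ranking puts them last.
    """
    pairs = list(zip(distances, relevant))
    best = sorted(pairs, key=lambda p: (p[0], not p[1]))
    worst = sorted(pairs, key=lambda p: (p[0], p[1]))
    return (
        average_precision([rel for _, rel in worst]),
        average_precision([rel for _, rel in best]),
    )
```

When the two values differ, the reported score depends on how the sorting routine happens to order tied items, which is the non-determinism the abstract refers to.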
|
|
|
Carles Onielfa, Carles Casacuberta, & Sergio Escalera. (2022). Influence in Social Networks Through Visual Analysis of Image Memes. In Artificial Intelligence Research and Development (Vol. 356, pp. 71–80).
Abstract: Memes evolve and mutate through their diffusion in social media. They have the potential to propagate ideas and, by extension, products. Many studies have focused on memes, but none so far, to our knowledge, on the users that post them, their relationships, and the reach of their influence. In this article, we define a meme influence graph together with suitable metrics to visualize and quantify influence between users who post memes, and we also describe a process to implement our definitions using a new approach to meme detection based on text-to-image area ratio and contrast. After applying our method to a set of users of the social media platform Instagram, we conclude that our metrics add information to already existing user characteristics.
|
|
|
Simone Balocco, Carlo Gatta, Francesco Ciompi, A. Wahle, Petia Radeva, S. Carlier, et al. (2014). Standardized evaluation methodology and reference database for evaluating IVUS image segmentation. CMIG - Computerized Medical Imaging and Graphics, 38(2), 70–90.
Abstract: This paper describes an evaluation framework that allows a standardized and quantitative comparison of IVUS lumen and media segmentation algorithms. This framework has been introduced at the MICCAI 2011 Computing and Visualization for (Intra)Vascular Imaging (CVII) workshop, comparing the results of eight teams that participated.
We describe the available database, comprising multi-center, multi-vendor and multi-frequency IVUS datasets, their acquisition, the creation of the reference standard and the evaluation measures. The approaches address segmentation of the lumen, the media, or both borders; semi- or fully-automatic operation; and 2-D vs. 3-D methodology. Three performance measures for quantitative analysis have been proposed. The results of the evaluation indicate that segmentation of the vessel lumen and media is possible with an accuracy comparable to manual annotation when semi-automatic methods are used, and that encouraging results can also be obtained with fully-automatic segmentation. The analysis performed in this paper also highlights the challenges in IVUS segmentation that remain to be solved.
Keywords: IVUS (intravascular ultrasound); Evaluation framework; Algorithm comparison; Image segmentation
|
|
|
Ariel Amato. (2014). Moving cast shadow detection. ELCVIA - Electronic letters on computer vision and image analysis, 13(2), 70–71.
Abstract: Motion perception is an amazing innate ability of the creatures on the planet. This adroitness entails a functional advantage that enables species to compete better in the wild. The motion perception ability is usually employed at different levels, ranging from the simplest interaction with the ’physis’ up to the most transcendental survival tasks. Among the five classical perception systems, vision is the most widely used in the motion perception field. Millions of years of evolution have led to a highly specialized visual system in humans, which is characterized by tremendous accuracy as well as extraordinary robustness. Although humans and an immense diversity of species can distinguish moving objects with seeming simplicity, it has proven to be a difficult and non-trivial problem from a computational perspective. In the field of Computer Vision, the detection of moving objects is a challenging and fundamental research area. This can be referred to as the ’origin’ of vast and numerous vision-based research sub-areas. Nevertheless, from the bottom to the top of this hierarchical analysis, the foundations still rely on when and where motion has occurred in an image. Pixels corresponding to moving objects in image sequences can be identified by measuring changes in their values. However, a pixel’s value (representing a combination of color and brightness) could also vary due to other factors such as variation in scene illumination, camera noise and nonlinear sensor responses, among others. The challenge lies in detecting whether the changes in pixels’ values are caused by genuine object movement or not. An additional challenging aspect in motion detection is represented by moving cast shadows. The paradox arises because a moving object and its cast shadow share similar motion patterns. However, a moving cast shadow is not a moving object.
In fact, a shadow represents a photometric illumination effect caused by the relative position of the object with respect to the light sources. Shadow detection methods are mainly divided into two domains depending on the application field. The first normally consists of static images where shadows are cast by static objects, whereas the second refers to image sequences where shadows are cast by moving objects. For the first case, shadows can provide additional geometric and semantic cues about the shape and position of the casting object as well as the localization of the light source. Although the previous information can be extracted from static images as well as video sequences, the main focus in the second area is usually change detection, scene matching or surveillance. In this context, a shadow can severely interfere with the analysis and interpretation of the scene. The work done in the thesis is focused on the second case; thus it addresses the problem of detection and removal of moving cast shadows in video sequences in order to enhance the detection of moving objects.
|
|
|
Mikhail Mozerov, Ariel Amato, & Xavier Roca. (2009). Occlusion Handling in Trinocular Stereo using Composite Disparity Space Image. In 19th International Conference on Computer Graphics and Vision (pp. 69–73).
Abstract: In this paper we propose a method that improves occlusion handling in stereo matching using trinocular stereo. The main idea is based on the assumption that any occluded region in a matched stereo pair (middle-left images) in general is not occluded in the opposite matched pair (middle-right images). Then two disparity space images (DSI) can be merged into one composite DSI. The proposed integration differs from the known approach that uses a cumulative cost. A dense disparity map is obtained with a global optimization algorithm using the proposed composite DSI. The experimental results are evaluated on the Middlebury data set, showing high performance of the proposed algorithm, especially in the occluded regions. Its place among the top positions of the Middlebury website ranking confirms that our method is competitive with the best stereo matching algorithms.
|
|
|
Hanne Kause, Aura Hernandez-Sabate, Patricia Marquez, Andrea Fuster, Luc Florack, Hans van Assen, et al. (2015). Confidence Measures for Assessing the HARP Algorithm in Tagged Magnetic Resonance Imaging. In Statistical Atlases and Computational Models of the Heart. Revised selected papers of Imaging and Modelling Challenges 6th International Workshop, STACOM 2015, Held in Conjunction with MICCAI 2015 (Vol. 9534, pp. 69–79). LNCS. Springer International Publishing.
Abstract: Cardiac deformation and changes therein have been linked to pathologies. Both can be extracted in detail from tagged Magnetic Resonance Imaging (tMRI) using harmonic phase (HARP) images. Although point tracking algorithms have been shown to achieve high accuracy on HARP images, this accuracy varies with position. Detecting and discarding areas with unreliable results is crucial for use in clinical support systems. This paper assesses the capability of two confidence measures (CMs), based on energy and image structure, for detecting locations with reduced accuracy in motion tracking results. These CMs were tested on a database of simulated tMRI images containing the most common artifacts that may affect tracking accuracy. CM performance is assessed based on its capability for HARP tracking error bounding and compared in terms of significant differences detected using a multi-comparison analysis of variance that takes into account the most influential factors on HARP tracking performance. Results showed that the CM based on image structure was better suited to detect unreliable optical flow vectors. In addition, it was shown that CMs can be used to detect optical flow vectors with large errors in order to improve the optical flow obtained with the HARP tracking algorithm.
|
|
|
Aura Hernandez-Sabate, Debora Gil, & Petia Radeva. (2005). On the usefulness of supervised learning for vessel border detection in IntraVascular Imaging. In Proceeding of the 2005 conference on Artificial Intelligence Research and Development (pp. 67–74). Amsterdam, The Netherlands: IOS Press.
Abstract: IntraVascular UltraSound (IVUS) imaging is a useful tool in the diagnosis of cardiac diseases since sequences completely show the morphology of coronary vessels. Vessel border detection, especially of the external adventitia layer, plays a central role in morphological measures and, thus, its segmentation feeds the development of medical imaging techniques. Deterministic approaches fail to yield optimal results due to the large number of IVUS artifacts and vessel border descriptors. We propose using classification techniques to learn the set of descriptors and parameters that best detect vessel borders. A statistical hypothesis test on the error between automated detections and borders manually traced by 4 experts shows that our detections stay within inter-observer variability.
Keywords: classification; vessel border modelling; IVUS
|
|
|
Volkmar Frinken, Andreas Fischer, & Carlos David Martinez Hinarejos. (2013). Handwriting Recognition in Historical Documents using Very Large Vocabularies. In 2nd International Workshop on Historical Document Imaging and Processing (pp. 67–72).
Abstract: Language models are used in automatic transcription systems to resolve ambiguities. This is done by limiting the vocabulary of words that can be recognized as well as estimating the n-gram probability of the words in the given text. In the context of historical documents, non-unified spelling and the limited amount of written text pose a substantial problem for the selection of the recognizable vocabulary as well as the computation of the word probabilities. In this paper we propose, for the transcription of historical Spanish text, to keep the corpus for the n-gram limited to a sample of the target text, but to expand the vocabulary with words gathered from external resources. We analyze the performance of such a transcription system with different sizes of external vocabularies and demonstrate the applicability of the approach and the significant increase in recognition accuracy obtained by using up to 300 thousand external words.
|
|
|
Jon Almazan, Albert Gordo, Alicia Fornes, & Ernest Valveny. (2012). Efficient Exemplar Word Spotting. In 23rd British Machine Vision Conference (pp. 67.1–67.11).
Abstract: In this paper we propose an unsupervised segmentation-free method for word spotting in document images. Documents are represented with a grid of HOG descriptors, and a sliding window approach is used to locate the document regions that are most similar to the query. We use the exemplar SVM framework to produce a better representation of the query in an unsupervised way. Finally, the document descriptors are precomputed and compressed with Product Quantization. This offers two advantages: first, a large number of documents can be kept in RAM at the same time. Second, the sliding window becomes significantly faster since distances between quantized HOG descriptors can be precomputed. Our results significantly outperform other segmentation-free methods in the literature, both in accuracy and in speed and memory usage.
|
|
|
Marc Bolaños, Maite Garolera, & Petia Radeva. (2015). Object Discovery using CNN Features in Egocentric Videos. In Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, ibPRIA 2015 (Vol. 9117, pp. 67–74). LNCS.
Abstract: Lifelogging devices based on photo/video are spreading faster every day. This growth can bring great benefits for developing methods that extract meaningful information about the user wearing the device and his/her environment. In this paper, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given the egocentric video sequence acquired by the camera, the method uses both the appearance extracted by means of a deep convolutional neural network and an object refill methodology that allows objects to be discovered even when they appear only a small number of times in the collection of images. We validate our method on a sequence of 1000 egocentric daily images and obtain results with an F-measure of 0.5, 0.17 better than the state-of-the-art approach.
Keywords: Object discovery; Egocentric videos; Lifelogging; CNN
|
|
|
Kamal Nasrollahi, Sergio Escalera, P. Rasti, Gholamreza Anbarjafari, Xavier Baro, Hugo Jair Escalante, et al. (2015). Deep Learning based Super-Resolution for Improved Action Recognition. In 5th International Conference on Image Processing Theory, Tools and Applications IPTA2015 (pp. 67–72).
Abstract: Action recognition systems mostly work with videos of proper quality and resolution. Even the most challenging benchmark databases for action recognition hardly include low-resolution videos from, e.g., surveillance cameras. In videos recorded by such cameras, due to the distance between people and cameras, people are pictured very small and hence challenge action recognition algorithms. Simple upsampling methods, like bicubic interpolation, cannot retrieve all the detailed information that can help the recognition. To deal with this problem, in this paper we combine results of bicubic interpolation with results of a state-of-the-art deep learning-based super-resolution algorithm, through an alpha-blending approach. The experimental results obtained on a down-sampled version of a large subset of the Hollywood2 benchmark database show the importance of the proposed system in increasing the recognition rate of a state-of-the-art action recognition system for handling low-resolution videos.
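A minimal sketch of alpha-blending two upsampled frames is given below. It is illustrative only: the function name `alpha_blend`, the convention that `alpha` weights the super-resolved frame, and the default value are assumptions, since the abstract does not state how the blend is parameterised. Frames are plain nested lists of pixel intensities.

```python
def alpha_blend(bicubic, super_res, alpha=0.5):
    """Blend two equally sized grayscale frames pixel by pixel.

    Assumption: `alpha` is the weight of the super-resolved frame and
    (1 - alpha) the weight of the bicubic frame.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(bicubic) != len(super_res) or any(
        len(row_b) != len(row_s) for row_b, row_s in zip(bicubic, super_res)
    ):
        raise ValueError("frames must have the same shape")
    return [
        [alpha * s + (1.0 - alpha) * b for b, s in zip(row_b, row_s)]
        for row_b, row_s in zip(bicubic, super_res)
    ]
```

With `alpha=0.0` the output equals the bicubic frame, and with `alpha=1.0` it equals the super-resolved frame.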
|
|
|
Hans Stadthagen-Gonzalez, Luis Lopez, M. Carmen Parafita, & C. Alejandro Parraga. (2018). Using two-alternative forced choice tasks and Thurstone law of comparative judgments for code-switching research. In Linguistic Approaches to Bilingualism (pp. 67–97).
Abstract: This article argues that 2-alternative forced choice tasks and Thurstone’s law of comparative judgments (Thurstone, 1927) are well suited to investigate code-switching competence by means of acceptability judgments. We compare this method with commonly used Likert scale judgments and find that the 2-alternative forced choice task provides granular details that remain invisible in a Likert scale experiment. In order to compare and contrast both methods, we examined the syntactic phenomenon usually referred to as the Adjacency Condition (AC) (apud Stowell, 1981), which imposes a condition of adjacency between verb and object. Our interest in the AC comes from the fact that it is a subtle feature of English grammar which is absent in Spanish, and this provides an excellent springboard to create minimal code-switched pairs that allow us to formulate a clear research question that can be tested using both methods.
Keywords: two-alternative forced choice and Thurstone's law; acceptability judgment; code-switching
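For readers unfamiliar with Thurstone scaling, the sketch below shows one common way (Case V) to turn two-alternative forced choice counts into scale values. It is not taken from the article: the function name `thurstone_case_v`, the clipping of proportions to [0.01, 0.99], and the requirement that every pair was compared at least once are assumptions made for illustration.

```python
from statistics import NormalDist


def thurstone_case_v(wins):
    """Scale values from a square matrix of pairwise choice counts.

    wins[i][j] is the number of trials in which item i was chosen over
    item j. Each scale value is the row mean of the z-scores of the
    choice proportions, with the diagonal counted as z = 0.
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            total = wins[i][j] + wins[j][i]  # assumed to be > 0
            p = wins[i][j] / total
            p = min(max(p, 0.01), 0.99)  # avoid infinite z-scores
            z_sum += inv_cdf(p)
        scores.append(z_sum / n)
    return scores
```

A larger scale value means the item was preferred more often; the values sum to zero because the z-scores of complementary proportions cancel.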
|
|
|
David Sanchez-Mendoza, David Masip, & Agata Lapedriza. (2015). Emotion recognition from mid-level features. PRL - Pattern Recognition Letters, 67(Part 1), 66–74.
Abstract: In this paper we present a study on the use of Action Units as mid-level features for automatically recognizing basic and subtle emotions. We propose a representation model based on mid-level facial muscular movement features. We encode these movements dynamically using the Facial Action Coding System, and propose to use these intermediate features based on Action Units (AUs) to classify emotions. AU activations are detected by fusing a set of spatiotemporal geometric and appearance features. The algorithm is validated in two applications: (i) the recognition of 7 basic emotions using the publicly available Cohn-Kanade database, and (ii) the inference of subtle emotional cues in the Newscast database. In this second scenario, we consider emotions that are perceived cumulatively over longer periods of time. In particular, we automatically classify whether video shots from public News TV channels refer to Good or Bad news. To deal with the different video lengths we propose a Histogram of Action Units and compute it using a sliding window strategy on the frame sequences. Our approach achieves accuracies close to human perception.
Keywords: Facial expression; Emotion recognition; Action units; Computer vision
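The sliding-window histogram mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `au_histograms`, the per-frame representation as a set of active AU identifiers, and the normalisation by window length are all hypothetical choices.

```python
def au_histograms(frames, au_ids, window, step=1):
    """Compute one normalised AU histogram per sliding window.

    frames: sequence of sets, each holding the AU ids active in a frame.
    au_ids: ordered list of AU ids that define the histogram bins.
    Each bin holds the fraction of frames in the window where that AU
    is active (an assumed normalisation).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    histograms = []
    for start in range(0, len(frames) - window + 1, step):
        chunk = frames[start:start + window]
        histograms.append(
            [sum(1 for active in chunk if au in active) / window for au in au_ids]
        )
    return histograms
```

Because every window yields a vector of the same length, videos of different durations map to sequences of fixed-size descriptors.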
|
|
|
Veronica Romero, Emilio Granell, Alicia Fornes, Enrique Vidal, & Joan Andreu Sanchez. (2019). Information Extraction in Handwritten Marriage Licenses Books. In 5th International Workshop on Historical Document Imaging and Processing (pp. 66–71).
Abstract: Handwritten marriage licenses books are characterized by a simple structure of the text in the records with an evolutionary vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. Previous works have shown that the use of category-based language models and a Grammatical Inference technique known as MGGI can improve the accuracy of these tasks. However, the application of the MGGI algorithm requires a priori knowledge to label the words of the training strings, which is not always easy to obtain. In this paper we study how to automatically obtain the information required by the MGGI algorithm using a technique based on Confusion Networks. Using the resulting language model, full handwritten text recognition and information extraction experiments have been carried out with results supporting the proposed approach.
|
|
|
Angel Sappa, Fadi Dornaika, David Geronimo, & Antonio Lopez. (2008). Registration-based Moving Object Detection from a Moving Camera. In IROS2008 2nd Workshop on Perception, Planning and Navigation for Intelligent Vehicles (pp. 65–69).
Abstract: This paper presents a robust approach for detecting moving objects from on-board stereo vision systems. It relies on a feature point quaternion-based registration, which avoids common problems that appear when computationally expensive iterative-based algorithms are used in dynamic environments. The proposed approach consists of three stages. Initially, feature points are extracted and tracked through consecutive frames. Then, a RANSAC-based approach is used for registering two 3D point sets with known correspondences by means of the quaternion method. Finally, the computed 3D rigid displacement is used to map two consecutive frames into the same coordinate system. Moving objects correspond to those areas with large registration errors. Experimental results, in different scenarios, show the viability of the proposed approach.
|
|