E. Tavalera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei, & Petia Radeva. (2015). Regularized Clustering for Egocentric Video Segmentation. In Pattern Recognition and Image Analysis (pp. 327–336). LNCS. Springer International Publishing.
Abstract: In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering(AC) within an energyminimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous guarantee of performances. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate techniques in an energy-minimization framework that serves disambiguate the decision of both techniques and to complete the segmentation taking into account the temporal continuity of video frames We present experiments over egocentric sets of more than 13.000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Keywords: Temporal video segmentation ; Egocentric videos ; Clustering
|
Dimosthenis Karatzas, Lluis Gomez, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, et al. (2015). ICDAR 2015 Competition on Robust Reading. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 1156–1160).
|
Dennis G.Romero, Anselmo Frizera, Angel Sappa, Boris X. Vintimilla, & Teodiano F.Bastos. (2015). A predictive model for human activity recognition by observing actions and context. In Advanced Concepts for Intelligent Vision Systems, Proceedings of 16th International Conference, ACIVS 2015 (Vol. 9386, pp. 323–333). LNCS. Springer International Publishing.
Abstract: This paper presents a novel model to estimate human activities — a human activity is defined by a set of human actions. The proposed approach is based on the usage of Recurrent Neural Networks (RNN) and Bayesian inference through the continuous monitoring of human actions and its surrounding environment. In the current work human activities are inferred considering not only visual analysis but also additional resources; external sources of information, such as context information, are incorporated to contribute to the activity estimation. The novelty of the proposed approach lies in the way the information is encoded, so that it can be later associated according to a predefined semantic structure. Hence, a pattern representing a given activity can be defined by a set of actions, plus contextual information or other kind of information that could be relevant to describe the activity. Experimental results with real data are provided showing the validity of the proposed approach.
|
Debora Gil, F. Javier Sanchez, Gloria Fernandez Esparrach, & Jorge Bernal. (2015). 3D Stable Spatio-temporal Polyp Localization in Colonoscopy Videos. In Computer-Assisted and Robotic Endoscopy. Revised selected papers of Second International Workshop, CARE 2015, Held in Conjunction with MICCAI 2015 (Vol. 9515, pp. 140–152). LNCS.
Abstract: Computational intelligent systems could reduce polyp miss rate in colonoscopy for colon cancer diagnosis and, thus, increase the efficiency of the procedure. One of the main problems of existing polyp localization methods is a lack of spatio-temporal stability in their response. We propose to explore the response of a given polyp localization across temporal windows in order to select
those image regions presenting the highest stable spatio-temporal response.
Spatio-temporal stability is achieved by extracting 3D watershed regions on the
temporal window. Stability in localization response is statistically determined by analysis of the variance of the output of the localization method inside each 3D region. We have explored the benefits of considering spatio-temporal stability in two different tasks: polyp localization and polyp detection. Experimental results indicate an average improvement of 21:5% in polyp localization and 43:78% in polyp detection.
Keywords: Colonoscopy, Polyp Detection, Polyp Localization, Region Extraction, Watersheds
|
Debora Gil, David Roche, Agnes Borras, & Jesus Giraldo. (2015). Terminating Evolutionary Algorithms at their Steady State. COA - Computational Optimization and Applications, 61(2), 489–515.
Abstract: Assessing the reliability of termination conditions for evolutionary algorithms (EAs) is of prime importance. An erroneous or weak stop criterion can negatively affect both the computational effort and the final result. We introduce a statistical framework for assessing whether a termination condition is able to stop an EA at its steady state, so that its results can not be improved anymore. We use a regression model in order to determine the requirements ensuring that a measure derived from EA evolving population is related to the distance to the optimum in decision variable space. Our framework is analyzed across 24 benchmark test functions and two standard termination criteria based on function fitness value in objective function space and EA population decision variable space distribution for the differential evolution (DE) paradigm. Results validate our framework as a powerful tool for determining the capability of a measure for terminating EA and the results also identify the decision variable space distribution as the best-suited for accurately terminating DE in real-world applications.
Keywords: Evolutionary algorithms; Termination condition; Steady state; Differential evolution
|
David Sanchez-Mendoza, David Masip, & Agata Lapedriza. (2015). Emotion recognition from mid-level features. PRL - Pattern Recognition Letters, 67(Part 1), 66–74.
Abstract: In this paper we present a study on the use of Action Units as mid-level features for automatically recognizing basic and subtle emotions. We propose a representation model based on mid-level facial muscular movement features. We encode these movements dynamically using the Facial Action Coding System, and propose to use these intermediate features based on Action Units (AUs) to classify emotions. AUs activations are detected fusing a set of spatiotemporal geometric and appearance features. The algorithm is validated in two applications: (i) the recognition of 7 basic emotions using the publicly available Cohn-Kanade database, and (ii) the inference of subtle emotional cues in the Newscast database. In this second scenario, we consider emotions that are perceived cumulatively in longer periods of time. In particular, we Automatically classify whether video shoots from public News TV channels refer to Good or Bad news. To deal with the different video lengths we propose a Histogram of Action Units and compute it using a sliding window strategy on the frame sequences. Our approach achieves accuracies close to human perception.
Keywords: Facial expression; Emotion recognition; Action units; Computer vision
|
David Roche. (2015). A Statistical Framework for Terminating Evolutionary Algorithms at their Steady State (Debora Gil, & Jesus Giraldo, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: As any iterative technique, it is a necessary condition a stop criterion for terminating Evolutionary Algorithms (EA). In the case of optimization methods, the algorithm should stop at the time it has reached a steady state so it can not improve results anymore. Assessing the reliability of termination conditions for EAs is of prime importance. A wrong or weak stop criterion can negatively aect both the computational eort and the nal result.
In this Thesis, we introduce a statistical framework for assessing whether a termination condition is able to stop EA at its steady state. In one hand a numeric approximation to steady states to detect the point in which EA population has lost its diversity has been presented for EA termination. This approximation has been applied to dierent EA paradigms based on diversity and a selection of functions covering the properties most relevant for EA convergence. Experiments show that our condition works regardless of the search space dimension and function landscape and Dierential Evolution (DE) arises as the best paradigm. On the other hand, we use a regression model in order to determine the requirements ensuring that a measure derived from EA evolving population is related to the distance to the optimum in xspace.
Our theoretical framework is analyzed across several benchmark test functions
and two standard termination criteria based on function improvement in f-space and EA population x-space distribution for the DE paradigm. Results validate our statistical framework as a powerful tool for determining the capability of a measure for terminating EA and select the x-space distribution as the best-suited for accurately stopping DE in real-world applications.
|
David Aldavert, Marçal Rusiñol, Ricardo Toledo, & Josep Llados. (2015). A Study of Bag-of-Visual-Words Representations for Handwritten Keyword Spotting. IJDAR - International Journal on Document Analysis and Recognition, 18(3), 223–234.
Abstract: The Bag-of-Visual-Words (BoVW) framework has gained popularity among the document image analysis community, specifically as a representation of handwritten words for recognition or spotting purposes. Although in the computer vision field the BoVW method has been greatly improved, most of the approaches in the document image analysis domain still rely on the basic implementation of the BoVW method disregarding such latest refinements. In this paper, we present a review of those improvements and its application to the keyword spotting task. We thoroughly evaluate their impact against a baseline system in the well-known George Washington dataset and compare the obtained results against nine state-of-the-art keyword spotting methods. In addition, we also compare both the baseline and improved systems with the methods presented at the Handwritten Keyword Spotting Competition 2014.
Keywords: Bag-of-Visual-Words; Keyword spotting; Handwritten documents; Performance evaluation
|
Daniel Sanchez, Miguel Angel Bautista, & Sergio Escalera. (2015). HuPBA 8k+: Dataset and ECOC-GraphCut based Segmentation of Human Limbs. NEUCOM - Neurocomputing, 150(A), 173–188.
Abstract: Human multi-limb segmentation in RGB images has attracted a lot of interest in the research community because of the huge amount of possible applications in fields like Human-Computer Interaction, Surveillance, eHealth, or Gaming. Nevertheless, human multi-limb segmentation is a very hard task because of the changes in appearance produced by different points of view, clothing, lighting conditions, occlusions, and number of articulations of the human body. Furthermore, this huge pose variability makes the availability of large annotated datasets difficult. In this paper, we introduce the HuPBA8k+ dataset. The dataset contains more than 8000 labeled frames at pixel precision, including more than 120000 manually labeled samples of 14 different limbs. For completeness, the dataset is also labeled at frame-level with action annotations drawn from an 11 action dictionary which includes both single person actions and person-person interactive actions. Furthermore, we also propose a two-stage approach for the segmentation of human limbs. In a first stage, human limbs are trained using cascades of classifiers to be split in a tree-structure way, which is included in an Error-Correcting Output Codes (ECOC) framework to define a body-like probability map. This map is used to obtain a binary mask of the subject by means of GMM color modelling and GraphCuts theory. In a second stage, we embed a similar tree-structure in an ECOC framework to build a more accurate set of limb-like probability maps within the segmented user mask, that are fed to a multi-label GraphCut procedure to obtain final multi-limb segmentation. The methodology is tested on the novel HuPBA8k+ dataset, showing performance improvements in comparison to state-of-the-art approaches. In addition, a baseline of standard action recognition methods for the 11 actions categories of the novel dataset is also provided.
Keywords: Human limb segmentation; ECOC; Graph-Cuts
|
Dan Norton, Fernando Vilariño, & Onur Ferhat. (2015). Memory Field – Creative Engagement in Digital Collections. In Internet Librarian International Conference.
Abstract: “Memory Fields” is a trans-disciplinary project aiming at the (re)valorisation of digital collections.Its main deliverable is an interface for a dual screen installation, used to access and mix the public library digital collections. The collections being used in this case are a collection of digitised posters from the Spanish Civil War, belonging to the Arxiu General de Catalunya, and a collection of field recordings made by Dan Norton. The system generates visualisations, and the images and sounds are mixed together using narrative primitives of video dj. Users contribute to the digital collections by adding personal memories and observations. The comments and recollections appear as flowers growing in a “memory field” and memories remain public in a Twitter feed (@Memoryfields).
|
Cristhian A. Aguilera-Carrasco, Angel Sappa, & Ricardo Toledo. (2015). LGHD: a Feature Descriptor for Matching Across Non-Linear Intensity Variations. In 22th IEEE International Conference on Image Processing (pp. 178–181).
|
Christophe Rigaud, Clement Guerin, Dimosthenis Karatzas, Jean-Christophe Burie, & Jean-Marc Ogier. (2015). Knowledge-driven understanding of images in comic books. IJDAR - International Journal on Document Analysis and Recognition, 18(3), 199–221.
Abstract: Document analysis is an active field of research, which can attain a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to a predefined domain knowledge. In this study, we propose a knowledge-driven system that can interact with bottom-up and top-down information to progressively understand the content of a document. We model the comic book’s and the image processing domains knowledge for information consistency analysis. In addition, different image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters and their semantic relations in an unsupervised way.
Keywords: Document Understanding; comics analysis; expert system
|
Chen Zhang, Maria del Mar Vila Muñoz, Petia Radeva, Roberto Elosua, Maria Grau, Angels Betriu, et al. (2015). Carotid Artery Segmentation in Ultrasound Images. In Computing and Visualization for Intravascular Imaging and Computer Assisted Stenting (CVII-STENT2015), Joint MICCAI Workshops.
|
Carolina Malagelada, Michal Drozdzal, Santiago Segui, Sara Mendez, Jordi Vitria, Petia Radeva, et al. (2015). Classification of functional bowel disorders by objective physiological criteria based on endoluminal image analysis. AJPGI - American Journal of Physiology-Gastrointestinal and Liver Physiology, 309(6), G413–G419.
Abstract: We have previously developed an original method to evaluate small bowel motor function based on computer vision analysis of endoluminal images obtained by capsule endoscopy. Our aim was to demonstrate intestinal motor abnormalities in patients with functional bowel disorders by endoluminal vision analysis. Patients with functional bowel disorders (n = 205) and healthy subjects (n = 136) ingested the endoscopic capsule (Pillcam-SB2, Given-Imaging) after overnight fast and 45 min after gastric exit of the capsule a liquid meal (300 ml, 1 kcal/ml) was administered. Endoluminal image analysis was performed by computer vision and machine learning techniques to define the normal range and to identify clusters of abnormal function. After training the algorithm, we used 196 patients and 48 healthy subjects, completely naive, as test set. In the test set, 51 patients (26%) were detected outside the normal range (P < 0.001 vs. 3 healthy subjects) and clustered into hypo- and hyperdynamic subgroups compared with healthy subjects. Patients with hypodynamic behavior (n = 38) exhibited less luminal closure sequences (41 ± 2% of the recording time vs. 61 ± 2%; P < 0.001) and more static sequences (38 ± 3 vs. 20 ± 2%; P < 0.001); in contrast, patients with hyperdynamic behavior (n = 13) had an increased proportion of luminal closure sequences (73 ± 4 vs. 61 ± 2%; P = 0.029) and more high-motion sequences (3 ± 1 vs. 0.5 ± 0.1%; P < 0.001). Applying an original methodology, we have developed a novel classification of functional gut disorders based on objective, physiological criteria of small bowel function.
Keywords: capsule endoscopy; computer vision analysis; functional bowel disorders; intestinal motility; machine learning
|
Carles Sanchez, Oriol Ramos Terrades, Patricia Marquez, Enric Marti, J.Roncaries, & Debora Gil. (2015). Automatic evaluation of practices in Moodle for Self Learning in Engineering. JOTSE - Journal of Technology and Science Education, 97–106.
|