Wenjuan Gong, Xuena Zhang, Jordi Gonzalez, Andrews Sobral, Thierry Bouwmans, Changhe Tu, et al. (2016). Human Pose Estimation from Monocular Images: A Comprehensive Survey. SENS - Sensors, 16(12), 1966.
Abstract: Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but each focuses on a particular category, for example model-based approaches or human motion analysis. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out, including milestone works and recent advancements. Following a standard pipeline for solving computer vision problems, this survey splits the problem into several modules: feature extraction and description, human body models, and modeling methods. Problem modeling methods are categorized in two ways in this survey: top-down versus bottom-up methods, and generative versus discriminative methods. Since one direct application of human pose estimation is to provide initialization for automatic video surveillance, every module includes additional sections on motion-related methods: motion features, motion models, and motion-based methods. Finally, the paper collects 26 publicly available data sets for validation and describes frequently used error measures.
Keywords: human pose estimation; human body models; generative methods; discriminative methods; top-down methods; bottom-up methods
|
Pedro Herruzo, Marc Bolaños, & Petia Radeva. (2016). Can a CNN Recognize Catalan Diet? In AIP Conference Proceedings (Vol. 1773).
Abstract: CoRR abs/1607.08811
Nowadays, several diseases, such as diabetes, obesity, anemia, bulimia and anorexia, are related to the unhealthy dietary habits of the population. In many cases, these diseases are related to people's food consumption. The Mediterranean diet is scientifically recognized as a healthy diet that helps prevent many metabolic diseases. In particular, our work focuses on the recognition of Mediterranean food and dishes. This methodology would make it possible to analyze the daily habits of users with wearable cameras, within the topic of lifelogging. By using automatic mechanisms, we could build an objective tool for analyzing the patient’s behavior, allowing specialists to discover unhealthy food patterns and understand the user’s lifestyle.
With the aim of automatically recognizing a complete diet, we introduce a challenging multi-labeled dataset related to the Mediterranean diet called FoodCAT. The first type of label consists of 115 food classes with an average of 400 images per dish, and the second consists of 12 food categories with an average of 3800 pictures per class. This dataset will serve as a basis for the development of automatic diet recognition. In this context, deep learning, and more specifically Convolutional Neural Networks (CNNs), are currently the state-of-the-art methods for automatic food recognition. In our work, we compare several image classification architectures for diet recognition. Applying the best model for recognizing food categories, we achieve a top-1 accuracy of 72.29% and a top-5 accuracy of 97.07%. For complete diet recognition of dishes from the Mediterranean diet, enlarged with the Food-101 dataset for recognition of international dishes, we achieve a top-1 accuracy of 68.07% and a top-5 accuracy of 89.53%, over a total of 115+101 food classes.
|
Antonio Hernandez, Sergio Escalera, & Stan Sclaroff. (2016). Poselet-based Contextual Rescoring for Human Pose Estimation via Pictorial Structures. IJCV - International Journal of Computer Vision, 118(1), 49–64.
Abstract: In this paper we propose a contextual rescoring method for predicting the position of body parts in a human pose estimation framework. A set of poselets is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body part hypotheses. A method is proposed for the automatic discovery of a compact subset of poselets that covers the different poses in a set of validation images while maximizing precision. A rescoring mechanism is defined as a set-based boosting classifier that computes a new score for each body joint detection, given its relationship to detections of other body joints and mid-level parts in the image. This new score is incorporated in the pictorial structure model as an additional unary potential, following the recent work of Pishchulin et al. Experiments on two benchmarks show results comparable to those of Pishchulin et al. while reducing the size of the mid-level representation by an order of magnitude, which reduces the execution time by 68%.
Keywords: Contextual rescoring; Poselets; Human pose estimation
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2016). Spotting Symbol over Graphical Documents Via Sparsity in Visual Vocabulary. In Recent Trends in Image Processing and Pattern Recognition (Vol. 709).
|
Jean-Pascal Jacob, Mariella Dimiccoli, & Lionel Moisan. (2016). Active skeleton for bacteria modeling. CMBBE - Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 5(4), 274–286.
Abstract: The investigation of spatio-temporal dynamics of bacterial cells and their molecular components requires automated image analysis tools to track cell shape properties and molecular component locations inside the cells. In the study of bacteria aging, the molecular components of interest are protein aggregates accumulated near bacteria boundaries. This particular location makes the correspondence between aggregates and cells very ambiguous, since accurately computing bacteria boundaries in phase-contrast time-lapse imaging is a challenging task. This paper proposes an active skeleton formulation for bacteria modeling which provides several advantages: an easy computation of shape properties (perimeter, length, thickness, orientation), an improved boundary accuracy in noisy images, and a natural bacteria-centered coordinate system that permits the intrinsic location of molecular components inside the cell. Starting from an initial skeleton estimate, the medial axis of the bacterium is obtained by minimizing an energy function which incorporates bacteria shape constraints. Experimental results on biological images and a comparative performance evaluation validate the proposed approach for modeling cigar-shaped bacteria like Escherichia coli. An ImageJ plugin implementing the proposed method is available online.
Keywords: Bacteria modelling; medial axis; active contours; active skeleton; shape constraints
|
Svebor Karaman, Andrew Bagdanov, Lea Landucci, Gianpaolo D'Amico, Andrea Ferracani, Daniele Pezzatini, et al. (2016). Personalized multimedia content delivery on an interactive table by passive observation of museum visitors. MTAP - Multimedia Tools and Applications, 75(7), 3787–3811.
Abstract: The amount of multimedia data collected in museum databases is growing fast, while the capacity of museums to display information to visitors is acutely limited by physical space. Museums must seek the perfect balance of information given on individual pieces in order to provide sufficient information to aid visitor understanding while maintaining sparse usage of the walls and guaranteeing high appreciation of the exhibit. Moreover, museums often target the interests of average visitors instead of the entire spectrum of different interests each individual visitor might have. Finally, visiting a museum should not be an experience contained in the physical space of the museum but a door opened onto a broader context of related artworks, authors, artistic trends, etc. In this paper we describe the MNEMOSYNE system that attempts to address these issues through a new multimedia museum experience. Based on passive observation, the system builds a profile of the artworks of interest for each visitor. These profiles of interest are then used to drive an interactive table that personalizes multimedia content delivery. The natural user interface on the interactive table uses the visitor’s profile, an ontology of museum content and a recommendation system to personalize exploration of multimedia content. At the end of their visit, the visitor can take home a personalized summary of their visit on a custom mobile application. In this article we describe in detail each component of our approach as well as the first field trials of our prototype system built and deployed at our permanent exhibition space at LeMurate (http://www.lemurate.comune.fi.it/lemurate/) in Florence together with the first results of the evaluation process during the official installation in the National Museum of Bargello (http://www.uffizi.firenze.it/musei/?m=bargello).
Keywords: Computer vision; Video surveillance; Cultural heritage; Multimedia museum; Personalization; Natural interaction; Passive profiling
|
Iiris Lusi, Sergio Escalera, & Gholamreza Anbarjafari. (2016). SASE: RGB-Depth Database for Human Head Pose Estimation. In 14th European Conference on Computer Vision Workshops.
|
Iiris Lusi, Sergio Escalera, & Gholamreza Anbarjafari. (2016). Human Head Pose Estimation on SASE database using Random Hough Regression Forests. In 23rd International Conference on Pattern Recognition Workshops (Vol. 10165). LNCS.
Abstract: In recent years, head pose estimation has become an important task in face analysis scenarios. Given the availability of high-resolution 3D sensors, the design of a high-resolution head pose database would be beneficial for the community. In this paper, Random Hough Forests are used to estimate 3D head pose and location on a new 3D head database, SASE; the results represent the baseline performance on the new data for an upcoming international head pose estimation competition. The data in SASE is acquired with a Microsoft Kinect 2 camera and includes the RGB and depth information of 50 subjects with a large sample of head poses, allowing us to test methods for real-life scenarios. We briefly review the database while showing baseline head pose estimation results based on Random Hough Forests.
|
Dennis H. Lundtoft, Kamal Nasrollahi, Thomas B. Moeslund, & Sergio Escalera. (2016). Spatiotemporal Facial Super-Pixels for Pain Detection. In 9th Conference on Articulated Motion and Deformable Objects.
Abstract: Best student paper award.
Pain detection using facial images is of critical importance in many health applications. Since pain is a spatiotemporal process, recent works on this topic employ facial spatiotemporal features to detect pain. These systems extract such features from the entire area of the face. In this paper, we show that by employing super-pixels we can divide the face into three regions, in such a way that only one of these regions (about one third of the face) contributes to the pain estimation and the other two regions can be discarded. The experimental results on the UNBC-McMaster database show that the proposed system using this single region outperforms state-of-the-art systems in detecting no-pain scenarios, while it reaches comparable results in detecting weak and severe pain scenarios.
Keywords: Facial images; Super-pixels; Spatiotemporal filters; Pain detection
|
Antonio Esteban Lansaque, Carles Sanchez, Agnes Borras, Marta Diez-Ferrer, Antoni Rosell, & Debora Gil. (2016). Stable Airway Center Tracking for Bronchoscopic Navigation. In 28th Conference of the International Society for Medical Innovation and Technology.
Abstract: Bronchoscopists use X-ray fluoroscopy to guide bronchoscopes to the lesion to be biopsied without any incisions. Reducing exposure to X-rays is important for both patients and doctors, but alternatives like electromagnetic navigation require specific equipment and increase the cost of the clinical procedure. We propose a guiding system based on the extraction of airway centers from intra-operative videos. Such anatomical landmarks could be matched to the airway centerline extracted from a pre-planned CT to indicate the best path to the lesion. We present an extraction of lumen centers from intra-operative videos based on tracking of maximal stable regions of energy maps.
|
Antonio Esteban Lansaque, Carles Sanchez, Agnes Borras, Marta Diez-Ferrer, Antoni Rosell, & Debora Gil. (2016). Stable Anatomical Structure Tracking for video-bronchoscopy Navigation. In 19th International Conference on Medical Image Computing and Computer Assisted Intervention Workshops.
Abstract: Bronchoscopy allows examination of the patient's airways for detection of lesions and sampling of tissues without surgery. A main drawback in lung cancer diagnosis is the difficulty of checking whether the exploration is following the correct path to the nodule that has to be biopsied. The most widespread guidance method uses fluoroscopy, which implies repeated radiation exposure of clinical staff and patients. Alternatives such as virtual bronchoscopy or electromagnetic navigation are very expensive and not robust enough to blood, mucus or deformations to be used extensively. We propose a method that extracts and tracks stable lumen regions at different levels of the bronchial tree. The tracked regions are stored in a tree that encodes the anatomical structure of the scene, which can be useful to retrieve the path to the lesion that the clinician should follow to perform the biopsy. We present a multi-expert validation of our anatomical landmark extraction in 3 intra-operative ultrathin explorations.
Keywords: Lung cancer diagnosis; video-bronchoscopy; airway lumen detection; region tracking
|
Jose Marone, Simone Balocco, Marc Bolaños, Jose Massa, & Petia Radeva. (2016). Learning the Lumen Border using a Convolutional Neural Networks classifier. In 19th International Conference on Medical Image Computing and Computer Assisted Intervention Workshop.
Abstract: IntraVascular UltraSound (IVUS) is a technique that allows the diagnosis of coronary plaque. An accurate (semi-)automatic assessment of the luminal contours could speed up the diagnosis. In most approaches, information on the vessel shape is obtained by combining a supervised learning step with a local refinement algorithm. In this paper, we explore, for the first time, the use of a Convolutional Neural Network (CNN) architecture that can both extract the optimal image features and serve as a supervised classifier to detect the lumen border in IVUS images. The main limitation of CNNs is that they require a large amount of training data because of their huge number of parameters. To solve this issue, we introduce a patch classification approach that generates an extended training set from a few annotated images. An accuracy of 93% and an F-score of 71% were obtained with this technique, even when it was applied to challenging frames containing calcified plaques, stents and catheter shadows.
|
Pedro Martins, Paulo Carvalho, & Carlo Gatta. (2016). On the completeness of feature-driven maximally stable extremal regions. PRL - Pattern Recognition Letters, 74, 9–16.
Abstract: By definition, local image features provide a compact representation of the image in which most of the image information is preserved. This capability of local features has been overlooked, despite being relevant in many application scenarios. In this paper, we analyze and discuss the performance of feature-driven Maximally Stable Extremal Regions (MSER) in terms of the coverage of informative image parts (completeness). These features result from MSER extraction on saliency maps that highlight features related to object boundaries or even symmetry axes. These maps are intended to be suitable domains for MSER detection, allowing this detector to provide a better coverage of informative image parts. Our experimental results, based on a large-scale evaluation, show that feature-driven MSER have relatively high completeness values and provide more complete sets than traditional MSER detection, even when sets of similar cardinality are considered.
Keywords: Local features; Completeness; Maximally Stable Extremal Regions
|
Joan Mas, Alicia Fornes, & Josep Llados. (2016). An Interactive Transcription System of Census Records using Word-Spotting based Information Transfer. In 12th IAPR Workshop on Document Analysis Systems (pp. 54–59).
Abstract: This paper presents a system to assist in the transcription of historical handwritten census records on a crowdsourcing platform. Census records have a tabular structured layout. They consist of a sequence of rows with information about households ordered by street address. For each household snippet in the page, the list of family members is reported. The censuses are recorded at intervals of a few years, and the information about individuals in each household is quite stable from one census to the next. This redundancy is used to assist the transcriber: the redundant information is transferred from the census already transcribed to the next one. Household records are aligned from one census year to the next using the knowledge of the ordering by street address. Given an already transcribed census, query-by-string word spotting is applied: names from the census at time t are used as queries in the corresponding household record at time t+1. Since the search is constrained, the obtained precision-recall values are very high, with an important reduction in transcription time. The proposed system has been tested in a real citizen-science experience where non-expert users transcribe the census data of their home town.
|
H. Martin Kjer, Jens Fagertun, Sergio Vera, Debora Gil, Miguel Angel Gonzalez Ballester, & Rasmus R. Paulsen. (2016). Free-form image registration of human cochlear µCT data using skeleton similarity as anatomical prior. PRL - Pattern Recognition Letters, 76(1), 76–82.
|