|
Maedeh Aghaei, Mariella Dimiccoli, & Petia Radeva. (2016). Multi-face tracking by extended bag-of-tracklets in egocentric photo-streams. CVIU - Computer Vision and Image Understanding, 149, 146–156.
Abstract: Wearable cameras offer a hands-free way to record egocentric images of daily experiences, where social events are of special interest. The first step towards detection of social events is to track the appearance of multiple persons involved in them. In this paper, we propose a novel method to find correspondences of multiple faces in low temporal resolution egocentric videos acquired through a wearable camera. This kind of photo-stream imposes additional challenges to the multi-tracking problem with respect to conventional videos. Due to the free motion of the camera and to its low temporal resolution, abrupt changes in the field of view, in illumination conditions and in the target location are highly frequent. To overcome such difficulties, we propose a multi-face tracking method that generates a set of tracklets through finding correspondences along the whole sequence for each detected face and takes advantage of the tracklet redundancy to deal with unreliable ones. Similar tracklets are grouped into the so-called extended bag-of-tracklets (eBoT), which is aimed to correspond to a specific person. Finally, a prototype tracklet is extracted for each eBoT, where the occurred occlusions are estimated by relying on a new measure of confidence. We validated our approach over an extensive dataset of egocentric photo-streams and compared it to state-of-the-art methods, demonstrating its effectiveness and robustness.
|
|
|
Maedeh Aghaei, Mariella Dimiccoli, C. Canton-Ferrer, & Petia Radeva. (2018). Towards social pattern characterization from egocentric photo-streams. CVIU - Computer Vision and Image Understanding, 171, 104–117.
Abstract: Following the increasingly popular trend of social interaction analysis in egocentric vision, this article presents a comprehensive pipeline for automatic social pattern characterization of a wearable photo-camera user. The proposed framework relies merely on the visual analysis of egocentric photo-streams and consists of three major steps. The first step is to detect social interactions of the user, where the impact of several social signals on the task is explored. The detected social events are inspected in the second step for categorization into different social meetings. These two steps act at event-level, where each potential social event is modeled as a multi-dimensional time-series whose dimensions correspond to a set of relevant features for each task; finally, LSTM is employed to classify the time-series. The last step of the framework is to characterize social patterns of the user. Our goal is to quantify the duration, the diversity and the frequency of the user's social relations in various social situations. This goal is achieved by the discovery of recurrences of the same people across the whole set of social events related to the user. Experimental evaluation over EgoSocialStyle, the dataset proposed in this work, and EGO-GROUP demonstrates promising results on the task of social pattern characterization from egocentric photo-streams.
Keywords: Social pattern characterization; Social signal extraction; Lifelogging; Convolutional and recurrent neural networks
|
|
|
M. Visani, Oriol Ramos Terrades, & Salvatore Tabbone. (2011). A Protocol to Characterize the Descriptive Power and the Complementarity of Shape Descriptors. IJDAR - International Journal on Document Analysis and Recognition, 14(1), 87–100.
Abstract: Most document analysis applications rely on the extraction of shape descriptors, which may be grouped into different categories, each category having its own advantages and drawbacks (O.R. Terrades et al. in Proceedings of ICDAR’07, pp. 227–231, 2007). In order to improve the richness of their description, many authors choose to combine multiple descriptors. Yet, most of the authors who propose a new descriptor content themselves with comparing its performance to the performance of a set of single state-of-the-art descriptors in a specific applicative context (e.g. symbol recognition, symbol spotting...). This results in a proliferation of the shape descriptors proposed in the literature. In this article, we propose an innovative protocol, the originality of which is to be as independent of the final application as possible and which relies on new quantitative and qualitative measures. We introduce two types of measures: while the measures of the first type are intended to characterize the descriptive power (in terms of uniqueness, distinctiveness and robustness towards noise) of a descriptor, the second type of measures characterizes the complementarity between multiple descriptors. Characterizing upstream the complementarity of shape descriptors is an alternative to the usual approach where the descriptors to be combined are selected by trial and error, considering the performance characteristics of the overall system. To illustrate the contribution of this protocol, we performed experimental studies using a set of descriptors and a set of symbols which are widely used by the community namely ART and SC descriptors and the GREC 2003 database.
Keywords: Document analysis; Shape descriptors; Symbol description; Performance characterization; Complementarity analysis
|
|
|
M. Bressan, & Jordi Vitria. (2002). Independent Component Analysis and Naïve Bayes Classification. Proceedings of the Second IASTED International Conference on Visualization, Imaging and Image Processing (VIIP 2002), 496–501.
|
|
|
M. Bressan, & Jordi Vitria. (2003). Nonparametric Discriminant Analysis and Nearest Neighbor Classification. PRL - Pattern Recognition Letters, 24(15), 2743–2749.
|
|
|
M. Altillawi, S. Li, S.M. Prakhya, Z. Liu, & Joan Serrat. (2024). Implicit Learning of Scene Geometry From Poses for Global Localization. ROBOTAUTOMLET - IEEE Robotics and Automation Letters, 9(2), 955–962.
Abstract: Global visual localization estimates the absolute pose of a camera using a single image, in a previously mapped area. Obtaining the pose from a single image enables many robotics and augmented/virtual reality applications. Inspired by latest advances in deep learning, many existing approaches directly learn and regress 6 DoF pose from an input image. However, these methods do not fully utilize the underlying scene geometry for pose regression. The challenge in monocular relocalization is the minimal availability of supervised training data, which is just the corresponding 6 DoF poses of the images. In this letter, we propose to utilize these minimal available labels (i.e., poses) to learn the underlying 3D geometry of the scene and use the geometry to estimate the 6 DoF camera pose. We present a learning method that uses these pose labels and rigid alignment to learn two 3D geometric representations (X, Y, Z coordinates) of the scene, one in camera coordinate frame and the other in global coordinate frame. Given a single image, it estimates these two 3D scene representations, which are then aligned to estimate a pose that matches the pose label. This formulation allows for the active inclusion of additional learning constraints to minimize 3D alignment errors between the two 3D scene representations, and 2D re-projection errors between the 3D global scene representation and 2D image pixels, resulting in improved localization accuracy. During inference, our model estimates the 3D scene geometry in camera and global frames and aligns them rigidly to obtain pose in real-time. We evaluate our work on three common visual localization datasets, conduct ablation studies, and show that our method exceeds state-of-the-art regression methods' pose accuracy on all datasets.
Keywords: Localization; Localization and mapping; Deep learning for visual perception; Visual learning
|
|
|
Luis Herranz, Shuqiang Jiang, & Ruihan Xu. (2017). Modeling Restaurant Context for Food Recognition. TMM - IEEE Transactions on Multimedia, 19(2), 430–440.
Abstract: Food photos are widely used in food logs for diet monitoring and in social networks to share social and gastronomic experiences. A large number of these images are taken in restaurants. Dish recognition in general is very challenging, due to different cuisines, cooking styles, and the intrinsic difficulty of modeling food from its visual appearance. However, contextual knowledge can be crucial to improve recognition in such scenarios. In particular, geocontext has been widely exploited for outdoor landmark recognition. Similarly, we exploit knowledge about menus and location of restaurants and test images. We first adapt a framework based on discarding unlikely categories located far from the test image. Then, we reformulate the problem using a probabilistic model connecting dishes, restaurants, and locations. We apply that model in three different tasks: dish recognition, restaurant recognition, and location refinement. Experiments on six datasets show that by integrating multiple sources of evidence (visual, location, and external knowledge) our system can boost the performance in all tasks.
|
|
|
Luca Ginanni Corradini, Simone Balocco, Luciano Maresca, Silvio Vitale, & Matteo Stefanini. (2023). Anatomical Modifications After Stent Implantation: A Comparative Analysis Between CGuard, Wallstent, and Roadsaver Carotid Stents. Journal of Endovascular Therapy, 30(1), 18–24.
Abstract:
Purpose:
Carotid revascularization can be associated with modifications of the vascular geometry, which may lead to complications. The changes on the vessel angulation before and after a carotid WallStent (WS) implantation are compared against 2 new dual-layer devices, CGuard (CG) and RoadSaver (RS).
Materials and Methods:
The study prospectively recruited 217 consecutive patients (112 CG, 73 WS, and 32 RS, respectively). Angiography projections were explored and the one having a higher arterial angle was selected as a basal view. After stent implantation, a stent control angiography was performed selecting the projection having the maximal angle. The same procedure was followed for all 3 stent types to guarantee comparable conditions. The angulation changes on the stented segments were quantified from both angiographies. The statistical analysis quantitatively compared the pre- and post-angles for the 3 stent types. The results are qualitatively illustrated using boxplots. Finally, the relation between pre- and post-angle measurements was analyzed using linear regression.
Results:
For CG, no statistical difference in the axial vessel geometry between the basal and postprocedural angles was found. For WS and RS, statistical difference was found between pre- and post-angles. The regression analysis shows that CG induces lower changes from the original curvature with respect to WS and RS.
Conclusion:
Based on our results, CG induces smaller changes to the basal morphology than the WS and RS stents. Hence, CG better respects the native vessel anatomy than the other stents.
Level of Evidence: Level 4, Case Series.
|
|
|
Lu Yu, Xialei Liu, & Joost Van de Weijer. (2022). Self-Training for Class-Incremental Semantic Segmentation. TNNLS - IEEE Transactions on Neural Networks and Learning Systems.
Abstract: In class-incremental semantic segmentation, we have no access to the labeled data of previous tasks. Therefore, when incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned knowledge. To address this problem, we propose to apply a self-training approach that leverages unlabeled data, which is used for rehearsal of previous knowledge. Specifically, we first learn a temporary model for the current task, and then, pseudo labels for the unlabeled data are computed by fusing information from the old model of the previous task and the current temporary model. In addition, conflict reduction is proposed to resolve the conflicts of pseudo labels generated from both the old and temporary models. We show that maximizing self-entropy can further improve results by smoothing the overconfident predictions. Interestingly, in the experiments, we show that the auxiliary data can be different from the training data and that even general-purpose, but diverse auxiliary data can lead to large performance gains. The experiments demonstrate the state-of-the-art results: obtaining a relative gain of up to 114% on Pascal-VOC 2012 and 8.5% on the more challenging ADE20K compared to previous state-of-the-art methods.
Keywords: Class-incremental learning; Self-training; Semantic segmentation.
|
|
|
Lu Yu, Lichao Zhang, Joost Van de Weijer, Fahad Shahbaz Khan, Yongmei Cheng, & C. Alejandro Parraga. (2018). Beyond Eleven Color Names for Image Understanding. MVAP - Machine Vision and Applications, 29(2), 361–373.
Abstract: Color description is one of the fundamental problems of image understanding. One of the popular ways to represent colors is by means of color names. Most existing work on color names focuses on only the eleven basic color terms of the English language. This could be limiting the discriminative power of these representations, and representations based on more color names are expected to perform better. However, there exists no clear strategy to choose additional color names. We collect a dataset of 28 additional color names. To ensure that the resulting color representation has high discriminative power we propose a method to order the additional color names according to their complementary nature with the basic color names. This allows us to compute color name representations with high discriminative power of arbitrary length. In the experiments we show that these new color name descriptors outperform the existing color name descriptor on the task of visual tracking, person re-identification and image classification.
Keywords: Color name; Discriminative descriptors; Image classification; Re-identification; Tracking
|
|
|
Lorenzo Seidenari, Giuseppe Serra, Andrew Bagdanov, & Alberto del Bimbo. (2014). Local pyramidal descriptors for image recognition. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 1033–1040.
Abstract: In this paper we present a novel method to improve the flexibility of descriptor matching for image recognition by using local multiresolution pyramids in feature space. We propose that image patches be represented at multiple levels of descriptor detail and that these levels be defined in terms of local spatial pooling resolution. Preserving multiple levels of detail in local descriptors is a way of hedging one’s bets on which levels will be most relevant for matching during learning and recognition. We introduce the Pyramid SIFT (P-SIFT) descriptor and show that its use in four state-of-the-art image recognition pipelines improves accuracy and yields state-of-the-art results. Our technique is applicable independently of spatial pyramid matching and we show that spatial pyramids can be combined with local pyramids to obtain further improvement. We achieve state-of-the-art results on Caltech-101 (80.1%) and Caltech-256 (52.6%) when compared to other approaches based on SIFT features over intensity images. Our technique is efficient and is extremely easy to integrate into image recognition pipelines.
Keywords: Object categorization; local features; kernel methods
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Sergi Robles, & Gemma Sanchez. (2015). CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing tool. IJDAR - International Journal on Document Analysis and Recognition, 18(1), 15–30.
Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is long experience with structural methods for the analysis and information extraction of multiple types of documents. Yet, the lack of conveniently annotated and freely accessible databases has hindered progress in some areas such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated for the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that makes it possible to specify this sort of information in a natural manner. This tool has been designed for general-purpose groundtruthing: it allows users to define their own object classes and properties, offers multiple labeling options, supports cooperative work, and provides user and version control. Finally, we have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both the CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.
|
|
|
Lluis Pere de las Heras, Ahmed Sheraz, Marcus Liwicki, Ernest Valveny, & Gemma Sanchez. (2014). Statistical Segmentation and Structural Recognition for Floor Plan Interpretation. IJDAR - International Journal on Document Analysis and Recognition, 17(3), 221–237.
Abstract: A generic method for floor plan analysis and interpretation is presented in this article. The method, which is mainly inspired by the way engineers draw and interpret floor plans, applies two recognition steps in a bottom-up manner. First, basic building blocks, i.e., walls, doors, and windows are detected using a statistical patch-based segmentation approach. Second, a graph is generated, and structural pattern recognition techniques are applied to further locate the main entities, i.e., rooms of the building. The proposed approach is able to analyze any type of floor plan regardless of the notation used. We have evaluated our method on different publicly available datasets of real architectural floor plans with different notations. The overall detection and recognition accuracy is about 95%, which is significantly better than any other state-of-the-art method. Our approach is generic enough that it could be easily adapted to the recognition and interpretation of any other printed machine-generated structured documents.
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2016). A fast hierarchical method for multi‐script and arbitrary oriented scene text extraction. IJDAR - International Journal on Document Analysis and Recognition, 19(4), 335–349.
Abstract: Typography and layout lead to the hierarchical organisation of text in words, text lines, paragraphs. This inherent structure is a key property of text in any script and language, which has nonetheless been minimally leveraged by existing text detection methods. This paper addresses the problem of text segmentation in natural scenes from a hierarchical perspective. Contrary to existing methods, we make explicit use of text structure, aiming directly at the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such a hierarchy, introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based on perceptual organization. Results obtained over four standard datasets, covering text in variable orientations and different languages, demonstrate that our algorithm, while being trained on a single mixed dataset, outperforms state-of-the-art methods in unconstrained scenarios.
Keywords: scene text; segmentation; detection; hierarchical grouping; perceptual organisation
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2017). TextProposals: a Text‐specific Selective Search Algorithm for Word Spotting in the Wild. PR - Pattern Recognition, 70, 60–74.
Abstract: Motivated by the success of powerful but expensive techniques to recognize words in a holistic way (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016), object proposal techniques emerge as an alternative to the traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity-based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way.
Our experiments demonstrate that the presented method is superior in its ability to produce good quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published results. Concretely, in the challenging ICDAR2015 Incidental Text dataset, we outperform the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015) by more than 10% in F-score. Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals.
|
|