Jose Luis Alba, A. Pujol, & Juan J. Villanueva. (2001). Separating Geometry from Texture to Improve Face Analysis.
A. Pujol, Juan J. Villanueva, & Jose Luis Alba. (2001). Efficient Computation of Face Shape Similarity Using Distance Transform Eigendecomposition and Valleys.
Maria Vanrell, Felipe Lumbreras, A. Pujol, Ramon Baldrich, Josep Llados, & Juan J. Villanueva. (2001). Colour Normalisation Based on Background Information.
Roberto Morales, Juan Quispe, & Eduardo Aguilar. (2023). Exploring multi-food detection using deep learning-based algorithms. In 13th International Conference on Pattern Recognition Systems (pp. 1–7).
Abstract: People are becoming increasingly concerned about their diet, whether for disease prevention, medical treatment or other purposes. For meals served in restaurants, schools or public canteens, it is not easy to identify the ingredients or the nutritional information they contain. Deep learning models have made it easier to record and track food consumption by recognizing the main dish present in an image. Because multiple foods may be served on the same plate, food analysis should be treated as a multi-class object detection problem. EfficientDet and YOLOv5 are object detection algorithms that have demonstrated high mAP and real-time performance on general domain data. However, these models have not been evaluated and compared on public food datasets. Unlike general domain objects, foods have inherently challenging features that increase the complexity of detection. In this work, we evaluate EfficientDet and YOLOv5 on three public food datasets: UNIMIB2016, UECFood256 and ChileanFood64. The results show that YOLOv5 has a significant advantage over EfficientDet in both mAP and response time on all datasets. Furthermore, YOLOv5 outperforms the state of the art on UECFood256, improving mAP@.50 by more than 4% over the best reported result.
Gisel Bastidas-Guacho, Patricio Moreno, Boris X. Vintimilla, & Angel Sappa. (2023). Application on the Loop of Multimodal Image Fusion: Trends on Deep-Learning Based Approaches. In 13th International Conference on Pattern Recognition Systems (Vol. 14234, pp. 25–36).
Abstract: Multimodal image fusion combines information from different modalities, which is useful for tasks such as object detection, edge detection, and tracking, to name a few. Using the fused representation in applications results in better task performance. There are several image fusion approaches, which have been summarized in surveys. However, the existing surveys focus on image fusion approaches in which the application on the loop of multimodal image fusion is not considered. In contrast, this study summarizes deep learning-based multimodal image fusion for computer vision (e.g., object detection) and image processing applications (e.g., semantic segmentation), that is, approaches where the application module leverages the multimodal fusion process to enhance the final result. First, we introduce image fusion and the existing general frameworks for image fusion tasks such as multifocus, multiexposure and multimodal fusion. Then, we describe the multimodal image fusion approaches. Next, we review the state-of-the-art deep learning multimodal image fusion approaches for vision applications. Finally, we conclude our survey with the trends in task-driven multimodal image fusion.
Naila Murray, Luca Marchesotti, & Florent Perronnin. (2012). Learning to Rank Images using Semantic and Aesthetic Labels. In 23rd British Machine Vision Conference (pp. 110.1–110.10).
Abstract: Most works on image retrieval from text queries have addressed the problem of retrieving semantically relevant images. However, the ability to assess the aesthetic quality of an image is an increasingly important differentiating factor for search engines. In this work, given a semantic query, we are interested in retrieving images which are semantically relevant and score highly in terms of aesthetics/visual quality. We use large-margin classifiers and rankers to learn statistical models capable of ordering images based on the aesthetic and semantic information. In particular, we compare two families of approaches: while the first one attempts to learn a single ranker which takes into account both semantic and aesthetic information, the second one learns separate semantic and aesthetic models. We carry out a quantitative and qualitative evaluation on a recently-published large-scale dataset and we show that the second family of techniques significantly outperforms the first one.
Firat Ismailoglu, Ida G. Sprinkhuizen-Kuyper, Evgueni Smirnov, Sergio Escalera, & Ralf Peeters. (2015). Fractional Programming Weighted Decoding for Error-Correcting Output Codes. In Multiple Classifier Systems, Proceedings of 12th International Workshop, MCS 2015 (pp. 38–50). Springer International Publishing.
Abstract: Introducing weights in the decoding phase of Error-Correcting Output Codes (ECOC) designs, in order to increase their classification performance, has attracted a lot of interest. In this work, we present a method for ECOC designs that focuses on increasing the hypothesis margin on the data samples given a base classifier. In doing so, we implicitly reward the base classifiers with high performance and penalize those with low performance. The resulting objective function is of the fractional programming type, and we solve this problem using Dinkelbach's algorithm. Tests on well-known UCI datasets show that the presented method is superior to unweighted decoding and that it outperforms the state-of-the-art weighted decoding methods in most of the experiments performed.
Georg Langs, Petia Radeva, David Rotger, & Francesc Carreras. (2004). Explorative Building of 3D Vessel Tree Models.
Antonio Lopez, Joan Serrat, J. Saludes, Cristina Cañero, Felipe Lumbreras, & T. Graf. (2005). Ridgeness for Detecting Lane Markings.
Miguel Oliveira, L. Seabra Lopes, G. Hyun Lim, S. Hamidreza Kasaei, Angel Sappa, & A. Tom. (2015). Concurrent Learning of Visual Codebooks and Object Categories in Open-ended Domains. In International Conference on Intelligent Robots and Systems (pp. 2488–2495).
Abstract: In open-ended domains, robots must continuously learn new object categories. When the training sets are created offline, it is not possible to ensure their representativeness with respect to the object categories and features the system will find when operating online. In the Bag of Words model, visual codebooks are constructed from training sets created offline. This might lead to non-discriminative visual words and, as a consequence, to poor recognition performance. This paper proposes a visual object recognition system which concurrently learns, in an incremental and online fashion, both the visual object category representations and the codebook words used to encode them. The codebook is defined using Gaussian Mixture Models which are updated using new object views. The approach has similarities with the human visual object recognition system: evidence suggests that the development of recognition capabilities occurs on multiple levels and is sustained over long periods of time. Results show that the proposed system with concurrent learning of object categories and codebooks is capable of learning more categories, requiring fewer examples, and with similar accuracies, when compared to the classical Bag of Words approach using offline constructed codebooks.
Keywords: Visual Learning; Computer Vision; Autonomous Agents
Salvatore Tabbone, & Josep Llados. (2007). A Propos de la Reconnaissance de Documents Graphiques: Synthese et Perspectives. In Traitement et Analyse de l’Information: Methodes et Applications (pp. 247–258).
Angel Sappa, & Boris X. Vintimilla. (2006). Edge Point Linking by Means of Global and Local Schemes. In IEEE Int. Conf. on Signal-Image Technology and Internet-Based Systems, Hammamet, Tunisia, December 2006 (pp. 551–560).
Fadi Dornaika, & Bogdan Raducanu. (2008). Constructing Panoramic Views Through Facial Gaze Tracking. In IEEE International Conference on Multimedia and Expo (pp. 969–972).
M. Bressan, David Guillamet, & Jordi Vitria. (2001). Using a local ICA Representation of High Dimensional Data for Object Recognition and Classification.
David Guillamet, M. Bressan, & Jordi Vitria. (2001). Weighted Non-negative Matrix Factorization for Local Representations.