|
Md. Mostafa Kamal Sarker, Mohammed Jabreel, Hatem A. Rashwan, Syeda Furruka Banu, Petia Radeva, & Domenec Puig. (2018). CuisineNet: Food Attributes Classification using Multi-scale Convolution Network. In 21st International Conference of the Catalan Association for Artificial Intelligence (pp. 365–372).
Abstract: Diversity of food and its attributes represents the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying food culture of people around the world and its flavor by classifying two main food attributes, cuisine and flavor. A deep learning model based on multi-scale convolutional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel sizes is also used for weighting the features resulting from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to multi-labeled classes for the multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, called Yummly48K, extracted from the popular food website Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on the validation and test sets, outperforming the state-of-the-art models.
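Illustrative note: the abstract does not give the exact form of the joint loss, so the following is only a minimal sketch under the assumption that it is the unweighted sum of the per-attribute NLL terms (one softmax head for cuisine, one for flavor). The function names and the two-head layout are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def joint_nll(cuisine_logits, flavor_logits, cuisine_label, flavor_label):
    """Assumed joint loss: sum of the NLL of each attribute head.

    Equal weighting of the two terms is an assumption of this sketch.
    """
    cuisine_term = -log_softmax(cuisine_logits)[cuisine_label]
    flavor_term = -log_softmax(flavor_logits)[flavor_label]
    return cuisine_term + flavor_term
```

With uniform scores over two cuisines and three flavors, each head contributes the log of its class count, so the sketch returns log 2 + log 3.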
|
|
|
Angel Sappa, Rosa Herrero, Fadi Dornaika, David Geronimo, & Antonio Lopez. (2007). Road Approximation in Euclidean and v-Disparity Space: A Comparative Study. In EUROCAST2007, Workshop on Cybercars and Intelligent Vehicles (pp. 368–369).
Abstract: This paper presents a comparative study between two road approximation techniques (planar surfaces) from stereo vision data. The first approach is carried out in the v-disparity space and is based on a voting scheme, the Hough transform. The second one consists in computing the best fitting plane for the whole 3D road data points, directly in the Euclidean space, by using least squares fitting. The comparative study is initially performed over a set of different synthetic surfaces (e.g., plane, quadratic surface, cubic surface) digitized by a virtual stereo head; then real data obtained with a commercial stereo head are used. The comparative study is intended to be used as a criterion for finding the best technique according to the road geometry. Additionally, it highlights common problems derived from a wrong assumption about the scene's prior knowledge.
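Illustrative note: the least-squares plane fit mentioned in the abstract can be sketched as below. This is not the authors' implementation. It assumes the plane is parameterized as z = a*x + b*y + c and that residuals are measured along z; the abstract does not say which parameterization or error measure the paper uses. The function name `fit_plane` is hypothetical.

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to a list of (x, y, z) points.

    Solves the 3x3 normal equations with Cramer's rule and returns (a, b, c).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")

    # Accumulate the sums that make up the normal equations.
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)

    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    r = [sxz, syz, sz]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration (e.g. collinear points)")

    def replace_column(k):
        # Copy of A with column k replaced by the right-hand side r.
        return [[r[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]

    return tuple(det3(replace_column(k)) / d for k in range(3))
```

For points sampled exactly from a plane, the fit recovers the plane coefficients up to floating-point error; for noisy 3D road points it returns the plane minimizing the squared z-residuals.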
|
|
|
Katerine Diaz, Francesc J. Ferri, & W. Diaz. (2013). Fast Approximated Discriminative Common Vectors using rank-one SVD updates. In 20th International Conference On Neural Information Processing (Vol. 8228, pp. 368–375). LNCS. Springer Berlin Heidelberg.
Abstract: An efficient incremental approach to the discriminative common vector (DCV) method for dimensionality reduction and classification is presented. The proposal consists of a rank-one update along with an adaptive restriction on the rank of the null space which leads to an approximate but convenient solution. The algorithm can be implemented very efficiently in terms of matrix operations and space complexity, which enables its use in large-scale dynamic application domains. Deep comparative experimentation using publicly available high dimensional image datasets has been carried out in order to properly assess the proposed algorithm against several recent incremental formulations.
K. Diaz-Chito, F.J. Ferri, W. Diaz
|
|
|
Youssef El Rhabi, Simon Loic, Brun Luc, Josep Llados, & Felipe Lumbreras. (2016). Information Theoretic Rotationwise Robust Binary Descriptor Learning. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 368–378).
Abstract: In this paper, we propose a new data-driven approach for binary descriptor selection. In order to draw a clear analysis of common designs, we present a general information-theoretic selection paradigm. It encompasses several standard binary descriptor construction schemes, including a recent state-of-the-art one named BOLD. We pursue the same endeavor to increase the stability of the produced descriptors with respect to rotations. To achieve this goal, we have designed a novel offline selection criterion which is better adapted to the online matching procedure. The effectiveness of our approach is demonstrated on two standard datasets, where our descriptor is compared to BOLD and to several classical descriptors. In particular, it emerges that our approach can achieve performance equivalent to, if not better than, that of BOLD while relying on descriptors half as long. Such an improvement can be influential for real-time applications.
|
|
|
Dimosthenis Karatzas, V. Poulain d'Andecy, & Marçal Rusiñol. (2016). Human-Document Interaction – a new frontier for document image analysis. In 12th IAPR Workshop on Document Analysis Systems (pp. 369–374).
Abstract: All indications show that paper documents will not cede in favour of their digital counterparts, but will instead be used increasingly in conjunction with digital information. An open challenge is how to seamlessly link the physical with the digital – how to continue taking advantage of the important affordances of paper, without missing out on digital functionality. This paper presents the authors' experience with developing systems for Human-Document Interaction based on augmented document interfaces and examines new challenges and opportunities arising for the document image analysis field in this area. The system presented combines state-of-the-art camera-based document image analysis techniques with a range of complementary technologies to offer fluid Human-Document Interaction. Both fixed and nomadic setups are discussed that have gone through user testing in real-life environments, and use cases are presented that span the spectrum from business to educational application.
|
|
|
Patricia Suarez, Dario Carpio, & Angel Sappa. (2023). A Deep Learning Based Approach for Synthesizing Realistic Depth Maps. In 22nd International Conference on Image Analysis and Processing (Vol. 14234, pp. 369–380). LNCS.
Abstract: This paper presents a novel cycle generative adversarial network (CycleGAN) architecture for synthesizing high-quality depth maps from a given monocular image. The proposed architecture uses multiple loss functions, including cycle consistency, contrastive, identity, and least square losses, to enable the generation of realistic and high-fidelity depth maps. The proposed approach synthesizes depth maps from RGB images without requiring paired training data. Comparisons with several state-of-the-art approaches are provided, showing that the proposed approach outperforms them in terms of both quantitative metrics and visual quality.
|
|
|
D. Jayagopi, Bogdan Raducanu, & D. Gatica-Perez. (2009). Characterizing conversational group dynamics using nonverbal behaviour. In 10th IEEE International Conference on Multimedia and Expo (pp. 370–373).
Abstract: This paper addresses the novel problem of characterizing conversational group dynamics. It is well documented in social psychology that, depending on the objectives of a group, the dynamics are different. For example, a competitive meeting has a different objective from that of a collaborative meeting. We propose a method to characterize group dynamics based on the joint description of the group members' aggregated acoustical nonverbal behaviour to classify two meeting datasets (one being cooperative-type and the other being competitive-type). We use 4.5 hours of real behavioural multi-party data and show that our methodology can achieve a classification rate of up to 100%.
|
|
|
Fernando Vilariño, Stephan Ameling, Gerard Lacey, Stephen Patchett, & Hugh Mulcahy. (2009). Eye Tracking Search Patterns in Expert and Trainee Colonoscopists: A Novel Method of Assessing Endoscopic Competency? GI - Gastrointestinal Endoscopy, 69(5), 370.
|
|
|
Y. Mori, M. Misawa, Jorge Bernal, M. Bretthauer, S. Kudo, A. Rastogi, et al. (2022). Artificial Intelligence for Disease Diagnosis-the Gold Standard Challenge. Gastrointestinal Endoscopy, 96(2), 370–372.
|
|
|
Mario Rojas, David Masip, & Jordi Vitria. (2011). Automatic Detection of Facial Feature Points via HOGs and Geometric Prior Models. In 5th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 6669, pp. 371–378). Springer Berlin Heidelberg.
Abstract: Most applications dealing with problems involving the face require a robust estimation of the facial salient points. Nevertheless, this estimation is not usually an automated preprocessing step in applications dealing with facial expression recognition. In this paper we present a simple method to detect salient points in the face. It is based on a prior Point Distribution Model and a robust object descriptor. The model learns the distribution of the points from the training data, as well as the amount of variation in location each point exhibits. Using this model, we reduce the search areas to look for each point. In addition, we also exploit the global consistency of the point constellation, increasing the detection accuracy. The method was tested on two separate data sets and the results, in some cases, outperform the state of the art.
|
|
|
Javier Vazquez, Maria Vanrell, & Ramon Baldrich. (2008). Towards a Psychophysical Evaluation of Colour Constancy Algorithms. In 4th European Conference on Colour in Graphics, Imaging and Vision Proceedings (pp. 372–377).
|
|
|
Arnau Ramisa, Adriana Tapus, David Aldavert, Ricardo Toledo, & Ramon Lopez de Mantaras. (2009). Robust Vision-Based Localization using Combinations of Local Feature Regions Detectors. AR - Autonomous Robots, 27(4), 373–385.
Abstract: This paper presents a vision-based approach for mobile robot localization. The model of the environment is topological. The new approach characterizes a place using a signature. This signature consists of a constellation of descriptors computed over different types of local affine covariant regions extracted from an omnidirectional image acquired by rotating a standard camera with a pan-tilt unit. This type of representation permits a reliable and distinctive environment modelling. Our objectives were to validate the proposed method in indoor environments and, also, to find out if the combination of complementary local feature region detectors improves the localization versus using a single region detector. Our experimental results show that if false matches are effectively rejected, the combination of different covariant affine region detectors notably increases the performance of the approach by combining the different strengths of the individual detectors. In order to reduce the localization time, two strategies are evaluated: re-ranking the map nodes using a global similarity measure and using a standard perspective field of view of 45°.
In order to systematically test topological localization methods, another contribution proposed in this work is a novel method to measure the degradation in localization performance as the robot moves away from the point where the original signature was acquired. This reveals the robustness of the proposed signature. In order for this to be effective, it must be done in several varied environments that test all the possible situations in which the robot may have to perform localization.
|
|
|
J. Kuhn, A. Nussbaumer, J. Pirker, Dimosthenis Karatzas, A. Pagani, O. Conlan, et al. (2015). Advancing Physics Learning Through Traversing a Multi-Modal Experimentation Space. In Workshop Proceedings on the 11th International Conference on Intelligent Environments (Vol. 19, pp. 373–380).
Abstract: Translating conceptual knowledge into real world experiences presents a significant educational challenge. This position paper presents an approach that supports learners in moving seamlessly between conceptual learning and its application in the real world by bringing physical and virtual experiments into everyday settings. Learners are empowered in conducting these situated experiments in a variety of physical settings by leveraging state-of-the-art mobile, augmented reality, and virtual reality technology. A blend of mobile-based multi-sensory physical experiments, augmented reality and enabling virtual environments can allow learners to bridge their conceptual learning with tangible experiences in a completely novel manner. This approach focuses on the learner by applying self-regulated personalised learning techniques, underpinned by innovative pedagogical approaches and adaptation techniques, to ensure that the needs and preferences of each learner are catered for individually.
|
|
|
M.J. Yzuel, J. Pladellorens, Joan Serrat, & A. Dupuy. (1993). Application restauration and edge detection techniques in the calculation of left ventricular volumes. In Optics in Medicine, Biology and Environmental Research : Selected contributions to the first International Conference on Optics within Life Sciences (OWLS I) (pp. 374–375). Elsevier.
|
|
|
Ekaterina Zaytseva, Santiago Segui, & Jordi Vitria. (2012). Sketchable Histograms of Oriented Gradients for Object Detection. In 17th Iberoamerican Conference on Pattern Recognition (Vol. 7441, pp. 374–381). Springer Berlin Heidelberg.
Abstract: In this paper we investigate a new representation approach for visual object recognition. The new representation, called sketchable-HoG, extends the classical histogram of oriented gradients (HoG) feature by adding two different aspects: the stability of the majority orientation and the continuity of gradient orientations. In this way, the sketchable-HoG locally characterizes the complexity of an object model and introduces global structure information while still keeping simplicity, compactness and robustness. We evaluated the proposed image descriptor on the publicly available Caltech 101 dataset. The obtained results outperform the classical HoG descriptor as well as other descriptors reported in the literature.
|
|