|
Antonio Hernandez. (2015). From pixels to gestures: learning visual representations for human analysis in color and depth data sequences (Sergio Escalera, & Stan Sclaroff, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others.
In this dissertation we are interested in learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem from three different research lines, at different levels of abstraction; from pixels to gestures: human segmentation, human pose estimation and gesture recognition.
First, we show how binary segmentation (object vs. background) of the human body in image sequences is helpful to remove all the background clutter present in the scene. The presented method, based on Graph cuts optimization, enforces spatio-temporal consistency of the produced segmentation masks among consecutive frames. Secondly, we present a framework for multi-label segmentation for obtaining much more detailed segmentation masks: instead of just obtaining a binary representation separating the human body from the background, finer segmentation masks can be obtained separating the different body parts.
At a higher level of abstraction, we aim for a simpler yet descriptive representation of the human body. Human pose estimation methods usually rely on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, appropriately connected following the kinematic constraints of the human body. In practice, such skeletal models must fulfill some constraints in order to allow for efficient inference, while actually limiting the expressiveness of the model. In order to cope with this, we introduce a top-down approach for predicting the position of the body parts in the model, using a mid-level part representation based on Poselets.
Finally, we propose a framework for gesture recognition based on the bag of visual words framework. We leverage the benefits of RGB and depth image modalities by combining modality-specific visual vocabularies in a late fusion fashion. A new rotation-variant depth descriptor is presented, yielding better results than other state-of-the-art descriptors. Moreover, spatio-temporal pyramids are used to encode rough spatial and temporal structure. In addition, we present a probabilistic reformulation of Dynamic Time Warping for gesture segmentation in video sequences. A Gaussian-based probabilistic model of a gesture is learnt, implicitly encoding possible deformations in both spatial and time domains.
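For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the classical, deterministic algorithm on 1-D sequences. It is not the probabilistic reformulation the thesis proposes; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration only.

```python
import math


def dtw_distance(a, b):
    """Classical DTW alignment cost between two 1-D numeric sequences.

    Illustrative only: uses |x - y| as the local cost (an assumption),
    not the Gaussian-based model described in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because one sample may be matched to several samples of the other sequence, a gesture performed more slowly can still align with its model at low cost, which is the property that makes the technique useful for temporal segmentation.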
|
|
|
Adriana Romero. (2015). Assisting the training of deep neural networks with applications to computer vision (Carlo Gatta, & Petia Radeva, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Deep learning has recently been enjoying an increasing popularity due to its success in solving challenging tasks. In particular, deep learning has proven to be effective in a large variety of computer vision tasks, such as image classification, object recognition and image parsing. Contrary to previous research, which required engineered feature representations, designed by experts, in order to succeed, deep learning attempts to learn representation hierarchies automatically from data. More recently, the trend has been to go deeper with representation hierarchies.
Learning (very) deep representation hierarchies is a challenging task, which involves the optimization of highly non-convex functions. Therefore, the search for algorithms to ease the learning of (very) deep representation hierarchies from data is extensive and ongoing.
In this thesis, we tackle the challenging problem of easing the learning of (very) deep representation hierarchies. We present a hyper-parameter free, off-the-shelf, simple and fast unsupervised algorithm to discover hidden structure from the input data by enforcing a very strong form of sparsity. We study the applicability and potential of the algorithm to learn representations of varying depth in a handful of applications and domains, highlighting the ability of the algorithm to provide discriminative feature representations that are able to achieve top performance.
Yet, while emphasizing the great value of unsupervised learning methods when labeled data is scarce, the recent industrial success of deep learning has revolved around supervised learning. Supervised learning is currently the focus of many recent research advances, which have been shown to excel at many computer vision tasks. Top performing systems often involve very large and deep models, which are not well suited for applications with time or memory limitations. More in line with the current trends, we engage in making top performing models more efficient, by designing very deep and thin models. Since training such very deep models still appears to be a challenging task, we introduce a novel algorithm that guides the training of very thin and deep models by hinting their intermediate representations.
Very deep and thin models trained by the proposed algorithm end up extracting feature representations that are comparable or even better performing than the ones extracted by large state-of-the-art models, while compellingly reducing the time and memory consumption of the model.
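The abstract does not state how intermediate representations are hinted. One plausible form, sketched here purely as an assumption, is a squared-error penalty between a chosen hidden layer of a larger reference model and a linearly projected hidden layer of the thin model. The names `hint_loss`, `teacher_hint`, `student_guided` and `regressor` are hypothetical.

```python
def hint_loss(teacher_hint, student_guided, regressor):
    """Squared-error hint penalty between two intermediate representations.

    teacher_hint:   list of floats, hidden features of the reference model
    student_guided: list of floats, hidden features of the thin model
    regressor:      list of rows (one per teacher feature) mapping the thin
                    model's features to the teacher's dimensionality
    All three are illustrative assumptions, not the thesis's exact definition.
    """
    # project the thin model's features into the teacher's feature space
    projected = [
        sum(w * s for w, s in zip(row, student_guided)) for row in regressor
    ]
    # half the squared Euclidean distance between the two representations
    return 0.5 * sum((t - p) ** 2 for t, p in zip(teacher_hint, projected))
```

In such a scheme the penalty would be minimized with respect to both the thin model's lower layers and the projection, before or alongside the usual supervised objective.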
|
|
|
Robert Benavente, Laura Igual, & Fernando Vilariño. (2008). Current Challenges in Computer Vision.
|
|
|
Giovanni Maria Farinella, Petia Radeva, & Jose Braz. (2020). Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 4).
|
|
|
Giovanni Maria Farinella, Petia Radeva, & Jose Braz. (2020). Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5).
|
|
|
Giovanni Maria Farinella, Petia Radeva, Jose Braz, & Kadi Bouatouch. (2021). Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Volume 5) (Vol. 5).
Abstract: This book contains the proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) which was organized and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), endorsed by the International Association for Pattern Recognition (IAPR), and in cooperation with the ACM Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), the European Association for Computer Graphics (EUROGRAPHICS), the EUROGRAPHICS Portuguese Chapter, the VRVis Center for Virtual Reality and Visualization Forschungs-GmbH, the French Association for Computer Graphics (AFIG), and the Society for Imaging Science and Technology (IS&T). The proceedings here published demonstrate new and innovative solutions and highlight technical problems in each field that are challenging and worthy of being disseminated to the interested research audiences. VISIGRAPP 2021 was organized to promote a discussion forum about the conference’s research topics between researchers, developers, manufacturers and end-users, and to establish guidelines in the development of more advanced solutions. This year VISIGRAPP was, exceptionally, held as a web-based event, due to the COVID-19 pandemic, from 8 – 10 February. We received a high number of paper submissions for this edition of VISIGRAPP, 371 in total, with contributions from 52 countries. This attests to the success and global dimension of VISIGRAPP. To evaluate each submission, we used a hierarchical process of double-blind evaluation where each paper was reviewed by two to six experts from the International Program Committee (IPC). The IPC selected for oral presentation and for publication as full papers 12 papers from GRAPP, 8 from HUCAPP, 11 papers from IVAPP, and 56 papers from VISAPP, which led to a result for the full-paper acceptance ratio of 24% and a high-quality program. 
Apart from the above full papers, the conference program also features 118 short papers and 67 poster presentations. We hope that these conference proceedings, which are submitted for indexation by Thomson Reuters Conference Proceedings Citation Index, SCOPUS, DBLP, Semantic Scholar, Google Scholar, EI and Microsoft Academic, will help the Computer Vision, Imaging, Visualization, Computer Graphics and Human-Computer Interaction communities to find interesting research work. Moreover, we are proud to inform that the program also includes three plenary keynote lectures, given by internationally distinguished researchers, namely Federico Tombari (Google and Technical University of Munich, Germany), Dieter Schmalstieg (Graz University of Technology, Austria) and Nathalie Henry Riche (Microsoft Research, United States), thus contributing to increase the overall quality of the conference and to provide a deeper understanding of the conference’s interest fields. Furthermore, a short list of the presented papers will be selected to be extended into a forthcoming book of VISIGRAPP Selected Papers to be published by Springer during 2021 in the CCIS series. Moreover, a short list of presented papers will be selected for publication of extended and revised versions in a special issue of the Springer Nature Computer Science journal. All papers presented at this conference will be available at the SCITEPRESS Digital Library. Three awards are delivered at the closing session, to recognize the best conference paper, the best student paper and the best poster for each of the four conferences. There is also an award for best industrial paper to be delivered at the closing session for VISAPP. We would like to express our thanks, first of all, to the authors of the technical papers, whose work and dedication made it possible to put together a program that we believe to be very exciting and of high technical quality. 
Next, we would like to thank the Area Chairs, all the members of the program committee and auxiliary reviewers, who helped us with their expertise and time. We would also like to thank the invited speakers for their invaluable contribution and for sharing their vision in their talks. Finally, we gratefully acknowledge the professional support of the INSTICC team for all organizational processes, especially given the need to introduce online streaming, forum management, direct messaging facilitation and other web-based activities in order to make it possible for VISIGRAPP 2021 authors to present their work and share ideas with colleagues in spite of the logistic difficulties caused by the current pandemic situation. We wish you all an exciting conference. We hope to meet you again for the next edition of VISIGRAPP, details of which are available at http://www.visigrapp.org.
|
|
|
Giovanni Maria Farinella, Petia Radeva, Jose Braz, & Kadi Bouatouch. (2021). Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Volume 4) (Vol. 4).
Abstract: This book contains the proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) which was organized and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), endorsed by the International Association for Pattern Recognition (IAPR), and in cooperation with the ACM Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), the European Association for Computer Graphics (EUROGRAPHICS), the EUROGRAPHICS Portuguese Chapter, the VRVis Center for Virtual Reality and Visualization Forschungs-GmbH, the French Association for Computer Graphics (AFIG), and the Society for Imaging Science and Technology (IS&T). The proceedings here published demonstrate new and innovative solutions and highlight technical problems in each field that are challenging and worthy of being disseminated to the interested research audiences. VISIGRAPP 2021 was organized to promote a discussion forum about the conference’s research topics between researchers, developers, manufacturers and end-users, and to establish guidelines in the development of more advanced solutions. This year VISIGRAPP was, exceptionally, held as a web-based event, due to the COVID-19 pandemic, from 8 – 10 February. We received a high number of paper submissions for this edition of VISIGRAPP, 371 in total, with contributions from 52 countries. This attests to the success and global dimension of VISIGRAPP. To evaluate each submission, we used a hierarchical process of double-blind evaluation where each paper was reviewed by two to six experts from the International Program Committee (IPC). The IPC selected for oral presentation and for publication as full papers 12 papers from GRAPP, 8 from HUCAPP, 11 papers from IVAPP, and 56 papers from VISAPP, which led to a result for the full-paper acceptance ratio of 24% and a high-quality program. 
Apart from the above full papers, the conference program also features 118 short papers and 67 poster presentations. We hope that these conference proceedings, which are submitted for indexation by Thomson Reuters Conference Proceedings Citation Index, SCOPUS, DBLP, Semantic Scholar, Google Scholar, EI and Microsoft Academic, will help the Computer Vision, Imaging, Visualization, Computer Graphics and Human-Computer Interaction communities to find interesting research work. Moreover, we are proud to inform that the program also includes three plenary keynote lectures, given by internationally distinguished researchers, namely Federico Tombari (Google and Technical University of Munich, Germany), Dieter Schmalstieg (Graz University of Technology, Austria) and Nathalie Henry Riche (Microsoft Research, United States), thus contributing to increase the overall quality of the conference and to provide a deeper understanding of the conference’s interest fields. Furthermore, a short list of the presented papers will be selected to be extended into a forthcoming book of VISIGRAPP Selected Papers to be published by Springer during 2021 in the CCIS series. Moreover, a short list of presented papers will be selected for publication of extended and revised versions in a special issue of the Springer Nature Computer Science journal. All papers presented at this conference will be available at the SCITEPRESS Digital Library. Three awards are delivered at the closing session, to recognize the best conference paper, the best student paper and the best poster for each of the four conferences. There is also an award for best industrial paper to be delivered at the closing session for VISAPP. We would like to express our thanks, first of all, to the authors of the technical papers, whose work and dedication made it possible to put together a program that we believe to be very exciting and of high technical quality. 
Next, we would like to thank the Area Chairs, all the members of the program committee and auxiliary reviewers, who helped us with their expertise and time. We would also like to thank the invited speakers for their invaluable contribution and for sharing their vision in their talks. Finally, we gratefully acknowledge the professional support of the INSTICC team for all organizational processes, especially given the need to introduce online streaming, forum management, direct messaging facilitation and other web-based activities in order to make it possible for VISIGRAPP 2021 authors to present their work and share ideas with colleagues in spite of the logistic difficulties caused by the current pandemic situation. We wish you all an exciting conference. We hope to meet you again for the next edition of VISIGRAPP, details of which are available at http://www.visigrapp.org.
|
|
|
Debora Gil, Jordi Gonzalez, & Gemma Sanchez (Eds.). (2007). Computer Vision: Advances in Research and Development. 2. Bellaterra (Spain): UAB.
|
|
|
Juan J. Villanueva. (2008). Visualization, Imaging, and Image Processing.
|
|
|
Sergio Escalera, Xavier Baro, Oriol Pujol, Jordi Vitria, & Petia Radeva. (2011). Traffic-Sign Recognition Systems. Springer London.
|
|
|
David Geronimo, & Antonio Lopez. (2014). Vision-based Pedestrian Protection Systems for Intelligent Vehicles. Springer Briefs in Computer Vision.
Abstract: Pedestrian Protection Systems (PPSs) are on-board systems aimed at detecting and tracking people in the surroundings of a vehicle in order to avoid potentially dangerous situations. These systems, together with other Advanced Driver Assistance Systems (ADAS) such as lane departure warning or adaptive cruise control, are one of the most promising ways to improve traffic safety. By the use of computer vision, cameras working either in the visible or infra-red spectra have been demonstrated as a reliable sensor to perform this task. Nevertheless, the variability of human appearance, not only in terms of clothing and sizes but also as a result of their dynamic shape, makes pedestrians one of the most complex classes even for computer vision. Moreover, the unstructured, changing and unpredictable environment in which such on-board systems must work makes detection a difficult task to be carried out with the demanded robustness. In this brief, the state of the art in PPSs is introduced through the review of the most relevant papers of the last decade. A common computational architecture is presented as a framework to organize each method according to its main contribution. More than 300 papers are referenced, most of them addressing pedestrian detection and others corresponding to the descriptors (features), pedestrian models, and learning machines used. In addition, an overview of topics such as real-time aspects, systems benchmarking and future challenges of this research area is presented.
Keywords: Computer Vision; Driver Assistance Systems; Intelligent Vehicles; Pedestrian Detection; Vulnerable Road Users
|
|
|
Marçal Rusiñol, & Josep Llados. (2010). Symbol Spotting in Digital Libraries: Focused Retrieval over Graphic-rich Document Collections. Springer.
Abstract: The specific problem of symbol recognition in graphical documents requires additional techniques to those developed for character recognition. The most well-known obstacle is the so-called Sayre paradox: Correct recognition requires good segmentation, yet improvement in segmentation is achieved using information provided by the recognition process. This dilemma can be avoided by techniques that identify sets of regions containing useful information. Such symbol-spotting methods allow the detection of symbols in maps or technical drawings without having to fully segment or fully recognize the entire content.
This unique text/reference provides a complete, integrated and large-scale solution to the challenge of designing a robust symbol-spotting method for collections of graphic-rich documents. The book examines a number of features and descriptors, from basic photometric descriptors commonly used in computer vision techniques to those specific to graphical shapes, presenting a methodology which can be used in a wide variety of applications. Additionally, readers are supplied with an insight into the problem of performance evaluation of spotting methods. Some very basic knowledge of pattern recognition, document image analysis and graphics recognition is assumed.
Keywords: Focused Retrieval; Graphical Pattern Indexation; Graphics Recognition; Pattern Recognition; Performance Evaluation; Symbol Description; Symbol Spotting
|
|
|
Jun Wan, Guodong Guo, Sergio Escalera, Hugo Jair Escalante, & Stan Z. Li. (2020). Multi-modal Face Presentation Attack Detection (Vol. 13).
|
|
|
Michael Teutsch, Angel Sappa, & Riad I. Hammoud. (2021). Computer Vision in the Infrared Spectrum: Challenges and Approaches (Vol. 10).
Abstract: Human visual perception is limited to the visual-optical spectrum. Machine vision is not. Cameras sensitive to the different infrared spectra can enhance the abilities of autonomous systems and visually perceive the environment in a holistic way. Relevant scene content can be made visible especially in situations, where sensors of other modalities face issues like a visual-optical camera that needs a source of illumination. As a consequence, not only human mistakes can be avoided by increasing the level of automation, but also machine-induced errors can be reduced that, for example, could make a self-driving car crash into a pedestrian under difficult illumination conditions. Furthermore, multi-spectral sensor systems with infrared imagery as one modality are a rich source of information and can provably increase the robustness of many autonomous systems. Applications that can benefit from utilizing infrared imagery range from robotics to automotive and from biometrics to surveillance. In this book, we provide a brief yet concise introduction to the current state-of-the-art of computer vision and machine learning in the infrared spectrum. Based on various popular computer vision tasks such as image enhancement, object detection, or object tracking, we first motivate each task starting from established literature in the visual-optical spectrum. Then, we discuss the differences between processing images and videos in the visual-optical spectrum and the various infrared spectra. An overview of the current literature is provided together with an outlook for each task. Furthermore, available and annotated public datasets and common evaluation methods and metrics are presented. In a separate chapter, popular applications that can greatly benefit from the use of infrared imagery as a data source are presented and discussed. Among them are automatic target recognition, video surveillance, or biometrics including face recognition. 
Finally, we conclude with recommendations for well-fitting sensor setups and data processing algorithms for certain computer vision tasks. We address this book to prospective researchers and engineers new to the field but also to anyone who wants to get introduced to the challenges and the approaches of computer vision using infrared images or videos. Readers will be able to start their work directly after reading the book, supported by a highly comprehensive backlog of recent and relevant literature as well as related infrared datasets including existing evaluation frameworks. Together with consistently decreasing costs for infrared cameras, new fields of application appear and make computer vision in the infrared spectrum a great opportunity to face today's scientific and engineering challenges.
|
|
|
Sergio Escalera, & Ralf Herbrich. (2020). The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations (Sergio Escalera, & Ralf Herbrich, Eds.).
Abstract: This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.
|
|