Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	1–15 of 18 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

<< 1 2 >>

List View

Citations

Details

	Records
	Author	David Geronimo; Angel Sappa; Daniel Ponsa; Antonio Lopez
	Title	2D-3D based on-board pedestrian detection system			Type	Journal Article
	Year	2010	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	114	Issue	5	Pages	583–595
	Keywords	Pedestrian detection; Advanced Driver Assistance Systems; Horizon line; Haar wavelets; Edge orientation histograms
	Abstract	During the next decade, on-board pedestrian detection systems will play a key role in the challenge of increasing traffic safety. The main target of these systems, to detect pedestrians in urban scenarios, implies overcoming difficulties like processing outdoor scenes from a mobile platform and searching for aspect-changing objects in cluttered environments. This makes such systems combine techniques in the state-of-the-art Computer Vision. In this paper we present a three module system based on both 2D and 3D cues. The first module uses 3D information to estimate the road plane parameters and thus select a coherent set of regions of interest (ROIs) to be further analyzed. The second module uses Real AdaBoost and a combined set of Haar wavelets and edge orientation histograms to classify the incoming ROIs as pedestrian or non-pedestrian. The final module loops again with the 3D cue in order to verify the classified ROIs and with the 2D in order to refine the final results. According to the results, the integration of the proposed techniques gives rise to a promising system.
	Address	Computer Vision and Image Understanding (Special Issue on Intelligent Vision Systems), Vol. 114(5):583-595
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1077-3142	ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS			Approved	no
	Call Number	ADAS @ adas @ GSP2010			Serial	1341
Permanent link to this record



	Author	Debora Gil; Petia Radeva
	Title	Extending anisotropic operators to recover smooth shapes			Type	Journal Article
	Year	2005	Publication	Computer Vision and Image Understanding	Abbreviated Journal
	Volume	99	Issue	1	Pages	110-125
	Keywords	Contour completion; Functional extension; Differential operators; Riemmanian manifolds; Snake segmentation
	Abstract	Anisotropic differential operators are widely used in image enhancement processes. Recently, their property of smoothly extending functions to the whole image domain has begun to be exploited. Strong ellipticity of differential operators is a requirement that ensures existence of a unique solution. This condition is too restrictive for operators designed to extend image level sets: their own functionality implies that they should restrict to some vector field. The diffusion tensor that defines the diffusion operator links anisotropic processes with Riemmanian manifolds. In this context, degeneracy implies restricting diffusion to the varieties generated by the vector fields of positive eigenvalues, provided that an integrability condition is satisfied. We will use that any smooth vector field fulfills this integrability requirement to design line connection algorithms for contour completion. As application we present a segmenting strategy that assures convergent snakes whatever the geometry of the object to be modelled is.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1077-3142	ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM;MILAB			Approved	no
	Call Number	IAM @ iam @ GIR2005			Serial	1530
Permanent link to this record



	Author	Bhaskar Chakraborty; Michael Holte; Thomas B. Moeslund; Jordi Gonzalez
	Title	Selective Spatio-Temporal Interest Points			Type	Journal Article
	Year	2012	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	116	Issue	3	Pages	396-410
	Keywords
	Abstract	Recent progress in the field of human action recognition points towards the use of Spatio-TemporalInterestPoints (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video words (BoV) model of local N-jet features to build a vocabulary of visual-words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH), validates our approach and show state-of-the-art performance. Due to the unavailability of ground truth action annotation data for the Multi-KTH dataset, we introduce an actor specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in the realistic scenario, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1077-3142	ISBN		Medium
	Area		Expedition		Conference
	Notes	ISE			Approved	no
	Call Number	Admin @ si @ CHM2012			Serial	1806
Permanent link to this record



	Author	Susana Alvarez; Anna Salvatella; Maria Vanrell; Xavier Otazu
	Title	Low-dimensional and Comprehensive Color Texture Description			Type	Journal Article
	Year	2012	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	116	Issue	I	Pages	54-67
	Keywords
	Abstract	Image retrieval can be dealt by combining standard descriptors, such as those of MPEG-7, which are deﬁned independently for each visual cue (e.g. SCD or CLD for Color, HTD for texture or EHD for edges). A common problem is to combine similarities coming from descriptors representing different concepts in different spaces. In this paper we propose a color texture description that bypasses this problem from its inherent deﬁnition. It is based on a low dimensional space with 6 perceptual axes. Texture is described in a 3D space derived from a direct implementation of the original Julesz’s Texton theory and color is described in a 3D perceptual space. This early fusion through the blob concept in these two bounded spaces avoids the problem and allows us to derive a sparse color-texture descriptor that achieves similar performance compared to MPEG-7 in image retrieval. Moreover, our descriptor presents comprehensive qualities since it can also be applied either in segmentation or browsing: (a) a dense image representation is deﬁned from the descriptor showing a reasonable performance in locating texture patterns included in complex images; and (b) a vocabulary of basic terms is derived to build an intermediate level descriptor in natural language improving browsing by bridging semantic gap
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1077-3142	ISBN		Medium
	Area		Expedition		Conference
	Notes	CAT;CIC			Approved	no
	Call Number	Admin @ si @ ASV2012			Serial	1827
Permanent link to this record



	Author	Miquel Ferrer; Dimosthenis Karatzas; Ernest Valveny; I. Bardaji; Horst Bunke
	Title	A Generic Framework for Median Graph Computation based on a Recursive Embedding Approach			Type	Journal Article
	Year	2011	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	115	Issue	7	Pages	919-928
	Keywords	Median Graph, Graph Embedding, Graph Matching, Structural Pattern Recognition
	Abstract	The median graph has been shown to be a good choice to obtain a represen- tative of a set of graphs. However, its computation is a complex problem. Recently, graph embedding into vector spaces has been proposed to obtain approximations of the median graph. The problem with such an approach is how to go from a point in the vector space back to a graph in the graph space. The main contribution of this paper is the generalization of this previ- ous method, proposing a generic recursive procedure that permits to recover the graph corresponding to a point in the vector space, introducing only the amount of approximation inherent to the use of graph matching algorithms. In order to evaluate the proposed method, we compare it with the set me- dian and with the other state-of-the-art embedding-based methods for the median graph computation. The experiments are carried out using four dif- ferent databases (one semi-artificial and three containing real-world data). Results show that with the proposed approach we can obtain better medi- ans, in terms of the sum of distances to the training graphs, than with the previous existing methods.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	IAM @ iam @ FKV2011			Serial	1831
Permanent link to this record



	Author	Jordi Gonzalez; Thomas B. Moeslund; Liang Wang
	Title	Semantic Understanding of Human Behaviors in Image Sequences: From video-surveillance to video-hermeneutics			Type	Journal Article
	Year	2012	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	116	Issue	3	Pages	305–306
	Keywords
	Abstract	Purpose: Atheromatic plaque progression is affected, among others phenomena, by biomechanical, biochemical, and physiological factors. In this paper, the authors introduce a novel framework able to provide both morphological (vessel radius, plaque thickness, and type) and biomechanical (wall shear stress and Von Mises stress) indices of coronary arteries.Methods: First, the approach reconstructs the three-dimensional morphology of the vessel from intravascular ultrasound (IVUS) and Angiographic sequences, requiring minimal user interaction. Then, a computational pipeline allows to automatically assess fluid-dynamic and mechanical indices. Ten coronary arteries are analyzed illustrating the capabilities of the tool and confirming previous technical and clinical observations.Results: The relations between the arterial indices obtained by IVUS measurement and simulations have been quantitatively analyzed along the whole surface of the artery, extending the analysis of the coronary arteries shown in previous state of the art studies. Additionally, for the first time in the literature, the framework allows the computation of the membrane stresses using a simplified mechanical model of the arterial wall.Conclusions: Circumferentially (within a given frame), statistical analysis shows an inverse relation between the wall shear stress and the plaque thickness. At the global level (comparing a frame within the entire vessel), it is observed that heavy plaque accumulations are in general calcified and are located in the areas of the vessel having high wall shear stress. Finally, in their experiments the inverse proportionality between fluid and structural stresses is observed.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1077-3142	ISBN		Medium
	Area		Expedition		Conference
	Notes	ISE			Approved	no
	Call Number	Admin @ si @ GMW2012			Serial	2005
Permanent link to this record



	Author	Bhaskar Chakraborty; Jordi Gonzalez; Xavier Roca
	Title	Large scale continuous visual event recognition using max-margin Hough transformation framework			Type	Journal Article
	Year	2013	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	117	Issue	10	Pages	1356–1368
	Keywords
	Abstract	In this paper we propose a novel method for continuous visual event recognition (CVER) on a large scale video dataset using max-margin Hough transformation framework. Due to high scalability, diverse real environmental state and wide scene variability direct application of action recognition/detection methods such as spatio-temporal interest point (STIP)-local feature based technique, on the whole dataset is practically infeasible. To address this problem, we apply a motion region extraction technique which is based on motion segmentation and region clustering to identify possible candidate “event of interest” as a preprocessing step. On these candidate regions a STIP detector is applied and local motion features are computed. For activity representation we use generalized Hough transform framework where each feature point casts a weighted vote for possible activity class centre. A max-margin frame work is applied to learn the feature codebook weight. For activity detection, peaks in the Hough voting space are taken into account and initial event hypothesis is generated using the spatio-temporal information of the participating STIPs. For event recognition a verification Support Vector Machine is used. An extensive evaluation on benchmark large scale video surveillance dataset (VIRAT) and as well on a small scale benchmark dataset (MSR) shows that the proposed method is applicable on a wide range of continuous visual event recognition applications having extremely challenging conditions.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1077-3142	ISBN		Medium
	Area		Expedition		Conference
	Notes	ISE			Approved	no
	Call Number	Admin @ si @ CGR2013			Serial	2413
Permanent link to this record



	Author	Josep M. Gonfaus; Marco Pedersoli; Jordi Gonzalez; Andrea Vedaldi; Xavier Roca
	Title	Factorized appearances for object detection			Type	Journal Article
	Year	2015	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	138	Issue		Pages	92–101
	Keywords	Object recognition; Deformable part models; Learning and sharing parts; Discovering discriminative parts
	Abstract	Deformable object models capture variations in an object’s appearance that can be represented as image deformations. Other effects such as out-of-plane rotations, three-dimensional articulations, and self-occlusions are often captured by considering mixture of deformable models, one per object aspect. A more scalable approach is representing instead the variations at the level of the object parts, applying the concept of a mixture locally. Combining a few part variations can in fact cheaply generate a large number of global appearances. A limited version of this idea was proposed by Yang and Ramanan [1], for human pose dectection. In this paper we apply it to the task of generic object category detection and extend it in several ways. First, we propose a model for the relationship between part appearances more general than the tree of Yang and Ramanan [1], which is more suitable for generic categories. Second, we treat part locations as well as their appearance as latent variables so that training does not need part annotations but only the object bounding boxes. Third, we modify the weakly-supervised learning of Felzenszwalb et al. and Girshick et al. [2], [3] to handle a significantly more complex latent structure. Our model is evaluated on standard object detection benchmarks and is found to improve over existing approaches, yielding state-of-the-art results for several object categories.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ISE; 600.063; 600.078			Approved	no
	Call Number	Admin @ si @ GPG2015			Serial	2705
Permanent link to this record



	Author	Mariella Dimiccoli; Marc Bolaños; Estefania Talavera; Maedeh Aghaei; Stavri G. Nikolov; Petia Radeva
	Title	SR-Clustering: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation			Type	Journal Article
	Year	2017	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	155	Issue		Pages	55-69
	Keywords
	Abstract	While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time consuming processes. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments. First, contextual and semantic information is extracted for each image by employing a Convolutional Neural Networks approach. Later, by integrating language processing, a vocabulary of concepts is defined in a semantic space. Finally, by exploiting the temporal coherence in photo streams, images which share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from activity and event recognition to semantic indexing and summarization. Experiments over egocentric sets of nearly 17,000 images, show that the proposed approach outperforms state-of-the-art methods.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; 601.235			Approved	no
	Call Number	Admin @ si @ DBT2017			Serial	2714
Permanent link to this record



	Author	Maedeh Aghaei; Mariella Dimiccoli; Petia Radeva
	Title	Multi-face tracking by extended bag-of-tracklets in egocentric photo-streams			Type	Journal Article
	Year	2016	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	149	Issue		Pages	146-156
	Keywords
	Abstract	Wearable cameras offer a hands-free way to record egocentric images of daily experiences, where social events are of special interest. The first step towards detection of social events is to track the appearance of multiple persons involved in them. In this paper, we propose a novel method to find correspondences of multiple faces in low temporal resolution egocentric videos acquired through a wearable camera. This kind of photo-stream imposes additional challenges to the multi-tracking problem with respect to conventional videos. Due to the free motion of the camera and to its low temporal resolution, abrupt changes in the field of view, in illumination condition and in the target location are highly frequent. To overcome such difficulties, we propose a multi-face tracking method that generates a set of tracklets through finding correspondences along the whole sequence for each detected face and takes advantage of the tracklets redundancy to deal with unreliable ones. Similar tracklets are grouped into the so called extended bag-of-tracklets (eBoT), which is aimed to correspond to a specific person. Finally, a prototype tracklet is extracted for each eBoT, where the occurred occlusions are estimated by relying on a new measure of confidence. We validated our approach over an extensive dataset of egocentric photo-streams and compared it to state of the art methods, demonstrating its effectiveness and robustness.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB;			Approved	no
	Call Number	Admin @ si @ ADR2016b			Serial	2742
Permanent link to this record



	Author	Gerard Canal; Sergio Escalera; Cecilio Angulo
	Title	A Real-time Human-Robot Interaction system based on gestures for assistive scenarios			Type	Journal Article
	Year	2016	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	149	Issue		Pages	65-77
	Keywords	Gesture recognition; Human Robot Interaction; Dynamic Time Warping; Pointing location estimation
	Abstract	Natural and intuitive human interaction with robotic systems is a key point to develop robots assisting people in an easy and effective way. In this paper, a Human Robot Interaction (HRI) system able to recognize gestures usually employed in human non-verbal communication is introduced, and an in-depth study of its usability is performed. The system deals with dynamic gestures such as waving or nodding which are recognized using a Dynamic Time Warping approach based on gesture specific features computed from depth maps. A static gesture consisting in pointing at an object is also recognized. The pointed location is then estimated in order to detect candidate objects the user may refer to. When the pointed object is unclear for the robot, a disambiguation procedure by means of either a verbal or gestural dialogue is performed. This skill would lead to the robot picking an object in behalf of the user, which could present difficulties to do it by itself. The overall system — which is composed by a NAO and Wifibot robots, a KinectTM v2 sensor and two laptops — is firstly evaluated in a structured lab setup. Then, a broad set of user tests has been completed, which allows to assess correct performance in terms of recognition rates, easiness of use and response times.
	Address
	Corporate Author				Thesis
	Publisher	Elsevier B.V.	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA;MILAB;			Approved	no
	Call Number	Admin @ si @ CEA2016			Serial	2768
Permanent link to this record



	Author	Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera; Huamin Ren; Thomas B. Moeslund; Elham Etemad
	Title	Locality Regularized Group Sparse Coding for Action Recognition			Type	Journal Article
	Year	2017	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	158	Issue		Pages	106-114
	Keywords	Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition
	Abstract	Bag of visual words (BoVW) models are widely utilized in image/ video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantages of locality coding such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate the outperformance of our methodology compared to the state-of-the-art on the considered datasets.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; no proj;MILAB			Approved	no
	Call Number	Admin @ si @ BGE2017			Serial	3014
Permanent link to this record



	Author	Maedeh Aghaei; Mariella Dimiccoli; C. Canton-Ferrer; Petia Radeva
	Title	Towards social pattern characterization from egocentric photo-streams			Type	Journal Article
	Year	2018	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	171	Issue		Pages	104-117
	Keywords	Social pattern characterization; Social signal extraction; Lifelogging; Convolutional and recurrent neural networks
	Abstract	Following the increasingly popular trend of social interaction analysis in egocentric vision, this article presents a comprehensive pipeline for automatic social pattern characterization of a wearable photo-camera user. The proposed framework relies merely on the visual analysis of egocentric photo-streams and consists of three major steps. The first step is to detect social interactions of the user where the impact of several social signals on the task is explored. The detected social events are inspected in the second step for categorization into different social meetings. These two steps act at event-level where each potential social event is modeled as a multi-dimensional time-series, whose dimensions correspond to a set of relevant features for each task; finally, LSTM is employed to classify the time-series. The last step of the framework is to characterize social patterns of the user. Our goal is to quantify the duration, the diversity and the frequency of the user social relations in various social situations. This goal is achieved by the discovery of recurrences of the same people across the whole set of social events related to the user. Experimental evaluation over EgoSocialStyle – the proposed dataset in this work, and EGO-GROUP demonstrates promising results on the task of social pattern characterization from egocentric photo-streams.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ ADC2018			Serial	3022
Permanent link to this record



	Author	Pichao Wang; Wanqing Li; Philip Ogunbona; Jun Wan; Sergio Escalera
	Title	RGB-D-based Human Motion Recognition with Deep Learning: A Survey			Type	Journal Article
	Year	2018	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	171	Issue		Pages	118-139
	Keywords	Human motion recognition; RGB-D data; Deep learning; Survey
	Abstract	Human motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with the development in artificial intelligence, deep learning techniques have gained remarkable success in computer vision. In particular, convolutional neural networks (CNN) have achieved great success for image-based tasks, and recurrent neural networks (RNN) are renowned for sequence-based problems. Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data. In this paper, a detailed overview of recent advances in RGB-D-based motion recognition is presented. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based and RGB+D-based. As a survey focused on the application of deep learning to RGB-D-based motion recognition, we explicitly discuss the advantages and limitations of existing techniques. Particularly, we highlighted the methods of encoding spatial-temporal-structural information inherent in video sequence, and discuss potential directions for future research.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no proj;MILAB			Approved	no
	Call Number	Admin @ si @ WLO2018			Serial	3123
Permanent link to this record



	Author	Aymen Azaza; Joost Van de Weijer; Ali Douik; Marc Masana
	Title	Context Proposals for Saliency Detection			Type	Journal Article
	Year	2018	Publication	Computer Vision and Image Understanding	Abbreviated Journal	CVIU
	Volume	174	Issue		Pages	1-11
	Keywords
	Abstract	One of the fundamental properties of a salient object region is its contrast with the immediate context. The problem is that numerous object regions exist which potentially can all be salient. One way to prevent an exhaustive search over all object regions is by using object proposal algorithms. These return a limited set of regions which are most likely to contain an object. Several saliency estimation methods have used object proposals. However, they focus on the saliency of the proposal only, and the importance of its immediate context has not been evaluated. In this paper, we aim to improve salient object detection. Therefore, we extend object proposal methods with context proposals, which allow to incorporate the immediate context in the saliency computation. We propose several saliency features which are computed from the context proposals. In the experiments, we evaluate five object proposal methods for the task of saliency segmentation, and find that Multiscale Combinatorial Grouping outperforms the others. Furthermore, experiments show that the proposed context features improve performance, and that our method matches results on the FT datasets and obtains competitive results on three other datasets (PASCAL-S, MSRA-B and ECSSD).
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.109; 600.109; 600.120;CIC			Approved	no
	Call Number	Admin @ si @ AWD2018			Serial	3241
Permanent link to this record