Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	61–75 of 738 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

<< 1 2 3 4 5 6 7 8 9 10 >> [11–20]

List View

Citations

Details

	Records
	Author	Vacit Oguz Yazici; Longlong Yu; Arnau Ramisa; Luis Herranz; Joost Van de Weijer
	Title	Main product detection with graph networks for fashion			Type	Journal Article
	Year	2024	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	83	Issue		Pages	3215–3231
	Keywords
	Abstract	Computer vision has established a foothold in the online fashion retail industry. Main product detection is a crucial step of vision-based fashion product feed parsing pipelines, focused on identifying the bounding boxes that contain the product being sold in the gallery of images of the product page. The current state-of-the-art approach does not leverage the relations between regions in the image, and treats images of the same product independently, therefore not fully exploiting visual and product contextual information. In this paper, we propose a model that incorporates Graph Convolutional Networks (GCN) that jointly represent all detected bounding boxes in the gallery as nodes. We show that the proposed method is better than the state-of-the-art, especially, when we consider the scenario where title-input is missing at inference time and for cross-dataset evaluation, our method outperforms previous approaches by a large margin.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP; MACO; 600.147; 600.167; 600.164; 600.161; 600.141; 601.309			Approved	no
	Call Number	Admin @ si @ YYR2024			Serial	4017
Permanent link to this record



	Author	Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
	Title	Hand pose aware multimodal isolated sign language recognition			Type	Journal Article
	Year	2020	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	80	Issue		Pages	127–163
	Keywords
	Abstract	Isolated hand sign language recognition from video is a challenging research area in computer vision. Some of the most important challenges in this area include dealing with hand occlusion, fast hand movement, illumination changes, or background complexity. While most of the state-of-the-art results in the field have been achieved using deep learning-based models, the previous challenges are not completely solved. In this paper, we propose a hand pose aware model for isolated hand sign language recognition using deep learning approaches from two input modalities, RGB and depth videos. Four spatial feature types: pixel-level, flow, deep hand, and hand pose features, fused from both visual modalities, are input to LSTM for temporal sign recognition. While we use Optical Flow (OF) for flow information in RGB video inputs, Scene Flow (SF) is used for depth video inputs. By including hand pose features, we show a consistent performance improvement of the sign language recognition model. To the best of our knowledge, this is the first time that this discriminant spatiotemporal features, benefiting from the hand pose estimation features and multi-modal inputs, are fused for isolated hand sign language recognition. We perform a step-by-step analysis of the impact in terms of recognition performance of the hand pose features, different combinations of the spatial features, and different recurrent models, especially LSTM and GRU. Results on four public datasets confirm that the proposed model outperforms the current state-of-the-art models on Montalbano II, MSR Daily Activity 3D, and CAD-60 datasets with a relative accuracy improvement of 1.64%, 6.5%, and 7.6%. Furthermore, our model obtains a competitive results on isoGD dataset with only 0.22% margin lower than the current state-of-the-art model.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no menciona			Approved	no
	Call Number	Admin @ si @ RKE2020			Serial	3524
Permanent link to this record



	Author	Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
	Title	Video-based Isolated Hand Sign Language Recognition Using a Deep Cascaded Model			Type	Journal Article
	Year	2020	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	79	Issue		Pages	22965–22987
	Keywords
	Abstract	In this paper, we propose an efficient cascaded model for sign language recognition taking benefit from spatio-temporal hand-based information using deep learning approaches, especially Single Shot Detector (SSD), Convolutional Neural Network (CNN), and Long Short Term Memory (LSTM), from videos. Our simple yet efficient and accurate model includes two main parts: hand detection and sign recognition. Three types of spatial features, including hand features, Extra Spatial Hand Relation (ESHR) features, and Hand Pose (HP) features, have been fused in the model to feed to LSTM for temporal features extraction. We train SSD model for hand detection using some videos collected from five online sign dictionaries. Our model is evaluated on our proposed dataset (Rastgoo et al., Expert Syst Appl 150: 113336, 2020), including 10’000 sign videos for 100 Persian sign using 10 contributors in 10 different backgrounds, and isoGD dataset. Using the 5-fold cross-validation method, our model outperforms state-of-the-art alternatives in sign language recognition
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; no menciona			Approved	no
	Call Number	Admin @ si @ RKE2020b			Serial	3442
Permanent link to this record



	Author	Egils Avots; Meysam Madadi; Sergio Escalera; Jordi Gonzalez; Xavier Baro; Paul Pallin; Gholamreza Anbarjafari
	Title	From 2D to 3D geodesic-based garment matching			Type	Journal Article
	Year	2019	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	78	Issue	18	Pages	25829–25853
	Keywords	Shape matching; Geodesic distance; Texture mapping; RGBD image processing; Gaussian mixture model
	Abstract	A new approach for 2D to 3D garment retexturing is proposed based on Gaussian mixture models and thin plate splines (TPS). An automatically segmented garment of an individual is matched to a new source garment and rendered, resulting in augmented images in which the target garment has been retextured using the texture of the source garment. We divide the problem into garment boundary matching based on Gaussian mixture models and then interpolate inner points using surface topology extracted through geodesic paths, which leads to a more realistic result than standard approaches. We evaluated and compared our system quantitatively by root mean square error (RMS) and qualitatively using the mean opinion score (MOS), showing the benefits of the proposed methodology on our gathered dataset.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; ISE; 600.098; 600.119; 602.133			Approved	no
	Call Number	Admin @ si @ AME2019			Serial	3317
Permanent link to this record



	Author	Andre Litvin; Kamal Nasrollahi; Sergio Escalera; Cagri Ozcinar; Thomas B. Moeslund; Gholamreza Anbarjafari
	Title	A Novel Deep Network Architecture for Reconstructing RGB Facial Images from Thermal for Face Recognition			Type	Journal Article
	Year	2019	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	78	Issue	18	Pages	25259–25271
	Keywords	Fully convolutional networks; FusionNet; Thermal imaging; Face recognition
	Abstract	This work proposes a fully convolutional network architecture for RGB face image generation from a given input thermal face image to be applied in face recognition scenarios. The proposed method is based on the FusionNet architecture and increases robustness against overfitting using dropout after bridge connections, randomised leaky ReLUs (RReLUs), and orthogonal regularization. Furthermore, we propose to use a decoding block with resize convolution instead of transposed convolution to improve final RGB face image generation. To validate our proposed network architecture, we train a face classifier and compare its face recognition rate on the reconstructed RGB images from the proposed architecture, to those when reconstructing images with the original FusionNet, as well as when using the original RGB images. As a result, we are introducing a new architecture which leads to a more accurate network.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA; no menciona			Approved	no
	Call Number	Admin @ si @ LNE2019			Serial	3318
Permanent link to this record



	Author	Laura Lopez-Fuentes; Joost Van de Weijer; Manuel Gonzalez-Hidalgo; Harald Skinnemoen; Andrew Bagdanov
	Title	Review on computer vision techniques in emergency situations			Type	Journal Article
	Year	2018	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	77	Issue	13	Pages	17069–17107
	Keywords	Emergency management; Computer vision; Decision makers; Situational awareness; Critical situation
	Abstract	In emergency situations, actions that save lives and limit the impact of hazards are crucial. In order to act, situational awareness is needed to decide what to do. Geolocalized photos and video of the situations as they evolve can be crucial in better understanding them and making decisions faster. Cameras are almost everywhere these days, either in terms of smartphones, installed CCTV cameras, UAVs or others. However, this poses challenges in big data and information overflow. Moreover, most of the time there are no disasters at any given location, so humans aiming to detect sudden situations may not be as alert as needed at any point in time. Consequently, computer vision tools can be an excellent decision support. The number of emergencies where computer vision tools has been considered or used is very wide, and there is a great overlap across related emergency research. Researchers tend to focus on state-of-the-art systems that cover the same emergency as they are studying, obviating important research in other fields. In order to unveil this overlap, the survey is divided along four main axes: the types of emergencies that have been studied in computer vision, the objective that the algorithms can address, the type of hardware needed and the algorithms used. Therefore, this review provides a broad overview of the progress of computer vision covering all sorts of emergencies.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	LAMP; 600.068; 600.120			Approved	no
	Call Number	Admin @ si @ LWG2018			Serial	3041
Permanent link to this record



	Author	Thanh Ha Do; Oriol Ramos Terrades; Salvatore Tabbone
	Title	DSD: document sparse-based denoising algorithm			Type	Journal Article
	Year	2019	Publication	Pattern Analysis and Applications	Abbreviated Journal	PAA
	Volume	22	Issue	1	Pages	177–186
	Keywords	Document denoising; Sparse representations; Sparse dictionary learning; Document degradation models
	Abstract	In this paper, we present a sparse-based denoising algorithm for scanned documents. This method can be applied to any kind of scanned documents with satisfactory results. Unlike other approaches, the proposed approach encodes noise documents through sparse representation and visual dictionary learning techniques without any prior noise model. Moreover, we propose a precision parameter estimator. Experiments on several datasets demonstrate the robustness of the proposed approach compared to the state-of-the-art methods on document denoising.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.097; 600.140; 600.121			Approved	no
	Call Number	Admin @ si @ DRT2019			Serial	3254
Permanent link to this record



	Author	Minesh Mathew; Lluis Gomez; Dimosthenis Karatzas; C.V. Jawahar
	Title	Asking questions on handwritten document collections			Type	Journal Article
	Year	2021	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
	Volume	24	Issue		Pages	235-249
	Keywords
	Abstract	This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate results of the proposed approach on two new datasets: (i) HW-SQuAD: a synthetic, handwritten document image counterpart of SQuAD1.0 dataset and (ii) BenthamQA: a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach which uses text recognized from the images using an OCR. Datasets presented in this work are available to download at docvqa.org.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ MGK2021			Serial	3621
Permanent link to this record



	Author	Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
	Title	Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts			Type	Journal Article
	Year	2021	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
	Volume	24	Issue		Pages	269–281
	Keywords
	Abstract	Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121; 600.140; 110.312			Approved	no
	Call Number	Admin @ si @ BRL2021b			Serial	3574
Permanent link to this record



	Author	Sounak Dey; Anguelos Nicolaou; Josep Llados; Umapada Pal
	Title	Evaluation of the Effect of Improper Segmentation on Word Spotting			Type	Journal Article
	Year	2019	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
	Volume	22	Issue		Pages	361-374
	Keywords
	Abstract	Word spotting is an important recognition task in large-scale retrieval of document collections. In most of the cases, methods are developed and evaluated assuming perfect word segmentation. In this paper, we propose an experimental framework to quantify the goodness that word segmentation has on the performance achieved by word spotting methods in identical unbiased conditions. The framework consists of generating systematic distortions on segmentation and retrieving the original queries from the distorted dataset. We have tested our framework on several established and state-of-the-art methods using George Washington and Barcelona Marriage Datasets. The experiments done allow for an estimate of the end-to-end performance of word spotting methods.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.097; 600.084; 600.121; 600.140; 600.129			Approved	no
	Call Number	Admin @ si @ DNL2019			Serial	3455
Permanent link to this record



	Author	Lluis Gomez; Dimosthenis Karatzas
	Title	A fast hierarchical method for multi‐script and arbitrary oriented scene text extraction			Type	Journal Article
	Year	2016	Publication	International Journal on Document Analysis and Recognition	Abbreviated Journal	IJDAR
	Volume	19	Issue	4	Pages	335-349
	Keywords	scene text; segmentation; detection; hierarchical grouping; perceptual organisation
	Abstract	Typography and layout lead to the hierarchical organisation of text in words, text lines, paragraphs. This inherent structure is a key property of text in any script and language, which has nonetheless been minimally leveraged by existing text detection methods. This paper addresses the problem of text segmentation in natural scenes from a hierarchical perspective. Contrary to existing methods, we make explicit use of text structure, aiming directly to the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such an hierarchy introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based in perceptual organization. Results obtained over four standard datasets, covering text in variable orientations and different languages, demonstrate that our algorithm, while being trained in a single mixed dataset, outperforms state of the art methods in unconstrained scenarios.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.056; 601.197			Approved	no
	Call Number	Admin @ si @ GoK2016a			Serial	2862
Permanent link to this record



	Author	Anjan Dutta; Pau Riba; Josep Llados; Alicia Fornes
	Title	Hierarchical Stochastic Graphlet Embedding for Graph-based Pattern Recognition			Type	Journal Article
	Year	2020	Publication	Neural Computing and Applications	Abbreviated Journal	NEUCOMA
	Volume	32	Issue		Pages	11579–11596
	Keywords
	Abstract	Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable because of the lack of mathematical operations defined in graph domain. Graph embedding, which maps graphs to a vectorial space, has been proposed as a way to tackle these difficulties enabling the use of standard machine learning techniques. However, it is well known that graph embedding functions usually suffer from the loss of structural information. In this paper, we consider the hierarchical structure of a graph as a way to mitigate this loss of information. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure is constructed, we consider several configurations to define the mapping into a vector space given a classical graph embedding, in particular, we propose to make use of the stochastic graphlet embedding (SGE). Broadly speaking, SGE produces a distribution of uniformly sampled low-to-high-order graphlets as a way to embed graphs into the vector space. In what follows, the coarse-to-fine structure of a graph hierarchy and the statistics fetched by the SGE complements each other and includes important structural information with varied contexts. Altogether, these two techniques substantially cope with the usual information loss involved in graph embedding techniques, obtaining a more robust graph representation. This fact has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.140; 600.121; 600.141			Approved	no
	Call Number	Admin @ si @ DRL2020			Serial	3348
Permanent link to this record



	Author	Ana Garcia Rodriguez; Jorge Bernal; F. Javier Sanchez; Henry Cordova; Rodrigo Garces Duran; Cristina Rodriguez de Miguel; Gloria Fernandez Esparrach
	Title	Polyp fingerprint: automatic recognition of colorectal polyps’ unique features			Type	Journal Article
	Year	2020	Publication	Surgical Endoscopy and other Interventional Techniques	Abbreviated Journal	SEND
	Volume	34	Issue	4	Pages	1887-1889
	Keywords
	Abstract	BACKGROUND: Content-based image retrieval (CBIR) is an application of machine learning used to retrieve images by similarity on the basis of features. Our objective was to develop a CBIR system that could identify images containing the same polyp ('polyp fingerprint'). METHODS: A machine learning technique called Bag of Words was used to describe each endoscopic image containing a polyp in a unique way. The system was tested with 243 white light images belonging to 99 different polyps (for each polyp there were at least two images representing it in two different temporal moments). Images were acquired in routine colonoscopies at Hospital Clínic using high-definition Olympus endoscopes. The method provided for each image the closest match within the dataset. RESULTS: The system matched another image of the same polyp in 221/243 cases (91%). No differences were observed in the number of correct matches according to Paris classification (protruded: 90.7% vs. non-protruded: 91.3%) and size (< 10 mm: 91.6% vs. > 10 mm: 90%). CONCLUSIONS: A CBIR system can match accurately two images containing the same polyp, which could be a helpful aid for polyp image recognition. KEYWORDS: Artificial intelligence; Colorectal polyps; Content-based image retrieval
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MV; no menciona			Approved	no
	Call Number	Admin @ si @			Serial	3403
Permanent link to this record



	Author	Mireia Sole; Joan Blanco; Debora Gil; Oliver Valero; Alvaro Pascual; B. Cardenas; G. Fonseka; E. Anton; Richard Frodsham; Francesca Vidal; Zaida Sarrate
	Title	Chromosomal positioning in spermatogenic cells is influenced by chromosomal factors associated with gene activity, bouquet formation, and meiotic sex-chromosome inactivation			Type	Journal Article
	Year	2021	Publication	Chromosoma	Abbreviated Journal
	Volume	130	Issue		Pages	163-175
	Keywords
	Abstract	Chromosome territoriality is not random along the cell cycle and it is mainly governed by intrinsic chromosome factors and gene expression patterns. Conversely, very few studies have explored the factors that determine chromosome territoriality and its influencing factors during meiosis. In this study, we analysed chromosome positioning in murine spermatogenic cells using three-dimensionally fluorescence in situ hybridization-based methodology, which allows the analysis of the entire karyotype. The main objective of the study was to decipher chromosome positioning in a radial axis (all analysed germ-cell nuclei) and longitudinal axis (only spermatozoa) and to identify the chromosomal factors that regulate such an arrangement. Results demonstrated that the radial positioning of chromosomes during spermatogenesis was cell-type specific and influenced by chromosomal factors associated to gene activity. Chromosomes with specific features that enhance transcription (high GC content, high gene density and high numbers of predicted expressed genes) were preferentially observed in the inner part of the nucleus in virtually all cell types. Moreover, the position of the sex chromosomes was influenced by their transcriptional status, from the periphery of the nucleus when its activity was repressed (pachytene) to a more internal position when it is partially activated (spermatid). At pachytene, chromosome positioning was also influenced by chromosome size due to the bouquet formation. Longitudinal chromosome positioning in the sperm nucleus was not random either, suggesting the importance of ordered longitudinal positioning for the release and activation of the paternal genome after fertilisation.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM; 600.145			Approved	no
	Call Number	Admin @ si @ SBG2021			Serial	3592
Permanent link to this record



	Author	Albert Clapes; Alex Pardo; Oriol Pujol; Sergio Escalera
	Title	Action detection fusing multiple Kinects and a WIMU: an application to in-home assistive technology for the elderly			Type	Journal Article
	Year	2018	Publication	Machine Vision and Applications	Abbreviated Journal	MVAP
	Volume	29	Issue	5	Pages	765–788
	Keywords	Multimodal activity detection; Computer vision; Inertial sensors; Dense trajectories; Dynamic time warping; Assistive technology
	Abstract	We present a vision-inertial system which combines two RGB-Depth devices together with a wearable inertial movement unit in order to detect activities of the daily living. From multi-view videos, we extract dense trajectories enriched with a histogram of normals description computed from the depth cue and bag them into multi-view codebooks. During the later classification step a multi-class support vector machine with a RBF- 2 kernel combines the descriptions at kernel level. In order to perform action detection from the videos, a sliding window approach is utilized. On the other hand, we extract accelerations, rotation angles, and jerk features from the inertial data collected by the wearable placed on the user’s dominant wrist. During gesture spotting, a dynamic time warping is applied and the aligning costs to a set of pre-selected gesture sub-classes are thresholded to determine possible detections. The outputs of the two modules are combined in a late-fusion fashion. The system is validated in a real-case scenario with elderly from an elder home. Learning-based fusion results improve the ones from the single modalities, demonstrating the success of such multimodal approach.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HUPBA; no proj			Approved	no
	Call Number	Admin @ si @ CPP2018			Serial	3125
Permanent link to this record