Author Diana Ramirez Cifuentes; Ana Freire; Ricardo Baeza Yates; Nadia Sanz Lamora; Aida Alvarez; Alexandre Gonzalez; Meritxell Lozano; Roger Llobet; Diego Velazquez; Josep M. Gonfaus; Jordi Gonzalez
  Title Characterization of Anorexia Nervosa on Social Media: Textual, Visual, Relational, Behavioral, and Demographical Analysis Type Journal Article
  Year 2021 Publication Journal of Medical Internet Research Abbreviated Journal JMIR
  Volume 23 Issue 7 Pages e25925
  Abstract Background: Eating disorders are psychological conditions characterized by unhealthy eating habits. Anorexia nervosa (AN) is defined as the belief of being overweight despite being dangerously underweight. The psychological signs involve emotional and behavioral issues. There is evidence that signs and symptoms can manifest on social media, wherein both harmful and beneficial content is shared daily.  
  Notes ISE Approved no  
  Call Number Admin @ si @ RFB2021 Serial 3665  

 
Author AN Ruchai; VI Kober; KA Dorofeev; VN Karnaukhov; Mikhail Mozerov
  Title Classification of breast abnormalities using a deep convolutional neural network and transfer learning Type Journal Article
  Year 2021 Publication Journal of Communications Technology and Electronics Abbreviated Journal
  Volume 66 Issue 6 Pages 778–783
  Abstract A new algorithm for classification of breast pathologies in digital mammography using a convolutional neural network and transfer learning is proposed. The following pretrained neural networks were chosen: MobileNetV2, InceptionResNetV2, Xception, and ResNetV2. All mammographic images were pre-processed to improve classification reliability. Transfer training was carried out using additional data augmentation and fine-tuning. The performance of the proposed algorithm for classification of breast pathologies in terms of accuracy on real data is discussed and compared with that of state-of-the-art algorithms on the available MIAS database.  
  Notes LAMP; Approved no  
  Call Number Admin @ si @ RKD2022 Serial 3680  

 
Author Pau Torras; Arnau Baro; Lei Kang; Alicia Fornes
  Title On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition Type Conference Article
  Year 2021 Publication International Society for Music Information Retrieval Conference Abbreviated Journal
  Volume Issue Pages 690-696
  Abstract Despite the latest advances in Deep Learning, the recognition of handwritten music scores is still a challenging endeavour. Even though the recent Sequence to Sequence (Seq2Seq) architectures have demonstrated their capacity to reliably recognise handwritten text, their performance is still far from satisfactory when applied to historical handwritten scores. Indeed, the ambiguous nature of handwriting, the non-standard musical notation employed by composers of the time and the decaying state of old paper make these scores remarkably difficult to read, sometimes even by trained humans. Thus, in this work we explore the incorporation of language models into a Seq2Seq-based architecture to try to improve transcriptions where the aforementioned unclear writing produces statistically unsound mistakes, which as far as we know, has never been attempted for this field of research on this architecture. After studying various Language Model integration techniques, the experimental evaluation on historical handwritten music scores shows a significant improvement over the state of the art, showing that this is a promising research direction for dealing with such difficult manuscripts.
  Address Virtual; November 2021  
  Area Expedition Conference ISMIR  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ TBK2021 Serial 3616  

 
Author Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal
  Title Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts Type Journal Article
  Year 2021 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
  Volume 24 Issue Pages 269–281
  Abstract Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.  
  Notes DAG; 600.121; 600.140; 110.312 Approved no  
  Call Number Admin @ si @ BRL2021b Serial 3574  

 
Author Minesh Mathew; Lluis Gomez; Dimosthenis Karatzas; C.V. Jawahar
  Title Asking questions on handwritten document collections Type Journal Article
  Year 2021 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
  Volume 24 Issue Pages 235-249
  Abstract This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate results of the proposed approach on two new datasets: (i) HW-SQuAD: a synthetic, handwritten document image counterpart of SQuAD1.0 dataset and (ii) BenthamQA: a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach which uses text recognized from the images using an OCR. Datasets presented in this work are available to download at docvqa.org.  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ MGK2021 Serial 3621  

 
Author Manisha Das; Deep Gupta; Petia Radeva; Ashwini M. Bakde
  Title Multi-scale decomposition-based CT-MR neurological image fusion using optimized bio-inspired spiking neural model with meta-heuristic optimization Type Journal Article
  Year 2021 Publication International Journal of Imaging Systems and Technology Abbreviated Journal IMA
  Volume 31 Issue 4 Pages 2170-2188
  Abstract Multi-modal medical image fusion plays an important role in clinical diagnosis and works as an assistance model for clinicians. In this paper, a computed tomography-magnetic resonance (CT-MR) image fusion model is proposed using an optimized bio-inspired spiking feedforward neural network in different decomposition domains. First, source images are decomposed into base (low-frequency) and detail (high-frequency) layer components. Low-frequency subbands are fused using texture energy measures to capture the local energy, contrast, and small edges in the fused image. High-frequency coefficients are fused using firing maps obtained by a pixel-activated neural model whose parameters are optimized using three different optimization techniques (differential evolution, cuckoo search, and gray wolf optimization), individually. In the optimization model, a fitness function is computed based on the edge index of resultant fused images, which helps to extract and preserve sharp edges available in the source CT and MR images. To validate the fusion performance, a detailed comparative analysis is presented among the proposed and state-of-the-art methods in terms of quantitative and qualitative measures along with computational complexity. Experimental results show that the proposed method produces a significantly better visual quality of fused images while also outperforming the existing methods.
  Notes MILAB; no menciona Approved no  
  Call Number Admin @ si @ DGR2021a Serial 3630  

 
Author Meysam Madadi; Hugo Bertiche; Sergio Escalera
  Title Deep unsupervised 3D human body reconstruction from a sparse set of landmarks Type Journal Article
  Year 2021 Publication International Journal of Computer Vision Abbreviated Journal IJCV
  Volume 129 Issue Pages 2499–2512
  Abstract In this paper we propose the first deep unsupervised approach in human body reconstruction to estimate body surface from a sparse set of landmarks, so called DeepMurf. We apply a denoising autoencoder to estimate missing landmarks. Then we apply an attention model to estimate body joints from landmarks. Finally, a cascading network is applied to regress parameters of a statistical generative model that reconstructs body. Our set of proposed loss functions allows us to train the network in an unsupervised way. Results on four public datasets show that our approach accurately reconstructs the human body from real world mocap data.  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ MBE2021 Serial 3654  

 
Author Javad Zolfaghari Bengar; Joost Van de Weijer; Bartlomiej Twardowski; Bogdan Raducanu
  Title Reducing Label Effort: Self-Supervised Meets Active Learning Type Conference Article
  Year 2021 Publication International Conference on Computer Vision Workshops Abbreviated Journal
  Volume Issue Pages 1631-1639
  Abstract Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. Another paradigm to reduce the annotation effort is self-training that learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on few labeled samples. Recent developments in self-training have achieved very impressive results rivaling supervised learning on some datasets. The current work focuses on whether the two paradigms can benefit from each other. We studied object recognition datasets including CIFAR10, CIFAR100 and Tiny ImageNet with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high. The performance gap between active learning trained either with self-training or from scratch diminishes as we approach the point where almost half of the dataset is labeled.
  Address October 2021  
  Area Expedition Conference ICCVW  
  Notes LAMP; Approved no  
  Call Number Admin @ si @ ZVT2021 Serial 3672  

 
Author Shun Yao; Fei Yang; Yongmei Cheng; Mikhail Mozerov
  Title 3D Shapes Local Geometry Codes Learning with SDF Type Conference Article
  Year 2021 Publication International Conference on Computer Vision Workshops Abbreviated Journal
  Volume Issue Pages 2110-2117
  Abstract A signed distance function (SDF) as the 3D shape description is one of the most effective approaches to represent 3D geometry for rendering and reconstruction. Our work is inspired by the state-of-the-art method DeepSDF [17] that learns and analyzes the 3D shape as the iso-surface of its shell and this method has shown promising results especially in the 3D shape reconstruction and compression domain. In this paper, we consider the degeneration problem of reconstruction coming from the capacity decrease of the DeepSDF model, which approximates the SDF with a neural network and a single latent code. We propose Local Geometry Code Learning (LGCL), a model that improves the original DeepSDF results by learning from a local shape geometry of the full 3D shape. We add an extra graph neural network to split the single transmittable latent code into a set of local latent codes distributed on the 3D shape. These latent codes are used to approximate the SDF in their local regions, which will alleviate the complexity of the approximation compared to the original DeepSDF. Furthermore, we introduce a new geometric loss function to facilitate the training of these local latent codes. Note that other local shape adjusting methods use the 3D voxel representation, which in turn is a problem that is highly difficult to solve or even unsolvable. In contrast, our architecture is based on graph processing implicitly and performs the learning regression process directly in the latent code space, thus making the proposed architecture more flexible and also simple to realize. Our experiments on 3D shape reconstruction demonstrate that our LGCL method can keep more details with a significantly smaller size of the SDF decoder and considerably outperforms the original DeepSDF method under the most important quantitative metrics.
  Address VIRTUAL; October 2021  
  Area Expedition Conference ICCVW  
  Notes LAMP Approved no  
  Call Number Admin @ si @ YYC2021 Serial 3681  

 
Author Jorge Charco; Angel Sappa; Boris X. Vintimilla; Henry Velesaca
  Title Camera pose estimation in multi-view environments: From virtual scenarios to the real world Type Journal Article
  Year 2021 Publication Image and Vision Computing Abbreviated Journal IVC
  Volume 110 Issue Pages 104182
  Abstract This paper presents a domain adaptation strategy to efficiently train network architectures for estimating the relative camera pose in multi-view scenarios. The network architectures are fed by a pair of simultaneously acquired images, hence in order to improve the accuracy of the solutions, and due to the lack of large datasets with pairs of overlapped images, a domain adaptation strategy is proposed. The domain adaptation strategy consists of transferring the knowledge learned from synthetic images to real-world scenarios. For this, the networks are firstly trained using pairs of synthetic images, which are captured at the same time by a pair of cameras in a virtual environment; and then, the learned weights of the networks are transferred to the real-world case, where the networks are retrained with a few real images. Different virtual 3D scenarios are generated to evaluate the relationship between the accuracy of the result and the similarity between virtual and real scenarios, where similarity covers both the geometry of the objects contained in the scene and the relative pose between camera and objects in the scene. Experimental results and comparisons are provided showing that the accuracy of all the evaluated networks for estimating the camera pose improves when the proposed domain adaptation strategy is used, highlighting the importance of the similarity between virtual and real scenarios.
  Notes MSIAU; 600.130; 600.122 Approved no  
  Call Number Admin @ si @ CSV2021 Serial 3577  

 
Author Reza Azad; Afshin Bozorgpour; Maryam Asadi-Aghbolaghi; Dorit Merhof; Sergio Escalera
  Title Deep Frequency Re-Calibration U-Net for Medical Image Segmentation Type Conference Article
  Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal
  Volume Issue Pages 3274-3283
  Abstract We present a novel solution to the garment animation problem through deep learning. Our contribution allows animating any template outfit with arbitrary topology and geometric complexity. Recent works develop models for garment edition, resizing and animation at the same time by leveraging the support body model (encoding garments as body homotopies). This leads to complex engineering solutions that suffer from scalability, applicability and compatibility. By limiting our scope to garment animation only, we are able to propose a simple model that can animate any outfit, independently of its topology, vertex order or connectivity. Our proposed architecture maps outfits to animated 3D models into the standard format for 3D animation (blend weights and blend shapes matrices), automatically providing of compatibility with any graphics engine. We also propose a methodology to complement supervised learning with an unsupervised physically based learning that implicitly solves collisions and enhances cloth quality.  
  Address VIRTUAL; October 2021  
  Area Expedition Conference ICCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ ABA2021 Serial 3645  

 
Author Ajian Liu; Chenxu Zhao; Zitong Yu; Anyang Su; Xing Liu; Zijian Kong; Jun Wan; Sergio Escalera; Hugo Jair Escalante; Zhen Lei; Guodong Guo
  Title 3D High-Fidelity Mask Face Presentation Attack Detection Challenge Type Conference Article
  Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal
  Volume Issue Pages 814-823
  Abstract The threat of 3D masks to face recognition systems is increasingly serious and has drawn wide concern from researchers. To facilitate the study of the algorithms, a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask) has been collected. Specifically, it consists of a total of 54,600 videos which are recorded from 75 subjects with 225 realistic masks under 7 new kinds of sensors. Based on this dataset and Protocol 3 which evaluates both the discrimination and generalization ability of the algorithm under the open set scenarios, we organized a 3D High-Fidelity Mask Face Presentation Attack Detection Challenge to boost the research of 3D mask based attack detection. It attracted more than 200 teams for the development phase with a total of 18 teams qualifying for the final round. All the results were verified and re-run by the organizing team, and the results were used for the final ranking. This paper presents an overview of the challenge, including the introduction of the dataset used, the definition of the protocol, the calculation of the evaluation criteria, and the summary and publication of the competition results. Finally, we focus on introducing and analyzing the top ranked algorithms, the conclusion summary, and the research ideas for mask attack detection provided by this competition.
  Address Virtual; October 2021  
  Area Expedition Conference ICCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ LZY2021 Serial 3646  

 
Author Claudia Greco; Carmela Buono; Pau Buch-Cardona; Gennaro Cordasco; Sergio Escalera; Anna Esposito; Anais Fernandez; Daria Kyslitska; Maria Stylianou Kornes; Cristina Palmero; Jofre Tenorio Laranga; Anna Torp Johansen; Maria Ines Torres
  Title Emotional Features of Interactions With Empathic Agents Type Conference Article
  Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal
  Volume Issue Pages 2168-2176
  Abstract The current study is part of the EMPATHIC project, whose aim is to develop an Empathic Virtual Coach (VC) capable of promoting healthy and independent aging. To this end, the VC needs to be capable of perceiving the emotional states of users and adjusting its behaviour during the interactions according to what the users are experiencing in terms of emotions and comfort. Thus, the present work focuses on some sessions where elderly users of three different countries interact with a simulated system. Audio and video information extracted from these sessions were examined by external observers to assess participants' emotional experience with the EMPATHIC-VC in terms of categorical and dimensional assessment of emotions. Analyses were conducted on the emotional labels assigned by the external observers while participants were engaged in two different scenarios: a generic one, where the interaction was carried out with no intention to discuss a specific topic, and a nutrition one, aimed to accomplish a conversation on users' nutritional habits. Results of analyses performed on both audio and video data revealed that the EMPATHIC coach did not elicit negative feelings in the users. Indeed, users from all countries have shown relaxed and positive behavior when interacting with the simulated VC during both scenarios. Overall, the EMPATHIC-VC was able to offer an enjoyable experience without eliciting negative feelings in the users. This supports the hypothesis that an Empathic Virtual Coach capable of considering users' expectations and emotional states could support elderly people in daily life activities and help them to remain independent.
  Address VIRTUAL; October 2021  
  Area Expedition Conference ICCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ GBB2021 Serial 3647  

 
Author David Curto; Albert Clapes; Javier Selva; Sorina Smeureanu; Julio C. S. Jacques Junior; David Gallardo-Pujol; Georgina Guilera; David Leiva; Thomas B. Moeslund; Sergio Escalera; Cristina Palmero
  Title Dyadformer: A Multi-Modal Transformer for Long-Range Modeling of Dyadic Interactions Type Conference Article
  Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal
  Volume Issue Pages 2177-2188
  Abstract Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to model individual and interpersonal features in dyadic interactions using variable time windows, thus allowing the capture of long-term interdependencies. Our proposed cross-subject layer allows the network to explicitly model interactions among subjects through attentional operations. This proof-of-concept approach shows how multi-modality and joint modeling of both interactants for longer periods of time helps to predict individual attributes. With Dyadformer, we improve state-of-the-art self-reported personality inference results on individual subjects on the UDIVA v0.5 dataset.  
  Address Virtual; October 2021  
  Area Expedition Conference ICCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ CCS2021 Serial 3648  

 
Author Neelu Madan; Arya Farkhondeh; Kamal Nasrollahi; Sergio Escalera; Thomas B. Moeslund
  Title Temporal Cues From Socially Unacceptable Trajectories for Anomaly Detection Type Conference Article
  Year 2021 Publication IEEE/CVF International Conference on Computer Vision Workshops Abbreviated Journal
  Volume Issue Pages 2150-2158
  Abstract State-of-the-Art (SoTA) deep learning-based approaches to detect anomalies in surveillance videos utilize limited temporal information, including basic information from motion, e.g., optical flow computed between consecutive frames. In this paper, we complement the SoTA methods by including long-range dependencies from trajectories for anomaly detection. To achieve that, we first created trajectories by running a tracker on two SoTA datasets, namely Avenue and Shanghai-Tech. We propose a prediction-based anomaly detection method using trajectories based on Social GANs, referred to in this paper as temporal-based anomaly detection. Then, we hypothesize that late fusion of the result of this temporal-based anomaly detection system with spatial-based anomaly detection systems produces SoTA results. We verify this hypothesis on two spatial-based anomaly detection systems. We show that both cases produce results better than baseline spatial-based systems, indicating the usefulness of the temporal information coming from the trajectories for anomaly detection. We observe that the proposed approach achieves a maximum improvement in micro-level Area-Under-the-Curve (AUC) of 4.1% on CUHK Avenue and 3.4% on Shanghai-Tech over one of the baseline methods. We also show a high performance on cross-data evaluation, where we learn the weights to combine spatial and temporal information on Shanghai-Tech and perform evaluation on CUHK Avenue and vice-versa.
  Address Virtual; October 2021  
  Area Expedition Conference ICCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ MFN2021 Serial 3649  