Home | [1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30] |
Records | |||||
---|---|---|---|---|---|
Author | Muhammad Muzzamil Luqman; Jean-Yves Ramel; Josep Llados; Thierry Brouard | ||||
Title | Fuzzy Multilevel Graph Embedding | Type | Journal Article | ||
Year | 2013 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 46 | Issue | 2 | Pages | 551-565 |
Keywords | Pattern recognition; Graphics recognition; Graph clustering; Graph classification; Explicit graph embedding; Fuzzy logic | ||||
Abstract | Structural pattern recognition approaches offer the most expressive, convenient, powerful but computational expensive representations of underlying relational information. To benefit from mature, less expensive and efficient state-of-the-art machine learning models of statistical pattern recognition they must be mapped to a low-dimensional vector space. Our method of explicit graph embedding bridges the gap between structural and statistical pattern recognition. We extract the topological, structural and attribute information from a graph and encode numeric details by fuzzy histograms and symbolic details by crisp histograms. The histograms are concatenated to achieve a simple and straightforward embedding of graph into a low-dimensional numeric feature vector. Experimentation on standard public graph datasets shows that our method outperforms the state-of-the-art methods of graph embedding for richly attributed graphs. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.042; 600.045; 605.203 | Approved | no | ||
Call Number | Admin @ si @ LRL2013a | Serial | 2270 | ||
Permanent link to this record | |||||
Author | Anjan Dutta; Josep Llados; Umapada Pal | ||||
Title | A symbol spotting approach in graphical documents by hashing serialized graphs | Type | Journal Article | ||
Year | 2013 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 46 | Issue | 3 | Pages | 752-768 |
Keywords | Symbol spotting; Graphics recognition; Graph matching; Graph serialization; Graph factorization; Graph paths; Hashing | ||||
Abstract | In this paper we propose a symbol spotting technique in graphical documents. Graphs are used to represent the documents and a (sub)graph matching technique is used to detect the symbols in them. We propose a graph serialization to reduce the usual computational complexity of graph matching. Serialization of graphs is performed by computing acyclic graph paths between each pair of connected nodes. Graph paths are one-dimensional structures of graphs which are less expensive in terms of computation. At the same time they enable robust localization even in the presence of noise and distortion. Indexing in large graph databases involves a computational burden as well. We propose a graph factorization approach to tackle this problem. Factorization is intended to create a unified indexed structure over the database of graphical documents. Once graph paths are extracted, the entire database of graphical documents is indexed in hash tables by locality sensitive hashing (LSH) of shape descriptors of the paths. The hashing data structure aims to execute an approximate k-NN search in a sub-linear time. We have performed detailed experiments with various datasets of line drawings and compared our method with the state-of-the-art works. The results demonstrate the effectiveness and efficiency of our technique. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG; 600.042; 600.045; 605.203; 601.152 | Approved | no | ||
Call Number | Admin @ si @ DLP2012 | Serial | 2127 | ||
Permanent link to this record | |||||
Author | Frederic Sampedro; Sergio Escalera; Anna Puig | ||||
Title | Iterative Multiclass Multiscale Stacked Sequential Learning: definition and application to medical volume segmentation | Type | Journal Article | ||
Year | 2014 | Publication | Pattern Recognition Letters | Abbreviated Journal | PRL |
Volume | 46 | Issue | Pages | 1-10 | |
Keywords | Machine learning; Sequential learning; Multi-class problems; Contextual learning; Medical volume segmentation | ||||
Abstract | In this work we present the iterative multi-class multi-scale stacked sequential learning framework (IMMSSL), a novel learning scheme that is particularly suited for medical volume segmentation applications. This model exploits the inherent voxel contextual information of the structures of interest in order to improve its segmentation performance results. Without any feature set or learning algorithm prior assumption, the proposed scheme directly seeks to learn the contextual properties of a region from the predicted classifications of previous classifiers within an iterative scheme. Performance results regarding segmentation accuracy in three two-class and multi-class medical volume datasets show a significant improvement with respect to state of the art alternatives. Due to its easiness of implementation and its independence of feature space and learning algorithm, the presented machine learning framework could be taken into consideration as a first choice in complex volume segmentation scenarios. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA;MILAB | Approved | no | ||
Call Number | Admin @ si @ SEP2014 | Serial | 2550 | ||
Permanent link to this record | |||||
Author | Miguel Angel Bautista; Antonio Hernandez; Sergio Escalera; Laura Igual; Oriol Pujol; Josep Moya; Veronica Violant; Maria Teresa Anguera | ||||
Title | A Gesture Recognition System for Detecting Behavioral Patterns of ADHD | Type | Journal Article | ||
Year | 2016 | Publication | IEEE Transactions on System, Man and Cybernetics, Part B | Abbreviated Journal | TSMCB |
Volume | 46 | Issue | 1 | Pages | 136-147 |
Keywords | Gesture Recognition; ADHD; Gaussian Mixture Models; Convex Hulls; Dynamic Time Warping; Multi-modal RGB-Depth data | ||||
Abstract | We present an application of gesture recognition using an extension of Dynamic Time Warping (DTW) to recognize behavioural patterns of Attention Deficit Hyperactivity Disorder (ADHD). We propose an extension of DTW using one-class classifiers in order to be able to encode the variability of a gesture category, and thus, perform an alignment between a gesture sample and a gesture class. We model the set of gesture samples of a certain gesture category using either GMMs or an approximation of Convex Hulls. Thus, we add a theoretical contribution to classical warping path in DTW by including local modeling of intra-class gesture variability. This methodology is applied in a clinical context, detecting a group of ADHD behavioural patterns defined by experts in psychology/psychiatry, to provide support to clinicians in the diagnose procedure. The proposed methodology is tested on a novel multi-modal dataset (RGB plus Depth) of ADHD children recordings with behavioural patterns. We obtain satisfying results when compared to standard state-of-the-art approaches in the DTW context. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; MILAB; | Approved | no | ||
Call Number | Admin @ si @ BHE2016 | Serial | 2566 | ||
Permanent link to this record | |||||
Author | C. Alejandro Parraga | ||||
Title | Colours and Colour Vision: An Introductory Survey | Type | Journal Article | ||
Year | 2017 | Publication | Perception | Abbreviated Journal | PER |
Volume | 46 | Issue | 5 | Pages | 640-641 |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | NEUROBIT; no menciona | Approved | no | ||
Call Number | Par2017 | Serial | 3101 | ||
Permanent link to this record | |||||
Author | Simone Balocco; Francesco Ciompi; Juan Rigla; Xavier Carrillo; J. Mauri; Petia Radeva | ||||
Title | Assessment of intracoronary stent location and extension in intravascular ultrasound sequences | Type | Journal Article | ||
Year | 2019 | Publication | Medical Physics | Abbreviated Journal | MEDPHYS |
Volume | 46 | Issue | 2 | Pages | 484-493 |
Keywords | IVUS; malapposition; stent; ultrasound | ||||
Abstract | PURPOSE:
An intraluminal coronary stent is a metal scaffold deployed in a stenotic artery during percutaneous coronary intervention (PCI). In order to have an effective deployment, a stent should be optimally placed with regard to anatomical structures such as bifurcations and stenoses. Intravascular ultrasound (IVUS) is a catheter-based imaging technique generally used for PCI guiding and assessing the correct placement of the stent. A novel approach that automatically detects the boundaries and the position of the stent along the IVUS pullback is presented. Such a technique aims at optimizing the stent deployment. METHODS: The method requires the identification of the stable frames of the sequence and the reliable detection of stent struts. Using these data, a measure of likelihood for a frame to contain a stent is computed. Then, a robust binary representation of the presence of the stent in the pullback is obtained applying an iterative and multiscale quantization of the signal to symbols using the Symbolic Aggregate approXimation algorithm. RESULTS: The technique was extensively validated on a set of 103 IVUS of sequences of in vivo coronary arteries containing metallic and bioabsorbable stents acquired through an international multicentric collaboration across five clinical centers. The method was able to detect the stent position with an overall F-measure of 86.4%, a Jaccard index score of 75% and a mean distance of 2.5 mm from manually annotated stent boundaries, and in bioabsorbable stents with an overall F-measure of 88.6%, a Jaccard score of 77.7 and a mean distance of 1.5 mm from manually annotated stent boundaries. Additionally, a map indicating the distance between the lumen and the stent along the pullback is created in order to show the angular sectors of the sequence in which the malapposition is present. CONCLUSIONS: Results obtained comparing the automatic results vs the manual annotation of two observers shows that the method approaches the interobserver variability. Similar performances are obtained on both metallic and bioabsorbable stents, showing the flexibility and robustness of the method. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB; no proj | Approved | no | ||
Call Number | Admin @ si @ BCR2019 | Serial | 3231 | ||
Permanent link to this record | |||||
Author | Noha Elfiky; Fahad Shahbaz Khan; Joost Van de Weijer; Jordi Gonzalez | ||||
Title | Discriminative Compact Pyramids for Object and Scene Recognition | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 4 | Pages | 1627-1636 |
Keywords | |||||
Abstract | Spatial pyramids have been successfully applied to incorporating spatial information into bag-of-words based image representation. However, a major drawback is that it leads to high dimensional image representations. In this paper, we present a novel framework for obtaining compact pyramid representation. First, we investigate the usage of the divisive information theoretic feature clustering (DITC) algorithm in creating a compact pyramid representation. In many cases this method allows us to reduce the size of a high dimensional pyramid representation up to an order of magnitude with little or no loss in accuracy. Furthermore, comparison to clustering based on agglomerative information bottleneck (AIB) shows that our method obtains superior results at significantly lower computational costs. Moreover, we investigate the optimal combination of multiple features in the context of our compact pyramid representation. Finally, experiments show that the method can obtain state-of-the-art results on several challenging data sets. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | ISE; CAT;CIC | Approved | no | ||
Call Number | Admin @ si @ EKW2012 | Serial | 1807 | ||
Permanent link to this record | |||||
Author | Bogdan Raducanu; Fadi Dornaika | ||||
Title | A Supervised Non-linear Dimensionality Reduction Approach for Manifold Learning | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 6 | Pages | 2432-2444 |
Keywords | |||||
Abstract | IF= 2.61
IF=2.61 (2010) In this paper we introduce a novel supervised manifold learning technique called Supervised Laplacian Eigenmaps (S-LE), which makes use of class label information to guide the procedure of non-linear dimensionality reduction by adopting the large margin concept. The graph Laplacian is split into two components: within-class graph and between-class graph to better characterize the discriminant property of the data. Our approach has two important characteristics: (i) it adaptively estimates the local neighborhood surrounding each sample based on data density and similarity and (ii) the objective function simultaneously maximizes the local margin between heterogeneous samples and pushes the homogeneous samples closer to each other. Our approach has been tested on several challenging face databases and it has been conveniently compared with other linear and non-linear techniques, demonstrating its superiority. Although we have concentrated in this paper on the face recognition problem, the proposed approach could also be applied to other category of objects characterized by large variations in their appearance (such as hand or body pose, for instance. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | OR; MV | Approved | no | ||
Call Number | Admin @ si @ RaD2012a | Serial | 1884 | ||
Permanent link to this record | |||||
Author | Jon Almazan; Alicia Fornes; Ernest Valveny | ||||
Title | A non-rigid appearance model for shape description and recognition | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 9 | Pages | 3105--3113 |
Keywords | Shape recognition; Deformable models; Shape modeling; Hand-drawn recognition | ||||
Abstract | In this paper we describe a framework to learn a model of shape variability in a set of patterns. The framework is based on the Active Appearance Model (AAM) and permits to combine shape deformations with appearance variability. We have used two modifications of the Blurred Shape Model (BSM) descriptor as basic shape and appearance features to learn the model. These modifications permit to overcome the rigidity of the original BSM, adapting it to the deformations of the shape to be represented. We have applied this framework to representation and classification of handwritten digits and symbols. We show that results of the proposed methodology outperform the original BSM approach. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ AFV2012 | Serial | 1982 | ||
Permanent link to this record | |||||
Author | Jaume Gibert; Ernest Valveny; Horst Bunke | ||||
Title | Graph Embedding in Vector Spaces by Node Attribute Statistics | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 9 | Pages | 3072-3083 |
Keywords | Structural pattern recognition; Graph embedding; Data clustering; Graph classification | ||||
Abstract | Graph-based representations are of broad use and applicability in pattern recognition. They exhibit, however, a major drawback with regards to the processing tools that are available in their domain. Graphembedding into vectorspaces is a growing field among the structural pattern recognition community which aims at providing a feature vector representation for every graph, and thus enables classical statistical learning machinery to be used on graph-based input patterns. In this work, we propose a novel embedding methodology for graphs with continuous nodeattributes and unattributed edges. The approach presented in this paper is based on statistics of the node labels and the edges between them, based on their similarity to a set of representatives. We specifically deal with an important issue of this methodology, namely, the selection of a suitable set of representatives. In an experimental evaluation, we empirically show the advantages of this novel approach in the context of different classification problems using several databases of graphs. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ GVB2012a | Serial | 1992 | ||
Permanent link to this record | |||||
Author | Jorge Bernal; F. Javier Sanchez; Fernando Vilariño | ||||
Title | Towards Automatic Polyp Detection with a Polyp Appearance Model | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 9 | Pages | 3166-3182 |
Keywords | Colonoscopy,PolypDetection,RegionSegmentation,SA-DOVA descriptot | ||||
Abstract | This work aims at the automatic polyp detection by using a model of polyp appearance in the context of the analysis of colonoscopy videos. Our method consists of three stages: region segmentation, region description and region classification. The performance of our region segmentation method guarantees that if a polyp is present in the image, it will be exclusively and totally contained in a single region. The output of the algorithm also defines which regions can be considered as non-informative. We define as our region descriptor the novel Sector Accumulation-Depth of Valleys Accumulation (SA-DOVA), which provides a necessary but not sufficient condition for the polyp presence. Finally, we classify our segmented regions according to the maximal values of the SA-DOVA descriptor. Our preliminary classification results are promising, especially when classifying those parts of the image that do not contain a polyp inside. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | 800 | Expedition | Conference | IbPRIA | |
Notes | MV;SIAI | Approved | no | ||
Call Number | Admin @ si @ BSV2012; IAM @ iam | Serial | 1997 | ||
Permanent link to this record | |||||
Author | Susana Alvarez; Maria Vanrell | ||||
Title | Texton theory revisited: a bag-of-words approach to combine textons | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 12 | Pages | 4312-4325 |
Keywords | |||||
Abstract | The aim of this paper is to revisit an old theory of texture perception and
update its computational implementation by extending it to colour. With this in mind we try to capture the optimality of perceptual systems. This is achieved in the proposed approach by sharing well-known early stages of the visual processes and extracting low-dimensional features that perfectly encode adequate properties for a large variety of textures without needing further learning stages. We propose several descriptors in a bag-of-words framework that are derived from different quantisation models on to the feature spaces. Our perceptual features are directly given by the shape and colour attributes of image blobs, which are the textons. In this way we avoid learning visual words and directly build the vocabularies on these lowdimensionaltexton spaces. Main differences between proposed descriptors rely on how co-occurrence of blob attributes is represented in the vocabularies. Our approach overcomes current state-of-art in colour texture description which is proved in several experiments on large texture datasets. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | CIC | Approved | no | ||
Call Number | Admin @ si @ AlV2012a | Serial | 2130 | ||
Permanent link to this record | |||||
Author | Partha Pratim Roy; Umapada Pal; Josep Llados; Mathieu Nicolas Delalandre | ||||
Title | Multi-oriented touching text character segmentation in graphical documents using dynamic programming | Type | Journal Article | ||
Year | 2012 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 45 | Issue | 5 | Pages | 1972-1983 |
Keywords | |||||
Abstract | 2,292 JCR
The touching character segmentation problem becomes complex when touching strings are multi-oriented. Moreover in graphical documents sometimes characters in a single-touching string have different orientations. Segmentation of such complex touching is more challenging. In this paper, we present a scheme towards the segmentation of English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a big cavity region in the background portion. Based on the convex hull information, at first, we use this background information to find some initial points for segmentation of a touching string into possible primitives (a primitive consists of a single character or part of a character). Next, the primitives are merged to get optimum segmentation. A dynamic programming algorithm is applied for this purpose using the total likelihood of characters as the objective function. A SVM classifier is used to find the likelihood of a character. To consider multi-oriented touching strings the features used in the SVM are invariant to character orientation. Experiments were performed in different databases of real and synthetic touching characters and the results show that the method is efficient in segmenting touching characters of arbitrary orientations and sizes. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 0031-3203 | ISBN | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ RPL2012a | Serial | 2133 | ||
Permanent link to this record | |||||
Author | Katerine Diaz; Aura Hernandez-Sabate; Antonio Lopez | ||||
Title | A reduced feature set for driver head pose estimation | Type | Journal Article | ||
Year | 2016 | Publication | Applied Soft Computing | Abbreviated Journal | ASOC |
Volume | 45 | Issue | Pages | 98-107 | |
Keywords | Head pose estimation; driving performance evaluation; subspace based methods; linear regression | ||||
Abstract | Evaluation of driving performance is of utmost importance in order to reduce road accident rate. Since driving ability includes visual-spatial and operational attention, among others, head pose estimation of the driver is a crucial indicator of driving performance. This paper proposes a new automatic method for coarse and fine head's yaw angle estimation of the driver. We rely on a set of geometric features computed from just three representative facial keypoints, namely the center of the eyes and the nose tip. With these geometric features, our method combines two manifold embedding methods and a linear regression one. In addition, the method has a confidence mechanism to decide if the classification of a sample is not reliable. The approach has been tested using the CMU-PIE dataset and our own driver dataset. Despite the very few facial keypoints required, the results are comparable to the state-of-the-art techniques. The low computational cost of the method and its robustness makes feasible to integrate it in massive consume devices as a real time application. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; 600.085; 600.076; | Approved | no | ||
Call Number | Admin @ si @ DHL2016 | Serial | 2760 | ||
Permanent link to this record | |||||
Author | Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz | ||||
Title | Gate-Shift-Fuse for Video Action Recognition | Type | Journal Article | ||
Year | 2023 | Publication | IEEE Transactions on Pattern Analysis and Machine Intelligence | Abbreviated Journal | TPAMI |
Volume | 45 | Issue | 9 | Pages | 10913-10928 |
Keywords | Action Recognition; Video Classification; Spatial Gating; Channel Fusion | ||||
Abstract | Convolutional Neural Networks are the de facto models for image recognition. However 3D CNNs, the straight forward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity requiring large scale annotated datasets to train them in scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs. Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner. GSF leverages grouped spatial gating to decompose input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into an efficient and high performing spatio-temporal feature extractor, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks. | ||||
Address | 1 Sept. 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ SEL2023 | Serial | 3814 | ||
Permanent link to this record |