toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Antonio Lopez; Joan Serrat edit  openurl
  Title (down) Tracing crease curves by solving a system of differential equations. Type Miscellaneous
  Year 1996 Publication Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number ADAS @ adas @ LoS1996 Serial 84  
Permanent link to this record
 

 
Author Carles Sanchez edit  isbn
openurl 
  Title (down) Tracheal Structure Characterization using Geometric and Appearance Models for Efficient Assessment of Stenosis in Videobronchoscopy Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Recent advances in endoscopic devices have increased their use for minimal invasive diagnostic and intervention procedures. Among all endoscopic modalities, bronchoscopy is one of the most frequent with around 261 millions of procedures per year. Although the use of bronchoscopy is spread among clinical facilities it presents some drawbacks, being the visual inspection for the assessment of anatomical measurements the most prevalent of them. In
particular, inaccuracies in the estimation of the degree of stenosis (the percentage of obstructed airway) decreases its diagnostic yield and might lead to erroneous treatments. An objective computation of tracheal stenosis in bronchoscopy videos would constitute a breakthrough for this non-invasive technique and a reduction in treatment cost.
This thesis settles the first steps towards on-line reliable extraction of anatomical information from videobronchoscopy for computation of objective measures. In particular, we focus on the computation of the degree of stenosis, which is obtained by comparing the area delimited by a healthy tracheal ring and the stenosed lumen. Reliable extraction of airway structures in interventional videobronchoscopy is a challenging task. This is mainly due to the large variety of acquisition conditions (positions and illumination), devices (different digitalizations) and in videos acquired at the operating room the unpredicted presence of surgical devices (such as probe ends). This thesis contributes to on-line stenosis assessment in several ways. We
propose a parametric strategy for the extraction of lumen and tracheal rings regions based on the characterization of their geometry and appearance that guide a deformable model. The geometric and appearance characterization is based on a physical model describing the way bronchoscopy images are obtained and includes local and global descriptions. In order to ensure a systematic applicability we present a statistical framework to select the optimal
parameters of our method. Experiments perform on the first public annotated database, show that the performance of our method is comparable to the one provided by clinicians and its computation time allows for a on-line implementation in the operating room.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor F. Javier Sanchez;Debora Gil;Jorge Bernal  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-9-5 Medium  
  Area Expedition Conference  
  Notes IAM; 600.075 Approved no  
  Call Number Admin @ si @ San2014 Serial 2575  
Permanent link to this record
 

 
Author Carles Sanchez edit   pdf
openurl 
  Title (down) Tracheal ring detection in bronchoscopy Type Report
  Year 2011 Publication CVC Technical Report Abbreviated Journal  
  Volume 168 Issue Pages  
  Keywords Bronchoscopy, tracheal ring, segmentation  
  Abstract Endoscopy is the process in which a camera is introduced inside a human.
Given that endoscopy provides realistic images (in contrast to other modalities) and allows non-invase minimal intervention procedures (which can aid in diagnosis and surgical interventions), its use has spreaded during last decades.
In this project we will focus on bronchoscopic procedures, during which the camera is introduced through the trachea in order to have a diagnostic of the patient. The diagnostic interventions are focused on: degree of stenosis (reduction in tracheal area), prosthesis or early diagnosis of tumors. In the first case, assessment of the luminal area and the calculation of the diameters of the tracheal rings are required. A main limitation is that all the process is done by hand,
which means that the doctor takes all the measurements and decisions just by looking at the screen. As far as we know there is no computational framework for helping the doctors in the diagnosis.
This project will consist of analysing bronchoscopic videos in order to extract useful information for the diagnostic of the degree of stenosis. In particular we will focus on segmentation of the tracheal rings. As a result of this project several strategies (for detecting tracheal rings) had been implemented in order to compare their performance.
 
  Address  
  Corporate Author Thesis Master's thesis  
  Publisher Place of Publication Editor Debora Gil, F.Javier Sanchez  
  Language english Summary Language english Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM;MV Approved no  
  Call Number IAM @ iam @ San2011 Serial 1841  
Permanent link to this record
 

 
Author Pau Rodriguez; Jordi Gonzalez; Josep M. Gonfaus; Xavier Roca edit   pdf
openurl 
  Title (down) Towards Visual Personality Questionnaires based on Deep Learning and Social Media Type Conference Article
  Year 2019 Publication 21st International Conference on Social Influence and Social Psychology Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address April 2019; Tokio; Japan  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICSISP  
  Notes ISE; 600.119 Approved no  
  Call Number Admin @ si @ RGG2020 Serial 3554  
Permanent link to this record
 

 
Author Bonifaz Stuhr edit  isbn
openurl 
  Title (down) Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Unsupervised representation learning aims at finding methods that learn representations from data without annotation-based signals. Abstaining from annotations not only leads to economic benefits but may – and to some extent already does – result in advantages regarding the representation’s structure, robustness, and generalizability to different tasks. In the long run, unsupervised methods are expected to surpass their supervised counterparts due to the reduction of human intervention and the inherently more general setup that does not bias the optimization towards an objective originating from specific annotation-based signals. While major advantages of unsupervised representation learning have been recently observed in natural language processing, supervised methods still dominate in vision domains for most tasks. In this dissertation, we contribute to the field of unsupervised (visual) representation learning from three perspectives: (i) Learning representations: We design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs) that utilize self-organization- and Hebbian-based learning rules to learn convolutional kernels and masks to achieve deeper backpropagation-free models. Thereby, we observe that backpropagation-based and -free methods can suffer from an objective function mismatch between the unsupervised pretext task and the target task. This mismatch can lead to performance decreases for the target task. (ii) Evaluating representations: We build upon the widely used (non-)linear evaluation protocol to define pretext- and target-objective-independent metrics for measuring the objective function mismatch. With these metrics, we evaluate various pretext and target tasks and disclose dependencies of the objective function mismatch concerning different parts of the training and model setup. (iii) Transferring representations: We contribute CARLANE, the first 3-way sim-to-real domain adaptation benchmark for 2D lane detection. We adopt several well-known unsupervised domain adaptation methods as baselines and propose a method based on prototypical cross-domain self-supervised learning. Finally, we focus on pixel-based unsupervised domain adaptation and contribute a content-consistent unpaired image-to-image translation method that utilizes masks, global and local discriminators, and similarity sampling to mitigate content inconsistencies, as well as feature-attentive denormalization to fuse content-based statistics into the generator stream. In addition, we propose the cKVD metric to incorporate class-specific content inconsistencies into perceptual metrics for measuring translation quality.  
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIA Place of Publication Editor Jordi Gonzalez;Jurgen Brauer  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-6-0 Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Stu2023 Serial 3966  
Permanent link to this record
 

 
Author Estefania Talavera; Nicolai Petkov; Petia Radeva edit  url
openurl 
  Title (down) Towards Unsupervised Familiar Scene Recognition in Egocentric Videos Type Miscellaneous
  Year 2019 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract CoRR abs/1905.04093
Nowadays, there is an upsurge of interest in using lifelogging devices. Such devices generate huge amounts of image data; consequently, the need for automatic methods for analyzing and summarizing these data is drastically increasing. We present a new method for familiar scene recognition in egocentric videos, based on background pattern detection through automatically configurable COSFIRE filters. We present some experiments over egocentric data acquired with the Narrative Clip.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; no menciona Approved no  
  Call Number Admin @ si @ TPR2019b Serial 3379  
Permanent link to this record
 

 
Author Arnau Baro; Pau Riba; Alicia Fornes edit   pdf
doi  openurl
  Title (down) Towards the recognition of compound music notes in handwritten music scores Type Conference Article
  Year 2016 Publication 15th international conference on Frontiers in Handwriting Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The recognition of handwritten music scores still remains an open problem. The existing approaches can only deal with very simple handwritten scores mainly because of the variability in the handwriting style and the variability in the composition of groups of music notes (i.e. compound music notes). In this work we focus on this second problem and propose a method based on perceptual grouping for the recognition of compound music notes. Our method has been tested using several handwritten music scores of the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition (OMR) software. Given that our method is learning-free, the obtained results are promising.  
  Address Shenzhen; China; October 2016  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2167-6445 ISBN Medium  
  Area Expedition Conference ICFHR  
  Notes DAG; 600.097 Approved no  
  Call Number Admin @ si @ BRF2016 Serial 2903  
Permanent link to this record
 

 
Author Pau Riba; Alicia Fornes; Josep Llados edit  isbn
openurl 
  Title (down) Towards the Alignment of Handwritten Music Scores Type Conference Article
  Year 2015 Publication 11th IAPR International Workshop on Graphics Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract It is very common to find different versions of the same music work in archives of Opera Theaters. These differences correspond to modifications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such differences. Given the difficulties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the staff lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.  
  Address Nancy; France; August 2015  
  Corporate Author Thesis  
  Publisher Springer International Publishing Place of Publication Editor Bart Lamiroy; Rafael Dueire Lins  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-319-52158-9 Medium  
  Area Expedition Conference GREC  
  Notes DAG Approved no  
  Call Number Admin @ si @ Serial 2874  
Permanent link to this record
 

 
Author Pau Riba; Alicia Fornes; Josep Llados edit   pdf
url  isbn
openurl 
  Title (down) Towards the Alignment of Handwritten Music Scores Type Book Chapter
  Year 2017 Publication International Workshop on Graphics Recognition. GREC 2015.Graphic Recognition. Current Trends and Challenges Abbreviated Journal  
  Volume 9657 Issue Pages 103-116  
  Keywords Optical Music Recognition; Handwritten Music Scores; Dynamic Time Warping alignment  
  Abstract It is very common to nd di erent versions of the same music work in archives of Opera Theaters. These di erences correspond to modi cations and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study.
This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such di erences. Given the diculties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the sta lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor Bart Lamiroy; R Dueire Lins  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-319-52158-9 Medium  
  Area Expedition Conference  
  Notes DAG; 600.097; 602.006; 600.121 Approved no  
  Call Number Admin @ si @ RFL2017 Serial 2955  
Permanent link to this record
 

 
Author Asma Bensalah; Jialuo Chen; Alicia Fornes; Cristina Carmona_Duarte; Josep Llados; Miguel A. Ferrer edit   pdf
url  openurl
  Title (down) Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches. Type Conference Article
  Year 2020 Publication International Workshop on Artificial Intelligence for Healthcare Applications Abbreviated Journal  
  Volume 12661 Issue Pages 476-489  
  Keywords  
  Abstract Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods. In addition, the difficulties increase in unconstrained rehabilitation scenarios, which are much closer to the real use cases. In particular, our aim is to design an upper-limb assessment pipeline for stroke patients using smartwatches. We focus on the HAR task, as it is the first part of the assessing pipeline. Our main target is to automatically detect and recognize four key movements inspired by the Fugl-Meyer assessment scale, which are performed in both constrained and unconstrained scenarios. In addition to the application protocol and dataset, we propose two detection and classification baseline methods. We believe that the proposed framework, dataset and baseline results will serve to foster this research field.  
  Address Virtual; January 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPRW  
  Notes DAG; 600.121; 600.140; Approved no  
  Call Number Admin @ si @ BCF2020 Serial 3508  
Permanent link to this record
 

 
Author Marc Bolaños; Mariella Dimiccoli; Petia Radeva edit   pdf
doi  openurl
  Title (down) Towards Storytelling from Visual Lifelogging: An Overview Type Journal Article
  Year 2017 Publication IEEE Transactions on Human-Machine Systems Abbreviated Journal THMS  
  Volume 47 Issue 1 Pages 77 - 90  
  Keywords  
  Abstract Visual lifelogging consists of acquiring images that capture the daily experiences of the user by wearing a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives, hence, they open up new opportunities for many potential applications in fields including healthcare, security, leisure and
the quantified self. However, automatically building a story from a huge collection of unstructured egocentric data presents major challenges. This paper provides a thorough review of advances made so far in egocentric data analysis, and in view of the current state of the art, indicates new lines of research to move us towards storytelling from visual lifelogging.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; 601.235 Approved no  
  Call Number Admin @ si @ BDR2017 Serial 2712  
Permanent link to this record
 

 
Author Shiqi Yang edit  isbn
openurl 
  Title (down) Towards Source-Free Domain Adaption of Neural Networks in an Open World Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Though they achieve great success, deep neural networks typically require a huge
amount of labeled data for training. However, collecting labeled data is often laborious and expensive. It would, therefore, be ideal if the knowledge obtained from label-rich datasets could be transferred to unlabeled data. However, deep networks are weak at generalizing to unseen domains, even when the differences are only subtle between the datasets. In real-world situations, a typical factor impairing the model generalization ability is the distribution shift between data from different domains, which is a long-standing problem usually termed as (unsupervised) domain adaptation.
A crucial requirement in the methodology of these domain adaptation methods is that they require access to source domain data during the adaptation process to the target domain. Accessibility to the source data of a trained source model is often impossible in real-world applications, for example, when deploying domain adaptation algorithms on mobile devices where the computational capacity is limited or in situations where data privacy rules limit access to the source domain data. Without access to the source domain data, existing methods suffer from inferior performance. Thus, in this thesis, we investigate domain adaptation without source data (termed as source-free domain adaptation) in multiple different scenarios that focus on image classification tasks.
We first study the source-free domain adaptation problem in a closed-set setting,
where the label space of different domains is identical. Only accessing the pretrained source model, we propose to address source-free domain adaptation from the perspective of unsupervised clustering. We achieve this based on nearest neighborhood clustering. In this way, we can transfer the challenging source-free domain adaptation task to a type of clustering problem. The final optimization objective is an upper bound containing only two simple terms, which can be explained as discriminability and diversity. We show that this allows us to relate several other methods in domain adaptation, unsupervised clustering and contrastive learning via the perspective of discriminability and diversity.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Joost  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-3-9 Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ Yan2023 Serial 3963  
Permanent link to this record
 

 
Author Maedeh Aghaei; Mariella Dimiccoli; C. Canton-Ferrer; Petia Radeva edit   pdf
url  doi
openurl 
  Title (down) Towards social pattern characterization from egocentric photo-streams Type Journal Article
  Year 2018 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU  
  Volume 171 Issue Pages 104-117  
  Keywords Social pattern characterization; Social signal extraction; Lifelogging; Convolutional and recurrent neural networks  
  Abstract Following the increasingly popular trend of social interaction analysis in egocentric vision, this article presents a comprehensive pipeline for automatic social pattern characterization of a wearable photo-camera user. The proposed framework relies merely on the visual analysis of egocentric photo-streams and consists of three major steps. The first step is to detect social interactions of the user where the impact of several social signals on the task is explored. The detected social events are inspected in the second step for categorization into different social meetings. These two steps act at event-level where each potential social event is modeled as a multi-dimensional time-series, whose dimensions correspond to a set of relevant features for each task; finally, LSTM is employed to classify the time-series. The last step of the framework is to characterize social patterns of the user. Our goal is to quantify the duration, the diversity and the frequency of the user social relations in various social situations. This goal is achieved by the discovery of recurrences of the same people across the whole set of social events related to the user. Experimental evaluation over EgoSocialStyle – the proposed dataset in this work, and EGO-GROUP demonstrates promising results on the task of social pattern characterization from egocentric photo-streams.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ ADC2018 Serial 3022  
Permanent link to this record
 

 
Author Maedeh Aghaei; Mariella Dimiccoli; Petia Radeva edit  doi
openurl 
  Title (down) Towards social interaction detection in egocentric photo-streams Type Conference Article
  Year 2015 Publication Proceedings of SPIE, 8th International Conference on Machine Vision , ICMV 2015 Abbreviated Journal  
  Volume 9875 Issue Pages  
  Keywords  
  Abstract Detecting social interaction in videos relying solely on visual cues is a valuable task that is receiving increasing attention in recent years. In this work, we address this problem in the challenging domain of egocentric photo-streams captured by a low temporal resolution wearable camera (2fpm). The major difficulties to be handled in this context are the sparsity of observations as well as unpredictability of camera motion and attention orientation due to the fact that the camera is worn as part of clothing. Our method consists of four steps: multi-faces localization and tracking, 3D localization, pose estimation and analysis of f-formations. By estimating pair-to-pair interaction probabilities over the sequence, our method states the presence or absence of interaction with the camera wearer and specifies which people are more involved in the interaction. We tested our method over a dataset of 18.000 images and we show its reliability on our considered purpose. © (2015) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICMV  
  Notes MILAB Approved no  
  Call Number Admin @ si @ ADR2015a Serial 2702  
Permanent link to this record
 

 
Author Vacit Oguz Yazici edit  isbn
openurl 
  Title (down) Towards Smart Fashion: Visual Recognition of Products and Attributes Type Book Whole
  Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Artificial intelligence is innovating the fashion industry by proposing new applications and solutions to the problems encountered by researchers and engineers working in the industry. In this thesis, we address three of these problems. In the first part of the thesis, we tackle the problem of multi-label image classification which is very related to fashion attribute recognition. In the second part of the thesis, we address two problems that are specific to fashion. Firstly, we address the problem of main product detection which is the task of associating correct image parts (e.g. bounding boxes) with the fashion product being sold. Secondly, we address the problem of color naming for multicolored fashion items. The task of multi-label image classification consists in assigning various concepts such as objects or attributes to images. Usually, there are dependencies that can be learned between the concepts to capture label correlations (chair and table classes are more likely to co-exist than chair and giraffe).
If we treat the multi-label image classification problem as an orderless set prediction problem, we can exploit recurrent neural networks (RNN) to capture label correlations. However, RNNs are trained to predict ordered sequences of tokens, so if the order of the predicted sequence is different than the order of the ground truth sequence, there will be penalization although the predictions are correct. Therefore, in the first part of the thesis, we propose an orderless loss function which will order the labels in the ground truth sequence dynamically in a way that the minimum loss is achieved. This results in a significant improvement of RNN models on multi-label image classification over the previous methods.
However, RNNs suffer from long term dependencies when the cardinality of set grows bigger. The decoding process might stop early if the current hidden state cannot find any object and outputs the termination token. This would cause the remaining classes not to be predicted and lower recall metric. Transformers can be used to avoid the long term dependency problem exploiting their selfattention modules that process sequential data simultaneously. Consequently, we propose a novel transformer model for multi-label image classification which surpasses the state-of-the-art results by a large margin.
In the second part of thesis, we focus on two fashion-specific problems. Main product detection is the task of associating image parts with the fashion product that is being sold, generally using associated textual metadata (product title or description). Normally, in fashion e-commerces, products are represented by multiple images where a person wears the product along with other fashion items. If all the fashion items in the images are marked with bounding boxes, we can use the textual metadata to decide which item is the main product. The initial work treated each of these images independently, discarding the fact that they all belong to the same product. In this thesis, we represent the bounding boxes from all the images as nodes in a fully connected graph. This allows the algorithm to learn relations between the nodes during training and take the entire context into account for the final decision. Our algorithm results in a significant improvement of the state-ofthe-art.
Moreover, we address the problem of color naming for multicolored fashion items, which is a challenging task due to the external factors such as illumination changes or objects that act as clutter. In the context of multi-label classification, the vaguely defined lines between the classes in the color space cause ambiguity. For example, a shade of blue which is very close to green might cause the model to incorrectly predict the color blue and green at the same time. Based on this, models trained for color naming are expected to recognize the colors and their quantities in both single colored and multicolored fashion items. Therefore, in this thesis, we propose a novel architecture with an additional head that explicitly estimates the number of colors in fashion items. This removes the ambiguity problem and results in better color naming performance.
 
  Address January 2022  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Arnau Ramisa  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-122714-6-1 Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ Ogu2022 Serial 3631  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: