Author Alicia Fornes
  Title Writer Identification by a Combination of Graphical Features in the Framework of Old Handwritten Music Scores Type Book Whole
  Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The analysis and recognition of historical document images has attracted growing interest in recent years. Mass digitization and document image understanding allow the preservation, access and indexing of this artistic, cultural and technical heritage. The analysis of handwritten documents is an outstanding subfield. The main interest is not only the transcription of the document to a standard format, but also the identification of the author of a document from a set of writers (namely, writer identification).

Writer identification in handwritten text documents is an active area of study; however, the identification of the writer of graphical documents is still a challenge. The main objective of this thesis is the identification of the writer in old music scores, as an example of graphic documents. Concerning old music scores, many historical archives contain a huge number of sheets of musical compositions without information about the composer, and research in this field could be helpful for musicologists.

The writer identification framework proposed in this thesis combines three different writer identification approaches, which are the main scientific contributions. The first one is based on symbol recognition methods. For this purpose, two novel symbol recognition methods are proposed for coping with the typical distortions in hand-drawn symbols. The second approach preprocesses the music score to obtain music lines, and extracts information about the slant, width of the writing, connected components, contours and fractals. Finally, the third approach extracts global information by generating texture images from the music scores and extracting textural features (such as Gabor filters and co-occurrence matrices).

The high identification rates obtained in the experimental results demonstrate the suitability of the proposed ensemble architecture. To the best of our knowledge, this work is the first contribution on writer identification from images containing graphical languages.
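
As an illustration of the textural features mentioned in the abstract, the following minimal Python sketch computes a gray-level co-occurrence matrix and a contrast statistic for a toy image. The function names (`cooccurrence_matrix`, `contrast`), the horizontal pixel offset and the toy image are assumptions made for this example; the abstract does not specify how the thesis computes or uses these features.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count how often a pixel with gray level i has a neighbor
    at offset (dx, dy) with gray level j."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts):
    """Weighted average of squared gray-level differences between neighbors."""
    total = sum(sum(row) for row in counts)
    weighted = sum(
        (i - j) ** 2 * n
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    )
    return weighted / total


# Hypothetical 3x3 image quantized to three gray levels (0, 1, 2).
toy = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
glcm = cooccurrence_matrix(toy, levels=3)
print(glcm)            # [[1, 2, 0], [0, 1, 0], [0, 1, 1]]
print(contrast(glcm))  # 0.5
```

Statistics of this kind, computed for several offsets, are one common way to turn a texture image into a fixed-length feature vector that a classifier can compare across writers.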
 
  Address Barcelona (Spain)  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor (down) Josep Llados;Gemma Sanchez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number DAG @ dag @ For2009 Serial 1265  
 

 
Author Hongxing Gao
  Title Focused Structural Document Image Retrieval in Digital Mailroom Applications Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios, such as searching for full-page matches or retrieving the counterparts of specific document areas, focusing on their structural similarity or letting their visual resemblance play a dominant role. Based on a spatial indexing technique, we propose to search the collection for matches of local key-region pairs carrying both structural and visual information, and we present a scheme that allows the relative contribution of structural and visual similarity to be adjusted.
Based on the fact that the structure of documents is tightly linked with the distance among their elements, we first introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, even without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results than state-of-the-art methods while employing far fewer key-regions.
We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram where inclusion relationship is encoded. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation. The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves remarkable improvement over the conventional BoW and spatial pyramidal BoW methods.
To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structural and visual information, from the collection. We introduce spatial indexing techniques to the document retrieval community to speed up the computation of structural relationships for key-region pairs. We first test the proposed framework in a full-page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as specific partial areas of interest in the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that the proposed generic framework obtains nearly perfect precision on both types of focused queries, and it is the first framework able to tackle structure-focused queries, setting a new state of the art in the field.
Besides, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two-step implementation. We first compute tentative localizations of the query and subsequently employ them to divide the matched key-region pairs into several groups; line verification is then performed within each group while more precise bounding boxes are computed. We demonstrate that, compared with the standard approach (based on RANSAC), the proposed line verification generally achieves much higher recall with a slight loss of precision on specific queries.
 
  Address January 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor (down) Josep Llados;Dimosthenis Karatzas;Marçal Rusiñol  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-943427-0-7 Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Gao2015 Serial 2577  
 

 
Author David Fernandez
  Title Contextual Word Spotting in Historical Handwritten Documents Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract There are countless collections of historical documents in archives and libraries that contain plenty of valuable information for historians and researchers. The extraction of this information has become a central task among Document Analysis researchers and practitioners.
There is an increasing interest in digitally preserving and providing access to these kinds of documents. But digitization alone is not enough for researchers. The extraction and/or indexing of the information in these documents has attracted increasing interest among researchers. In many cases, and in particular in historical manuscripts, the full transcription of these documents is extremely difficult due to their inherent deficiencies: poor physical preservation, different writing styles, obsolete languages, etc. Word spotting has become a popular and efficient alternative to full transcription. It inherently involves a high level of degradation in the images. The search of words is holistically formulated as a visual search of a given query shape in a larger image, instead of recognising the input text and searching the query word with an ASCII string comparison. But the performance of classical word spotting approaches depends on the degradation level of the images, being unacceptable in many cases. In this thesis we have proposed a novel paradigm called contextual word spotting that uses contextual/semantic information to achieve acceptable results where classical word spotting does not. The contextual word spotting framework proposed in this thesis is a segmentation-based word spotting approach, so an efficient word segmentation is needed.
Historical handwritten documents present some common difficulties that can complicate the extraction of the words. We have proposed a line segmentation approach that formulates the problem as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem. A path finding algorithm is used to find the optimal path in a previously computed graph between the text lines. Once the text lines are extracted, words are localized inside the text lines using a state-of-the-art word segmentation technique.
Classical word spotting approaches can be improved using the contextual information of the documents. We have introduced a new framework, oriented to handwritten documents that are highly structured, to extract information making use of context. The framework is an efficient tool for semi-automatic transcription that uses the contextual information to achieve better results than classical word spotting approaches.
The contextual information is automatically discovered by recognizing repetitive structures and categorizing all the words according to semantic classes. The most frequent words in each semantic cluster are extracted and the same text is used to transcribe all of them. The experimental results achieved in this thesis outperform classical word spotting approaches, demonstrating the suitability of the proposed ensemble architecture for spotting words in historical handwritten documents using contextual information.
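
The line segmentation idea described in the abstract (a path through the area between two consecutive text lines, found by graph search) can be sketched roughly as follows. This is a minimal illustration and not the thesis's algorithm: the pixel-grid graph, the step cost of 1, the hypothetical `ink_penalty` parameter and the function name `separating_path` are all assumptions made for the example.

```python
import heapq


def separating_path(ink, ink_penalty=10):
    """Cheapest left-to-right path through a binary image (1 = ink).

    Each pixel is a graph node; from column c the path may move to the
    same row or an adjacent row in column c + 1. Returns one row index
    per column. Uses Dijkstra's algorithm with a binary heap.
    """
    rows, cols = len(ink), len(ink[0])

    def cost(r, c):
        # Every step costs 1; crossing an ink pixel costs extra.
        return 1 + ink_penalty * ink[r][c]

    best = {(r, 0): cost(r, 0) for r in range(rows)}
    parent = {}
    heap = [(cost(r, 0), r, 0) for r in range(rows)]
    heapq.heapify(heap)

    while heap:
        dist, r, c = heapq.heappop(heap)
        if dist > best[(r, c)]:
            continue  # stale queue entry
        if c == cols - 1:
            # First node settled in the last column ends an optimal path.
            path = [r]
            node = (r, c)
            while node in parent:
                node = parent[node]
                path.append(node[0])
            return path[::-1]
        for dr in (-1, 0, 1):
            r2 = r + dr
            if 0 <= r2 < rows:
                new_dist = dist + cost(r2, c + 1)
                if new_dist < best.get((r2, c + 1), float("inf")):
                    best[(r2, c + 1)] = new_dist
                    parent[(r2, c + 1)] = (r, c)
                    heapq.heappush(heap, (new_dist, r2, c + 1))
    return []


# Hypothetical 4x4 patch: the white gap drifts from row 1 to row 2.
page = [
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
print(separating_path(page))  # [1, 1, 2, 2]
```

Because the path may change rows between columns, it can follow a gap that is not perfectly horizontal, which is the situation where a straight cut would slice through handwriting.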
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor (down) Josep Llados;Alicia Fornes  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-7-1 Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Fer2014 Serial 2573  
 

 
Author Pau Riba
  Title Distilling Structure from Imagery: Graph-based Models for the Interpretation of Document Images Type Book Whole
  Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power, being able to encode both the components, objects, or entities and their pairwise relationships. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalent in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in \(\mathbb{R}^n\), is not properly defined for graphs.


In this thesis, we have investigated the importance of structural information from two perspectives: the traditional graph-based methods and the new advances in Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it in a large-scale and noisy scenario. On the other hand, Graph Neural Networks are proposed, first, to redefine Graph Edit Distance methodologies as a metric learning problem and, second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As an experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor (down) Josep Llados;Alicia Fornes  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-121011-6-4 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ Rib20 Serial 3478  
 

 
Author Marçal Rusiñol
  Title Geometric and Structural-based Symbol Spotting. Application to Focused Retrieval in Graphic Document Collections Type Book Whole
  Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Usually, pattern recognition systems consist of two main parts: on the one hand, the data acquisition and, on the other hand, the classification of this data into a certain category. In order to recognize which category a certain query element belongs to, a set of pattern models must be provided beforehand. An off-line learning stage is needed to train the classifier and to offer a robust classification of the patterns. Within the pattern recognition field, we are interested in the recognition of graphics and, in particular, in the analysis of documents rich in graphical information. In this context, one of the main concerns is whether the proposed systems remain scalable with respect to the data volume, so that they can handle growing amounts of symbol models. To avoid working with a database of reference symbols, symbol spotting and on-the-fly symbol recognition methods have been introduced in the past years.

Generally speaking, the symbol spotting problem can be defined as the identification of a set of regions of interest from a document image which are likely to contain an instance of a certain queried symbol without explicitly applying the whole pattern recognition scheme. Our application framework consists of indexing a collection of graphic-rich document images. This collection is queried by example with a single instance of the symbol to look for and, by means of symbol spotting methods, we retrieve the regions of interest where the symbol is likely to appear within the documents. This kind of application is known as a focused retrieval method.

For the focused retrieval application to handle large collections of documents, there is a need to provide efficient access to the large volume of information that might be stored. We use indexing strategies in order to efficiently retrieve by similarity the locations where a certain part of the symbol appears. In that scenario, graphical patterns should be used as indices for accessing and navigating the collection of documents. These indexing mechanisms allow the user to search for similar elements using graphical information rather than textual queries.

Throughout this thesis we present a spotting architecture and different methods aiming to build a complete focused retrieval application dealing with graphic-rich document collections. In addition, a protocol to evaluate the performance of symbol spotting systems in terms of recognition abilities, location accuracy and scalability is proposed.
 
  Address Barcelona (Spain)  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor (down) Josep Llados  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number DAG @ dag @ Rus2009 Serial 1264  
 

 
Author Agnes Borras
  Title Contributions to the Content-Based Image Retrieval Using Pictorial Queries Type Book Whole
  Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The broad access to digital cameras, personal computers and the Internet has led to the generation of large volumes of data in digital form. If we want to use this huge amount of data effectively, we need automatic tools that allow the retrieval of relevant information. Image data is a particular type of information that requires specific techniques of description and indexing. The computer vision field that studies these kinds of techniques is called Content-Based Image Retrieval (CBIR). Instead of using text-based descriptions, a CBIR system relies on properties that are inherent in the images themselves. Hence, the feature-based description provides a universal means of image expression, in contrast with the more than 6000 languages spoken in the world.
Nowadays, CBIR is a dynamic focus of research that has led to important applications for many professional groups. The potential fields of application are as diverse as the medical domain, crime prevention, the protection of intellectual property, journalism, graphic design, web search, the preservation of cultural heritage, etc.
The definition of the role of the user is a key point in the development of a CBIR application. The user is in charge of formulating the queries from which the images are retrieved. We have centered our attention on the image retrieval techniques that use queries based on pictorial information. We have identified a taxonomy composed of four main query paradigms: query-by-selection, query-by-iconic-composition, query-by-sketch and query-by-paint. Each one of these paradigms allows a different degree of user expressivity. From a simple image selection to a complete painting of the query, the user takes control of the input in the CBIR system.
Throughout the chapters of this thesis we have analyzed the influence that each query paradigm has on the internal operations of a CBIR system. Moreover, we have proposed a set of contributions that we have exemplified in the context of a final application.
 
  Address Barcelona (Spain)  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Bellaterra Editor (down) Josep Llados  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; Approved no  
  Call Number DAG @ dag @ Bor2009; IAM @ iam @ Bor2009 Serial 1269  
 

 
Author Agata Lapedriza
  Title Multitask Learning Techniques for Automatic Face Classification Type Book Whole
  Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Automatic face classification is currently a popular research area in Computer Vision. It involves several subproblems, such as subject recognition, gender classification or subject verification.

Current systems of automatic face classification need a large amount of training data to robustly learn a task. However, the collection of labeled data is usually a difficult issue. For this reason, the research on methods that are able to learn from a small sized training set is essential.

The dependency on the abundance of training data is not so evident in human learning processes. We are able to learn from a very small number of examples, given that we use, additionally, some prior knowledge to learn a new task. For example, we frequently find patterns and analogies from other domains to reuse them in new situations, or exploit training data from other experiences.

In computer science, Multitask Learning is a new Machine Learning approach that studies this idea of knowledge transfer among different tasks, to overcome the effects of the small sample size problem.

This thesis explores, proposes and tests some Multitask Learning methods specially developed for face classification purposes. Moreover, it presents two more contributions dealing with the small sample size problem, outside the Multitask Learning context. The first one is a method to extract external face features, to be used as an additional information source in automatic face classification problems. The second one is an empirical study on the most suitable face image resolution to perform automatic subject recognition.
 
  Address Barcelona (Spain)  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor (down) Jordi Vitria;David Masip  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes OR;MV Approved no  
  Call Number BCNPCL @ bcnpcl @ Lap2009 Serial 1263  
 

 
Author David Guillamet
  Title Statistical Local Appearance Models for Object Recognition Type Book Whole
  Year 2004 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address Bellaterra  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Place of Publication Editor (down) Jordi Vitria  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ Gui2004 Serial 444  
 

 
Author David Masip
  Title Face Classification Using Discriminative Features and Classifier Combination Type Book Whole
  Year 2005 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address CVC (UAB)  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Place of Publication Editor (down) Jordi Vitria  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 84-933652-3-8 Medium  
  Area Expedition Conference  
  Notes OR;MV Approved no  
  Call Number Admin @ si @ Mas2005b Serial 602  
 

 
Author Xavier Baro
  Title Probabilistic Darwin Machines: A New Approach to Develop Evolutionary Object Detection Type Book Whole
  Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Ever since computers were invented, we have wondered whether they might perform some everyday human tasks. One of the most studied and still least understood problems is the capacity to learn from our experiences and how we generalize the knowledge that we acquire. One of the tasks that people perform unconsciously, and that has attracted growing interest in different scientific areas from the beginning, is known as pattern recognition. The creation of models that represent the world that surrounds us helps us to recognize objects in our environment, to predict situations, to identify behaviors... All this information allows us to adapt ourselves and to interact with our environment. The capacity of individuals to adapt to their environment has been related to the number of patterns that they are capable of identifying.

This thesis addresses the pattern recognition problem from a Computer Vision point of view, taking one of the most paradigmatic and widespread approaches to object detection as its starting point. After studying this approach, two weak points are identified: the first concerns the description of the objects, and the second is a limitation of the learning algorithm, which hampers the use of better descriptors.

In order to address the learning limitations, we introduce evolutionary computation techniques to the classical object detection approach.

After testing the classical evolutionary approaches, such as genetic algorithms, we develop a new learning algorithm based on Probabilistic Darwin Machines, which better adapts to the learning problem. Once the learning limitation is avoided, we introduce a new feature set, which maintains the benefits of the classical feature set, adding the ability to describe non localities. This combination of evolutionary learning algorithm and features is tested on different public data sets, outperforming the results obtained by the classical approach.
 
  Address Barcelona (Spain)  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor (down) Jordi Vitria  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes OR;HuPBA;MV Approved no  
  Call Number BCNPCL @ bcnpcl @ Bar2009 Serial 1262  
 

 
Author Debora Gil
  Title Geometric Differential Operators for Shape Modelling Type Book Whole
  Year 2004 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Medical imaging feeds research in many computer vision and image processing fields: image filtering, segmentation, shape recovery, registration, retrieval and pattern matching. Because of their low contrast changes and large variety of artifacts and noise, medical imaging processing techniques relying on an analysis of the geometry of image level sets rather than on intensity values result in more robust treatment. From the starting point of treatment of intravascular images, this PhD thesis addresses the design of differential image operators based on geometric principles for robust shape modelling and restoration. Among all fields applying shape recovery, we approach filtering and segmentation of image objects. For successful use in real images, the segmentation process should go through three stages: noise removal, shape modelling and shape recovery. This PhD addresses all three topics, but for the sake of algorithms as automated as possible, techniques for image processing will be designed to satisfy three main principles: a) convergence of the iterative schemes to non-trivial states, avoiding image degeneration to a constant image and representing smooth models of the originals; b) smooth asymptotic behavior ensuring stabilization of the iterative process; c) fixed parameter values ensuring equal (domain free) performance of the algorithms whatever the initial images/shapes. Our geometric approach to the generic equations that model the different processes approached enables defining techniques satisfying all the former requirements. First, we introduce a new curvature-based geometric flow for image filtering achieving a good compromise between noise removal and resemblance to original images. Second, we describe a new family of diffusion operators that restrict their scope to image level curves and serve to restore smooth closed models from unconnected sets of points.
Finally, we design a regularization of snake (distance) maps that ensures its smooth convergence towards any closed shape. Experiments show that the performance of the proposed techniques surpasses that of state-of-the-art algorithms.
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Barcelona (Spain) Editor (down) Jordi Saludes i Closa;Petia Radeva  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 84-933652-0-3 Medium prit  
  Area Expedition Conference  
  Notes IAM; Approved no  
  Call Number IAM @ iam @ GIL2004 Serial 1517  
 

 
Author Parichehr Behjati Ardakani
  Title Towards Efficient and Robust Convolutional Neural Networks for Single Image Super-Resolution Type Book Whole
  Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Single image super-resolution (SISR) is an important task in image processing which aims to enhance the resolution of imaging systems. Recently, SISR has witnessed great strides with the rapid development of deep learning. Recent advances in SISR are mostly devoted to designing deeper and wider networks to enhance their representation learning capacity. However, as the depth of networks increases, deep learning-based methods are faced with the challenge of computational complexity in practice. Moreover, most existing methods rarely leverage the intermediate features and also do not discriminate the computation of features by their frequency components, thereby achieving relatively low performance. Aside from the aforementioned problems, another desired ability is to upsample images to arbitrary scales using a single model. Most current SISR methods train a dedicated model for each target resolution, losing generality and increasing memory requirements. In this thesis, we address the aforementioned issues and propose solutions to them: i) We present a novel frequency-based enhancement block which treats different frequencies in a heterogeneous way and also models inter-channel dependencies, which consequently enriches the output feature. Thus it helps the network generate more discriminative representations by explicitly recovering finer details. ii) We introduce OverNet, which contains two main parts: a lightweight feature extractor that follows a novel recursive framework of skip and dense connections to reduce low-level feature degradation, and an overscaling module that generates an accurate SR image by internally constructing an overscaled intermediate representation of the output features. Then, to solve the problem of reconstruction at arbitrary scale factors, we introduce a novel multi-scale loss that allows the simultaneous training of all scale factors using a single model.
iii) We propose a directional variance attention network which leverages a novel attention mechanism to enhance features in different channels and spatial regions. Moreover, we introduce a novel procedure for using attention mechanisms together with residual blocks to facilitate the preservation of finer details. Finally, we demonstrate that our approaches achieve considerably better performance than previous state-of-the-art methods, in terms of both quantitative and visual quality.  
  Address April, 2022  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Place of Publication Editor Jordi Gonzalez;Xavier Roca;Pau Rodriguez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-124793-1-7 Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Beh2022 Serial 3713  
 

 
Author Pau Baiget
  Title Modeling Human Behavior for Image Sequence Understanding and Generation Type Book Whole
  Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The comprehension of animal behavior, especially human behavior, is one of the most ancient and studied problems since the beginning of civilization. The long list of factors that interact to determine a person's actions requires the collaboration of different disciplines, such as psychology, biology, or sociology. In recent years the analysis of human behavior has also received great attention from the computer vision community, given the latest advances in the acquisition of human motion data from image sequences.

Despite the increasing availability of that data, there still exists a gap towards obtaining a conceptual representation of the obtained observations. Human behavior analysis is based on a qualitative interpretation of the results, and therefore the assignment of concepts to quantitative data is linked to a certain ambiguity.

This Thesis tackles the problem of obtaining a proper representation of human behavior in the contexts of computer vision and animation. On the one hand, a good behavior model should permit the recognition and explanation of the observed activity in image sequences. On the other hand, such a model must allow the generation of new synthetic instances, which model the behavior of virtual agents.

First, we propose methods to automatically learn the models from observations. Given a set of quantitative results output by a vision system, a normal behavior model is learnt. This result provides a tool to determine the normality or abnormality of future observations. However, machine learning methods are unable to provide a richer description of the observations. We confront this problem by means of a new method that incorporates prior knowledge about the environment and about the expected behaviors. This framework, formed by the reasoning engine FMTL and the modeling tool SGT, allows the generation of conceptual descriptions of activity in new image sequences. Finally, we demonstrate the suitability of the proposed framework to simulate the behavior of virtual agents, which are introduced into real image sequences and interact with observed real agents, thereby easing the generation of augmented reality sequences.

The set of approaches presented in this Thesis has a growing set of potential applications. The analysis and description of behavior in image sequences has its principal application in the domain of smart video surveillance, in order to detect suspicious or dangerous behaviors. Other applications include automatic sport commentaries, elderly monitoring, road traffic analysis, and the development of semantic video search engines. Alternatively, behavioral virtual agents allow the accurate simulation of real situations, such as fires or crowds. Moreover, the inclusion of virtual agents into real image sequences has been widely deployed in the games and cinema industries.
 
  Address Bellaterra (Spain)  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ Bai2009 Serial 1210  
 

 
Author Ignasi Rius
  Title Motion Priors for Efficient Bayesian Tracking in Human Sequence Evaluation Type Book Whole
  Year 2010 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Recovering human motion by visual analysis is a challenging computer vision research area with many potential applications. Model-based tracking approaches, and in particular particle filters, formulate the problem as a Bayesian inference task whose aim is to sequentially estimate the distribution of the parameters of a human body model over time. These approaches rely strongly on good dynamical and observation models to predict and update configurations of the human body according to measurements from the image data. However, it is very difficult to design observation models which extract useful and reliable information from image sequences robustly. This is especially challenging in monocular tracking, given that only one viewpoint of the scene is available. Therefore, to overcome these limitations, strong motion priors are needed to guide the exploration of the state space.

The work presented in this Thesis aims to retrieve the 3D motion parameters of a human body model from incomplete and noisy measurements of a monocular image sequence. These measurements consist of the 2D positions of a reduced set of joints in the image plane. Towards this end, we present a novel action-specific model of human motion which is trained from several databases of real motion-captured performances of an action, and is used as a priori knowledge within a particle filtering scheme.

Body postures are represented by means of a simple and compact stick figure model which uses direction cosines to represent the direction of body limbs in the 3D Cartesian space. Then, for a given action, Principal Component Analysis is applied to the training data to perform dimensionality reduction over the highly correlated input data. Before the learning stage of the action model, the input motion performances are synchronized by means of a novel dense matching algorithm based on Dynamic Programming. The algorithm synchronizes all the motion sequences of the same action class, finding an optimal solution in real-time.

Then, a probabilistic action model is learnt, based on the synchronized motion examples, which captures the variability and temporal evolution of full-body motion within a specific action. In particular, for each action, the parameters learnt are: a representative manifold for the action consisting of its mean performance, the standard deviation from the mean performance, the mean observed direction vectors from each motion subsequence of a given length, and the expected error at a given time instant.

Subsequently, the action-specific model is used as a priori knowledge on human motion which improves the efficiency and robustness of the overall particle filtering tracking framework. First, the dynamic model guides the particles according to similar situations previously learnt. Then, the state space is constrained so only feasible human postures are accepted as valid solutions at each time step. As a result, the state space is explored more efficiently as the particle set covers the most probable body postures.

Finally, experiments are carried out using test sequences from several motion databases. Results show that our tracking scheme is able to estimate the rough 3D configuration of a full-body model given only the 2D positions of a reduced set of joints. Separate tests on the sequence synchronization method and the subsequence probabilistic matching technique are also provided.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-937261-9-5 Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ Riu2010 Serial 1331  
 

 
Author Ivan Huerta
  Title Foreground Object Segmentation and Shadow Detection for Video Sequences in Uncontrolled Environments Type Book Whole
  Year 2010 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This Thesis is mainly divided into two parts. The first one presents a study of motion segmentation problems. Based on this study, a novel algorithm for mobile-object segmentation from a static background scene is also presented. This approach is shown to be robust and accurate under most of the common problems in motion segmentation. The second one tackles the problem of shadows in depth. Firstly, a bottom-up approach based on a chromatic shadow detector is presented to deal with umbra shadows. Secondly, a top-down approach based on a tracking system has been developed in order to enhance the chromatic shadow detection.

In our first contribution, a case analysis of motion segmentation problems is presented by taking into account the problems associated with different cues, namely colour, edge and intensity. Our second contribution is a hybrid architecture which handles the main problems observed in such a case analysis, by fusing (i) the knowledge from these three cues and (ii) a temporal difference algorithm. On the one hand, we enhance the colour and edge models to solve both global/local illumination changes (shadows and highlights) and camouflage in intensity. In addition, local information is exploited to cope with a very challenging problem, namely camouflage in chroma. On the other hand, the intensity cue is also applied when colour and edge cues are not available, such as beyond the dynamic range. Additionally, temporal difference is included to segment motion when these three cues are not available, such as when the background was not visible during the training period. Lastly, the approach is enhanced to allow ghost detection. As a result, our approach obtains very accurate and robust motion segmentation in both indoor and outdoor scenarios, as quantitatively and qualitatively demonstrated in the experimental results, by comparing our approach with the best-known state-of-the-art approaches.

Motion segmentation has to deal with shadows to avoid distortions when detecting moving objects. Most segmentation approaches dealing with shadow detection are typically restricted to penumbra shadows. Therefore, such techniques cannot cope well with umbra shadows. Consequently, umbra shadows are usually detected as part of moving objects.

Firstly, a bottom-up approach for detection and removal of chromatic moving shadows in surveillance scenarios is proposed. Secondly, a top-down approach based on Kalman filters to detect and track shadows has been developed in order to enhance the chromatic shadow detection. In the bottom-up part, the shadow detection approach applies a novel technique based on gradient and colour models for separating chromatic moving shadows from moving objects.

Well-known colour and gradient models are extended and improved into an invariant colour cone model and an invariant gradient model, respectively, to perform automatic segmentation while detecting potential shadows. Afterwards, the regions corresponding to potential shadows are grouped by considering "a bluish effect" and an edge partitioning. Lastly, (i) temporal similarities between local gradient structures and (ii) spatial similarities between chrominance angle and brightness distortions are analysed for all potential shadow regions in order to finally identify umbra shadows.

In the top-down process, after detection of objects and shadows, both are tracked using Kalman filters, in order to enhance the chromatic shadow detection when it fails to detect a shadow. Firstly, this implies a data association between the blobs (foreground and shadow) and Kalman filters. Secondly, an event analysis of the different data association cases is performed, and occlusion handling is managed by a Probabilistic Appearance Model (PAM). Based on this association, temporal consistency is sought in the association between foregrounds and shadows and their respective Kalman filters. Several cases arising from this association are studied; as a result, lost chromatic shadows are correctly detected. Finally, the tracking results are used as feedback to improve the shadow and object detection.

Unlike other approaches, our method does not make any a priori assumptions about camera location, surface geometries, surface textures, shapes and types of shadows, objects, and background. Experimental results show the performance and accuracy of our approach in different shadowed materials and illumination conditions.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-937261-3-3 Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number ISE @ ise @ Hue2010 Serial 1332  