toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Carles Fernandez edit  isbn
openurl 
  Title Understanding Image Sequences: the Role of Ontologies in Cognitive Vision Type Book Whole
  Year 2010 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The increasing ubiquitousness of digital information in our daily lives has positioned
video as a favored information vehicle, and given rise to an astonishing generation of
social media and surveillance footage. This raises a series of technological demands
for automatic video understanding and management, which together with the compromising attentional limitations of human operators, have motivated the research
community to guide its steps towards a better attainment of such capabilities. As
a result, current trends on cognitive vision promise to recognize complex events and
self-adapt to different environments, while managing and integrating several types of
knowledge. Future directions suggest to reinforce the multi-modal fusion of information sources and the communication with end-users.
In this thesis we tackle the problem of recognizing and describing meaningful
events in video sequences from different domains, and communicating the resulting
knowledge to end-users by means of advanced interfaces for human–computer interaction. This problem is addressed by designing the high-level modules of a cognitive
vision framework exploiting ontological knowledge. Ontologies allow us to define the
relevant concepts in a domain and the relationships among them; we prove that the
use of ontologies to organize, centralize, link, and reuse different types of knowledge
is a key factor in the materialization of our objectives.
The proposed framework contributes to: (i) automatically learn the characteristics
of different scenarios in a domain; (ii) reason about uncertain, incomplete, or vague
information from visual –camera’s– or linguistic –end-user’s– inputs; (iii) derive plausible interpretations of complex events from basic spatiotemporal developments; (iv)
facilitate natural interfaces that adapt to the needs of end-users, and allow them to
communicate efficiently with the system at different levels of interaction; and finally,
(v) find mechanisms to guide modeling processes, maintain and extend the resulting
models, and to exploit multimodal resources synergically to enhance the former tasks.
We describe a holistic methodology to achieve these goals. First, the use of prior
taxonomical knowledge is proved useful to guide MAP-MRF inference processes in
the automatic identification of semantic regions, with independence of a particular scenario. Towards the recognition of complex video events, we combine fuzzy
metric-temporal reasoning with SGTs, thus assessing high-level interpretations from
spatiotemporal data. Here, ontological resources like T–Boxes, onomasticons, or factual databases become useful to derive video indexing and retrieval capabilities, and
also to forward highlighted content to smart user interfaces. There, we explore the
application of ontologies to discourse analysis and cognitive linguistic principles, or scene augmentation techniques towards advanced communication by means of natural language dialogs and synthetic visualizations. Ontologies become fundamental to
coordinate, adapt, and reuse the different modules in the system.
The suitability of our ontological framework is demonstrated by a series of applications that especially benefit the field of smart video surveillance, viz. automatic generation of linguistic reports about the content of video sequences in multiple natural
languages; content-based filtering and summarization of these reports; dialogue-based
interfaces to query and browse video contents; automatic learning of semantic regions
in a scenario; and tools to evaluate the performance of components and models in the
system, via simulation and augmented reality.
 
  Address  
  Corporate Author Thesis (up) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-937261-2-6 Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ Fer2010a Serial 1333  
Permanent link to this record
 

 
Author Joan Mas edit  isbn
openurl 
  Title A Syntactic Pattern Recognition Approach based on a Distribution Tolerant Adjacency Grammar and a Spatial Indexed Parser. Application to Sketched Document Recognition Type Book Whole
  Year 2010 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Sketch recognition is a discipline which has gained an increasing interest in the last
20 years. This is due to the appearance of new devices such as PDA, Tablet PC’s
or digital pen & paper protocols. From the wide range of sketched documents we
focus on those that represent structured documents such as: architectural floor-plans,
engineering drawing, UML diagrams, etc. To recognize and understand these kinds
of documents, first we have to recognize the different compounding symbols and then
we have to identify the relations between these elements. From the way that a sketch
is captured, there are two categories: on-line and off-line. On-line input modes refer
to draw directly on a PDA or a Tablet PC’s while off-line input modes refer to scan
a previously drawn sketch.
This thesis is an overlapping of three different areas on Computer Science: Pattern
Recognition, Document Analysis and Human-Computer Interaction. The aim of this
thesis is to interpret sketched documents independently on whether they are captured
on-line or off-line. For this reason, the proposed approach should contain the following
features. First, as we are working with sketches the elements present in our input
contain distortions. Second, as we would work in on-line or off-line input modes, the
order in the input of the primitives is indifferent. Finally, the proposed method should
be applied in real scenarios, its response time must be slow.
To interpret a sketched document we propose a syntactic approach. A syntactic
approach is composed of two correlated components: a grammar and a parser. The
grammar allows describing the different elements on the document as well as their
relations. The parser, given a document checks whether it belongs to the language
generated by the grammar or not. Thus, the grammar should be able to cope with
the distortions appearing on the instances of the elements. Moreover, it would be
necessary to define a symbol independently of the order of their primitives. Concerning to the parser when analyzing 2D sentences, it does not assume an order in the
primitives. Then, at each new primitive in the input, the parser searches among the
previous analyzed symbols candidates to produce a valid reduction.
Taking into account these features, we have proposed a grammar based on Adjacency Grammars. This kind of grammars defines their productions as a multiset
of symbols rather than a list. This allows describing a symbol without an order in
their components. To cope with distortion we have proposed a distortion model.
This distortion model is an attributed estimated over the constraints of the grammar and passed through the productions. This measure gives an idea on how far is the
symbol from its ideal model. In addition to the distortion on the constraints other
distortions appear when working with sketches. These distortions are: overtracing,
overlapping, gaps or spurious strokes. Some grammatical productions have been defined to cope with these errors. Concerning the recognition, we have proposed an
incremental parser with an indexation mechanism. Incremental parsers analyze the
input symbol by symbol given a response to the user when a primitive is analyzed.
This makes incremental parser suitable to work in on-line as well as off-line input
modes. The parser has been adapted with an indexation mechanism based on a spatial division. This indexation mechanism allows setting the primitives in the space
and reducing the search to a neighbourhood.
A third contribution is a grammatical inference algorithm. This method given a
set of symbols captures the production describing it. In the field of formal languages,
different approaches has been proposed but in the graphical domain not so much work
is done in this field. The proposed method is able to capture the production from
a set of symbol although they are drawn in different order. A matching step based
on the Haussdorff distance and the Hungarian method has been proposed to match
the primitives of the different symbols. In addition the proposed approach is able to
capture the variability in the parameters of the constraints.
From the experimental results, we may conclude that we have proposed a robust
approach to describe and recognize sketches. Moreover, the addition of new symbols
to the alphabet is not restricted to an expert. Finally, the proposed approach has
been used in two real scenarios obtaining a good performance.
 
  Address  
  Corporate Author Thesis (up) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Gemma Sanchez;Josep Llados  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-937261-4-0 Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number DAG @ dag @ Mas2010 Serial 1334  
Permanent link to this record
 

 
Author Francisco Javier Orozco edit  isbn
openurl 
  Title Human Emotion Evaluation on Facial Image Sequences Type Book Whole
  Year 2010 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Psychological evidence has emphasized the importance of affective behaviour understanding due to its high impact in nowadays interaction humans and computers. All
type of affective and behavioural patterns such as gestures, emotions and mental
states are highly displayed through the face, head and body. Therefore, this thesis is
focused to analyse affective behaviours on head and face. To this end, head and facial
movements are encoded by using appearance based tracking methods. Specifically,
a wise combination of deformable models captures rigid and non-rigid movements of
different kinematics; 3D head pose, eyebrows, mouth, eyelids and irises are taken into
account as basis for extracting features from databases of video sequences. This approach combines the strengths of adaptive appearance models, optimization methods
and backtracking techniques.
For about thirty years, computer sciences have addressed the investigation on
human emotions to the automatic recognition of six prototypic emotions suggested
by Darwin and systematized by Paul Ekman in the seventies. The Facial Action
Coding System (FACS) which uses discrete movements of the face (called Action
units or AUs) to code the six facial emotions named anger, disgust, fear, happy-Joy,
sadness and surprise. However, human emotions are much complex patterns that
have not received the same attention from computer scientists.
Simon Baron-Cohen proposed a new taxonomy of emotions and mental states
without a system coding of the facial actions. These 426 affective behaviours are
more challenging for the understanding of human emotions. Beyond of classically
classifying the six basic facial expressions, more subtle gestures, facial actions and
spontaneous emotions are considered here. By assessing confidence on the recognition
results, exploring spatial and temporal relationships of the features, some methods are
combined and enhanced for developing new taxonomy of expressions and emotions.
The objective of this dissertation is to develop a computer vision system, including both facial feature extraction, expression recognition and emotion understanding
by building a bottom-up reasoning process. Building a detailed taxonomy of human
affective behaviours is an interesting challenge for head-face-based image analysis
methods. In this paper, we exploit the strengths of Canonical Correlation Analysis
(CCA) to enhance an on-line head-face tracker. A relationship between head pose and
local facial movements is studied according to their cognitive interpretation on affective expressions and emotions. Active Shape Models are synthesized for AAMs based
on CCA-regression. Head pose and facial actions are fused into a maximally correlated space in order to assess expressiveness, confidence and classification in a CBR system. The CBR solutions are also correlated to the cognitive features, which allow
avoiding exhaustive search when recognizing new head-face features. Subsequently,
Support Vector Machines (SVMs) and Bayesian Networks are applied for learning the
spatial relationships of facial expressions. Similarly, the temporal evolution of facial
expressions, emotion and mental states are analysed based on Factorized Dynamic
Bayesian Networks (FaDBN).
As results, the bottom-up system recognizes six facial expressions, six basic emotions and six mental states, plus enhancing this categorization with confidence assessment at each level, intensity of expressions and a complete taxonomy
 
  Address  
  Corporate Author Thesis (up) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-936529-3-7 Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ Oro2010 Serial 1335  
Permanent link to this record
 

 
Author Jose Manuel Alvarez edit  isbn
openurl 
  Title Combining Context and Appearance for Road Detection Type Book Whole
  Year 2010 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Road traffic crashes have become a major cause of death and injury throughout the world.
Hence, in order to improve road safety, the automobile manufacture is moving towards the
development of vehicles with autonomous functionalities such as keeping in the right lane, safe distance keeping between vehicles or regulating the speed of the vehicle according to the traffic conditions. A key component of these systems is vision–based road detection that aims to detect the free road surface ahead the moving vehicle. Detecting the road using a monocular vision system is very challenging since the road is an outdoor scenario imaged from a mobile platform. Hence, the detection algorithm must be able to deal with continuously changing imaging conditions such as the presence ofdifferent objects (vehicles, pedestrians), different environments (urban, highways, off–road), different road types (shape, color), and different imaging conditions (varying illumination, different viewpoints and changing weather conditions). Therefore, in this thesis, we focus on vision–based road detection using a single color camera. More precisely, we first focus on analyzing and grouping pixels according to their low–level properties. In this way, two different approaches are presented to exploit
color and photometric invariance. Then, we focus the research of the thesis on exploiting context information. This information provides relevant knowledge about the road not using pixel features from road regions but semantic information from the analysis of the scene.
In this way, we present two different approaches to infer the geometry of the road ahead
the moving vehicle. Finally, we focus on combining these context and appearance (color)
approaches to improve the overall performance of road detection algorithms. The qualitative and quantitative results presented in this thesis on real–world driving sequences show that the proposed method is robust to varying imaging conditions, road types and scenarios going beyond the state–of–the–art.
 
  Address  
  Corporate Author Thesis (up) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Antonio Lopez;Theo Gevers  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-937261-8-8 Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Alv2010 Serial 1454  
Permanent link to this record
 

 
Author Partha Pratim Roy edit  isbn
openurl 
  Title Multi-Oriented and Multi-Scaled Text Character Analysis and Recognition in Graphical Documents and their Applications to Document Image Retrieval Type Book Whole
  Year 2010 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract With the advent research of Document Image Analysis and Recognition (DIAR), an
important line of research is explored on indexing and retrieval of graphics rich documents. It aims at finding relevant documents relying on segmentation and recognition
of text and graphics components underlying in non-standard layout where commercial
OCRs can not be applied due to complexity. This thesis is focused towards text information extraction approaches in graphical documents and retrieval of such documents
using text information.
Automatic text recognition in graphical documents (map, engineering drawing,
etc.) involves many challenges because text characters are usually printed in multioriented and multi-scale way along with different graphical objects. Text characters
are used to annotate the graphical curve lines and hence, many times they follow
curvi-linear paths too. For OCR of such documents, individual text lines and their
corresponding words/characters need to be extracted.
For recognition of multi-font, multi-scale and multi-oriented characters, we have
proposed a feature descriptor for character shape using angular information from contour pixels to take care of the invariance nature. To improve the efficiency of OCR, an
approach towards the segmentation of multi-oriented touching strings into individual
characters is also discussed. Convex hull based background information is used to
segment a touching string into possible primitive segments and later these primitive
segments are merged to get optimum segmentation using dynamic programming. To
overcome the touching/overlapping problem of text with graphical lines, a character
spotting approach using SIFT and skeleton information is included. Afterwards, we
propose a novel method to extract individual curvi-linear text lines using the foreground and background information of the characters of the text and a water reservoir
concept is used to utilize the background information.
We have also formulated the methodologies for graphical document retrieval applications using query words and seals. The retrieval approaches are performed using
recognition results of individual components in the document. Given a query text,
the system extracts positional knowledge from the query word and uses the same to
generate hypothetical locations in the document. Indexing of documents is also performed based on automatic detection of seals from documents containing cluttered
background. A seal is characterized by scale and rotation invariant spatial feature
descriptors computed from labelled text characters and a concept based on the Generalized Hough Transform is used to locate the seal in documents.
 
  Address  
  Corporate Author Thesis (up) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Umapada Pal  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-937261-7-1 Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ Roy2010 Serial 1455  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: