@PhdThesis{CarlesFernandez2010, author="Carles Fernandez", editor="Jordi Gonzalez and Xavier Roca", title="Understanding Image Sequences: the Role of Ontologies in Cognitive Vision", year="2010", publisher="Ediciones Graficas Rey", abstract="The increasing ubiquity of digital information in our daily lives has positioned video as a favored information vehicle, and given rise to an astonishing volume of social media and surveillance footage. This raises a series of technological demands for automatic video understanding and management which, together with the attentional limitations of human operators, have motivated the research community to pursue such capabilities. As a result, current trends in cognitive vision promise to recognize complex events and to self-adapt to different environments, while managing and integrating several types of knowledge. Future directions suggest reinforcing the multi-modal fusion of information sources and the communication with end-users. In this thesis we tackle the problem of recognizing and describing meaningful events in video sequences from different domains, and of communicating the resulting knowledge to end-users by means of advanced interfaces for human--computer interaction. This problem is addressed by designing the high-level modules of a cognitive vision framework that exploits ontological knowledge. 
Ontologies allow us to define the relevant concepts in a domain and the relationships among them; we prove that the use of ontologies to organize, centralize, link, and reuse different types of knowledge is a key factor in achieving our objectives. The proposed framework contributes to: (i) automatically learn the characteristics of different scenarios in a domain; (ii) reason about uncertain, incomplete, or vague information from visual --camera{\textquoteright}s-- or linguistic --end-user{\textquoteright}s-- inputs; (iii) derive plausible interpretations of complex events from basic spatiotemporal developments; (iv) facilitate natural interfaces that adapt to the needs of end-users and allow them to communicate efficiently with the system at different levels of interaction; and finally, (v) find mechanisms to guide modeling processes, maintain and extend the resulting models, and exploit multimodal resources synergistically to enhance the former tasks. We describe a holistic methodology to achieve these goals. First, the use of prior taxonomical knowledge proves useful to guide MAP-MRF inference processes in the automatic identification of semantic regions, independently of the particular scenario. Towards the recognition of complex video events, we combine fuzzy metric-temporal reasoning with Situation Graph Trees (SGTs), thus assessing high-level interpretations from spatiotemporal data. Here, ontological resources like T-Boxes, onomasticons, or factual databases become useful to derive video indexing and retrieval capabilities, and also to forward highlighted content to smart user interfaces. There, we explore the application of ontologies to discourse analysis, cognitive linguistic principles, and scene augmentation techniques towards advanced communication by means of natural language dialogues and synthetic visualizations. 
Ontologies become fundamental to coordinate, adapt, and reuse the different modules in the system. The suitability of our ontological framework is demonstrated by a series of applications that especially benefit the field of smart video surveillance, viz. automatic generation of linguistic reports about the content of video sequences in multiple natural languages; content-based filtering and summarization of these reports; dialogue-based interfaces to query and browse video contents; automatic learning of semantic regions in a scenario; and tools to evaluate the performance of components and models in the system, via simulation and augmented reality.", optnote="exported from refbase (http://refbase.cvc.uab.es/show.php?record=1333), last updated on Fri, 17 Dec 2021 14:02:49 +0100", isbn="978-84-937261-2-6" }