Records
Author Adrien Pavao; Isabelle Guyon; Anne-Catherine Letournel; Dinh-Tuan Tran; Xavier Baro; Hugo Jair Escalante; Sergio Escalera; Tyler Thomas; Zhen Xu
Title CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges Type Journal Article
Year 2023 Publication Journal of Machine Learning Research Abbreviated Journal JMLR
Volume Issue Pages
Keywords
Abstract CodaLab Competitions is an open source web platform designed to help data scientists and research teams crowd-source the resolution of machine learning problems through the organization of competitions, also called challenges or contests. CodaLab Competitions provides useful features such as multiple phases, results and code submissions, multi-score leaderboards, and jobs running inside Docker containers. The platform is very flexible and can handle large-scale experiments by allowing organizers to upload large datasets and provide their own CPU or GPU compute workers.
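As an illustration of the kind of job such a platform runs inside a Docker container, below is a minimal, hypothetical scoring-program sketch in Python. The input/output layout (res/, ref/, scores.txt) and file names are assumptions made for this sketch, not details taken from the paper or the CodaLab specification.

```python
# Hypothetical scoring program of the kind a compute worker might run inside a
# Docker container: read a participant submission and the ground truth, compute
# a metric, and write it out for the leaderboard. Layout and names are assumed.
import sys
from pathlib import Path

def main(input_dir: str, output_dir: str) -> None:
    # Participant predictions and organizer-provided reference labels.
    predictions = (Path(input_dir) / "res" / "predictions.txt").read_text().split()
    labels = (Path(input_dir) / "ref" / "labels.txt").read_text().split()

    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / max(len(labels), 1)

    # Scores are written as simple "key: value" lines.
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "scores.txt").write_text(f"accuracy: {accuracy:.4f}\n")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```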
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number (down) Admin @ si @ PGL2023 Serial 3973
Permanent link to this record
 

 
Author Karel Paleček; David Geronimo; Frederic Lerasle
Title Pre-attention cues for person detection Type Conference Article
Year 2012 Publication Cognitive Behavioural Systems, COST 2102 International Training School Abbreviated Journal
Volume Issue Pages 225-235
Keywords
Abstract Current state-of-the-art person detectors have proven reliable and achieve very good detection rates. However, their performance is often far from real time, which limits their use to low-resolution images only. In this paper, we deal with the candidate window generation problem for person detection, i.e. we want to reduce the computational complexity of a person detector by reducing the number of regions that have to be evaluated. We base our work on Alexe’s paper [1], which introduced several pre-attention cues for generic object detection. We evaluate these cues in the context of person detection and show that their performance degrades rapidly for scenes containing multiple objects of interest, such as pictures of urban environments. We extend this set with new cues that better suit our class-specific task. The cues are designed to be simple and efficient, so that they can be used in the pre-attention phase of a more complex sliding-window-based person detector.
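A minimal sketch of the candidate-window idea follows: score densely sampled windows with a cheap cue and keep only the top-K for the expensive person classifier. The gradient-energy cue used here is a stand-in assumption for illustration, not one of the paper's actual cues.

```python
# Sketch: pre-attention candidate-window generation. Windows are scored with a
# cheap gradient-energy cue (an assumed stand-in) via an integral image, and
# only the top-K are passed on to the full sliding-window detector.
import numpy as np

def gradient_energy(gray: np.ndarray) -> np.ndarray:
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)

def candidate_windows(gray, win=(128, 64), stride=16, top_k=100):
    energy = gradient_energy(gray)
    # Integral image lets each window be scored in O(1).
    ii = np.pad(energy, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = win
    scored = []
    for y in range(0, gray.shape[0] - h, stride):
        for x in range(0, gray.shape[1] - w, stride):
            s = ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
            scored.append((s, (x, y, w, h)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [box for _, box in scored[:top_k]]

# Usage: boxes = candidate_windows(image_gray); run the full detector on boxes only.
```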
Address Dresden, Germany
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-34583-8 Medium
Area Expedition Conference COST-TS
Notes ADAS Approved no
Call Number (down) Admin @ si @ PGL2012 Serial 2148
Permanent link to this record
 

 
Author Y. Patel; Lluis Gomez; Raul Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
Title TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces Type Miscellaneous
Year 2018 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The immense success of deep learning based methods in computer vision relies heavily on large-scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such datasets requires a tremendous amount of human effort, and annotations are limited to a popular set of classes. As an alternative, learning visual features by designing auxiliary tasks that make use of freely available self-supervision has become increasingly popular in the computer vision community.
In this paper, we put forward the idea of taking advantage of multi-modal context to provide self-supervision for the training of computer vision algorithms. We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is most likely to appear as an illustration. More specifically, we use popular text embedding techniques to provide the self-supervision for training a deep CNN.
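The self-supervision idea can be sketched as a CNN that regresses the text embedding (e.g., an LDA topic distribution) of the document each image illustrates. The backbone, embedding size, and loss below are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of the self-supervision: a CNN predicts the topic distribution of the
# article that a given image illustrates. Backbone/loss choices are assumed.
import torch
import torch.nn as nn
import torchvision.models as models

class TextTopicRegressor(nn.Module):
    def __init__(self, embedding_dim: int = 40):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any visual backbone would do
        backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
        self.net = backbone

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Predicted topic distribution for each image.
        return torch.softmax(self.net(images), dim=1)

model = TextTopicRegressor(embedding_dim=40)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.MSELoss()  # could equally be a KL divergence between distributions

images = torch.randn(8, 3, 224, 224)                        # stand-in batch of illustrations
text_embeddings = torch.softmax(torch.randn(8, 40), dim=1)  # LDA topics of their articles

optimizer.zero_grad()
loss = loss_fn(model(images), text_embeddings)
loss.backward()
optimizer.step()
```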
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.084; 601.338; 600.121 Approved no
Call Number (down) Admin @ si @ PGG2018 Serial 3177
Permanent link to this record
 

 
Author Victor Ponce; Mario Gorga; Xavier Baro; Petia Radeva; Sergio Escalera
Title Análisis de la expresión oral y gestual en proyectos fin de carrera vía un sistema de visión artificial Type Journal Article
Year 2011 Publication ReVisión Abbreviated Journal
Volume 4 Issue 1 Pages
Keywords
Abstract Oral communication and expression is a competence of special relevance in the European Higher Education Area (EHEA). Nevertheless, in many higher-education degrees the practice of this competence has been relegated mainly to the presentation of final degree projects. Within a teaching innovation project, we have developed a software tool for extracting objective information for the analysis of students' oral and gestural expression. The goal is to give students feedback that allows them to improve the quality of their presentations. The initial prototype presented in this work automatically extracts audiovisual information and analyzes it using learning techniques. The system has been applied to 15 final degree projects and 15 presentations within a fourth-year course. The results obtained show the viability of the system for suggesting factors that help both the success of the communication and the evaluation criteria.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1989-1199 ISBN Medium
Area Expedition Conference
Notes HuPBA; MILAB;MV Approved no
Call Number (down) Admin @ si @ PGB2011d Serial 2514
Permanent link to this record
 

 
Author Victor Ponce; Mario Gorga; Xavier Baro; Petia Radeva; Sergio Escalera
Title Análisis de la Expresión Oral y Gestual en Proyectos Fin de Carrera Vía un Sistema de Visión Artificial Type Miscellaneous
Year 2011 Publication Revista electrónica de la asociación de enseñantes universitarios de la informática AENUI Abbreviated Journal ReVisión
Volume 4 Issue 1 Pages 8-18
Keywords
Abstract Oral communication and expression is a competence of special relevance in the European Higher Education Area (EHEA). Nevertheless, in many higher-education degrees the practice of this competence has been relegated mainly to the presentation of final degree projects. Within a teaching innovation project, we have developed a software tool for extracting objective information for the analysis of students' oral and gestural expression. The goal is to give students feedback that allows them to improve the quality of their presentations. The initial prototype presented in this work automatically extracts audiovisual information and analyzes it using learning techniques. The system has been applied to 15 final degree projects and 15 presentations within a fourth-year course. The results obtained show the viability of the system for suggesting factors that help both the success of the communication and the evaluation criteria.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1989-1199 ISBN Medium
Area Expedition Conference
Notes MILAB;HuPBA;MV Approved no
Call Number (down) Admin @ si @ PGB2011c Serial 1783
Permanent link to this record
 

 
Author Victor Ponce; Mario Gorga; Xavier Baro; Sergio Escalera
Title Human Behavior Analysis from Video Data Using Bag-of-Gestures Type Conference Article
Year 2011 Publication 22nd International Joint Conference on Artificial Intelligence Abbreviated Journal
Volume 3 Issue Pages 2836-2837
Keywords
Abstract Human Behavior Analysis in Uncontrolled Environments can be categorized into two main challenges: 1) feature extraction and 2) behavior analysis from a body-language vocabulary. In this work, we present our achievements in characterizing some simple behaviors from visual data in different real applications, and discuss our plan for future work: low-level vocabulary definition from bag-of-gesture units, and high-level modelling and inference of human behaviors.
Address Barcelona
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-57735-516-8 Medium
Area Expedition Conference IJCAI
Notes HuPBA;MV Approved no
Call Number (down) Admin @ si @ PGB2011b Serial 1770
Permanent link to this record
 

 
Author Marco Pedersoli; Jordi Gonzalez; Andrew Bagdanov; Xavier Roca
Title Efficient Discriminative Multiresolution Cascade for Real-Time Human Detection Applications Type Journal Article
Year 2011 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 32 Issue 13 Pages 1581-1587
Keywords
Abstract Human detection is fundamental in many machine vision applications, like video surveillance, driving assistance, action recognition and scene understanding. However, in most of these applications real-time performance is necessary, and current detection methods do not yet achieve it.

This paper presents a new method for human detection based on a multiresolution cascade of Histograms of Oriented Gradients (HOG) that can greatly reduce the computational cost of the detection search without affecting accuracy. The method consists of a cascade of sliding window detectors. Each detector is a linear Support Vector Machine (SVM) composed of HOG features at different resolutions, from coarse at the first level to fine at the last one.

In contrast to previous methods, our approach uses a non-uniform stride of the sliding window that is defined by the feature resolution and allows the detection to be incrementally refined when going from coarse to fine resolution. In this way, the speed-up of the cascade is due not only to the smaller number of features computed at the first levels of the cascade, but also to the reduced number of windows that need to be evaluated at the coarse resolution. Experimental results show that our method reaches a detection rate comparable with state-of-the-art detectors based on HOG features, while the detection search is up to 23 times faster.
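The coarse-to-fine cascade can be sketched as follows: each level evaluates only the windows that survived the previous, coarser level, and finer levels re-sample positions around survivors at their own, smaller stride. The HOG extraction and the trained linear SVMs are stubbed out; all parameter values are assumptions for illustration, not the paper's settings.

```python
# Sketch of a coarse-to-fine HOG/SVM cascade with a stride tied to the feature
# resolution of each level. HOG is a placeholder; each level's SVM weight
# vector must match the feature length produced for that level.
import numpy as np

def hog_features(image, x, y, w, h, cell_size):
    # Placeholder for HOG computed at the level's resolution.
    patch = image[y:y + h, x:x + w]
    return patch[::cell_size, ::cell_size].astype(np.float64).ravel()

def cascade_detect(image, levels, win=(128, 64)):
    """levels: list of dicts with keys 'w', 'b', 'thr', 'cell', 'stride',
    ordered from coarse (large stride, big cells) to fine."""
    h, w = win
    stride0 = levels[0]["stride"]
    candidates = {(x, y)
                  for y in range(0, image.shape[0] - h, stride0)
                  for x in range(0, image.shape[1] - w, stride0)}
    for i, lvl in enumerate(levels):
        survivors = {(x, y) for (x, y) in candidates
                     if lvl["w"] @ hog_features(image, x, y, w, h, lvl["cell"]) + lvl["b"] > lvl["thr"]}
        if i + 1 < len(levels):
            # Finer level re-samples positions around each survivor at its stride.
            s = levels[i + 1]["stride"]
            candidates = {(x + dx, y + dy)
                          for (x, y) in survivors
                          for dy in (-s, 0, s) for dx in (-s, 0, s)
                          if 0 <= x + dx <= image.shape[1] - w
                          and 0 <= y + dy <= image.shape[0] - h}
        else:
            candidates = survivors
    return [(x, y, w, h) for (x, y) in candidates]
```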
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number (down) Admin @ si @ PGB2011a Serial 1707
Permanent link to this record
 

 
Author Jaykishan Patel; Alban Flachot; Javier Vazquez; David H. Brainard; Thomas S. A. Wallis; Marcus A. Brubaker; Richard F. Murray
Title A deep convolutional neural network trained to infer surface reflectance is deceived by mid-level lightness illusions Type Journal Article
Year 2023 Publication Journal of Vision Abbreviated Journal JV
Volume 23 Issue 9 Pages 4817-4817
Keywords
Abstract A long-standing view is that lightness illusions are by-products of strategies employed by the visual system to stabilize its perceptual representation of surface reflectance against changes in illumination. Computationally, one such strategy is to infer reflectance from the retinal image, and to base the lightness percept on this inference. CNNs trained to infer reflectance from images have proven successful at solving this problem under limited conditions. To evaluate whether these CNNs provide suitable starting points for computational models of human lightness perception, we tested a state-of-the-art CNN on several lightness illusions, and compared its behaviour to prior measurements of human performance. We trained a CNN (Yu & Smith, 2019) to infer reflectance from luminance images. The network had a 30-layer hourglass architecture with skip connections. We trained the network via supervised learning on 100K images, rendered in Blender, each showing randomly placed geometric objects (surfaces, cubes, tori, etc.), with random Lambertian reflectance patterns (solid, Voronoi, or low-pass noise), under randomized point+ambient lighting. The renderer also provided the ground-truth reflectance images required for training. After training, we applied the network to several visual illusions. These included the argyle, Koffka-Adelson, snake, White’s, checkerboard assimilation, and simultaneous contrast illusions, along with their controls where appropriate. The CNN correctly predicted larger illusions in the argyle, Koffka-Adelson, and snake images than in their controls. It also correctly predicted an assimilation effect in White's illusion. It did not, however, account for the checkerboard assimilation or simultaneous contrast effects. These results are consistent with the view that at least some lightness phenomena are by-products of a rational approach to inferring stable representations of physical properties from intrinsically ambiguous retinal images. Furthermore, they suggest that CNN models may be a promising starting point for new models of human lightness perception.
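A minimal sketch of the training setup described above: an image-to-image network maps a luminance image to a reflectance image and is trained with a pixel-wise loss against renderer-provided ground truth. The tiny encoder-decoder below merely stands in for the 30-layer hourglass of Yu & Smith (2019); its architecture and hyperparameters are assumptions.

```python
# Sketch: supervised training of a luminance-to-reflectance network against
# ground-truth reflectance images from the renderer. Stand-in architecture.
import torch
import torch.nn as nn

class TinyHourglass(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, luminance):
        return self.decoder(self.encoder(luminance))

model = TinyHourglass()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Stand-in batch: luminance renderings and their ground-truth reflectance maps.
luminance = torch.rand(4, 1, 128, 128)
reflectance_gt = torch.rand(4, 1, 128, 128)

optimizer.zero_grad()
loss = loss_fn(model(luminance), reflectance_gt)
loss.backward()
optimizer.step()
```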
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MACO; CIC Approved no
Call Number (down) Admin @ si @ PFV2023 Serial 3890
Permanent link to this record
 

 
Author Utkarsh Porwal; Alicia Fornes; Faisal Shafait (eds)
Title Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition. 18th International Conference, ICFHR 2022 Type Book Whole
Year 2022 Publication Frontiers in Handwriting Recognition. Abbreviated Journal
Volume 13639 Issue Pages
Keywords
Abstract
Address ICFHR 2022, Hyderabad, India, December 4–7, 2022
Corporate Author Thesis
Publisher Springer Place of Publication Editor Utkarsh Porwal; Alicia Fornes; Faisal Shafait
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-031-21648-0 Medium
Area Expedition Conference ICFHR
Notes DAG Approved no
Call Number (down) Admin @ si @ PFS2022 Serial 3809
Permanent link to this record
 

 
Author Joana Maria Pujadas-Mora; Alicia Fornes; Oriol Ramos Terrades; Josep Llados; Jialuo Chen; Miquel Valls-Figols; Anna Cabre
Title The Barcelona Historical Marriage Database and the Baix Llobregat Demographic Database. From Algorithms for Handwriting Recognition to Individual-Level Demographic and Socioeconomic Data Type Journal
Year 2022 Publication Historical Life Course Studies Abbreviated Journal HLCS
Volume 12 Issue Pages 99-132
Keywords Individual demographic databases; Computer vision, Record linkage; Social mobility; Inequality; Migration; Word spotting; Handwriting recognition; Local censuses; Marriage Licences
Abstract The Barcelona Historical Marriage Database (BHMD) gathers records of the more than 600,000 marriages celebrated in the Diocese of Barcelona and their taxation registered in Barcelona Cathedral's so-called Marriage Licenses Books for the long period 1451–1905, and the BALL Demographic Database brings together the individual information recorded in the population registers, censuses and fiscal censuses of the main municipalities of the county of Baix Llobregat (Barcelona). By December 2020, this ongoing collection had assembled 263,786 individual observations dating from the period between 1828 and 1965. The two databases started as parts of different interdisciplinary research projects at the crossroads of Historical Demography and Computer Vision. Their construction uses artificial intelligence and computer vision methods such as handwriting recognition to reduce execution time. However, their current state still requires some human intervention, which explains the crowdsourcing and game-sourcing experiences implemented. Moreover, knowledge graph techniques have allowed the application of advanced record linkage to link the same individuals and families across time and space. Finally, we discuss the main research lines in historical demography developed so far using both databases.
Address June 23, 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no
Call Number (down) Admin @ si @ PFR2022 Serial 3737
Permanent link to this record
 

 
Author Joana Maria Pujadas-Mora; Alicia Fornes; Josep Llados; Gabriel Brea-Martinez; Miquel Valls-Figols
Title The Baix Llobregat (BALL) Demographic Database, between Historical Demography and Computer Vision (nineteenth–twentieth centuries) Type Book Chapter
Year 2019 Publication Nominative Data in Demographic Research in the East and the West: monograph Abbreviated Journal
Volume Issue Pages 29-61
Keywords
Abstract The Baix Llobregat (BALL) Demographic Database is an ongoing database project containing individual census data from the Catalan region of Baix Llobregat (Spain) during the nineteenth and twentieth centuries. The BALL Database is built within the project ‘NETWORKS: Technology and citizen innovation for building historical social networks to understand the demographic past’, directed by Alícia Fornés from the Computer Vision Center and Joana Maria Pujadas-Mora from the Center for Demographic Studies, both at the Universitat Autònoma de Barcelona, and funded by the Recercaixa program (2017–2019). Its webpage is http://dag.cvc.uab.es/xarxes/. The aim of the project is to develop technologies facilitating the massive digitalization of demographic sources, and more specifically the padrones (local censuses), in order to reconstruct historical ‘social’ networks employing computer vision technology. Such virtual networks can be created thanks to the linkage of nominative records compiled in the local censuses across time and space. Thus, digitized versions of individual and family lifespans are established, and individuals and families can be located spatially.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-5-7996-2656-3 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number (down) Admin @ si @ PFL2019 Serial 3351
Permanent link to this record
 

 
Author Joana Maria Pujadas-Mora; Alicia Fornes; Josep Llados; Anna Cabre
Title Bridging the gap between historical demography and computing: tools for computer-assisted transcription and the analysis of demographic sources Type Book Chapter
Year 2016 Publication The future of historical demography. Upside down and inside out Abbreviated Journal
Volume Issue Pages 127-131
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Acco Publishers Place of Publication Editor K.Matthijs; S.Hin; H.Matsuo; J.Kok
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-94-6292-722-3 Medium
Area Expedition Conference
Notes DAG; 600.097 Approved no
Call Number (down) Admin @ si @ PFL2016 Serial 2907
Permanent link to this record
 

 
Author Ruben Perez Tito
Title Exploring the role of Text in Visual Question Answering on Natural Scenes and Documents Type Book Whole
Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Visual Question Answering (VQA) is the task where, given an image and a natural language question, the objective is to generate a natural language answer. At the intersection between computer vision and natural language processing, this task can be seen as a measure of image understanding capabilities, as it requires reasoning about objects, actions, colors, positions and the relations between the different elements, as well as commonsense reasoning, world knowledge, arithmetic skills and natural language understanding. However, even though the text present in images conveys important, semantically rich information that is explicit and not available in any other form, most VQA methods have remained illiterate, largely ignoring the text despite its potential significance. In this thesis, we set out on a journey to bring reading capabilities to computer vision models applied to the VQA task, creating new datasets and methods that can read, reason and integrate the text with other visual cues in natural scene images and documents.
In Chapter 3, we address the combination of scene text with visual information to fully understand all the nuances of natural scene images. To achieve this objective, we define a new sub-task of VQA that requires reading the text in the image, and highlight the limitations of current methods. In addition, we propose a new architecture that integrates both modalities and jointly reasons about textual and visual features. In Chapter 5, we shift the domain of VQA with reading capabilities to scanned industry document images, providing a high-level end-purpose perspective to Document Understanding, which has primarily focused on digitizing the documents' contents and extracting key values without considering the ultimate purpose of the extracted information. For this, we create a dataset that requires methods to reason about the unique and challenging elements of documents, such as text, images, tables, graphs and complex layouts, to provide accurate answers in natural language. However, we observed that explicit visual features provide only a small contribution to the overall performance, since the main information is usually conveyed within the text and its position. Consequently, in Chapter 6, we propose VQA on infographic images, seeking document images with more visually rich elements that require fully exploiting visual information in order to answer the questions. We show the performance gap of different methods when used over industry scanned and infographic images, and propose a new method that integrates the visual features in early stages, which allows the transformer architecture to exploit the visual features during the self-attention operation. In Chapter 7, we apply VQA to a big collection of single-page documents, where the methods must find which documents are relevant to answer the question and provide the answer itself. Finally, in Chapter 8, mimicking real-world application problems where systems must process documents with multiple pages, we address the multi-page document visual question answering task. We demonstrate the limitations of existing methods, including models specifically designed to process long sequences. To overcome these limitations, we propose a hierarchical architecture that can process long documents, answer questions, and provide the index of the page where the information to answer the question is located as an explainability measure.
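The "early fusion" idea mentioned above can be sketched as projecting visual features into the token-embedding space and concatenating them with the text tokens before the transformer encoder, so self-attention mixes both modalities. Dimensions and module choices below are assumptions for illustration, not the thesis' actual architecture.

```python
# Sketch: early fusion of visual and textual tokens in a transformer encoder.
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, d_model=256, visual_dim=2048, vocab_size=30000, num_layers=4):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids, visual_feats):
        text_tokens = self.text_embed(token_ids)        # (B, T, d_model)
        visual_tokens = self.visual_proj(visual_feats)  # (B, R, d_model)
        fused = torch.cat([visual_tokens, text_tokens], dim=1)
        return self.encoder(fused)                      # joint self-attention over both

model = EarlyFusionEncoder()
token_ids = torch.randint(0, 30000, (2, 20))   # question + OCR tokens (stand-in)
visual_feats = torch.randn(2, 36, 2048)        # e.g. region or patch features (stand-in)
joint = model(token_ids, visual_feats)         # (2, 56, 256)
```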
Address
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Ernest Valveny
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-124793-5-5 Medium
Area Expedition Conference
Notes DAG Approved no
Call Number (down) Admin @ si @ Per2023 Serial 3967
Permanent link to this record
 

 
Author Lluis Pere de las Heras
Title Syntactic Model for Semantic Document Analysis Type Report
Year 2010 Publication CVC Technical Report Abbreviated Journal
Volume 158 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number (down) Admin @ si @ Per2010 Serial 1350
Permanent link to this record
 

 
Author Victor Ponce; Sergio Escalera; Marc Perez; Oriol Janes; Xavier Baro
Title Non-Verbal Communication Analysis in Victim-Offender Mediations Type Journal Article
Year 2015 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 67 Issue 1 Pages 19-27
Keywords Victim–Offender Mediation; Multi-modal human behavior analysis; Face and gesture recognition; Social signal processing; Computer vision; Machine learning
Abstract We present a non-invasive ambient intelligence framework for the semi-automatic analysis of non-verbal communication applied to the restorative justice field. We propose the use of computer vision and social signal processing technologies in real scenarios of Victim–Offender Mediations, applying feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues from the fields of psychology and observational methodology. We test our methodology on data captured in real Victim–Offender Mediation sessions in Catalonia. We define the ground truth based on expert opinions when annotating the observed social responses. Using different state-of-the-art binary classification approaches, our system achieves recognition accuracies of 86% when predicting satisfaction, and 79% when predicting both agreement and receptivity. Applying a regression strategy, we obtain a mean deviation for the predictions between 0.5 and 0.7 in the range [1–5] for the computed social signals.
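The prediction stage described above can be sketched as per-session vectors of behavioral indicators fed to a standard binary classifier and scored with cross-validated accuracy. The synthetic data and the choice of an SVM below are assumptions; the paper compares several state-of-the-art classifiers on its own indicators.

```python
# Sketch: binary classification of session outcomes from behavioral indicators.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))       # 60 sessions x 12 behavioral indicators (stand-in)
y = rng.integers(0, 2, size=60)     # e.g. satisfaction reached or not (stand-in)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```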
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA;MV Approved no
Call Number (down) Admin @ si @ PEP2015 Serial 2583
Permanent link to this record