Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	3256–3270 of 3413 records found matching your query (RSS):

Search & Display Options

Select All Deselect All

[201–210] << 211 212 213 214 215 216 217 218 219 220 >> [221–228]

List View

Citations

Details

	Records
	Author	Claudio Baecchi; Francesco Turchini; Lorenzo Seidenari; Andrew Bagdanov; Alberto del Bimbo
	Title	Fisher vectors over random density forest for object recognition			Type	Conference Article
	Year	2014	Publication	22nd International Conference on Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	4328-4333
	Keywords
	Abstract
	Address	Stockholm; Sweden; August 2014
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICPR
	Notes	LAMP; 600.079			Approved	no
	Call Number	Admin @ si @ BTS2014			Serial	2518
Permanent link to this record



	Author	Rafael E. Rivadeneira; Angel Sappa; Boris X. Vintimilla; Sabari Nathan; Priya Kansal; Armin Mehri; Parichehr Behjati Ardakani; A.Dalal; A.Akula; D.Sharma; S.Pandey; B.Kumar; J.Yao; R.Wu; KFeng; N.Li; Y.Zhao; H.Patel; V. Chudasama; K.Pjajapati; A.Sarvaiya; K.Upla; K.Raja; R.Ramachandra; C.Bush; F.Almasri; T.Vandamme; O.Debeir; N.Gutierrez; Q.Nguyen; W.Beksi
	Title	Thermal Image Super-Resolution Challenge – PBVS 2021			Type	Conference Article
	Year	2021	Publication	Conference on Computer Vision and Pattern Recognition Workshops	Abbreviated Journal
	Volume		Issue		Pages	4359-4367
	Keywords
	Abstract	This paper presents results from the second Thermal Image Super-Resolution (TISR) challenge organized in the framework of the Perception Beyond the Visible Spectrum (PBVS) 2021 workshop. For this second edition, the same thermal image dataset considered during the first challenge has been used; only mid-resolution (MR) and high-resolution (HR) sets have been considered. The dataset consists of 951 training images and 50 testing images for each resolution. A set of 20 images for each resolution is kept aside for evaluation. The two evaluation methodologies proposed for the first challenge are also considered in this opportunity. The first evaluation task consists of measuring the PSNR and SSIM between the obtained SR image and the corresponding ground truth (i.e., the HR thermal image downsampled by four). The second evaluation also consists of measuring the PSNR and SSIM, but in this case, considers the x2 SR obtained from the given MR thermal image; this evaluation is performed between the SR image with respect to the semi-registered HR image, which has been acquired with another camera. The results outperformed those from the first challenge, thus showing an improvement in both evaluation metrics.
	Address	Virtual; June 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPRW
	Notes	MSIAU; 600.130; 600.122			Approved	no
	Call Number	Admin @ si @ RSV2021			Serial	3581
Permanent link to this record



	Author	Jose Manuel Alvarez; Ferran Diego; Joan Serrat; Antonio Lopez
	Title	Automatic Ground-truthing using video registration for on-board detection algorithms			Type	Conference Article
	Year	2009	Publication	16th IEEE International Conference on Image Processing	Abbreviated Journal
	Volume		Issue		Pages	4389 - 4392
	Keywords
	Abstract	Ground-truth data is essential for the objective evaluation of object detection methods in computer vision. Many works claim their method is robust but they support it with experiments which are not quantitatively assessed with regard some ground-truth. This is one of the main obstacles to properly evaluate and compare such methods. One of the main reasons is that creating an extensive and representative ground-truth is very time consuming, specially in the case of video sequences, where thousands of frames have to be labelled. Could such a ground-truth be generated, at least in part, automatically? Though it may seem a contradictory question, we show that this is possible for the case of video sequences recorded from a moving camera. The key idea is transferring existing frame segmentations from a reference sequence into another video sequence recorded at a different time on the same track, possibly under a different ambient lighting. We have carried out experiments on several video sequence pairs and quantitatively assessed the precision of the transformed ground-truth, which prove that our approach is not only feasible but also quite accurate.
	Address	Cairo, Egypt
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1522-4880	ISBN	978-1-4244-5653-6	Medium
	Area		Expedition		Conference	ICIP
	Notes	ADAS			Approved	no
	Call Number	ADAS @ adas @ ADS2009			Serial	1201
Permanent link to this record



	Author	Idoia Ruiz; Joan Serrat
	Title	Hierarchical Novelty Detection for Traffic Sign Recognition			Type	Journal Article
	Year	2022	Publication	Sensors	Abbreviated Journal	SENS
	Volume	22	Issue	12	Pages	4389
	Keywords	Novelty detection; hierarchical classification; deep learning; traffic sign recognition; autonomous driving; computer vision
	Abstract	Recent works have made significant progress in novelty detection, i.e., the problem of detecting samples of novel classes, never seen during training, while classifying those that belong to known classes. However, the only information this task provides about novel samples is that they are unknown. In this work, we leverage hierarchical taxonomies of classes to provide informative outputs for samples of novel classes. We predict their closest class in the taxonomy, i.e., its parent class. We address this problem, known as hierarchical novelty detection, by proposing a novel loss, namely Hierarchical Cosine Loss that is designed to learn class prototypes along with an embedding of discriminative features consistent with the taxonomy. We apply it to traffic sign recognition, where we predict the parent class semantics for new types of traffic signs. Our model beats state-of-the art approaches on two large scale traffic sign benchmarks, Mapillary Traffic Sign Dataset (MTSD) and Tsinghua-Tencent 100K (TT100K), and performs similarly on natural images benchmarks (AWA2, CUB). For TT100K and MTSD, our approach is able to detect novel samples at the correct nodes of the hierarchy with 81% and 36% of accuracy, respectively, at 80% known class accuracy.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.154			Approved	no
	Call Number	Admin @ si @ RuS2022			Serial	3684
Permanent link to this record



	Author	Fahad Shahbaz Khan; Jiaolong Xu; Muhammad Anwer Rao; Joost Van de Weijer; Andrew Bagdanov; Antonio Lopez
	Title	Recognizing Actions through Action-specific Person Detection			Type	Journal Article
	Year	2015	Publication	IEEE Transactions on Image Processing	Abbreviated Journal	TIP
	Volume	24	Issue	11	Pages	4422-4432
	Keywords
	Abstract	Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) the existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, the direct training of action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, the transfer learning is able to adapt an existing detector to propose higher quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test tim- , outperforms on both data sets state-of-the-art methods, which do use person locations.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1057-7149	ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; LAMP; 600.076; 600.079			Approved	no
	Call Number	Admin @ si @ KXR2015			Serial	2668
Permanent link to this record



	Author	Alejandro Cartas; Jordi Luque; Petia Radeva; Carlos Segura; Mariella Dimiccoli
	Title	Seeing and Hearing Egocentric Actions: How Much Can We Learn?			Type	Conference Article
	Year	2019	Publication	IEEE International Conference on Computer Vision Workshops	Abbreviated Journal
	Volume		Issue		Pages	4470-4480
	Keywords
	Abstract	Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed focusing on a single modality. In particular, a limited number of works have considered to integrate the visual and audio modalities for this purpose. In this work, we propose a multimodal approach for egocentric action recognition in a kitchen environment that relies on audio and visual information. Our model combines a sparse temporal sampling strategy with a late fusion of audio, spatial, and temporal streams. Experimental results on the EPIC-Kitchens dataset show that multimodal integration leads to better performance than unimodal approaches. In particular, we achieved a 5.18% improvement over the state of the art on verb classification.
	Address	Seul; Korea; October 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICCVW
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ CLR2019b			Serial	3385
Permanent link to this record



	Author	Joan Marti; Jose Miguel Benedi; Ana Maria Mendonça; Joan Serrat
	Title	Pattern Recognition and Image Analysis			Type	Book Whole
	Year	2007	Publication	3rd Iberian Conference	Abbreviated Journal
	Volume	6669	Issue		Pages	4477-4478
	Keywords
	Abstract
	Address	Girona (Spain)
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	IbPRIA
	Notes	ADAS			Approved	no
	Call Number	ADAS @ adas @ MBM2007			Serial	994
Permanent link to this record



	Author	Klara Janousckova; Jiri Matas; Lluis Gomez; Dimosthenis Karatzas
	Title	Text Recognition – Real World Data and Where to Find Them			Type	Conference Article
	Year	2020	Publication	25th International Conference on Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	4489-4496
	Keywords
	Abstract	We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions. The method includes matching of imprecise transcriptions to weak annotations and an edit distance guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as “pseudo ground truth” (PGT). The method is applied to two weakly-annotated datasets. Training with the extracted PGT consistently improves the accuracy of a state of the art recognition model, by 3.7% on average, across different benchmark datasets (image domains) and 24.5% on one of the weakly annotated datasets 1 1 Acknowledgements. The authors were supported by Czech Technical University student grant SGS20/171/0HK3/3TJ13, the MEYS VVV project CZ.02.1.01/0.010.0J16 019/0000765 Research Center for Informatics, the Spanish Research project TIN2017-89779-P and the CERCA Programme / Generalitat de Catalunya.
	Address	Virtual; January 2021
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICPR
	Notes	DAG; 600.121; 600.129			Approved	no
	Call Number	Admin @ si @ JMG2020			Serial	3557
Permanent link to this record



	Author	Xinhang Song; Shuqiang Jiang; Luis Herranz
	Title	Combining Models from Multiple Sources for RGB-D Scene Recognition			Type	Conference Article
	Year	2017	Publication	26th International Joint Conference on Artificial Intelligence	Abbreviated Journal
	Volume		Issue		Pages	4523-4529
	Keywords	Robotics and Vision; Vision and Perception
	Abstract	Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained with a large RGB dataset and then fine tuned to the respective target RGB and depth datasets. These approaches have several limitations: 1) only use low-level filters learned from RGB data, thus not being able to exploit properly depth-specific patterns, and 2) RGB and depth features are only combined at high-levels but rarely at lower-levels. In this paper, we propose a framework that leverages both knowledge acquired from large RGB datasets together with depth-specific cues learned from the limited depth data, obtaining more effective multi-source and multi-modal representations. We propose a multi-modal combination method that selects discriminative combinations of layers from the different source models and target modalities, capturing both high-level properties of the task and intrinsic low-level properties of both modalities.
	Address	Melbourne; Australia; August 2017
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	IJCAI
	Notes	LAMP; 600.120			Approved	no
	Call Number	Admin @ si @ SJH2017b			Serial	2966
Permanent link to this record



	Author	Angel Morera; Angel Sanchez; A. Belen Moreno; Angel Sappa; Jose F. Velez
	Title	SSD vs. YOLO for Detection of Outdoor Urban Advertising Panels under Multiple Variabilities			Type	Journal Article
	Year	2020	Publication	Sensors	Abbreviated Journal	SENS
	Volume	20	Issue	16	Pages	4587
	Keywords
	Abstract	This work compares Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO) deep neural networks for the outdoor advertisement panel detection problem by handling multiple and combined variabilities in the scenes. Publicity panel detection in images offers important advantages both in the real world as well as in the virtual one. For example, applications like Google Street View can be used for Internet publicity and when detecting these ads panels in images, it could be possible to replace the publicity appearing inside the panels by another from a funding company. In our experiments, both SSD and YOLO detectors have produced acceptable results under variable sizes of panels, illumination conditions, viewing perspectives, partial occlusion of panels, complex background and multiple panels in scenes. Due to the difficulty of finding annotated images for the considered problem, we created our own dataset for conducting the experiments. The major strength of the SSD model was the almost elimination of False Positive (FP) cases, situation that is preferable when the publicity contained inside the panel is analyzed after detecting them. On the other side, YOLO produced better panel localization results detecting a higher number of True Positive (TP) panels with a higher accuracy. Finally, a comparison of the two analyzed object detection models with different types of semantic segmentation networks and using the same evaluation metrics is also included.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MSIAU; 600.130; 601.349; 600.122			Approved	no
	Call Number	Admin @ si @ MSM2020			Serial	3452
Permanent link to this record



	Author	Felipe Codevilla; Matthias Muller; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy
	Title	End-to-end Driving via Conditional Imitation Learning			Type	Conference Article
	Year	2018	Publication	IEEE International Conference on Robotics and Automation	Abbreviated Journal
	Volume		Issue		Pages	4693 - 4700
	Keywords
	Abstract	Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at this https URL
	Address	Brisbane; Australia; May 2018
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICRA
	Notes	ADAS; 600.116; 600.124; 600.118			Approved	no
	Call Number	Admin @ si @ CML2018			Serial	3108
Permanent link to this record



	Author	Ciprian Corneanu; Meysam Madadi; Sergio Escalera; Aleix M. Martinez
	Title	What does it mean to learn in deep networks? And, how does one detect adversarial attacks?			Type	Conference Article
	Year	2019	Publication	32nd IEEE Conference on Computer Vision and Pattern Recognition	Abbreviated Journal
	Volume		Issue		Pages	4752-4761
	Keywords
	Abstract	The flexibility and high-accuracy of Deep Neural Networks (DNNs) has transformed computer vision. But, the fact that we do not know when a specific DNN will work and when it will fail has resulted in a lack of trust. A clear example is self-driving cars; people are uncomfortable sitting in a car driven by algorithms that may fail under some unknown, unpredictable conditions. Interpretability and explainability approaches attempt to address this by uncovering what a DNN models, i.e., what each node (cell) in the network represents and what images are most likely to activate it. This can be used to generate, for example, adversarial attacks. But these approaches do not generally allow us to determine where a DNN will succeed or fail and why. i.e., does this learned representation generalize to unseen samples? Here, we derive a novel approach to define what it means to learn in deep networks, and how to use this knowledge to detect adversarial attacks. We show how this defines the ability of a network to generalize to unseen testing samples and, most importantly, why this is the case.
	Address	California; June 2019
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	CVPR
	Notes	HuPBA; no proj			Approved	no
	Call Number	Admin @ si @ CME2019			Serial	3332
Permanent link to this record



	Author	Zhijie Fang; Antonio Lopez
	Title	Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation			Type	Journal Article
	Year	2019	Publication	IEEE Transactions on Intelligent Transportation Systems	Abbreviated Journal	TITS
	Volume	21	Issue	11	Pages	4773 - 4783
	Keywords
	Abstract	Anticipating the intentions of vulnerable road users (VRUs) such as pedestrians and cyclists is critical for performing safe and comfortable driving maneuvers. This is the case for human driving and, thus, should be taken into account by systems providing any level of driving assistance, from advanced driver assistant systems (ADAS) to fully autonomous vehicles (AVs). In this paper, we show how the latest advances on monocular vision-based human pose estimation, i.e. those relying on deep Convolutional Neural Networks (CNNs), enable to recognize the intentions of such VRUs. In the case of cyclists, we assume that they follow traffic rules to indicate future maneuvers with arm signals. In the case of pedestrians, no indications can be assumed. Instead, we hypothesize that the walking pattern of a pedestrian allows to determine if he/she has the intention of crossing the road in the path of the ego-vehicle, so that the ego-vehicle must maneuver accordingly (e.g. slowing down or stopping). In this paper, we show how the same methodology can be used for recognizing pedestrians and cyclists' intentions. For pedestrians, we perform experiments on the JAAD dataset. For cyclists, we did not found an analogous dataset, thus, we created our own one by acquiring and annotating videos which we share with the research community. Overall, the proposed pipeline provides new state-of-the-art results on the intention recognition of VRUs.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	ADAS; 600.118			Approved	no
	Call Number	Admin @ si @ FaL2019			Serial	3305
Permanent link to this record



	Author	Jaume Garcia; Albert Andaluz; Debora Gil; Francesc Carreras
	Title	Decoupled External Forces in a Predictor-Corrector Segmentation Scheme for LV Contours in Tagged MR Images			Type	Conference Article
	Year	2010	Publication	32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society	Abbreviated Journal
	Volume		Issue		Pages	4805-4808
	Keywords
	Abstract	Computation of functional regional scores requires proper identification of LV contours. On one hand, manual segmentation is robust, but it is time consuming and requires high expertise. On the other hand, the tag pattern in TMR sequences is a problem for automatic segmentation of LV boundaries. We propose a segmentation method based on a predictorcorrector (Active Contours – Shape Models) scheme. Special stress is put in the definition of the AC external forces. First, we introduce a semantic description of the LV that discriminates myocardial tissue by using texture and motion descriptors. Second, in order to ensure convergence regardless of the initial contour, the external energy is decoupled according to the orientation of the edges in the image potential. We have validated the model in terms of error in segmented contours and accuracy of regional clinical scores.
	Address	Buenos Aires (Argentina)
	Corporate Author	IEEE EMB			Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN	1557-170X	ISBN	978-1-4244-4123-5	Medium
	Area		Expedition		Conference	EMBC
	Notes	IAM			Approved	no
	Call Number	IAM @ iam @ GAG2010			Serial	1514
Permanent link to this record



	Author	Jaykishan Patel; Alban Flachot; Javier Vazquez; David H. Brainard; Thomas S. A. Wallis; Marcus A. Brubaker; Richard F. Murray
	Title	A deep convolutional neural network trained to infer surface reflectance is deceived by mid-level lightness illusions			Type	Journal Article
	Year	2023	Publication	Journal of Vision	Abbreviated Journal	JV
	Volume	23	Issue	9	Pages	4817-4817
	Keywords
	Abstract	A long-standing view is that lightness illusions are by-products of strategies employed by the visual system to stabilize its perceptual representation of surface reflectance against changes in illumination. Computationally, one such strategy is to infer reflectance from the retinal image, and to base the lightness percept on this inference. CNNs trained to infer reflectance from images have proven successful at solving this problem under limited conditions. To evaluate whether these CNNs provide suitable starting points for computational models of human lightness perception, we tested a state-of-the-art CNN on several lightness illusions, and compared its behaviour to prior measurements of human performance. We trained a CNN (Yu & Smith, 2019) to infer reflectance from luminance images. The network had a 30-layer hourglass architecture with skip connections. We trained the network via supervised learning on 100K images, rendered in Blender, each showing randomly placed geometric objects (surfaces, cubes, tori, etc.), with random Lambertian reflectance patterns (solid, Voronoi, or low-pass noise), under randomized point+ambient lighting. The renderer also provided the ground-truth reflectance images required for training. After training, we applied the network to several visual illusions. These included the argyle, Koffka-Adelson, snake, White’s, checkerboard assimilation, and simultaneous contrast illusions, along with their controls where appropriate. The CNN correctly predicted larger illusions in the argyle, Koffka-Adelson, and snake images than in their controls. It also correctly predicted an assimilation effect in White's illusion. It did not, however, account for the checkerboard assimilation or simultaneous contrast effects. These results are consistent with the view that at least some lightness phenomena are by-products of a rational approach to inferring stable representations of physical properties from intrinsically ambiguous retinal images. Furthermore, they suggest that CNN models may be a promising starting point for new models of human lightness perception.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MACO; CIC			Approved	no
	Call Number	Admin @ si @ PFV2023			Serial	3890
Permanent link to this record