toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author (up) Xialei Liu; Joost Van de Weijer; Andrew Bagdanov edit   pdf
doi  openurl
  Title Leveraging Unlabeled Data for Crowd Counting by Learning to Rank Type Conference Article
  Year 2018 Publication 31st IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 7661 - 7669  
  Keywords Task analysis; Training; Computer vision; Visualization; Estimation; Head; Context modeling  
  Abstract We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of
cropped images , we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number or fewer persons than the super-image. This allows us to address the problem of limited size of existing
datasets for crowd counting. We collect two crowd scene datasets from Google using keyword searches and queryby-example image retrieval, respectively. We demonstrate how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network which simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-ofthe-art results.
 
  Address Salt Lake City; USA; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPR  
  Notes LAMP; 600.109; 600.106; 600.120 Approved no  
  Call Number Admin @ si @ LWB2018 Serial 3159  
Permanent link to this record
 

 
Author (up) Xialei Liu; Joost Van de Weijer; Andrew Bagdanov edit   pdf
url  doi
openurl 
  Title Exploiting Unlabeled Data in CNNs by Self-Supervised Learning to Rank Type Journal Article
  Year 2019 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 41 Issue 8 Pages 1862-1878  
  Keywords Task analysis;Training;Image quality;Visualization;Uncertainty;Labeling;Neural networks;Learning from rankings;image quality assessment;crowd counting;active learning  
  Abstract For many applications the collection of labeled data is expensive laborious. Exploitation of unlabeled data during training is thus a long pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50 percent.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.109; 600.106; 600.120 Approved no  
  Call Number LWB2019 Serial 3267  
Permanent link to this record
 

 
Author (up) Xialei Liu; Marc Masana; Luis Herranz; Joost Van de Weijer; Antonio Lopez; Andrew Bagdanov edit   pdf
doi  openurl
  Title Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting Type Conference Article
  Year 2018 Publication 24th International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 2262-2268  
  Keywords  
  Abstract In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of
a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and
Stanford-40 datasets demonstrate that we significantly improve the results of standard elastic weight consolidation, and that we obtain competitive results when compared to the state-of-the-art in lifelong learning without forgetting.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPR  
  Notes LAMP; ADAS; 601.305; 601.109; 600.124; 600.106; 602.200; 600.120; 600.118 Approved no  
  Call Number Admin @ si @ LMH2018 Serial 3160  
Permanent link to this record
 

 
Author (up) Xiangyang Li; Luis Herranz; Shuqiang Jiang edit   pdf
url  openurl
  Title Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition Type Journal
  Year 2020 Publication ACM Transactions on Data Science Abbreviated Journal ACM  
  Volume Issue Pages  
  Keywords  
  Abstract In recent years, convolutional neural networks (CNNs) have achieved impressive performance for various visual recognition scenarios. CNNs trained on large labeled datasets can not only obtain significant performance on most challenging benchmarks but also provide powerful representations, which can be used to a wide range of other tasks. However, the requirement of massive amounts of data to train deep neural networks is a major drawback of these models, as the data available is usually limited or imbalanced. Fine-tuning (FT) is an effective way to transfer knowledge learned in a source dataset to a target task. In this paper, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition. These factors include parameters for the retraining procedure (e.g., the initial learning rate of fine-tuning), the distribution of the source and target data (e.g., the number of categories in the source dataset, the distance between the source and target datasets) and so on. We quantitatively and qualitatively analyze these factors, evaluate their influence, and present many empirical observations. The results reveal insights into what fine-tuning changes CNN parameters and provide useful and evidence-backed intuitions about how to implement fine-tuning for computer vision tasks.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ LHJ2020 Serial 3423  
Permanent link to this record
 

 
Author (up) Xim Cerda-Company edit  isbn
openurl 
  Title Understanding color vision: from psychophysics to computational modeling Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this PhD we have approached the human color vision from two different points of view: psychophysics and computational modeling. First, we have evaluated 15 different tone-mapping operators (TMOs). We have conducted two experiments that
consider two different criteria: the first one evaluates the local relationships among intensity levels and the second one evaluates the global appearance of the tonemapped imagesw.r.t. the physical one (presented side by side). We conclude that the rankings depend on the criterion and they are not correlated. Considering both criteria, the best TMOs are KimKautz (Kim and Kautz, 2008) and Krawczyk (Krawczyk, Myszkowski, and Seidel, 2005). Another conclusion is that a more standardized evaluation criteria is needed to do a fair comparison among TMOs.
Secondly, we have conducted several psychophysical experiments to study the
color induction. We have studied two different properties of the visual stimuli: temporal frequency and luminance spatial distribution. To study the temporal frequency we defined equiluminant stimuli composed by both uniform and striped surrounds and we flashed them varying the flash duration. For uniform surrounds, the results show that color induction depends on both the flash duration and inducer’s chromaticity. As expected, in all chromatic conditions color contrast was induced. In contrast, for striped surrounds, we expected to induce color assimilation, but we observed color contrast or no induction. Since similar but not equiluminant striped stimuli induce color assimilation, we concluded that luminance differences could be a key factor to induce color assimilation. Thus, in a subsequent study, we have studied the luminance differences’ effect on color assimilation. We varied the luminance difference between the target region and its inducers and we observed that color assimilation depends on both this difference and the inducer’s chromaticity. For red-green condition (where the first inducer is red and the second one is green), color assimilation occurs in almost all luminance conditions.
Instead, for green-red condition, color assimilation never occurs. Purple-lime
and lime-purple chromatic conditions show that luminance difference is a key factor to induce color assimilation. When the target is darker than its surround, color assimilation is stronger in purple-lime, while when the target is brighter, color assimilation is stronger in lime-purple (’mirroring’ effect). Moreover, we evaluated whether color assimilation is due to luminance or brightness differences. Similarly to equiluminance condition, when the stimuli are equibrightness no color assimilation is induced. Our results support the hypothesis that mutual-inhibition plays a major role in color perception, or at least in color induction.
Finally, we have defined a new firing rate model of color processing in the V1
parvocellular pathway. We have modeled two different layers of this cortical area: layers 4Cb and 2/3. Our model is a recurrent dynamic computational model that considers both excitatory and inhibitory cells and their lateral connections. Moreover, it considers the existent laminar differences and the cells’ variety. Thus, we have modeled both single- and double-opponent simple cells and complex cells, which are a pool of double-opponent simple cells. A set of sinusoidal drifting gratings have been used to test the architecture. In these gratings we have varied several spatial properties such as temporal and spatial frequencies, grating’s area and orientation. To reproduce the electrophysiological observations, the architecture has to consider the existence of non-oriented double-opponent cells in layer 4Cb and the lack of lateral connections between single-opponent cells. Moreover, we have tested our lateral connections simulating the center-surround modulation and we have reproduced physiological measurements where for high contrast stimulus, the
result of the lateral connections is inhibitory, while it is facilitatory for low contrast stimulus.
 
  Address March 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Xavier Otazu  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-948531-4-2 Medium  
  Area Expedition Conference  
  Notes NEUROBIT Approved no  
  Call Number Admin @ si @ Cer2019 Serial 3259  
Permanent link to this record
 

 
Author (up) Xim Cerda-Company; C. Alejandro Parraga; Xavier Otazu edit  openurl
  Title Which tone-mapping is the best? A comparative study of tone-mapping perceived quality Type Abstract
  Year 2014 Publication Perception Abbreviated Journal  
  Volume 43 Issue Pages 106  
  Keywords  
  Abstract Perception 43 ECVP Abstract Supplement
High-dynamic-range (HDR) imaging refers to the methods designed to increase the brightness dynamic range present in standard digital imaging techniques. This increase is achieved by taking the same picture under di erent exposure values and mapping the intensity levels into a single image by way of a tone-mapping operator (TMO). Currently, there is no agreement on how to evaluate the quality
of di erent TMOs. In this work we psychophysically evaluate 15 di erent TMOs obtaining rankings based on the perceived properties of the resulting tone-mapped images. We performed two di erent experiments on a CRT calibrated display using 10 subjects: (1) a study of the internal relationships between grey-levels and (2) a pairwise comparison of the resulting 15 tone-mapped images. In (1) observers internally matched the grey-levels to a reference inside the tone-mapped images and in the real scene. In (2) observers performed a pairwise comparison of the tone-mapped images alongside the real scene. We obtained two rankings of the TMOs according their performance. In (1) the best algorithm
was ICAM by J.Kuang et al (2007) and in (2) the best algorithm was a TMO by Krawczyk et al (2005). Our results also show no correlation between these two rankings.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECVP  
  Notes NEUROBIT; 600.074 Approved no  
  Call Number Admin @ si @ CPO2014 Serial 2527  
Permanent link to this record
 

 
Author (up) Xim Cerda-Company; C. Alejandro Parraga; Xavier Otazu edit   pdf
url  doi
openurl 
  Title Which tone-mapping operator is the best? A comparative study of perceptual quality Type Journal Article
  Year 2018 Publication Journal of the Optical Society of America A Abbreviated Journal JOSA A  
  Volume 35 Issue 4 Pages 626-638  
  Keywords  
  Abstract Tone-mapping operators (TMO) are designed to generate perceptually similar low-dynamic range images from high-dynamic range ones. We studied the performance of fifteen TMOs in two psychophysical experiments where observers compared the digitally-generated tone-mapped images to their corresponding physical scenes. All experiments were performed in a controlled environment and the setups were
designed to emphasize different image properties: in the first experiment we evaluated the local relationships among intensity-levels, and in the second one we evaluated global visual appearance among physical scenes and tone-mapped images, which were presented side by side. We ranked the TMOs according
to how well they reproduced the results obtained in the physical scene. Our results show that ranking position clearly depends on the adopted evaluation criteria, which implies that, in general, these tone-mapping algorithms consider either local or global image attributes but rarely both. Regarding the
question of which TMO is the best, KimKautz [1] and Krawczyk [2] obtained the better results across the different experiments. We conclude that a more thorough and standardized evaluation criteria is needed to study all the characteristics of TMOs, as there is ample room for improvement in future developments.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes NEUROBIT; 600.120; 600.128 Approved no  
  Call Number Admin @ si @ CPO2018 Serial 3088  
Permanent link to this record
 

 
Author (up) Xim Cerda-Company; Olivier Penacchio; Xavier Otazu edit   pdf
url  openurl
  Title Chromatic Induction in Migraine Type Journal
  Year 2021 Publication VISION Abbreviated Journal  
  Volume 5 Issue 3 Pages 37  
  Keywords migraine; vision; colour; colour perception; chromatic induction; psychophysics  
  Abstract The human visual system is not a colorimeter. The perceived colour of a region does not only depend on its colour spectrum, but also on the colour spectra and geometric arrangement of neighbouring regions, a phenomenon called chromatic induction. Chromatic induction is thought to be driven by lateral interactions: the activity of a central neuron is modified by stimuli outside its classical receptive field through excitatory–inhibitory mechanisms. As there is growing evidence of an excitation/inhibition imbalance in migraine, we compared chromatic induction in migraine and control groups. As hypothesised, we found a difference in the strength of induction between the two groups, with stronger induction effects in migraine. On the other hand, given the increased prevalence of visual phenomena in migraine with aura, we also hypothesised that the difference between migraine and control would be more important in migraine with aura than in migraine without aura. Our experiments did not support this hypothesis. Taken together, our results suggest a link between excitation/inhibition imbalance and increased induction effects.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes NEUROBIT; no proj Approved no  
  Call Number Admin @ si @ CPO2021 Serial 3589  
Permanent link to this record
 

 
Author (up) Xim Cerda-Company; Xavier Otazu edit   pdf
doi  openurl
  Title Color induction in equiluminant flashed stimuli Type Journal Article
  Year 2019 Publication Journal of the Optical Society of America A Abbreviated Journal JOSA A  
  Volume 36 Issue 1 Pages 22-31  
  Keywords  
  Abstract Color induction is the influence of the surrounding color (inducer) on the perceived color of a central region. There are two different types of color induction: color contrast (the color of the central region shifts away from that of the inducer) and color assimilation (the color shifts towards the color of the inducer). Several studies on these effects have used uniform and striped surrounds, reporting color contrast and color assimilation, respectively. Other authors [J. Vis. 12(1), 22 (2012) [CrossRef] ] have studied color induction using flashed uniform surrounds, reporting that the contrast is higher for shorter flash duration. Extending their study, we present new psychophysical results using both flashed and static (i.e., non-flashed) equiluminant stimuli for both striped and uniform surrounds. Similarly to them, for uniform surround stimuli we observed color contrast, but we did not obtain the maximum contrast for the shortest (10 ms) flashed stimuli, but for 40 ms. We only observed this maximum contrast for red, green, and lime inducers, while for a purple inducer we obtained an asymptotic profile along the flash duration. For striped stimuli, we observed color assimilation only for the static (infinite flash duration) red–green surround inducers (red first inducer, green second inducer). For the other inducers’ configurations, we observed color contrast or no induction. Since other studies showed that non-equiluminant striped static stimuli induce color assimilation, our results also suggest that luminance differences could be a key factor to induce it.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes NEUROBIT; 600.120; 600.128 Approved no  
  Call Number Admin @ si @ CeO2019 Serial 3226  
Permanent link to this record
 

 
Author (up) Xim Cerda-Company; Xavier Otazu; Nilai Sallent; C. Alejandro Parraga edit   pdf
doi  openurl
  Title The effect of luminance differences on color assimilation Type Journal Article
  Year 2018 Publication Journal of Vision Abbreviated Journal JV  
  Volume 18 Issue 11 Pages 10-10  
  Keywords  
  Abstract The color appearance of a surface depends on the color of its surroundings (inducers). When the perceived color shifts towards that of the surroundings, the effect is called “color assimilation” and when it shifts away from the surroundings it is called “color contrast.” There is also evidence that the phenomenon depends on the spatial configuration of the inducer, e.g., uniform surrounds tend to induce color contrast and striped surrounds tend to induce color assimilation. However, previous work found that striped surrounds under certain conditions do not induce color assimilation but induce color contrast (or do not induce anything at all), suggesting that luminance differences and high spatial frequencies could be key factors in color assimilation. Here we present a new psychophysical study of color assimilation where we assessed the contribution of luminance differences (between the target and its surround) present in striped stimuli. Our results show that luminance differences are key factors in color assimilation for stimuli varying along the s axis of MacLeod-Boynton color space, but not for stimuli varying along the l axis. This asymmetry suggests that koniocellular neural mechanisms responsible for color assimilation only contribute when there is a luminance difference, supporting the idea that mutual-inhibition has a major role in color induction.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes NEUROBIT; 600.120; 600.128 Approved no  
  Call Number Admin @ si @ COS2018 Serial 3148  
Permanent link to this record
 

 
Author (up) Xinhang Song; Haitao Zeng; Sixian Zhang; Luis Herranz; Shuqiang Jiang edit  url
openurl 
  Title Generalized Zero-shot Learning with Multi-source Semantic Embeddings for Scene Recognition Type Conference Article
  Year 2020 Publication 28th ACM International Conference on Multimedia Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Recognizing visual categories from semantic descriptions is a promising way to extend the capability of a visual classifier beyond the concepts represented in the training data (i.e. seen categories). This problem is addressed by (generalized) zero-shot learning methods (GZSL), which leverage semantic descriptions that connect them to seen categories (e.g. label embedding, attributes). Conventional GZSL are designed mostly for object recognition. In this paper we focus on zero-shot scene recognition, a more challenging setting with hundreds of categories where their differences can be subtle and often localized in certain objects or regions. Conventional GZSL representations are not rich enough to capture these local discriminative differences. Addressing these limitations, we propose a feature generation framework with two novel components: 1) multiple sources of semantic information (i.e. attributes, word embeddings and descriptions), 2) region descriptions that can enhance scene discrimination. To generate synthetic visual features we propose a two-step generative approach, where local descriptions are sampled and used as conditions to generate visual features. The generated features are then aggregated and used together with real features to train a joint classifier. In order to evaluate the proposed method, we introduce a new dataset for zero-shot scene recognition with multi-semantic annotations. Experimental results on the proposed dataset and SUN Attribute dataset illustrate the effectiveness of the proposed method.  
  Address Virtual; October 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ACM  
  Notes LAMP; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ SZZ2020 Serial 3465  
Permanent link to this record
 

 
Author (up) Xinhang Song; Luis Herranz; Shuqiang Jiang edit   pdf
openurl 
  Title Depth CNNs for RGB-D Scene Recognition: Learning from Scratch Better than Transferring from RGB-CNNs Type Conference Article
  Year 2017 Publication 31st AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume Issue Pages  
  Keywords RGB-D scene recognition; weakly supervised; fine tune; CNN  
  Abstract Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural networks (CNN) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so often leverages RGB large datasets, by transferring pretrained RGB CNN models and fine-tuning with the target RGB-D dataset. However, we show that this approach has the limitation of hardly reaching bottom layers, which is key to learn modality-specific features. In contrast, we focus on the bottom layers, and propose an alternative strategy to learn depth features combining local weakly supervised training from patches followed by global fine tuning with images. This strategy is capable of learning very discriminative depth-specific features with limited depth images, without resorting to Places-CNN. In addition we propose a modified CNN architecture to further match the complexity of the model and the amount of data available. For RGB-D scene recognition, depth and RGB features are combined by projecting them in a common space and further leaning a multilayer classifier, which is jointly optimized in an end-to-end network. Our framework achieves state-of-the-art accuracy on NYU2 and SUN RGB-D in both depth only and combined RGB-D data.  
  Address San Francisco CA; February 2017  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference AAAI  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ SHJ2017 Serial 2967  
Permanent link to this record
 

 
Author (up) Xinhang Song; Shuqiang Jiang; Luis Herranz edit  doi
openurl 
  Title Multi-Scale Multi-Feature Context Modeling for Scene Recognition in the Semantic Manifold Type Journal Article
  Year 2017 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP  
  Volume 26 Issue 6 Pages 2721-2735  
  Keywords  
  Abstract Before the big data era, scene recognition was often approached with two-step inference using localized intermediate representations (objects, topics, and so on). One of such approaches is the semantic manifold (SM), in which patches and images are modeled as points in a semantic probability simplex. Patch models are learned resorting to weak supervision via image labels, which leads to the problem of scene categories co-occurring in this semantic space. Fortunately, each category has its own co-occurrence patterns that are consistent across the images in that category. Thus, discovering and modeling these patterns are critical to improve the recognition performance in this representation. Since the emergence of large data sets, such as ImageNet and Places, these approaches have been relegated in favor of the much more powerful convolutional neural networks (CNNs), which can automatically learn multi-layered representations from the data. In this paper, we address many limitations of the original SM approach and related works. We propose discriminative patch representations using neural networks and further propose a hybrid architecture in which the semantic manifold is built on top of multiscale CNNs. Both representations can be computed significantly faster than the Gaussian mixture models of the original SM. To combine multiple scales, spatial relations, and multiple features, we formulate rich context models using Markov random fields. To solve the optimization problem, we analyze global and local approaches, where a top-down hierarchical algorithm has the best performance. Experimental results show that exploiting different types of contextual relations jointly consistently improves the recognition accuracy.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ SJH2017a Serial 2963  
Permanent link to this record
 

 
Author (up) Xinhang Song; Shuqiang Jiang; Luis Herranz edit   pdf
doi  openurl
  Title Combining Models from Multiple Sources for RGB-D Scene Recognition Type Conference Article
  Year 2017 Publication 26th International Joint Conference on Artificial Intelligence Abbreviated Journal  
  Volume Issue Pages 4523-4529  
  Keywords Robotics and Vision; Vision and Perception  
  Abstract Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained with a large RGB dataset and then fine tuned to the respective target RGB and depth datasets. These approaches have several limitations: 1) only use low-level filters learned from RGB data, thus not being able to exploit properly depth-specific patterns, and 2) RGB and depth features are only combined at high-levels but rarely at lower-levels. In this paper, we propose a framework that leverages both knowledge acquired from large RGB datasets together with depth-specific cues learned from the limited depth data, obtaining more effective multi-source and multi-modal representations. We propose a multi-modal combination method that selects discriminative combinations of layers from the different source models and target modalities, capturing both high-level properties of the task and intrinsic low-level properties of both modalities.  
  Address Melbourne; Australia; August 2017  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IJCAI  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ SJH2017b Serial 2966  
Permanent link to this record
 

 
Author (up) Xinhang Song; Shuqiang Jiang; Luis Herranz; Chengpeng Chen edit   pdf
url  doi
openurl 
  Title Learning Effective RGB-D Representations for Scene Recognition Type Journal Article
  Year 2019 Publication IEEE Transactions on Image Processing Abbreviated Journal TIP  
  Volume 28 Issue 2 Pages 980-993  
  Keywords  
  Abstract Deep convolutional networks can achieve impressive results on RGB scene recognition thanks to large data sets such as places. In contrast, RGB-D scene recognition is still underdeveloped in comparison, due to two limitations of RGB-D data we address in this paper. The first limitation is the lack of depth data for training deep learning models. Rather than fine tuning or transferring RGB-specific features, we address this limitation by proposing an architecture and a two-step training approach that directly learns effective depth-specific features using weak supervision via patches. The resulting RGB-D model also benefits from more complementary multimodal features. Another limitation is the short range of depth sensors (typically 0.5 m to 5.5 m), resulting in depth images not capturing distant objects in the scenes that RGB images can. We show that this limitation can be addressed by using RGB-D videos, where more comprehensive depth information is accumulated as the camera travels across the scenes. Focusing on this scenario, we introduce the ISIA RGB-D video data set to evaluate RGB-D scene recognition with videos. Our video recognition architecture combines convolutional and recurrent neural networks that are trained in three steps with increasingly complex data to learn effective features (i.e., patches, frames, and sequences). Our approach obtains the state-of-the-art performances on RGB-D image (NYUD2 and SUN RGB-D) and video (ISIA RGB-D) scene recognition.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ SJH2019 Serial 3247  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: