|
Sounak Dey, Anguelos Nicolaou, Josep Llados, & Umapada Pal. (2016). Local Binary Pattern for Word Spotting in Handwritten Historical Document. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 574–583). LNCS.
Abstract: Digital libraries store images which can be highly degraded, and to index this kind of image we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties in complex layout analysis, large variations of writing styles, and degradation or low quality of historical manuscripts. This paper presents a simple, innovative learning-free method for word spotting from large-scale historical documents combining Local Binary Pattern (LBP) and spatial sampling. This method offers three advantages: firstly, it operates in a completely learning-free paradigm, which is very different from unsupervised learning methods; secondly, the computational time is significantly low because of the LBP features, which are very fast to compute; and thirdly, the method can be used in scenarios where annotations are not available. Finally, we compare the results of our proposed retrieval method with other methods in the literature and we obtain the best results in the learning-free paradigm.
Keywords: Local binary patterns; Spatial sampling; Learning-free; Word spotting; Handwritten; Historical document analysis; Large-scale data
|
|
|
David Guillamet, & Jordi Vitria. (2000). Local Discriminant Regions Using Support Vector Machines for Object Recognition.
|
|
|
Cristhian Aguilera. (2017). Local feature description in cross-spectral imagery (Angel Sappa, Ed.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Over the last few years, the number of consumer computer vision applications has increased dramatically. Today, computer vision solutions can be found in video game consoles, smartphone applications, driving assistance – just to name a few. Ideally, we require the performance of those applications, particularly those that are safety critical, to remain constant under any external environmental factors, such as changes in illumination or weather conditions. However, this is not always possible, or is very difficult to obtain, using only visible imagery, due to the inherent limitations of the images from that spectral band. For that reason, the use of images from different or multiple spectral bands is becoming more appealing.
The aforementioned possible advantages of using images from multiple spectral bands in various vision applications make multi-spectral image processing a relevant topic for research and development. As in visible image processing, multi-spectral image processing needs tools and algorithms to handle information from various spectral bands. Furthermore, traditional tools such as local feature detection, which is the basis of many vision tasks such as visual odometry, image registration, or structure from motion, must be adjusted or reformulated to operate under new conditions. Traditional feature detection, description, and matching methods tend to underperform in multi-spectral settings, in comparison to mono-spectral settings, due to the natural differences between each spectral band.
The work in this thesis is focused on the local feature description problem when cross-spectral images are considered. In this context, this dissertation has three main contributions. Firstly, the work starts by proposing the usage of a combination of frequency and spatial information, in a multi-scale scheme, as feature description. Evaluations of this proposal, based on classical hand-made feature descriptors, and comparisons with state-of-the-art cross-spectral approaches help to find and understand the limitations of such a strategy. Secondly, different convolutional neural network (CNN) based architectures are evaluated when used to describe cross-spectral image patches. Results showed that CNN-based methods, designed to work with visible monocular images, could be successfully applied to the description of images from two different spectral bands, with just minor modifications. In this framework, a novel CNN-based network model, specifically intended to describe image patches from two different spectral bands, is proposed. This network, referred to as Q-Net, outperforms the state of the art in the cross-spectral domain, including both previous hand-made solutions as well as L2 CNN-based architectures. The third contribution of this dissertation is in the cross-spectral feature description application domain. The multispectral odometry problem is tackled, showing a real application of cross-spectral descriptors.
In addition to the three main contributions mentioned above, in this dissertation, two different multi-spectral datasets are generated and shared with the community to be used as benchmarks for further studies.
|
|
|
Jose Antonio Rodriguez, & Florent Perronnin. (2008). Local Gradient Histogram Features for Word Spotting in Unconstrained Handwritten Documents. In J.M. Ogier J. L. W. Liu (Ed.), Graphics Recognition: Recent Advances and New Opportunities (Vol. 5046, 188–198). LNCS.
|
|
|
Jose Antonio Rodriguez, & Florent Perronnin. (2008). Local Gradient Histogram Features for Word Spotting in Unconstrained Handwritten Documents. In International Conference on Frontiers in Handwriting Recognition (7–12).
|
|
|
Oriol Ramos Terrades, & Ernest Valveny. (2005). Local Norm Features based on ridgelets Transform.
|
|
|
Jaime Moreno, Xavier Otazu, & Maria Vanrell. (2010). Local Perceptual Weighting in JPEG2000 for Color Images. In 5th European Conference on Colour in Graphics, Imaging and Vision and 12th International Symposium on Multispectral Colour Science (255–260).
Abstract: The aim of this work is to explain how to apply perceptual concepts to define a perceptual pre-quantizer and to improve the JPEG2000 compressor. The approach consists in quantizing wavelet transform coefficients using some properties of human visual system behavior. Noise is fatal to image compression performance, because it is both annoying for the observer and consumes excessive bandwidth when the imagery is transmitted. Perceptual pre-quantization reduces unperceivable details and thus improves both visual impression and transmission properties. The comparison between JPEG2000 without and with perceptual pre-quantization shows that the latter is not favorable in PSNR, but the recovered image is more compressed at the same or even better visual quality measured with a weighted PSNR. Perceptual criteria were taken from the CIWaM (Chromatic Induction Wavelet Model).
|
|
|
Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, & Joost Van de Weijer. (2022). Local Prediction Aggregation: A Frustratingly Easy Source-free Domain Adaptation Method.
Abstract: We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper bound of the objective, resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can also be adapted to source-free open-set and partial-set DA, which further shows the generalization ability of our method. Code is available in this https URL.
|
|
|
Lorenzo Seidenari, Giuseppe Serra, Andrew Bagdanov, & Alberto del Bimbo. (2014). Local pyramidal descriptors for image recognition. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 1033–1040.
Abstract: In this paper we present a novel method to improve the flexibility of descriptor matching for image recognition by using local multiresolution pyramids in feature space. We propose that image patches be represented at multiple levels of descriptor detail and that these levels be defined in terms of local spatial pooling resolution. Preserving multiple levels of detail in local descriptors is a way of hedging one’s bets on which levels will be most relevant for matching during learning and recognition. We introduce the Pyramid SIFT (P-SIFT) descriptor and show that its use in four state-of-the-art image recognition pipelines improves accuracy and yields state-of-the-art results. Our technique is applicable independently of spatial pyramid matching and we show that spatial pyramids can be combined with local pyramids to obtain further improvement. We achieve state-of-the-art results on Caltech-101 (80.1%) and Caltech-256 (52.6%) when compared to other approaches based on SIFT features over intensity images. Our technique is efficient and is extremely easy to integrate into image recognition pipelines.
Keywords: Object categorization; local features; kernel methods
|
|
|
Christophe Rigaud, & Clement Guerin. (2014). Localisation contextuelle des personnages de bandes dessinées. In Colloque International Francophone sur l'Écrit et le Document.
Abstract: The authors propose a method for localizing characters in comic book panels based on the characteristics of speech balloons. The evaluation shows a character localization rate of up to 65%.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, Sergio Escalera, Huamin Ren, Thomas B. Moeslund, & Elham Etemad. (2017). Locality Regularized Group Sparse Coding for Action Recognition. CVIU - Computer Vision and Image Understanding, 158, 106–114.
Abstract: Bag of visual words (BoVW) models are widely utilized in image/video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantage of locality coding, such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate that our methodology outperforms the state of the art on the considered datasets.
Keywords: Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition
|
|
|
Eva Costa. (2001). Localitzacio i seguiment de persones amb una camera amb Pan, Tilt i Zoom.
|
|
|
A.S. Coquel, Jean-Pascal Jacob, M. Primet, A. Demarez, Mariella Dimiccoli, T. Julou, et al. (2013). Localization of protein aggregation in Escherichia coli is governed by diffusion and nucleoid macromolecular crowding effect. PCB - Plos Computational Biology, 9(4).
Abstract: Aggregates of misfolded proteins are a hallmark of many age-related diseases. Recently, they have been linked to aging of Escherichia coli (E. coli) where protein aggregates accumulate at the old pole region of the aging bacterium. Because of the potential of E. coli as a model organism, elucidating aging and protein aggregation in this bacterium may pave the way to significant advances in our global understanding of aging. A first obstacle along this path is to decipher the mechanisms by which protein aggregates are targeted to specific intercellular locations. Here, using an integrated approach based on individual-based modeling, time-lapse fluorescence microscopy and automated image analysis, we show that the movement of aging-related protein aggregates in E. coli is purely diffusive (Brownian). Using single-particle tracking of protein aggregates in live E. coli cells, we estimated the average size and diffusion constant of the aggregates. Our results provide evidence that the aggregates passively diffuse within the cell, with diffusion constants that depend on their size in agreement with the Stokes-Einstein law. However, the aggregate displacements along the cell long axis are confined to a region that roughly corresponds to the nucleoid-free space in the cell pole, thus confirming the importance of increased macromolecular crowding in the nucleoids. We thus used 3D individual-based modeling to show that these three ingredients (diffusion, aggregation and diffusion hindrance in the nucleoids) are sufficient and necessary to reproduce the available experimental data on aggregate localization in the cells. Taken together, our results strongly support the hypothesis that the localization of aging-related protein aggregates in the poles of E. coli results from the coupling of passive diffusion-aggregation with spatially non-homogeneous macromolecular crowding. 
They further support the importance of “soft” intracellular structuring (based on macromolecular crowding) in diffusion-based protein localization in E. coli.
|
|
|
Pau Riba, Sounak Dey, Ali Furkan Biten, & Josep Llados. (2021). Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild.
Abstract: This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute a tough-to-beat baseline that, without any specific SGOL training, is able to outperform previous works on a fixed set of classes. The baseline is useful to analyze the performance of SGOL approaches based on available simple yet powerful methods. We advance prior art by proposing a sketch-conditioned DETR (DEtection TRansformer) architecture which avoids a hard classification and alleviates the domain gap between sketches and images to localize object instances. Although the main goal of SGOL is focused on object detection, we explored its natural extension to sketch-guided instance segmentation. This novel task allows moving towards identifying the objects at pixel level, which is of key importance in several applications. We experimentally demonstrate that our model and its variants significantly advance over previous state-of-the-art results. All training and testing code of our model will be released to facilitate future research: https://github.com/priba/sgol_wild.
|
|
|
Esmitt Ramirez, Carles Sanchez, & Debora Gil. (2019). Localizing Pulmonary Lesions Using Fuzzy Deep Learning. In 21st International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (pp. 290–294).
Abstract: The use of medical images is part of daily clinical practice in several healthcare centers around the world. In particular, Computed Tomography (CT) images play a key role in the early detection of suspicious lung lesions. CT image exploration allows the detection of lung lesions before any invasive procedure (e.g. bronchoscopy, biopsy). The effective localization of lesions is performed using different image processing and computer vision techniques. Lately, the use of deep learning models in medical imaging, from detection to prediction, has been shown to be a powerful tool for computer-aided software. In this paper, we present an approach to localize pulmonary lesions using fuzzy deep learning. Our approach uses a simple convolutional neural network based on the LIDC-IDRI dataset. Each image is divided into patches, each associated with a (fuzzy) probability vector according to its belonging to anatomical structures on a CT. We showcase our approach as part of a full CAD system for the exploration, planning, guiding and detection of pulmonary lesions.
|
|