|
Fahad Shahbaz Khan, Joost Van de Weijer, Muhammad Anwer Rao, Michael Felsberg, & Carlo Gatta. (2014). Semantic Pyramids for Gender and Action Recognition. TIP - IEEE Transactions on Image Processing, 23(8), 3633–3645.
Abstract: Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.
|
|
|
Fahad Shahbaz Khan, Joost Van de Weijer, Andrew Bagdanov, & Michael Felsberg. (2014). Scale Coding Bag-of-Words for Action Recognition. In 22nd International Conference on Pattern Recognition (pp. 1514–1519).
Abstract: Recognizing human actions in still images is a challenging problem in computer vision due to significant amount of scale, illumination and pose variation. Given the bounding box of a person both at training and test time, the task is to classify the action associated with each bounding box in an image.
Most state-of-the-art methods use the bag-of-words paradigm for action recognition. The bag-of-words framework employing a dense multi-scale grid sampling strategy is the de facto standard for feature detection. This results in a scale invariant image representation where all the features at multiple-scales are binned in a single histogram. We argue that such a scale invariant
strategy is sub-optimal since it ignores the multi-scale information
available with each bounding box of a person.
This paper investigates alternative approaches to scale coding for action recognition in still images. We encode multi-scale information explicitly in three different histograms for small, medium and large scale visual-words. Our first approach exploits multi-scale information with respect to the image size. In our second approach, we encode multi-scale information relative to the size of the bounding box of a person instance. In each approach, the multi-scale histograms are then concatenated into a single representation for action classification. We validate our approaches on the Willow dataset which contains seven action categories: interacting with computer, photography, playing music,
riding bike, riding horse, running and walking. Our results clearly suggest that the proposed scale coding approaches outperform the conventional scale invariant technique. Moreover, we show that our approach obtains promising results compared to more complex state-of-the-art methods.
|
|
|
Marcos V Conde, Javier Vazquez, Michael S Brown, & Radu TImofte. (2024). NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement. In 38th AAAI Conference on Artificial Intelligence.
Abstract: 3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet not so memory-efficient, as storing multiple 3D LUTs is required. For this reason and other implementation limitations, their use on mobile devices is less popular. In this work, we propose a Neural Implicit LUT (NILUT), an implicitly defined continuous 3D color transformation parameterized by a neural network. We show that NILUTs are capable of accurately emulating real 3D LUTs. Moreover, a NILUT can be extended to incorporate multiple styles into a single network with the ability to blend styles implicitly. Our novel approach is memory-efficient, controllable and can complement previous methods, including learned ISPs.
|
|
|
Danna Xue, Javier Vazquez, Luis Herranz, Yang Zhang, & Michael S Brown. (2023). Integrating High-Level Features for Consistent Palette-based Multi-image Recoloring. CGF - Computer Graphics Forum, .
Abstract: Achieving visually consistent colors across multiple images is important when images are used in photo albums, websites, and brochures. Unfortunately, only a handful of methods address multi-image color consistency compared to one-to-one color transfer techniques. Furthermore, existing methods do not incorporate high-level features that can assist graphic designers in their work. To address these limitations, we introduce a framework that builds upon a previous palette-based color consistency method and incorporates three high-level features: white balance, saliency, and color naming. We show how these features overcome the limitations of the prior multi-consistency workflow and showcase the user-friendly nature of our framework.
|
|
|
Danna Xue, Luis Herranz, Javier Vazquez, & Yanning Zhang. (2023). Burst Perception-Distortion Tradeoff: Analysis and Evaluation. In IEEE International Conference on Acoustics, Speech and Signal Processing.
Abstract: Burst image restoration attempts to effectively utilize the complementary cues appearing in sequential images to produce a high-quality image. Most current methods use all the available images to obtain the reconstructed image. However, using more images for burst restoration is not always the best option regarding reconstruction quality and efficiency, as the images acquired by handheld imaging devices suffer from degradation and misalignment caused by the camera noise and shake. In this paper, we extend the perception-distortion tradeoff theory by introducing multiple-frame information. We propose the area of the unattainable region as a new metric for perception-distortion tradeoff evaluation and comparison. Based on this metric, we analyse the performance of burst restoration from the perspective of the perception-distortion tradeoff under both aligned bursts and misaligned bursts situations. Our analysis reveals the importance of inter-frame alignment for burst restoration and shows that the optimal burst length for the restoration model depends both on the degree of degradation and misalignment.
|
|
|
Xim Cerda-Company, C. Alejandro Parraga, & Xavier Otazu. (2014). Which tone-mapping is the best? A comparative study of tone-mapping perceived quality. In Perception (Vol. 43, 106).
Abstract: Perception 43 ECVP Abstract Supplement
High-dynamic-range (HDR) imaging refers to the methods designed to increase the brightness dynamic range present in standard digital imaging techniques. This increase is achieved by taking the same picture under dierent exposure values and mapping the intensity levels into a single image by way of a tone-mapping operator (TMO). Currently, there is no agreement on how to evaluate the quality
of dierent TMOs. In this work we psychophysically evaluate 15 dierent TMOs obtaining rankings based on the perceived properties of the resulting tone-mapped images. We performed two dierent experiments on a CRT calibrated display using 10 subjects: (1) a study of the internal relationships between grey-levels and (2) a pairwise comparison of the resulting 15 tone-mapped images. In (1) observers internally matched the grey-levels to a reference inside the tone-mapped images and in the real scene. In (2) observers performed a pairwise comparison of the tone-mapped images alongside the real scene. We obtained two rankings of the TMOs according their performance. In (1) the best algorithm
was ICAM by J.Kuang et al (2007) and in (2) the best algorithm was a TMO by Krawczyk et al (2005). Our results also show no correlation between these two rankings.
|
|
|
Maria del Camp Davesa. (2011). Human action categorization in image sequences (Vol. 169). Master's thesis, , .
|
|
|
Albert Gordo. (2009). A Cyclic Page Layout Descriptor for Document Classification & Retrieval (Vol. 128). Master's thesis, , Bellaterra, Barcelona.
|
|
|
Arjan Gijsenij, Theo Gevers, & Joost Van de Weijer. (2012). Improving Color Constancy by Photometric Edge Weighting. TPAMI - IEEE Transaction on Pattern Analysis and Machine Intelligence, 34(5), 918–929.
Abstract: : Edge-based color constancy methods make use of image derivatives to estimate the illuminant. However, different edge types exist in real-world images such as material, shadow and highlight edges. These different edge types may have a distinctive influence on the performance of the illuminant estimation. Therefore, in this paper, an extensive analysis is provided of different edge types on the performance of edge-based color constancy methods. First, an edge-based taxonomy is presented classifying edge types based on their photometric properties (e.g. material, shadow-geometry and highlights). Then, a performance evaluation of edge-based color constancy is provided using these different edge types. From this performance evaluation it is derived that specular and shadow edge types are more valuable than material edges for the estimation of the illuminant. To this end, the (iterative) weighted Grey-Edge algorithm is proposed in which these edge types are more emphasized for the estimation of the illuminant. Images that are recorded under controlled circumstances demonstrate that the proposed iterative weighted Grey-Edge algorithm based on highlights reduces the median angular error with approximately $25\%$. In an uncontrolled environment, improvements in angular error up to $11\%$ are obtained with respect to regular edge-based color constancy.
|
|
|
Josep Llados, J. Lopez-Krahe, & Enric Marti. (1999). A Hough-based method for hatched pattern detection in maps and diagrams..
|
|
|
Josep Llados, Gemma Sanchez, & Enric Marti. (1997). A String-Based Method to Recognize Symbols and Structural Textures in Architectural Plans..
|
|
|
V. Chapaprieta, & Ernest Valveny. (2001). Handwritten Digit Recognition Using Point Distribution Models..
|
|
|
Josep Llados. (1996). Interpretacio de dibuixos linials fets a ma alçada mitjançant isomorfisme entre subgrafs i transformacio de Hough.
|
|
|
Gemma Sanchez, & Josep Llados. (2001). A Graph Grammar to Recognize Textured Symbols..
|
|
|
Gemma Sanchez, Josep Llados, & K. Tombre. (2001). An Algorithm to Recognize Graphical Textured Symbols using String Representations..
|
|