|
Swathikiran Sudhakaran, Sergio Escalera, & Oswald Lanz. (2021). Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, .
Abstract: We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling in EgoACO, we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA, show that by explicitly decoding action-context-object descriptors, EgoACO achieves state-of-the-art recognition performance.
|
|
|
T. Alejandra Vidal, A. Sanfeliu, & Juan Andrade. (2006). Autonomous Single Camera Exploration.
|
|
|
T. Alejandra Vidal, Andrew J. Davison, Juan Andrade, & David W. Murray. (2006). Active Control for Single Camera SLAM.
|
|
|
T. Mouats, N. Aouf, Angel Sappa, Cristhian A. Aguilera-Carrasco, & Ricardo Toledo. (2015). Multi-Spectral Stereo Odometry. TITS - IEEE Transactions on Intelligent Transportation Systems, 16(3), 1210–1224.
Abstract: In this paper, we investigate the problem of visual odometry for ground vehicles based on the simultaneous utilization of multispectral cameras. It encompasses a stereo rig composed of an optical (visible) and thermal sensors. The novelty resides in the localization of the cameras as a stereo setup rather
than two monocular cameras of different spectrums. To the best of our knowledge, this is the first time such task is attempted. Log-Gabor wavelets at different orientations and scales are used to extract interest points from both images. These are then described using a combination of frequency and spatial information within the local neighborhood. Matches between the pairs of multimodal images are computed using the cosine similarity function based
on the descriptors. Pyramidal Lucas–Kanade tracker is also introduced to tackle temporal feature matching within challenging sequences of the data sets. The vehicle egomotion is computed from the triangulated 3-D points corresponding to the matched features. A windowed version of bundle adjustment incorporating
Gauss–Newton optimization is utilized for motion estimation. An outlier removal scheme is also included within the framework to deal with outliers. Multispectral data sets were generated and used as test bed. They correspond to real outdoor scenarios captured using our multimodal setup. Finally, detailed results validating the proposed strategy are illustrated.
Keywords: Egomotion estimation; feature matching; multispectral odometry (MO); optical flow; stereo odometry; thermal imagery
|
|
|
T. Widemann, & Xavier Otazu. (2009). Titanias radius and an upper limit on its atmosphere from the September 8, 2001 stellar occultation. International Journal of Solar System Studies, 199(2), 458–476.
Abstract: On September 8, 2001 around 2 h UT, the largest uranian moon, Titania, occulted Hipparcos star 106829 (alias SAO 164538, a V=7.2, K0 III star). This was the first-ever observed occultation by this satellite, a rare event as Titania subtends only 0.11 arcsec on the sky. The star's unusual brightness allowed many observers, both amateurs or professionals, to monitor this unique event, providing fifty-seven occultations chords over three continents, all reported here. Selecting the best 27 occultation chords, and assuming a circular limb, we derive Titania's radius: View the MathML source (1-σ error bar). This implies a density of View the MathML source using the value View the MathML source derived by Taylor [Taylor, D.B., 1998. Astron. Astrophys. 330, 362–374]. We do not detect any significant difference between equatorial and polar radii, in the limit View the MathML source, in agreement with Voyager limb image retrieval during the 1986 flyby. Titania's offset with respect to the DE405 + URA027 (based on GUST86 theory) ephemeris is derived: ΔαTcos(δT)=−108±13 mas and ΔδT=−62±7 mas (ICRF J2000.0 system). Most of this offset is attributable to a Uranus' barycentric offset with respect to DE405, that we estimate to be: View the MathML source and ΔδU=−85±25 mas at the moment of occultation. This offset is confirmed by another Titania stellar occultation observed on August 1st, 2003, which provides an offset of ΔαTcos(δT)=−127±20 mas and ΔδT=−97±13 mas for the satellite. The combined ingress and egress data do not show any significant hint for atmospheric refraction, allowing us to set surface pressure limits at the level of 10–20 nbar. More specifically, we find an upper limit of 13 nbar (1-σ level) at 70 K and 17 nbar at 80 K, for a putative isothermal CO2 atmosphere. We also provide an upper limit of 8 nbar for a possible CH4 atmosphere, and 22 nbar for pure N2, again at the 1-σ level. We finally constrain the stellar size using the time-resolved star disappearance and reappearance at ingress and egress. We find an angular diameter of 0.54±0.03 mas (corresponding to View the MathML source projected at Titania). With a distance of 170±25 parsecs, this corresponds to a radius of 9.8±0.2 solar radii for HIP 106829, typical of a K0 III giant.
Keywords: Occultations; Uranus, satellites; Satellites, shapes; Satellites, dynamics; Ices; Satellites, atmospheres
|
|
|
T.Chauhan, E.Perales, Kaida Xiao, E.Hird, Dimosthenis Karatzas, & Sophie Wuerger. (2014). The achromatic locus: Effect of navigation direction in color space. VSS - Journal of Vision, 14 (1)(25), 1–11.
Abstract: 5Y Impact Factor: 2.99 / 1st (Ophthalmology)
An achromatic stimulus is defined as a patch of light that is devoid of any hue. This is usually achieved by asking observers to adjust the stimulus such that it looks neither red nor green and at the same time neither yellow nor blue. Despite the theoretical and practical importance of the achromatic locus, little is known about the variability in these settings. The main purpose of the current study was to evaluate whether achromatic settings were dependent on the task of the observers, namely the navigation direction in color space. Observers could either adjust the test patch along the two chromatic axes in the CIE u*v* diagram or, alternatively, navigate along the unique-hue lines. Our main result is that the navigation method affects the reliability of these achromatic settings. Observers are able to make more reliable achromatic settings when adjusting the test patch along the directions defined by the four unique hues as opposed to navigating along the main axes in the commonly used CIE u*v* chromaticity plane. This result holds across different ambient viewing conditions (Dark, Daylight, Cool White Fluorescent) and different test luminance levels (5, 20, and 50 cd/m2). The reduced variability in the achromatic settings is consistent with the idea that internal color representations are more aligned with the unique-hue lines than the u* and v* axes.
Keywords: achromatic; unique hues; color constancy; luminance; color space
|
|
|
T.O. Nguyen, Salvatore Tabbone, & Oriol Ramos Terrades. (2008). Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval. In Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, (pp. 191–197).
|
|
|
T.O. Nguyen, Salvatore Tabbone, Oriol Ramos Terrades, & A.T. Thierry. (2008). Proposition d'un descripteur de formes et du modèle vectoriel pour la recherche de symboles. In Colloque International Francophone sur l'Ecrit et le Document (pp. 79–84).
|
|
|
Tadashi Araki, Nobutaka Ikeda, Nilanjan Dey, Sayan Chakraborty, Luca Saba, Dinesh Kumar, et al. (2015). A comparative approach of four different image registration techniques for quantitative assessment of coronary artery calcium lesions using intravascular ultrasound. CMPB - Computer Methods and Programs in Biomedicine, 118(2), 158–172.
|
|
|
Tadashi Araki, Sumit K. Banchhor, Narendra D. Londhe, Nobutaka Ikeda, Petia Radeva, Devarshi Shukla, et al. (2016). Reliable and Accurate Calcium Volume Measurement in Coronary Artery Using Intravascular Ultrasound Videos. JMS - Journal of Medical Systems, 40(3), 51:1–51:20.
Abstract: Quantitative assessment of calcified atherosclerotic volume within the coronary artery wall is vital for cardiac interventional procedures. The goal of this study is to automatically measure the calcium volume, given the borders of coronary vessel wall for all the frames of the intravascular ultrasound (IVUS) video. Three soft computing fuzzy classification techniques were adapted namely Fuzzy c-Means (FCM), K-means, and Hidden Markov Random Field (HMRF) for automated segmentation of calcium regions and volume computation. These methods were benchmarked against previously developed threshold-based method. IVUS image data sets (around 30,600 IVUS frames) from 15 patients were collected using 40 MHz IVUS catheter (Atlantis® SR Pro, Boston Scientific®, pullback speed of 0.5 mm/s). Calcium mean volume for FCM, K-means, HMRF and threshold-based method were 37.84 ± 17.38 mm3, 27.79 ± 10.94 mm3, 46.44 ± 19.13 mm3 and 35.92 ± 16.44 mm3 respectively. Cross-correlation, Jaccard Index and Dice Similarity were highest between FCM and threshold-based method: 0.99, 0.92 ± 0.02 and 0.95 + 0.02 respectively. Student’s t-test, z-test and Wilcoxon-test are also performed to demonstrate consistency, reliability and accuracy of the results. Given the vessel wall region, the system reliably and automatically measures the calcium volume in IVUS videos. Further, we validated our system against a trained expert using scoring: K-means showed the best performance with an accuracy of 92.80 %. Out procedure and protocol is along the line with method previously published clinically.
Keywords: Interventional cardiology; Atherosclerosis; Coronary arteries; IVUS; calcium volume; Soft computing; Performance Reliability; Accuracy
|
|
|
Tao Wu, Kai Wang, Chuanming Tang, & Jianlin Zhang. (2024). Diffusion-based network for unsupervised landmark detection. Knowledge-Based Systems, 292, 111627.
Abstract: Landmark detection is a fundamental task aiming at identifying specific landmarks that serve as representations of distinct object features within an image. However, the present landmark detection algorithms often adopt complex architectures and are trained in a supervised manner using large datasets to achieve satisfactory performance. When faced with limited data, these algorithms tend to experience a notable decline in accuracy. To address these drawbacks, we propose a novel diffusion-based network (DBN) for unsupervised landmark detection, which leverages the generation ability of the diffusion models to detect the landmark locations. In particular, we introduce a dual-branch encoder (DualE) for extracting visual features and predicting landmarks. Additionally, we lighten the decoder structure for faster inference, referred to as LightD. By this means, we avoid relying on extensive data comparison and the necessity of designing complex architectures as in previous methods. Experiments on CelebA, AFLW, 300W and Deepfashion benchmarks have shown that DBN performs state-of-the-art compared to the existing methods. Furthermore, DBN shows robustness even when faced with limited data cases.
|
|
|
Thanh Ha Do, Oriol Ramos Terrades, & Salvatore Tabbone. (2019). DSD: document sparse-based denoising algorithm. PAA - Pattern Analysis and Applications, 22(1), 177–186.
Abstract: In this paper, we present a sparse-based denoising algorithm for scanned documents. This method can be applied to any kind of scanned documents with satisfactory results. Unlike other approaches, the proposed approach encodes noise documents through sparse representation and visual dictionary learning techniques without any prior noise model. Moreover, we propose a precision parameter estimator. Experiments on several datasets demonstrate the robustness of the proposed approach compared to the state-of-the-art methods on document denoising.
Keywords: Document denoising; Sparse representations; Sparse dictionary learning; Document degradation models
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2013). New Approach for Symbol Recognition Combining Shape Context of Interest Points with Sparse Representation. In 12th International Conference on Document Analysis and Recognition (pp. 265–269).
Abstract: In this paper, we propose a new approach for symbol description. Our method is built based on the combination of shape context of interest points descriptor and sparse representation. More specifically, we first learn a dictionary describing shape context of interest point descriptors. Then, based on information retrieval techniques, we build a vector model for each symbol based on its sparse representation in a visual vocabulary whose visual words are columns in the learneddictionary. The retrieval task is performed by ranking symbols based on similarity between vector models. Evaluation of our method, using benchmark datasets, demonstrates the validity of our approach and shows that it outperforms related state-of-theart methods.
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2013). Document noise removal using sparse representations over learned dictionary. In Symposium on Document engineering (pp. 161–168).
Abstract: best paper award
In this paper, we propose an algorithm for denoising document images using sparse representations. Following a training set, this algorithm is able to learn the main document characteristics and also, the kind of noise included into the documents. In this perspective, we propose to model the noise energy based on the normalized cross-correlation between pairs of noisy and non-noisy documents. Experimental
results on several datasets demonstrate the robustness of our method compared with the state-of-the-art.
|
|
|
Thanh Ha Do, Salvatore Tabbone, & Oriol Ramos Terrades. (2012). Text/graphic separation using a sparse representation with multi-learned dictionaries. In 21st International Conference on Pattern Recognition.
Abstract: In this paper, we propose a new approach to extract text regions from graphical documents. In our method, we first empirically construct two sequences of learned dictionaries for the text and graphical parts respectively. Then, we compute the sparse representations of all different sizes and non-overlapped document patches in these learned dictionaries. Based on these representations, each patch can be classified into the text or graphic category by comparing its reconstruction errors. Same-sized patches in one category are then merged together to define the corresponding text or graphic layers which are combined to createfinal text/graphic layer. Finally, in a post-processing step, text regions are further filtered out by using some learned thresholds.
Keywords: Graphics Recognition; Layout Analysis; Document Understandin
|
|