Alex Falcon, Swathikiran Sudhakaran, Giuseppe Serra, Sergio Escalera, & Oswald Lanz. (2022). Relevance-based Margin for Contrastively-trained Video Retrieval Models. In ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval (pp. 146–157).
Abstract: Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access in private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space by putting similar items close and dissimilar items far. This framework leads to competitive recall rates, as they solely focus on the rank of the groundtruth items. Yet, assessing the quality of the ranking list is of utmost importance when considering intelligent retrieval systems, since multiple items may share similar semantics, hence a high relevance. Moreover, the aforementioned framework uses a fixed margin to separate similar and dissimilar items, treating all non-groundtruth items as equally irrelevant. In this paper we propose to use a variable margin: we argue that varying the margin used during training based on how much relevant an item is to a given query, i.e. a relevance-based margin, easily improves the quality of the ranking lists measured through nDCG and mAP. We demonstrate the advantages of our technique using different models on EPIC-Kitchens-100 and YouCook2. We show that even if we carefully tuned the fixed margin, our technique (which does not have the margin as a hyper-parameter) would still achieve better performance. Finally, extensive ablation studies and qualitative analysis support the robustness of our approach. Code will be released at https://github.com/aranciokov/RelevanceMargin-ICMR22.
|
Jaume Amores, David Geronimo, & Antonio Lopez. (2010). Multiple instance and active learning for weakly-supervised object-class segmentation. In 3rd IEEE International Conference on Machine Vision.
Abstract: In object-class segmentation, one of the most tedious tasks is to manually segment many object examples in order to learn a model of the object category. Yet, there has been little research on reducing the degree of manual annotation for object-class segmentation. In this work we explore alternative strategies which do not require full manual segmentation of the object in the training set. In particular, we study the use of bounding boxes as a coarser and much cheaper form of segmentation and we perform a comparative study of several Multiple-Instance Learning techniques that allow to obtain a model with this type of weak annotation. We show that some of these methods can be competitive, when used with coarse segmentations, with methods that require full manual segmentation of the objects. Furthermore, we show how to use active learning combined with this weakly supervised strategy. As we see, this strategy permits to reduce the amount of annotation and optimize the number of examples that require full manual segmentation in the training set.
Keywords: Multiple Instance Learning; Active Learning; Object-class segmentation.
|
Maedeh Aghaei, Mariella Dimiccoli, & Petia Radeva. (2015). Towards social interaction detection in egocentric photo-streams. In Proceedings of SPIE, 8th International Conference on Machine Vision, ICMV 2015 (Vol. 9875).
Abstract: Detecting social interaction in videos relying solely on visual cues is a valuable task that is receiving increasing attention in recent years. In this work, we address this problem in the challenging domain of egocentric photo-streams captured by a low temporal resolution wearable camera (2fpm). The major difficulties to be handled in this context are the sparsity of observations as well as unpredictability of camera motion and attention orientation due to the fact that the camera is worn as part of clothing. Our method consists of four steps: multi-faces localization and tracking, 3D localization, pose estimation and analysis of f-formations. By estimating pair-to-pair interaction probabilities over the sequence, our method states the presence or absence of interaction with the camera wearer and specifies which people are more involved in the interaction. We tested our method over a dataset of 18.000 images and we show its reliability on our considered purpose.
|
Katerine Diaz, Francesc J. Ferri, & W. Diaz. (2013). Fast Approximated Discriminative Common Vectors using rank-one SVD updates. In 20th International Conference On Neural Information Processing (Vol. 8228, pp. 368–375). LNCS. Springer Berlin Heidelberg.
Abstract: An efficient incremental approach to the discriminative common vector (DCV) method for dimensionality reduction and classification is presented. The proposal consists of a rank-one update along with an adaptive restriction on the rank of the null space which leads to an approximate but convenient solution. The algorithm can be implemented very efficiently in terms of matrix operations and space complexity, which enables its use in large-scale dynamic application domains. Deep comparative experimentation using publicly available high dimensional image datasets has been carried out in order to properly assess the proposed algorithm against several recent incremental formulations.
K. Diaz-Chito, F.J. Ferri, W. Diaz
|
F. de la Torre, Jordi Vitria, Petia Radeva, & J. Melenchon. (2000). EigenFiltering for flexible Eigentracking. In 15th International Conference on Pattern Recognition (Vol. 3, pp. 1118–1121).
|
C. Sbert, & A.F. Sole. (2000). Stereo reconstruction of 3D curves. In 15th International Conference on Pattern Recognition (Vol. 1, pp. 912–915).
|
J.M. Sanchez, & X. Binefa. (2000). Color Normalization for Appearance Based Recognition of Video Key-frames. In 15th International Conference on Pattern Recognition (Vol. 1, pp. 815–818).
|
Margarita Torre, & Petia Radeva. (2000). Agricultural-Field Extraction on Aerial Images by Region Competition Algorithm. In 15th International Conference on Pattern Recognition (Vol. 1, pp. 313–316).
|
Robert Benavente, Gemma Sanchez, Ramon Baldrich, Maria Vanrell, & Josep Llados. (2000). Normalized colour segmentation for human appearance description. In 15th International Conference on Pattern Recognition (Vol. 3, pp. 637–641).
|
X. Orriols, Ricardo Toledo, X. Binefa, Petia Radeva, Jordi Vitria, & Juan J. Villanueva. (2000). Probabilistic Saliency Approach for Elongated Structure Detection using Deformable Models. In 15th International Conference on Pattern Recognition (Vol. 3, pp. 1006–1009).
|
A. Pujol, Juan J. Villanueva, & H. Wechsler. (2000). Automatic View Based Caricaturing. In 15th International Conference on Pattern Recognition (Vol. 1, pp. 1072–1075).
|
Javier Varona, Jordi Gonzalez, Xavier Roca, & Juan J. Villanueva. (2000). iTrack: Image-based Probabilistic Tracking of People. In 15th International Conference on Pattern Recognition (Vol. 3, pp. 1122–1125).
|
David Guillamet, & Jordi Vitria. (2000). A Comparison of Local versus Global Color Histograms for Object Recognition. In 15th International Conference on Pattern Recognition (Vol. 2, pp. 422–425).
|
V. Valev, B. Sankur, & Petia Radeva. (2000). Generalized Non Reducible Descriptors. In 15th International Conference on Pattern Recognition (Vol. 2, p. 397).
|
David Lloret, Joan Serrat, Antonio Lopez, A. Soler, & Juan J. Villanueva. (2000). Retinal image registration using creases as anatomical landmarks. In 15th International Conference on Pattern Recognition (Vol. 3, pp. 207–2010).
Abstract: Retinal images are routinely used in ophthalmology to study the optical nerve head and the retina. To assess objectively the evolution of an illness, images taken at different times must be registered. Most methods so far have been designed specifically for a single image modality, like temporal series or stereo pairs of angiographies, fluorescein angiographies or scanning laser ophthalmoscope (SLO) images, which makes them prone to fail when conditions vary. In contrast, the method we propose has been shown to be accurate and reliable on all the former modalities. It has been adapted from the 3D registration of CT and MR images to 2D. Relevant features (also known as landmarks) are extracted by means of a robust creaseness operator, and the resulting images are iteratively transformed until a maximum in their correlation is achieved. Our method has succeeded in more than 100 pairs tried so far, in all cases including also the scaling as a parameter to be optimized.
|