|
Records |
Links |
|
Author |
Javier Selva; Anders S. Johansen; Sergio Escalera; Kamal Nasrollahi; Thomas B. Moeslund; Albert Clapes |
|
|
Title |
Video transformers: A survey |
Type |
Journal Article |
|
Year |
2023 |
Publication |
IEEE Transactions on Pattern Analysis and Machine Intelligence |
Abbreviated Journal |
TPAMI |
|
|
Volume |
45 |
Issue |
11 |
Pages |
12922-12943 |
|
|
Keywords |
Artificial Intelligence; Computer Vision; Self-Attention; Transformers; Video Representations |
|
|
Abstract |
Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated when dealing with the high dimensionality introduced by the temporal dimension. While there are surveys analyzing the advances of Transformers for vision, none focus on an in-depth analysis of video-specific designs. In this survey, we analyze the main contributions and trends of works leveraging Transformers to model video. Specifically, we delve into how videos are handled at the input level first. Then, we study the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics. In addition, we provide an overview of different training regimes and explore effective self-supervised learning strategies for video. Finally, we conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D ConvNets even with less computational complexity. |
|
|
Address |
1 Nov. 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no menciona |
Approved |
no |
|
|
Call Number |
Admin @ si @ SJE2023 |
Serial |
3823 |
|
Permanent link to this record |
|
|
|
|
Author |
Jelena Gorbova; Egils Avots; Iiris Lusi; Mark Fishel; Sergio Escalera; Gholamreza Anbarjafari |
|
|
Title |
Integrating Vision and Language for First Impression Personality Analysis |
Type |
Journal Article |
|
Year |
2018 |
Publication |
IEEE Multimedia |
Abbreviated Journal |
MULTIMEDIA |
|
|
Volume |
25 |
Issue |
2 |
Pages |
24 - 33 |
|
|
Keywords |
|
|
|
Abstract |
The authors present a novel methodology for analyzing integrated audiovisual signals and language to assess a persons personality. An evaluation of their proposed multimodal method using a job candidate screening system that predicted five personality traits from a short video demonstrates the methods effectiveness. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; 602.133 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GAL2018 |
Serial |
3124 |
|
Permanent link to this record |
|
|
|
|
Author |
Jianzhy Guo; Zhen Lei; Jun Wan; Egils Avots; Noushin Hajarolasvadi; Boris Knyazev; Artem Kuharenko; Julio C. S. Jacques Junior; Xavier Baro; Hasan Demirel; Sergio Escalera; Juri Allik; Gholamreza Anbarjafari |
|
|
Title |
Dominant and Complementary Emotion Recognition from Still Images of Faces |
Type |
Journal Article |
|
Year |
2018 |
Publication |
IEEE Access |
Abbreviated Journal |
ACCESS |
|
|
Volume |
6 |
Issue |
|
Pages |
26391 - 26403 |
|
|
Keywords |
|
|
|
Abstract |
Emotion recognition has a key role in affective computing. Recently, fine-grained emotion analysis, such as compound facial expression of emotions, has attracted high interest of researchers working on affective computing. A compound facial emotion includes dominant and complementary emotions (e.g., happily-disgusted and sadly-fearful), which is more detailed than the seven classical facial emotions (e.g., happy, disgust, and so on). Current studies on compound emotions are limited to use data sets with limited number of categories and unbalanced data distributions, with labels obtained automatically by machine learning-based algorithms which could lead to inaccuracies. To address these problems, we released the iCV-MEFED data set, which includes 50 classes of compound emotions and labels assessed by psychologists. The task is challenging due to high similarities of compound facial emotions from different categories. In addition, we have organized a challenge based on the proposed iCV-MEFED data set, held at FG workshop 2017. In this paper, we analyze the top three winner methods and perform further detailed experiments on the proposed data set. Experiments indicate that pairs of compound emotion (e.g., surprisingly-happy vs happily-surprised) are more difficult to be recognized if compared with the seven basic emotions. However, we hope the proposed data set can help to pave the way for further research on compound facial emotion recognition. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ GLW2018 |
Serial |
3122 |
|
Permanent link to this record |
|
|
|
|
Author |
Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund |
|
|
Title |
Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification |
Type |
Journal Article |
|
Year |
2022 |
Publication |
Automation in Construction |
Abbreviated Journal |
AC |
|
|
Volume |
144 |
Issue |
|
Pages |
104614 |
|
|
Keywords |
Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection |
|
|
Abstract |
A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated at different scales non-locally through the use of a lightweight vision transformer, and a smaller set of tokens was produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points. |
|
|
Address |
Dec 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ BME2022c |
Serial |
3780 |
|
Permanent link to this record |
|
|
|
|
Author |
Jordi Esquirol; Cristina Palmero; Vanessa Bayo; Miquel Angel Cos; Sergio Escalera; David Sanchez; Maider Sanchez; Noelia Serrano; Mireia Relats |
|
|
Title |
Automatic RBG-depth-pressure anthropometric analysis and individualised sleep solution prescription |
Type |
Journal |
|
Year |
2017 |
Publication |
Journal of Medical Engineering & Technology |
Abbreviated Journal |
JMET |
|
|
Volume |
41 |
Issue |
6 |
Pages |
486-497 |
|
|
Keywords |
|
|
|
Abstract |
INTRODUCTION:
Sleep surfaces must adapt to individual somatotypic features to maintain a comfortable, convenient and healthy sleep, preventing diseases and injuries. Individually determining the most adequate rest surface can often be a complex and subjective question.
OBJECTIVES:
To design and validate an automatic multimodal somatotype determination model to automatically recommend an individually designed mattress-topper-pillow combination.
METHODS:
Design and validation of an automated prescription model for an individualised sleep system is performed through a single-image 2 D-3 D analysis and body pressure distribution, to objectively determine optimal individual sleep surfaces combining five different mattress densities, three different toppers and three cervical pillows.
RESULTS:
A final study (n = 151) and re-analysis (n = 117) defined and validated the model, showing high correlations between calculated and real data (>85% in height and body circumferences, 89.9% in weight, 80.4% in body mass index and more than 70% in morphotype categorisation).
CONCLUSIONS:
Somatotype determination model can accurately prescribe an individualised sleep solution. This can be useful for healthy people and for health centres that need to adapt sleep surfaces to people with special needs. Next steps will increase model's accuracy and analise, if this prescribed individualised sleep solution can improve sleep quantity and quality; additionally, future studies will adapt the model to mattresses with technological improvements, tailor-made production and will define interfaces for people with special needs. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no menciona |
Approved |
no |
|
|
Call Number |
Admin @ si @ EPB2017 |
Serial |
3010 |
|
Permanent link to this record |