Publicacions CVC -- Query Results


	Records
	Author	Javier Selva; Anders S. Johansen; Sergio Escalera; Kamal Nasrollahi; Thomas B. Moeslund; Albert Clapes
	Title	Video transformers: A survey			Type	Journal Article
	Year	2023	Publication	IEEE Transactions on Pattern Analysis and Machine Intelligence	Abbreviated Journal	TPAMI
	Volume	45	Issue	11	Pages	12922-12943
	Keywords	Artificial Intelligence; Computer Vision; Self-Attention; Transformers; Video Representations
	Abstract	Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated when dealing with the high dimensionality introduced by the temporal dimension. While there are surveys analyzing the advances of Transformers for vision, none focus on an in-depth analysis of video-specific designs. In this survey, we analyze the main contributions and trends of works leveraging Transformers to model video. Specifically, we delve into how videos are handled at the input level first. Then, we study the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics. In addition, we provide an overview of different training regimes and explore effective self-supervised learning strategies for video. Finally, we conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D ConvNets even with less computational complexity.
	Address	1 Nov. 2023
	Notes	HUPBA; no menciona;MILAB			Approved	no
	Call Number	Admin @ si @ SJE2023			Serial	3823



	Author	Lei Li; Fuping Wu; Sihan Wang; Xinzhe Luo; Carlos Martin-Isla; Shuwei Zhai; Jianpeng Zhang; Yanfei Liu; Zhen Zhang; Markus J. Ankenbrand; Haochuan Jiang; Xiaoran Zhang; Linhong Wang; Tewodros Weldebirhan Arega; Elif Altunok; Zhou Zhao; Feiyan Li; Jun Ma; Xiaoping Yang; Elodie Puybareau; Ilkay Oksuz; Stephanie Bricq; Weisheng Li; Kumaradevan Punithakumar; Sotirios A. Tsaftaris; Laura M. Schreiber; Mingjing Yang; Guocai Liu; Yong Xia; Guotai Wang; Sergio Escalera; Xiahai Zhuang
	Title	MyoPS: A benchmark of myocardial pathology segmentation combining three-sequence cardiac magnetic resonance images			Type	Journal Article
	Year	2023	Publication	Medical Image Analysis	Abbreviated Journal	MIA
	Volume	87	Issue		Pages	102808
	Keywords
	Abstract	Assessment of myocardial viability is essential in diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on the myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge, in conjunction with MICCAI 2020. Note that MyoPS refers to both myocardial pathology segmentation and the challenge in this paper. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles and explore the potential of solutions, as well as to provide a benchmark for future research. The average Dice scores of submitted algorithms were and for myocardial scars and edema, respectively. We conclude that while promising results have been reported, the research is still in the early stage, and more in-depth exploration is needed before a successful application to the clinics. MyoPS data and evaluation tool continue to be publicly available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).
	Notes	HUPBA;MILAB			Approved	no
	Call Number	Admin @ si @ LWW2023a			Serial	3878



	Author	Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
	Title	A deep co-attentive hand-based video question answering framework using multi-view skeleton			Type	Journal Article
	Year	2023	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	82	Issue		Pages	1401–1429
	Keywords
	Abstract	In this paper, we present a novel hand-based Video Question Answering framework, entitled Multi-View Video Question Answering (MV-VQA), employing the Single Shot Detector (SSD), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and a Co-Attention mechanism, with RGB videos as the inputs. Our model includes three main blocks: vision, language, and attention. In the vision block, we employ a novel representation to obtain efficient multi-view features from the hand object using a combination of five 3DCNNs and one LSTM network. To obtain the question embedding, we use the BERT model in the language block. Finally, we employ a co-attention mechanism on the vision and language features to recognize the final answer. For the first time, we propose such a hand-based Video-QA framework, including multi-view hand skeleton features combined with the question embedding and a co-attention mechanism. Our framework is capable of processing an arbitrary number of questions in the dataset annotations. There are different application domains for this framework. Here, as an application domain, we applied our framework to dynamic hand gesture recognition for the first time. Since the main object in dynamic hand gesture recognition is the human hand, we performed a step-by-step analysis of the impact of hand detection and the multi-view hand skeleton on model performance. Evaluation results on five datasets, including two datasets in VideoQA, two datasets in dynamic hand gesture recognition, and one dataset in hand action recognition, show that MV-VQA outperforms state-of-the-art alternatives.
	Notes	HUPBA;MILAB			Approved	no
	Call Number	Admin @ si @ RKE2023b			Serial	3881



	Author	Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
	Title	ZS-GR: zero-shot gesture recognition from RGB-D videos			Type	Journal Article
	Year	2023	Publication	Multimedia Tools and Applications	Abbreviated Journal	MTAP
	Volume	82	Issue		Pages	43781-43796
	Keywords
	Abstract	Gesture Recognition (GR) is a challenging research area in computer vision. To tackle the annotation bottleneck in GR, we formulate the problem of Zero-Shot Gesture Recognition (ZS-GR) and propose a two-stream model from two input modalities: RGB and Depth videos. To benefit from the vision Transformer capabilities, we use two vision Transformer models for human detection and visual feature representation. We configure a transformer encoder-decoder architecture, as a fast and accurate human detection model, to overcome the challenges of current human detection models. Considering the human keypoints, the detected human body is segmented into nine parts. A spatio-temporal representation of the human body is obtained using a vision Transformer and an LSTM network. A semantic space maps the visual features to the lingual embedding of the class labels via a Bidirectional Encoder Representations from Transformers (BERT) model. We evaluated the proposed model on five datasets (Montalbano II, MSR Daily Activity 3D, CAD-60, NTU-60, and isoGD), obtaining state-of-the-art results compared to existing ZS-GR models as well as Zero-Shot Action Recognition (ZS-AR) models.
	Notes	HUPBA;MILAB			Approved	no
	Call Number	Admin @ si @ RKE2023a			Serial	3879



	Author	Reuben Dorent; Aaron Kujawa; Marina Ivory; Spyridon Bakas; Nikola Rieke; Samuel Joutard; Ben Glocker; Jorge Cardoso; Marc Modat; Kayhan Batmanghelich; Arseniy Belkov; Maria Baldeon Calisto; Jae Won Choi; Benoit M. Dawant; Hexin Dong; Sergio Escalera; Yubo Fan; Lasse Hansen; Mattias P. Heinrich; Smriti Joshi; Victoriya Kashtanova; Hyeon Gyu Kim; Satoshi Kondo; Christian N. Kruse; Susana K. Lai-Yuen; Hao Li; Han Liu; Buntheng Ly; Ipek Oguz; Hyungseob Shin; Boris Shirokikh; Zixian Su; Guotai Wang; Jianghao Wu; Yanwu Xu; Kai Yao; Li Zhang; Sebastien Ourselin
	Title	CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation			Type	Journal Article
	Year	2023	Publication	Medical Image Analysis	Abbreviated Journal	MIA
	Volume	83	Issue		Pages	102628
	Keywords	Domain Adaptation; Segmentation; Vestibular Schwannoma
	Abstract	Domain Adaptation (DA) has recently raised strong interest in the medical imaging community. While a large variety of DA techniques have been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithms for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice: VS 88.4%; cochleas 85.7%) and close to full supervision (median Dice: VS 92.5%; cochleas 87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.
	Notes	HUPBA;MILAB			Approved	no
	Call Number	Admin @ si @ DKI2023			Serial	3706