Publicacions CVC -- Query Results

	Publicacions CVC Home \| Show All \| Simple Search \| Advanced Search \| Add Record \| Import	Login Quick Search: Field: contains: ...
	241–255 of 3396 records found matching your query (RSS \| history):

Search & Display Options

Select All Deselect All

[1–10] << 11 12 13 14 15 16 17 18 19 20 >> [21–30]

List View

Citations

Details

	Records
	Author	Iban Berganzo-Besga; Hector A. Orengo; Felipe Lumbreras; Paloma Aliende; Monica N. Ramsey
	Title	Automated detection and classification of multi-cell Phytoliths using Deep Learning-Based Algorithms			Type	Journal Article
	Year	2022	Publication	Journal of Archaeological Science	Abbreviated Journal	JArchSci
	Volume	148	Issue		Pages	105654
	Keywords
	Abstract	This paper presents an algorithm for automated detection and classification of multi-cell phytoliths, one of the major components of many archaeological and paleoenvironmental deposits. This identification, based on phytolith wave pattern, is made using a pretrained VGG19 deep learning model. This approach has been tested in three key phytolith genera for the study of agricultural origins in Near East archaeology: Avena, Hordeum and Triticum. Also, this classification has been validated at species-level using Triticum boeoticum and dicoccoides images. Due to the diversity of microscopes, cameras and chemical treatments that can influence images of phytolith slides, three types of data augmentation techniques have been implemented: rotation of the images at 45-degree angles, random colour and brightness jittering, and random blur/sharpen. The implemented workflow has resulted in an overall accuracy of 93.68% for phytolith genera, improving previous attempts. The algorithm has also demonstrated its potential to automatize the classification of phytoliths species with an overall accuracy of 100%. The open code and platforms employed to develop the algorithm assure the method's accessibility, reproducibility and reusability.
	Address	December 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MSIAU; MACO; 600.167			Approved	no
	Call Number	Admin @ si @ BOL2022			Serial	3753
Permanent link to this record



	Author	Arnau Baro
	Title	Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods			Type	Book Whole
	Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	The transcription of sheet music into some machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process very time-consuming and prone to errors. Consequently, automatic transcription systems for musical documents represent interesting tools. Document analysis is the subject that deals with the extraction and processing of documents through image and pattern recognition. It is a branch of computer vision. Taking music scores as source, the field devoted to address this task is known as Optical Music Recognition (OMR). Typically, an OMR system takes an image of a music score and automatically extracts its content into some symbolic structure such as MEI or MusicXML. In this dissertation, we have investigated different methods for recognizing a single staff section (e.g. scores for violin, flute, etc.), much in the same way as most text recognition research focuses on recognizing words appearing in a given line image. These methods are based in two different methodologies. On the one hand, we present two methods based on Recurrent Neural Networks, in particular, the Long Short-Term Memory Neural Network. On the other hand, a method based on Sequence to Sequence models is detailed. Music context is needed to improve the OMR results, just like language models and dictionaries help in handwriting recognition. For example, syntactical rules and grammars could be easily defined to cope with the ambiguities in the rhythm. In music theory, for example, the time signature defines the amount of beats per bar unit. Thus, in the second part of this dissertation, different methodologies have been investigated to improve the OMR recognition. We have explored three different methods: (a) a graphic tree-structure representation, Dendrograms, that joins, at each level, its primitives following a set of rules, (b) the incorporation of Language Models to model the probability of a sequence of tokens, and (c) graph neural networks to analyze the music scores to avoid meaningless relationships between music primitives. Finally, to train all these methodologies, and given the method-specificity of the datasets in the literature, we have created four different music datasets. Two of them are synthetic with a modern or old handwritten appearance, whereas the other two are real handwritten scores, being one of them modern and the other old.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Alicia Fornes
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-8-6	Medium
	Area		Expedition		Conference
	Notes	DAG;			Approved	no
	Call Number	Admin @ si @ Bar2022			Serial	3754
Permanent link to this record



	Author	Ali Furkan Biten
	Title	A Bitter-Sweet Symphony on Vision and Language: Bias and World Knowledge			Type	Book Whole
	Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Vision and Language are broadly regarded as cornerstones of intelligence. Even though language and vision have different aims – language having the purpose of communication, transmission of information and vision having the purpose of constructing mental representations around us to navigate and interact with objects – they cooperate and depend on one another in many tasks we perform effortlessly. This reliance is actively being studied in various Computer Vision tasks, e.g. image captioning, visual question answering, image-sentence retrieval, phrase grounding, just to name a few. All of these tasks share the inherent difficulty of the aligning the two modalities, while being robust to language priors and various biases existing in the datasets. One of the ultimate goal for vision and language research is to be able to inject world knowledge while getting rid of the biases that come with the datasets. In this thesis, we mainly focus on two vision and language tasks, namely Image Captioning and Scene-Text Visual Question Answering (STVQA). In both domains, we start by defining a new task that requires the utilization of world knowledge and in both tasks, we find that the models commonly employed are prone to biases that exist in the data. Concretely, we introduce new tasks and discover several problems that impede performance at each level and provide remedies or possible solutions in each chapter: i) We define a new task to move beyond Image Captioning to Image Interpretation that can utilize Named Entities in the form of world knowledge. ii) We study the object hallucination problem in classic Image Captioning systems and develop an architecture-agnostic solution. iii) We define a sub-task of Visual Question Answering that requires reading the text in the image (STVQA), where we highlight the limitations of current models. iv) We propose an architecture for the STVQA task that can point to the answer in the image and show how to combine it with classic VQA models. v) We show how far language can get us in STVQA and discover yet another bias which causes the models to disregard the image while doing Visual Question Answering.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Dimosthenis Karatzas;Lluis Gomez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-5-5	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Bit2022			Serial	3755
Permanent link to this record



	Author	Andres Mafla
	Title	Leveraging Scene Text Information for Image Interpretation			Type	Book Whole
	Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Until recently, most computer vision models remained illiterate, largely ignoring the semantically rich and explicit information contained in scene text. Recent progress in scene text detection and recognition has recently allowed exploring its role in a diverse set of open computer vision problems, e.g. image classification, image-text retrieval, image captioning, and visual question answering to name a few. The explicit semantics of scene text closely requires specific modeling similar to language. However, scene text is a particular signal that has to be interpreted according to a comprehensive perspective that encapsulates all the visual cues in an image. Incorporating this information is a straightforward task for humans, but if we are unfamiliar with a language or scripture, achieving a complete world understanding is impossible (e.a. visiting a foreign country with a different alphabet). Despite the importance of scene text, modeling it requires considering the several ways in which scene text interacts with an image, processing and fusing an additional modality. In this thesis, we mainly focus on two tasks, scene text-based fine-grained image classification, and cross-modal retrieval. In both studied tasks we identify existing limitations in current approaches and propose plausible solutions. Concretely, in each chapter: i) We define a compact way to embed scene text that generalizes to unseen words at training time while performing in real-time. ii) We incorporate the previously learned scene text embedding to create an image-level descriptor that overcomes optical character recognition (OCR) errors which is well-suited to the fine-grained image classification task. iii) We design a region-level reasoning network that learns the interaction through semantics among salient visual regions and scene text instances. iv) We employ scene text information in image-text matching and introduce the Scene Text Aware Cross-Modal retrieval StacMR task. We gather a dataset that incorporates scene text and design a model suited for the newly studied modality. v) We identify the drawbacks of current retrieval metrics in cross-modal retrieval. An image captioning metric is proposed as a way of better evaluating semantics in retrieved results. Ample experimentation shows that incorporating such semantics into a model yields better semantic results while requiring significantly less data to converge.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Dimosthenis Karatzas;Lluis Gomez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-6-2	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Maf2022			Serial	3756
Permanent link to this record



	Author	Mohamed Ali Souibgui
	Title	Document Image Enhancement and Recognition in Low Resource Scenarios: Application to Ciphers and Handwritten Text			Type	Book Whole
	Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this thesis, we propose different contributions with the goal of enhancing and recognizing historical handwritten document images, especially the ones with rare scripts, such as cipher documents. In the first part, some effective end-to-end models for Document Image Enhancement (DIE) using deep learning models were presented. First, Generative Adversarial Networks (cGAN) for different tasks (document clean-up, binarization, deblurring, and watermark removal) were explored. Next, we further improve the results by recovering the degraded document images into a clean and readable form by integrating a text recognizer into the cGAN model to promote the generated document image to be more readable. Afterward, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The second part of the thesis addresses Handwritten Text Recognition (HTR) in low resource scenarios, i.e. when only few labeled training data is available. We propose novel methods for recognizing ciphers with rare scripts. First, a few-shot object detection based method was proposed. Then, we incorporate a progressive learning strategy that automatically assignspseudo-labels to a set of unlabeled data to reduce the human labor of annotating few pages while maintaining the good performance of the model. Secondly, a data generation technique based on Bayesian Program Learning (BPL) is proposed to overcome the lack of data in such rare scripts. Thirdly, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE). This latter self-supervised model is designed to tackle two tasks, text recognition and document image enhancement. The proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time, it requires substantially fewer data samples to converge. In the third part of the thesis, we analyze, from the user perspective, the usage of HTR systems in low resource scenarios. This contrasts with the usual research on HTR, which often focuses on technical aspects only and rarely devotes efforts on implementing software tools for scholars in Humanities.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Alicia Fornes;Yousri Kessentini
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-8-6	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Sou2022			Serial	3757
Permanent link to this record



	Author	Danna Xue; Fei Yang; Pei Wang; Luis Herranz; Jinqiu Sun; Yu Zhu; Yanning Zhang
	Title	SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision			Type	Conference Article
	Year	2022	Publication	30th ACM International Conference on Multimedia	Abbreviated Journal
	Volume		Issue		Pages	6539-6548
	Keywords
	Abstract	Accurate semantic segmentation models typically require significant computational resources, inhibiting their use in practical applications. Recent works rely on well-crafted lightweight models to achieve fast inference. However, these models cannot flexibly adapt to varying accuracy and efficiency requirements. In this paper, we propose a simple but effective slimmable semantic segmentation (SlimSeg) method, which can be executed at different capacities during inference depending on the desired accuracy-efficiency tradeoff. More specifically, we employ parametrized channel slimming by stepwise downward knowledge distillation during training. Motivated by the observation that the differences between segmentation results of each submodel are mainly near the semantic borders, we introduce an additional boundary guided semantic segmentation loss to further improve the performance of each submodel. We show that our proposed SlimSeg with various mainstream networks can produce flexible models that provide dynamic adjustment of computational cost and better performance than independent models. Extensive experiments on semantic segmentation benchmarks, Cityscapes and CamVid, demonstrate the generalization ability of our framework.
	Address	Lisboa, Portugal, October 2022
	Corporate Author				Thesis
	Publisher	Association for Computing Machinery	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-1-4503-9203-7	Medium
	Area		Expedition		Conference	MM
	Notes	MACO; 600.161; 601.400			Approved	no
	Call Number	Admin @ si @ XYW2022			Serial	3758
Permanent link to this record



	Author	Sonia Baeza; Debora Gil; I.Garcia Olive; M.Salcedo; J.Deportos; Carles Sanchez; Guillermo Torres; G.Moragas; Antoni Rosell
	Title	A novel intelligent radiomic analysis of perfusion SPECT/CT images to optimize pulmonary embolism diagnosis in COVID-19 patients			Type	Journal Article
	Year	2022	Publication	EJNMMI Physics	Abbreviated Journal	EJNMMI-PHYS
	Volume	9	Issue	1, Article 84	Pages	1-17
	Keywords
	Abstract	Background: COVID-19 infection, especially in cases with pneumonia, is associated with a high rate of pulmonary embolism (PE). In patients with contraindications for CT pulmonary angiography (CTPA) or non-diagnostic CTPA, perfusion single-photon emission computed tomography/computed tomography (Q-SPECT/CT) is a diagnostic alternative. The goal of this study is to develop a radiomic diagnostic system to detect PE based only on the analysis of Q-SPECT/CT scans. Methods: This radiomic diagnostic system is based on a local analysis of Q-SPECT/CT volumes that includes both CT and Q-SPECT values for each volume point. We present a combined approach that uses radiomic features extracted from each scan as input into a fully connected classifcation neural network that optimizes a weighted crossentropy loss trained to discriminate between three diferent types of image patterns (pixel sample level): healthy lungs (control group), PE and pneumonia. Four types of models using diferent confguration of parameters were tested. Results: The proposed radiomic diagnostic system was trained on 20 patients (4,927 sets of samples of three types of image patterns) and validated in a group of 39 patients (4,410 sets of samples of three types of image patterns). In the training group, COVID-19 infection corresponded to 45% of the cases and 51.28% in the test group. In the test group, the best model for determining diferent types of image patterns with PE presented a sensitivity, specifcity, positive predictive value and negative predictive value of 75.1%, 98.2%, 88.9% and 95.4%, respectively. The best model for detecting pneumonia presented a sensitivity, specifcity, positive predictive value and negative predictive value of 94.1%, 93.6%, 85.2% and 97.6%, respectively. The area under the curve (AUC) was 0.92 for PE and 0.91 for pneumonia. When the results obtained at the pixel sample level are aggregated into regions of interest, the sensitivity of the PE increases to 85%, and all metrics improve for pneumonia. Conclusion: This radiomic diagnostic system was able to identify the diferent lung imaging patterns and is a frst step toward a comprehensive intelligent radiomic system to optimize the diagnosis of PE by Q-SPECT/CT.
	Address	5 dec 2022
	Corporate Author				Thesis
	Publisher	Springer	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM			Approved	no
	Call Number	Admin @ si @ BGG2022			Serial	3759
Permanent link to this record



	Author	David Castells; Vinh Ngo; Juan Borrego-Carazo; Marc Codina; Carles Sanchez; Debora Gil; Jordi Carrabina
	Title	A Survey of FPGA-Based Vision Systems for Autonomous Cars			Type	Journal Article
	Year	2022	Publication	IEEE Access	Abbreviated Journal	ACESS
	Volume	10	Issue		Pages	132525-132563
	Keywords	Autonomous automobile; Computer vision; field programmable gate arrays; reconfigurable architectures
	Abstract	On the road to making self-driving cars a reality, academic and industrial researchers are working hard to continue to increase safety while meeting technical and regulatory constraints Understanding the surrounding environment is a fundamental task in self-driving cars. It requires combining complex computer vision algorithms. Although state-of-the-art algorithms achieve good accuracy, their implementations often require powerful computing platforms with high power consumption. In some cases, the processing speed does not meet real-time constraints. FPGA platforms are often used to implement a category of latency-critical algorithms that demand maximum performance and energy efficiency. Since self-driving car computer vision functions fall into this category, one could expect to see a wide adoption of FPGAs in autonomous cars. In this paper, we survey the computer vision FPGA-based works from the literature targeting automotive applications over the last decade. Based on the survey, we identify the strengths and weaknesses of FPGAs in this domain and future research opportunities and challenges.
	Address	16 December 2022
	Corporate Author				Thesis
	Publisher	IEEE	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM; 600.166			Approved	no
	Call Number	Admin @ si @ CNB2022			Serial	3760
Permanent link to this record



	Author	Saad Minhas; Zeba Khanam; Shoaib Ehsan; Klaus McDonald Maier; Aura Hernandez-Sabate
	Title	Weather Classification by Utilizing Synthetic Data			Type	Journal Article
	Year	2022	Publication	Sensors	Abbreviated Journal	SENS
	Volume	22	Issue	9	Pages	3193
	Keywords	Weather classification; synthetic data; dataset; autonomous car; computer vision; advanced driver assistance systems; deep learning; intelligent transportation systems
	Abstract	Weather prediction from real-world images can be termed a complex task when targeting classification using neural networks. Moreover, the number of images throughout the available datasets can contain a huge amount of variance when comparing locations with the weather those images are representing. In this article, the capabilities of a custom built driver simulator are explored specifically to simulate a wide range of weather conditions. Moreover, the performance of a new synthetic dataset generated by the above simulator is also assessed. The results indicate that the use of synthetic datasets in conjunction with real-world datasets can increase the training efficiency of the CNNs by as much as 74%. The article paves a way forward to tackle the persistent problem of bias in vision-based datasets.
	Address	21 April 2022
	Corporate Author				Thesis
	Publisher	MDPI	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	IAM; 600.139; 600.159; 600.166; 600.145;			Approved	no
	Call Number	Admin @ si @ MKE2022			Serial	3761
Permanent link to this record



	Author	Eduardo Aguilar; Bhalaji Nagarajan; Beatriz Remeseiro; Petia Radeva
	Title	Bayesian deep learning for semantic segmentation of food images			Type	Journal Article
	Year	2022	Publication	Computers and Electrical Engineering	Abbreviated Journal	CEE
	Volume	103	Issue		Pages	108380
	Keywords	Deep learning; Uncertainty quantification; Bayesian inference; Image segmentation; Food analysis
	Abstract	Deep learning has provided promising results in various applications; however, algorithms tend to be overconfident in their predictions, even though they may be entirely wrong. Particularly for critical applications, the model should provide answers only when it is very sure of them. This article presents a Bayesian version of two different state-of-the-art semantic segmentation methods to perform multi-class segmentation of foods and estimate the uncertainty about the given predictions. The proposed methods were evaluated on three public pixel-annotated food datasets. As a result, we can conclude that Bayesian methods improve the performance achieved by the baseline architectures and, in addition, provide information to improve decision-making. Furthermore, based on the extracted uncertainty map, we proposed three measures to rank the images according to the degree of noisy annotations they contained. Note that the top 135 images ranked by one of these measures include more than half of the worst-labeled food images.
	Address	October 2022
	Corporate Author				Thesis
	Publisher	Science Direct	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	MILAB			Approved	no
	Call Number	Admin @ si @ ANR2022			Serial	3763
Permanent link to this record



	Author	Zhen Xu; Sergio Escalera; Adrien Pavao; Magali Richard; Wei-Wei Tu; Quanming Yao; Huan Zhao; Isabelle Guyon
	Title	Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform			Type	Journal Article
	Year	2022	Publication	Patterns	Abbreviated Journal	PATTERNS
	Volume	3	Issue	7	Pages	100543
	Keywords	Machine learning; data science; benchmark platform; reproducibility; competitions
	Abstract	Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.
	Address	June 24, 2022
	Corporate Author				Thesis
	Publisher	Science Direct	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	HuPBA			Approved	no
	Call Number	Admin @ si @ XEP2022			Serial	3764
Permanent link to this record



	Author	Shiqi Yang; Yaxing Wang; Kai Wang; Shangling Jui; Joost Van de Weijer
	Title	Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation			Type	Conference Article
	Year	2022	Publication	36th Conference on Neural Information Processing Systems	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method.
	Address	Virtual; November 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	NEURIPS
	Notes	LAMP; 600.147			Approved	no
	Call Number	Admin @ si @ YWW2022a			Serial	3792
Permanent link to this record



	Author	Saiping Zhang; Luis Herranz; Marta Mrak; Marc Gorriz Blanch; Shuai Wan; Fuzheng Yang
	Title	DCNGAN: A Deformable Convolution-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video			Type	Conference Article
	Year	2022	Publication	47th International Conference on Acoustics, Speech, and Signal Processing	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
	Address	Virtual; May 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICASSP
	Notes	MACO; 600.161; 601.379			Approved	no
	Call Number	Admin @ si @ ZHM2022a			Serial	3765
Permanent link to this record



	Author	German Barquero; Johnny Nuñez; Sergio Escalera; Zhen Xu; Wei-Wei Tu; Isabelle Guyon
	Title	Didn’t see that coming: a survey on non-verbal social human behavior forecasting			Type	Conference Article
	Year	2022	Publication	Understanding Social Behavior in Dyadic and Small Group Interactions	Abbreviated Journal
	Volume	173	Issue		Pages	139-178
	Keywords
	Abstract	Non-verbal social human behavior forecasting has increasingly attracted the interest of the research community in recent years. Its direct applications to human-robot interaction and socially-aware human motion generation make it a very attractive field. In this survey, we define the behavior forecasting problem for multiple interactive agents in a generic way that aims at unifying the fields of social signals prediction and human motion forecasting, traditionally separated. We hold that both problem formulations refer to the same conceptual problem, and identify many shared fundamental challenges: future stochasticity, context awareness, history exploitation, etc. We also propose a taxonomy that comprises methods published in the last 5 years in a very informative way and describes the current main concerns of the community with regard to this problem. In order to promote further research on this field, we also provide a summarized and friendly overview of audiovisual datasets featuring non-acted social interactions. Finally, we describe the most common metrics used in this task and their particular issues.
	Address	Virtual; June 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	PMLR
	Notes	HuPBA; no proj			Approved	no
	Call Number	Admin @ si @ BNE2022			Serial	3766
Permanent link to this record



	Author	Guillem Martinez; Maya Aghaei; Martin Dijkstra; Bhalaji Nagarajan; Femke Jaarsma; Jaap van de Loosdrecht; Petia Radeva; Klaas Dijkstra
	Title	Hyper-Spectral Imaging for Overlapping Plastic Flakes Segmentation			Type	Conference Article
	Year	2022	Publication	47th International Conference on Acoustics, Speech, and Signal Processing	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	Hyper-spectral imaging; plastic sorting; multi-label segmentation; bitfield encoding
	Abstract	In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient to align frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
	Address	Singapore; May 2022
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICASSP
	Notes	MILAB; no proj			Approved	no
	Call Number	Admin @ si @ MAD2022			Serial	3767
Permanent link to this record