Pau Baiget, Carles Fernandez, Xavier Roca, & Jordi Gonzalez. (2009). Generation of Augmented Video Sequences Combining Behavioral Animation and Multi Object Tracking. Computer Animation and Virtual Worlds, 20(4), 473–489.
Abstract: In this paper we present a novel approach to generate augmented video sequences in real time, involving interactions between virtual and real agents in real scenarios. On the one hand, real agent motion is estimated by means of a multi-object tracking algorithm, which determines the real objects' positions over the scenario at each time step. On the other hand, virtual agents are provided with behavior models that consider their interaction with the environment and with other agents. The resulting framework allows the generation of video sequences involving behavior-based virtual agents that react to real agent behavior, and it has applications in education, simulation, and the game and movie industries. We show the performance of the proposed approach in indoor and outdoor scenarios simulating human and vehicle agents. Copyright © 2009 John Wiley & Sons, Ltd.
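To make the per-time-step loop between tracked real agents and reactive virtual agents concrete, here is a minimal Python sketch, assuming the tracker already delivers each real agent's ground-plane position for every frame. The `VirtualAgent` class, the `step_virtual_agent` rule, and the `SAFETY_RADIUS` and `SPEED` values are hypothetical illustrations of a simple reactive behavior; they are not the behavior models used in the paper.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters for illustration only (not from the paper).
SAFETY_RADIUS = 2.0  # metres on the ground plane
SPEED = 1.0          # metres advanced per time step


@dataclass
class VirtualAgent:
    x: float
    y: float
    goal_x: float
    goal_y: float


def step_virtual_agent(agent, real_positions):
    """Advance a virtual agent by one time step.

    The agent walks toward its goal, but waits (yields) whenever a tracked
    real agent is closer than SAFETY_RADIUS. Returns the chosen action.
    """
    for rx, ry in real_positions:
        if math.hypot(rx - agent.x, ry - agent.y) < SAFETY_RADIUS:
            return "wait"
    dx, dy = agent.goal_x - agent.x, agent.goal_y - agent.y
    dist = math.hypot(dx, dy)
    if dist <= SPEED:
        # Close enough to reach the goal within this step.
        agent.x, agent.y = agent.goal_x, agent.goal_y
        return "arrived"
    agent.x += SPEED * dx / dist
    agent.y += SPEED * dy / dist
    return "walk"


# Real-agent positions per frame, as a multi-object tracker might report them.
tracked_frames = [
    [(10.0, 0.0)],  # frame 0: real pedestrian far away
    [(2.5, 0.0)],   # frame 1: pedestrian within the safety radius
    [(2.5, 5.0)],   # frame 2: pedestrian has moved aside
]

agent = VirtualAgent(x=0.0, y=0.0, goal_x=3.0, goal_y=0.0)
actions = [step_virtual_agent(agent, frame) for frame in tracked_frames]
print(actions)  # → ['walk', 'wait', 'walk']
```

The virtual agent only reads the tracker output for the current frame, so the same loop works whether positions come from a live tracker or a recorded sequence.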
Parichehr Behjati, Pau Rodriguez, Carles Fernandez, Isabelle Hupont, Armin Mehri, & Jordi Gonzalez. (2023). Single image super-resolution based on directional variance attention network. PR - Pattern Recognition, 133, 108997.
Abstract: Recent advances in single image super-resolution (SISR) explore the power of deep convolutional neural networks (CNNs) to achieve better performance. However, most of the progress has been made by scaling CNN architectures, which usually raises computational demands and memory consumption. This makes modern architectures less applicable in practice. In addition, most CNN-based SR methods do not fully utilize the informative hierarchical features that are helpful for final image recovery. To address these issues, we propose a directional variance attention network (DiVANet), a computationally efficient yet accurate network for SISR. Specifically, we introduce a novel directional variance attention (DiVA) mechanism to capture long-range spatial dependencies and exploit inter-channel dependencies simultaneously for more discriminative representations. Furthermore, we propose a residual attention feature group (RAFG) for parallelizing attention and residual block computation. The output of each residual block is linearly fused at the RAFG output to provide access to the whole feature hierarchy. In parallel, DiVA extracts the most relevant features from the network to improve the final output and prevent information loss along the successive operations inside the network. Experimental results demonstrate the superiority of DiVANet over the state of the art on several datasets, while maintaining a relatively low computation and memory footprint. The code is available at https://github.com/pbehjatii/DiVANet.
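As a rough intuition for using variance along spatial directions as an attention statistic, the following toy sketch rescales a single 2-D feature map by gates computed from the variance of each row (horizontal direction) and each column (vertical direction). It uses only the standard library, and `directional_variance_attention` is a hypothetical name; this is an assumption-laden illustration, not the DiVA mechanism, whose learned formulation is given in the paper and the linked repository.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def directional_variance_attention(fmap):
    """Toy variance-based gating of one 2-D feature map (list of rows).

    Each element is multiplied by a gate derived from the variance of its
    row (horizontal direction) and a gate derived from the variance of its
    column (vertical direction), so textured rows and columns are emphasised.
    """
    n_rows, n_cols = len(fmap), len(fmap[0])
    row_gate = [sigmoid(variance(row)) for row in fmap]
    col_gate = [
        sigmoid(variance([fmap[i][j] for i in range(n_rows)]))
        for j in range(n_cols)
    ]
    return [
        [fmap[i][j] * row_gate[i] * col_gate[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


flat = directional_variance_attention([[4.0, 4.0], [4.0, 4.0]])
print(flat)  # → [[1.0, 1.0], [1.0, 1.0]]
```

A constant map gets the neutral gate 0.5 in both directions, so every value is scaled by 0.25; rows and columns with more variation receive gates closer to 1. A real attention module would compute such statistics on multi-channel tensors and pass them through learned layers.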
Parichehr Behjati Ardakani, Pau Rodriguez, Carles Fernandez, Armin Mehri, Xavier Roca, Seiichi Ozawa, et al. (2022). Frequency-based Enhancement Network for Efficient Super-Resolution. ACCESS - IEEE Access, 10, 57383–57397.
Abstract: Recently, deep convolutional neural networks (CNNs) have provided outstanding performance in single image super-resolution (SISR). Despite their remarkable performance, the lack of high-frequency information in the recovered images remains a core problem. Moreover, as the networks increase in depth and width, deep CNN-based SR methods face the challenge of computational complexity in practice. A promising and under-explored solution is to adapt the amount of computation to the different frequency bands of the input. To this end, we present a novel Frequency-based Enhancement Block (FEB) which explicitly enhances high-frequency information while forwarding low frequencies to the output. In particular, this block efficiently decomposes features into low- and high-frequency components and assigns more computation to the high-frequency ones. Thus, it can help the network generate more discriminative representations by explicitly recovering finer details. Our FEB design is simple and generic and can be used as a direct replacement for commonly used SR blocks with no need to change network architectures. We experimentally show that replacing SR blocks with FEB consistently reduces the reconstruction error, while also reducing the number of parameters in the model. Moreover, we propose a lightweight SR model, the Frequency-based Enhancement Network (FENet), based on FEB, that matches the performance of larger models. Extensive experiments demonstrate that our proposal performs favorably against state-of-the-art SR algorithms in terms of visual quality, memory footprint, and inference time. The code is available at https://github.com/pbehjatii/FENet.
Keywords: Deep learning; Frequency-based methods; Lightweight architectures; Single image super-resolution
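The core idea of splitting features into a low-frequency part that is forwarded unchanged and a high-frequency part that receives extra processing can be illustrated on a 1-D signal. In this sketch the moving-average low-pass filter and the fixed `high_gain` multiplier are assumptions standing in for the learned decomposition and the high-frequency branch of FEB; the function names are hypothetical and this is not the block described in the paper.

```python
def box_lowpass(signal, radius=1):
    """Moving-average low-pass filter; edges are handled by replication."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(k, 0), n - 1)]
                  for k in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def frequency_split_enhance(signal, high_gain=2.0, radius=1):
    """Split a signal into low + high bands and process only the high band."""
    low = box_lowpass(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    # Stand-in for the (learned, more expensive) high-frequency branch.
    enhanced_high = [high_gain * h for h in high]
    # The low-frequency band is forwarded to the output unchanged.
    return [l + h for l, h in zip(low, enhanced_high)]


print(frequency_split_enhance([0.0, 0.0, 3.0, 3.0]))  # → [0.0, -1.0, 4.0, 3.0]
```

With a gain above 1 the step edge is steepened (with overshoot on both sides), while a constant signal passes through unchanged because its high-frequency band is zero.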
Oscar Lopes, Miguel Reyes, Sergio Escalera, & Jordi Gonzalez. (2014). Spherical Blurred Shape Model for 3-D Object and Pose Recognition: Quantitative Analysis and HCI Applications in Smart Environments. TSMCB - IEEE Transactions on Systems, Man and Cybernetics (Part B), 44(12), 2379–2390.
Abstract: The use of depth maps is of increasing interest after the advent of cheap multisensor devices based on structured light, such as Kinect. In this context, there is a strong need for powerful 3-D shape descriptors able to generate rich object representations. Although several 3-D descriptors have already been proposed in the literature, the search for discriminative and computationally efficient descriptors is still an open issue. In this paper, we propose a novel point cloud descriptor called spherical blurred shape model (SBSM) that successfully encodes the structure density and local variabilities of an object based on shape voxel distances and a neighborhood propagation strategy. The proposed SBSM is proven to be rotation and scale invariant, robust to noise and occlusions, highly discriminative for multiple categories of complex objects like the human hand, and computationally efficient, since the SBSM complexity is linear in the number of object voxels. Experimental evaluation on public depth multiclass object data, 3-D facial expression data, and a novel hand-pose data set shows significant performance improvements over state-of-the-art approaches. Moreover, the effectiveness of the proposal is also demonstrated for object spotting in 3-D scenes and for real-time automatic hand pose recognition in human-computer interaction scenarios.
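To illustrate how a distance-based encoding can be rotation and scale invariant while running in time linear in the number of elements, the sketch below computes a toy radial-shell histogram for a 3-D point set: each point votes for the spherical shell around the centroid that contains it and, with linearly decaying weight, for the adjacent shell, a crude stand-in for blurring or neighborhood propagation. `radial_shell_descriptor` and `n_shells` are hypothetical; this is not the SBSM descriptor, which is built on shape voxel distances with its own propagation strategy.

```python
import math


def radial_shell_descriptor(points, n_shells=4):
    """Toy rotation-, translation- and scale-invariant descriptor of 3-D points.

    Every point votes for the spherical shell (centred on the centroid) that
    contains it and, with linearly decaying weight, for the neighbouring
    shell, a crude form of blurring. Runs in O(len(points)).
    """
    n = len(points)
    centroid = tuple(sum(p[axis] for p in points) / n for axis in range(3))
    dists = [math.dist(p, centroid) for p in points]
    r_max = max(dists) or 1.0  # guard against a single-point cloud
    hist = [0.0] * n_shells
    for d in dists:
        # Continuous shell coordinate; the centre of shell k sits at t == k.
        t = d / r_max * n_shells - 0.5
        t = min(max(t, 0.0), n_shells - 1.0)
        k = int(t)
        frac = t - k
        hist[k] += 1.0 - frac
        if frac > 0.0:
            hist[k + 1] += frac
    total = sum(hist)
    return [h / total for h in hist]


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
         (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
rotated = [(-y, x, z) for x, y, z in cloud]  # 90 degrees about the z axis
scaled = [(3 * x, 3 * y, 3 * z) for x, y, z in cloud]
print(radial_shell_descriptor(cloud))
```

Because only distances to the centroid enter the histogram, and those are divided by the largest one, rotating, translating, or uniformly scaling the points leaves the descriptor unchanged up to floating-point error.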
O.F. Ahmad, Y. Mori, M. Misawa, S. Kudo, J.T. Anderson, & Jorge Bernal. (2021). Establishing key research questions for the implementation of artificial intelligence in colonoscopy: a modified Delphi method. END - Endoscopy, 53(9), 893–901.
Abstract: BACKGROUND: Artificial intelligence (AI) research in colonoscopy is progressing rapidly, but widespread clinical implementation is not yet a reality. We aimed to identify the top implementation research priorities. METHODS: An established modified Delphi approach for research priority setting was used. Fifteen international experts, including endoscopists and translational computer scientists/engineers, from nine countries participated in an online survey over 9 months. Questions related to AI implementation in colonoscopy were generated as a long-list in the first round, and then scored in two subsequent rounds to identify the top 10 research questions. RESULTS: The top 10 ranked questions were categorized into five themes. Theme 1: clinical trial design/end points (4 questions), related to optimum trial designs for polyp detection and characterization, determining the optimal end points for evaluation of AI, and demonstrating impact on interval cancer rates. Theme 2: technological developments (3 questions), including improving detection of more challenging and advanced lesions, reduction of false-positive rates, and minimizing latency. Theme 3: clinical adoption/integration (1 question), concerning the effective combination of detection and characterization into one workflow. Theme 4: data access/annotation (1 question), concerning more efficient or automated data annotation methods to reduce the burden on human experts. Theme 5: regulatory approval (1 question), related to making regulatory approval processes more efficient. CONCLUSIONS: This is the first reported international research priority setting exercise for AI in colonoscopy. The study findings should be used as a framework to guide future research with key stakeholders to accelerate the clinical implementation of AI in endoscopy.