Home | [1–10] << 11 >> |
Records | |||||
---|---|---|---|---|---|
Author | Wenwen Yu; Chengquan Zhang; Haoyu Cao; Wei Hua; Bohan Li; Huang Chen; Mingyu Liu; Mingrui Chen; Jianfeng Kuang; Mengjun Cheng; Yuning Du; Shikun Feng; Xiaoguang Hu; Pengyuan Lyu; Kun Yao; Yuechen Yu; Yuliang Liu; Wanxiang Che; Errui Ding; Cheng-Lin Liu; Jiebo Luo; Shuicheng Yan; Min Zhang; Dimosthenis Karatzas; Xing Sun; Jingdong Wang; Xiang Bai | ||||
Title | ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images | Type | Conference Article | ||
Year | 2023 | Publication | 17th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | 14188 | Issue | Pages | 536–552 | |
Keywords | |||||
Abstract | Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios of past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of the structured text extraction scheme. In order to eliminate these problems, we organized the ICDAR 2023 competition on Structured text extraction from Visually-Rich Document images (SVRD). We set up two tracks for SVRD including Track 1: HUST-CELL and Track 2: Baidu-FEST, where HUST-CELL aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling, and Baidu-FEST focuses on evaluating the performance and generalization of Zero-shot/Few-shot Structured Text extraction from an end-to-end perspective. Compared to the current document benchmarks, our two tracks of competition benchmark enriches the scenarios greatly and contains more than 50 types of visually-rich document images (mainly from the actual enterprise applications). The competition opened on 30th December, 2022 and closed on 24th March, 2023. There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2. In this report we will presents the motivation, competition datasets, task definition, evaluation protocol, and submission summaries. According to the performance of the submissions, we believe there is still a large gap on the expected information extraction performance for complex and zero-shot scenarios. It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI. | ||||
Address | San Jose; CA; USA; August 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ YZC2023 | Serial | 3896 | ||
Permanent link to this record | |||||
Author | Wenwen Yu; Mingyu Liu; Mingrui Chen; Ning Lu; Yinlong We; Yuliang Liu; Dimosthenis Karatzas; Xiang Bai | ||||
Title | ICDAR 2023 Competition on Reading the Seal Title | Type | Conference Article | ||
Year | 2023 | Publication | 17th International Conference on Document Analysis and Recognition | Abbreviated Journal | |
Volume | 14188 | Issue | Pages | 522–535 | |
Keywords | |||||
Abstract | Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants and received 135 submissions from academia and industry, including 28 participants and 72 submissions for Task 1, and 25 participants and 63 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology. | ||||
Address | San Jose; CA; USA; August 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICDAR | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ YLC2023 | Serial | 3897 | ||
Permanent link to this record | |||||
Author | Xavier Soria; Angel Sappa; Patricio Humanante; Arash Akbarinia | ||||
Title | Dense extreme inception network for edge detection | Type | Journal Article | ||
Year | 2023 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 139 | Issue | Pages | 109461 | |
Keywords | |||||
Abstract | Edge detection is the basis of many computer vision applications. State of the art predominantly relies on deep learning with two decisive factors: dataset content and network architecture. Most of the publicly available datasets are not curated for edge detection tasks. Here, we address this limitation. First, we argue that edges, contours and boundaries, despite their overlaps, are three distinct visual features requiring separate benchmark datasets. To this end, we present a new dataset of edges. Second, we propose a novel architecture, termed Dense Extreme Inception Network for Edge Detection (DexiNed), that can be trained from scratch without any pre-trained weights. DexiNed outperforms other algorithms in the presented dataset. It also generalizes well to other datasets without any fine-tuning. The higher quality of DexiNed is also perceptually evident thanks to the sharper and finer edges it outputs. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ SSH2023 | Serial | 3982 | ||
Permanent link to this record | |||||
Author | Xavier Soria; Yachuan Li; Mohammad Rouhani; Angel Sappa | ||||
Title | Tiny and Efficient Model for the Edge Detection Generalization | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Most high-level computer vision tasks rely on low-level image operations as their initial processes. Operations such as edge detection, image enhancement, and super-resolution, provide the foundations for higher level image analysis. In this work we address the edge detection considering three main objectives: simplicity, efficiency, and generalization since current state-of-the-art (SOTA) edge detection models are increased in complexity for better accuracy. To achieve this, we present Tiny and Efficient Edge Detector (TEED), a light convolutional neural network with only 58K parameters, less than 0:2% of the state-of-the-art models. Training on the BIPED dataset takes less than 30 minutes, with each epoch requiring less than 5 minutes. Our proposed model is easy to train and it quickly converges within very first few epochs, while the predicted edge-maps are crisp and of high quality. Additionally, we propose a new dataset to test the generalization of edge detection, which comprises samples from popular images used in edge detection and image segmentation. The source code is available in https://github.com/xavysp/TEED. | ||||
Address | Paris; France; October 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICCVW | ||
Notes | MSIAU | Approved | no | ||
Call Number | Admin @ si @ SLR2023 | Serial | 3941 | ||
Permanent link to this record | |||||
Author | Yael Tudela; Ana Garcia Rodriguez; Gloria Fernandez Esparrach; Jorge Bernal | ||||
Title | Towards Fine-Grained Polyp Segmentation and Classification | Type | Conference Article | ||
Year | 2023 | Publication | Workshop on Clinical Image-Based Procedures | Abbreviated Journal | |
Volume | 14242 | Issue | Pages | 32-42 | |
Keywords | Medical image segmentation; Colorectal Cancer; Vision Transformer; Classification | ||||
Abstract | Colorectal cancer is one of the main causes of cancer death worldwide. Colonoscopy is the gold standard screening tool as it allows lesion detection and removal during the same procedure. During the last decades, several efforts have been made to develop CAD systems to assist clinicians in lesion detection and classification. Regarding the latter, and in order to be used in the exploration room as part of resect and discard or leave-in-situ strategies, these systems must identify correctly all different lesion types. This is a challenging task, as the data used to train these systems presents great inter-class similarity, high class imbalance, and low representation of clinically relevant histology classes such as serrated sessile adenomas.
In this paper, a new polyp segmentation and classification method, Swin-Expand, is introduced. Based on Swin-Transformer, it uses a simple and lightweight decoder. The performance of this method has been assessed on a novel dataset, comprising 1126 high-definition images representing the three main histological classes. Results show a clear improvement in both segmentation and classification performance, also achieving competitive results when tested in public datasets. These results confirm that both the method and the data are important to obtain more accurate polyp representations. |
||||
Address | Vancouver; October 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | MICCAIW | ||
Notes | ISE | Approved | no | ||
Call Number | Admin @ si @ TGF2023 | Serial | 3837 | ||
Permanent link to this record | |||||
Author | Yawei Li; Yulun Zhang; Radu Timofte; Luc Van Gool; Zhijun Tu; Kunpeng Du; Hailing Wang; Hanting Chen; Wei Li; Xiaofei Wang; Jie Hu; Yunhe Wang; Xiangyu Kong; Jinlong Wu; Dafeng Zhang; Jianxing Zhang; Shuai Liu; Furui Bai; Chaoyu Feng; Hao Wang; Yuqian Zhang; Guangqi Shao; Xiaotao Wang; Lei Lei; Rongjian Xu; Zhilu Zhang; Yunjin Chen; Dongwei Ren; Wangmeng Zuo; Qi Wu; Mingyan Han; Shen Cheng; Haipeng Li; Ting Jiang; Chengzhi Jiang; Xinpeng Li; Jinting Luo; Wenjie Lin; Lei Yu; Haoqiang Fan; Shuaicheng Liu; Aditya Arora; Syed Waqas Zamir; Javier Vazquez; Konstantinos G. Derpanis; Michael S. Brown; Hao Li; Zhihao Zhao; Jinshan Pan; Jiangxin Dong; Jinhui Tang; Bo Yang; Jingxiang Chen; Chenghua Li; Xi Zhang; Zhao Zhang; Jiahuan Ren; Zhicheng Ji; Kang Miao; Suiyi Zhao; Huan Zheng; YanYan Wei; Kangliang Liu; Xiangcheng Du; Sijie Liu; Yingbin Zheng; Xingjiao Wu; Cheng Jin; Rajeev Irny; Sriharsha Koundinya; Vighnesh Kamath; Gaurav Khandelwal; Sunder Ali Khowaja; Jiseok Yoon; Ik Hyun Lee; Shijie Chen; Chengqiang Zhao; Huabin Yang; Zhongjian Zhang; Junjia Huang; Yanru Zhang | ||||
Title | NTIRE 2023 challenge on image denoising: Methods and results | Type | Conference Article | ||
Year | 2023 | Publication | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops | Abbreviated Journal | |
Volume | Issue | Pages | 1904-1920 | ||
Keywords | |||||
Abstract | This paper reviews the NTIRE 2023 challenge on image denoising (σ = 50) with a focus on the proposed solutions and results. The aim is to obtain a network design capable to produce high-quality results with the best performance measured by PSNR for image denoising. Independent additive white Gaussian noise (AWGN) is assumed and the noise level is 50. The challenge had 225 registered participants, and 16 teams made valid submissions. They gauge the state-of-the-art for image denoising. | ||||
Address | Vancouver; Canada; June 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPRW | ||
Notes | MACO; CIC | Approved | no | ||
Call Number | Admin @ si @ LZT2023 | Serial | 3910 | ||
Permanent link to this record | |||||
Author | Yi Xiao | ||||
Title | Advancing Vision-based End-to-End Autonomous Driving | Type | Book Whole | ||
Year | 2023 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | In autonomous driving, artificial intelligence (AI) processes the traffic environment to drive the vehicle to a desired destination. Currently, there are different paradigms that address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that attempt to learn the direct mapping of raw data from input sensors to vehicle control signals. The latter are relatively less studied but are gaining popularity as they are less demanding in terms of data labeling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving.
We propose to evaluate three approaches to tackle the challenge of end-to-end autonomous driving. First, we focus on the input, considering adding depth information as complementary to RGB data, in order to mimic the human being’s ability to estimate the distance to obstacles. Notice that, in the real world, these depth maps can be obtained either from a LiDAR sensor, or a trained monocular depth estimation module, where human labeling is not needed. Then, based on the intuition that the latent space of end-to-end driving models encodes relevant information for driving, we use it as prior knowledge for training an affordancebased driving model. In this case, the trained affordance-based model can achieve good performance while requiring less human-labeled data, and it can provide interpretability regarding driving actions. Finally, we present a new pure vision-based end-to-end driving model termed CIL++, which is trained by imitation learning. CIL++ leverages modern best practices, such as a large horizontal field of view and a self-attention mechanism, which are contributing to the agent’s understanding of the driving scene and bringing a better imitation of human drivers. Using training data without any human labeling, our model yields almost expert performance in the CARLA NoCrash benchmark and could rival SOTA models that require large amounts of human-labeled data. |
||||
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | IMPRIMA | Place of Publication | Editor | Antonio Lopez | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-126409-4-6 | Medium | ||
Area | Expedition | Conference | |||
Notes | ADAS | Approved | no | ||
Call Number | Admin @ si @ Xia2023 | Serial | 3964 | ||
Permanent link to this record | |||||
Author | Yi Xiao; Felipe Codevilla; Diego Porres; Antonio Lopez | ||||
Title | Scaling Vision-Based End-to-End Autonomous Driving with Multi-View Attention Learning | Type | Conference Article | ||
Year | 2023 | Publication | International Conference on Intelligent Robots and Systems | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | On end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised on vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some latest models achieve better performance than CILRS by using expensive sensor suites and/or by using large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS by both processing higher-resolution images using a human-inspired HFOV as an inductive bias and incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models which are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning. | ||||
Address | Detroit; USA; October 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | IROS | ||
Notes | ADAS | Approved | no | ||
Call Number | Admin @ si @ XCP2023 | Serial | 3930 | ||
Permanent link to this record | |||||
Author | Yifan Wang; Luka Murn; Luis Herranz; Fei Yang; Marta Mrak; Wei Zhang; Shuai Wan; Marc Gorriz Blanch | ||||
Title | Efficient Super-Resolution for Compression Of Gaming Videos | Type | Conference Article | ||
Year | 2023 | Publication | IEEE International Conference on Acoustics, Speech and Signal Processing | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Due to the increasing demand for game-streaming services, efficient compression of computer-generated video is more critical than ever, especially when the available bandwidth is low. This paper proposes a super-resolution framework that improves the coding efficiency of computer-generated gaming videos at low bitrates. Most state-of-the-art super-resolution networks generalize over a variety of RGB inputs and use a unified network architecture for frames of different levels of degradation, leading to high complexity and redundancy. Since games usually consist of a limited number of fixed scenarios, we specialize one model for each scenario and assign appropriate network capacities for different QPs to perform super-resolution under the guidance of reconstructed high-quality luma components. Experimental results show that our framework achieves a superior quality-complexity trade-off compared to the ESRnet baseline, saving at most 93.59% parameters while maintaining comparable performance. The compression efficiency compared to HEVC is also improved by more than 17% BD-rate gain. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICASSP | ||
Notes | LAMP; MACO | Approved | no | ||
Call Number | Admin @ si @ WMH2023 | Serial | 3911 | ||
Permanent link to this record | |||||
Author | Yuyang Liu; Yang Cong; Dipam Goswami; Xialei Liu; Joost Van de Weijer | ||||
Title | Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection | Type | Conference Article | ||
Year | 2023 | Publication | 20th IEEE International Conference on Computer Vision | Abbreviated Journal | |
Volume | Issue | Pages | 11367-11377 | ||
Keywords | |||||
Abstract | In incremental learning, replaying stored samples from previous tasks together with current task samples is one of the most efficient approaches to address catastrophic forgetting. However, unlike incremental classification, image replay has not been successfully applied to incremental object detection (IOD). In this paper, we identify the overlooked problem of foreground shift as the main reason for this. Foreground shift only occurs when replaying images of previous tasks and refers to the fact that their background might contain foreground objects of the current task. To overcome this problem, a novel and efficient Augmented Box Replay (ABR) method is developed that only stores and replays foreground objects and thereby circumvents the foreground shift problem. In addition, we propose an innovative Attentive RoI Distillation loss that uses spatial attention from region-of-interest (RoI) features to constrain current model to focus on the most important information from old model. ABR significantly reduces forgetting of previous classes while maintaining high plasticity in current classes. Moreover, it considerably reduces the storage requirements when compared to standard image replay. Comprehensive experiments on Pascal-VOC and COCO datasets support the state-of-the-art performance of our model. | ||||
Address | Paris; France; October 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ICCV | ||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ LCG2023 | Serial | 3949 | ||
Permanent link to this record | |||||
Author | Zahra Raisi-Estabragh; Carlos Martin-Isla; Louise Nissen; Liliana Szabo; Victor M. Campello; Sergio Escalera; Simon Winther; Morten Bottcher; Karim Lekadir; and Steffen E. Petersen | ||||
Title | Radiomics analysis enhances the diagnostic performance of CMR stress perfusion: a proof-of-concept study using the Dan-NICAD dataset | Type | Journal Article | ||
Year | 2023 | Publication | Frontiers in Cardiovascular Medicine | Abbreviated Journal | FCM |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA | Approved | no | ||
Call Number | Admin @ si @ RMN2023 | Serial | 3937 | ||
Permanent link to this record |