| Record | |||||
|---|---|---|---|---|---|
| Author | Joakim Bruslund Haurum; Meysam Madadi; Sergio Escalera; Thomas B. Moeslund | ||||
| Title | Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification | Type | Journal Article | ||
| Year | 2022 | Publication | Automation in Construction | Abbreviated Journal | AC |
| Volume | 144 | Issue | | Pages | 104614 |
| Keywords | Sewer Defect Classification; Vision Transformers; Sinkhorn-Knopp; Convolutional Neural Networks; Closed-Circuit Television; Sewer Inspection | ||||
| Abstract | A crucial part of image classification consists of capturing non-local spatial semantics of image content. This paper describes the multi-scale hybrid vision transformer (MSHViT), an extension of the classical convolutional neural network (CNN) backbone, for multi-label sewer defect classification. To better model spatial semantics in the images, features are aggregated at different scales non-locally through the use of a lightweight vision transformer, and a smaller set of tokens was produced through a novel Sinkhorn clustering-based tokenizer using distinct cluster centers. The proposed MSHViT and Sinkhorn tokenizer were evaluated on the Sewer-ML multi-label sewer defect classification dataset, showing consistent performance improvements of up to 2.53 percentage points. | ||||
| Address | Dec 2022 | ||||
| Corporate Author | Thesis | ||||
| Publisher | Place of Publication | Editor | |||
| Language | Summary Language | Original Title | |||
| Series Editor | Series Title | Abbreviated Series Title | |||
| Series Volume | Series Issue | Edition | |||
| ISSN | ISBN | Medium | |||
| Area | Expedition | Conference | |||
| Notes | HuPBA;MILAB | Approved | no | ||
| Call Number | Admin @ si @ BME2022c | Serial | 3780 | ||