Records | |||||
---|---|---|---|---|---|
Author | Ali Furkan Biten | ||||
Title | A Bitter-Sweet Symphony on Vision and Language: Bias and World Knowledge | Type | Book Whole | ||
Year | 2022 | Publication | PhD Thesis, Universitat Autonoma de Barcelona-CVC | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | Vision and Language are broadly regarded as cornerstones of intelligence. Even though language and vision have different aims – language having the purpose of communication, transmission of information and vision having the purpose of constructing mental representations around us to navigate and interact with objects – they cooperate and depend on one another in many tasks we perform effortlessly. This reliance is actively being studied in various Computer Vision tasks, e.g. image captioning, visual question answering, image-sentence retrieval, phrase grounding, just to name a few. All of these tasks share the inherent difficulty of aligning the two modalities, while being robust to language priors and various biases existing in the datasets. One of the ultimate goals for vision and language research is to be able to inject world knowledge while getting rid of the biases that come with the datasets. In this thesis, we mainly focus on two vision and language tasks, namely Image Captioning and Scene-Text Visual Question Answering (STVQA). In both domains, we start by defining a new task that requires the utilization of world knowledge and in both tasks, we find that the models commonly employed are prone to biases that exist in the data. Concretely, we introduce new tasks and discover several problems that impede performance at each level and provide remedies or possible solutions in each chapter: i) We define a new task to move beyond Image Captioning to Image Interpretation that can utilize Named Entities in the form of world knowledge. ii) We study the object hallucination problem in classic Image Captioning systems and develop an architecture-agnostic solution. iii) We define a sub-task of Visual Question Answering that requires reading the text in the image (STVQA), where we highlight the limitations of current models. iv) We propose an architecture for the STVQA task that can point to the answer in the image and show how to combine it with classic VQA models. v) We show how far language can get us in STVQA and discover yet another bias which causes the models to disregard the image while doing Visual Question Answering. | ||||
Address | |||||
Corporate Author | Thesis | Ph.D. thesis | |||
Publisher | IMPRIMA | Place of Publication | Editor | Dimosthenis Karatzas; Lluis Gomez | |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-84-124793-5-5 | Medium | ||
Area | Expedition | Conference | |||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ Bit2022 | Serial | 3755 | ||
Permanent link to this record | |||||
Author | Shigang Yue; F. Claire Rind; Matthias S. Keil; Jorge Cuadri; Richard Stafford | ||||
Title | A bio-inspired visual collision detection mechanism for cars: Optimisation of a model of a locust neuron to a novel environment | Type | Journal | ||
Year | 2006 | Publication | Neurocomputing 69(13–15): 1591–1598 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | Approved | no | |||
Call Number | Admin @ si @ YRK2006 | Serial | 652 | ||
Permanent link to this record | |||||
Author | Alicia Fornes; Josep Llados; Joan Mas; Joana Maria Pujadas-Mora; Anna Cabre | ||||
Title | A Bimodal Crowdsourcing Platform for Demographic Historical Manuscripts | Type | Conference Article | ||
Year | 2014 | Publication | Digital Access to Textual Cultural Heritage Conference | Abbreviated Journal | |
Volume | Issue | Pages | 103-108 | ||
Keywords | |||||
Abstract | In this paper we present a crowdsourcing web-based application for extracting information from demographic handwritten document images. The proposed application integrates two points of view: the semantic information for demographic research, and the ground-truthing for document analysis research. Concretely, the application has the contents view, where the information is recorded into forms, and the labeling view, with the word labels for evaluating document analysis techniques. The crowdsourcing architecture makes it possible to accelerate the information extraction (many users can work simultaneously), validate the information, and easily provide feedback to the users. We finally show how the proposed application can be extended to other kinds of demographic historical manuscripts. | ||||
Address | Madrid; May 2014 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-4503-2588-2 | Medium | ||
Area | Expedition | Conference | DATeCH | ||
Notes | DAG; 600.061; 602.006; 600.077 | Approved | no | ||
Call Number | Admin @ si @ FLM2014 | Serial | 2516 | ||
Permanent link to this record | |||||
Author | Juan Borrego-Carazo; Carles Sanchez; David Castells; Jordi Carrabina; Debora Gil | ||||
Title | A benchmark for the evaluation of computational methods for bronchoscopic navigation | Type | Journal Article | ||
Year | 2022 | Publication | International Journal of Computer Assisted Radiology and Surgery | Abbreviated Journal | IJCARS |
Volume | 17 | Issue | 1 | Pages | |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | IAM | Approved | no | ||
Call Number | Admin @ si @ BSC2022 | Serial | 3832 | ||
Permanent link to this record | |||||
Author | David Vazquez; Jorge Bernal; F. Javier Sanchez; Gloria Fernandez Esparrach; Antonio Lopez; Adriana Romero; Michal Drozdzal; Aaron Courville | ||||
Title | A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images | Type | Conference Article | ||
Year | 2017 | Publication | 31st International Congress and Exhibition on Computer Assisted Radiology and Surgery | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Deep Learning; Medical Imaging | ||||
Abstract | Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search for polyps and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss-rate and inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image segmentation, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation and significantly outperforming, without any further post-processing, prior results in endoluminal scene segmentation. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CARS | ||
Notes | ADAS; MV; 600.075; 600.085; 600.076; 601.281; 600.118 | Approved | no | ||
Call Number | ADAS @ adas @ VBS2017a | Serial | 2880 | ||
Permanent link to this record | |||||
Author | David Vazquez; Jorge Bernal; F. Javier Sanchez; Gloria Fernandez Esparrach; Antonio Lopez; Adriana Romero; Michal Drozdzal; Aaron Courville | ||||
Title | A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images | Type | Journal Article | ||
Year | 2017 | Publication | Journal of Healthcare Engineering | Abbreviated Journal | JHCE |
Volume | Issue | Pages | 2040-2295 | ||
Keywords | Colonoscopy images; Deep Learning; Semantic Segmentation | ||||
Abstract | Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search for polyps and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss-rate and inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image segmentation, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. The proposed dataset consists of 4 relevant classes to inspect the endoluminal scene, targeting different clinical needs. Together with the dataset and taking advantage of advances in semantic segmentation literature, we provide new baselines by training standard fully convolutional networks (FCN). We perform a comparative study to show that FCN significantly outperform, without any further post-processing, prior results in endoluminal scene segmentation, especially with respect to polyp segmentation and localization. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ADAS; MV; 600.075; 600.085; 600.076; 601.281; 600.118 | Approved | no | ||
Call Number | VBS2017b | Serial | 2940 | ||
Permanent link to this record | |||||
Author | Anjan Dutta; Josep Llados; Umapada Pal | ||||
Title | A Bag-of-Paths Based Serialized Subgraph Matching for Symbol Spotting in Line Drawings | Type | Conference Article | ||
Year | 2011 | Publication | 5th Iberian Conference on Pattern Recognition and Image Analysis | Abbreviated Journal | |
Volume | 6669 | Issue | Pages | 620-627 | |
Keywords | |||||
Abstract | In this paper we propose an error tolerant subgraph matching algorithm based on bag-of-paths for solving the problem of symbol spotting in line drawings. Bag-of-paths is a factorized representation of graphs where the factorization is done by considering all the acyclic paths between each pair of connected nodes. Similar paths within the whole collection of documents are clustered and organized in a lookup table for efficient indexing. The lookup table contains the index key of each cluster and the corresponding list of locations as a single entry. The mean path of each of the clusters serves as the index key for each table entry. The spotting method is then formulated by a spatial voting scheme to the list of locations of the paths that are decided in terms of search of similar paths that compose the query symbol. Efficient indexing of common substructures helps to reduce the computational burden of usual graph based methods. The proposed method can also be seen as a way to serialize graphs, which reduces the complexity of the subgraph isomorphism. We have encoded the paths in terms of both attributed strings and turning functions, and presented comparative results between them within the symbol spotting framework. Experiments on matching different shape silhouettes are also reported, and the method has also been shown to work in noisy environments. | ||||
Address | Las Palmas de Gran Canaria; Spain | ||||
Corporate Author | Thesis | ||||
Publisher | Springer Berlin Heidelberg | Place of Publication | Berlin | Editor | Jordi Vitria; Joao Miguel Raposo; Mario Hernandez |
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | 0302-9743 | ISBN | 978-3-642-21256-7 | Medium | |
Area | Expedition | Conference | IbPRIA | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ DLP2011a | Serial | 1738 | ||
Permanent link to this record | |||||
Author | Albert Gordo; Florent Perronnin | ||||
Title | A Bag-of-Pages Approach to Unordered Multi-Page Document Classification | Type | Conference Article | ||
Year | 2010 | Publication | 20th International Conference on Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 1920–1923 | ||
Keywords | |||||
Abstract | We consider the problem of classifying documents containing multiple unordered pages. For this purpose, we propose a novel bag-of-pages document representation. To represent a document, one assigns every page to a prototype in a codebook of pages. This leads to a histogram representation which can then be fed to any discriminative classifier. We also consider several refinements over this initial approach. We show on two challenging datasets that the proposed approach significantly outperforms a baseline system. | ||||
Address | Istanbul (Turkey) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | 1051-4651 | ISBN | 978-1-4244-7542-1 | Medium | |
Area | Expedition | Conference | ICPR | ||
Notes | DAG | Approved | no | ||
Call Number | Admin @ si @ GoP2010 | Serial | 1480 | ||
Permanent link to this record | |||||
Author | Albert Gordo; Alicia Fornes; Ernest Valveny; Josep Llados | ||||
Title | A Bag of Notes Approach to Writer Identification in Old Handwritten Music Scores | Type | Conference Article | ||
Year | 2010 | Publication | 9th IAPR International Workshop on Document Analysis Systems | Abbreviated Journal | |
Volume | Issue | Pages | 247–254 | ||
Keywords | |||||
Abstract | Determining the authorship of a document, namely writer identification, can be an important source of information for document categorization. Contrary to text documents, the identification of the writer of graphical documents is still a challenge. In this paper we present a robust approach for writer identification in a particular kind of graphical documents, old music scores. This approach adapts the bag of visual terms method for coping with graphic documents. The identification is performed only using the graphical music notation. For this purpose, we generate a graphic vocabulary without recognizing any music symbols, and consequently, avoiding the difficulties in the recognition of hand-drawn symbols in old and degraded documents. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving very high identification rates. | ||||
Address | Boston; USA | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | 978-1-60558-773-8 | Medium | ||
Area | Expedition | Conference | DAS | ||
Notes | DAG | Approved | no | ||
Call Number | DAG @ dag @ GFV2010 | Serial | 1320 | ||
Permanent link to this record | |||||
Author | Angel Valencia; Roger Idrovo; Angel Sappa; Douglas Plaza; Daniel Ochoa | ||||
Title | A 3D Vision Based Approach for Optimal Grasp of Vacuum Grippers | Type | Conference Article | ||
Year | 2017 | Publication | IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | In general, robot grasping approaches are based on the usage of multi-finger grippers. However, when large size objects need to be manipulated, vacuum grippers are preferred instead of finger based grippers. This paper aims to estimate the best picking place for a two suction cups vacuum gripper, when planar objects with an unknown size and geometry are considered. The approach is based on the estimation of geometric properties of the object’s shape from a partial cloud of points (a single 3D view), which are combined with considerations of a theoretical model to generate an optimal contact point that minimizes the vacuum force needed to guarantee a grasp. Experimental results in real scenarios are presented to show the validity of the proposed approach. | ||||
Address | San Sebastian; Spain; May 2017 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECMSM | ||
Notes | ADAS; 600.086; 600.118 | Approved | no | ||
Call Number | Admin @ si @ VIS2017 | Serial | 2917 | ||
Permanent link to this record | |||||
Author | Ignasi Rius; Dani Rowe; Jordi Gonzalez; Xavier Roca | ||||
Title | A 3D Dynamic Model of Human Actions for Probabilistic Image Tracking | Type | Book Chapter | ||
Year | 2005 | Publication | Pattern Recognition and Image Analysis (IbPRIA 2005), LNCS 3522: 529–536 | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Estoril (Portugal) | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE | Approved | no | ||
Call Number | ISE @ ise @ RRG2005b | Serial | 544 | ||
Permanent link to this record | |||||
Author | Mohamed Ramzy Ibrahim; Robert Benavente; Felipe Lumbreras; Daniel Ponsa | ||||
Title | 3DRRDB: Super Resolution of Multiple Remote Sensing Images using 3D Residual in Residual Dense Blocks | Type | Conference Article | ||
Year | 2022 | Publication | CVPR 2022 Workshop on IEEE Perception Beyond the Visible Spectrum workshop series (PBVS, 18th Edition) | Abbreviated Journal | |
Volume | Issue | Pages | |||
Keywords | Training; Solid modeling; Three-dimensional displays; PSNR; Convolution; Superresolution; Pattern recognition | ||||
Abstract | The rapid advancement of Deep Convolutional Neural Networks helped in solving many remote sensing problems, especially the problems of super-resolution. However, most state-of-the-art methods focus more on Single Image Super-Resolution neglecting Multi-Image Super-Resolution. In this work, a new proposed 3D Residual in Residual Dense Blocks model (3DRRDB) focuses on remote sensing Multi-Image Super-Resolution for two different single spectral bands. The proposed 3DRRDB model explores the idea of 3D convolution layers in deeply connected Dense Blocks and the effect of local and global residual connections with residual scaling in Multi-Image Super-Resolution. The model tested on the Proba-V challenge dataset shows a significant improvement above the current state-of-the-art models scoring a Corrected Peak Signal to Noise Ratio (cPSNR) of 48.79 dB and 50.83 dB for Near Infrared (NIR) and RED Bands respectively. Moreover, the proposed 3DRRDB model scores a Corrected Structural Similarity Index Measure (cSSIM) of 0.9865 and 0.9909 for NIR and RED bands respectively. | ||||
Address | New Orleans, USA; 19 June 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPRW | ||
Notes | MSIAU; 600.130 | Approved | no | ||
Call Number | Admin @ si @ IBL2022 | Serial | 3693 | ||
Permanent link to this record | |||||
Author | Alejandro Gonzalez Alzate; Gabriel Villalonga; German Ros; David Vazquez; Antonio Lopez | ||||
Title | 3D-Guided Multiscale Sliding Window for Pedestrian Detection | Type | Conference Article | ||
Year | 2015 | Publication | Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference, IbPRIA 2015 | Abbreviated Journal | |
Volume | 9117 | Issue | Pages | 560-568 | |
Keywords | Pedestrian Detection | ||||
Abstract | The most relevant modules of a pedestrian detector are the candidate generation and the candidate classification. The former aims at presenting image windows to the latter so that they are classified as containing a pedestrian or not. Much attention has been paid to the classification module, while candidate generation has mainly relied on (multiscale) sliding window pyramid. However, candidate generation is critical for achieving real-time performance. In this paper we assume a context of autonomous driving based on stereo vision. Accordingly, we evaluate the effect of taking into account the 3D information (derived from the stereo) in order to prune the hundreds of thousands of windows per image generated by classical pyramidal sliding window. For our study we use a multimodal (RGB, disparity) and multi-descriptor (HOG, LBP, HOG+LBP) holistic ensemble based on linear SVM. Evaluation on data from the challenging KITTI benchmark suite shows the effectiveness of using 3D information to dramatically reduce the number of candidate windows, even improving the overall pedestrian detection accuracy. | ||||
Address | Santiago de Compostela; Spain; June 2015 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | ACDC | Expedition | Conference | IbPRIA | |
Notes | ADAS; 600.076; 600.057; 600.054 | Approved | no | ||
Call Number | ADAS @ adas @ GVR2015 | Serial | 2585 | ||
Permanent link to this record | |||||
Author | Senmao Li; Joost Van de Weijer; Yaxing Wang; Fahad Shahbaz Khan; Meiqin Liu; Jian Yang | ||||
Title | 3D-Aware Multi-Class Image-to-Image Translation with NeRFs | Type | Conference Article | ||
Year | 2023 | Publication | 36th IEEE Conference on Computer Vision and Pattern Recognition | Abbreviated Journal | |
Volume | Issue | Pages | 12652-12662 | ||
Keywords | |||||
Abstract | Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However, no prior works investigate 3D-aware GANs for 3D consistent multiclass image-to-image (3D-aware I2I) translation. Naively using 2D-I2I translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass I2I translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture, that preserves view-consistency, we construct a 3D-aware I2I translation system. To further reduce the view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constraint and a relative regularization loss. In extensive experiments on two datasets, quantitative and qualitative results demonstrate that we successfully perform 3D-aware I2I translation with multi-view consistency. Code is available in 3DI2I. | ||||
Address | Vancouver; Canada; June 2023 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | CVPR | ||
Notes | LAMP | Approved | no | ||
Call Number | Admin @ si @ LWW2023b | Serial | 3920 | ||
Permanent link to this record | |||||
Author | Petia Radeva; Ricardo Toledo; Craig Von Land; Juan J. Villanueva | ||||
Title | 3D Vessel Reconstruction from Biplane Angiograms using Snakes. | Type | Miscellaneous | ||
Year | 1998 | Publication | Abbreviated Journal | ||
Volume | Issue | Pages | |||
Keywords | |||||
Abstract | |||||
Address | Cleveland, OH | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB;ADAS | Approved | no | ||
Call Number | BCNPCL @ bcnpcl @ RTV1998a | Serial | 198 | ||
Permanent link to this record |