|
Lluis Gomez, Andres Mafla, Marçal Rusiñol, & Dimosthenis Karatzas. (2018). Single Shot Scene Text Retrieval. In 15th European Conference on Computer Vision (Vol. 11218, pp. 728–744). LNCS.
Abstract: Textual information found in scene images provides high level semantic information about the image and its context and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single shot CNN architecture that predicts at the same time bounding boxes and a compact text representation of the words in them. In this way, the text based image retrieval task can be casted as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image
database. Our experiments demonstrate that the proposed architecture
outperforms previous state-of-the-art while it offers a significant increase
in processing speed.
Keywords: Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposals Networks; PHOC
|
|
|
Marc Oliu, Javier Selva, & Sergio Escalera. (2018). Folded Recurrent Neural Networks for Future Video Prediction. In 15th European Conference on Computer Vision (Vol. 11218, pp. 745–761). LNCS.
Abstract: Future video prediction is an ill-posed Computer Vision problem that recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how with this topology only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving an insight to the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state of the art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times less memory usage and computational cost than the best scored approach.
|
|
|
Felipe Codevilla, Antonio Lopez, Vladlen Koltun, & Alexey Dosovitskiy. (2018). On Offline Evaluation of Vision-based Driving Models. In 15th European Conference on Computer Vision (Vol. 11219, pp. 246–262). LNCS.
Abstract: Autonomous driving models should ideally be evaluated by deploying
them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessarily correlated with driving quality, and two models with identical prediction error can differ dramatically in their driving performance. We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and
suitable offline metrics.
Keywords: Autonomous driving; deep learning
|
|
|
Santi Puch, Irina Sanchez, Aura Hernandez-Sabate, Gemma Piella, & Vesna Prckovska. (2018). Global Planar Convolutions for Improved Context Aggregation in Brain Tumor Segmentation. In International MICCAI Brainlesion Workshop (Vol. 11384, pp. 393–405). LNCS.
Abstract: In this work, we introduce the Global Planar Convolution module as a building-block for fully-convolutional networks that aggregates global information and, therefore, enhances the context perception capabilities of segmentation networks in the context of brain tumor segmentation. We implement two baseline architectures (3D UNet and a residual version of 3D UNet, ResUNet) and present a novel architecture based on these two architectures, ContextNet, that includes the proposed Global Planar Convolution module. We show that the addition of such module eliminates the need of building networks with several representation levels, which tend to be over-parametrized and to showcase slow rates of convergence. Furthermore, we provide a visual demonstration of the behavior of GPC modules via visualization of intermediate representations. We finally participate in the 2018 edition of the BraTS challenge with our best performing models, that are based on ContextNet, and report the evaluation scores on the validation and the test sets of the challenge.
Keywords: Brain tumors; 3D fully-convolutional CNN; Magnetic resonance imaging; Global planar convolution
|
|
|
Anguelos Nicolaou, Sounak Dey, V.Christlein, A.Maier, & Dimosthenis Karatzas. (2018). Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings. In International Workshop on Reproducible Research in Pattern Recognition (Vol. 11455, pp. 71–82). LNCS.
Abstract: Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis of the ambiguity quantized distances introduce and provide bounds on the effect. We demonstrate that it can have a measurable effect in empirical data in state-of-the-art systems. We also approach the phenomenon from a computer security perspective and demonstrate how someone being evaluated by a third party can exploit this ambiguity and greatly outperform a random predictor without even access to the input data. We also suggest a simple solution making the performance metrics, which rely on ranking, totally deterministic and impervious to such exploits.
|
|