Author: Sergio Escalera; Alicia Fornes; Oriol Pujol; Josep Llados; Petia Radeva
Title: Multi-class Binary Object Categorization using Blurred Shape Models
Type: Conference Article
Year: 2007
Publication: Progress in Pattern Recognition, Image Analysis and Applications, 12th Iberoamerican Congress on Pattern Recognition (CIARP 2007)
Volume: 4756
Pages: 773–782
Abbreviated Series Title: LNCS
ISBN: 978-3-540-76724-4
Conference: CIARP
Notes: MILAB; DAG; HuPBA
Approved: no
Call Number: BCNPCL @ bcnpcl @ EFP2007
Serial: 911
 

 
Author: Sergio Escalera; Alicia Fornes; Oriol Pujol; Josep Llados; Petia Radeva
Title: Circular Blurred Shape Model for Multiclass Symbol Recognition
Type: Journal Article
Year: 2011
Publication: IEEE Transactions on Systems, Man, and Cybernetics, Part B
Abbreviated Journal: TSMCB
Volume: 41
Issue: 2
Pages: 497-506
Abstract: In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. The feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. In order to perform the symbol detection, the descriptors are learned using a cascade of classifiers. In the case of multiclass categorization, the new feature space is learned using a set of binary classifiers which are embedded in an error-correcting output code design. The results over four symbol data sets show the significant improvements of the proposed descriptor compared to the state-of-the-art descriptors. In particular, the results are even more significant in those cases where the symbols suffer from elastic deformations.
ISSN: 1083-4419
Notes: MILAB; DAG; HuPBA
Approved: no
Call Number: Admin @ si @ EFP2011
Serial: 1784
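The correlogram-with-blurring idea in the abstract above can be made concrete with a small sketch. The following Python snippet is a minimal, hedged illustration of a circular blurred-shape-style descriptor: contour points vote into angular/radial bins around the centroid and also spread a weaker vote into neighbouring angular bins. The bin counts, the blurring weight and the function name are illustrative assumptions, not the authors' reference implementation.

    import numpy as np

    def circular_blurred_shape_descriptor(points, n_ang=16, n_rad=4, blur=0.5):
        """Toy circular blurred-shape descriptor (illustrative only).

        points: (N, 2) array of 2-D contour/skeleton coordinates of a symbol.
        Each point votes into an angular x radial bin of a correlogram centred
        on the shape centroid, and also spreads a weaker vote (weight `blur`)
        into the two neighbouring angular bins, which makes the histogram
        tolerant to small deformations.
        """
        points = np.asarray(points, dtype=float)
        d = points - points.mean(axis=0)                      # centre on the centroid
        radius = np.hypot(d[:, 0], d[:, 1])
        angle = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)

        r_bin = np.minimum((radius / (radius.max() + 1e-9) * n_rad).astype(int), n_rad - 1)
        a_bin = np.minimum((angle / (2 * np.pi) * n_ang).astype(int), n_ang - 1)

        hist = np.zeros((n_rad, n_ang))
        for r, a in zip(r_bin, a_bin):
            hist[r, a] += 1.0                   # main vote
            hist[r, (a - 1) % n_ang] += blur    # blurred votes into neighbouring angular bins
            hist[r, (a + 1) % n_ang] += blur
        return (hist / hist.sum()).ravel()      # normalised descriptor vector

A descriptor of this kind would then feed the cascade of detectors or the binary classifiers of the error-correcting output code design mentioned in the abstract.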
 

 
Author: Sergio Escalera; Alicia Fornes; Oriol Pujol; Petia Radeva
Title: Multi-class Binary Symbol Classification with Circular Blurred Shape Models
Type: Conference Article
Year: 2009
Publication: 15th International Conference on Image Analysis and Processing
Volume: 5716
Pages: 1005–1014
Abstract: Multi-class binary symbol classification requires the use of rich descriptors and robust classifiers. Shape representation is a difficult task because of several symbol distortions, such as occlusions, elastic deformations, gaps or noise. In this paper, we present the Circular Blurred Shape Model descriptor. This descriptor encodes the arrangement information of object parts in a correlogram structure. A prior blurring degree defines the level of distortion allowed to the symbol. Moreover, we learn the new feature space using a set of Adaboost classifiers, which are combined in the Error-Correcting Output Codes framework to deal with the multi-class categorization problem. The presented work has been validated over different multi-class data sets, and compared to the state-of-the-art descriptors, showing significant performance improvements.
Address: Salerno, Italy
Publisher: Springer Berlin Heidelberg
Abbreviated Series Title: LNCS
ISSN: 0302-9743
ISBN: 978-3-642-04145-7
Conference: ICIAP
Notes: MILAB; HuPBA; DAG
Approved: no
Call Number: BCNPCL @ bcnpcl @ EFP2009c
Serial: 1186
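Since the abstract above relies on the Error-Correcting Output Codes (ECOC) framework to turn binary classifiers into a multi-class decision, here is a hedged, generic sketch of ECOC decoding. The one-vs-all coding matrix and Hamming-distance decoding shown are textbook defaults, not the specific AdaBoost-based configuration used in the paper.

    import numpy as np

    def ecoc_decode(bit_predictions, coding_matrix):
        """Decode ECOC binary outputs into a class label (generic sketch).

        bit_predictions: (n_dichotomies,) vector of {-1, +1} binary classifier
                         outputs for one test sample.
        coding_matrix:   (n_classes, n_dichotomies) matrix of {-1, +1} target
                         codewords, one row per class.
        Returns the index of the class whose codeword is closest in Hamming
        distance to the predicted bit string.
        """
        distances = np.sum(np.asarray(coding_matrix) != np.asarray(bit_predictions), axis=1)
        return int(np.argmin(distances))

    # Toy usage: a one-vs-all coding for three classes and three binary problems.
    M = np.array([[+1, -1, -1],
                  [-1, +1, -1],
                  [-1, -1, +1]])
    print(ecoc_decode([+1, -1, -1], M))   # -> 0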
 

 
Author: Sophie Wuerger; Kaida Xiao; Chenyang Fu; Dimosthenis Karatzas
Title: Colour-opponent mechanisms are not affected by age-related chromatic sensitivity changes
Type: Journal Article
Year: 2010
Publication: Ophthalmic and Physiological Optics
Abbreviated Journal: OPO
Volume: 30
Issue: 5
Pages: 635-659
Abstract: The purpose of this study was to assess whether age-related chromatic sensitivity changes are associated with corresponding changes in hue perception in a large sample of colour-normal observers over a wide age range (n = 185; age range: 18-75 years). In these observers we determined both the sensitivity along the protan, deutan and tritan line; and settings for the four unique hues, from which the characteristics of the higher-order colour mechanisms can be derived. We found a significant decrease in chromatic sensitivity due to ageing, in particular along the tritan line. From the unique hue settings we derived the cone weightings associated with the colour mechanisms that are at equilibrium for the four unique hues. We found that the relative cone weightings (w(L)/w(M) and w(L)/w(S)) associated with the unique hues were independent of age. Our results are consistent with previous findings that the unique hues are rather constant with age while chromatic sensitivity declines. They also provide evidence in favour of the hypothesis that higher-order colour mechanisms are equipped with flexible cone weightings, as opposed to fixed weights. The mechanism underlying this compensation is still poorly understood.
Notes: DAG; IF: 1.259
Approved: no
Call Number: Admin @ si @ WXF2010
Serial: 1826
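The cone-weighting analysis summarised above can be read against the standard linear model of higher-order colour mechanisms usually assumed in this literature (a hedged reconstruction, not an equation quoted from the paper): a mechanism is at equilibrium for a unique hue when its weighted sum of cone signals vanishes,

    w_L L + w_M M + w_S S = 0,

so unique-hue settings constrain only the ratios w_L/w_M and w_L/w_S, which is why the abstract reports relative rather than absolute cone weightings.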
 

 
Author: Sophie Wuerger; Kaida Xiao; Dimitris Mylonas; Q. Huang; Dimosthenis Karatzas; Galina Paramei
Title: Blue-green color categorization in Mandarin-English speakers
Type: Journal Article
Year: 2012
Publication: Journal of the Optical Society of America A
Abbreviated Journal: JOSA A
Volume: 29
Issue: 2
Pages: A102-A107
Abstract: Observers are faster to detect a target among a set of distracters if the targets and distracters come from different color categories. This cross-boundary advantage seems to be limited to the right visual field, which is consistent with the dominance of the left hemisphere for language processing [Gilbert et al., Proc. Natl. Acad. Sci. USA 103, 489 (2006)]. Here we study whether a similar visual field advantage is found in the color identification task in speakers of Mandarin, a language that uses a logographic system. Forty late Mandarin-English bilinguals performed a blue-green color categorization task, in a blocked design, in their first language (L1: Mandarin) or second language (L2: English). Eleven color singletons ranging from blue to green were presented for 160 ms, randomly in the left visual field (LVF) or right visual field (RVF). Color boundary and reaction times (RTs) at the color boundary were estimated in L1 and L2, for both visual fields. We found that the color boundary did not differ between the languages; RTs at the color boundary, however, were on average more than 100 ms shorter in the English compared to the Mandarin sessions, but only when the stimuli were presented in the RVF. The finding may be explained by the script nature of the two languages: Mandarin logographic characters are analyzed visuospatially in the right hemisphere, which conceivably facilitates identification of color presented to the LVF.
Notes: DAG
Approved: no
Call Number: Admin @ si @ WXM2012
Serial: 2007
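One plausible way to obtain the colour boundary described above is to fit a logistic psychometric function to the proportion of "green" responses across the eleven blue-green singletons and take its 50% point; reaction times near that point can then be compared across language and visual field. The sketch below illustrates only this generic analysis; the stimulus values and function names are hypothetical, and this is not necessarily the authors' exact estimation procedure.

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(x, x0, k):
        """Proportion of 'green' responses as a function of stimulus value."""
        return 1.0 / (1.0 + np.exp(-k * (x - x0)))

    def estimate_boundary(stim, p_green):
        """Fit a logistic psychometric function; return its 50% point (the boundary)."""
        popt, _ = curve_fit(logistic, stim, p_green, p0=[np.median(stim), 1.0])
        return popt[0]

    # Hypothetical data: 11 colour singletons ordered from blue (0) to green (10).
    stim = np.arange(11)
    p_green = np.array([0.02, 0.03, 0.05, 0.10, 0.25, 0.55, 0.80, 0.92, 0.97, 0.99, 1.00])
    boundary = estimate_boundary(stim, p_green)   # stimulus value of the blue-green boundary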
 

 
Author: Souhail Bakkali; Sanket Biswas; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades; Josep Llados
Title: TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language
Type: Miscellaneous
Year: 2023
Publication: Arxiv
Abstract: The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, these strategies rely on an extensive amount of document data to learn their pretext objectives in a "pre-train-then-fine-tune" paradigm and thus suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on OCR engines to extract local positional information within a document page, which hinders the model's generalizability, flexibility and robustness because global information within the document image is not captured. We introduce TransferDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives. TransferDoc learns richer semantic concepts by unifying language and visual representations, which enables the production of more transferable models. In addition, two novel downstream tasks are introduced for a "closer-to-real" industrial evaluation scenario, in which TransferDoc outperforms other state-of-the-art approaches.
Notes: DAG
Approved: no
Call Number: Admin @ si @ BBM2023
Serial: 3995
 

 
Author: Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
Title: VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Type: Journal Article
Year: 2023
Publication: Pattern Recognition
Abbreviated Journal: PR
Volume: 139
Pages: 109419
Abstract: Multimodal learning from document data has achieved great success lately, as it allows semantically meaningful features to be pre-trained as a prior for a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.
ISSN: 0031-3203
Notes: DAG; 600.140; 600.121
Approved: no
Call Number: Admin @ si @ BMC2023
Serial: 3826
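The contrastive objective sketched in the abstract, contracting positive vision-language pairs while contrasting negatives in a common space, belongs to the InfoNCE family of losses. Below is a minimal, hedged NumPy sketch of a symmetric loss of that kind; the temperature value and the function name are assumptions, and this is not the paper's actual objective or code.

    import numpy as np

    def info_nce_cross_modal(vision_emb, text_emb, temperature=0.07):
        """Symmetric InfoNCE-style loss for a batch of paired embeddings.

        vision_emb, text_emb: (batch, dim) arrays; row i of each forms one
        matched pair. Matched pairs are pulled together (positives), every
        other pairing in the batch is pushed apart (negatives).
        """
        v = vision_emb / np.linalg.norm(vision_emb, axis=1, keepdims=True)
        t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
        logits = v @ t.T / temperature          # (batch, batch) scaled cosine similarities
        idx = np.arange(len(v))                 # positives sit on the diagonal

        def nll_of_diagonal(rows):
            rows = rows - rows.max(axis=1, keepdims=True)               # numerical stability
            log_probs = rows - np.log(np.exp(rows).sum(axis=1, keepdims=True))
            return -log_probs[idx, idx].mean()

        # Average the vision-to-text and text-to-vision directions.
        return 0.5 * (nll_of_diagonal(logits) + nll_of_diagonal(logits.T))

In practice, separate intra-modality and inter-modality versions of such a term would be combined, as the abstract describes.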
 

 
Author: Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar
Title: Watching the News: Towards VideoQA Models that can Read
Type: Conference Article
Year: 2023
Publication: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Abstract: Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. Current VideoQA approaches ignore the textual information present in the video. We argue that textual information is complementary to the action and provides essential contextualisation cues to the reasoning process. To this end, we propose a novel VideoQA task that requires reading and understanding the text in the video. To explore this direction, we focus on news videos and require QA systems to comprehend and answer questions about the topics presented by combining visual and textual cues in the video. We introduce the "NewsVideoQA" dataset that comprises more than 8,600 QA pairs on 3,000+ news videos obtained from diverse news channels from around the world. We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.
Address: Waikoloa, Hawaii, USA; January 2023
Conference: WACV
Notes: DAG
Approved: no
Call Number: Admin @ si @ JMK2023
Serial: 3899
 

 
Author: Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar
Title: Understanding Video Scenes Through Text: Insights from Text-Based Video Question Answering
Type: Conference Article
Year: 2023
Publication: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops
Abstract: Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively. Particularly, comprehending text in videos holds great significance, requiring both scene text understanding and temporal reasoning. This paper focuses on exploring two recently introduced datasets, NewsVideoQA and M4-ViteVQA, which aim to address video question answering based on textual content. The NewsVideoQA dataset contains question-answer pairs related to the text in news videos, while M4-ViteVQA comprises question-answer pairs from diverse categories like vlogging, traveling, and shopping. We provide an analysis of the formulation of these datasets on various levels, exploring the degree of visual understanding and multi-frame comprehension required for answering the questions. Additionally, the study includes experimentation with BERT-QA, a text-only model, which demonstrates comparable performance to the original methods on both datasets, indicating the shortcomings in the formulation of these datasets. Furthermore, we also look into the domain adaptation aspect by examining the effectiveness of training on M4-ViteVQA and evaluating on NewsVideoQA and vice-versa, thereby shedding light on the challenges and potential benefits of out-of-domain training.
Address: Paris, France; October 2023
Conference: ICCVW
Notes: DAG
Approved: no
Call Number: Admin @ si @ JMK2023
Serial: 3946
 

 
Author: Sounak Dey
Title: Mapping between Images and Conceptual Spaces: Sketch-based Image Retrieval
Type: Book Whole
Year: 2020
Publication: PhD Thesis, Universitat Autonoma de Barcelona-CVC
Abstract: This thesis presents several contributions to the literature of sketch-based image retrieval (SBIR). In SBIR the first challenge we face is how to map two different domains to a common space for effective retrieval of images, while tackling the different levels of abstraction people use to express their notion of the objects around them while sketching. To this end we first propose a cross-modal learning framework that maps both sketches and text into a joint embedding space invariant to depictive style, while preserving semantics. We have also investigated the different query types possible to encompass people's dilemma in sketching certain world objects. For this we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities into a common embedding space, which is then further aligned with the image feature space. This permits encoding the object-based features and their alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set.

Finally, we explore the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We advance prior art by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognises two important yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap between amateur sketch and photo, and (ii) the necessity of moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended. In this dissertation we also pave the path to future directions of research in this domain.
Thesis: Ph.D. thesis
Publisher: Ediciones Graficas Rey
Editor: Josep Llados; Umapada Pal
ISBN: 978-84-121011-8-8
Notes: DAG; 600.121
Approved: no
Call Number: Admin @ si @ Dey20
Serial: 3480
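The joint-embedding retrieval idea running through the thesis abstract can be illustrated with a tiny ranking sketch: once sketches (or sketch+text queries) and photos are mapped into a common space, retrieval reduces to nearest-neighbour search by cosine similarity. The encoders are assumed to exist already; every name used here is a hypothetical placeholder, not code from the thesis.

    import numpy as np

    def rank_gallery(query_emb, gallery_embs):
        """Rank gallery images by cosine similarity to a query embedding.

        query_emb:    (dim,) embedding of the sketch (or sketch+text) query,
                      produced by some cross-modal encoder (assumed given).
        gallery_embs: (n_images, dim) photo embeddings in the same joint space.
        Returns gallery indices sorted from most to least similar.
        """
        q = query_emb / np.linalg.norm(query_emb)
        g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
        return np.argsort(-(g @ q))

    # Hypothetical usage with random stand-ins for the learned encoders.
    rng = np.random.default_rng(0)
    query = rng.normal(size=128)             # in practice: embed_sketch(sketch) or a fused query
    gallery = rng.normal(size=(1000, 128))   # in practice: embed_image(photo) for each photo
    top10 = rank_gallery(query, gallery)[:10]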