|
Records |
Links |
|
Author |
Sergio Escalera; Petia Radeva; Jordi Vitria; Xavier Baro; Bogdan Raducanu |
|
|
Title |
Modelling and Analyzing Multimodal Dyadic Interactions Using Social Networks |
Type |
Conference Article |
|
Year |
2010 |
Publication |
12th International Conference on Multimodal Interfaces and 7th Workshop on Machine Learning for Multimodal Interaction |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Social interaction; Multimodal fusion; Influence model; Social network analysis |
|
|
Abstract |
Social network analysis has become a common technique for modelling and quantifying the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. First, speech detection is performed through an audio/visual fusion scheme based on stacked sequential learning. In the audio domain, speech is detected through clustering of audio features. Clusters are modelled by means of a one-state Hidden Markov Model containing a diagonal-covariance Gaussian Mixture Model. In the visual domain, speech detection is performed through differential-based feature extraction from the segmented mouth region and a dynamic programming matching procedure. Second, in order to model the dyadic interactions, we employ the Influence Model, whose states encode the previously integrated audio/visual data. Third, the social network is extracted based on the estimated influences. For our study, we used a set of videos from the New York Times’ Blogging Heads opinion blog. The results are reported in terms of both the accuracy of the audio/visual data fusion and the centrality measures used to characterize the social network. |
|
|
Address |
Beijing (China) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICMI-MLI |
|
|
Notes |
OR;MILAB;HUPBA;MV |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ ERV2010 |
Serial |
1427 |
|
Permanent link to this record |
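The abstract above reports centrality measures on a network built from estimated influences, without naming which measures were used. As a minimal sketch, assuming a weighted directed graph stored as a square matrix where entry [i][j] is the estimated influence of person i on person j, weighted degree centrality could be computed as follows. The matrix values and the function name are hypothetical and not taken from the paper.

```python
def degree_centrality(influence):
    """Return (out_strength, in_strength) for a weighted directed graph.

    influence[i][j] is assumed to be the influence of node i on node j.
    Self-influence on the diagonal is ignored.
    """
    n = len(influence)
    out_strength = [
        sum(influence[i][j] for j in range(n) if j != i) for i in range(n)
    ]
    in_strength = [
        sum(influence[j][i] for j in range(n) if j != i) for i in range(n)
    ]
    return out_strength, in_strength


# Hypothetical influence values for three speakers.
example = [
    [0.0, 0.8, 0.1],
    [0.2, 0.0, 0.3],
    [0.5, 0.4, 0.0],
]
out_strength, in_strength = degree_centrality(example)
```

A high out-strength marks a speaker who influences others strongly, while a high in-strength marks one who is strongly influenced.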
|
|
|
|
Author |
Sergio Escalera; Jordi Gonzalez; Xavier Baro; Miguel Reyes; Oscar Lopes; Isabelle Guyon; V. Athitsos; Hugo Jair Escalante |
|
|
Title |
Multi-modal Gesture Recognition Challenge 2013: Dataset and Results |
Type |
Conference Article |
|
Year |
2013 |
Publication |
15th ACM International Conference on Multimodal Interaction |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
445-452 |
|
|
Keywords |
|
|
|
Abstract |
The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of the visual cues involved (e.g. finger and lip movements, subtle facial expressions, body pose, etc.), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. In order to promote research advances in this field, we organized a challenge on multi-modal gesture recognition. We made available a large video database of 13,858 gestures from a lexicon of 20 Italian gesture categories recorded with a Kinect™ camera, providing the audio, skeletal model, user mask, RGB and depth images. The focus of the challenge was on user-independent multiple gesture learning. There are no resting positions and the gestures are performed in continuous sequences lasting 1-2 minutes, containing between 8 and 20 gesture instances in each sequence. As a result, the dataset contains around 1,720,800 frames. In addition to the 20 main gesture categories, ‘distracter’ gestures are included, meaning that additional audio and gestures out of the vocabulary are included. The final evaluation of the challenge was defined in terms of the Levenshtein edit distance, where the goal was to indicate the real order of gestures within the sequence. 54 international teams participated in the challenge, and outstanding results were obtained by the first-ranked participants. |
|
|
Address |
Sydney; Australia; December 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4503-2129-7 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICMI |
|
|
Notes |
HUPBA; ISE; 600.063;MV |
Approved |
no |
|
|
Call Number |
Admin @ si @ EGB2013 |
Serial |
2373 |
|
Permanent link to this record |
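The abstract above evaluates predictions with the Levenshtein edit distance between the predicted and the true order of gestures. A minimal sketch of that distance over label sequences is given below. The gesture labels are hypothetical, and any normalization the challenge applied to the raw distance is not stated in the record, so none is applied here.

```python
def levenshtein(predicted, truth):
    """Minimum number of insertions, deletions and substitutions
    needed to turn the sequence `predicted` into `truth`."""
    # previous[j] holds the distance between the processed prefix of
    # `predicted` and the first j items of `truth`.
    previous = list(range(len(truth) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if p == t else 1
            current.append(
                min(
                    previous[j] + 1,        # delete p
                    current[j - 1] + 1,     # insert t
                    previous[j - 1] + cost, # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical gesture labels: one gesture missed by the recognizer.
distance = levenshtein([3, 7, 12], [3, 7, 5, 12])
```

Because the distance works on any sequences of comparable items, the same function applies to integer gesture identifiers or to strings.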
|
|
|
|
Author |
David Vazquez; Antonio Lopez; Daniel Ponsa; Javier Marin |
|
|
Title |
Virtual Worlds and Active Learning for Human Detection |
Type |
Conference Article |
|
Year |
2011 |
Publication |
13th International Conference on Multimodal Interaction |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
393-400 |
|
|
Keywords |
Pedestrian Detection; Human detection; Virtual; Domain Adaptation; Active Learning |
|
|
Abstract |
Image-based human detection is of paramount interest due to its potential applications in fields such as advanced driving assistance, surveillance and media analysis. However, even detecting non-occluded standing humans remains a challenge of intensive research. The most promising human detectors rely on classifiers developed in the discriminative paradigm, i.e., trained with labelled samples. However, labelling is a manually intensive step, especially in cases like human detection where it is necessary to provide at least bounding boxes framing the humans for training. To overcome this problem, some authors have proposed the use of a virtual world where the labels of the different objects are obtained automatically. This means that the human models (classifiers) are learnt using the appearance of rendered images, i.e., using realistic computer graphics. Later, these models are used for human detection in images of the real world. The results of this technique are surprisingly good. However, they are not always as good as those of the classical approach of training and testing with data coming from the same camera, or similar ones. Accordingly, in this paper we address the challenge of using a virtual world for gathering (while playing a videogame) a large amount of automatically labelled samples (virtual humans and background) and then training a classifier that performs as well, in real-world images, as the one obtained by equally training from manually labelled real-world samples. To do so, we cast the problem as one of domain adaptation, assuming that a small amount of manually labelled samples from real-world images is required. To collect these labelled samples we propose a non-standard active learning technique. Therefore, ultimately our human model is learnt by the combination of virtual and real-world labelled samples (Fig. 1), which has not been done before. We present quantitative results showing that this approach is valid. |
|
|
Address |
Alicante, Spain |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
ACM DL |
Place of Publication |
New York, NY, USA |
Editor |
|
|
|
Language |
English |
Summary Language |
English |
Original Title |
Virtual Worlds and Active Learning for Human Detection |
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4503-0641-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICMI |
|
|
Notes |
ADAS |
Approved |
yes |
|
|
Call Number |
ADAS @ adas @ VLP2011a |
Serial |
1683 |
|
Permanent link to this record |
|
|
|
|
Author |
Victor Ponce; Sergio Escalera; Xavier Baro |
|
|
Title |
Multi-modal Social Signal Analysis for Predicting Agreement in Conversation Settings |
Type |
Conference Article |
|
Year |
2013 |
Publication |
15th ACM International Conference on Multimodal Interaction |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
495-502 |
|
|
Keywords |
|
|
|
Abstract |
In this paper we present a non-invasive ambient intelligence framework for the analysis of non-verbal communication applied to conversational settings. In particular, we apply feature extraction techniques to multi-modal audio-RGB-depth data. We compute a set of behavioral indicators that define communicative cues coming from the fields of psychology and observational methodology. We test our methodology on data captured in victim-offender mediation scenarios. Using different state-of-the-art classification approaches, our system achieves about 75% recognition when predicting agreement among the parties involved in the conversations, using the experts' opinions as ground truth. |
|
|
Address |
Sydney; Australia; December 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4503-2129-7 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICMI |
|
|
Notes |
HuPBA;MV |
Approved |
no |
|
|
Call Number |
Admin @ si @ PEB2013 |
Serial |
2488 |
|
Permanent link to this record |
|
|
|
|
Author |
Ruth Aylett; Ginevra Castellano; Bogdan Raducanu; Ana Paiva; Marc Hanheide |
|
|
Title |
Long-term socially perceptive and interactive robot companions: challenges and future perspectives |
Type |
Conference Article |
|
Year |
2011 |
Publication |
13th International Conference on Multimodal Interaction |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
323-326 |
|
|
Keywords |
human-robot interaction; multimodal interaction; social robotics |
|
|
Abstract |
This paper gives a brief overview of the challenges for multi-modal perception and generation applied to robot companions located in human social environments. It reviews the current position in both perception and generation and the immediate technical challenges, and goes on to consider the extra issues raised by embodiment and social context. Finally, it briefly discusses the impact of systems that must function continually over months rather than just for a few hours. |
|
|
Address |
Alicante |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
ACM |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4503-0641-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICMI |
|
|
Notes |
OR;MV |
Approved |
no |
|
|
Call Number |
Admin @ si @ ACR2011 |
Serial |
1888 |
|
Permanent link to this record |
|
|
|
|
Author |
Javier M. Olaso; Alain Vazquez; Leila Ben Letaifa; Mikel de Velasco; Aymen Mtibaa; Mohamed Amine Hmani; Dijana Petrovska-Delacretaz; Gerard Chollet; Cesar Montenegro; Asier Lopez-Zorrilla; Raquel Justo; Roberto Santana; Jofre Tenorio-Laranga; Eduardo Gonzalez-Fraile; Begoña Fernandez-Ruanova; Gennaro Cordasco; Anna Esposito; Kristin Beck Gjellesvik; Anna Torp Johansen; Maria Stylianou Kornes; Colin Pickard; Cornelius Glackin; Gary Cahalane; Pau Buch; Cristina Palmero; Sergio Escalera; Olga Gordeeva; Olivier Deroo; Anaïs Fernandez; Daria Kyslitska; Jose Antonio Lozano; Maria Ines Torres; Stephan Schlogl |
|
|
Title |
The EMPATHIC Virtual Coach: a demo |
Type |
Conference Article |
|
Year |
2021 |
Publication |
23rd ACM International Conference on Multimodal Interaction |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
848-851 |
|
|
Keywords |
|
|
|
Abstract |
The main objective of the EMPATHIC project has been the design and development of a virtual coach to engage the healthy-senior user and to enhance well-being through awareness of personal status. The EMPATHIC approach addresses this objective through multimodal interactions supported by the GROW coaching model. The paper summarizes the main components of the EMPATHIC Virtual Coach (EMPATHIC-VC) and introduces a demonstration of the coaching sessions in selected scenarios. |
|
|
Address |
Virtual; October 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICMI |
|
|
Notes |
HUPBA; no proj |
Approved |
no |
|
|
Call Number |
Admin @ si @ OVB2021 |
Serial |
3644 |
|
Permanent link to this record |
|
|
|
|
Author |
Fadi Dornaika; Bogdan Raducanu |
|
|
Title |
Constructing Panoramic Views Through Facial Gaze Tracking |
Type |
Conference Article |
|
Year |
2008 |
Publication |
IEEE International Conference on Multimedia and Expo |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
969–972 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Hannover (Germany) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICME |
|
|
Notes |
OR;MV |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ DoR2008b |
Serial |
983 |
|
Permanent link to this record |
|
|
|
|
Author |
Xavier Baro; Sergio Escalera; Petia Radeva; Jordi Vitria |
|
|
Title |
Visual Content Layer for Scalable Recognition in Urban Image Databases, Internet Multimedia Search and Mining |
Type |
Conference Article |
|
Year |
2009 |
Publication |
10th IEEE International Conference on Multimedia and Expo |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1616–1619 |
|
|
Keywords |
|
|
|
Abstract |
Rich online map interaction is a useful tool for obtaining multimedia information related to physical places. With this type of system, users can automatically compute the optimal route for a trip or look for entertainment places or hotels near their actual position. Standard maps are defined as a fusion of layers, where each one contains specific data such as height, streets, or a particular business location. In this paper we propose the construction of a visual content layer which describes the visual appearance of geographic locations in a city. We captured, by means of a Mobile Mapping system, a huge set of georeferenced images (> 500K) which cover the whole city of Barcelona. For each image, hundreds of region descriptions are computed off-line and described as a hash code. This allows an efficient and scalable way of accessing maps by visual content. |
|
|
Address |
New York (USA) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4244-4291-1 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICME |
|
|
Notes |
OR;MILAB;HuPBA;MV |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ BER2009 |
Serial |
1189 |
|
Permanent link to this record |
|
|
|
|
Author |
D. Jayagopi; Bogdan Raducanu; D. Gatica-Perez |
|
|
Title |
Characterizing conversational group dynamics using nonverbal behaviour |
Type |
Conference Article |
|
Year |
2009 |
Publication |
10th IEEE International Conference on Multimedia and Expo |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
370–373 |
|
|
Keywords |
|
|
|
Abstract |
This paper addresses the novel problem of characterizing conversational group dynamics. It is well documented in social psychology that, depending on the objectives of a group, the dynamics are different. For example, a competitive meeting has a different objective from that of a collaborative meeting. We propose a method to characterize group dynamics based on the joint description of the group members' aggregated acoustical nonverbal behaviour to classify two meeting datasets (one being cooperative-type and the other being competitive-type). We use 4.5 hours of real behavioural multi-party data and show that our methodology can achieve a classification rate of up to 100%. |
|
|
Address |
New York, USA |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1945-7871 |
ISBN |
978-1-4244-4290-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICME |
|
|
Notes |
OR;MV |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ JRG2009 |
Serial |
1217 |
|
Permanent link to this record |
|
|
|
|
Author |
H. Emrah Tasli; Cevahir Çigla; Theo Gevers; A. Aydin Alatan |
|
|
Title |
Super pixel extraction via convexity induced boundary adaptation |
Type |
Conference Article |
|
Year |
2013 |
Publication |
14th IEEE International Conference on Multimedia and Expo |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1-6 |
|
|
Keywords |
|
|
|
Abstract |
This study presents an efficient super-pixel extraction algorithm with major contributions to the state-of-the-art in terms of accuracy and computational complexity. Segmentation accuracy is improved through the use of a convexity-constrained geodesic distance, while computational efficiency is achieved by replacing complete region processing with a boundary adaptation idea. Starting from uniformly distributed, rectangular, equal-sized super-pixels, region boundaries are adapted to intensity edges iteratively by assigning boundary pixels to the most similar neighboring super-pixels. At each iteration, super-pixel regions are updated and hence progressively converge to compact pixel groups. Experimental results with state-of-the-art comparisons validate the performance of the proposed technique in terms of both accuracy and speed. |
|
|
Address |
San Jose; USA; July 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1945-7871 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICME |
|
|
Notes |
ALTRES;ISE |
Approved |
no |
|
|
Call Number |
Admin @ si @ TÇG2013 |
Serial |
2367 |
|
Permanent link to this record |
|
|
|
|
Author |
Jaime Moreno; Xavier Otazu |
|
|
Title |
Image compression algorithm based on Hilbert scanning of embedded quadTrees: an introduction of the Hi-SET coder |
Type |
Conference Article |
|
Year |
2011 |
Publication |
IEEE International Conference on Multimedia and Expo |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1-6 |
|
|
Keywords |
|
|
|
Abstract |
In this work we present an effective and computationally simple algorithm for image compression based on Hilbert Scanning of Embedded quadTrees (Hi-SET). It allows an image to be represented as an embedded bitstream along a fractal function. Embedding is an important feature of modern image compression algorithms; in this regard, Salomon [1, p. 614] notes that another feature, and perhaps a unique one, is achieving the best quality for the number of bits input by the decoder at any point during decoding. Hi-SET also possesses this latter feature. Furthermore, the coder is based on a quadtree partition strategy which, applied to image transformation structures such as the discrete cosine or wavelet transform, yields energy clustering in both frequency and space. The coding algorithm is composed of three general steps, using just a list of significant pixels. The implementation of the proposed coder is developed for gray-scale and color image compression. Hi-SET compressed images are, on average, 6.20 dB better than the ones obtained by other compression techniques based on Hilbert scanning. Moreover, Hi-SET improves the image quality by 1.39 dB and 1.00 dB in gray-scale and color compression, respectively, when compared with the JPEG2000 coder. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1945-7871 |
ISBN |
978-1-61284-348-3 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICME |
|
|
Notes |
CIC |
Approved |
no |
|
|
Call Number |
Admin @ si @ MoO2011a |
Serial |
2176 |
|
Permanent link to this record |
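The abstract above orders image data along a Hilbert curve, a fractal path that visits every cell of a square grid while keeping consecutive cells adjacent. The sketch below shows only that scanning step, using the widely known index-to-coordinate conversion for grids whose side is a power of two. It is not the Hi-SET coder itself, and the function names are hypothetical.

```python
def hilbert_d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(image):
    """Return the values of a square image (list of rows) in Hilbert order."""
    n = len(image)
    coords = (hilbert_d2xy(n, d) for d in range(n * n))
    return [image[y][x] for x, y in coords]


# A 2 x 2 example: the scan walks down, right, then up.
scanned = hilbert_scan([[1, 2], [3, 4]])
```

Since neighbouring positions on the curve are neighbouring pixels, the one-dimensional sequence tends to keep spatially related values close together.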
|
|
|
|
Author |
Marc Bolaños; R. Mestre; Estefania Talavera; Xavier Giro; Petia Radeva |
|
|
Title |
Visual Summary of Egocentric Photostreams by Representative Keyframes |
Type |
Conference Article |
|
Year |
2015 |
Publication |
IEEE International Conference on Multimedia and Expo ICMEW2015 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1-6 |
|
|
Keywords |
egocentric; lifelogging; summarization; keyframes |
|
|
Abstract |
Building a visual summary from an egocentric photostream captured by a lifelogging wearable camera is of high interest for different applications (e.g. memory reinforcement). In this paper, we propose a new summarization method based on keyframe selection that uses visual features extracted by means of a convolutional neural network. Our method applies unsupervised clustering to divide the photostreams into events, and finally extracts the most relevant keyframe for each event. We assess the results by applying a blind-taste test on a group of 20 people who assessed the quality of the summaries. |
|
|
Address |
Torino; Italy; July 2015 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4799-7079-7 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICME |
|
|
Notes |
MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ BMT2015 |
Serial |
2638 |
|
Permanent link to this record |
|
|
|
|
Author |
Adriana Romero; Nicolas Ballas; Samira Ebrahimi Kahou; Antoine Chassang; Carlo Gatta; Yoshua Bengio |
|
|
Title |
FitNets: Hints for Thin Deep Nets |
Type |
Conference Article |
|
Year |
2015 |
Publication |
3rd International Conference on Learning Representations ICLR2015 |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Computer Science - Learning; Computer Science - Neural and Evolutionary Computing |
|
|
Abstract |
While depth tends to improve network performance, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student intermediate hidden layer will generally be smaller than the teacher's intermediate hidden layer, additional parameters are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This allows one to train deeper students that can generalize better or run faster, a trade-off that is controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with almost 10.4 times fewer parameters outperforms a larger, state-of-the-art teacher network. |
|
|
Address |
San Diego; CA; May 2015 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICLR |
|
|
Notes |
MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ RBK2015 |
Serial |
2593 |
|
Permanent link to this record |
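The abstract above describes mapping a smaller student hidden layer to a prediction of the teacher's hidden layer through additional parameters. A minimal sketch of such a hint term is shown below, assuming a plain linear regressor and a squared-error penalty. Both choices, along with the function name and the example values, are illustrative assumptions and are not stated in the record.

```python
def hint_loss(student_hidden, teacher_hidden, regressor):
    """Half the squared L2 distance between the teacher's hidden
    activations and the student's activations mapped by `regressor`.

    regressor is a list of rows, one per teacher unit, each with one
    weight per student unit (an assumed linear mapping).
    """
    mapped = [
        sum(w * s for w, s in zip(row, student_hidden)) for row in regressor
    ]
    return 0.5 * sum((t - m) ** 2 for t, m in zip(teacher_hidden, mapped))


# Hypothetical activations: a 2-unit student layer and a 3-unit teacher layer.
student = [1.0, 2.0]
teacher = [1.0, 0.0, 3.0]
regressor = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = hint_loss(student, teacher, regressor)
```

Minimizing this term with respect to both the regressor and the student's lower layers pushes the student's intermediate representation towards one from which the teacher's can be predicted.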
|
|
|
|
Author |
Marc Masana; Joost Van de Weijer; Andrew Bagdanov |
|
|
Title |
On-the-fly Network pruning for object detection |
Type |
Conference Article |
|
Year |
2016 |
Publication |
International conference on learning representations |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Object detection with deep neural networks is often performed by passing a few thousand candidate bounding boxes through a deep neural network for each image. These bounding boxes are highly correlated since they originate from the same image. In this paper we investigate how to exploit feature occurrence at the image scale to prune the neural network which is subsequently applied to all bounding boxes. We show that removing units which have near-zero activation in the image allows us to significantly reduce the number of parameters in the network. Results on the PASCAL 2007 Object Detection Challenge demonstrate that up to 40% of units in some fully-connected layers can be entirely eliminated with little change in the detection result. |
|
|
Address |
Puerto Rico; May 2016 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICLR |
|
|
Notes |
LAMP; 600.068; 600.106; 600.079 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MWB2016 |
Serial |
2758 |
|
Permanent link to this record |
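The abstract above removes units with near-zero activation at the image scale before the network is applied to every bounding box. The sketch below illustrates that idea for one fully-connected layer, assuming the image-level activation of each unit is already available as a list. The threshold value, the function names and the way activations are obtained are assumptions for illustration only.

```python
def active_units(image_activations, threshold=1e-3):
    """Indices of units whose image-level activation is not near zero."""
    return [k for k, a in enumerate(image_activations) if abs(a) > threshold]


def prune_layer(weights, biases, next_weights, keep):
    """Drop pruned units from a layer and from the layer that consumes it.

    weights:      one row of input weights per unit in this layer
    biases:       one bias per unit in this layer
    next_weights: one row per unit of the next layer, one column per
                  unit of this layer
    keep:         indices of units to retain
    """
    pruned_weights = [weights[k] for k in keep]
    pruned_biases = [biases[k] for k in keep]
    pruned_next = [[row[k] for k in keep] for row in next_weights]
    return pruned_weights, pruned_biases, pruned_next


# Hypothetical layer with four units, two of which are inactive for this image.
keep = active_units([0.0, 2.5, 0.0004, 1.1])
w, b, nxt = prune_layer(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    [0.1, 0.2, 0.3, 0.4],
    [[9, 8, 7, 6]],
    keep,
)
```

The pruned layer is computed once per image and then reused for all candidate boxes, which is where the saving would come from.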
|
|
|
|
Author |
Ishaan Gulrajani; Kundan Kumar; Faruk Ahmed; Adrien Ali Taiga; Francesco Visin; David Vazquez; Aaron Courville |
|
|
Title |
PixelVAE: A Latent Variable Model for Natural Images |
Type |
Conference Article |
|
Year |
2017 |
Publication |
5th International Conference on Learning Representations |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
Deep Learning; Unsupervised Learning |
|
|
Abstract |
Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and generate samples that preserve global structure but tend to suffer from image blurriness. PixelCNNs model sharp contours and details very well, but lack an explicit latent representation and have difficulty modeling large-scale structure in a computationally efficient way. In this paper, we present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. The resulting architecture achieves state-of-the-art log-likelihood on binarized MNIST. We extend PixelVAE to a hierarchy of multiple latent variables at different scales; this hierarchical model achieves competitive likelihood on 64x64 ImageNet and generates high-quality samples on LSUN bedrooms. |
|
|
Address |
Toulon; France; April 2017 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICLR |
|
|
Notes |
ADAS; 600.085; 600.076; 601.281; 600.118 |
Approved |
no |
|
|
Call Number |
ADAS @ adas @ GKA2017 |
Serial |
2815 |
|
Permanent link to this record |