
Seminars (since July 2007)


Date & Time
2016/12/19 16:00-17:30
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Heng Tao Shen (The University of Queensland, Australia)
Title
Hashing Big Multimedia Data
Abstract
Real-time access and analysis of big multimedia data has become critical to many research problems and practical applications, ranging from search and recognition to classification and understanding. It has been shown that heterogeneous multimedia data gathered from different sources in different media types can often be correlated and linked to the same knowledge space. In this talk, we will discuss the phenomena of scale, heterogeneity and linkage of big multimedia data. In particular, I will introduce some recent progress on hashing multimedia data for efficient search and classification.

Date & Time
2016/12/19 14:30-16:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Shane Gu (Cambridge University, UK and Max Planck Institute for Intelligent Systems, Germany)
Title
Sample-Efficient and Stable Deep Reinforcement Learning for Robotics
Abstract
Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is the high sample complexity of such methods. We present two independent lines of work to address this fundamental problem. In the first part, we explore how off-policy deep RL methods based on normalized advantage functions (NAF) can learn real-world robotic manipulation skills, with multiple robots simultaneously pooling their experiences. Our results show that we can obtain faster training and, in some cases, converge to a better solution when training on multiple robots, and we show that we can learn a real-world door opening skill with deep neural network policies using about 2.5 hours of total training time with two robots. In the second part, we present Q-Prop, a novel model-free method that combines the stability of unbiased policy gradients with the efficiency of off-policy RL. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym’s MuJoCo continuous control environments.

Date & Time
2016/11/21 14:15-15:45
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Eric Xing (CMU, USA)
Title
Strategies & Principles for Distributed Machine Learning
Abstract
The rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions) thereupon. In order to run ML algorithms at such scales, on a distributed cluster with 10s to 1000s of machines, it is often the case that significant engineering efforts are required --- and one might fairly ask if such engineering truly falls within the domain of ML research or not. Taking the view that Big ML systems can indeed benefit greatly from ML-rooted statistical and algorithmic insights --- and that ML researchers should therefore not shy away from such systems design --- we discuss a series of principles and strategies distilled from our recent effort on industrial-scale ML solutions that involve a continuum from application, to engineering, and to theoretical research and development of Big ML systems and architectures, on how to make them efficient, general, and with convergence and scaling guarantees. These principles concern four key questions which traditionally receive little attention in ML research: How to distribute an ML program over a cluster? How to bridge ML computation with inter-machine communication? How to perform such communication? What should be communicated between machines? By exposing underlying statistical and algorithmic characteristics unique to ML programs but not typical in traditional computer programs, and by dissecting successful cases of how we harness these principles to design both high-performance distributed ML software and general-purpose ML frameworks, we present opportunities for ML researchers and practitioners to further shape and grow the area that lies between ML and systems.

Date & Time
2016/11/09 13:00-14:30
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Parag Rastogi (EPFL, Switzerland)
Title
Machine Learning for Sustainable Building Design: A Building Engineer's Perspective on Machine Learning
Abstract
For the engineering community at large, the promise of artificial intelligence and machine learning is vast and exciting. The potential applications will be both transformative and disruptive for science, engineering, and society in general. The flip side is that taking workable machines and algorithms from the laboratory to application is challenging (to put it mildly). Mathematical elegance, beautiful as it is in its own right, is usually no substitute for application success. In other words, if your algorithm/method does not work and/or I cannot easily use it, I do not care how elegant the derivation is.
In this talk, I will introduce the application I am currently working on: supporting architects and engineers in the design of sustainable climate-resilient buildings. After this initial background and context, I will discuss the expectations from machine learning experts, from the point of view of a user. I will conclude with some of the challenges and frustrations I have experienced in my short experience with using simple machine learning techniques and collaborating with experts.

Date & Time
2016/10/31 14:30-16:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Mohammad Emtiyaz Khan (RIKEN, Japan)
Title
Approximate Bayesian Inference: Bringing Statistics, Optimization, Machine Learning, and AI together
Abstract
Machine learning relies heavily on data to design computers that can learn autonomously, but dealing with noisy, unreliable, heterogeneous, high-dimensional, and missing data is a big challenge in itself. Surprisingly, living beings - even young ones - are very good at dealing with such data. This raises the question: how do they do it, and how can we design computers that can learn like them?
Bayesian methods are promising in answering such questions, but they are computationally challenging, especially when data are large and models are complex. In this talk, I will start by showing a few example applications where this is the case. I will then discuss my work which solves many computational challenges associated with Bayesian methods by converting the "Bayesian integration" problem into an optimization problem. I will outline some of my future plans to design linear-time algorithms for Bayesian inference. Overall, I will argue that, by combining ideas from statistics, optimization, machine learning, and artificial intelligence, we might be able to design computers that can learn autonomously, just like us.

Date & Time
2016/10/06 14:30-16:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Takayuki Osa (Technical University Darmstadt, Germany)
Title
Learning multiple grasping policies and its application to nuclear sort and segregation
Abstract
The costs of nuclear decommissioning are enormous, and many countries are now facing this problem. Recently, the EU launched projects to improve the efficacy of nuclear decommissioning by developing robotic systems to assist nuclear operators. One crucial challenge in nuclear waste manipulation is grasping. To deal with various objects in nuclear waste, it is essential to learn multiple grasping policies and generalize them to unseen objects. To address this problem, we developed a framework for hierarchical reinforcement learning. The lower-level policies learn multiple grasp types, and the upper-level policy learns to select from the learned grasp types according to a point cloud of a new object. We verified experimentally that our framework learns multiple grasping policies and generalizes the learned grasps by using local point cloud information.

Date & Time
2016/08/04 15:00-16:30
Venue
Faculty of Science Chemistry Bldg. East, Room 236
Speaker
Nathan Srebro (Toyota Technological Institute at Chicago, USA)
Title
Geometry of Optimization and Generalization in Multilayer Networks
Abstract
What is it that enables learning with multi-layer networks? What causes the network to generalize well? What makes it possible to optimize the error, despite the problem being hard in the worst case? In this talk I will attempt to address these questions and relate them to one another, carrying over insights from matrix factorization, and highlighting the important role of optimization in deep learning. I will then use the insight to suggest studying novel optimization methods, and will present Path-SGD, a novel optimization approach for multi-layer ReLU networks that yields better optimization and better generalization.
Joint work with Behnam Neyshabur, Yuhuai Wu, Ryota Tomioka and Russ Salakhutdinov.

Date & Time
2016/08/03 14:30-16:00
Venue
Faculty of Science Chemistry Bldg. East, Room 236
Speaker
Matthew Holland (Nara Institute of Science and Technology, Japan)
Title
Stable learning: big gains through simple re-coding
Abstract
This talk will revolve around the following notion: the possibility of making simple, principled modifications to common learning algorithms, in such a way that dramatic performance improvements (both in formal guarantees and in practice) are obtained at a tolerable cost. An elementary example is the task of estimating the population mean given a finite real-valued sample. The sample mean does enjoy a form of minimax optimality, but in terms of high-confidence bounds on the error, one is better off "throwing away" some information. As has been well studied for over half a century, this can be done in the form of truncating observations, ignoring marginal quantiles, sub-sampling and then discarding oddities, and so forth. In fact, equally simple strategies often work for much more complicated tasks, and this shall be our focus in the first part of the talk. More concretely, we discuss how sharp performance bounds can be obtained under very weak assumptions in tasks including clustering, K-armed bandits, and high-dimensional sparse linear regression, through a relatively simple re-coding of canonical algorithms. In the second part, I will discuss some of my own related work, which centres on the analysis and application of a class of robust loss minimizers designed by re-coding the archetypal empirical risk minimizer.

Date & Time
2016/08/02 18:00-19:30
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Bin Yang (Rakuten Institute of Technology, Japan)
Title
An Introduction to Differential Privacy and Bayesian Differential Privacy
Abstract
Recently, differential privacy has become a popular privacy definition, since it provides a rigorous standard for evaluating the privacy of perturbation algorithms. It has been widely believed that differential privacy is a universal definition that deals with both independent and correlated data, and that a differentially private algorithm can protect privacy against arbitrary adversaries. However, recent research indicates that differential privacy may not guarantee privacy against arbitrary adversaries if the data are correlated.
In this talk, we focus on private perturbation algorithms for correlated data. The following three problems will be investigated: (1) the influence of data correlations on privacy; (2) the influence of adversary prior knowledge on privacy; and (3) a general perturbation algorithm that is private for prior knowledge of any subset of tuples in the data when the data are correlated. I will show our definition of privacy, called Bayesian differential privacy, by which the privacy level of a probabilistic perturbation algorithm can be evaluated even when the data are correlated and when the prior knowledge is incomplete. I will also present a Gaussian correlation model to accurately describe the structure of data correlations and analyze the Bayesian differential privacy of the perturbation algorithm on the basis of this model. Our results show that privacy is poorest for an adversary who has the least prior knowledge.

Date & Time
2016/06/10 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 202
Speaker
Stephan Zheng (California Institute of Technology, USA)
Title
Modeling long-term planning behavior using hierarchical policy networks
Abstract
In this talk, I will discuss the problem of learning to plan spatiotemporal trajectories over long time horizons using expert demonstrations. For instance, in sports, agents often choose action sequences with long-term goals in mind, such as achieving a certain strategic position. Conventional policy learning approaches, such as those based on Markov decision processes, generally fail at learning cohesive long-term behavior in such high-dimensional state spaces, and are only effective when myopic planning leads to the desired behavior. The key difficulty is that such approaches use “shallow” planners that only learn a single state-action policy. We instead propose to learn a hierarchical planner that reasons about both long-term and short-term goals, which we instantiate as a hierarchical deep memory network. We showcase our approach in a case study on learning to imitate demonstrated basketball trajectories, and show that it generates significantly more realistic trajectories compared to non-hierarchical baselines as judged by professional sports analysts. If time permits, I will also summarize my previous research in deep learning and tensor model optimization.

Date & Time
2016/04/06 13:00-14:45
Venue
Faculty of Science Bldg. 7, Room 102
Speaker
Wittawat Jitkrittum (University College London, UK)
Title
Interpretable Distribution Features with Maximum Testing Power
Abstract
Two distances on probability distributions are proposed, based on a difference between features chosen from each, where these features can be in either the spatial or Fourier domains. The features are chosen so as to maximize the distinguishability of the distributions, by optimizing an estimate of power for a statistical test using these features. The result is a parsimonious and interpretable indication of how and where two distributions differ, which can be used even in high dimensions, and when the difference is localized in the Fourier domain. It is shown that the test power estimate converges with increasing sample size, thus ensuring the quality of the returned features. In benchmark experiments, statistical tests based on these features outperform previous linear-time two-sample tests. Real-world benchmarks on text and image data demonstrate that the returned features provide a meaningful and informative indication as to how the distributions differ.

Date & Time
2016/03/22 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Le Song (Georgia Institute of Technology, USA)
Title
Understanding Deep Learning via Kernel Methods
Abstract
Nowadays, deep neural networks are the methods of choice when it comes to large-scale nonlinear learning problems. What makes deep neural networks work? Is there any general principle for tackling high-dimensional nonlinear problems which we can learn from deep neural networks? Can we design competitive or better alternatives based on such knowledge? To make progress on these questions, we have scaled up kernel methods to the regime where deep neural networks work well, using techniques such as doubly stochastic gradient descent and structured approximation to random features. These methods allow us to conduct "lesion-and-replace" experiments on existing deep learning architectures using large-scale image datasets such as ImageNet. The experimental results provide insights into three important aspects of deep learning, namely the usefulness of the fully connected layers, the importance of the compositional structures, and the advantage of the feature adaptation. Our results also point to promising directions for future research on big nonlinear models.

Date & Time
2016/02/12 10:30-12:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
David Blei (Columbia University, USA)
Title
Scaling and Generalizing Variational Inference
Abstract
Latent variable models have become a key tool for the modern statistician, letting us express complex assumptions about the hidden structures that underlie our data. Latent variable models have been successfully applied in numerous fields.
The central computational problem in latent variable modeling is posterior inference, the problem of approximating the conditional distribution of the latent variables given the observations. Posterior inference is central to both exploratory tasks and predictive tasks. Approximate posterior inference algorithms have revolutionized Bayesian statistics, revealing its potential as a usable and general-purpose language for data analysis.
Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing approximate inference algorithms do not scale well. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand.
In this talk I will discuss our recent research on addressing these two limitations. I will describe stochastic variational inference, an approximate inference algorithm for handling massive data sets. I will demonstrate its application to probabilistic topic models of text conditioned on millions of articles. Then I will discuss black box variational inference. Black box inference is a generic algorithm for approximating the posterior. We can easily apply it to many models with little model-specific derivation and few restrictions on their properties. I will demonstrate its use on longitudinal models of healthcare data and deep exponential families, and discuss a new black-box variational inference algorithm in the Stan programming language.
This is joint work based on these three papers:
M. Hoffman, D. Blei, J. Paisley, and C. Wang. Stochastic variational inference. Journal of Machine Learning Research, 14:1303-1347, 2013.
R. Ranganath, S. Gerrish, and D. Blei. Black box variational inference. Artificial Intelligence and Statistics, 2014.
A. Kucukelbir, R. Ranganath, A. Gelman, and D. Blei. Automatic variational inference in Stan. Neural Information Processing Systems, 2015.

Date & Time
2015/12/16 15:00-17:00
Venue
Faculty of Science Bldg. 7, Room 202
Speaker
Kangasrääsiö Antti (Aalto University, Finland)
Title
SciNet - Scientific Search Engine for Exploratory Search
Abstract
Exploratory search tasks are in many ways different from traditional lookup search tasks. They are often open-ended, require the user to learn while searching, and proceed in an iterative manner. Traditional search interfaces for scientific search, such as Google Scholar, are generally tailored for lookup search, and thus offer only limited support for exploration. In this talk I introduce the SciNet search engine for exploratory scientific search. The main feature of the search engine is the interface that visualizes the current search intent model to the user. The interface allows the user to make iterative improvements to the model, thus directing the search. I also discuss a few general improvements I have developed for this type of interaction between a user and a learning algorithm. A live demonstration of the system will also be available.

Date & Time
2015/11/18 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Shinichi Nakajima (Technische Universität Berlin, Germany)
Title
Efficient Exact Inference with Loss Augmented Objective in Structured Learning
Abstract
Structural SVM is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of an efficient inference algorithm: the state-of-the-art training algorithms perform inference in each iteration to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing non-decomposable objectives, which significantly extends the applicability of structural SVM. As an important application, our method covers loss-augmented inference, which enables the slack scaling formulation with a variety of dissimilarity measures, e.g. Hamming loss, precision and recall, F-beta-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate advantages of our approach in natural language parsing and sequence segmentation applications.

Date & Time
2015/11/16 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Samuel Kaski (Aalto University, Finland)
Title
Bayesian factorization of multiple data sources
Abstract
An increasingly common data analysis task is to factorize multiple data matrices together. The goal can be to borrow strength from related data sources for missing value imputation or prediction, or to find out what is shared between different sources and what is unique to each. I will discuss an extension of factor analysis to this task, group factor analysis (GFA), and its extension from the analysis of multiple coupled matrices to multiple coupled tensors and matrices. I will pick examples from molecular medicine and brain data analysis.

Date & Time
2015/10/16 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 007
Speaker
Klaus-Robert Müller (Technische Universität Berlin, Germany and Korea University, Korea)
Title
Machine Learning applications in Quantum Chemistry
Abstract
In recent years, machine learning (ML) methods have begun to play an increasingly enabling role in the sciences and in industry. Part I of the talk provides a brief introduction to machine learning; the subsequent two parts touch on explaining machine learning and, finally, ML applications in physics.
Part II: Understanding and interpreting classification decisions of automated image classification systems is of high value in many applications, as it allows one to verify the reasoning of the system and provides additional information to the human expert. Although machine learning methods very successfully solve a plethora of tasks, they have in most cases the disadvantage of acting as a black box, not providing any information about what made them arrive at a particular decision. This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers.
Part III reports on recent work, where ML is applied to the exploration of chemical compound space and materials. Here the focus will be placed on the quest for better representations of molecules and solids.

Date & Time
2015/09/07 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Daniele Calandriello (INRIA, France)
Title
Online Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning
Abstract
While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples. Recent successful and scalable methods focus on efficiently approximating the whole spectrum of the graph Laplacian constructed from the data. This is in contrast to various subsampling and quantization methods proposed in the past, which may fail to preserve the spectral structure of the graph. However, the impact of the approximation of the spectrum on the final generalization error is either unknown or requires strong assumptions on the data. In this paper, we introduce SPARSE-HFS, an efficient edge-sparsification algorithm for SSL. By constructing an edge-sparse and spectrally similar graph, we are able to leverage the approximation guarantees of spectral sparsification methods to bound the generalization error of SPARSE-HFS. As a result, we obtain a theoretically grounded approximation scheme for graph-based SSL that also empirically matches the performance of known large-scale methods.

Date & Time
2015/08/20 10:30-12:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Peter Wittek (ICFO - Institute of Photonic Sciences, Spain)
Title
Learning with Quantum Resources: The Challenges of Generalizing Classical Results
Abstract
The theory of computational learning is rich in important results: the trade-offs between sample and model complexities, no-free-lunch theorems, learning capacity and computational complexity are widely studied and understood. Over the last two decades, and especially over the last few years, several proposals have been put forward to perform machine learning with quantum resources. Advantages include quadratic or even exponential speedup, increased learning capacity, reduced sample complexity and better generalization performance. Some examples are quantum perceptrons, quantum neural networks and quantum deep learning, boosting training by Grover's search or adiabatic quantum annealing, and quantum support vector machines relying on a quantum random access memory. The methods stand a chance to be the next real-world application of quantum information processing, with several impressive experimental demonstrations. Yet the theory of quantum machine learning lags behind: we do not have a good understanding of the limits implied by the new set of constraints. In this talk, we give an introduction to the most relevant concepts in quantum mechanics and quantum information theory to understand the key research directions and highlight the most important challenges.

Date & Time
2015/08/17 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Gregor Gebhardt (Technical University Darmstadt, Germany)
Title
The Generalized Kernel Kalman Filter - Learning Forward Models from High Dimensional Observations
Abstract
Learning forward models from high-dimensional partial observations of the real state is a challenging machine learning problem. Recently, nonparametric inference methods have been proposed to tackle such problems. However, such methods either do not provide an uncertainty estimate, are computationally expensive, or can only be applied to a limited set of problems. We generalize the formulation of Kalman filter (KF) embeddings into a reproducing kernel Hilbert space (RKHS) to be applicable to systems with high-dimensional, partial observations. Our formulation provides probabilistic state estimations and predictions for non-linear dynamical systems that can also be directly learned from the observations. Additionally, we propose an alternative formulation of the RKHS embedding of a conditional density that allows learning from large data sets while maintaining computational efficiency. We show on a nonlinear state estimation task with high-dimensional observations that our approach provides improved estimation accuracy.

Date & Time
2015/07/29 14:00-15:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Mitsuhiro Hayashibe (INRIA, France)
Title
Personalized Neuroprosthetics and Synergetic Learning Control
Abstract
One of the challenging issues in computational rehabilitation is that there is a large variety of patient situations depending on the type of neurological disorder. To improve the performance of motor neuroprosthetics beyond the current limited use of such systems, subject-specific modelling would be essential. In addition, human characteristics are basically time-variant; for instance, neuromuscular dynamics may vary according to muscle fatigue. In order to cope with time-varying characteristics, we believe that robust bio-signal processing and model-based control, which can manage the nonlinearity and time variance of the system, would bring a breakthrough and a new modality in rehabilitation. In order to predict FES-induced joint torque, evoked electromyography (eEMG) has been applied to correlate muscle electrical and mechanical activities. The robustness of the torque prediction has been investigated in a fatigue tracking task in experiments with spinal cord injured subjects. The results demonstrate good tracking performance of muscle variations in the presence of fatigue and against some other disturbances. A new control strategy, EMG-Feedback Predictive Control (EFPC), was proposed to adaptively control the stimulation pattern, compensating for time-varying muscle state changes. It is implemented together with a wireless portable stimulator. In addition, Synergetic Learning Control is introduced for solving redundancy coordination issues in peripheral motor control. It is combined with a BCI application for multi-DOF robot control.

Date & Time
2015/07/14 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Fabian Lotte (INRIA, France)
Title
Robust EEG signals classification towards practical Brain-Computer Interface technologies
Abstract
Brain-Computer Interfaces (BCI) are systems that can translate the brain activity patterns of a user into messages or commands for an interactive application. The brain activity which is processed by BCI systems is usually measured using Electroencephalography (EEG). BCI technologies have proved promising for a wide range of applications, including communication and control for motor-impaired users, gaming targeted toward the general public, real-time mental state monitoring, and stroke rehabilitation. Despite this potential, BCIs still suffer from a number of limitations that need to be overcome before they can be used in practical applications outside laboratories. Among these limitations are their lack of robustness and reliability with respect to noise and non-stationarity (over time or contexts) and their long calibration time. This last point, namely the long calibration time of BCIs, is due to the fact that many examples of the user’s EEG signals must be recorded in order to calibrate the BCI specifically for each user, using machine learning. This talk will present our research toward addressing these limitations. First, I will describe our work using a priori knowledge about EEG signals and regularization to design more robust spatial filters, i.e., to efficiently combine EEG sensor signals into more discriminant signals. We will see that such spatial filters can also be made more robust by using robust covariance matrix averaging approaches. I will also present how regularized spatial filters can be used to reduce BCI calibration time by using estimators dedicated to small-sample problems or by combining EEG data from other users. If no data from other users is available, I will show that artificial EEG data can be generated from a few available data to calibrate BCI systems with very little data, hence reducing calibration times.
Finally, this talk will illustrate our work on making BCI robust across contexts for neuroergonomics applications, i.e., when using EEG signals to assess the ergonomic qualities of a human-computer interface. In particular, we will show how we can estimate mental workload levels from EEG signals across different application scenarios, i.e., even when the BCI is calibrated in a very different context than the one in which it will be used.

Date & Time
2015/05/26 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Christos Dimitrakakis (Chalmers University of Technology, Sweden)
Title
When Bayesian inference makes (differential) privacy easy
Abstract
We study sufficient conditions on prior distributions and likelihood families that result in differential privacy for posterior distributions. This directly results in a simple posterior sampling mechanism, for which we prove bounds on the utility and distinguishability. We also connect this to the exponential mechanism.

Date & Time
2015/4/10 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Ryota Tomioka (Toyota Technological Institute at Chicago, USA)
Title
Jointly Learning Multiple Perceptual Similarities
Abstract
Perceptual similarity between objects is multi-faceted and it is easier to judge similarity when the focus is on a specific aspect. We consider the problem of mapping objects into view specific embeddings where the distance between them is consistent with the similarity comparisons of the form "from the t-th perspective, object A is more similar to B than to C". Our framework jointly learns view specific embeddings and can exploit correlations between views if they exist. Experiments on a number of datasets, including a large dataset of multi-view crowdsourced comparison on bird images, show the proposed method achieves lower triplet generalization error and better grouping of classes in most cases, when compared to learning embeddings independently for each view. The improvements are especially large in the realistic setting when there is limited triplet data for each view.
Joint work with Liwen Zhang and Subhransu Maji
http://arxiv.org/abs/1503.01521
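The triplet comparisons described in the abstract can be scored with a hinge loss, as in the minimal sketch below. It assumes one independent embedding per view and a squared Euclidean distance; the function names, the margin, and the toy coordinates are hypothetical, and the joint parameterization used in the paper is not reproduced here.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_hinge_loss(embeddings, triplets, margin=1.0):
    """Average hinge loss over triplets (t, a, b, c).

    embeddings[t][i] is the vector of object i in view t; each triplet
    encodes "from the t-th perspective, a is more similar to b than to c".
    """
    total = 0.0
    for t, a, b, c in triplets:
        emb = embeddings[t]
        # Penalise the triplet unless d(a, b) + margin <= d(a, c).
        total += max(0.0, margin + sq_dist(emb[a], emb[b]) - sq_dist(emb[a], emb[c]))
    return total / len(triplets)

# Toy data: the two views disagree on whether A is closer to B or to C.
embeddings = {
    0: {"A": [0.0, 0.0], "B": [0.1, 0.0], "C": [3.0, 0.0]},
    1: {"A": [0.0, 0.0], "B": [3.0, 0.0], "C": [0.1, 0.0]},
}
satisfied = triplet_hinge_loss(embeddings, [(0, "A", "B", "C")])
violated = triplet_hinge_loss(embeddings, [(1, "A", "B", "C")])
```

The same triplet is satisfied in view 0 and violated in view 1, which is why view-specific embeddings are needed in the first place.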

Date & Time
2015/4/9 11:00-12:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Danushka Bollegala (University of Liverpool, UK)
Title
Unsupervised Learning of Lexical Semantics
Abstract
Representing the semantics of lexical units such as words, phrases, sentences, or documents is a fundamental task for text processing applications. Once we have a semantic representation for a lexical unit, such as a vector, a matrix, or a higher-order tensor, we can tap into rich linear algebraic operators to compose, analyze, or decompose the meanings represented by those lexical units. In this talk, I will first give an overview of existing approaches to lexical semantic representations, and introduce the recent advances we have made in semantic representations of words and relations. Moreover, I will discuss future research directions in the hope of potential collaborations in this field.

Date & Time
2015/4/7 13:30-15:00
Venue
Faculty of Science Bldg. 7, Room 214
Speaker
Mohammad Emtiyaz Khan (EPFL, Switzerland)
Title
Non-conjugate variational inference using proximal gradient method
Abstract
In this talk, I will give a very informal and broad overview of my ongoing work. I will first summarize the applications that motivate my work on Bayesian inference. I will then discuss one of my recent works on non-conjugate variational inference, where we are concerned with the marginalization of latent variables. Such marginalization requires the computation of a high-dimensional integral and is usually intractable. Variational inference simplifies the "integration" problem to an "optimization" problem, but still suffers from high computation and memory requirements, mostly due to non-linear terms arising from the 'non-conjugate' part of the model. I will present a solution based on the proximal gradient method that is surprisingly simple, simplifies the problem greatly, and also works like a charm!
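For readers unfamiliar with the proximal gradient method, here is a minimal sketch on a toy problem. It is not the variational objective from the talk: the smooth part is a simple quadratic and the non-smooth part is an L1 penalty, both chosen as assumptions because the proximal step then has a closed form (soft-thresholding).

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def proximal_gradient(b, lam, step=0.5, iters=200):
    """Minimise 0.5 * ||x - b||^2 + lam * ||x||_1 by proximal gradient."""
    x = [0.0] * len(b)
    for _ in range(iters):
        # Gradient step on the smooth part, then the prox of the non-smooth part.
        x = [soft_threshold(xi - step * (xi - bi), step * lam)
             for xi, bi in zip(x, b)]
    return x

x_hat = proximal_gradient([3.0, -0.5, -2.0], lam=1.0)
```

For this toy objective the minimiser is the soft-thresholded vector, so `x_hat` approaches `[2.0, 0.0, -1.0]`.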

Date & Time
2014/12/18 13:20-14:50
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Makoto Yamada (Yahoo Labs, USA)
Title
Minimum Redundancy Maximum Relevance Feature Selection for Large and High-dimensional Data
Abstract
Feature selection is an important machine learning problem, and it is widely used in various applications such as gene selection from microarray data, document categorization, and prosthesis control, to name a few. Because the problem is traditional and popular, many methods exist, including the least absolute shrinkage and selection operator (Lasso) and spectral feature selection (SPEC). Recently, a wrapper-based large-scale feature selection method called the feature generation machine (FGM) was proposed (Tan et al., 2014). However, to the best of our knowledge, there are few filter-based methods for the large and high-dimensional setting, in particular for the nonlinear and dense setting. Moreover, existing filter-type methods employ a maximum relevance (MR) based approach, which selects the m features with the largest relevance to the output. MR-based methods are simple yet efficient and can be easily applied to high-dimensional and large-sample problems. However, since MR-based approaches use only input-output relevance and not input-input relevance, they tend to select redundant features. In this talk, we first propose a nonlinear extension of the non-negative least-angle regression (N3LARS). An advantage of N3LARS is that it can easily be combined with map-reduce frameworks such as Hadoop and Spark. Thus, with the help of distributed computing, a set of features can be efficiently selected from large and high-dimensional data. Finally, we show that N3LARS can solve a large and high-dimensional feature selection problem in a few hours.
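The difference between maximum relevance and minimum-redundancy maximum-relevance selection can be seen in a small sketch. The code below is not N3LARS: it is a generic greedy selector that uses absolute Pearson correlation for both relevance and redundancy, which is an assumption made to keep the example short (the talk concerns nonlinear dependence measures). All function names are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def max_relevance(features, y, m):
    """Pick the m features with the largest input-output relevance."""
    rel = [abs(pearson(f, y)) for f in features]
    return sorted(range(len(features)), key=lambda i: rel[i], reverse=True)[:m]

def greedy_mrmr(features, y, m):
    """Greedily pick features by relevance minus mean redundancy."""
    rel = [abs(pearson(f, y)) for f in features]
    selected = []
    while len(selected) < m:
        best, best_score = None, -math.inf
        for i in range(len(features)):
            if i in selected:
                continue
            # Mean input-input correlation with the features already chosen.
            red = (sum(abs(pearson(features[i], features[j])) for j in selected)
                   / len(selected)) if selected else 0.0
            if rel[i] - red > best_score:
                best, best_score = i, rel[i] - red
        selected.append(best)
    return selected

f0 = [1.0, -1.0, 1.0, -1.0]
f1 = [1.0, -1.0, 1.0, -1.0]   # exact duplicate of f0
f2 = [1.0, 1.0, -1.0, -1.0]   # weaker but complementary feature
y = [2 * a + b for a, b in zip(f0, f2)]
mr_choice = max_relevance([f0, f1, f2], y, 2)
mrmr_choice = greedy_mrmr([f0, f1, f2], y, 2)
```

On this toy data, maximum relevance keeps the two duplicated features, while the redundancy-aware selector keeps one of them plus the complementary feature.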

Date & Time
2014/11/14 13:20-14:50
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Aapo Hyvarinen (University of Helsinki, Finland)
Title
Dynamic Connectivity Factorization: Interpretable Decompositions of Non-Stationarity
Abstract
In many multivariate time series, the correlation structure is non-stationary, i.e. it changes over time. Analysis of such non-stationarities is of particular interest in neuroimaging, in which it leads to investigation of the dynamics of connectivity. A fundamental approach for such analysis is to estimate connectivities separately in short time windows, and use existing machine learning methods, such as principal component analysis (PCA), to summarize or visualize the changes in connectivity. Here, we use the PCA approach by Leonardi et al. as the starting point and present two new methods. Our goal is to simplify interpretation of the results by finding components in the original data space instead of the connectivity space. First, we show how to further analyse the principal components of connectivity matrices by a tailor-made two-rank matrix approximation, in which the eigenvectors of the conventional low-rank approximation are transformed. Second, we show how to incorporate the two-rank constraint in the estimation of PCA itself to improve the results. We further provide an interpretation of the method in terms of estimation of a probabilistic generative model related to blind source separation methods and ICA. Preliminary experiments on magnetoencephalographic data reveal possibly meaningful non-stationarity patterns in power-to-power coherence of rhythmic sources (i.e. correlation of amplitudes).

Date & Time
2014/10/20 15:00-16:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Julian Zimmert (Humboldt University of Berlin, Germany)
Title
Multi-task Relationship Learning with Partly Given Task Similarities
Abstract
Multi-task learning (MTL) is about jointly learning classifiers for several related tasks. By incorporating additional data from similar tasks, classification performance has been greatly improved (for example, in the biomedical domain). In almost all MTL approaches, the task similarity matrix Sigma is a crucial quantity. While some similarities might be known a priori, and it is theoretically possible to compute them for each pair given the data, doing so is computationally too expensive. We would like to extend the approach given by "Y. Zhang and D. Y. Yeung. A convex formulation for learning task relationships in multi-task learning. arXiv preprint arXiv:1203.3536, 2010." by modifying the problem such that we can make use of the task similarities we already know.

Date & Time
2014/08/18 13:30-15:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Zhouchen Lin (Peking University, China)
Title
Learning Partial Differential Equations for Computer Vision and Image Processing
Abstract
Many computer vision and image processing problems can be posed as solving partial differential equations (PDEs). However, designing a PDE system usually requires high mathematical skills and good insight into the problems. In this paper, we consider designing PDEs for various problems arising in computer vision and image processing in a lazy manner: learning PDEs from training data via an optimal control approach. We first propose a general intelligent PDE system which holds the basic translational and rotational invariance rule for most vision problems. By introducing a PDE-constrained optimal control framework, it is possible to use training data obtained in multiple ways (ground truth, results from other methods, and manual results from humans) to learn PDEs for different computer vision tasks. The proposed optimal control based training framework aims at learning a PDE-based regressor to approximate the unknown (and usually nonlinear) mapping of different vision tasks. The experimental results show that the learnt PDEs can solve different vision problems reasonably well. In particular, we can obtain PDEs not only for problems for which traditional PDEs work well but also for problems for which PDE-based methods have never been tried before, due to the difficulty in describing those problems in a mathematical way.

Date & Time
2014/08/08 13:30-15:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Mohammad Emtiyaz Khan (EPFL, Switzerland)
Title
Decoupled Variational Gaussian Inference
Abstract
In this talk, I will present a new method called the decoupled variational inference for variational Gaussian (VG) approximation. The standard VG inference methods are inefficient at large-scale since they require storage of large covariance matrices. Decoupled variational inference reduces this storage requirement by using a Lagrangian method. I will show that the original VG solution can be recovered by optimizing the Lagrangian. I will then present an optimization algorithm that uses a sequence of highly-parallelizable convex programs. In addition, the gradient is obtained by fitting a conjugate model, thereby reducing the computation of a non-conjugate model to that of a conjugate model. Overall, decoupled variational inference leads to an easy, efficient, and scalable implementation of non-conjugate models at large scale.

Date & Time
2014/07/24 13:30-15:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Motoaki Kawanabe (ATR, Japan)
Title
On robust feature construction against non-stationarity for EEG-BMI decoders
Abstract
Electroencephalographic (EEG) signals are known to be non-stationary and easily affected by artifacts. Since such fluctuations may be caused by changes in the subject's brain processes, e.g. change of task involvement, fatigue, learning effects etc., it is particularly important to alleviate non-stationarity in EEG time series in order to construct useful Brain-Machine Interface (BMI) systems in real-world environments. To support elderly and disabled people in daily life at home through BMIs with portable EEG-NIRS measurement devices, ATR-BICR and its partners have been carrying out the Network BMI project since 2011. We are tackling this challenging problem by simultaneous measurement of human behavior and brain activities acquired at the real-world experimental laboratory (the BMI house) on the premises of ATR, and also by parallel and distributed processing of the large-scale data.
In this talk, I will first briefly introduce the demonstration experiment of our way-point BMI prototype system for controlling a wheelchair and electric appliances at the BMI house. For developing this system, we have recorded brain activities of a single subject during motor imagery experiments (left-hand/arm vs. right-hand/arm) with a portable EEG device (g.tec MOBIlab+ 8 channels) over 15 days and 83 runs. Indeed, offline analyses of the data revealed various non-stationary changes in sensory motor rhythms (SMR) and BMI performances caused by change of imagined actions, drowsiness etc.
Then, I will explain a few joint works with the Berlin BCI team on spatial filtering for constructing robust features against outliers and other non-stationary changes. One of them is the maxmin Common Spatial Pattern (CSP), a robust version of the popular spatial filter CSP for EEG-BMI by a maxmin approach (Kawanabe et al., 2014). In contrast to standard CSP, which maximizes the variance ratio between two conditions based on a single estimate of the class covariance matrices, we propose to robustly compute spatial filters by maximizing the minimum variance ratio within a prefixed set of covariance matrices called the tolerance set. We show that this kind of maxmin optimization makes CSP robust to outliers and reduces its tendency to overfit. Another approach utilizes the beta divergence, which can construct robust statistical methods in a principled manner. I will briefly introduce this divergence-based framework for robust feature extraction in BCI (Samek et al., 2014).

Date & Time
2014/07/14 13:30-15:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Naoki Masuda (University of Bristol, UK)
Title
Temporal networks
Abstract
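In networks, edges are often unused most of the time. For example, when an infectious disease spreads among friends, transmission can actually occur only during the periods when the two people forming an edge are meeting, and even between friends this time is usually relatively short. Temporal networks are a framework for handling such situations, and research on them has advanced rapidly over the last few years. This talk gives an overview of temporal networks. In particular, I will focus on topics related to data analysis, and also discuss the need for data mining, the potential for new analysis methods, and the use of existing machine learning and statistical methods.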
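A small sketch can show why the timing of contacts matters for spreading on a temporal network. The contact-list format `(u, v, t)`, the function name, and the rule that a node infected at time t can transmit only in strictly later contacts are all assumptions made for illustration.

```python
def temporal_reach(contacts, source, start_time=0):
    """Return {node: infection time} for nodes reachable from `source`.

    `contacts` is a list of (u, v, t) tuples meaning that u and v are in
    contact at time t. Only time-respecting paths are followed.
    """
    infected_at = {source: start_time}
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):
            # a must already be infected strictly before this contact.
            if a in infected_at and infected_at[a] < t and b not in infected_at:
                infected_at[b] = t
    return infected_at

# Same static edges A-B and B-C, but different contact orders.
late_bridge = temporal_reach([("A", "B", 2), ("B", "C", 1)], "A")
early_bridge = temporal_reach([("A", "B", 1), ("B", "C", 2)], "A")
```

In the first case the B-C contact happens before B is infected, so C is never reached, even though a static view of the network would connect A to C.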

Date & Time
2014/07/11 13:30-15:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Yung-Kyun Noh (Korea Advanced Institute of Science and Technology, Korea)
Title
Machine Learning with Nearest Neighbors
Abstract
The theoretical study of nearest neighbor (NN) information goes back to T. Cover and P. Hart's work in the 1960s, which connected NN information to the underlying probability density functions. The predictions from the theoretical study are very powerful, while empirical studies in general do not exhibit the predicted behavior even with many data. In this talk, I will explain how the powerful prediction for NN classification can be achieved through a metric learning approach, which is directly derived from T. Cover's work considering the asymptotic situation. I will first show how the learned metric is fundamentally different from those of conventional metric learning methods. The proposed method can be widely applied, achieving state-of-the-art performance, in several contemporary machine learning methods as well as in classification. Also, with the proposed method, the well-known heuristics for better nearest neighbor methods can be exploited in the theoretical context.
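To make the role of the metric in NN classification concrete, the sketch below runs a plain k-nearest-neighbor vote under two different distances. The per-coordinate `scale` weights are a hand-picked stand-in for a learned metric and are purely an assumption; the metric learning method from the talk is not implemented here.

```python
from collections import Counter

def knn_predict(train, query, k=3, scale=None):
    """Majority vote among the k nearest training points.

    `train` is a list of (vector, label) pairs. `scale` holds optional
    per-coordinate weights applied to the squared differences.
    """
    scale = scale or [1.0] * len(query)

    def dist(x):
        return sum(s * (a - b) ** 2 for s, a, b in zip(scale, x, query))

    neighbours = sorted(train, key=lambda item: dist(item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# The second coordinate is pure nuisance variation for class "a".
train = [
    ([0.0, 0.0], "a"), ([0.2, 5.0], "a"), ([0.1, -5.0], "a"),
    ([1.0, 0.3], "b"), ([1.2, -0.2], "b"), ([0.9, 0.1], "b"),
]
query = [0.3, 0.0]
euclidean_vote = knn_predict(train, query, k=3)
reweighted_vote = knn_predict(train, query, k=3, scale=[1.0, 0.0])
```

With the Euclidean distance the query is assigned to class "b"; once the nuisance coordinate is down-weighted, the three nearest neighbors all belong to class "a".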

Date & Time
2014/07/09 13:30-15:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Samory Kpotufe (Toyota Technological Institute at Chicago, USA)
Title
Self-tuning in nonparametric regression
Abstract
Contemporary statistical procedures are making inroads into a diverse range of applications in the natural sciences and engineering. However, it is difficult to use those procedures "off-the-shelf" because they have to be properly tuned to the particular application. In this talk, we present some "adaptive" regression procedures, i.e. procedures which self-tune, optimally, to the unknown parameters of the problem at hand. We consider regression on a general metric space X of unknown dimension, where the output Y is given as f(x) + noise. We are interested in adaptivity at any input point x in X: the algorithm must self-tune to the unknown "local" parameters of the problem at x. The most important such parameters are (1) the unknown smoothness of f, and (2) the unknown intrinsic dimension, both defined over a neighborhood of x. Existing results on adaptivity have typically treated these two problem parameters separately, resulting in methods that solve only part of the self-tuning problem. Using various regressors as an example, we first develop insight into tuning to unknown dimension. We then present an approach for kernel regression which allows simultaneous adaptivity to smoothness and dimension locally at a point x. This latest approach combines intuition for tuning to dimension, and intuition from so-called Lepski's methods for tuning to smoothness. The overall approach is likely to generalize to other nonparametric methods.
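The tuning problem can be seen even in the simplest kernel regressor. The sketch below is a standard Nadaraya-Watson estimate with a Gaussian kernel and a fixed, user-supplied bandwidth `h`; the self-tuning procedure from the talk is not implemented, and the data and bandwidth values are assumptions chosen to show how strongly the estimate depends on `h`.

```python
import math

def kernel_regress(xs, ys, x0, h):
    """Nadaraya-Watson estimate of f(x0) with a Gaussian kernel of bandwidth h."""
    weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = [i / 10 for i in range(11)]
ys = [2 * x for x in xs]  # noiseless f(x) = 2x, for illustration only
narrow = kernel_regress(xs, ys, 0.0, h=0.05)  # local average near x0 = 0
wide = kernel_regress(xs, ys, 0.0, h=5.0)     # close to the global mean of ys
```

At the boundary point x0 = 0, where f(0) = 0, the narrow bandwidth stays close to the true value while the wide one drifts toward the overall mean; with noisy data the trade-off would go the other way, which is why the bandwidth has to be tuned.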

Date & Time
2014/07/07 15:00-16:30
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Tomonari Sei (Keio University, Japan)
Title
Use of optimal transport in statistics -- a review
Abstract
In this talk, two applications of optimal transport in statistics are reviewed. The first topic of this talk is how to construct a parametric family of multivariate distributions via optimal transport. In multivariate analysis, linear transformation is used in various standard methods: PCA, ICA, SEM and so forth. In contrast, multi-dimensional non-linear transformation is not so often used since its construction and interpretation might be difficult. However, any continuous distribution can be, in principle, constructed from a given distribution via optimal transport mapping. Furthermore, it is shown that the family is tractable in some sense. As the second topic, the multivariate Q-Q plot proposed by Easton and McCulloch (1990) is reviewed. The plotting method uses the assignment problem together with affine transformation. It is pointed out that the affine-invariant version of optimal transport problem naturally arises.
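In one dimension the connection between optimal transport and the Q-Q plot is easy to sketch: for equal-size samples and a squared-distance cost, the optimal matching pairs the sorted values, and those pairs are exactly the points of a Q-Q plot. The function name below is hypothetical, and the multivariate case discussed in the talk requires solving an assignment problem, which this sketch does not do.

```python
def ot_1d(xs, ys):
    """Optimal matching of two equal-size 1-D samples under squared cost.

    Returns the matched pairs (the Q-Q plot points) and the mean cost.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    # In 1-D the monotone (sorted) matching is optimal for convex costs.
    pairs = list(zip(sorted(xs), sorted(ys)))
    cost = sum((x - y) ** 2 for x, y in pairs) / len(pairs)
    return pairs, cost

pairs, cost = ot_1d([3.0, 1.0, 2.0], [2.5, 0.5, 1.5])
```

Here every matched pair differs by 0.5, so the Q-Q points lie on a line parallel to the diagonal, indicating a pure location shift between the two samples.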

Date & Time
2014/07/01 15:00-16:30
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Shinichi Nakajima (Nikon Corporation, Japan)
Title
Analysis of Empirical MAP and Empirical Partially Bayes: Can They be Alternatives to Variational Bayes?
Abstract
Variational Bayesian (VB) learning is known to be a promising approximation to Bayesian learning with computational efficiency. However, in some applications, e.g., large-scale collaborative filtering and tensor factorization, VB is still computationally too costly. In such cases, looser approximations such as MAP estimation and partially Bayesian (PB) learning, where a part of the parameters are point-estimated, seem attractive. In this paper, we theoretically investigate the behavior of the MAP and the PB solutions of matrix factorization. A notable finding is that the global solutions of MAP and PB in the empirical Bayesian scenario, where the hyperparameters are also estimated from observation, are trivial and useless, while their local solutions behave similarly to the global solution of VB. This suggests that empirical MAP and empirical PB with local search can be alternatives to empirical VB equipped with the useful automatic relevance determination property. Experiments support our theory.

Date & Time
2014/07/01 13:20-15:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Thomas Gaetner (Fraunhofer Institute, Germany)
Title
Machine Learning out of the Box
Abstract
In this talk I will show a few advances that have the potential to make machine learning algorithms more usable for non-experts and in non-standard situations. In particular, I will address interactive visualisations and learning in structured spaces. Our approach to interactive visualisations is based on adding knowledge-based constraints to kernel PCA. For structured spaces we consider abstract convexity spaces. We show that efficient online learning is possible if the VC dimension is bounded and an oracle is available for sampling from any convex set.

Date & Time
2014/06/30 13:30-15:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Chiyuan Zhang (Massachusetts Institute of Technology, USA)
Title
Data Representation and Machine Learning
Abstract
Data representation is a very important component in machine learning. There are many different ways of looking at this problem. In this talk, I will present two of them (that I have worked on). Firstly, I will talk about manifold learning, and introduce how to use parallel vector field on the manifold to help recover or regularize along the data manifold when the data is supported on some intrinsic nonlinear manifold embedded in high dimensional Euclidean space. Secondly, I will talk about invariant representations, which aim at getting rid of task-independent variabilities. I will introduce the framework of invariant representation and give some applications in speech and audio classification problems.

Date & Time
2014/04/11 16:30-18:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Tony Jebara (Columbia University, USA)
Title
Quadratic Majorization for Convex and Nonconvex Learning
Abstract
The partition function plays a key role in probabilistic modeling including conditional random fields, graphical models, and maximum likelihood estimation. To optimize partition functions of log-linear models, we introduce a quadratic variational upper bound. This inequality facilitates majorization methods: optimization of complicated functions through the iterative solution of simpler sub-problems. Such bounds remain efficient to compute even when the partition function involves a graphical model (with small tree-width) or in latent likelihood settings. For large-scale problems, low-rank versions of the bound are provided and outperform LBFGS as well as first-order methods. Several learning applications are shown and reduce to fast and convergent update rules. Experimental results show advantages over state-of-the-art optimization methods. We also propose a stochastic version of bound majorization which competes well against stochastic gradient descent (across any of its variations and tunings). It converges in fewer iterations, reduces computation time and finds better parameter estimates. The proposed method bridges first- and second-order stochastic optimization methods by maintaining linear computational complexity (with respect to dimensionality) while exploiting second order information about the pseudo-global curvature of the objective function.

Date & Time
2014/02/28 10:45-12:00
Venue
Meeting Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Alessio Del Bue (Italian Institute of Technology, Italy)
Title
Bilinear modelling with missing data in Computer Vision, Image Processing and Machine Learning
Abstract
This presentation will show a unified approach to solving different bilinear factorization problems in Computer Vision, Image Processing and Machine Learning. Interestingly, many known problems can be solved using bilinear factorization, such as Structure from Motion, non-rigid image registration, Photometric Stereo, image pose estimation, learning via matrix factorization, recommender systems strategies, sound localisation and sensor networks calibration. In particular, I will show that the only difference among such problems is the manifold on which the data lie. Following this insight, it is possible to introduce an equivalent reformulation of the bilinear factorization problem that decouples the core bilinear aspect from the manifold specificity. Then the algorithm tackles the resulting constrained optimization problem via Augmented Lagrange Multipliers (the BALM algorithm). This creates an approach that can deal with matrix factorization problems with up to 10^8 entries and 90% missing data in several simulated and real experiments.

Date & Time
2013/12/03 11:00-12:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Aapo Hyvarinen (University of Helsinki, Finland)
Title
Testing independent components, with applications to brain imaging
Abstract
Independent component analysis (ICA) is increasingly used for analyzing brain imaging data. ICA typically gives a large number of components, many of which may be just random, due to insufficient sample size, violations of the model, or algorithmic problems. Few methods are available for computing the statistical significance (reliability) of the components. We propose to approach this problem by performing ICA separately on a number of subjects, and finding components which are sufficiently consistent (similar) over subjects. Similarity can be defined in two different ways: 1) the similarity of the mixing coefficients, which usually correspond to spatial patterns in EEG and MEG, or 2) the similarity of the independent components themselves, which usually correspond to spatial patterns in fMRI. The threshold of what is "sufficient" is rigorously defined by a null hypothesis under which the independent components are random orthogonal components in the whitened space. Components which are consistent in different subjects are found by clustering under the constraint that a cluster can only contain one source from each subject, and by constraining the number of false positives based on the null hypothesis. Instead of different subjects, the method can also be applied to different sessions of recordings from a single subject. The methods are applicable to both EEG/MEG and fMRI.

Date & Time
2013/11/27 11:00-12:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Hiroshi Kajino (University of Tokyo, Japan)
Title
Convex formulations for learning from crowds
Abstract
Crowdsourcing is a technique for requesting unspecified workers to perform various tasks on the Web. Recently, the machine learning community has become interested in crowdsourcing as a tool for creating datasets for supervised learning because it allows us to construct datasets at low cost. However, it is often pointed out that the quality of the resulting dataset depends heavily on the ability of the workers who performed the task. Therefore, many researchers have worked on the learning-from-crowds problem, where the goal is to learn a high-performance classifier from a dataset of variable quality. In this talk, I will introduce two methods, called the personal classifier method and the clustered personal classifier method. Both methods are novel in that they are formulated as convex optimization problems, in contrast to the existing methods. I will present the key points of formulating the problem as convex optimization problems and also give experimental results to show the effectiveness of our formulations.

Date & Time
2013/11/20 13:30-14:30
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Kazuho Watanabe (Nara Institute of Science and Technology, Japan)
Title
Optimal Mixing Distributions in Rate-Distortion Analysis and Universal Coding
Abstract
We discuss the evaluation of rate-distortion functions in lossy source coding. This problem is reduced to deriving the optimal reconstruction distribution, which corresponds to the mixing distribution of a certain mixture model defined for a distortion measure. We review several pairs of a source and a distortion measure where the optimal reconstruction distributions are explicitly obtained or bounds for the rate-distortion function are evaluated. We also discuss the optimization of prior distributions in a universal coding problem. We examine the relationship between the achievability of asymptotic minimax optimality and the knowledge of the sample size.
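One standard numerical way to evaluate a rate-distortion function is the Blahut-Arimoto iteration, which alternates between a conditional distribution and the reconstruction distribution. The sketch below is a generic textbook version, not the analysis from the talk; the function names, the slope parameter `beta`, and the binary example are assumptions. For a uniform binary source with Hamming distortion the result can be checked against the known closed form R(D) = 1 - H(D) bits.

```python
import math

def binary_entropy(d):
    """Entropy in bits of a Bernoulli(d) variable."""
    if d in (0.0, 1.0):
        return 0.0
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)

def conditional(p, dist, q, beta):
    """Q(xh | x) proportional to q(xh) * exp(-beta * d(x, xh))."""
    rows = []
    for x in range(len(p)):
        row = [q[xh] * math.exp(-beta * dist[x][xh]) for xh in range(len(q))]
        z = sum(row)
        rows.append([r / z for r in row])
    return rows

def blahut_arimoto(p, dist, beta, iters=200):
    """Return (rate in bits, distortion, reconstruction distribution q)."""
    n_x, n_xh = len(p), len(dist[0])
    q = [1.0 / n_xh] * n_xh
    for _ in range(iters):
        cond = conditional(p, dist, q, beta)
        # Update q as the output marginal induced by the conditional.
        q = [sum(p[x] * cond[x][xh] for x in range(n_x)) for xh in range(n_xh)]
    cond = conditional(p, dist, q, beta)
    marg = [sum(p[x] * cond[x][xh] for x in range(n_x)) for xh in range(n_xh)]
    rate = sum(p[x] * cond[x][xh] * math.log2(cond[x][xh] / marg[xh])
               for x in range(n_x) for xh in range(n_xh) if cond[x][xh] > 0)
    distortion = sum(p[x] * cond[x][xh] * dist[x][xh]
                     for x in range(n_x) for xh in range(n_xh))
    return rate, distortion, q

p = [0.5, 0.5]
hamming = [[0.0, 1.0], [1.0, 0.0]]
rate, distortion, q = blahut_arimoto(p, hamming, beta=2.0)
closed_form = 1.0 - binary_entropy(distortion)
```

Each value of `beta` gives one point on the rate-distortion curve; sweeping it traces out the whole curve.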

Date & Time
2013/11/20 11:00-12:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Shinichi Nakajima (Nikon Corporation, Japan)
Title
Global Solvers for Variational Bayesian Low-rank Subspace Clustering
Abstract
When a probabilistic model and its prior are given, Bayesian learning offers inference with automatic parameter tuning. However, Bayesian learning is often obstructed by computational difficulty: the rigorous Bayesian learning is intractable in many models, and its variational Bayesian (VB) approximation is prone to suffer from local minima. In this talk, we overcome this difficulty for low-rank subspace clustering (LRSC) by providing an exact global solver and its efficient approximation. LRSC extracts a low-dimensional structure of data by embedding samples into the union of low-dimensional subspaces, and its variational Bayesian variant has shown good performance. We first prove a key property that the VB-LRSC model is highly redundant. Thanks to this property, the optimization problem of VB-LRSC can be separated into small subproblems, each of which has only a small number of unknown variables. Our exact global solver relies on another key property that the stationary condition of each subproblem is written as a set of polynomial equations, which is solvable with the homotopy method. For further computational efficiency, we also propose an efficient approximate variant, of which the stationary condition can be written as a polynomial equation with a single variable. Experimental results show the usefulness of our approach.

Date & Time
2013/11/18 13:20-14:50
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Taiji Suzuki (Tokyo Institute of Technology, Japan)
Title
Convex Tensor Decomposition via Structured Schatten Norm Regularization
Abstract
We study a new class of structured Schatten norms for tensors that includes two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition. Based on the properties of the structured Schatten norms, we analyze the performance of the "latent" approach for tensor decomposition, which was empirically found to perform better than the "overlapped" approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error.

Date & Time
2013/10/08 11:00-12:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Hiroaki Sasaki (University of Electro-Communications, Japan)
Title
Topography Estimation on Correlated Components
Abstract
This talk presents a method, which we call correlated topographic analysis (CTA), to estimate non-Gaussian components and their ordering (topography). CTA tries to make use of the residual dependencies which independent component analysis cannot remove. The key assumption is that only nearby components are allowed to have both linear and energy correlations, while far-away components are as statistically independent as possible. These nearby dependencies define the proximity between the components and are used to fix the ordering.
In this talk, I introduce a generative model for non-Gaussian components. This model enables us to derive the likelihood for CTA and to discuss the relationship to previous methods in a unified way. Experimental results on artificial data show that CTA generalizes a previous method in terms of topography estimation. Finally, application results from natural images and text data are given.

Date & Time
2013/08/09 14:00-15:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Taku Komura (University of Edinburgh, UK)
Title
Animating Close Character Interactions
Abstract
Close interactions, not necessarily with any contacts, between different body parts of single or multiple characters or with the environment are common in computer animation and 3D computer games. Yoga, wrestling, dancing and moving through a constrained environment are some examples. In this talk, I will describe the problems that arise when animating such scenes and the solutions that we have provided in the last few years. I will then describe topics that we plan to work on.

Date & Time
2013/07/22 13:30-15:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Daniel Lee (University of Pennsylvania, USA)
Title
Perception, Planning, and Motor Control in Machines vs. Animals
Abstract
It's ironic that machines today are able to excel at seemingly complex games humans find difficult, yet struggle with basic perceptual and motor tasks that we take for granted. What are the appropriate perceptual, world and motor representations needed to generate robust behaviors in real-time? A variety of algorithms use low-dimensional dynamical models to simplify information in high-dimensional trajectories. I will present some recent work on learning low-dimensional reductions, and show examples of how these algorithms can be implemented on humanoid robots.

Date & Time
2013/07/03 16:00-17:30
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Shinichi Nakajima (Nikon Corporation, Japan)
Title
Global Solution and Theoretical Guarantee of Variational Bayesian PCA
Abstract
For statistical models in which rigorous Bayesian learning is computationally intractable, the variational Bayesian (VB) approximation is a good alternative with its efficient computation and automatic relevance determination (or model selection) property. This talk starts with a general explanation of the VB approximation, including the standard procedure to derive a tractable iterative algorithm, and its advantage over MAP estimation. Then, I'll show our recent theoretical results on VB in probabilistic PCA or matrix factorization with no missing entries. These results include an analytic-form global solution, bounds on the noise variance estimator, and a theoretical guarantee for perfect dimensionality recovery. The analytic-form solution not only provides a stable and fast algorithm for VB-PCA, but also forms a building block of efficient algorithms for more complicated models.

Date & Time
2013/06/06 15:00-16:30
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Taiji Suzuki (University of Tokyo, Japan)
Title
Some Recent Developments in Stochastic Alternating Direction Multiplier Method
Abstract
In this talk, we present new stochastic optimization methods that are applicable to a wide range of structured regularizations. The proposed methods are based on stochastic optimization techniques and the Alternating Direction Multiplier Method (ADMM). ADMM is a general framework for optimizing a composite function, and has a wide range of applications. We propose two types of online variants of ADMM, which correspond to online proximal gradient descent and regularized dual averaging respectively. The proposed algorithms are computationally efficient and easy to implement. Our methods yield the minimax optimal rate. Finally, we will talk about our ongoing research related to stochastic ADMM.

Date & Time
2013/05/31 13:30-15:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Tsuyoshi Ide (IBM Research - Tokyo, Japan)
Title
Tackling real business problems using machine learning
Abstract
Most advanced companies are already aware of the tremendous business potential of data analytics technologies, including machine learning. These technologies can not only work as a key differentiator over competitors, but can also be a major driver of a new market. In this talk, I will cover recent trends of machine learning in action, with a particular focus on recent changes in the business model of the IT industry. I will also share some of the latest examples of machine learning research applied to real business problems.

Date & Time
2013/05/10 13:30-15:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Nathan Srebro (Toyota Technological Institute at Chicago, USA)
Title
Matrix Learning: A Tale of Two Norms
Abstract
There has been much interest in recent years in various ways of constraining the complexity of matrices based on factorizations into a product of two simpler matrices. Such measures of matrix complexity can then be used as regularizers for such tasks as matrix completion, collaborative filtering, multi-task learning and multi-class learning. In this talk I will discuss two forms of matrix regularization which constrain the norm of the factorization, namely the trace-norm (aka nuclear-norm) and the so-called max-norm (aka $\gamma_2:\ell_1\rightarrow\ell_\infty$ norm). I will argue both that they are independently motivated and that they often model data better than rank constraints, and I will explore their relationship to the rank. In particular, I will discuss how simple low-rank matrix completion guarantees can be obtained using these measures, without various "incoherence" assumptions. I will present both theoretical and empirical arguments for why the max-norm might actually be a better regularizer, as well as a better convex surrogate for the rank. Based on joint work with Rina Foygel, Jason Lee, Ben Recht, Russ Salakhutdinov, Ohad Shamir, Adi Shraibman, Joel Tropp, and others.

Date & Time
2013/05/07 13:30-15:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Yung-Kyun Noh (Seoul National University, Korea)
Title
Exploiting k-nearest neighbor information with many data
Abstract
The theoretical study of nearest neighbors goes back to T. Cover and P. Hart's work in the 1960s which is based on the asymptotic behavior of nearest neighbor classification with many data. Their best-known contribution is the upper bound of the error in the asymptotic situation, which is twice the Bayes error, as well as the idea of connecting nearest neighbor information to the underlying probability density functions. More recently, studies on nearest neighbors have developed various useful techniques for many contemporary machine learning algorithms showing how nearest neighbors can be better used from the theoretical perspective. In this talk, some of our works will be presented utilizing recent theoretical findings on nearest neighbors. First, metric learning methods will be introduced to minimize the finite sampling effect that produces a bias from the result in the asymptotic situation. Applications include the nearest neighbor classification and the estimation of various information-theoretic measures. Second, the optimality of the majority voting strategy in k-nearest neighbor classification will be discussed comparing the strategy with a well-known psychology model: diffusion decision making (DDM). In light of DDM, the simple majority voting is suboptimal in terms of maximizing the classification accuracy with the same number of nearest neighbors used. I will introduce various strategies with criteria derived from the DDM setting which produce much better performance. All of the work in this talk is based on the analysis which assumes many data, and often the presented results only appear in tasks with large numbers of data. The results in this talk will help understand the behavior of nearest neighbors when treating large scale data.
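For reference, the asymptotic bound mentioned in this abstract is commonly stated as below. The notation is ours, not the speaker's: $R^*$ denotes the Bayes error, $R_{\mathrm{NN}}$ the asymptotic error of the 1-nearest-neighbor rule, and $M$ the number of classes.

```latex
R^{*} \;\le\; R_{\mathrm{NN}} \;\le\; R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \;\le\; 2R^{*}
```

For two classes ($M = 2$) the middle expression reduces to $2R^{*}(1 - R^{*})$, which is where the "twice the Bayes error" statement comes from.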

Date & Time
2013/04/11 13:30-14:30
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Masahiro Tomono (Chiba Institute of Technology, Japan)
Title
Environment Sensing for Mobile Robots
Abstract
Sensing capabilities are indispensable for mobile robots to move stably and safely in diverse indoor and outdoor environments. Key technologies in mobile robot sensing include 3D map building, robot localization, and object and place recognition. This presentation will introduce our research activities in this field.

Date & Time
2013/04/04 10:30-12:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Ichiro Takeuchi (Nagoya Institute of Technology, Japan)
Title
Parametric-task learning and its application to joint estimation of family of cost-sensitive models
Abstract
Multi-task learning (MTL) has been shown to be effective for training multiple related tasks. The main idea of MTL is to learn a common shared structure across multiple tasks. In this talk, we extend the MTL framework to cases where there are infinitely many tasks parametrized by a continuous parameter. We introduce a new approach called parametric-task learning (PTL), and show that it can find a common shared structure of infinitely many parametrized tasks in the same way as MTL. As an illustration, we apply the PTL algorithm to the problem of jointly estimating a family of cost-sensitive models.

Date & Time
2013/04/03 13:30-15:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Takayuki Okatani (Tohoku University, Japan)
Title
Deep Learning

Date & Time
2013/03/28 13:30-15:00
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Jun Sakuma (Tsukuba University, Japan)
Title
Data Privacy and Machine Learning
Abstract
With advances in network-based services and smart phones, various types of information tightly associated with individual and organizational activities are being collected. Privacy-preserving data mining (PPDM) is now gaining much attention as a set of technologies that enable secure exploitation of such sensitive information. In the talk, two topics are introduced: differential privacy and PPDM with homomorphic encryption.
Differential privacy aims to guarantee the privacy of query responses given by statistical databases through randomization. After an elementary introduction to differential privacy, we show differential privacy mechanisms for low-rank approximated matrices as an advanced topic.
PPDM with homomorphic encryption aims to perform distributed data mining on inputs that are privately distributed over two or more parties and cannot be shared mutually. To this end, we introduce homomorphic encryption, which allows a certain prescribed operation, such as addition or multiplication, over encrypted values. In the talk, we demonstrate our development framework for PPDM with smart phones, fairy ring.

Date & Time
2013/01/24 13:30-14:30
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Makoto Okabe (University of Electro-Communications, Tokyo)
Title
Video Retrieval based on User-Specified Appearance and Application to Animation Synthesis
Abstract
In our research group, we investigate techniques for retrieving videos based on user-specified appearances. In this talk, we introduce two of our research activities. First, we present a user interface for quickly and easily retrieving scenes of a desired appearance from videos. Given an input image, our system allows the user to sketch a transformation of an object inside the image, and then retrieves scenes showing this object in the user-specified transformed pose. Our method employs two steps to retrieve the target scenes. We first apply a standard image-retrieval technique based on feature matching, and find scenes in which the same object appears in a similar pose. Then we find the target scene by automatically forwarding or rewinding the video, starting from the frame selected in the previous step. When the user-specified transformation is matched, we stop forwarding or rewinding, and thus the target scene is retrieved. We demonstrate that our method successfully retrieves scenes of a racing car, a running horse, and a flying airplane with user-specified poses and motions. Secondly, we present a method for synthesizing fluid animation from a single image, using a fluid video database. The user inputs a target painting or photograph of a fluid scene. Employing the database of fluid video examples, the core algorithm of our technique then automatically retrieves and assigns appropriate fluid videos for each part of the target image. The procedure can thus be used to handle various paintings and photographs of rivers, waterfalls, fire, and smoke, and the resulting animations demonstrate that it is more powerful and efficient than our prior work.

Date & Time
2012/11/19 15:00-16:30
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Michael Gutmann (University of Helsinki, Finland)
Title
On the Estimation of Multi-Layer Models of Natural Image Statistics
Abstract
This talk is on the intersection between natural image statistics and machine learning. I will start by presenting a multi-layer model of natural images that is motivated by neuroscience and which we would like to estimate. I then point out that consistent estimation of the model is difficult because it is unnormalized: the model does not integrate to one for all values of the parameters. Maximum likelihood estimation then cannot be used without resorting to numerical approximations, which are often computationally expensive. I will then present a recent method to estimate unnormalized models, explain some of its properties, and show the estimation results for the multi-layer model.

Related publications:
M.U. Gutmann and A. Hyvarinen, Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics, Journal of Machine Learning Research, 13:307-361, 2012.
M.U. Gutmann and A. Hyvarinen, Learning a selectivity--invariance--selectivity feature extraction architecture for images, Proc. Int. Conf. on Pattern Recognition (ICPR), 2012.
M.U. Gutmann and J. Hirayama, Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models, Proc. Conf. on Uncertainty in Artificial Intelligence (UAI), 2011.

Date & Time
2012/11/12 15:00-16:30
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Motoki Shiga (Toyohashi University of Technology, Japan)
Title
Data Mining from Multiple Networks
Abstract
In applied analyses of, e.g., web pages, SNS and biological datasets, we often have a set of interactions (or edges) between nodes instead of numerical data. This talk focuses on the problem setting in which we have multiple networks (or multiple types of interactions) to be analyzed, and introduces our two proposed data mining methods, which effectively integrate multiple networks. The first method is node clustering based on Bayesian learning. This work assumes a model that generates edges in networks as probabilistic events, and develops an algorithm for clustering and parameter optimization by a variational Bayesian approach. The second method is semi-supervised learning for predicting unknown labels of nodes. This method chooses informative sub-networks over multiple networks to perform accurate predictions. The performance of these methods is evaluated using synthetic datasets and real genome datasets.

[1] Motoki Shiga and Hiroshi Mamitsuka, A Variational Bayesian Framework for Clustering with Multiple Graphs, IEEE Trans. on Knowledge and Data Engineering, 24(4), 577-590, 2012.
[2] Motoki Shiga and Hiroshi Mamitsuka, Efficient Semi-Supervised Learning on Locally Informative Multiple Graphs, Pattern Recognition, 45(3), 1035-1049, 2012.

Date & Time
2012/11/05 15:00-16:30
Venue
Seminar Room on 4th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
John Quinn (Makerere University, Uganda)
Title
Extremely Detailed Spatiotemporal Inference of Disease Risk
Abstract
Spatiotemporal models are commonly deployed to answer two questions: what is happening now (across the spatial field, including places we have no direct observations), and what is likely to happen next. Both of these questions are important in the analysis of disease data, and classical models of disease spread are continually being extended to deal with new spatial observation types to improve the accuracy of these types of estimates. I'll discuss probabilistic inference issues that arise when using an extremely detailed type of observation: symptom data from individuals at known locations. Although in general the complexity of calculating the posterior distribution at each time frame scales exponentially with the number of individuals, I will demonstrate how a simple approximation reduces this to linear complexity and is therefore tractable even when applied to a population of millions. I'll illustrate these ideas with examples of work on malaria mapping and automated diagnosis in Uganda.

Date & Time
2012/07/06 13:20-14:50
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Mathieu Blondel (Kobe University, Japan)
Title
Learning Non-Linear Classifiers with Sparsity-Promoting Norms
Abstract
Kernel Support Vector Machines (SVMs) achieve outstanding accuracy on many datasets, thanks to their non-linearity. Unfortunately, they are unrealistic in settings where fast prediction is required (e.g., real-time prediction), since the computational cost of their prediction function is linear in the number of support vectors. In this talk, I will present algorithms to learn non-linear classifiers using sparsity-promoting norms: L1 norm in the two-class case and L1/Lp mixed-norms in the multi-class case. The presented algorithms give more flexibility to the user to adjust the accuracy / sparsity trade-off and, compared to SVMs, typically lead to much sparser classifiers with comparable accuracy.
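To make the cost argument in this abstract concrete, here is a minimal sketch of a kernel expansion's prediction function. It is an illustration only, not the speaker's algorithm: the names `rbf_kernel` and `kernel_decision_function`, the choice of a Gaussian kernel, and the `gamma` default are all assumptions made for this example.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_decision_function(x, basis_vectors, coefs, bias=0.0, gamma=0.5):
    """Evaluate f(x) = sum_i coef_i * k(basis_i, x) + bias.

    Terms whose coefficient is exactly zero are skipped, so the number of
    kernel evaluations equals the number of non-zero coefficients. Returns
    the score together with that count.
    """
    score = bias
    n_evals = 0
    for v, c in zip(basis_vectors, coefs):
        if c == 0.0:
            continue  # a sparse solution pays nothing for this basis vector
        score += c * rbf_kernel(v, x, gamma)
        n_evals += 1
    return score, n_evals


# Hypothetical toy model: three candidate basis vectors, one zeroed out.
basis = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
coefs = [1.0, 0.0, -1.0]
score, n_evals = kernel_decision_function([0.0, 0.0], basis, coefs)
```

Each prediction costs one kernel evaluation per non-zero coefficient, which is why a sparser coefficient vector translates directly into faster prediction.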

Date & Time
2012/03/30 13:30-14:30
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Hal Daume III (University of Maryland, USA)
Title
Structured Prediction need not be Slow
Abstract
Classic algorithms for predicting structured data (e.g., graphs, trees, etc.) rely on expensive (sometimes intractable) inference at test time. In this talk, I'll discuss several recent approaches that enable computationally efficient (e.g., linear-time) prediction at test time. These approaches fall in the category of learning algorithms that optimize accuracy for some fixed notion of efficiency. I'll conclude by considering the question: can a learning algorithm figure out how to make fast predictions on its own?

Date & Time
2012/03/30 14:30-15:30
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Ruslan Salakhutdinov (University of Toronto, Canada)
Title
Learning Hierarchical Models
Abstract
Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including speech perception, visual object recognition, information retrieval, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep hierarchical structure that support inferences at multiple levels.
In this talk, I will introduce a broad class of probabilistic generative models called Deep Boltzmann Machines (DBMs), and a new algorithm for learning these models that uses variational methods and Markov chain Monte Carlo. I will show that DBMs can learn useful hierarchical representations from large volumes of high-dimensional data, and that they can be successfully applied in many domains, including speech perception, information retrieval, object recognition, and nonlinear dimensionality reduction. I will then describe a new class of more complex probabilistic graphical models that combine Deep Boltzmann Machines with structured hierarchical Bayesian models, called Hierarchical-Deep (HD) Models. I will show how these models can learn a deep hierarchical structure for sharing knowledge across hundreds of visual categories, which allows accurate learning of novel visual concepts from few examples.

Date & Time
2012/03/05 10:30-12:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Minh Ha Quang (Italian Institute of Technology, Italy)
Title 1
Vector-Valued Reproducing Kernel Hilbert Spaces and Applications
Abstract 1
Kernel methods have recently emerged as a powerful framework for many machine learning and data mining applications. Most of the literature on kernel methods so far has focussed on scalar-valued kernels. In this talk, we will give an overview of the theory of operator-valued positive definite kernels and their associated vector-valued reproducing kernel Hilbert spaces (RKHS).
We will present two sets of applications. The first is for the problem of colorization of black and white images (joint work with Sung Ha Kang and Triet Le, Journal of Mathematical Imaging and Vision, 2010).
The second, which is joint work with Vikas Sindhwani (ICML 2011), is on vector-valued manifold regularization, with examples in multi-label image classification and hierarchical text categorization.
Title 2
Slow Feature Analysis and Decorrelation Filtering for Separating Correlated Sources
Abstract 2
Slow Feature Analysis (SFA) is a method for extracting slowly varying features from input signals. In this talk, we generalize SFA to vector-valued functions of multivariables and apply it to the problem of blind source separation, in particular image separation. When the sources are correlated, we apply the following technique called decorrelation filtering: use a linear filter to decorrelate the sources and their derivatives, then apply the separating matrix obtained on the filtered sources to the original sources. We show that if the filtered sources are perfectly separated by this matrix, then so are the original sources. We show how to numerically obtain such a decorrelation filter by solving a nonlinear optimization problem. This technique can also be applied to other linear separation methods, whose output signals are uncorrelated, such as ICA.
This is joint work with Laurenz Wiskott (ICCV 2011).

Date & Time
2012/01/10 15:00-16:30
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Milan Vojnovic (Microsoft Research Cambridge, UK)
Title
Continuous Distributed Counting for Non-Monotonic Streams
Abstract
We consider the continual count tracking problem in a distributed environment where the input is an aggregate data stream originating from $k$ distinct sites and the updates are allowed to be non-monotonic, i.e., both increments and decrements are allowed. The goal is to continually track the count within a prescribed relative accuracy $\epsilon$ at the lowest possible communication cost. Specifically, we consider an adversarial setting where the input values are selected and assigned to sites by an adversary but the order is according to a random permutation or is a random i.i.d. process. The input stream of values is allowed to be non-monotonic with an unknown drift $-1 \leq \mu \leq 1$, where the case $\mu = 1$ corresponds to the special case of a monotonic stream of only non-negative updates. We show that a randomized algorithm is guaranteed to track the count accurately with high probability and has expected communication cost $\tilde{O}(\min\{\sqrt{k}/(|\mu| \epsilon), \sqrt{k n}/\epsilon, n\})$ for an input stream of length $n$, and we establish matching lower bounds. Last but not least, we also provide an algorithm and a communication complexity upper bound for a fractional Brownian motion input, and show how our non-monotonic counter can be applied to track the second frequency moment and to a Bayesian linear regression problem.

Joint work with Zhenming Liu and Bozidar Radunovic.

Date & Time
2011/11/04 13:30-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Marco Cuturi (Kyoto University, Japan)
Title
Ground Metric Learning
Abstract
Transportation distances have been used for more than a decade now in machine learning to compare histograms of features. They have one parameter: the ground metric, which can be any metric between the features themselves. As is the case for all parameterized distances, transportation distances can only prove useful in practice when this parameter is carefully chosen. To date, the only option available to practitioners to set the ground metric parameter was to rely on a priori knowledge of the features, which limited considerably the scope of application of transportation distances. We propose to lift this limitation and consider instead algorithms that can learn the ground metric using only a training set of labeled histograms. We call this approach ground metric learning. We formulate the problem of learning the ground metric as the minimization of the difference of two polyhedral convex functions over a convex set of distance matrices. We follow the presentation of our algorithms with promising experimental results on binary classification tasks using GIST descriptors of images taken in the Caltech-256 set.
Preprint

Date & Time
2011/11/02 10:30-12:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Moritz Grosse-Wentrup (Max-Planck Institute, Germany)
Title
What are the Neurophysiological Causes of Performance Variations in Brain-Computer Interfacing?
Abstract
When a subject operates a non-invasive brain-computer interface (BCI), the system correctly infers the subject's intention in some trials, yet fails to make the right decision in other trials. As the algorithm used to decode brain signals is typically fixed, the reason for this variation in performance has to be found in the subject's brain states. In this talk, I argue that distributed gamma-range oscillations play a major role in determining BCI-performance. In particular, I present empirical evidence that gamma-range oscillations modulate the sensorimotor-rhythm [1], and may be used to predict BCI-performance on a trial-to-trial basis [2]. I further present preliminary evidence that feedback of fronto-parietal gamma-range oscillations may be used to induce a state-of-mind beneficial for operating a BCI [3].

References:
1. Grosse-Wentrup, M., B. Scholkopf and J. Hill. Causal Influence of Gamma Oscillations on the Sensorimotor Rhythm. NeuroImage 56(2), pp. 837-842, 2011.
2. Grosse-Wentrup, M. Fronto-Parietal Gamma-Oscillations are a Cause of Performance Variation in Brain-Computer Interfacing. Proceedings of the 5th International IEEE EMBS Conference on Neural Engineering (NER 2011), pp. 384-387, 2011.
3. Grosse-Wentrup, M. Neuro-Feedback of Fronto-Parietal Gamma-Oscillations. 5th International BCI Conference, Graz, Austria, 2011.

Date & Time
2011/10/28 13:20-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Jun-ichiro Hirayama (Kyoto University, Japan)
Title
Bregman Divergence as General Framework to Estimate Unnormalized Statistical Models
Abstract
A parametric statistical model often has an intractable normalization factor which makes standard maximum likelihood estimation impractical. Recently, several alternative methods have been proposed to deal with this difficulty in the estimation of "unnormalized" statistical models, where "unnormalized" means that the model has an intractable normalization factor, or even has no normalization factor. A classical example is Pseudolikelihood, proposed by Besag for discrete MRFs; other recent examples include Contrastive Divergence, Score Matching, Ratio Matching, and Noise-Contrastive Estimation and its generalization.

We have recently shown that minimization of Bregman divergence (BD) provides a rich framework to estimate unnormalized statistical models, which unifies and generalizes some of the existing principles. This talk is about some selected pieces from this study, with a few new results. I will first introduce the problem of estimating unnormalized models, and then show how the Noise-Contrastive Estimation and its generalization can be interpreted as BD minimization. I will also be pointing out its connection to a framework of "density ratio estimation" using BD. I will finally show that the proposed framework also contains Score Matching, Ratio Matching and Pseudolikelihood as special cases.

Date & Time
2011/08/05 10:30-12:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Oliver Kroemer (Max-Planck Institute, Germany)
Title
Learning Dynamic Tactile Sensing with Robust Vision-based Training
Abstract
Dynamic tactile sensing is a fundamental ability for recognizing materials and objects. However, while humans are born with partially developed dynamic tactile sensing and master this skill quickly, today’s robots remain in their infancy. The development of such a sense requires not only better sensors, but also the right algorithms to deal with these sensors’ data. For example, when classifying a material based on touch, the data is noisy, high-dimensional and contains irrelevant signals as well as essential ones. Few classification methods from machine learning can deal with such problems. In this talk, I will discuss an efficient approach to inferring suitable lower-dimensional representations of the tactile data. In order to classify materials based on only the sense of touch, these representations are autonomously discovered using visual information of the surfaces during training. However, accurately pairing vision and tactile samples in real robot applications is a difficult problem. The proposed approach therefore works with weak pairings between the modalities. Experiments show that the resulting approach is very robust and yields significantly higher classification performance based on only dynamic tactile sensing.

Date & Time
2011/06/16 10:45-12:15
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Pritee Khanna (Indian Institute of Information Technology, India)
Title
Content-Based Image Retrieval
Abstract
In recent years, with the increasing need for multimedia information retrieval on the Internet, the research and application of image retrieval technology has followed this development trend and has a bright future. However, fast and efficient image retrieval on the Internet still faces many technical problems, which restrict its extensive and in-depth application and have become a focus of research. In particular, it is relatively difficult to balance system response time against image retrieval accuracy in distributed networks that store massive amounts of unstructured or semi-structured data.

Content-based image retrieval has taken low-level visual features (color, texture, shape, objects, etc.) as the research priorities of image retrieval since the early 1990s. Some of its important characteristics are intuitiveness (description by example), efficiency (similarity matching), and universality (querying without the help of domain knowledge). These characteristics are used to overcome some defects of keyword-based image retrieval, such as subjectivity (unintuitive retrieval results), ambiguity (inaccurate description of image content in natural language) and inconvenience (the large amount of manual image annotation required), and the field has shown vigorous development. At the same time, content-based image retrieval has inevitably shown some shortcomings. To guarantee retrieval accuracy, the extracted image features are high-dimensional, and their dimensionality rises drastically as retrieval accuracy improves. This greatly increases the burden of indexing and decreases the efficiency of retrieval.

I will discuss the issues that need to be addressed in developing an effective CBIR system.

Date & Time
2011/05/31 13:30-15:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Nigel Collier (National Institute of Informatics, Japan)
Title
Web Sensing for Real Time Disaster Detection and Tracking
Abstract
Accurate and timely detection of public health disasters such as the spread of infectious diseases and chemical contamination is necessary to help support risk assessment and ultimately to save lives and livelihoods. In this talk I will present progress on the JST-funded BioCaster project. BioCaster exploits high-throughput biomedical text mining from global news media to detect norm violations in near real time. Additionally, I will discuss our recent investigation into tracking syndromic trends from user-generated content in the DIZIE project and show how social media can complement news events in both spatial and temporal resolution. Early results for DIZIE illustrate how selected features are highly correlated with laboratory data for influenza. Ongoing challenges will also be discussed, including: (1) bridging the gap between laymen's and experts' terminology, (2) integrating evidence across documents and information spaces, and (3) providing realistic benchmarks.

Date & Time
2011/05/13 15:30-17:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Masanori Kawakita (Kyushu University, Japan)
Title
A Class of Semi-Supervised Learning in View of Statistical Paradox and Its Model Selection
Abstract
We analyze the performance of a certain class of semi-supervised regression and propose a model selection method for it. In semi-supervised learning, it is often assumed that only a few labeled data are available. For model selection, however, almost all conventional semi-supervised methods use AIC or cross-validation based on the few labeled data. This leads to a large variance in risk estimation.

First, we focus on a certain class of semi-supervised regression, which is based on the weighted likelihood with the ratio between the labeled data density p(x) and the unlabeled data density p'(x). We refer to this approach as Density-Ratio-Estimation-based Semi-Supervised (DRESS) regression in this talk. This approach has been studied well in the situation where p'(x) differs from p(x). If p'(x)=p(x), the DRESS approach seems meaningless because the target density ratio p'(x)/p(x) is trivially one at any x, leading to the usual least-squares estimator (LSE). Indeed, to our knowledge, almost no theory so far has guaranteed the performance of DRESS when p'=p. However, we can prove that DRESS improves on the risk of LSE under some conditions. This issue is analogous in structure to the statistical paradox "Even if we know the true value of a nuisance parameter, estimating it improves the accuracy in some situations". This analogy plays a central role in the above proof.
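
For orientation, a density-ratio-weighted likelihood estimator of the kind described above is commonly written as follows. The notation (the parametric model p(y | x; θ), the weight w, the regression function f, and the sample size n) is assumed here for illustration and is not taken from the talk itself.

```latex
\hat{\theta}
  = \operatorname*{arg\,max}_{\theta}
    \sum_{i=1}^{n} w(x_i)\,\log p(y_i \mid x_i;\theta),
\qquad
w(x) = \frac{p'(x)}{p(x)} .
% If p' = p, then w(x) = 1 for all x. Under a Gaussian noise model
% with regression function f(x;\theta), the estimator reduces to the LSE:
\hat{\theta}_{\mathrm{LSE}}
  = \operatorname*{arg\,min}_{\theta}
    \sum_{i=1}^{n} \bigl(y_i - f(x_i;\theta)\bigr)^{2} .
```

In practice w is unknown and is replaced by an estimate built from the labeled and unlabeled inputs. This is why the estimator can differ from the LSE even when the true ratio is one.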

Second, we propose a new risk estimator for DRESS regression, which is referred to as Criterion-based-on-Risk-Of-Semi-Supervised regression (CROSS). Its derivation requires neither a large-sample assumption nor prior knowledge of the noise variance or distribution. DRESS+CROSS performs better than LSE under model misspecification, while it performs equally well or slightly worse than LSE when the model is correctly specified. Thus, it is necessary to estimate whether the model is correct or not.

Third, we present a study that solves this issue to some extent. Simulations illustrate the performance of these proposals.

Date & Time
2011/03/17 10:30-12:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
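Masanao Ochi (大知 正直; The University of Electro-Communications, Japan)
Title
Proposal of Ranking-Function Features Using Review Text
Abstract
In ranking functions that predict users' ratings, it is known that adopting the ratings of other users as features makes the meaningful values sparse. In this work, we propose new features that use the review text written alongside each user's rating, and show that they alleviate the sparsity of meaningful values. Evaluation experiments based on real review data also show that the accuracy of predicting user ratings improves over conventional methods.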

Date & Time
2011/03/09 10:00-11:30
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Jen-Tzung Chien (National Cheng Kung University, Taiwan)
Title
Bayesian and Sparse Learning of Acoustic and Language Models
Abstract
In this talk, I will present my recent studies on machine learning and speech recognition. Speech recognition involves extensive knowledge of machine learning and statistical modeling. Both acoustic modeling and language modeling are important parts of modern speech recognition algorithms. In acoustic modeling, I will introduce a sparse representation of acoustic features based on a set of state-dependent basis vectors. Bayesian sensing hidden Markov models can be established from heterogeneous training data. Hybrid dictionary learning and sparse representation are performed. In language modeling, I will address the topic model and present a Dirichlet class language model, which projects the sequence of history words onto a latent class space and calculates a marginal likelihood over the uncertainties of classes, which are expressed by Dirichlet priors. A Bayesian class-based language model is established and a variational Bayesian inference procedure is presented. In this presentation, I will report evaluations on large-vocabulary continuous speech recognition and briefly describe other work we are currently doing on different topics in machine learning.

Short Bio: Jen-Tzung Chien received his Ph.D. degree in electrical engineering from the National Tsing Hua University, Hsinchu, Taiwan, in 1997. Since 1997, he has been with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, where he is currently a Professor. He has held Visiting Professor positions at Panasonic Technologies Inc., Santa Barbara, CA; the Tokyo Institute of Technology, Tokyo, Japan; the Georgia Institute of Technology, Atlanta, GA; Microsoft Research Asia, Beijing, China; and the IBM T. J. Watson Research Center, Yorktown Heights, NY. His research interests include machine learning, speech recognition, face recognition, information retrieval and signal separation.

Date & Time
2011/01/18 10:00-11:30
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Yee Whye Teh (University College London, UK)
Title
Hierarchical Bayesian Models of Language and Text
Abstract
In this talk I will present a new approach to modelling sequence data called the sequence memoizer. As opposed to most other sequence models, our model does not make any Markovian assumptions. Instead, we use a hierarchical Bayesian approach which enforces sharing of statistical strength across the different parts of the model. To make computations with the model efficient, and to better model the power-law statistics often observed in sequence data, we use a Bayesian nonparametric prior called the Pitman-Yor process as building blocks in the hierarchical model. We show state-of-the-art results on language modelling and text compression.

This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and Lancelot James.

Date & Time
2010/11/15 13:30-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Fernando Villavicencio (Yamaha Corporation, Japan)
Title
Application of Voice-Conversion to Singing-Voice
Abstract
In this talk we will present the main features of our work on applying Voice Conversion to Singing-Voice in order to achieve singer-timbre conversion on Yamaha's VOCALOID singing-synthesizer. Our main goal is to find a transformation of singing-voice samples of a source singer such that the timbre of a desired target singer is perceived. The timbre-conversion framework is based on a probabilistic conversion function derived from Gaussian Mixture Modeling of spectral envelope features. We will describe the main parts of this work as well as the results of our study of several issues, such as the spectral envelope modeling, the statistical modeling of the features, and the derivation of the timbre mapping from unpaired source-target data.

Date & Time
2010/11/11 10:30-12:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Thomas G. Dietterich (Oregon State University, USA)
Title
Fine-Grained Visual Categorization and the Problem of Novel Objects
Abstract
Fine-grained visual categorization is the problem of discriminating among very similar objects (e.g., species of animals, makes of automobiles). For the past seven years, we have been developing methods for fine-grained categorization of aquatic macro invertebrates (insect larvae that live in freshwater streams). This talk will discuss the computer vision and machine learning methods that we have developed and that show performance exceeding 88% correct on 29 species of aquatic macro invertebrates. An important challenge in this application is that insects belonging to species outside the training set can arise frequently, so the vision system must detect that these do not belong to any of the classes known to the system. We will discuss various methods that we have applied to this problem, and speculate on how we can improve the performance of these methods.

Date & Time
2010/10/26 10:30-12:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Yasuo Tabei (Japan Science and Technology Agency, Japan)
Title
SketchSort: Fast All Pairs Similarity Search Method by Multiple Sorting
Abstract
It is increasingly common for images and signals to be represented as vectorial data. To save memory and improve speed, vectorial data are often represented as binary strings called sketches. Charikar (2002) proposed a fast approximate method for finding neighbor pairs of sketches by sorting and scanning with a small window. This method, which we shall call "single sorting", is applied to locality-sensitive codes and is prevalently used in speed-demanding web-related applications. In this presentation, we present the multiple sorting method, which combines blockwise masking and radix sort. Additionally, the average false negative rate is computable and duplicated discoveries are deliberately avoided. Empirical experiments on a large-scale image dataset show that it is much faster than cover tree and Lanczos bisection.
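
The "single sorting" idea mentioned above (sort the sketches, then compare each one only with its next few neighbors in sorted order) can be illustrated with a minimal sketch. This is not the SketchSort implementation: the function names, the integer encoding of sketches, and the `window` and `max_dist` parameters are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two sketches stored as integers."""
    return bin(a ^ b).count("1")


def single_sort_pairs(
    sketches: List[int], window: int, max_dist: int
) -> Set[Tuple[int, int]]:
    """Approximate neighbor-pair search by one sort plus a window scan.

    Each sketch is compared only with the `window` sketches that follow
    it in sorted order, so pairs that differ in a high-order bit can be
    missed (false negatives).
    """
    # Indices of the sketches in ascending order of their integer value.
    order = sorted(range(len(sketches)), key=lambda i: sketches[i])
    pairs: Set[Tuple[int, int]] = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + 1 + window]:
            if hamming(sketches[i], sketches[j]) <= max_dist:
                pairs.add((min(i, j), max(i, j)))
    return pairs


# Hypothetical 8-bit sketches, for illustration only.
sketches = [0b10110010, 0b10110011, 0b01001100, 0b01001110, 0b11111111]
found = single_sort_pairs(sketches, window=2, max_dist=1)
print(sorted(found))
```

Because a single sort order keeps only sketches with similar high-order bits adjacent, one pass can miss true neighbors. Repeating the search over several differently arranged sort keys, as the multiple sorting method in the abstract does with blockwise masking, is aimed at reducing such misses.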

Date & Time
2010/10/07 16:30-18:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
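Tsuyoshi Ide (井手剛; IBM Tokyo Research Laboratory, Japan)
Title
On the Trajectory Regression Problem on Networks
Abstract
Knowledge discovery from the trajectories of moving objects, such as typhoon tracks, the movement of people inside stores, or the movement of cars on a map, is one of the interesting recent topics in data mining. We recently formulated "trajectory regression", i.e., the problem of predicting the cost of a trajectory, within the framework of kernel regression (T. Ide and S. Kato, SDM 2009). In this talk, we show that following a different formulation yields a cost prediction method that is more useful in practice. At the same time, we show that it gives a new understanding of kernel functions for trajectories. As a concrete application, we take up traffic flow analysis on maps and discuss what research problems exist.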

Date & Time
2010/07/26 10:30-12:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Ivor Wai-Hung Tsang (Nanyang Technological University, Singapore)
Title
Non-parametric Kernel Learning: Algorithms and Applications
Abstract
Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by general-purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver could be as high as O(N^6.5), which prevents NPKL methods from being applied to real applications, even for datasets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed "SimpleNPKL", which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is the SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen-decomposition technique. The other is the SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, and square loss), which can be reformulated as a saddle-point optimization problem and then solved by a fast iterative algorithm. Our empirical results show that, compared with previous NPKL approaches, the proposed technique maintains the same accuracy while being significantly more efficient and scalable. Finally, we demonstrate that the proposed technique can also speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding.

Besides SimpleNPKL, we also propose a novel non-parametric spectral kernel learning method which can seamlessly combine the manifold structure of unlabeled data and Regularized Least-Squares (RLS) to learn a new kernel. Interestingly, the new kernel matrix can be obtained analytically using the spectral decomposition of the graph Laplacian matrix. Hence, the proposed algorithm does not require any numerical optimization solvers. Moreover, by maximizing kernel target alignment on labeled data, we can also learn model parameters automatically with a closed-form solution. For a given graph Laplacian matrix, our proposed method does not need to tune any model parameter, including the tradeoff parameter in RLS and the balance parameter for unlabeled data. Extensive experiments on ten benchmark datasets show that our proposed non-parametric and parameter-free spectral kernel learning algorithm obtains performance comparable to fine-tuned manifold regularization methods in the transductive setting, and outperforms multiple kernel learning in the supervised setting.


Date & Time
2010/06/11 13:30-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Yukikazu Hidaka (University of Southern California, USA, and ATR Computational Neuroscience Laboratories, Japan)
Title
Use It and Improve It, or Lose It: Non-Linear Interactions between Arm and Hand Use and Function During Stroke Recovery
Abstract
In this talk, we introduce our research about neuro-computational rehabilitation for patients with stroke. Stroke-affected arm use in daily life presumably forms a part of effective rehabilitation therapy. However, there is little understanding of the interactions between arm use and function in humans post-stroke. In a previous computational study (Han, Arbib, and Schweighofer, 2008), we suggested that the dependence of function on use is non-linear after therapy: above a threshold of function, use will spontaneously improve, and in turn, function further improves; below this threshold, use and function of the affected limb will plateau or deteriorate, and compensatory strategies will develop further.

Here, we directly test this hypothesis by developing a first-order dynamical model with non-linear interactions between function and use, and by analyzing how this model can account for actual stroke recovery data. Using a Bayesian framework, we systematically compared this model to other time-varying models with and without interactions between function and use. To train the parameters of all the models, we used data from the immediate treatment group of the EXCITE clinical trial (Wolf et al. 2006), in which use and function data were collected following two weeks of therapy at four-month intervals for 2 years.

Comparison of the model evidence probabilities showed that the best fitting model was our first-order dynamical model with the non-linear interaction between function and use. We also predicted the recovery process of each patient, and categorized patients into the vicious or virtuous group by using a threshold surface of the long-term arm use estimate. Finally, we compared model parameters before and after therapy and found that the only parameter which increased is related to the motivation to use the affected arm. Our results suggest that after rehabilitation, the interaction between function and use is a crucial factor for functional recovery.


Date & Time
2010/05/31 13:30-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Sungyoung Kim (Yamaha Corporation, Japan)
Title
Beyond Surround: Towards Enhanced Immersive Presence of Auditory Information
Abstract
This presentation covers the psychoacoustical principles of the conventional multichannel audio system, proposes how such principles should apply to future 3-dimensional audio, and introduces the influence of non-auditory cues on the enhanced immersive feeling of "being there." As a case study, this talk introduces a signal processing method newly developed by Yamaha, which creates virtually elevated auditory imagery via a conventional 5.1 channel reproduction system. As an interim step between current surround audio and future periphonic audio, the proposed method allows listeners to experience a vertically extended space in which musicians and composers can better express themselves musically.

Date & Time
2010/03/01 10:30-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Katja Hansen (Technical University of Berlin, Germany)
Title
Machine Learning in Drug Discovery and Design
Abstract
Within the past two decades, Machine Learning methods have been established in a variety of applications in the field of computational chemistry. Due to the complex nature of drug design, these methods serve as perfect tools to decrease development time, cost and use of chemical resources.

Starting from a general overview of drug discovery, the talk will focus on different problems related to the specific requirements of the algorithms arising in this field of research. In particular, the question of interpretability is of great importance for a drug-designing scientist: complex Machine Learning approaches in general result in black box models. While delivering excellent prediction performance, most of these methods provide no answer as to why the model predicts a particular label (e.g. toxic/non-toxic) for a certain molecule. Given the immense impact on the following drug development steps and the associated costs, the certainty of a prediction is nearly as precious for the chemist as the prediction itself. Two different approaches to confidence estimation will be introduced and evaluated on Ames mutagenicity data. Both focus on kernel-based Machine Learning algorithms, in particular Gaussian processes. Finally, additional approaches to enhance machine learning in drug discovery will be discussed.

Date & Time
2009/09/28 14:00-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Klaus-Robert Müller (Technical University of Berlin, Germany)
Title
Denoising and Dimension Reduction in Feature Space
Abstract
The talk presents recent work that interestingly complements our understanding of the VC picture in kernel-based learning. Our finding is that the relevant information of a supervised learning problem is contained, up to negligible error, in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved using only linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data for supervised learning problems. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to aid in model selection, and (3) to denoise in feature space in order to yield better classification results.

We complement our theoretical findings by reporting on applications of our method to data from gene finding and brain computer interfacing.

This is joint work with Claudia Sanelli, Mikio Braun and Joachim M. Buhmann.

Date & Time
2009/07/17 13:20-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Toru Wakahara (Hosei University, Japan)
Title
Affine-Invariant Recognition of Face Images Using GAT Correlation
Abstract
My talk addresses the challenging problem of performing normalization and recognition of face images at the same time. The key idea is the use of Global Affine Transformation (GAT) correlation for determining optimal 2D affine parameters that normalize a given image to yield the maximum correlation value with a target image. The GAT correlation method assigns an input face image to the face template having the largest GAT correlation value among all enrolled face templates. Experimental results using the public HOIP face image database demonstrate a very high recognition rate of 99.79%. Moreover, the proposed method successfully matches face templates with their artificially affine-transformed images subject to rotation within 45 degrees, scale change within 50 percent, and translation within 25 percent of the face extent.

Date & Time
2009/06/01 15:00-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Paul von Bünau (Technical University of Berlin, Germany)
Title
Stationary Subspace Analysis
Abstract
Non-stationarities are a ubiquitous phenomenon in statistical data analysis, yet they pose a challenge to standard Machine Learning methodology since the classic assumption of a stationary data generating process is violated. Conversely, understanding the nature of observed non-stationary behaviour often lies at the heart of a scientific question. To this end, we propose a novel unsupervised technique: Stationary Subspace Analysis (SSA). SSA decomposes a multivariate time series into its stationary and non-stationary components. In this context, we also investigate the occurrence of spurious stationarity and provide useful theoretical results on the circumstances under which spurious stationary components arise. We demonstrate the performance of our novel concept in extensive simulations and present a real-world application to Brain Computer Interfacing.

Date & Time
2009/03/19 16:00-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, No.26)
Speaker
Yu Takahashi (Nara Institute of Science and Technology, Japan)
Title
Musical Noise Analysis for Integration Method of Microphone Array and Nonlinear Signal Processing with Higher-Order Statistics
Abstract
In recent years, methods that integrate microphone array signal processing with nonlinear signal processing have been studied to achieve better noise reduction. The integrated method can indeed achieve good noise reduction performance, but the nonlinear processing in the method causes an artificial distortion, so-called musical noise. Since such musical noise makes users uncomfortable, it is desirable to mitigate it. Moreover, it has recently been reported that higher-order statistics are strongly related to the amount of generated musical noise. Thus, we analyze the integrated method of microphone array signal processing and nonlinear signal processing based on higher-order statistics. We also propose an architecture for reducing musical noise based on the analysis.

Date & Time
2008/09/11 13:30-
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Taiji Suzuki (University of Tokyo, Japan)
Title
A Least-Squares Approach to Mutual Information Estimation with Application in Variable Selection
Abstract
We propose a new method of estimating mutual information from samples. Our method, called Least-Squares Mutual Information (LSMI), has several attractive properties, e.g., density estimation is not involved, an analytic-form solution is available, a variant of cross-validation can be used for model selection, and an approximate leave-one-out error can be computed very efficiently. Numerical experiments show that LSMI compares favorably with existing methods in mutual information estimation and variable selection. The practical usefulness of LSMI is demonstrated also in protein subcellular localization prediction.

Date & Time
2008/08/20 13:20-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Ron Begleiter (Technion Israel Institute of Technology, Israel)
Title
Repairing Self-Confident Active-Transductive Learners Using Systematic Exploration
Abstract
We consider an active learning game within a transductive learning model. A major problem with many active learning algorithms is that an unreliable current hypothesis can mislead the querying component to query "uninformative" points. In this work we propose a remedy to this problem. Our solution can be viewed as a "patch" for fixing this deficiency and also as a proposed modular approach for active transductive learning that produces powerful new algorithms. Extensive experiments on "real" data demonstrate the advantage of our method.

Reference:
R. Begleiter, R. El-Yaniv, and D. Pechyony, Repairing self-confident active-transductive learners using systematic exploration, Pattern Recognition Letters, 29(9), 1245-1251, 2008.

Date & Time
2008/05/19 10:30-12:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
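Muneki Yasuda (安田宗樹; Tohoku University, Japan)
Title
Approximate Learning Rules for Boltzmann Machines Using Statistical Approximation Theories
Abstract
The Boltzmann machine is a type of mutually connected neural network whose structure contains signal feedback loops. It is a stochastic neural network that can be regarded as an extension of the Hopfield model, known as one of the representative associative memory models, with stochastic state transitions. Because of its rich structure, the Boltzmann machine is expected to be applied to various optimization problems, pattern recognition problems, and so on. However, learning requires computing the means and correlations of the Gibbs distribution, and carrying this out exactly requires an enormous amount of computation time. For this reason, approximate learning rules based on various statistical approximation theories, starting with mean-field theory, have long been studied. In this talk, I introduce a new approximate learning rule for Boltzmann machines without hidden units that combines belief propagation, which has recently been widely used in various fields of information science, with an approximation technique called the linear response approximation. The linear response approximation is known to improve the accuracy of the correlations approximated by belief propagation, so higher approximation accuracy than with conventional approximate learning rules can be expected. I also discuss strategies for approximate learning when hidden units are present.

References
[1] M. Yasuda and T. Horiguchi: Triangular approximation for Ising model and its application to Boltzmann machine, Physica A, vol. 836, pp. 83-95, 2006.
[2] M. Yasuda and K. Tanaka: The Mathematical Structure of the Approximate Linear Response Relation, J. Phys. A: Math. and Theor., vol. 40, pp. 9993-10007, 2007.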

Date & Time
2008/04/03 13:30-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
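Jun-ya Gotoh (後藤順哉; Chuo University, Japan)
Title
Portfolio Selection Based on Generalization Error Evaluation
Abstract
The problem of deciding how much of one's available funds to invest in which assets (i.e., the investment allocation) is called portfolio selection. Traditional portfolio selection models determine the allocation so as to make the return distribution of financial assets optimal in the in-sample sense according to a criterion fixed in advance. However, since the number of samples is usually limited, large estimation errors may arise in some cases. Moreover, because it is difficult to identify the return distribution of financial assets, a justification based on nonparametric assumptions is desirable. In this work, drawing on the similarity between the one-class nu-SVM, also known as a model for outlier detection, and the portfolio selection problem, we evaluate (something like) a generalization error for portfolios, and present a new portfolio selection model based on it together with a solution method. Unlike traditional models, this model is novel in that it aims to improve out-of-sample performance. Because it is closely related to the minimization of VaR and CVaR, which have long been used as portfolio selection criteria, it also provides theoretical support for their performance. (This is joint work with Takeda of Tokyo Institute of Technology.)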

Date & Time
2008/03/19 13:30-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Justin Dauwels (Massachusetts Institute of Technology, USA)
Title
Machine Learning Techniques for Quantifying Neural Synchrony: Application to the Early Diagnosis of Alzheimer's Disease from EEG
Abstract
We present a novel approach to measuring the interdependence of multiple time series, referred to as "stochastic event synchrony" (SES). First, "events" are extracted from the given time series; next, those events are aligned. The better the alignment, the more similar the time series are considered to be. The similarity measure is computed by performing statistical inference on a sparse graph. As an application, we consider the problem of detecting anomalies in EEG synchrony of Mild Cognitive Impairment (MCI) patients. We present some results and discuss ideas for future research.

This talk is based on joint work with F. Vialatte (RIKEN, Japan), Theophane Weber (MIT), and A. Cichocki (RIKEN, Japan).

Date & Time
2008/01/31 15:00-16:30
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Kengo Kato (University of Tokyo, Japan)
Title
On the Degrees of Freedom in Shrinkage Estimator
Abstract
We study the degrees of freedom in shrinkage estimation of the regression coefficients. Generalizing the idea of the Lasso, we consider the problem of estimating the coefficients by the projection of the ordinary least squares estimator onto a closed convex set. Then an unbiased estimator of the degrees of freedom is derived in terms of geometric quantities under a smoothness condition on the boundary of the closed convex set. The result presented in this paper is applicable to estimation with a wide class of constraints. As an application, we obtain a Cp-type criterion and AIC for selecting the tuning parameter.

Reference: Technical Report
 
Date & Time
2008/01/08 15:00-16:30
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
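Ken-ichi Maeda (前田賢一; Toshiba, Japan)
Title
Image Recognition: The Forefront of Technology and Applications
Abstract
This talk introduces the forefront of image recognition. The technologies used (hardware, software, algorithms) and the settings in which they are applied are presented with real examples (face recognition, in-vehicle obstacle detection, etc.).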

Date & Time
2007/12/20 13:30-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
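Shinichi Nakajima (中島伸一; Nikon, Japan)
Title
Generalization Error Analysis of Singular Models Using the Limiting Eigenvalue Distribution of Wishart Matrices
Abstract
When each component of a random vector follows a normal distribution with mean zero, its covariance matrix follows a Wishart distribution. Assuming that the components of the original normal distribution are independent, it is known that, as the dimension and the number of samples grow with their ratio held fixed, the eigenvalue density of the covariance matrix converges almost surely to a certain function (the Marchenko-Pastur law). In this talk, I present an example that uses this property to analyze the generalization performance of reduced-rank regression models. In fact, the limiting eigenvalue distribution does not depend on the normality of the original random variables (independence, however, is essential). For other kinds of random matrices, Wigner's semicircle law and the circular law are known. I also briefly touch on such properties of more general random matrices.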
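
For reference, the limiting eigenvalue density given by the Marchenko-Pastur law can be written in its standard textbook form as follows. The notation is assumed here and is not taken from the talk: X is an n x p matrix with independent entries of mean zero and variance σ², S = XᵀX / n is the sample covariance matrix, and p/n → γ with 0 < γ ≤ 1.

```latex
\rho(\lambda)
  = \frac{\sqrt{(\lambda_{+} - \lambda)\,(\lambda - \lambda_{-})}}
         {2\pi\,\sigma^{2}\,\gamma\,\lambda},
\qquad
\lambda_{-} \le \lambda \le \lambda_{+},
\qquad
\lambda_{\pm} = \sigma^{2}\bigl(1 \pm \sqrt{\gamma}\bigr)^{2},
```

and ρ(λ) = 0 outside this interval. For γ > 1 the limit additionally has a point mass of 1 − 1/γ at zero.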

Date & Time
2007/11/20 13:30-15:00
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Jan Peters (Max-Planck Institute, Germany)
Title
Towards Motor Skill Learning in Robotics
Abstract
Autonomous robots that can assist humans in situations of daily life have been a long-standing vision of robotics, artificial intelligence, and cognitive sciences. A first step towards this goal is to create robots that can learn tasks triggered by environmental context or higher-level instruction. However, learning techniques have yet to live up to this promise, as only a few methods manage to scale to high-dimensional manipulator or humanoid robots. In this talk, we investigate a general framework suitable for learning motor skills in robotics which is based on the principles behind many analytical robotics approaches. It involves generating a representation of motor skills by parameterized motor primitive policies acting as building blocks of movement generation, and a learned task execution module that transforms these movements into motor commands.

Learning parameterized motor primitives usually requires reward-related self-improvement, i.e., reinforcement learning. We propose a new, task-appropriate architecture, the Natural Actor-Critic. This algorithm is based on natural policy gradient updates for the actor while the critic estimates the natural policy gradient. Empirical evaluations illustrate the effectiveness and applicability to learning control on an anthropomorphic robot arm.

For the proper execution of motion, we need to learn how to realize the behavior prescribed by the motor primitives in their respective task space through the generation of motor commands. This transformation corresponds to solving the classical problem of operational space control through machine learning techniques. Such robot control problems can be reformulated as immediate reward reinforcement learning problems. We derive an EM-based reinforcement learning algorithm which reduces the problem of learning with immediate rewards to a reward-weighted regression problem. The resulting algorithm learns smoothly without dangerous jumps in solution space, and works well in application to complex high degree-of-freedom robots.

Date & Time
2007/11/09 13:30-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Jens Kohlmorgen (Fraunhofer FIRST, Germany)
Title
Real-Time Mental Workload Detection while Driving
Abstract
The ability to immediately detect mental overload in human operators is a vital demand for complex monitoring and control processes. Such processes can be found, for example, in industrial production lines and in aviation, but also in common everyday tasks like driving. Here we present an EEG-based system that is able to detect high mental workload in drivers while they are driving a car on the highway during the usual daytime traffic. The information is immediately utilized to mitigate the workload typically induced by the influx of information that is generated by the car's electronic systems. Two experimental paradigms were tested: an auditory workload scheme and a mental calculation task. While the detection performance turns out to be strongly subject-dependent, the results are good to excellent for the majority of subjects. We show that in these cases an induced mitigation of a reaction time experiment leads to an improved performance of the driver in that task. Example videos demonstrate the efficiency of this approach.

Date & Time
2007/11/02 13:30-15:00
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Nicole Krämer (Technical University of Berlin, Germany)
Title
Error Bars and Degrees of Freedom for Kernel Partial Least Squares
Abstract
Kernel Partial Least Squares (KPLS) is a supervised dimensionality reduction method that constructs orthogonal features with maximal covariance to the response variable(s). For prediction, the response is then projected onto these features. For the derivation of prediction intervals (on top of the usual point estimates), we need to determine an (approximate) distribution of the fitted function. Since the distribution cannot be determined analytically for KPLS, we propose an approximation based on a first-order Taylor approximation of PLS. Following the same line, we also derive an unbiased estimate of the Degrees of Freedom of KPLS. This estimate can then be used for model selection.

Date & Time
2007/10/18 15:00-16:30
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Jean-Philippe Vert (Ecole des Mines, France)
Title
QSAR and Virtual Screening with Support Vector Machines
Abstract
Support vector machines (SVMs) are machine learning algorithms that are increasingly popular in many fields, including chemoinformatics. They perform well in many real-world applications, and introduce a new framework to represent and compare the data to be processed, such as molecules: instead of an explicit representation of molecules as a set of features or a fingerprint, SVMs only require the definition of a measure of similarity between molecules, called a kernel, that can in some cases be defined directly, without prior vectorization of the molecules. After a brief introduction to SVMs and the notion of kernels, I will give several examples of kernels for molecules based on their 2D and 3D structures, and illustrate their relevance on toxicity prediction experiments.

Date & Time
2007/10/16 15:00-16:30
Venue
Seminar Room on 5th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Ryota Tomioka (University of Tokyo, Japan)
Title
Prediction over Matrices with Dual Spectral Regularization and EEG Classification
Abstract
Prediction over matrices arises naturally in many real-world problems. It is a common prior belief that the discriminative information is concentrated in some low-dimensional subspace. The dual spectral regularization expresses this inductive bias in a convex optimization framework. In fact, the L1 nature of the regularization forces many singular values to be zero. This sparseness allows good interpretation of the solution. Moreover, we propose an efficient optimization algorithm based on an interior-point method. The convex duality plays a key role in the implementation. We apply logistic regression with dual spectral regularization to the motor-imagery EEG classification problem in the context of Brain-Computer Interface (BCI). Classification results on 162 BCI datasets show significant improvement in classification accuracy over l2-regularized logistic regression, rank=2 approximated logistic regression, and the Common Spatial Pattern (CSP) based classifier, which is a popular technique in BCI. Connections to LASSO, GP classification with a second-order polynomial kernel, and SVM are discussed.

Date & Time
2007/07/25 14:00-15:30
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Liwei Wang (Peking University, China)
Title
On Learning with Dissimilarity Functions and Rademacher Margin Complexity
Abstract
Learning with dissimilarity functions is the problem of learning a classification task when only dissimilarity information about the objects is given. This problem arises partly in image recognition, where feature extraction is usually difficult but a number of image dissimilarity measures can be used. The first part of this talk is devoted to sufficient conditions on a dissimilarity function that allow one to build efficient learning algorithms. It turns out that the theory suggests a boosting-type algorithm for which the base classifier is a special kind of decision stump. I will also discuss some modifications to make the algorithm tractable. The experimental results are promising. The second part of this talk covers ongoing work called Rademacher Margin Complexity. The goal of this work is to provide more powerful error bound analysis tools, especially for dissimilarity-based learning algorithms. I will pose two open problems on the Rademacher Margin Complexity. Finally, I will discuss some possible future directions.

Date & Time
2007/07/17 15:00-16:30
Venue
Meeting Room on 10th Floor, W8E Building (Campus map, O-okayama Area, Building No.26)
Speaker
Klaus-Robert Müller (Technical University of Berlin, Germany)
Title
Machine Learning for Computational Chemistry
Abstract
This talk will first introduce standard kernel methods (SVM) and Gaussian Processes. An interesting application scenario is then discussed: in-silico modeling of chemical properties such as water solubility, toxicity, lipophilicity, etc. Accurate in-silico models for predicting aqueous solubility are needed in drug design and discovery, and many other areas of chemical research. A first-principles modeling of solubility, however, would be overly complex, since too many physical factors with separate mechanisms are involved in the phase transition from solid to solvated molecules. We present machine learning approaches that provide a statistical modeling of aqueous solubility based on measured data. The model was validated on the well-known set of 1311 compounds by Huuskonen et al., and on an in-house dataset of 632 drug candidates at Schering. On top of the excellent predictions, the proposed machine learning models also provide confidence estimates for each individual prediction.