ACML2010 | The 2nd Asian Conference on Machine Learning

In many prediction problems, including those that arise in computer security and computational finance, the process generating the data is best modeled as an adversary with whom the predictor competes. The predictor's aim is to minimize the regret, or the difference between the predictor's performance and the best performance among some comparison class, whereas the adversary aims to maximize the predictor's regret. Even decision problems that are not inherently adversarial can be usefully modeled in this way, since the assumptions are sufficiently weak that effective prediction strategies for adversarial settings are very widely applicable.
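As a concrete illustration of regret against a finite comparison class, the following is a minimal sketch using the standard exponential weights strategy. It is not the strategy presented in the talk; the function name `hedge_regret`, the learning rate `eta`, and the restriction of losses to [0, 1] are assumptions made for this example.

```python
import math

def hedge_regret(expert_losses, eta):
    """Run exponential weights on a loss sequence and return the regret.

    expert_losses[t][i] is the loss in [0, 1] of expert i at round t.
    Regret = predictor's cumulative expected loss - best expert's cumulative loss.
    """
    n = len(expert_losses[0])
    cum = [0.0] * n      # cumulative loss of each expert so far
    learner = 0.0        # cumulative expected loss of the predictor
    for losses in expert_losses:
        # Weight each expert by exp(-eta * cumulative loss); subtracting the
        # minimum first keeps the exponentials numerically stable.
        m = min(cum)
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        learner += sum(wi / z * li for wi, li in zip(w, losses))
        cum = [c + li for c, li in zip(cum, losses)]
    return learner - min(cum)
```

The sequence of losses may be chosen by an adversary; with `eta = sqrt(8 ln(n) / T)`, the standard analysis bounds the regret by `sqrt(T ln(n) / 2)` for any sequence of T rounds and n experts.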

The first part of this talk presents an example of an online decision problem of this kind: a resource allocation problem from computational finance. We describe an efficient strategy with near-optimal performance.

The second part of the talk presents results on the regret of optimal strategies. These results are closely related to finite sample analyses of prediction strategies for probabilistic settings, where the data are chosen iid from an unknown probability distribution. In particular, we show that the optimal online regret is closely related to the behavior of empirical minimization in a probabilistic setting, but with a non-iid stochastic process generating the data. This allows the application of techniques from the analysis of the performance of empirical minimization in an iid setting, which relates the optimal regret to a measure of complexity of the comparison class that is similar to the Rademacher averages that have been studied in the iid setting.
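For reference, the classical empirical Rademacher average from the iid setting can be computed exactly for a small finite class. This is only a sketch of that classical quantity, not of the related complexity measure for the online setting described in the talk; the function name and the representation of the class as a table of outputs are assumptions for this example.

```python
from itertools import product

def empirical_rademacher(outputs):
    """Exact empirical Rademacher average of a finite function class.

    outputs[j][i] is f_j(x_i), the value of the j-th function on the i-th
    sample point. Enumerates all 2^n sign vectors, so use only for small n.
    """
    n = len(outputs[0])
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        # Supremum over the class of the sign-weighted sum of outputs.
        total += max(sum(s * v for s, v in zip(signs, f)) for f in outputs)
    return total / (2 ** n * n)
```

A class containing a single function has average zero, while a richer class that can correlate with more sign patterns has a larger average.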

Biography

Peter Bartlett is a professor in the Computer Science Division and the Department of Statistics at the University of California at Berkeley. He is the co-author, with Martin Anthony, of the book Learning in Neural Networks: Theoretical Foundations, has edited three other books, and has co-authored many papers in the areas of machine learning and statistical learning theory. He has served as an associate editor of the journals Machine Learning, Mathematics of Control, Signals, and Systems, the Journal of Machine Learning Research, the Journal of Artificial Intelligence Research, and the IEEE Transactions on Information Theory, as a member of the editorial boards of Machine Learning, the Journal of Artificial Intelligence Research, and Foundations and Trends in Machine Learning, and as a member of the steering committees of the Conference on Computational Learning Theory and the Algorithmic Learning Theory Workshop. He has consulted for a number of organizations, including General Electric, Telstra, Polaris Wireless and SAC Capital Advisors. In 2001, he was awarded the Malcolm McIntosh Prize for Physical Scientist of the Year in Australia, for his work in statistical learning theory. He was a Miller Institute Visiting Research Professor in Statistics and Computer Science at U.C. Berkeley, a fellow, senior fellow and professor in the Research School of Information Sciences and Engineering at the Australian National University's Institute for Advanced Studies, and an honorary professor in the School of Information Technology and Electrical Engineering at the University of Queensland. His research interests include machine learning, statistical learning theory, and adaptive control.

Geoff Webb

Research Professor
Faculty of Information Technology
Monash University
http://www.csse.monash.edu.au/~webb/

Title

Learning without Search (slides in pdf)

Abstract

Machine learning is classically conceived as search through a hypothesis space for a hypothesis that best fits the training data. In contrast, naive Bayes performs no search, extrapolating an estimate of a high-order conditional probability by composition from lower-order conditional probabilities. In this talk I show how this searchless approach can be generalised, creating a family of learners that provide a principled method for controlling the bias/variance trade-off. At one extreme, very low variance can be achieved, as is appropriate for small data. Bias can be decreased with larger data in a manner that ensures Bayes-optimal asymptotic error. These algorithms have the desirable properties of

training time that is linear with respect to training set size,
supporting parallel and anytime classification,
allowing incremental learning,
providing direct prediction of class probabilities,
supporting direct handling of missing values, and
robust handling of noise.

Despite being generative, they deliver classification accuracy competitive with state-of-the-art discriminative techniques.
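The naive Bayes starting point mentioned in the abstract can be sketched as follows. This is a minimal illustration of the searchless baseline only, not of the generalised family presented in the talk; the class name `NaiveBayes`, the use of `None` for a missing value, and Laplace smoothing are assumptions made for this example.

```python
from collections import defaultdict

class NaiveBayes:
    """Categorical naive Bayes built purely from counts (no search)."""

    def __init__(self):
        self.class_counts = defaultdict(int)  # class -> examples seen
        self.attr_counts = defaultdict(int)   # (attr, class) -> non-missing examples
        self.value_counts = defaultdict(int)  # (attr, value, class) -> count
        self.values = defaultdict(set)        # attr -> distinct values seen

    def update(self, x, y):
        """Incremental learning: one constant-time count update per attribute."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            if v is None:                     # missing value: skip the attribute
                continue
            self.attr_counts[(i, y)] += 1
            self.value_counts[(i, v, y)] += 1
            self.values[i].add(v)

    def predict_proba(self, x):
        """Class probabilities composed from lower-order conditional estimates."""
        total = sum(self.class_counts.values())
        k = len(self.class_counts)
        scores = {}
        for y, cy in self.class_counts.items():
            p = (cy + 1) / (total + k)        # smoothed class prior
            for i, v in enumerate(x):
                if v is None:                 # missing value: skip the attribute
                    continue
                n_vals = len(self.values.get(i, set()) | {v})
                num = self.value_counts.get((i, v, y), 0) + 1
                p *= num / (self.attr_counts.get((i, y), 0) + n_vals)
            scores[y] = p
        z = sum(scores.values())
        return {y: s / z for y, s in scores.items()}
```

Training is a single pass of calls to `update`, so its cost grows linearly with the number of examples, and the model can absorb new examples at any time without retraining.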

Biography

Geoff Webb holds a research chair in the Faculty of Information Technology at Monash University, where he heads the Centre for Research in Intelligent Systems. Prior to Monash he held appointments at Griffith University and then Deakin University, where he received a personal chair. His primary research areas are machine learning, data mining, and user modelling. He is known for the development of numerous methods, algorithms and techniques for machine learning, data mining and user modelling. His commercial data mining software, Magnum Opus, incorporates many techniques from his association discovery research. Many of his learning algorithms are included in the widely-used Weka machine learning workbench. He is editor-in-chief of the highest impact data mining journal, Data Mining and Knowledge Discovery, co-editor of the Encyclopedia of Machine Learning (to be published by Springer), a member of the advisory board of Statistical Analysis and Data Mining and a member of the editorial boards of Machine Learning and ACM Transactions on Knowledge Discovery in Data.

Kenji Fukumizu

Professor
Department of Statistical Modeling
The Institute of Statistical Mathematics
http://www.ism.ac.jp/~fukumizu/

Title

Kernel Method for Bayesian Inference (slides in pdf)

Abstract

Since the proposal of the support vector machine, various kernel methods have been extensively developed as nonlinear extensions or "kernelization" of classical linear methods. More recently, however, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics, by embedding distributions in the form of means in reproducing kernel Hilbert spaces (RKHS) and by considering linear operators among them.
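To make the idea of embedding a distribution as a mean in an RKHS concrete, here is a minimal sketch for one-dimensional samples. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions for this example rather than choices taken from the talk.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def kernel_mean(sample, kernel=gaussian_kernel):
    """Empirical kernel mean: the function t -> (1/n) * sum_i k(t, x_i)."""
    def m(t):
        return sum(kernel(t, x) for x in sample) / len(sample)
    return m

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Squared RKHS distance between the kernel means of two samples."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Because the embedding is linear in the distribution, comparing two distributions reduces to computing a distance between two points of the RKHS, which needs only kernel evaluations on the samples.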

This talk will present how general Bayesian inference can be realized based on this recent understanding of the kernel method. First, I will explain the kernel method for expressing conditional probabilities using the kernel covariance operators of the distributions. Second, it will be shown that the general Bayes' rule, which is at the heart of Bayesian inference, is realized by operations on the kernel expression of the conditional probability and the prior, represented as a mean in the RKHS. The kernel mean of the posterior is obtained by Gram matrix computations that carry out the procedure of Bayes' rule: constructing the joint probability and normalizing it. The rate of convergence of the empirical kernel estimate to the true posterior is also derived.
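The first step, expressing a conditional probability through Gram matrix computations, can be sketched with the commonly used regularized estimate of a conditional kernel mean. This is not the full kernel Bayes' rule described in the talk; the Gaussian kernel, the regularization constant `lam`, and all function names are assumptions for this example.

```python
import math

def gauss(x, y, sigma=0.5):
    """Gaussian kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def conditional_weights(xs, x, lam=1e-6, kernel=gauss):
    """Weights w = (G_X + n*lam*I)^{-1} k_X(x) for paired samples (x_i, y_i).

    The kernel mean of P(Y | X = x) is then estimated by sum_i w[i] * k(., y_i).
    """
    n = len(xs)
    gram = [[kernel(a, b) + (n * lam if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    return solve(gram, [kernel(a, x) for a in xs])
```

Given the weights, a conditional expectation of a function f of Y is approximated by `sum(w[i] * f(ys[i]))`; the weights need not be positive or sum to one, which is typical of estimates of this kind.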

As an application, I will discuss a kernel nonparametric HMM, in which the conditional probabilities that define the HMM are neither given in a specific form nor estimated with a parametric model, but given in the form of finite samples. Experiments will show that, by sequential application of the kernel Bayes' rule, the hidden states can be estimated sequentially and nonparametrically.

Biography

Kenji Fukumizu is a professor in the Department of Statistical Modeling at The Institute of Statistical Mathematics, where he serves as director of the Research Innovation Center. Before joining the institute, he worked as a researcher at the Research and Development Center, Ricoh Co., Ltd. and at the Institute of Physical and Chemical Research (RIKEN). He was a visiting scholar at the Department of Statistics, UC Berkeley, and a Humboldt fellow at the Max Planck Institute for Biological Cybernetics. He serves as an associate editor of the journals Annals of the Institute of Statistical Mathematics, Neural Networks, and Foundations and Trends in Machine Learning. His research interests include machine learning and mathematical statistics. He has co-authored a book on singular statistical models, and has authored a book on kernel methods (to be published in 2010).