Software
This page is outdated. For more recent software on reliable and robust machine learning, please see here, which is maintained by the Imperfect Information Learning Team, Center for Advanced Intelligence Project (AIP), RIKEN. The software available below is free of charge for research and education purposes. However, you must obtain a license from the author(s) to use it for commercial purposes. The software must not be distributed without prior permission of the author(s).
The software is supplied "as is" without warranty of any kind, and the author(s) disclaim any and all warranties, including but not limited to any implied warranties of merchantability and fitness for a particular purpose, and any warranties of non-infringement. The user assumes all liability and responsibility for use of the software, and in no event shall the author(s) be liable for damages of any kind resulting from its use.
Fundamentals
 Density ratio estimation
 KLIEP (Kullback-Leibler importance estimation procedure): MATLAB
 GMKLIEP (Gaussian-mixture KLIEP): MATLAB (by Makoto Yamada)
 LSIF (least-squares importance fitting): R (by Takafumi Kanamori)
 uLSIF (unconstrained LSIF): MATLAB, R (by Takafumi Kanamori), C++ (by Issei Sato)
 RuLSIF (relative uLSIF): MATLAB (by Makoto Yamada), R (by Max Wornowizki), Python (by Song Liu)
 Density difference estimation
 LSDD (least-squares density difference): MATLAB, Python (by Marthinus Christoffel du Plessis)
 Density derivative estimation
 LSLDG (least-squares log-density gradient): MATLAB (by Hiroaki Sasaki)
 Mutual information estimation
 MLMI (maximum-likelihood mutual information): MATLAB (with Taiji Suzuki)
 LSMI (least-squares mutual information): MATLAB (with Taiji Suzuki)
 LSMI (multiplicative kernel model): MATLAB (by Tomoya Sakai)
 LSQMI (least-squares quadratic mutual information): MATLAB
 Hetero-distributional subspace search
 LHSS (least-squares hetero-distributional subspace search): MATLAB (with Makoto Yamada)
Applications
 Covariate shift adaptation
 IWLS+IWCV+uLSIF (importance-weighted least-squares + importance-weighted cross-validation + unconstrained least-squares importance fitting): MATLAB
 IWLR+KLIEP (importance-weighted logistic regression + Kullback-Leibler importance estimation procedure): MATLAB (by Makoto Yamada)
 IWLSPC+IWCV+KLIEP (importance-weighted least-squares probabilistic classifier + importance-weighted cross-validation + Kullback-Leibler importance estimation procedure): MATLAB (by Hirotaka Hachiya)
 Class prior change adaptation
 uLSIF-based method: MATLAB (by Marthinus Christoffel du Plessis)
 LSDD-based method: MATLAB (by Marthinus Christoffel du Plessis)
 Inlier-based outlier detection
 MLOD (maximum-likelihood outlier detection): MATLAB
 LSOD (least-squares outlier detection): MATLAB
 LSAD (least-squares anomaly detection): Python (by John Quinn)
 Feature selection
 MLFS (maximum-likelihood feature selection in supervised regression/classification): MATLAB (with Taiji Suzuki)
 LSFS (least-squares feature selection in supervised regression/classification): MATLAB (with Taiji Suzuki)
 L1-LSMI (L1-LSMI-based feature selection for supervised regression/classification): MATLAB (by Wittawat Jitkrittum)
 HSIC Lasso (Hilbert-Schmidt independence criterion + least absolute shrinkage and selection operator for high-dimensional feature selection in supervised regression/classification): MATLAB (by Makoto Yamada)
 Dimensionality reduction/feature extraction/metric learning
 NGCA (non-Gaussian component analysis, unsupervised linear dimensionality reduction): MATLAB (by Gilles Blanchard)
 LSDR (least-squares dimensionality reduction, supervised linear dimensionality reduction for regression/classification): MATLAB (with Taiji Suzuki)
 SCA (sufficient component analysis, supervised linear dimensionality reduction for regression/classification): MATLAB (by Makoto Yamada)
 LSQMID (least-squares quadratic mutual information derivative, supervised linear dimensionality reduction for regression/classification): MATLAB (by Voot Tangkaratt)
 LFDA (local Fisher discriminant analysis, supervised linear dimensionality reduction for classification): MATLAB
 SELF (semi-supervised LFDA, semi-supervised linear dimensionality reduction for classification): MATLAB
 LSCDA (least-squares canonical dependency analysis, linear dimensionality reduction for paired data): MATLAB (by Masayuki Karasuyama)
 SERAPH (semi-supervised metric learning paradigm with hyper-sparsity, semi-supervised metric learning for classification): MATLAB (by Gang Niu)
 Classification
 Conditional probability estimation
 LSCDE (least-squares conditional density estimation): MATLAB
 LSPC (least-squares probabilistic classifier): MATLAB, Python (by John Quinn)
 SMIR (squared-loss mutual information regularization, semi-supervised probabilistic classification): MATLAB (by Gang Niu and by Wittawat Jitkrittum)
 Independence test
 LSIT (least-squares independence test): MATLAB
 Two-sample test
 LSTT (least-squares two-sample test): MATLAB
 Change detection
 CDRuLSIF (distributional change detection by RuLSIF): MATLAB (by Song Liu)
 CDKLIEP (structural change detection by sparse KLIEP): MATLAB (by Song Liu)
 Clustering
 Independent component analysis
 LICA (least-squares independent component analysis): MATLAB (by Taiji Suzuki)
 Causal direction inference
 LSIR (least-squares independence regression): MATLAB (by Makoto Yamada)
 Cross-domain object matching
 LSOM (least-squares object matching): MATLAB (by Makoto Yamada)
 Hidden Markov Model
 DRHMM (density-ratio hidden Markov model): MATLAB and Python (by John Quinn)
 Sparse learning
 DAL (l1/grouped-l1/trace-norm regularization solver): MATLAB (by Ryota Tomioka)
 Matrix/tensor factorization
 VBMF (variational Bayesian matrix factorization): MATLAB
 Multitask learning with tensor factorization: MATLAB (by Kishan Wimalawarne)
 Reinforcement learning
 IWPGPEOB (model-free policy gradient method with sample reuse): MATLAB
 Crowdsourcing
 BBTA (bandit-based task assignment): Python (by Hao Zhang)
Kullback-Leibler Importance Estimation Procedure (KLIEP)
 Kullback-Leibler Importance Estimation Procedure (KLIEP) is an algorithm to directly estimate the ratio of two density functions without going through density estimation. The optimization problem involved in KLIEP is convex, so the unique global optimal solution can be obtained. Furthermore, the KLIEP solution tends to be sparse, which helps reduce the computation time.
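To illustrate the idea, here is a minimal Python sketch of a KLIEP-type estimator (fixed Gaussian width, a subset of test points as kernel centers, plain projected gradient ascent); the distributed MATLAB package additionally selects the kernel width by likelihood cross-validation, and the function name `kliep` here is illustrative:

```python
import numpy as np

def kliep(x_tr, x_te, sigma=0.7, n_iter=300, lr=0.05):
    """Bare-bones KLIEP sketch: estimate w(x) = p_te(x) / p_tr(x)
    with a Gaussian kernel model centered on test samples."""
    centers = x_te[:100]                       # kernel centers (on test points)
    def K(x):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * sigma ** 2))
    Phi_te, Phi_tr = K(x_te), K(x_tr)
    b = Phi_tr.mean(axis=0)                    # constraint vector: alpha @ b = 1
    alpha = np.ones(len(centers))
    alpha /= alpha @ b
    for _ in range(n_iter):
        # gradient of the test-sample log-likelihood w.r.t. alpha
        grad = (Phi_te / (Phi_te @ alpha)[:, None]).mean(axis=0)
        alpha += lr * grad
        alpha = np.maximum(alpha, 0)           # ratio must be non-negative
        alpha /= alpha @ b                     # mean of w over training samples = 1
    return lambda x: K(x) @ alpha

rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, size=(500, 1))     # denominator samples: N(0, 1)
x_te = rng.normal(0.5, 1.0, size=(500, 1))     # numerator samples:   N(0.5, 1)
w = kliep(x_tr, x_te)
```

The projection step keeps the estimated ratio averaging to one over the training samples, which is what makes it a proper importance weight; the true ratio here is increasing in x, so the estimate should be larger at x = 1 than at x = -1.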

MATLAB implementation of KLIEP:
KLIEP.zip
 "KLIEP.m" is the main function.
 "demo_KLIEP.m" is a demo script.

Examples:

References:

Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von Bünau, P. & Kawanabe, M.
Direct importance estimation for covariate shift adaptation.
Annals of the Institute of Statistical Mathematics, vol.60, no.4, pp.699-746, 2008.
[ paper ] 
Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P. & Kawanabe, M.
Direct importance estimation with model selection and its application to covariate shift adaptation.
In J. C. Platt, D. Koller, Y. Singer, and S. Roweis (Eds.), Advances in Neural Information Processing Systems 20, pp.1433-1440, Cambridge, MA, MIT Press, 2008.
(Presented at Neural Information Processing Systems (NIPS2007), Vancouver, B.C., Canada, Dec. 3-8, 2007.)
[ paper, poster ]

Unconstrained Least-Squares Importance Fitting (uLSIF)
 Unconstrained Least-Squares Importance Fitting (uLSIF) is an algorithm to directly estimate the ratio of two density functions without going through density estimation. The uLSIF solution, as well as its leave-one-out cross-validation score, can be computed analytically, so uLSIF is computationally very efficient and stable. Furthermore, the uLSIF solution tends to be sparse, which helps reduce the computation time.
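The analytic solution can be sketched in a few lines of Python (fixed Gaussian width and regularization parameter; the distributed package chooses both by the analytic leave-one-out score, and the function name `ulsif` here is illustrative):

```python
import numpy as np

def ulsif(x_nu, x_de, sigma=0.7, lam=0.1):
    """Minimal uLSIF sketch: analytic least-squares fit of the
    density ratio w(x) = p_nu(x) / p_de(x) with Gaussian kernels."""
    centers = x_nu[:100]                       # kernel centers on numerator samples
    def K(x):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * sigma ** 2))
    Phi_de, Phi_nu = K(x_de), K(x_nu)
    H = Phi_de.T @ Phi_de / len(x_de)          # second moment under the denominator
    h = Phi_nu.mean(axis=0)                    # first moment under the numerator
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    alpha = np.maximum(alpha, 0)               # round negative coefficients up to zero
    return lambda x: K(x) @ alpha

rng = np.random.default_rng(1)
x_de = rng.normal(0.0, 1.0, size=(1000, 1))    # denominator samples: N(0, 1)
x_nu = rng.normal(0.5, 1.0, size=(1000, 1))    # numerator samples:   N(0.5, 1)
w = ulsif(x_nu, x_de)
```

Because the objective is an unconstrained regularized least-squares problem, a single linear solve gives the coefficients; the clipping at zero afterwards is what tends to make the solution sparse.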

MATLAB implementation of uLSIF:
uLSIF.zip
 "uLSIFP.m" is the main function.
 "demo_uLSIF.m" is a demo script.

Examples:

References:

Kanamori, T., Hido, S., & Sugiyama, M.
A least-squares approach to direct importance estimation.
Journal of Machine Learning Research, vol.10 (Jul.), pp.1391-1445, 2009.
[ paper ] 
Kanamori, T., Hido, S., & Sugiyama, M.
Efficient direct density ratio estimation for non-stationarity adaptation and outlier detection.
In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou (Eds.), Advances in Neural Information Processing Systems 21, pp.809-816, Cambridge, MA, MIT Press, 2009.
(Presented at Neural Information Processing Systems (NIPS2008), Vancouver, B.C., Canada, Dec. 8-13, 2008.)
[ paper, poster ]

Least-Squares Density-Difference (LSDD)
 Least-Squares Density-Difference (LSDD) is an estimator of the difference between two probability densities, which can be used, e.g., for approximating the L2-distance between them.
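A minimal 1-D Python sketch of the analytic LSDD solution follows (fixed kernel width and regularization parameter; the distributed packages choose both by cross-validation). The Gram matrix H below uses the closed-form Gaussian integral of products of kernels; the function name `lsdd` is illustrative:

```python
import numpy as np

def lsdd(x_p, x_q, sigma=0.8, lam=0.05):
    """Minimal 1-D LSDD sketch: fit f(x) ~ p(x) - q(x) with a
    Gaussian kernel model; returns the fitted difference and an
    estimate of the squared L2-distance between p and q."""
    c = np.concatenate([x_p[:100], x_q[:100]])            # kernel centers
    K = lambda x: np.exp(-(x[:, None] - c[None, :]) ** 2 / (2 * sigma ** 2))
    # H_ll' = int k(x, c_l) k(x, c_l') dx  (closed form for Gaussian kernels)
    H = (np.pi * sigma ** 2) ** 0.5 * np.exp(-(c[:, None] - c[None, :]) ** 2
                                             / (4 * sigma ** 2))
    h = K(x_p).mean(axis=0) - K(x_q).mean(axis=0)         # empirical mean difference
    theta = np.linalg.solve(H + lam * np.eye(len(c)), h)
    f = lambda x: K(x) @ theta                            # estimated density difference
    l2 = 2 * h @ theta - theta @ H @ theta                # L2-distance estimate
    return f, l2

rng = np.random.default_rng(0)
x_p = rng.normal(1.0, 1.0, size=1000)                     # samples from p = N(1, 1)
x_q = rng.normal(0.0, 1.0, size=1000)                     # samples from q = N(0, 1)
f, l2 = lsdd(x_p, x_q)
```

Since p puts more mass to the right of q, the fitted difference should be positive near x = 1 and negative near x = -0.5, and the L2-distance estimate should be positive.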

MATLAB implementation of LSDD:
LSDD.zip
 "LSDD.m" is the main function.
 "demo_LSDD.m" is a demo script.
 "lsdd.py" is the main function.
 "demo_lsdd.py" is a demo script.

References:

Sugiyama, M., Suzuki, T., Kanamori, T., du Plessis, M. C., Liu, S., & Takeuchi, I.
Densitydifference estimation.
Neural Computation, vol.25, no.10, pp.2734-2775, 2013.
[ paper ] 
Sugiyama, M., Suzuki, T., Kanamori, T., du Plessis, M. C., Liu, S., & Takeuchi, I.
Densitydifference estimation.
In P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, pp.692-700, 2012.
(Presented at Neural Information Processing Systems (NIPS2012), Lake Tahoe, Nevada, USA, Dec. 3-6, 2012)
[ paper, poster ]
Least-Squares Log-Density Gradient (LSLDG)
 Least-Squares Log-Density Gradient (LSLDG) is an algorithm that directly estimates the gradient of a log-density without going through density estimation. The solution can be computed analytically.

An application of LSLDG is clustering based on mode seeking. The clustering method has the following advantages:
 We do not need to set the number of clusters in advance.
 All the tuning parameters (e.g., the kernel bandwidth) can be optimized by cross-validation.
 It works significantly better than mean-shift clustering on high-dimensional data.
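The estimator itself can be sketched in 1-D as follows (clustering omitted; fixed bandwidth and regularization, whereas the package tunes both by cross-validation). Integration by parts turns the squared-error criterion into a quadratic in the coefficients, giving the analytic solution; the function name `lsldg_1d` is illustrative:

```python
import numpy as np

def lsldg_1d(x, sigma=0.6, lam=0.1):
    """Minimal 1-D LSLDG sketch: directly fit g(x) ~ d/dx log p(x).
    The fitting criterion E[(g - dlogp)^2 p] equals E[g^2] + 2 E[g']
    up to a constant (integration by parts), so the linear-in-parameter
    model g = theta^T psi has the closed-form minimizer below."""
    c = x[:100]                                            # Gaussian basis centers
    psi = lambda t: np.exp(-(t[:, None] - c[None, :]) ** 2 / (2 * sigma ** 2))
    dpsi = lambda t: -(t[:, None] - c[None, :]) / sigma ** 2 * psi(t)
    G = psi(x).T @ psi(x) / len(x)                         # E[psi psi^T]
    g = dpsi(x).mean(axis=0)                               # E[psi']
    theta = -np.linalg.solve(G + lam * np.eye(len(c)), g)
    return lambda t: psi(t) @ theta                        # estimated log-density gradient

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)                        # p = N(0, 1)
grad_est = lsldg_1d(x)
```

For N(0, 1) the true log-density gradient is -x, so the estimate should be negative at x = 1.5 and positive at x = -1.5; mode-seeking clustering repeatedly moves points along this estimated gradient.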

MATLAB implementation of LSLDG:
LSLDG.zip
 "demo_LSLDG.m" is a demo script for logdensity gradient estimation.
 "demo_LSLDGClust.m" is a demo script for clustering.

Examples:
 Log-density gradient estimation.
 Clustering by seeking modes.

References:

Sasaki, H., Hyvärinen, A., & Sugiyama, M.
Clustering via mode seeking by direct estimation of the gradient of a logdensity.
In T. Calders, F. Esposito, E. Hüllermeier, and R. Meo (Eds.), Machine Learning and Knowledge Discovery in Databases, Part III, Lecture Notes in Computer Science, vol.8725, pp.19-34, Berlin, Springer, 2014.
(Presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2014), Nancy, France, Sep. 15-19, 2014)
[ paper ]

Maximum Likelihood Mutual Information (MLMI)
 Maximum Likelihood Mutual Information (MLMI) is an estimator of mutual information based on the density-ratio estimation method KLIEP. A mutual information estimator can be used as a measure of statistical independence between random variables (smaller values indicate more independence).

MATLAB implementation of MLMI:
MLMI.zip
 "MLMI.m" is the main function.
 "demo_MLMI.m" is a demo script.

Examples:

References:

Suzuki, T., Sugiyama, M., & Tanaka, T.
Mutual information approximation via maximum likelihood estimation of density ratio.
In Proceedings of 2009 IEEE International Symposium on Information Theory (ISIT2009), pp.463-467, Seoul, Korea, Jun. 28-Jul. 3, 2009.
[ paper ] 
Suzuki, T., Sugiyama, M., Sese, J. & Kanamori, T.
Approximating mutual information by maximum likelihood density ratio estimation.
In Y. Saeys, H. Liu, I. Inza, L. Wehenkel, and Y. Van de Peer (Eds.), Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery 2008 (FSDM2008), JMLR Workshop and Conference Proceedings, vol.4, pp.5-20, 2008.
[ paper ]

Least-Squares Mutual Information (LSMI)
 Least-Squares Mutual Information (LSMI) is an estimator of a squared-loss variant of mutual information based on the density-ratio estimation method uLSIF. A mutual information estimator can be used as a measure of statistical independence between random variables (smaller values indicate more independence).
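As a rough Python sketch, squared-loss mutual information (SMI) can be estimated analytically by a uLSIF-style fit of w(x, y) = p(x, y) / (p(x)p(y)) with product Gaussian kernels (fixed width and regularization here; the package tunes both by cross-validation, and `lsmi` is an illustrative name):

```python
import numpy as np

def lsmi(x, y, sigma=1.0, lam=0.01):
    """Minimal LSMI sketch: SMI = (E_{p(x,y)}[w] - 1) / 2, with the
    ratio w fitted by regularized least squares under p(x)p(y)."""
    n = len(x)
    ux, vy = x[:100], y[:100]                           # paired kernel centers
    Kx = np.exp(-(x[:, None] - ux[None, :]) ** 2 / (2 * sigma ** 2))
    Ky = np.exp(-(y[:, None] - vy[None, :]) ** 2 / (2 * sigma ** 2))
    # expectation over p(x)p(y) factorizes over all n^2 sample pairs
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n ** 2
    h = (Kx * Ky).mean(axis=0)                          # expectation over the joint
    theta = np.linalg.solve(H + lam * np.eye(len(ux)), h)
    return (h @ theta - 1) / 2

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_dep = x + 0.3 * rng.normal(size=500)                  # strongly dependent on x
y_ind = rng.normal(size=500)                            # independent of x
smi_dep = lsmi(x, y_dep)
smi_ind = lsmi(x, y_ind)
```

The dependent pair should score clearly higher than the independent pair, whose estimate stays near zero.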

MATLAB implementation of LSMI for plain kernel models:
LSMI.zip
 "LSMIregression.m" and "LSMIclassification.m" are the main functions.
 "demo_LSMI.m" is a demo script.

MATLAB implementation of LSMI for multiplicative kernel models:
mLSMI.zip
 "mLSMI.m" is the main function.
 "demo_mLSMI.m" is a demo script.

Examples:

References:

Suzuki, T., Sugiyama, M., Kanamori, T., & Sese, J.
Mutual information estimation reveals global associations between stimuli and biological processes.
BMC Bioinformatics, vol.10, no.1, pp.S52, 2009.
[ paper ] 
Sakai, T. & Sugiyama, M.
Computationally efficient estimation of squaredloss mutual information with multiplicative kernel models.
IEICE Transactions on Information and Systems, vol.E97-D, no.4, pp.968-971, 2014.
[ paper ]

Least-Squares Quadratic Mutual Information (LSQMI)
 Least-Squares Quadratic Mutual Information (LSQMI) is an estimator of an L2-loss variant of mutual information called quadratic mutual information (QMI), based on the density-difference estimation method LSDD. A QMI estimator can be used as a measure of statistical independence between random variables (smaller values indicate more independence).

MATLAB implementation of LSQMI:
LSQMI.zip
 "LSQMIregression.m" and "LSQMIclassification.m" are the main functions.
 "demo_LSQMI.m" is a demo script.

Examples:
 References:
Least-Squares Hetero-Distributional Subspace Search (LHSS)
 Least-Squares Hetero-Distributional Subspace Search (LHSS) is an algorithm to find a subspace in which two probability distributions are similar (called the hetero-distributional subspace). LHSS can be used for improving the accuracy of direct density-ratio estimation in high dimensions: first identify the hetero-distributional subspace by LHSS, and then perform density-ratio estimation only within that subspace. This procedure is called direct density-ratio estimation with dimensionality reduction (D3).

MATLAB implementation of LHSS:
LHSS.zip
 "demo_LHSS.m" is a demo script.
 "LHSS_train.m" is the function to find the heterodistributional subspace.
 "LHSS_test.m" is the function to estimate the density ratio based on LHSS.

Examples:

References:

Sugiyama, M., Yamada, M., von Bünau, P., Suzuki, T., Kanamori, T., & Kawanabe, M.
Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search.
Neural Networks, vol.24, no.2, pp.183-198, 2011.
[ paper ]

Importance-Weighted Least-Squares (IWLS)
 Importance-Weighted Least-Squares (IWLS) is an importance-weighted version of regularized kernel least-squares for covariate shift adaptation, where the training and test input distributions differ but the conditional distribution of outputs given inputs is unchanged between the training and test phases. uLSIF is used for importance estimation, and Importance-Weighted Cross-Validation (IWCV) is used for model selection.
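A minimal Python sketch of the weighted fit follows. For simplicity it uses the *true* importance weights of two known Gaussians and a linear basis (in practice uLSIF estimates the weights and IWCV selects the hyperparameters); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: training inputs ~ N(0, 1), test inputs ~ N(1.5, 0.5^2);
# the conditional y|x (here y = x^2 + noise) is the same in both phases.
x_tr = rng.normal(0.0, 1.0, size=1000)
y_tr = x_tr ** 2 + 0.1 * rng.normal(size=1000)

def gauss(x, mu, s):
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

w = gauss(x_tr, 1.5, 0.5) / gauss(x_tr, 0.0, 1.0)   # true importance p_te / p_tr

def wls(x, y, w, lam=1e-3):
    """(Importance-)weighted regularized least squares, linear model."""
    Phi = np.stack([x, np.ones_like(x)], axis=1)
    A = Phi.T @ (w[:, None] * Phi) + lam * np.eye(2)
    return np.linalg.solve(A, Phi.T @ (w * y))

theta_plain = wls(x_tr, y_tr, np.ones_like(x_tr))   # ordinary least squares
theta_iw = wls(x_tr, y_tr, w)                       # importance-weighted
```

The linear model is misspecified for y = x^2: the plain fit concentrates on the training mass around x = 0 (slope near zero), while the importance-weighted fit tracks the test region around x = 1.5, yielding a clearly larger slope.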

MATLAB implementation of IWLS:
IWLS.zip
 "demo_IWLS.m" is a demo script.

Examples:

References:

Kanamori, T., Hido, S., & Sugiyama, M.
A least-squares approach to direct importance estimation.
Journal of Machine Learning Research, vol.10 (Jul.), pp.1391-1445, 2009.
[ paper ] 
Sugiyama, M., Krauledat, M., & Müller, K.-R.
Covariate shift adaptation by importance weighted cross validation.
Journal of Machine Learning Research, vol.8 (May), pp.985-1005, 2007.
[ paper ]

Importance-Weighted Least-Squares Probabilistic Classifier (IWLSPC)
 The Importance-Weighted Least-Squares Probabilistic Classifier (IWLSPC) is an importance-weighted version of LSPC for covariate shift adaptation, where the training and test input distributions differ but the conditional distribution of outputs given inputs is unchanged between the training and test phases. uLSIF is used for importance estimation, and Importance-Weighted Cross-Validation (IWCV) is used for model selection.

MATLAB implementation of IWLSPC:
IWLSPC.zip
 "demo_IWLSPC.m" is a demo script.

Examples:
Training and test samples
Training and test labels predicted by plain LSPC
Training and test labels predicted by IWLSPC

References:

Hachiya, H., Sugiyama, M., & Ueda, N.
Importance-weighted least-squares probabilistic classifier for covariate shift adaptation with application to human activity recognition.
Neurocomputing, vol.80, pp.93-101, 2012.
[ paper ] 
Kanamori, T., Hido, S., & Sugiyama, M.
A least-squares approach to direct importance estimation.
Journal of Machine Learning Research, vol.10 (Jul.), pp.1391-1445, 2009.
[ paper ] 
Sugiyama, M., Krauledat, M., & Müller, K.-R.
Covariate shift adaptation by importance weighted cross validation.
Journal of Machine Learning Research, vol.8 (May), pp.985-1005, 2007.
[ paper ]

Maximum Likelihood Outlier Detection (MLOD)
 Maximum Likelihood Outlier Detection (MLOD) is an inlier-based outlier detection algorithm. The problem of inlier-based outlier detection is to find outliers in a set of samples (called the evaluation set) using another set of samples consisting only of inliers (called the model set). MLOD orders the samples in the evaluation set according to their degree of outlyingness, measured by the ratio of the probability densities of the evaluation and model samples. The ratio is estimated by the density-ratio estimation method KLIEP.

MATLAB implementation of MLOD:
MLOD.zip
 "MLOD.m" is the main function.
 "demo_MLOD.m" is a demo script.

Examples:

References:

Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T.
Statistical outlier detection using direct density ratio estimation.
Knowledge and Information Systems, vol.26, no.2, pp.309-336, 2011.
[ paper ] 
Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H.,
von Bünau, P. & Kawanabe, M.
Direct importance estimation for covariate shift adaptation.
Annals of the Institute of Statistical Mathematics, vol.60, no.4, pp.699-746, 2008.
[ paper ]

LeastSquares Outlier Detection (LSOD)
 Least-Squares Outlier Detection (LSOD) is an inlier-based outlier detection algorithm. The problem of inlier-based outlier detection is to find outliers in a set of samples (called the evaluation set) using another set of samples consisting only of inliers (called the model set). LSOD orders the samples in the evaluation set according to their degree of outlyingness, measured by the ratio of the probability densities of the evaluation and model samples. The ratio is estimated by the density-ratio estimation method uLSIF.
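The scheme can be sketched in Python with an analytic uLSIF fit of w(x) = p_model(x) / p_eval(x): evaluation samples with small estimated w are unlikely under the inlier model and are flagged as outliers (fixed width and regularization; the function name `lsod_scores` is illustrative):

```python
import numpy as np

def lsod_scores(x_model, x_eval, sigma=1.0, lam=0.1):
    """Minimal LSOD-style sketch: score each evaluation sample by an
    analytic uLSIF estimate of w(x) = p_model(x) / p_eval(x);
    small scores indicate outliers."""
    c = x_model[:100]                               # centers on model (inlier) samples
    K = lambda t: np.exp(-(t[:, None] - c[None, :]) ** 2 / (2 * sigma ** 2))
    H = K(x_eval).T @ K(x_eval) / len(x_eval)       # denominator = evaluation set
    h = K(x_model).mean(axis=0)                     # numerator = model set
    alpha = np.maximum(np.linalg.solve(H + lam * np.eye(len(c)), h), 0)
    return K(x_eval) @ alpha

rng = np.random.default_rng(0)
x_model = rng.normal(0.0, 1.0, size=300)            # inliers only
x_eval = np.append(rng.normal(0.0, 1.0, size=99), 6.0)   # planted outlier at x = 6
scores = lsod_scores(x_model, x_eval)
```

The planted point at x = 6 should receive the smallest score of the evaluation set.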

MATLAB implementation of LSOD:
LSOD.zip
 "LSOD.m" is the main function.
 "demo_LSOD.m" is a demo script.

Examples:

References:

Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T.
Statistical outlier detection using direct density ratio estimation.
Knowledge and Information Systems, vol.26, no.2, pp.309-336, 2011.
[ paper ] 
Kanamori, T., Hido, S., & Sugiyama, M.
A least-squares approach to direct importance estimation.
Journal of Machine Learning Research, vol.10 (Jul.), pp.1391-1445, 2009.
[ paper ]

Maximum Likelihood Feature Selection (MLFS)
 Maximum Likelihood Feature Selection (MLFS) is a feature selection method for supervised regression and classification. MLFS orders input features according to their dependence on output values. The dependency between inputs and outputs is evaluated based on an estimator of mutual information called MLMI.

MATLAB implementation of MLFS:
MLFS.zip
 "MLFSP.m" is the main function.
 "demo_MLFS.m" is a demo script.

Examples:

References:

Suzuki, T., Sugiyama, M., Sese, J. & Kanamori, T.
Approximating mutual information by maximum likelihood density ratio estimation.
In Y. Saeys, H. Liu, I. Inza, L. Wehenkel, and Y. Van de Peer (Eds.), Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery 2008 (FSDM2008), JMLR Workshop and Conference Proceedings, vol.4, pp.5-20, 2008.
[ paper ] 
Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von Bünau, P. & Kawanabe, M.
Direct importance estimation for covariate shift adaptation.
Annals of the Institute of Statistical Mathematics, vol.60, no.4, pp.699-746, 2008.
[ paper ]

Least-Squares Feature Selection (LSFS)
 Least-Squares Feature Selection (LSFS) is a feature selection method for supervised regression and classification. LSFS orders input features according to their dependence on output values. The dependency between inputs and outputs is evaluated based on an estimator of squared-loss mutual information called LSMI.

MATLAB implementation of LSFS:
LSFS.zip
 "LSFSP.m" is the main function.
 "demo_LSFS.m" is a demo script.

Examples:

References:

Suzuki, T., Sugiyama, M., Kanamori, T., & Sese, J.
Mutual information estimation reveals global associations between stimuli and biological processes.
BMC Bioinformatics, vol.10, no.1, pp.S52, 2009.
[ paper ] 
Kanamori, T., Hido, S., & Sugiyama, M.
A least-squares approach to direct importance estimation.
Journal of Machine Learning Research, vol.10 (Jul.), pp.1391-1445, 2009.
[ paper ]

Least-Squares Dimensionality Reduction (LSDR)
 Least-Squares Dimensionality Reduction (LSDR) is a supervised dimensionality reduction method. LSDR adopts a squared-loss variant of mutual information as an independence measure and estimates it using the density-ratio estimation method uLSIF. Thanks to this formulation, all tuning parameters such as the Gaussian width and the regularization parameter can be chosen automatically by cross-validation. LSDR then maximizes this independence measure (making the discarded features conditionally independent of the outputs) by a natural gradient algorithm.

MATLAB implementation of LSDR:
LSDR.zip
 "demo_LSDR.m" is a demo script.

Examples:

Reference:

Suzuki, T. & Sugiyama, M.
Sufficient dimension reduction via squaredloss mutual information estimation.
Neural Computation, vol.25, no.3, pp.725-758, 2013.
[ paper ]

Least-Squares Quadratic Mutual Information Derivative (LSQMID)
 The Least-Squares Quadratic Mutual Information Derivative (LSQMID) is a supervised dimensionality reduction method. LSQMID aims to find a linear projection of the input such that the quadratic mutual information (QMI) between the projected input and the output is maximized. LSQMID directly estimates the derivative of QMI without estimating QMI itself; a QMI maximizer is then obtained by fixed-point iteration. An important property of LSQMID is its robustness against outliers. Moreover, all tuning parameters such as the Gaussian width and the regularization parameter can be chosen automatically by cross-validation.

MATLAB implementation of LSQMID:
LSQMID.zip
 "demo_LSQMID_SDR.m" is a demo script.

Examples:

Reference:

Tangkaratt, V., Sasaki, H., & Sugiyama, M.
Direct estimation of the derivative of quadratic mutual information with application in supervised dimension reduction.
arXiv:1508.01019, 2015.

Local Fisher Discriminant Analysis (LFDA)
 Local Fisher Discriminant Analysis (LFDA) is a linear supervised dimensionality reduction method that is particularly useful when some classes consist of several separate clusters. LFDA has an analytic form of the embedding matrix, and the solution can be computed simply by solving a generalized eigenvalue problem. Therefore, LFDA is scalable to large datasets and computationally reliable. A kernelized variant of LFDA, called Kernel LFDA (KLFDA), is also available.
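A compact Python sketch of the computation follows (local-scaling affinities within each class, locally weighted within/between-class scatter matrices, then the generalized eigenproblem; `lfda` and the fixed k-nearest-neighbor setting are illustrative simplifications of the packaged code):

```python
import numpy as np

def lfda(X, y, dim=1, knn=7):
    """Compact LFDA sketch: returns the top `dim` generalized
    eigenvectors of the local between/within-class scatter pair."""
    n, d = X.shape
    Ww = np.zeros((n, n))
    Wb = np.full((n, n), 1.0 / n)              # pairs from different classes
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        nc = len(idx)
        D = np.linalg.norm(X[idx, None, :] - X[None, idx, :], axis=2)
        s = np.sort(D, axis=1)[:, min(knn, nc - 1)]        # local scaling sigma_i
        A = np.exp(-D ** 2 / (s[:, None] * s[None, :] + 1e-12))
        Ww[np.ix_(idx, idx)] = A / nc
        Wb[np.ix_(idx, idx)] = A * (1.0 / n - 1.0 / nc)
    Sw = X.T @ (np.diag(Ww.sum(axis=1)) - Ww) @ X          # local within-class scatter
    Sb = X.T @ (np.diag(Wb.sum(axis=1)) - Wb) @ X          # local between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:dim]].real

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))   # class 0 around (0, 0)
X1 = rng.normal([4.0, 0.0], 1.0, size=(100, 2))   # class 1 around (4, 0)
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
T = lfda(X, y, dim=1)
```

Here the classes separate along the first coordinate, so the top embedding direction should be dominated by that axis.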

MATLAB implementation of LFDA:
LFDA.zip
 "LFDA.m" is the main function.
 "demo_LFDA.m" is a demo script.

Examples:

MATLAB implementation of KLFDA:
KLFDA.zip
 "KLFDA.m" is the main function.
 "demo_KLFDA.m" is a demo script.

Examples:

References:

Sugiyama, M.
Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis.
Journal of Machine Learning Research, vol.8 (May), pp.1027-1061, 2007.
[ paper ] 
Sugiyama, M.
Local Fisher discriminant analysis for supervised dimensionality reduction.
In W. W. Cohen and A. Moore (Eds.), Proceedings of 23rd International Conference on Machine Learning (ICML2006), pp.905-912, Pittsburgh, Pennsylvania, USA, Jun. 25-29, 2006.
[ paper, slides ]

Semi-supervised Local Fisher Discriminant Analysis (SELF)
 Semi-supervised Local Fisher Discriminant Analysis (SELF) is a linear semi-supervised dimensionality reduction method. SELF smoothly bridges supervised Local Fisher Discriminant Analysis (LFDA) and unsupervised Principal Component Analysis (PCA), by which a natural regularization effect can be obtained when only a small number of labeled samples are available. SELF has an analytic form of the embedding matrix, and the solution can be computed simply by solving a generalized eigenvalue problem. Therefore, SELF is scalable to large datasets and computationally reliable. Applying the standard kernel trick yields a nonlinear extension of SELF called Kernel SELF (KSELF).
 When SELF is operated in the fully supervised mode, it reduces to LFDA. However, its solution is generally slightly different from the one obtained by LFDA, since nearest-neighbor search (used for computing local data scaling in the affinity matrix) is carried out differently: LFDA searches for nearest neighbors within each class, while SELF performs the search over all samples (including unlabeled ones). This is because SELF presumes that only a small number of labeled samples are available, and searching for nearest neighbors within each class is not effective for capturing local data scaling in small-sample cases. When SELF is operated in the fully unsupervised mode, it reduces to PCA.

MATLAB implementation of SELF:
SELF.zip
 "SELF.m" is the main function.
 "demo_SELF.m" is a demo script.

Examples:

Reference:

Sugiyama, M., Idé, T., Nakajima, S., & Sese, J.
Semi-supervised local Fisher discriminant analysis for dimensionality reduction.
Machine Learning, vol.78, no.1-2, pp.35-61, 2010.
[ paper ]

PU Classification
 Classification from positive and unlabeled samples, called PU classification, aims to learn the decision boundary between the positive and negative classes using only positive and unlabeled samples.
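As a minimal illustration, here is an unbiased-risk PU classifier with the squared loss and a known class prior pi (a simplified sketch in the spirit of the convex formulations below, not the packaged implementation; all names are illustrative). With a linear-in-parameter model the PU risk is quadratic in the parameters, so a single linear solve suffices:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = 0.5                                   # class prior of the positive class (assumed known)
x_p = rng.normal(+2.0, 1.0, size=500)      # positive samples
x_u = np.concatenate([rng.normal(+2.0, 1.0, 500),
                      rng.normal(-2.0, 1.0, 500)])   # unlabeled mixture

# Model f(x) = theta^T phi(x), squared loss l(z, t) = (z - t)^2.
# The PU risk  pi*E_p[l(f,+1) - l(f,-1)] + E_u[l(f,-1)]  is quadratic in theta,
# giving the analytic minimizer  theta = H_u^{-1} (2*pi*h_p - h_u).
phi = lambda x: np.stack([x, np.ones_like(x)], axis=1)
H_u = phi(x_u).T @ phi(x_u) / len(x_u)     # second moment over unlabeled data
h_p = phi(x_p).mean(axis=0)                # first moment over positives
h_u = phi(x_u).mean(axis=0)                # first moment over unlabeled data
theta = np.linalg.solve(H_u + 1e-6 * np.eye(2), 2 * pi * h_p - h_u)
f = lambda x: phi(x) @ theta               # classify by sign(f)
```

Note that negative samples never enter the computation; the identity pi*E_p[l(f,+1) - l(f,-1)] + E_u[l(f,-1)] = pi*E_p[l(f,+1)] + (1-pi)*E_n[l(f,-1)] is what makes the risk estimable from positive and unlabeled data alone.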

MATLAB implementation of PU classification with the squared loss:
PU.zip
 "PU_SL.m" is a function for training a classifier.
 "demo.m" is a demo function.

Example:

References:

du Plessis, M. C., Niu, G., & Sugiyama, M.
Convex formulation for learning from positive and unlabeled data.
In F. Bach and D. Blei (Eds.), Proceedings of 32nd International Conference on Machine Learning (ICML2015), JMLR Workshop and Conference Proceedings, vol.37, pp.1386-1394, Lille, France, Jul. 6-11, 2015.
[ paper ] 
du Plessis, M. C., Niu, G., & Sugiyama, M.
Analysis of learning from positive and unlabeled data.
In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27, pp.703-711, 2014.
(Presented at Neural Information Processing Systems (NIPS2014), Montreal, Quebec, Canada, Dec. 8-11, 2014)
[ paper ]

PNU Classification
 PNU classification is a semi-supervised classification method that combines PN classification (ordinary supervised classification from positive and negative samples) with PU classification (classification from positive and unlabeled samples) or NU classification (classification from negative and unlabeled samples). Unlike existing semi-supervised classification methods, PNU classification does not require distributional assumptions such as the cluster assumption or the manifold assumption.

MATLAB implementation of PNU classification with the squared loss:
PNU.zip
 "PNU_SL.m" is a function for training a classifier.
 "demo.m" is a demo function.

Examples:

Reference:

Sakai, T., du Plessis, M. C., Niu, G., & Sugiyama, M.
Semisupervised classification based on classification from positive and unlabeled data.
arXiv:1605.06955 [cs.LG]
[ paper ]

Least-Squares Conditional Density Estimation (LSCDE)
 Least-Squares Conditional Density Estimation (LSCDE) is an algorithm to estimate the conditional density function of multi-dimensional continuous variables. The solution of LSCDE can be computed analytically, and all the tuning parameters such as the kernel width and the regularization parameter can be chosen automatically by cross-validation.
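The analytic solution can be sketched in 1-D Python as follows (fixed kernel width and regularization, whereas the package tunes both by cross-validation; `lscde` is an illustrative name). The integral over y in the least-squares criterion has a closed Gaussian form:

```python
import numpy as np

def lscde(x, y, sigma=0.5, lam=0.05):
    """Minimal 1-D LSCDE sketch: fit r(x, y) ~ p(y|x) with product
    Gaussian kernels centered on paired samples, then normalize over y
    in closed form."""
    u, v = x[:100], y[:100]                                 # paired kernel centers
    kx = lambda t: np.exp(-(t[:, None] - u[None, :]) ** 2 / (2 * sigma ** 2))
    ky = lambda t: np.exp(-(t[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))
    # H_ll' = E_x[kx_l kx_l'] * int ky_l(y) ky_l'(y) dy   (closed form)
    Hy = (np.pi * sigma ** 2) ** 0.5 * np.exp(-(v[:, None] - v[None, :]) ** 2
                                              / (4 * sigma ** 2))
    H = (kx(x).T @ kx(x) / len(x)) * Hy
    h = (kx(x) * ky(y)).mean(axis=0)                        # expectation over the joint
    theta = np.maximum(np.linalg.solve(H + lam * np.eye(len(u)), h), 0)
    def p_hat(y_q, x_q):
        kxq = kx(np.array([x_q]))[0]
        num = (kxq * ky(np.array([y_q]))[0]) @ theta
        den = (kxq @ theta) * np.sqrt(2 * np.pi) * sigma    # int over y, closed form
        return num / den
    return p_hat

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = x + 0.2 * rng.normal(size=1000)                         # p(y|x) peaks around y = x
p_hat = lscde(x, y)
```

Since the data concentrate around y = x, the estimated conditional density at x = 1 should be much larger near y = 1 than near y = -1.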

MATLAB implementation of LSCDE:
LSCDE.zip
 "LSCDE.m" is the main function.
 "demo_LSCDE.m" is a demo script.

Examples:
 References:
Least-Squares Probabilistic Classifier (LSPC)
 Least-Squares Probabilistic Classifier (LSPC) is a multi-class probabilistic classification algorithm. Its solution can be computed analytically in a class-wise manner, so it is computationally very efficient.

MATLAB implementation of LSPC:
LSPC.zip
 "demo_LSPC.m" is a demo script.

Examples:

References:

Sugiyama, M.
Superfast-trainable multi-class probabilistic classifier by least-squares posterior fitting.
IEICE Transactions on Information and Systems, vol.E93-D, no.10, pp.2690-2701, 2010.
[ paper (revised version) ]

Yamada, M., Sugiyama, M., Wichern, G., & Simm, J.
Improving the accuracy of least-squares probabilistic classifiers.
IEICE Transactions on Information and Systems, vol.E94-D, no.6, pp.1337-1340, 2011.
[ paper ]

Least-Squares Independence Test (LSIT)
 Least-Squares Independence Test (LSIT) is a method for testing the null hypothesis that paired (input-output) samples are statistically independent. LSIT adopts a squared-loss variant of mutual information as an independence measure and estimates it using the density-ratio estimation method uLSIF. Thanks to this formulation, all tuning parameters such as the Gaussian width and the regularization parameter can be chosen automatically by cross-validation.

MATLAB implementation of LSIT:
LSIT.zip
 "demo_LSIT.m" is a demo script.

Examples:
 References:
Least-Squares Two-Sample Test (LSTT)
 Least-Squares Two-Sample Test (LSTT) is a method for testing the null hypothesis that two sets of samples are drawn from the same probability distribution. LSTT adopts a squared-loss divergence between the two distributions as a discrepancy measure and estimates it using the density-ratio estimation method uLSIF. Thanks to this formulation, all tuning parameters such as the Gaussian width and the regularization parameter can be chosen automatically by cross-validation.
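The test can be sketched in Python as a permutation test on a uLSIF-based divergence estimate (fixed kernel width and regularization, a modest number of permutations; `pe_divergence` and `lstt` are illustrative names):

```python
import numpy as np

def pe_divergence(x1, x2, sigma=1.0, lam=0.1):
    """uLSIF-based estimate of the Pearson divergence between the
    distributions of x1 and x2 (used as the LSTT test statistic)."""
    c = x1[:100]
    K = lambda t: np.exp(-(t[:, None] - c[None, :]) ** 2 / (2 * sigma ** 2))
    H = K(x2).T @ K(x2) / len(x2)
    h = K(x1).mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(c)), h)
    return (h @ alpha - 1) / 2

def lstt(x1, x2, n_perm=100, seed=0):
    """Permutation two-sample test: p-value of the observed statistic
    under random re-splits of the pooled sample."""
    rng = np.random.default_rng(seed)
    stat = pe_divergence(x1, x2)
    pooled = np.concatenate([x1, x2])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if pe_divergence(perm[:len(x1)], perm[len(x1):]) >= stat:
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
xa = rng.normal(0.0, 1.0, size=200)
xb = rng.normal(1.5, 1.0, size=200)   # clearly shifted distribution
xc = rng.normal(0.0, 1.0, size=200)   # same distribution as xa
p_diff = lstt(xa, xb)
p_same = lstt(xa, xc)
```

Under the null hypothesis the permuted statistics are exchangeable with the observed one, so the p-value is valid without any asymptotic approximation.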

MATLAB implementation of LSTT:
LSTT.zip
 "demo_LSTT.m" is a demo script.

Examples:

References:

Sugiyama, M., Suzuki, T., Itoh, Y., Kanamori, T., & Kimura, M.
Least-squares two-sample test.
Neural Networks, vol.24, no.7, pp.735-751, 2011.
[ paper ]

SMI-based Clustering (SMIC)
 SMI-based Clustering (SMIC) is an information-maximization clustering algorithm based on the squared-loss mutual information (SMI). SMIC is equipped with automatic tuning parameter selection based on an SMI estimator called least-squares mutual information (LSMI).

MATLAB implementation of SMIC:
SMIC.zip
 "SMIC.m" is the main function.
 "demo_SMIC.m" is a demo script.

Examples:

References:

Sugiyama, M., Niu, G., Yamada, M., Kimura, M., & Hachiya, H.
Informationmaximization clustering based on squaredloss mutual information.
Neural Computation, to appear.
[ paper ] 
Sugiyama, M., Yamada, M., Kimura, M., & Hachiya, H.
On informationmaximization clustering: tuning parameter selection and analytic solution.
In L. Getoor and T. Scheffer (Eds.), Proceedings of 28th International Conference on Machine Learning (ICML2011), pp.65-72, Bellevue, Washington, USA, Jun. 28-Jul. 2, 2011.
[ paper, slides ]

Variational Bayesian Matrix Factorization (VBMF)
 Given a fully-observed noisy matrix V, Variational Bayesian Matrix Factorization (VBMF) denoises V under a low-rank assumption. Based on the empirical Bayesian method, VBMF automatically determines all the tuning parameters, such as the rank of the denoised matrix, the noise variance, and the prior variances.

MATLAB implementation of VBMF:
VBMF.zip
 "VBMF.m" is the main function.
 "demo_VBMF.m" is a demo script.

Examples:

References:

Nakajima, S., Sugiyama, M., & Babacan, D.
Global solution of fully-observed variational Bayesian matrix factorization is column-wise independent.
In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24, pp.208-216, 2011.
(Presented at Neural Information Processing Systems (NIPS2011), Granada, Spain, Dec. 13-15, 2011)
[ paper ] 
Nakajima, S., Sugiyama, M., & Tomioka, R.
Global analytic solution for variational Bayesian matrix factorization.
In J. Lafferty, C. K. I. Williams, R. Zemel, J. Shawe-Taylor, and A. Culotta (Eds.), Advances in Neural Information Processing Systems 23, pp.1759-1767, 2010.
(Presented at Neural Information Processing Systems (NIPS2010), Vancouver, British Columbia, Canada, Dec. 6-11, 2010)
[ paper, poster ]

Masashi Sugiyama (sugi [at] k.u-tokyo.ac.jp)