Mastering Machine Learning Algorithms

Summary

In this chapter, we introduced semi-supervised learning, starting from the scenario and the assumptions needed to justify the different approaches. We discussed the importance of the smoothness assumption when working with both supervised and semi-supervised classifiers, in order to guarantee a reasonable generalization ability. Then we introduced the clustering assumption, which is closely related to the geometry of the datasets and makes it possible to cope with density estimation problems under a strong structural condition. Finally, we discussed the manifold assumption and its importance in avoiding the curse of dimensionality.

The chapter continued by introducing a generative and inductive model: Generative Gaussian mixtures, which allow the clustering of labeled and unlabeled samples, starting from the assumption that the class-conditional densities are modeled by multivariate Gaussian distributions.
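To make the idea concrete, the following is a minimal sketch (not the chapter's full implementation) of a semi-supervised Gaussian mixture with two classes; the toy dataset, the number of iterations, and the covariance regularization term are illustrative assumptions. The responsibilities of the labeled samples are kept fixed as one-hot vectors, while those of the unlabeled samples are re-estimated at every EM step:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy labeled/unlabeled dataset (two classes); values are illustrative only
rng = np.random.RandomState(0)
X_l = np.vstack([rng.randn(10, 2), rng.randn(10, 2) + 4.0])
y_l = np.concatenate([np.zeros(10, dtype=int), np.ones(10, dtype=int)])
X_u = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 4.0])

X = np.vstack([X_l, X_u])
R = np.zeros((len(X), 2))
R[np.arange(len(y_l)), y_l] = 1.0   # fixed one-hot responsibilities (labeled samples)
R[len(y_l):] = 0.5                  # uniform initialization (unlabeled samples)

for _ in range(50):
    # M-step: weighted priors, means, and covariances
    Nk = R.sum(axis=0)
    priors = Nk / len(X)
    means = (R.T @ X) / Nk[:, None]
    covs = []
    for k in range(2):
        d = X - means[k]
        covs.append((R[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(2))
    # E-step: update the responsibilities of the unlabeled samples only
    P = np.column_stack([priors[k] * multivariate_normal.pdf(X_u, means[k], covs[k])
                         for k in range(2)])
    R[len(y_l):] = P / P.sum(axis=1, keepdims=True)
```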

The next topic was a very important algorithm: contrastive pessimistic likelihood estimation (CPLE), which is an inductive, semi-supervised classification framework that can be adopted together with any supervised classifier. The main concept is to define a contrastive log-likelihood based on soft labels (representing the class probabilities of the unlabeled samples) and to impose a pessimistic condition in order to minimize the trust in those soft labels. The algorithm finds the configuration that maximizes the log-likelihood, taking into account both labeled and unlabeled samples.
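As a rough illustration of this min-max structure (not the reference implementation discussed in the chapter), the sketch below uses a logistic regression as the underlying supervised classifier; the toy dataset, the sample-duplication weighting scheme, and the choice of optimizer are assumptions made for brevity. The contrastive log-likelihood measures the improvement of the semi-supervised fit over the supervised one, and the pessimistic step minimizes it with respect to the soft labels:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

# Toy labeled/unlabeled dataset; values are illustrative only
rng = np.random.RandomState(0)
X_l = rng.randn(40, 2)
y_l = (X_l[:, 0] + X_l[:, 1] > 0).astype(int)
X_u = rng.randn(20, 2)

sup = LogisticRegression().fit(X_l, y_l)    # purely supervised baseline, fit once

def log_likelihood(clf, q):
    # Log-likelihood of the hard labels (labeled set) and soft labels q (unlabeled set)
    p = np.clip(clf.predict_proba(np.vstack([X_l, X_u]))[:, 1], 1e-6, 1.0 - 1e-6)
    t = np.concatenate([y_l, q])
    return np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def contrastive_ll(q):
    # Semi-supervised fit: each unlabeled sample enters twice, weighted by q and 1 - q
    X = np.vstack([X_l, X_u, X_u])
    y = np.concatenate([y_l, np.ones(len(X_u)), np.zeros(len(X_u))])
    w = np.concatenate([np.ones(len(y_l)), q, 1.0 - q])
    semi = LogisticRegression().fit(X, y, sample_weight=w)
    # Contrastive term: improvement of the semi-supervised fit over the supervised one
    return log_likelihood(semi, q) - log_likelihood(sup, q)

# Pessimistic step: choose the soft labels that minimize the improvement
res = minimize(contrastive_ll, x0=np.full(len(X_u), 0.5),
               bounds=[(0.0, 1.0)] * len(X_u), method='L-BFGS-B')
soft_labels = res.x
```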

Another inductive classification approach is provided by the S3VM algorithm, which is an extension of the classical SVM approach, based on two extra optimization constraints that address the unlabeled samples. This method is relatively powerful, but its objective is non-convex and, therefore, the result is very sensitive to the algorithm employed to minimize it.
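The non-convexity is easier to see when the objective is written out explicitly. The following sketch assumes a linear decision function f(x) = w·x + b, with toy data and penalty weights chosen only for illustration; the unlabeled term charges each sample with the best of the two possible labels, which reduces to a hinge on |f(x)| and is what breaks convexity:

```python
import numpy as np
from scipy.optimize import minimize

# Toy dataset and penalty weights; values are illustrative only
rng = np.random.RandomState(1)
X_l = np.vstack([rng.randn(20, 2) - 2.0, rng.randn(20, 2) + 2.0])
y_l = np.concatenate([-np.ones(20), np.ones(20)])
X_u = rng.randn(100, 2)
C_l, C_u = 1.0, 0.5

def s3vm_objective(theta):
    w, b = theta[:-1], theta[-1]
    f_l = X_l.dot(w) + b
    f_u = X_u.dot(w) + b
    # Standard hinge loss on the labeled samples
    labeled = np.maximum(0.0, 1.0 - y_l * f_l).sum()
    # Unlabeled samples are penalized with the best of the two possible labels
    unlabeled = np.maximum(0.0, 1.0 - np.abs(f_u)).sum()
    return 0.5 * w.dot(w) + C_l * labeled + C_u * unlabeled

# Derivative-free minimization, as the objective is neither convex nor smooth
res = minimize(s3vm_objective, x0=np.zeros(3), method='Powell')
w_opt, b_opt = res.x[:-1], res.x[-1]
```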

An alternative to S3VM is provided by the TSVM, which tries to minimize the objective with a condition based on variable labels. The problem is hence divided into two parts: the supervised one, which is exactly the same as in a standard SVM, and the semi-supervised one, which has a similar structure but without fixed y labels. This problem is also non-convex, and it's necessary to evaluate different optimization strategies to find the best trade-off between accuracy and computational complexity. In the reference section, there are some useful resources that allow you to examine all these problems in depth and find a suitable solution for each particular scenario.
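A drastically simplified alternating scheme can clarify the idea of variable labels; it omits the label-switching and class-balancing steps of a full TSVM, and the toy dataset, the unlabeled weight C_u, and the stopping rule are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy dataset and unlabeled weight; values are illustrative only
rng = np.random.RandomState(2)
X_l = np.vstack([rng.randn(15, 2) - 2.0, rng.randn(15, 2) + 2.0])
y_l = np.concatenate([-np.ones(15), np.ones(15)])
X_u = np.vstack([rng.randn(60, 2) - 2.0, rng.randn(60, 2) + 2.0])
C_u = 0.1

svm = LinearSVC().fit(X_l, y_l)
y_u = svm.predict(X_u)                      # initial guess for the variable labels

for _ in range(20):
    X = np.vstack([X_l, X_u])
    y = np.concatenate([y_l, y_u])
    w = np.concatenate([np.ones(len(y_l)), np.full(len(y_u), C_u)])
    svm = LinearSVC().fit(X, y, sample_weight=w)   # supervised part + variable-label part
    y_new = svm.predict(X_u)
    if np.array_equal(y_new, y_u):          # stop when the variable labels are stable
        break
    y_u = y_new
```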

In the next chapter, Chapter 3, Graph-Based Semi-Supervised Learning, we'll continue this exploration by discussing some important algorithms based on the structure underlying the dataset. In particular, we're going to employ graph theory to propagate labels to unlabeled samples and to reduce the dimensionality of datasets in non-linear contexts.