A Survey on the 20-Year Journey of Semi-Supervised Learning

1. Basic Concepts and Assumptions

1.1 Assumptions of semi-supervised learning

Fig 1: Illustrations of the semi-supervised learning assumptions. In each picture, a reasonable supervised decision boundary is depicted, as well as the optimal decision boundary, which could be closely approximated by a semi-supervised learning algorithm relying on the respective assumption [van Engelen and Hoos 2020]

1.2 Connection to Clustering

2. Taxonomy of Semi-supervised learning methods

Fig 2: Visualization of the semi-supervised classification taxonomy. Each leaf in the taxonomy corresponds to a specific type of approach to incorporating unlabeled data into classification methods. In the leaf corresponding to transductive, graph-based methods, the dashed boxes represent distinct phases of the graph-based classification process, each of which has a multitude of variations [van Engelen and Hoos 2020]

3. Inductive methods

3.1 Wrapper methods
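Wrapper methods use a supervised base learner to pseudo-label the unlabeled data and retrain on the result. Below is a minimal, self-contained sketch of self-training, the canonical wrapper method. The toy nearest-centroid base learner, the 1-D data, and the margin-based confidence heuristic are all illustrative assumptions, not the survey's specific algorithms; in practice any supervised classifier with a confidence score can play the same role.

```python
def nearest_centroid_fit(X, y):
    """Fit a toy base learner: one centroid per class (1-D inputs)."""
    centroids = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = sum(pts) / len(pts)
    return centroids

def predict_with_confidence(centroids, x):
    """Predict the nearest centroid's class; confidence is the margin
    between the two closest centroids (larger = more confident)."""
    dists = sorted((abs(x - c), label) for label, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(X_lab, y_lab, X_unlab, rounds=5, per_round=2):
    """Repeatedly train the base learner, then move the most confidently
    pseudo-labeled points from the unlabeled pool into the training set."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        model = nearest_centroid_fit(X, y)
        ranked = sorted(pool, key=lambda x: -predict_with_confidence(model, x)[1])
        for x in ranked[:per_round]:
            pool.remove(x)
            X.append(x)
            y.append(predict_with_confidence(model, x)[0])
    return nearest_centroid_fit(X, y)

# Two labeled points per class; unlabeled points fill out both clusters.
model = self_train([0.0, 1.0, 9.0, 10.0], ["a", "a", "b", "b"],
                   [0.5, 1.5, 2.0, 8.0, 8.5, 9.5])
pred, _ = predict_with_confidence(model, 3.0)  # → "a": nearer the left cluster
```

Taking only the most confident pseudo-labels per round is the usual safeguard: an early mistake that enters the training set is otherwise reinforced in every later round.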

3.2 Unsupervised pre-processing
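Unsupervised pre-processing methods use the unlabeled data before any supervised training, e.g. to extract features or to pre-cluster the inputs. A minimal sketch of the feature-extraction flavor, assuming 1-D toy data and a hand-rolled two-cluster k-means (both illustrative choices, not from the survey): the clustering sees all inputs, labeled and unlabeled alike, and each labeled point is then re-represented by its distances to the learned centroids.

```python
def kmeans_2(xs, n_iter=20):
    """Tiny 1-D k-means with k=2; returns the two centroids."""
    c = [min(xs), max(xs)]  # simple deterministic initialization
    for _ in range(n_iter):
        groups = ([], [])
        for x in xs:
            groups[0 if abs(x - c[0]) <= abs(x - c[1]) else 1].append(x)
        # recompute each centroid as its group mean (keep old value if empty)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c

def to_features(x, centroids):
    """Represent a point by its distance to each unsupervised centroid."""
    return [abs(x - c) for c in centroids]

# The centroids are fit on labeled AND unlabeled inputs together;
# only the two endpoints are assumed to carry labels here.
all_x = [0.0, 0.5, 1.0, 1.5, 8.0, 9.0, 9.5, 10.0]
centroids = kmeans_2(all_x)
features = [to_features(x, centroids) for x in [0.0, 10.0]]  # labeled subset
```

A supervised classifier would then be trained on `features` as usual; the semi-supervised benefit comes entirely from the representation learned on the full, mostly unlabeled sample.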

3.3 Intrinsically semi-supervised

4. Transductive methods

Fig 3: Example of an undirected graphical model for graph-based classification. Filled nodes and edges between them correspond to the original graph G. Unfilled nodes with plus and minus signs correspond to auxiliary nodes connected to labeled data [van Engelen and Hoos 2020]
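The basic transductive, graph-based scheme behind figures like Fig 3 can be sketched as iterative label propagation: labeled nodes are clamped to their known labels, and every unlabeled node repeatedly takes the average of its neighbors' scores until the scores stop changing. This is a simplified Jacobi-style version of the quadratic-criterion methods; the chain graph and ±1 labels below are made up for illustration.

```python
def label_propagation(edges, labels, n_nodes, n_iter=100):
    """edges: undirected (i, j) pairs; labels: {node: +1.0 or -1.0} for the
    labeled nodes. Returns one score per node; its sign is the predicted class."""
    neighbors = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    f = [labels.get(i, 0.0) for i in range(n_nodes)]
    for _ in range(n_iter):
        new_f = list(f)
        for i in range(n_nodes):
            if i in labels:           # clamp labeled nodes to their labels
                continue
            if neighbors[i]:          # average the neighbors' current scores
                new_f[i] = sum(f[j] for j in neighbors[i]) / len(neighbors[i])
        f = new_f
    return f

# A chain 0-1-2-3-4 with node 0 labeled +1 and node 4 labeled -1: the
# scores converge to 1, 0.5, 0, -0.5, -1, splitting the chain in the middle.
scores = label_propagation([(0, 1), (1, 2), (2, 3), (3, 4)],
                           {0: 1.0, 4: -1.0}, n_nodes=5)
```

Because the labels never leave the labeled nodes, this is purely transductive: the output is a labeling of the given graph, not a classifier for new points.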

5. Conclusions and future scope

  1. One important issue to address in the near future is the possible performance degradation caused by introducing unlabeled data. Semi-supervised techniques outperform their supervised counterparts or base learners only in specific cases (Li and Zhou 2015; Singh et al. 2009).
  2. Recent studies have shown that perturbation-based methods with neural networks consistently outperform their supervised counterparts. This considerable advantage of neural networks should be explored further and deserves more attention in the field of semi-supervised learning.
  3. Recently, automated machine learning (AutoML) has been widely used to improve the robustness of models. These approaches include meta-learning and neural architecture search for automatic algorithm selection and hyperparameter optimization. While AutoML techniques are being applied successfully to supervised learning, research in the semi-supervised setting lags behind; studying it further could bring striking results to semi-supervised approaches.
  4. Another important step towards the advancement of semi-supervised approaches is the availability of standardized software packages or libraries dedicated to this domain. Currently, only some generic packages, such as the KEEL software package, include a semi-supervised learning module (Triguero et al. 2017).
  5. One vital area that needs serious attention from researchers is building a strong connection between clustering and classification. Essentially, both approaches are special cases of semi-supervised learning, where either only labeled or only unlabeled data is considered. The recent surge of interest in generative models can be seen as a good sign for progress in this direction.
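The perturbation-based (consistency regularization) idea from point 2 above can be sketched without any neural-network library: alongside the usual supervised loss, the objective asks the model to produce similar outputs for an unlabeled point and a slightly perturbed copy of it, which pushes the decision boundary into low-density regions. The single sigmoid unit, the fixed perturbation, and the numerical gradients below are all toy simplifications for illustration, not the survey's method.

```python
import math

def predict(w, b, x):
    """A minimal 'network': one sigmoid unit on 1-D inputs."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def loss(w, b, labeled, unlabeled, noise=0.1, lam=1.0):
    """Supervised squared error plus a consistency term that penalizes
    output changes under a small input perturbation."""
    sup = sum((predict(w, b, x) - y) ** 2 for x, y in labeled)
    cons = sum((predict(w, b, x) - predict(w, b, x + noise)) ** 2
               for x in unlabeled)
    return sup + lam * cons

def train(labeled, unlabeled, lr=0.5, steps=2000, eps=1e-5):
    """Plain gradient descent; numerical gradients keep the sketch
    dependency-free (a real implementation would use autodiff)."""
    w = b = 0.0
    for _ in range(steps):
        gw = (loss(w + eps, b, labeled, unlabeled)
              - loss(w - eps, b, labeled, unlabeled)) / (2 * eps)
        gb = (loss(w, b + eps, labeled, unlabeled)
              - loss(w, b - eps, labeled, unlabeled)) / (2 * eps)
        w, b = w - lr * gw, b - lr * gb
    return w, b

# Two labeled endpoints; the unlabeled points sit inside the two clusters,
# so the consistency term discourages a steep boundary near them.
w, b = train([(0.0, 0.0), (10.0, 1.0)], [1.0, 2.0, 8.0, 9.0])
```

The same structure (supervised loss + λ · consistency loss) underlies neural approaches such as the Π-model and Mean Teacher, with noise injected by data augmentation or dropout rather than a fixed offset.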
References

  1. van Engelen, J. E., & Hoos, H. H. (2020). A survey on semi-supervised learning. Machine Learning, 109(2), 373–440.
  2. Bengio, Y., Delalleau, O., & Le Roux, N. (2006). Label propagation and quadratic criterion. In O. Chapelle, B. Schölkopf, & A. Zien (Eds.), Semi-supervised learning (pp. 193–216). Cambridge: The MIT Press.
  3. Goldberg, A. B., Zhu, X., Singh, A., Xu, Z., & Nowak, R. D. (2009). Multi-manifold semi-supervised learning. In Proceedings of the 12th international conference on artificial intelligence and statistics (pp. 169–176).
  4. Haffari, G. R., & Sarkar, A. (2007). Analysis of semi-supervised learning with the Yarowsky algorithm. In Proceedings of the 23rd conference on uncertainty in artificial intelligence (pp. 159–166).
  5. Triguero, I., García, S., & Herrera, F. (2015). Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study. Knowledge and Information Systems, 42(2), 245–284.
  6. Jebara, T., Wang, J., & Chang, S. F. (2009). Graph construction and b-matching for semi-supervised learning. In Proceedings of the 26th annual international conference on machine learning (pp. 441–448).
  7. Guyon, I., & Elisseeff, A. (2006). An introduction to feature extraction. In I. Guyon, M. Nikravesh, S. Gunn, & L. A. Zadeh (Eds.), Feature extraction (pp. 1–25). Berlin: Springer.
  8. Sheikhpour, R., Sarram, M. A., Gharaghani, S., & Chahooki, M. A. Z. (2017). A survey on semi-supervised feature selection methods. Pattern Recognition, 64, 141–158.
  9. Bennett, K. P., & Demiriz, A. (1999). Semi-supervised support vector machines. In Advances in neural information processing systems (pp. 368–374).
  10. Urner, R., Ben-David, S., & Shalev-Shwartz, S. (2011). Access to unlabeled data can speed up prediction time. In Proceedings of the 27th international conference on machine learning (pp. 641–648).
  11. Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2014). Spectral networks and locally connected networks on graphs. In International conference on learning representations.

subarna chowdhury soma

Full stack developer and a new enthusiast in Machine Learning.