CS282 Machine Learning

Course Project

Student 1, ID: xxxxxxxx, email1@[Link]
Student 2, ID: xxxxxxxx, email2@[Link]

1 Related Work

Variational Autoencoder
Variational autoencoders (VAEs) are widely used generative models that encode input data into a compact representation through an encoder-decoder structure. Unlike traditional autoencoders, VAEs model the distribution of the latent space, which enables the generation of new samples consistent with the data distribution. The pioneering work of [1] introduced the model, its objective function, and a low-variance gradient estimator based on reparameterized sampling of the latent variables, while [2] extended the framework to semi-supervised learning with deep generative models. Since then, VAEs have been applied to image generation, data compression, and anomaly detection. For instance, VAEs have been combined with generative adversarial networks (GANs) [3] to produce high-quality images, and [4] used a VAE for time series anomaly detection. VAEs have also been integrated with other deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to improve task performance; for example, [5] stacked convolutional autoencoders to extract hierarchical features from images. In summary, VAEs represent a significant advance in generative modeling and have contributed notably to deep learning. They are particularly useful for deep clustering within the probabilistic graphical model framework, as they merge variational inference with deep autoencoders [6].
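For concreteness, the following is a minimal VAE sketch in PyTorch; the layer sizes, variable names, and Bernoulli reconstruction loss are illustrative choices of ours rather than details taken from the cited papers. The key step is the reparameterization trick, which rewrites sampling from q(z|x) as a deterministic function of the encoder outputs plus Gaussian noise, so the negative ELBO can be minimized by ordinary backpropagation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def negative_elbo(x, x_logits, mu, logvar):
    # Reconstruction term + KL(q(z|x) || N(0, I)), summed over the batch.
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

Minimizing this loss on mini-batches trains encoder and decoder jointly; sampling z from a standard normal and decoding it then generates new data.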
Information Bottleneck
The information bottleneck (IB) principle is an important concept in information theory and machine learning that formalizes the trade-off between compression and prediction. Introduced by [7], it requires a learning algorithm to retain the information in the input that is relevant for the target while discarding the rest. Since its inception, the IB principle has been applied to a wide range of machine learning tasks, including unsupervised learning [8], classification [9], and clustering [10, 11], and more recently it has been used to study information processing in the brain [12, 13]. By constraining the representations learned by deep neural networks, the IB principle can improve their generalization and reduce computational demands [10]. It has also inspired new methods for training deep neural networks, such as the deep variational information bottleneck [9]. Its foundation in information-theoretic concepts such as Markov chains, entropy, and conditional entropy [14] supports diverse applications across data mining, image processing, natural language processing, computer vision [15–17], and rationale extraction [18]. Overall, it has become a valuable tool for designing advanced machine learning algorithms.
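In its standard form, the IB problem is to find a stochastic encoding T of the input X that is maximally informative about a target variable Y while compressing X; written as a Lagrangian (standard notation, with the multiplier β trading off compression against prediction):

\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)

Deep variational variants such as [9] optimize a tractable neural-network bound on this objective.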
Multi-View Clustering
In real clustering tasks, data is often available in multiple views, which has motivated the development of multi-view clustering (MVC) techniques. These methods exploit complementary information from different data perspectives to overcome the limitations of traditional clustering methods. Co-training, introduced by [19], is a prevalent MVC approach in which multiple models trained on distinct views label additional unlabeled data for one another; it has been applied successfully to text classification [20], image recognition [21], and community detection [22]. Low-rank matrix factorization is another strategy aimed at
discovering a shared latent representation of the data across views, with variants such as structured low-rank matrix factorization [23], tensor-based factorization [24], and deep matrix factorization [25]. In recent years, subspace methods have also been used for MVC; for example, [26, 27, 6, 28] integrate multiple affinity graphs into a consensus graph while accounting for topological relevance.
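As a toy illustration of the shared-representation idea behind these factorization approaches, the sketch below alternates least-squares updates between view-specific bases W_v and a single latent matrix H shared by all views; it is a simplification of ours, and the cited methods add non-negativity, structured low-rank, tensor, or deep constraints on top of this skeleton.

import numpy as np

def shared_factorization(views, k, n_iter=100, seed=0):
    # views: list of arrays X_v with shape (d_v, n); all views share the same n samples.
    # Alternating least squares for min over {W_v}, H of sum_v ||X_v - W_v H||_F^2.
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    H = rng.standard_normal((k, n))
    for _ in range(n_iter):
        # Update each view-specific basis W_v with the shared H fixed.
        Ws = [np.linalg.lstsq(H.T, X.T, rcond=None)[0].T for X in views]
        # Update the shared representation H with all W_v fixed.
        A, B = np.vstack(Ws), np.vstack(views)
        H = np.linalg.lstsq(A, B, rcond=None)[0]
    return Ws, H

Clustering is then performed on the columns of H, which is the usual way such shared representations are consumed.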
Traditional MVC approaches rely largely on linear, shallow embeddings and therefore cannot fully exploit the data nonlinearities that are crucial for complex clustering structures. With the emergence of deep learning as a powerful tool for MVC, various neural network architectures have been proposed. For instance, [29] introduced a deep adversarial MVC method that uses adversarial training to learn a joint representation across multiple views, while [30] proposed deep embedded clustering (DEC), which maps the high-dimensional original feature space to an optimized lower-dimensional space; DEC has since inspired numerous extensions and improvements. Autoencoder networks are frequently employed for unsupervised representation learning because they efficiently learn complex nonlinear mappings, and deep autoencoders (DAE) [31] are a common building block of deep clustering techniques. Finally, [32] proposed a multi-level feature learning framework for contrastive multi-view clustering (MFLVC), merging MVC with contrastive learning to boost clustering effectiveness.
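To illustrate the clustering objective popularized by DEC [30], the sketch below implements its soft cluster assignment and auxiliary target distribution in PyTorch; the tensor names and helper functions are our own, but the Student's t-kernel assignment and the KL self-training loss follow the published formulation.

import torch

def soft_assignments(z, centroids, alpha=1.0):
    # Student's t-kernel similarity between embeddings z (n x d) and centroids (k x d).
    dist_sq = torch.cdist(z, centroids) ** 2
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # Sharpen high-confidence assignments and normalize by cluster frequency.
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

def dec_loss(q):
    # Self-training objective: KL(P || Q) with the target P held fixed.
    p = target_distribution(q).detach()
    return torch.sum(p * torch.log(p / q))

In practice q is recomputed from the current encoder embeddings at every step, while the target p is refreshed periodically and treated as a constant during backpropagation.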

2 Contribution Percentages

References
[1] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
[2] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with
deep generative models,” in NeurIPS, 2014, pp. 3581–3589.
[3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,
and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, pp.
139–144, 2020.
[4] Y. Liu, Y. Lin, Q. Xiao, G. Hu, and J. Wang, “Self-adversarial variational autoencoder with
spectral residual for time series anomaly detection,” Neurocomputing, vol. 458, pp. 349–363,
2021.
[5] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked convolutional auto-encoders for hierarchical feature extraction,” in ICANN, 2011, pp. 52–59.
[6] C.-Y. Lu, H. Min, Z.-Q. Zhao, L. Zhu, D.-S. Huang, and S. Yan, “Robust and efficient subspace
segmentation via least squares regression,” in ECCV. Springer, 2012, pp. 347–360.
[7] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” Computing Research Repository (CoRR), 2000.
[8] N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle,” in IEEE Information Theory Workshop (ITW), 2015, pp. 1–5.
[9] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,”
arXiv preprint arXiv:1612.00410, 2016.
[10] A. Achille and S. Soatto, “Information dropout: Learning optimal representations through noisy
computation,” TPAMI, vol. 40, pp. 2897–2905, 2018.
[11] W. Yan, J. Zhu, Y. Zhou, Y. Wang, and Q. Zheng, “Multi-view semantic consistency based
information bottleneck for clustering,” arXiv preprint arXiv:2303.00002, 2023.
[12] A. M. Saxe, J. L. McClelland, and S. Ganguli, “A mathematical theory of semantic development
in deep neural networks,” Proceedings of the National Academy of Sciences, vol. 116, pp.
11537–11546, 2019.
[13] R. Shwartz-Ziv and N. Tishby, “Opening the black box of deep neural networks via information,”
Information Flow in Deep Neural Networks, 2022.

[14] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. John Wiley & Sons: Hoboken, NJ, USA, 2006.
[15] J. Goldberger, H. Greenspan, and S. Gordon, “Unsupervised image clustering using the infor-
mation bottleneck method,” in DAGM-Symposium, 2002, pp. 158–165.
[16] S. M. D. A. C. Jayatilake and G. U. Ganegoda, “Involvement of machine learning tools in
healthcare decision making,” Journal of Healthcare Engineering, 2021.
[17] Q. Sun, J. Li, H. Peng, J. Wu, X. Fu, C. Ji, and P. S. Yu, “Graph structure learning with
variational information bottleneck,” in AAAI, 2022, pp. 4165–4174.
[18] B. Paranjape, M. Joshi, J. Thickstun, H. Hajishirzi, and L. Zettlemoyer, “An information
bottleneck approach for controlling conciseness in rationale extraction,” in EMNLP, 2020, pp.
1938–1952.
[19] A. Blum and T. Mitchell, “Combining labeled and unlabeled data with co-training,” in Pro-
ceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998, pp.
92–100.
[20] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, “Learning with local and global
consistency,” in NeurIPS, 2003, pp. 321–328.
[21] C.-K. Lee and T.-L. Liu, “Guided co-training for multi-view spectral clustering,” in ICIP, 2016,
pp. 4042–4046.
[22] J. Liu, C. Wang, J. Gao, and J. Han, “Multi-view clustering via joint nonnegative matrix
factorization,” in SDM, 2013, pp. 252–260.
[23] S. Zheng, X. Cai, C. Ding, F. Nie, and H. Huang, “A closed form solution to multi-view low-rank
regression,” in AAAI, 2015, pp. 1973–1979.
[24] T. V. de Cruys, T. Poibeau, and A. Korhonen, “A tensor-based factorization model of semantic compositionality,” in Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2013, pp. 1142–1151.
[25] H. Zhao, Z. Ding, and Y. Fu, “Multi-view clustering via deep matrix factorization,” in AAAI,
2017, pp. 2921–2927.
[26] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” TPAMI, vol. 35, no. 11, pp. 2765–2781, 2013.
[27] S. Huang, H. Wu, Y. Ren, I. Tsang, Z. Xu, W. Feng, and J. Lv, “Multi-view subspace clustering
on topological manifold,” in NeurIPS, 2022, pp. 25883–25894.
[28] F. Nie, H. Wang, H. Huang, and C. Ding, “Unsupervised and semi-supervised learning via ℓ1-norm graph,” in ICCV. IEEE, 2011, pp. 2268–2273.
[29] Z. Li, Q. Wang, Z. Tao, Q. Gao, Z. Yang et al., “Deep adversarial multi-view clustering network,”
in IJCAI, 2019, pp. 2952–2958.
[30] J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for clustering analysis,” in
ICML, 2016, pp. 478–487.
[31] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural
networks,” Science, vol. 313, pp. 504–507, 2006.
[32] J. Xu, Y. Ren, H. Tang, X. Pu, X. Zhu, M. Zeng, and L. He, “Multi-VAE: Learning disentangled view-common and view-peculiar visual representations for multi-view clustering,” in ICCV, 2021, pp. 9234–9243.
