{"id":19059,"date":"2020-09-19T10:16:21","date_gmt":"2020-09-19T10:16:21","guid":{"rendered":"https:\/\/ittutorial.org\/?p=19059"},"modified":"2020-09-28T12:11:43","modified_gmt":"2020-09-28T12:11:43","slug":"python-unsupervised-learning-6","status":"publish","type":"post","link":"https:\/\/ittutorial.org\/python-unsupervised-learning-6\/","title":{"rendered":"Dimension reduction with PCA | Python Unsupervised Learning -6"},"content":{"rendered":"<h1 class=\"dc-u-fs-h5\">Dimension reduction with PCA<\/h1>\n<p>&nbsp;<\/p>\n<p>Dimension reduction represent the same data using less features and is vital for building machine learning pipelines using real-world data.<\/p>\n<p>PCA performs dimension reduction by discarding the PCA features with lower variance, which it assumes to be noise, and retaining the higher variance PCA features, which it assumes to be informative.<\/p>\n<p>To use PCA for dimension reduction, you need to specify how many PCA features to keep.\u00a0 For example, specifying n_components=2 when creating a PCA model tells it to keep only the first two PCA features.\u00a0 A good choice is the intrinsic dimension of the dataset, if you know it.<\/p>\n<p>&nbsp;<\/p>\n<h2>Example<\/h2>\n<p>In a previous exercise, you saw that 2 was a reasonable choice for the &#8220;intrinsic dimension&#8221; of the fish measurements. 
Now use PCA for dimensionality reduction of the fish measurements, retaining only the 2 most important components.<\/p>\n<p>The fish measurements have already been scaled for you, and are available as <strong>scaled_samples<\/strong>.<\/p>\n<p>You can access the full code at the link below.<\/p>\n<p><strong><a href=\"https:\/\/drive.google.com\/file\/d\/15lEraueL3ZRVp2T1eTjM4K8qJt_7ZluZ\/view?usp=sharing\">https:\/\/drive.google.com\/file\/d\/15lEraueL3ZRVp2T1eTjM4K8qJt_7ZluZ\/view?usp=sharing<\/a><\/strong><\/p>\n<p>&nbsp;<\/p>\n<pre>from sklearn.decomposition import PCA\r\n\r\n# Create a PCA model with 2 components: pca\r\npca = PCA(n_components=2)\r\n\r\n# Fit the PCA instance to the scaled samples\r\npca.fit(scaled_samples)\r\n\r\n# Transform the scaled samples: pca_features\r\npca_features = pca.transform(scaled_samples)\r\n\r\n# Print the shape of pca_features\r\nprint(pca_features.shape)<\/pre>\n<h3 class=\"exercise--title\">A tf-idf word-frequency array<\/h3>\n<div class=\"\">\n<p>In this exercise, you&#8217;ll create a tf-idf word-frequency array for a toy collection of documents. For this, use the <strong>TfidfVectorizer<\/strong> from sklearn. It transforms a list of documents into a word-frequency array, which it outputs as a csr_matrix. 
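A csr_matrix stores only the nonzero entries, and scikit-learn's PCA cannot be fit on sparse input directly; TruncatedSVD is the standard sparse-friendly way to reduce the dimension of a tf-idf array. This sketch goes beyond the exercise, and the toy documents are made up for illustration.

```python
# A sketch beyond the exercise: reduce a sparse tf-idf array with
# TruncatedSVD, which (unlike PCA) accepts csr_matrix input.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy corpus, not from the original post
documents = ['cats say meow', 'dogs say woof', 'dogs chase cats']

tfidf = TfidfVectorizer()
csr_mat = tfidf.fit_transform(documents)   # sparse word-frequency array

# Reduce the word-frequency features to 2 components
svd = TruncatedSVD(n_components=2)
reduced = svd.fit_transform(csr_mat)

print(reduced.shape)   # one row per document, 2 columns: (3, 2)
```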
It has <strong>fit()<\/strong> and <strong>transform()<\/strong> methods like other sklearn objects.<\/p>\n<\/div>\n<pre># Import TfidfVectorizer\r\nfrom sklearn.feature_extraction.text import TfidfVectorizer\r\n\r\n# Create a TfidfVectorizer: tfidf\r\ntfidf = TfidfVectorizer()\r\n\r\n# Apply fit_transform to documents: csr_mat\r\ncsr_mat = tfidf.fit_transform(documents)\r\n\r\n# Print result of toarray() method\r\nprint(csr_mat.toarray())\r\n\r\n# Get the words: words (get_feature_names() was removed in scikit-learn 1.2)\r\nwords = tfidf.get_feature_names_out()\r\n\r\n# Print words\r\nprint(words)\r\n<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-19060\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_21-1.png\" alt=\"\" width=\"994\" height=\"628\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_21-1.png 902w, https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_21-1-300x190.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_21-1-768x485.png 768w\" sizes=\"auto, (max-width: 994px) 100vw, 994px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>See you in the next article.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Dimension reduction with PCA &nbsp; Dimension reduction represents the same data using fewer features and is vital for building machine learning pipelines with real-world data. 
PCA performs dimension reduction by discarding the PCA features with lower variance, which it assumes to be noise, and retaining the higher variance PCA features, which it assumes to be &hellip;<\/p>\n","protected":false},"author":67,"featured_media":18628,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[12904],"tags":[12362,12909,12905,12865,12898,12883,12864,12886,12889,13264,13633,13263,12900,12899,12910,12876,12877,12878,12879,12888,12913,12907,12908,12906,13259,13262,13261,13265,12896,12872,12871,12891,12911,12868,12884,12897,12875,12869,12881,12894,12870,12880,12893,12895,12902,12890,12912,13260,12914,12901,12885,12882,12892,12915,12916,12867,13004,13005,12866,12874,12873,12887],"class_list":["post-19059","post","type-post","status-publish","format-standard","has-post-thumbnail","","category-data-science","tag-advance-python","tag-clustering-quality","tag-cross-tabulation","tag-data-science","tag-data-science-example-of-unsupervised-learning","tag-data-science-in-python","tag-datascience","tag-denetimsiz-ogrenme","tag-derin-ogrenme","tag-dimension-reduction","tag-dimension-reduction-with-pca","tag-dimensionel-reduction","tag-example-of-supervised-learning","tag-example-of-unsupervised-learning","tag-inertia-measures","tag-iris-dataset-examle","tag-k-means-example","tag-kmeans-example","tag-kmeans-example-in-python","tag-makina-ogrenmesi","tag-matplotlib","tag-numpy","tag-numpy-array-in-python","tag-pandas","tag-pca","tag-principal-component","tag-principal-component-analys","tag-principal-component-analysis-pca","tag-python-advance-clustering","tag-python-classification","tag-python-clust
ering","tag-python-clustering-ornekleri","tag-python-cross-validation","tag-python-data-science","tag-python-deep-learning","tag-python-example-of-unsupervised-learning","tag-python-iris-dataset","tag-python-k-means","tag-python-k-means-examle","tag-python-k-means-ornek","tag-python-kmeans","tag-python-kmeans-example","tag-python-knn-ornekleri","tag-python-kumeleme-ornegi","tag-python-machine-learning-example","tag-python-makina-ogrenmesi","tag-python-matplotlib","tag-python-pca","tag-python-sklearn","tag-python-supervised-learning-example","tag-python-unlabeled-data","tag-python-unsupervised-learning-example","tag-python-unsupervised-learning-uygulamalari","tag-sklearn-clustering","tag-sklearn-cluster-kmeans","tag-supervised-learning","tag-t-sne","tag-t-sne-visualization","tag-unsupervised-learning","tag-unsupervised-learning-classification","tag-unsupervised-learning-clustering","tag-veri-bilimi"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/indir.png","jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/19059","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/comments?post=19059"}],"version-history":[{"count":8,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/19059\/revisions"}],"predecessor-version":[{"id":19431,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/19059\/revisions\/19431"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/media\/18628"}],"wp:attachment":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/media?parent=19059"}],"wp:term":[{"taxonomy":"cate
gory","embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/categories?post=19059"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/tags?post=19059"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}