{"id":18658,"date":"2020-09-07T13:44:14","date_gmt":"2020-09-07T13:44:14","guid":{"rendered":"https:\/\/ittutorial.org\/?p=18658"},"modified":"2020-09-28T12:32:36","modified_gmt":"2020-09-28T12:32:36","slug":"python-unsupervised-learning-3","status":"publish","type":"post","link":"https:\/\/ittutorial.org\/python-unsupervised-learning-3\/","title":{"rendered":"Transforming Features For Better Clustering | Python Unsupervised Learning -3"},"content":{"rendered":"<p>Hi, we continue where we left off on Unsupervised Learning. I recommend that you read our previous article before moving on to this article.<\/p>\n<blockquote class=\"wp-embedded-content\" data-secret=\"KfOf10qKbB\"><p><a href=\"https:\/\/ittutorial.org\/python-unsupervised-learning\/\">Evaluating a Clustering | Python Unsupervised Learning -2<\/a><\/p><\/blockquote>\n<p><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" title=\"&#8220;Evaluating a Clustering | Python Unsupervised Learning -2&#8221; &#8212; IT Tutorial\" src=\"https:\/\/ittutorial.org\/python-unsupervised-learning\/embed\/#?secret=0YjOiBgTAv#?secret=KfOf10qKbB\" data-secret=\"KfOf10qKbB\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe><\/p>\n<p>&nbsp;<\/p>\n<h1>Transforming Features For Better Clustering<\/h1>\n<h1><\/h1>\n<p>Let&#8217;s look now at another dataset, the Piedmont wines dataset.<\/p>\n<ul>\n<li>178 samples from 3 distinct varieties of red wine: Barolo, Grignolino and Barbera<\/li>\n<li>Features measure chemical composition e.g. alcohol content<\/li>\n<li>Visual properties like color intensity<\/li>\n<\/ul>\n<h4>Clustering the wines:<\/h4>\n<p>If you remember from our previous article, our cluster operations gave good results as a result of crosstabulation. 
Let&#8217;s write a new example with the wine data and examine the results.<\/p>\n<pre>import pandas as pd\r\nfrom sklearn.cluster import KMeans\r\nfrom sklearn.datasets import load_wine\r\nwine = load_wine()\r\nmodel = KMeans(n_clusters=3)\r\nlabels = model.fit_predict(wine.data)<\/pre>\n<p>&nbsp;<\/p>\n<pre>df = pd.DataFrame({'labels':labels})<\/pre>\n<p>&nbsp;<\/p>\n<pre>def species(theta):\r\n    if theta == 0:\r\n        return wine.target_names[0]\r\n    elif theta == 1:\r\n        return wine.target_names[1]\r\n    else:\r\n        return wine.target_names[2]<\/pre>\n<pre>df[\"species\"] = [species(theta) for theta in wine.target]<\/pre>\n<p>&nbsp;<\/p>\n<pre>cross_tab = pd.crosstab(df[\"labels\"], df[\"species\"])\r\ncross_tab<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-18719\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_17.png\" alt=\"\" width=\"965\" height=\"513\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_17.png 314w, https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_17-300x160.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_17-310x165.png 310w\" sizes=\"auto, (max-width: 965px) 100vw, 965px\" \/><\/p>\n<p>As you can see, this time things haven&#8217;t worked out so well. The KMeans clusters don&#8217;t correspond well with the wine varieties.<\/p>\n<h3>Feature variances<\/h3>\n<ul>\n<li>The wine features have very different variances!<\/li>\n<li>The variance of a feature measures the spread of its values<\/li>\n<\/ul>\n<h2>StandardScaler<\/h2>\n<ul>\n<li>In KMeans: feature variance = feature influence<\/li>\n<\/ul>\n<p>To give every feature a chance, the data needs to be transformed so that the features have equal variance. 
This can be achieved with the <span style=\"color: #008000\">StandardScaler<\/span> from <strong>scikit-learn<\/strong>. It transforms every feature to have mean 0 and variance 1.<\/p>\n<p>The resulting &#8220;standardized&#8221; features can be very informative.<\/p>\n<p>&nbsp;<\/p>\n<p>Let&#8217;s practice:<\/p>\n<pre>from sklearn.preprocessing import StandardScaler\r\nscaler = StandardScaler()\r\nscaler.fit(wine.data)  # returns StandardScaler(copy=True, with_mean=True, with_std=True)\r\nwine_scaled = scaler.transform(wine.data)<\/pre>\n<p>&nbsp;<\/p>\n<p>The transform method can now be used to standardize any samples, either the same ones or completely new ones.<\/p>\n<h2>Similar Methods<\/h2>\n<ul>\n<li>StandardScaler and KMeans have similar methods<\/li>\n<li>Use <span style=\"color: #008000\">fit()<\/span> \/ <span style=\"color: #008000\">transform()<\/span> with <span style=\"color: #008000\">StandardScaler<\/span><\/li>\n<li>Use <span style=\"color: #008000\">fit()<\/span> \/ <span style=\"color: #008000\">predict()<\/span> with <span style=\"color: #008000\">KMeans<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h1>Pipelines combine multiple steps<\/h1>\n<pre>from sklearn.preprocessing import StandardScaler\r\nfrom sklearn.cluster import KMeans\r\nfrom sklearn.pipeline import make_pipeline\r\n\r\nscaler = StandardScaler()\r\nkmeans = KMeans(n_clusters=3)\r\n\r\npipeline = make_pipeline(scaler, kmeans)\r\npipeline.fit(wine.data)<\/pre>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-18722\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_18.png\" alt=\"\" width=\"980\" height=\"453\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_18.png 582w, https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_18-300x139.png 300w\" sizes=\"auto, (max-width: 980px) 100vw, 980px\" \/><\/p>\n<pre>labels = pipeline.predict(wine.data)\r\ndf[\"labels\"] = labels  # refresh the labels column with the new clustering<\/pre>\n<pre>cross_tab = pd.crosstab(df[\"labels\"], df[\"species\"])\r\ncross_tab<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-18723\" src=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_19.png\" alt=\"\" width=\"993\" height=\"527\" srcset=\"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_19.png 360w, https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_19-300x159.png 300w, https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/Screenshot_19-310x165.png 310w\" sizes=\"auto, (max-width: 993px) 100vw, 993px\" \/><\/p>\n<p>Checking the correspondence between the cluster labels and the wine varieties reveals that this new clustering, incorporating standardization, is fantastic.<\/p>\n<p>Its three clusters correspond almost exactly to the three wine varieties. This is a huge improvement on the clustering without standardization.<\/p>\n<p>See you in the next article.<\/p>\n<p>&nbsp;<\/p>\n<blockquote class=\"wp-embedded-content\" data-secret=\"UzdaYqW2mb\"><p><a href=\"https:\/\/ittutorial.org\/unsupervised-learning\/\">k-means clustering | Python Unsupervised Learning -1<\/a><\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Hi, we continue where we left off on Unsupervised Learning. I recommend that you read our previous article before moving on to this article. 
Evaluating a Clustering | Python Unsupervised Learning -2 &nbsp; Transforming Features For Better Clustering Let&#8217;s look now at another dataset, the Piedmont wines dataset. 178 samples from 3 distinct varieties of &hellip;<\/p>\n","protected":false},"author":67,"featured_media":18628,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[12904],"tags":[12362,12909,12905,12865,12898,12883,12864,12886,12889,12900,12899,12910,12876,12877,12878,12879,12888,12913,12907,12908,12906,12896,12872,12871,12891,12911,12868,12884,12897,12875,12869,12881,12894,12870,12880,12893,12895,12902,12890,12912,12914,12901,12885,12882,12892,12915,12916,12867,13635,12866,12874,12873,12887],"class_list":["post-18658","post","type-post","status-publish","format-standard","has-post-thumbnail","","category-data-science","tag-advance-python","tag-clustering-quality","tag-cross-tabulation","tag-data-science","tag-data-science-example-of-unsupervised-learning","tag-data-science-in-python","tag-datascience","tag-denetimsiz-ogrenme","tag-derin-ogrenme","tag-example-of-supervised-learning","tag-example-of-unsupervised-learning","tag-inertia-measures","tag-iris-dataset-examle","tag-k-means-example","tag-kmeans-example","tag-kmeans-example-in-python","tag-makina-ogrenmesi","tag-matplotlib","tag-numpy","tag-numpy-array-in-python","tag-pandas","tag-python-advance-clustering","tag-python-classification","tag-python-clustering","tag-python-clustering-ornekleri","tag-python-cross-validation","tag-python-data-science","tag-python-deep-learning","tag-python-example-of-unsupervised-learning","tag-python-iris-dataset","tag-python-k-means","
tag-python-k-means-examle","tag-python-k-means-ornek","tag-python-kmeans","tag-python-kmeans-example","tag-python-knn-ornekleri","tag-python-kumeleme-ornegi","tag-python-machine-learning-example","tag-python-makina-ogrenmesi","tag-python-matplotlib","tag-python-sklearn","tag-python-supervised-learning-example","tag-python-unlabeled-data","tag-python-unsupervised-learning-example","tag-python-unsupervised-learning-uygulamalari","tag-sklearn-clustering","tag-sklearn-cluster-kmeans","tag-supervised-learning","tag-transforming-features-for-better-clustering","tag-unsupervised-learning","tag-unsupervised-learning-classification","tag-unsupervised-learning-clustering","tag-veri-bilimi"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/ittutorial.org\/wp-content\/uploads\/2020\/09\/indir.png","jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/18658","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/comments?post=18658"}],"version-history":[{"count":6,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/18658\/revisions"}],"predecessor-version":[{"id":19439,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/posts\/18658\/revisions\/19439"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/media\/18628"}],"wp:attachment":[{"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/media?parent=18658"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/categories?post=18658"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ittutorial.org\/wp-json\/wp\/v2\/tags?post=18658"}],"curies":[{"n
ame":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}