{"id":2628,"date":"2015-03-10T13:15:32","date_gmt":"2015-03-10T11:15:32","guid":{"rendered":"http:\/\/www.webcodegeeks.com\/?p=2628"},"modified":"2015-03-08T11:41:11","modified_gmt":"2015-03-08T09:41:11","slug":"python-scikit-learn-training-classifier-non-numeric-features","status":"publish","type":"post","link":"https:\/\/www.webcodegeeks.com\/python\/python-scikit-learn-training-classifier-non-numeric-features\/","title":{"rendered":"Python: scikit-learn \u2013 Training a classifier with non numeric features"},"content":{"rendered":"<p>Following on from my <a href=\"http:\/\/www.markhneedham.com\/blog\/2015\/02\/20\/pythonscikit-learn-detecting-which-sentences-in-a-transcript-contain-a-speaker\/\">previous<\/a> <a href=\"http:\/\/www.markhneedham.com\/blog\/2015\/02\/24\/pythonnltk-naive-vs-naive-bayes-vs-decision-tree\/\">posts<\/a> on <a href=\"http:\/\/www.markhneedham.com\/blog\/2015\/03\/01\/python-detecting-the-speaker-in-himym-using-parts-of-speech-pos-tagging\/\">training a classifier<\/a> to pick out the speaker in sentences of <a href=\"http:\/\/en.wikipedia.org\/wiki\/How_I_Met_Your_Mother\">HIMYM<\/a> transcripts, the next thing to do was to train a random forest of decision trees to see how that fared.<\/p>\n<p>I\u2019ve <a href=\"http:\/\/www.markhneedham.com\/blog\/2013\/11\/09\/python-making-scikit-learn-and-pandas-play-nice\/\">used scikit-learn for this before<\/a>, so I decided to use that. 
However, before building a random forest, I wanted to check that I could build an equivalent <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/tree.html\">decision tree<\/a>.<\/p>\n<p>I initially thought that scikit-learn\u2019s DecisionTreeClassifier would take in data in the same format as nltk\u2019s, so I started out with the following code:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">import json\r\nimport nltk\r\nimport collections\r\n \r\nfrom himymutil.ml import pos_features\r\nfrom sklearn import tree\r\nfrom sklearn.cross_validation import train_test_split\r\n \r\nwith open('data\/import\/trained_sentences.json', 'r') as json_file:\r\n    json_data = json.load(json_file)\r\n \r\n# Build a list of (word, speaker flag) pairs for each sentence\r\ntagged_sents = &#x5B;]\r\nfor sentence in json_data:\r\n    tagged_sents.append(&#x5B;(word&#x5B;'word'], word&#x5B;'speaker']) for word in sentence&#x5B;'words']])\r\n \r\n# Build one (feature dict, label) pair per word\r\nfeaturesets = &#x5B;]\r\nfor tagged_sent in tagged_sents:\r\n    untagged_sent = nltk.tag.untag(tagged_sent)\r\n    sentence_pos = nltk.pos_tag(untagged_sent)\r\n    for i, (word, tag) in enumerate(tagged_sent):\r\n        featuresets.append((pos_features(untagged_sent, sentence_pos, i), tag))\r\n \r\nclf = tree.DecisionTreeClassifier()\r\n \r\ntrain_data, test_data = train_test_split(featuresets, test_size=0.20, train_size=0.80)\r\n \r\n&gt;&gt;&gt; train_data&#x5B;1]\r\n({'word': u'your', 'word-pos': 'PRP$', 'next-word-pos': 'NN', 'prev-word-pos': 'VB', 'prev-word': u'throw', 'next-word': u'body'}, False)\r\n \r\n&gt;&gt;&gt; clf.fit(&#x5B;item&#x5B;0] for item in train_data], &#x5B;item&#x5B;1] for item in train_data])\r\nTraceback (most recent call last):\r\n  File '&lt;stdin&gt;', line 1, in &lt;module&gt;\r\n  File '\/Users\/markneedham\/projects\/neo4j-himym\/himym\/lib\/python2.7\/site-packages\/sklearn\/tree\/tree.py', line 137, in fit\r\n    X, = check_arrays(X, dtype=DTYPE, sparse_format='dense')\r\n  File 
'\/Users\/markneedham\/projects\/neo4j-himym\/himym\/lib\/python2.7\/site-packages\/sklearn\/utils\/validation.py', line 281, in check_arrays\r\n    array = np.asarray(array, dtype=dtype)\r\n  File '\/Users\/markneedham\/projects\/neo4j-himym\/himym\/lib\/python2.7\/site-packages\/numpy\/core\/numeric.py', line 460, in asarray\r\n    return array(a, dtype, copy=False, order=order)\r\nTypeError: float() argument must be a string or a number<\/pre>\n<p>In fact, the classifier can only deal with numeric features so we need to <a href=\"http:\/\/scikit-learn.org\/dev\/modules\/feature_extraction.html#loading-features-from-dicts\">translate our features into that format using DictVectorizer<\/a>.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">from sklearn.feature_extraction import DictVectorizer\r\n \r\nvec = DictVectorizer()\r\nX = vec.fit_transform(&#x5B;item&#x5B;0] for item in featuresets]).toarray()\r\n \r\n&gt;&gt;&gt; len(X)\r\n13016\r\n \r\n&gt;&gt;&gt; len(X&#x5B;0])\r\n7302\r\n \r\n&gt;&gt;&gt; vec.get_feature_names()&#x5B;10:15]\r\n&#x5B;'next-word-pos=EX', 'next-word-pos=IN', 'next-word-pos=JJ', 'next-word-pos=JJR', 'next-word-pos=JJS']<\/pre>\n<p>We end up with one feature for every key\/value combination that exists in <cite>featuresets<\/cite>.<\/p>\n<p>I was initially confused about how to split up <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cross_validation.train_test_split.html\">training and test data sets<\/a> but it\u2019s actually fairly easy \u2013 <cite>train_test_split<\/cite> allows us to pass in multiple lists which it splits along the same seam:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">vec = DictVectorizer()\r\nX = vec.fit_transform(&#x5B;item&#x5B;0] for item in featuresets]).toarray()\r\nY = &#x5B;item&#x5B;1] for item in featuresets]\r\nX_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, train_size=0.80)<\/pre>\n<p>Next we want to train the 
classifier, which takes a couple of lines of code:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">clf = tree.DecisionTreeClassifier()\r\nclf = clf.fit(X_train, Y_train)<\/pre>\n<p>I wrote the following function to assess the classifier. It returns precision and recall for both the speaker and non-speaker classes:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">import collections\r\nimport nltk\r\n \r\ndef assess(text, predictions_actual):\r\n    # refsets groups item indices by actual label, testsets by predicted label\r\n    refsets = collections.defaultdict(set)\r\n    testsets = collections.defaultdict(set)\r\n    for i, (prediction, actual) in enumerate(predictions_actual):\r\n        refsets&#x5B;actual].add(i)\r\n        testsets&#x5B;prediction].add(i)\r\n    # True = the word is a speaker, False = it is not\r\n    speaker_precision = nltk.metrics.precision(refsets&#x5B;True], testsets&#x5B;True])\r\n    speaker_recall = nltk.metrics.recall(refsets&#x5B;True], testsets&#x5B;True])\r\n    non_speaker_precision = nltk.metrics.precision(refsets&#x5B;False], testsets&#x5B;False])\r\n    non_speaker_recall = nltk.metrics.recall(refsets&#x5B;False], testsets&#x5B;False])\r\n    return &#x5B;text, speaker_precision, speaker_recall, non_speaker_precision, non_speaker_recall]<\/pre>\n<p>We can call it like so:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">predictions = clf.predict(X_test)\r\nassessment = assess('Decision Tree', zip(predictions, Y_test))\r\n \r\n&gt;&gt;&gt; assessment\r\n&#x5B;'Decision Tree', 0.9459459459459459, 0.9210526315789473, 0.9970134395221503, 0.9980069755854509]<\/pre>\n<p>Those values are in the same ballpark as the ones we saw with the nltk classifier, so I\u2019m happy it\u2019s all wired up correctly.<\/p>\n<p>The last thing I wanted to do was <a href=\"http:\/\/stackoverflow.com\/questions\/23557545\/how-to-explain-the-decision-tree-from-scikit-learn\">visualise the decision tree<\/a> that had been created, and the easiest way to do that is to <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.export_graphviz.html\">export the classifier to DOT format<\/a> and then 
use graphviz to create an image:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">with open('\/tmp\/decisionTree.dot', 'w') as dot_file:\r\n    tree.export_graphviz(clf, out_file=dot_file, feature_names=vec.get_feature_names())<\/pre>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">dot -Tpng \/tmp\/decisionTree.dot -o \/tmp\/decisionTree.png<\/pre>\n<p>The decision tree is quite a few levels deep, so here\u2019s part of it:<\/p>\n<p><a href=\"http:\/\/www.webcodegeeks.com\/wp-content\/uploads\/2015\/03\/decisionTreeSection.png\"><img decoding=\"async\" class=\"aligncenter wp-image-2643 size-full\" src=\"http:\/\/www.webcodegeeks.com\/wp-content\/uploads\/2015\/03\/decisionTreeSection.png\" alt=\"decisionTreeSection\" width=\"600\" height=\"343\" srcset=\"https:\/\/www.webcodegeeks.com\/wp-content\/uploads\/2015\/03\/decisionTreeSection.png 600w, https:\/\/www.webcodegeeks.com\/wp-content\/uploads\/2015\/03\/decisionTreeSection-300x172.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>The <a href=\"https:\/\/github.com\/mneedham\/neo4j-himym\/blob\/master\/scripts\/scikit_dt.py\">full script is on GitHub<\/a> if you want to play around with it.<\/p>\n<div class=\"attribution\">\n<table>\n<tbody>\n<tr>\n<td><span class=\"reference\">Reference: <\/span><\/td>\n<td><a href=\"http:\/\/www.markhneedham.com\/blog\/2015\/03\/02\/python-scikit-learn-training-a-classifier-with-non-numeric-features\/\">Python: scikit-learn \u2013 Training a classifier with non numeric features<\/a> from our <a href=\"http:\/\/www.webcodegeeks.com\/wcg\/\">WCG partner<\/a> Mark Needham at the <a href=\"http:\/\/www.markhneedham.com\/blog\/\">Mark Needham Blog<\/a>.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Following on from my previous posts on training a classifier to pick out the speaker in sentences of HIMYM transcripts, the next thing to do was to train a random forest of decision trees to 
see how that fared. I\u2019ve used scikit-learn for this before so I decided to use that. However, before building a random &hellip;<\/p>\n","protected":false},"author":48,"featured_media":1651,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[53],"tags":[],"class_list":["post-2628","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python"]}