{"@attributes":{"version":"2.0"},"channel":{"title":"PyVideo.org - mllib","link":"https:\/\/pyvideo.org\/","description":{},"lastBuildDate":"Tue, 31 May 2016 00:00:00 +0000","item":{"title":"Machine Learning at Scale","link":"https:\/\/pyvideo.org\/pydata-berlin-2016\/machine-learning-at-scale.html","description":"<h3>Description<\/h3><p>PyData Berlin 2016<\/p>\n<p>Python machine learning libraries like scikit-learn are a fantastic resource but not always well suited to large datasets. How can we use Python for machine learning in such cases? This talk will introduce PySpark and MLlib as tools for distributed machine learning. We will discuss what these tools are, how they work, and cover some basic code examples of machine learning on a cluster.<\/p>\n<ol class=\"arabic simple\">\n<li><dl class=\"first docutils\">\n<dt>Intro<\/dt>\n<dd><ol class=\"first last loweralpha\">\n<li>Why is scikit-learn not enough?<\/li>\n<li>What is Spark?<\/li>\n<li>What is MLlib?<\/li>\n<\/ol>\n<\/dd>\n<\/dl>\n<\/li>\n<li><dl class=\"first docutils\">\n<dt>Spark<\/dt>\n<dd><ol class=\"first last loweralpha\">\n<li>Overview of Spark<\/li>\n<li>Overview of PySpark<\/li>\n<li>PySpark code sample<\/li>\n<\/ol>\n<\/dd>\n<\/dl>\n<\/li>\n<li><dl class=\"first docutils\">\n<dt>MLlib<\/dt>\n<dd><ol class=\"first last loweralpha\">\n<li>Overview of MLlib<\/li>\n<li>MLlib code samples<\/li>\n<\/ol>\n<\/dd>\n<\/dl>\n<\/li>\n<\/ol>\n","pubDate":"Tue, 31 May 2016 00:00:00 +0000","guid":"tag:pyvideo.org,2016-05-31:\/pydata-berlin-2016\/machine-learning-at-scale.html","category":["PyData Berlin 2016","scikit-learn","pyspark","mllib"]}}}