{"id":374,"date":"2016-11-13T22:49:43","date_gmt":"2016-11-13T22:49:43","guid":{"rendered":"http:\/\/data36.com\/?p=374"},"modified":"2019-04-25T22:28:44","modified_gmt":"2019-04-25T22:28:44","slug":"data-collection","status":"publish","type":"post","link":"https:\/\/data36.com\/data-collection\/","title":{"rendered":"How data collection works"},"content":{"rendered":"\n<p>The first step for every data science project is data collection, that is, getting the actual raw data.<\/p>\n\n\n\n<p>There are two ways to do this:<\/p>\n\n\n\n<p><strong>A)<\/strong> You can pick one or more &#8220;smart tools&#8221; to use. These services will collect the data for you automatically. You only need to copy-paste a snippet of code into your website and you are ready to go. <em>(E.g. Google Analytics, Hotjar, Google Optimize, CrazyEgg, etc.)<\/em><\/p>\n\n\n\n<p><strong>B)<\/strong> You can collect the data for yourself. (E.g. via a javascript code snippet that sends data to a .csv plain text file on your server.) It&#8217;s a bit more difficult to implement since it requires some coding skills. But in the long term this solution will serve you much better (and it will be more profitable, too) than version A.<br>Why? For several reasons that I&#8217;ve already written about <a href=\"https:\/\/data36.com\/build-data-tools-google-analytics-vs-sql\/\">in this article<\/a>. But here&#8217;s a quick summary:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>You&#8217;ll have your own data &#8211; you won&#8217;t depend on Google Analytics, Hotjar, etc&#8230;<\/li><li>You&#8217;ll have one unified data warehouse. No need for integrations, API hacks, and so on.<\/li><li>There won&#8217;t be any limitations on how you can use your data or how you can connect different data points. <em>(E.g. you can&#8217;t use your raw data in Google Analytics to implement machine learning models, but you can do it if you have your own database.)<\/em><\/li><li>You can trust your data 100%. 
(No more black boxes. You know your data since you own it.)<\/li><li>Data server costs are typically much lower than 3rd party tools\u2019 monthly fees.<\/li><\/ul>\n\n\n\n<p>Whichever way you choose, <strong>it&#8217;s worth understanding how raw data collection works in general, and how you can collect data about your website visitors&#8217; behaviour.<\/strong><\/p>\n\n\n\n<p>Whether you do it yourself or use a 3rd party tool, very similar things are happening under the hood!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How does data collection work?<\/strong><\/h2>\n\n\n\n<p>Let&#8217;s go with the simplest example!<\/p>\n\n\n\n<p>You have a website and you would like to collect every visitor&#8217;s mouse clicks for an upcoming <a href=\"https:\/\/data36.com\/data-science-for-business\/\">data science<\/a> project.<\/p>\n\n\n\n<p>How do you do that?<\/p>\n\n\n\n<p>First, implement an invisible tracking script (aka &#8220;data collection script&#8221;) on every clickable element of your site! From that point on, when a website visitor clicks on a specific element (e.g. a link or a button), the click causes two things to happen:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Obviously, the button will do what it should do. E.g. 
it will take the user to the page behind the link she clicked.<\/li><li>The data collection script will send a small data package to your data warehouse.<\/li><\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"430\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-1-1024x430.png\" alt=\"data collection - tracking scripts send usage data to your data logs\" class=\"wp-image-377\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-1-1024x430.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-1-300x126.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-1-768x322.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-1-973x408.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-1-508x213.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-1.png 1778w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>tracking scripts send usage data to your data logs<\/em><\/figcaption><\/figure>\n\n\n\n<p>As simple as that.<\/p>\n\n\n\n<p>You can track every user interaction (let&#8217;s call them &#8220;<em>events<\/em>&#8221;) on your website (or in your mobile app): page views, feature usage, clicks, taps, even mouse movements, if you need to.<\/p>\n\n\n\n<p>A more general illustration to help you imagine what&#8217;s happening here:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"1024\" height=\"767\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-1024x767.png\" alt=\"data collection scripts send data from the front-end to production and data servers\" class=\"wp-image-3777\" 
srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-1024x767.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-300x225.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-768x575.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-510x382.png 510w, https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-1080x808.png 1080w, https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-973x728.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2019\/04\/data-collection-scripts-send-data-from-the-front-end-to-production-and-data-servers-508x380.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>data collection scripts send data from the front-end to production <\/em><strong><em>and<\/em><\/strong><em> data servers<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to store the collected raw data<\/strong><\/h2>\n\n\n\n<p>When the collected raw data hits your data warehouse, it can be stored in different formats.<\/p>\n\n\n\n<p><strong>For startups the best format is the <\/strong><em><strong>plain text<\/strong><\/em><strong> format as it is very flexible.<\/strong> You can imagine this as a simple <em>.txt, .csv or .tsv<\/em> file with text in it. Many companies follow this model.<\/p>\n\n\n\n<p>But it&#8217;s also worth mentioning that many other companies (e.g. 
almost all multinational companies) like to collect their data directly into SQL databases (or to other similar structured formats). <\/p>\n\n\n\n<p>And there are several other ways to store your data. (Graph databases, noSQL databases, etc.)<\/p>\n\n\n\n<p>In this example I&#8217;ll keep it simple and will go with the most common solution: <strong>plain text format.<\/strong><\/p>\n\n\n\n<p>Remember that each <em>event<\/em> from a website visitor (e.g. a click on your website) creates one <em>line<\/em> of data using your previously implemented data collection scripts. This <em>line<\/em> goes into a file on your data server. We call this file with a bunch of <em>events<\/em> in it a <em>log<\/em>. You can have more than one log, but almost all of them will have the same format. Something like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"1024\" height=\"357\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-sample-log-1024x357.png\" alt=\"data collection sample .csv plain text event log (email addresses removed)\" class=\"wp-image-380\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-sample-log-1024x357.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-sample-log-300x104.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-sample-log-768x268.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-sample-log-973x339.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-sample-log-508x177.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data-collection-sample-log.png 1378w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>sample .csv plain text event log (email addresses removed)<\/em><\/figcaption><\/figure>\n\n\n\n<p>Look messy?<\/p>\n\n\n\n<p>Maybe at first, but go through that column by column! 
(This is a .csv file; in this particular log the field separator is a semicolon rather than the more usual comma.)<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>the date and the time: when the event happened<\/li><li>the event itself (in this case: &#8220;click&#8221;)<\/li><li>the specifics of the event, e.g. exactly which button was clicked<\/li><\/ol>\n\n\n\n<p>These are the most basic data points that every event log should contain.<\/p>\n\n\n\n<p>But you can add even more dimensions. Just a few examples:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>visitor&#8217;s unique ID (really important!)<\/li><li>visitor&#8217;s email address<\/li><li>visitor segment <em>(if you have any)<\/em><\/li><li>visitor&#8217;s operating system<\/li><li>last payment amount<\/li><li>visitor&#8217;s device type<\/li><li>acquisition channel (source, medium, etc.)<\/li><li>previous site visited<\/li><li>etc\u2026<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What kind of raw data should you collect?<\/strong><\/h2>\n\n\n\n<p>If you run an online business, you can collect and store a virtually infinite amount of data. Infinite vertically (the number of different events you can log) as well as horizontally (the number of dimensions you can collect about one event in one line).<\/p>\n\n\n\n<p>This raises the obvious question: <strong>what should you collect and what shouldn&#8217;t you?<\/strong><\/p>\n\n\n\n<p><strong>The principle here is very simple:<\/strong> <strong>collect everything you can.<\/strong> Every click, every pageview, every feature usage, everything.<\/p>\n\n\n\n<p>It&#8217;s interesting to note that (according to market benchmarks) most startups that follow this collect-everything principle actually end up using less than 10% of their data. The rest is never even touched by their data scientists!<\/p>\n\n\n\n<p>So why do they collect everything? 
The answer is: <strong>because you can never know what data you will need in the future for your data projects.<\/strong><\/p>\n\n\n\n<p>Let&#8217;s say you want to change a 3-year-old key feature of your online product. You don&#8217;t want to mess anything up, so before the change, you will spend some time understanding the exact role of that 3-year-old key feature. For that you will need to analyze your data retrospectively. Whoops, you realize that you didn&#8217;t collect any data about it. Game over, you&#8217;ve just lost 3 years&#8217; worth of information\u2026 Get it?<\/p>\n\n\n\n<p><strong>If you start thinking about collecting a specific data point only when it&#8217;s actually needed for a data science project, you are already too late.<\/strong><\/p>\n\n\n\n<p>And that&#8217;s the reason behind the principle &#8220;collect everything you can&#8221;.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What kind of data should you not collect?<\/strong><\/h2>\n\n\n\n<p>There are some obvious limitations, of course.<\/p>\n\n\n\n<p>But the price of storing data is <strong>not one of those.<\/strong> Storing data (in the cloud at least) is very cheap today.<\/p>\n\n\n\n<p>The real limitations are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>engineering time:<\/strong> The developers need to spend time implementing your tracking scripts. And if you have a complex data warehouse, you will need a full-time person to build and maintain the data infrastructure, too. So if your developers spend more time collecting raw data than implementing new features, fixes or design ideas, then maybe you are too data-focused.<\/li><li><strong>common sense:<\/strong> yes, even if it&#8217;s cheap, you can still overload your database if you do foolish things. E.g. if you log every mouse movement of every user every millisecond. 
You should not do that.<\/li><li><strong>forgot-to-think-about-it:<\/strong> in most cases, the main reason why people don&#8217;t collect particular data points is that they simply forget that they should be collected. It happens, don&#8217;t worry. If you want to avoid it, I recommend setting up a workshop in which you sit together and talk through how and what data to collect and why. I wrote more about that <a href=\"https:\/\/blog.panoply.io\/data-collection-how-what-when\">in this article<\/a>.<\/li><li><strong>legal questions:<\/strong> You should consider legal questions, too. They differ from country to country, so I recommend consulting with a legal professional in your country. (Update in 2018: mind GDPR if you have EU users.)<\/li><li>And one more comment here. Some countries have strict legal restrictions about data collection, others don&#8217;t. Regardless of the regulations: <strong>always consider ethics.<\/strong> Never collect data from your website visitors that you wouldn&#8217;t want collected about you.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>This is how raw data collection works at a high level. Google Analytics, Mixpanel, Crazyegg or your own data warehouses &#8212; all are based on these principles. 
Of course there are small differences, but now you understand what happens in the background and you can be more confident when talking about raw data collection with your co-workers!<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you want to learn more about how to become a data scientist, take my 50-minute video course: <a href=\"https:\/\/data36.com\/how-to-become-a-data-scientist\/\">How to Become a Data Scientist.<\/a>&nbsp;(It&#8217;s&nbsp;free!)<\/li>\n\n\n\n<li>Also check out my 6-week online course: <a href=\"https:\/\/data36.com\/jds\/\">The Junior Data Scientist\u2019s First Month video course.<\/a><\/li>\n<\/ul>\n\n\n\n<p><em>Cheers,<\/em><br><strong><em>Tomi Mester<\/em><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The first step for every data science project is data collection, that is, getting the actual raw data. There are two ways to do this: A) You can pick one or more &#8220;smart tools&#8221; to use. These services will collect the data for you automatically. 
You only need to copy-paste a snippet of code into [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1421,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[141],"tags":[14,53,13,73,28,59,41,62,16,30],"class_list":["post-374","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-strategy-business-data-science-analytics","tag-big-data","tag-collect","tag-data","tag-data-collection","tag-google-analytics","tag-hotjar","tag-metrics","tag-mixpanel","tag-startup","tag-tomi-mester"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Data Collection (Getting the Actual Raw Data) -- This is How it Works!<\/title>\n<meta name=\"description\" content=\"The first step for every data science project is data collection, that is, getting the actual raw data. Do it for yourself or using a 3rd party tool\u2026\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data36.com\/data-collection\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Collection (Getting the Actual Raw Data) -- This is How it Works!\" \/>\n<meta property=\"og:description\" content=\"The first step for every data science project is data collection, that is, getting the actual raw data. 
Do it for yourself or using a 3rd party tool\u2026\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data36.com\/data-collection\/\" \/>\n<meta property=\"og:site_name\" content=\"Data36\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/data36\/\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/data36\" \/>\n<meta property=\"article:published_time\" content=\"2016-11-13T22:49:43+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-04-25T22:28:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data_collection-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Tomi Mester\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Data Collection (Getting the Actual Raw Data) -- This is How it Works!\" \/>\n<meta name=\"twitter:description\" content=\"The first step for every data science project is data collection, that is, getting the actual raw data. Do it for yourself or using a 3rd party tool\u2026\" \/>\n<meta name=\"twitter:image\" content=\"http:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data_collection-1.png\" \/>\n<meta name=\"twitter:creator\" content=\"@data36_com\" \/>\n<meta name=\"twitter:site\" content=\"@data36_com\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tomi Mester\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/data36.com\/data-collection\/\",\"url\":\"https:\/\/data36.com\/data-collection\/\",\"name\":\"Data Collection (Getting the Actual Raw Data) -- This is How it Works!\",\"isPartOf\":{\"@id\":\"https:\/\/data36.com\/#website\"},\"datePublished\":\"2016-11-13T22:49:43+00:00\",\"dateModified\":\"2019-04-25T22:28:44+00:00\",\"author\":{\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e\"},\"description\":\"The first step for every data science project is data collection, that is, getting the actual raw data. Do it for yourself or using a 3rd party tool\u2026\",\"breadcrumb\":{\"@id\":\"https:\/\/data36.com\/data-collection\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/data36.com\/data-collection\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/data36.com\/data-collection\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/data36.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How data collection works\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/data36.com\/#website\",\"url\":\"https:\/\/data36.com\/\",\"name\":\"Data36\",\"description\":\"Learn Data Science the Hard Way!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/data36.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e\",\"name\":\"Tomi 
Mester\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g\",\"caption\":\"Tomi Mester\"},\"description\":\"Tomi Mester is a data analyst and researcher. He\u2019s the author of the Data36 blog where he gives a sneak peek into online data analysts\u2019 best practices. He writes posts and tutorials on a weekly basis about data science, AB-testing, online research and data coding. Tomi is a guest blogger on Crazyegg, Hackernoon and Tech-In-Asia. You can meet him as a presenter on conferences like: Global E-commerce Summit, TEDx, Business Intelligence Forum, etc...\",\"sameAs\":[\"https:\/\/data36.com\",\"https:\/\/www.facebook.com\/data36\"],\"url\":\"https:\/\/data36.com\/author\/mestitomi\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Collection (Getting the Actual Raw Data) -- This is How it Works!","description":"The first step for every data science project is data collection, that is, getting the actual raw data. Do it for yourself or using a 3rd party tool\u2026","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data36.com\/data-collection\/","og_locale":"en_US","og_type":"article","og_title":"Data Collection (Getting the Actual Raw Data) -- This is How it Works!","og_description":"The first step for every data science project is data collection, that is, getting the actual raw data. 
Do it for yourself or using a 3rd party tool\u2026","og_url":"https:\/\/data36.com\/data-collection\/","og_site_name":"Data36","article_publisher":"https:\/\/www.facebook.com\/data36\/","article_author":"https:\/\/www.facebook.com\/data36","article_published_time":"2016-11-13T22:49:43+00:00","article_modified_time":"2019-04-25T22:28:44+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data_collection-1.png","type":"image\/png"}],"author":"Tomi Mester","twitter_card":"summary_large_image","twitter_title":"Data Collection (Getting the Actual Raw Data) -- This is How it Works!","twitter_description":"The first step for every data science project is data collection, that is, getting the actual raw data. Do it for yourself or using a 3rd party tool\u2026","twitter_image":"http:\/\/data36.com\/wp-content\/uploads\/2016\/11\/data_collection-1.png","twitter_creator":"@data36_com","twitter_site":"@data36_com","twitter_misc":{"Written by":"Tomi Mester","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/data36.com\/data-collection\/","url":"https:\/\/data36.com\/data-collection\/","name":"Data Collection (Getting the Actual Raw Data) -- This is How it Works!","isPartOf":{"@id":"https:\/\/data36.com\/#website"},"datePublished":"2016-11-13T22:49:43+00:00","dateModified":"2019-04-25T22:28:44+00:00","author":{"@id":"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e"},"description":"The first step for every data science project is data collection, that is, getting the actual raw data. 
Do it for yourself or using a 3rd party tool\u2026","breadcrumb":{"@id":"https:\/\/data36.com\/data-collection\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data36.com\/data-collection\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/data36.com\/data-collection\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/data36.com\/"},{"@type":"ListItem","position":2,"name":"How data collection works"}]},{"@type":"WebSite","@id":"https:\/\/data36.com\/#website","url":"https:\/\/data36.com\/","name":"Data36","description":"Learn Data Science the Hard Way!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data36.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e","name":"Tomi Mester","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data36.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g","caption":"Tomi Mester"},"description":"Tomi Mester is a data analyst and researcher. He\u2019s the author of the Data36 blog where he gives a sneak peek into online data analysts\u2019 best practices. He writes posts and tutorials on a weekly basis about data science, AB-testing, online research and data coding. Tomi is a guest blogger on Crazyegg, Hackernoon and Tech-In-Asia. 
You can meet him as a presenter on conferences like: Global E-commerce Summit, TEDx, Business Intelligence Forum, etc...","sameAs":["https:\/\/data36.com","https:\/\/www.facebook.com\/data36"],"url":"https:\/\/data36.com\/author\/mestitomi\/"}]}},"_links":{"self":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts\/374","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/comments?post=374"}],"version-history":[{"count":0,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts\/374\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/media\/1421"}],"wp:attachment":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/media?parent=374"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/categories?post=374"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/tags?post=374"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}