{"id":1160,"date":"2017-07-10T22:07:45","date_gmt":"2017-07-10T22:07:45","guid":{"rendered":"https:\/\/data36.com\/?p=1160"},"modified":"2024-02-14T19:02:10","modified_gmt":"2024-02-14T19:02:10","slug":"sql-join-data-analysis-tutorial-ep5","status":"publish","type":"post","link":"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/","title":{"rendered":"SQL for Data Analysis &#8211; Tutorial for Beginners &#8211; ep5"},"content":{"rendered":"\n<p>Combining tables is a key component in data science and analytics. And SQL is really good at it! Of course each case is different, but I run into data science tasks all the time in which joining two multi-million-row data tables took <em>~20-30 minutes<\/em> in <a href=\"https:\/\/data36.com\/learn-python-for-data-science-from-scratch\/\">Python<\/a> or <a href=\"https:\/\/data36.com\/learn-data-analytics-bash-scratch\/\">bash<\/a>&#8230; <strong>and not more than ~10-20 seconds in SQL.<\/strong> I\u2019m not saying I couldn\u2019t have done those tasks in Python or bash at all\u2026 <strong>But for sure SQL JOIN was the easiest and fastest solution!<\/strong><\/p>\n\n\n\n<p>So let\u2019s learn how to use SQL JOIN to&nbsp;step up your analytics projects!<\/p>\n\n\n\n<p><em>Note: to get the most out of this article, you should not just read it, but actually do the coding part with me! So if you are on the phone, I suggest saving this article and continuing on your computer!<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">But before we start\u2026<\/h2>\n\n\n\n<p>\u2026 I highly recommend&nbsp;going through these articles first \u2013 if you haven\u2019t done so yet:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Set up your own data server to practice: <a href=\"https:\/\/data36.com\/data-coding-101-install-python-sql-r-bash\/\">How to install&nbsp;Python, SQL, R and Bash (for non-devs)<\/a><\/li>\n\n\n\n<li>Install SQL Workbench to manage your SQL stuff better: <a href=\"https:\/\/data36.com\/install-sql-workbench-postgresql\/\">How to install SQL Workbench for postgreSQL<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/data36.com\/sql-for-data-analysis-tutorial-beginners\/\">SQL for Data Analysis ep1 (SQL basics)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/data36.com\/sql-where-clause-tutorial-beginners-ep2\/\">SQL for Data Analysis ep2 (SQL WHERE clause)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/data36.com\/sql-functions-beginners-tutorial-ep3\/\">SQL for Data Analysis ep3 (SQL functions and GROUP BY)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/data36.com\/sql-best-practices-data-analysts\/\">SQL for Data Analysis ep4 (SQL best practices)<\/a><\/li>\n<\/ol>\n\n\n\t\t<div data-elementor-type=\"section\" data-elementor-id=\"6935\" class=\"elementor elementor-6935\" data-elementor-post-type=\"elementor_library\">\n\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6d6c325e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6d6c325e\" data-element_type=\"section\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;}\">\n\t\t\t\t\t\t\t<div class=\"elementor-background-overlay\"><\/div>\n\t\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ff17a5b\" data-id=\"ff17a5b\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a8e0a54 elementor-widget elementor-widget-heading\" data-id=\"a8e0a54\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">How to Become a Data Scientist<br>(free 50-minute video course by Tomi Mester)<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-470bc82d elementor-widget elementor-widget-text-editor\" data-id=\"470bc82d\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><span style=\"color: var( --e-global-color-text ); font-family: 'PT Serif'; font-size: 1em; word-spacing: var( --e-global-typography-text-word-spacing );\">Just subscribe to the Data36 Newsletter here (it&#8217;s free)!<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4daf807 elementor-widget elementor-widget-html\" data-id=\"4daf807\" data-element_type=\"widget\" data-widget_type=\"html.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<style>\n.ml-form-checkboxRow label.checkbox {\n    margin-top: 20px;\n  display: flex;\n  align-items: flex-start; \/* Aligns items at the start (top) *\/\n}\n.ml-form-checkboxRow label.checkbox input[type=\"checkbox\"] {\n  margin-top: 5px;\n  margin-right: 10px; \/* Adjusts space between the checkbox and the text, ensuring the checkbox aligns at the start of the text *\/\n}\n\n\n.ml-form-embedSubmit {\n  text-align: center; \/* Centers inline or inline-block elements horizontally *\/\n  margin-top: 30px;\n  margin-bottom: 40px;\n}\n<\/style>\n\n<div id=\"mlb2-13389319\" class=\"ml-form-embedContainer ml-subscribe-form ml-subscribe-form-13389319\">\n      <div class=\"ml-form-align-center \">\n        <div class=\"ml-form-embedWrapper embedForm\">\n\n          \n          \n\n          <div class=\"ml-form-embedBody ml-form-embedBodyDefault row-form\">\n\n            <div class=\"ml-form-embedContent\" style=\"margin-bottom: 0px; \">\n              \n            <\/div>\n\n            <form class=\"ml-block-form\" action=\"https:\/\/assets.mailerlite.com\/jsonp\/876500\/forms\/116418002614748567\/subscribe\" data-code=\"\" method=\"post\" target=\"_blank\">\n              <div class=\"ml-form-formContent\">\n                \n\n                  \n                  <div class=\"ml-form-fieldRow ml-last-item\">\n                    <div class=\"ml-field-group ml-field-email ml-validate-email ml-validate-required\">\n\n                      \n\n\n                      <!-- input -->\n                      <input aria-label=\"email\" aria-required=\"true\" type=\"email\" class=\"form-control\" data-inputmask=\"\" name=\"fields[email]\" placeholder=\"Email\" autocomplete=\"email\">\n                      <!-- \/input -->\n\n                      <!-- textarea -->\n                      \n                      <!-- \/textarea -->\n\n                      <!-- select -->\n                      \n                      <!-- \/select -->\n\n                      <!-- checkboxes -->\n            \n            <!-- \/checkboxes -->\n\n                      <!-- radio -->\n                      \n                      <!-- \/radio -->\n\n                      <!-- countries -->\n                      \n                      <!-- \/countries -->\n\n\n\n\n\n                    <\/div>\n                  <\/div>\n                \n              <\/div>\n\n              \n\n              <!-- Privacy policy -->\n              \n              <!-- \/Privacy policy -->\n\n              \n\n              \n\n              <div class=\"ml-form-checkboxRow ml-validate-required\">\n                \n                  <label class=\"checkbox\">\n                    <input type=\"checkbox\">\n                    <div class=\"label-description\">\n                      <p>I accept Data36's <a href=\"https:\/\/data36.com\/privacy-policy\/\">Privacy Policy<\/a>. (No spam. Only useful data science related content. When you subscribe, I\u2019ll keep you updated with a couple emails per week. You'll get articles, courses, cheatsheets, tutorials and many cool stuff.)<\/p>\n                    <\/div>\n                  <\/label>\n                \n              <\/div>\n\n\n\n\n\n\n              \n              <input type=\"hidden\" name=\"ml-submit\" value=\"1\">\n\n              <div class=\"ml-form-embedSubmit\">\n                \n                  <button type=\"submit\" class=\"primary\">Get Access Now!<\/button>\n                \n                <button disabled=\"disabled\" style=\"display: none;\" type=\"button\" class=\"loading\">\n                  <div class=\"ml-form-embedSubmitLoad\"><\/div>\n                  <span class=\"sr-only\">Loading...<\/span>\n                <\/button>\n              <\/div>\n\n              \n              <input type=\"hidden\" name=\"anticsrf\" value=\"true\">\n            <\/form>\n          <\/div>\n\n          <div class=\"ml-form-successBody row-success\" style=\"display: none\">\n\n            <div class=\"ml-form-successContent\">\n              \n                <h4>Thank you!<\/h4>\n                \n                  <p>You have successfully joined our subscriber list.<\/p>\n                \n              \n            <\/div>\n\n          <\/div>\n        <\/div>\n      <\/div>\n    <\/div>\n\n  \n\n  \n  <script>\n    function ml_webform_success_13389319() {\n    try {\n        window.top.location.href = 'https:\/\/data36.com\/data36-inner-circle-subscription\/';\n      } catch (e) {\n        window.location.href = 'https:\/\/data36.com\/data36-inner-circle-subscription\/';\n      }\n    }\n      <\/script>\n  \n  \n  \n      <script src=\"https:\/\/groot.mailerlite.com\/js\/w\/webforms.min.js?v2d8fb22bb5b3677f161552cd9e774127\" type=\"text\/javascript\"><\/script>\n        <script>\n            fetch(\"https:\/\/assets.mailerlite.com\/jsonp\/876500\/forms\/116418002614748567\/takel\")\n        <\/script>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t\n\n\n\n<h2 class=\"wp-block-heading\">What is SQL JOIN?<\/h2>\n\n\n\n<p>What does it mean&nbsp;to <code>JOIN<\/code> two tables? Here\u2019s a simple example.<\/p>\n\n\n\n<p>Let\u2019s say we have these two datasets:<\/p>\n\n\n\n<p><span style=\"text-decoration: underline;\"><strong>TABLE #1<\/strong><\/span><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>INVENTION<\/strong><\/th><th><strong>INVENTOR<\/strong><\/th><\/tr><\/thead><tbody><tr><td><span style=\"font-weight: 400;\">Rubik\u2019s Cube<\/span><\/td><td><span style=\"font-weight: 400;\">Erno Rubik<\/span><\/td><\/tr><tr><td>Carburetor<\/td><td><span style=\"font-weight: 400;\">Janos Csonka<\/span><\/td><\/tr><tr><td>Carburetor<\/td><td><span style=\"font-weight: 400;\">Donat Banki<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Ballpen<\/span><\/td><td><span style=\"font-weight: 400;\">Laszlo Biro<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Dynamo<\/span><\/td><td><span style=\"font-weight: 400;\">Anyos Jedlik<\/span><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><span style=\"text-decoration: underline;\"><strong>TABLE #2<\/strong><\/span><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><strong>INVENTOR<\/strong><\/th><th><strong>PROFESSION<\/strong><\/th><\/tr><\/thead><tbody><tr><td><span style=\"font-weight: 400;\">Erno Rubik<\/span><\/td><td><span style=\"font-weight: 400;\">architect<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Janos Csonka<\/span><\/td><td><span style=\"font-weight: 400;\">engineer<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Donat Banki<\/span><\/td><td><span style=\"font-weight: 400;\">engineer<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Laszlo Biro<\/span><\/td><td><span style=\"font-weight: 400;\">journalist<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Anyos Jedlik<\/span><\/td><td><span style=\"font-weight: 400;\">engineer<\/span><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>In TABLE #1 we have inventors and inventions and in TABLE #2 we have the same inventors and their original professions. Let\u2019s say we want to see which inventions was invented by an inventor with what original profession. To query that, we have to merge the two tables, based on the column that shows up in both: <strong><em>inventor<\/em><\/strong>.<\/p>\n\n\n\n<p>And this is what SQL JOIN is good for. After joining the two tables, this is what we will get:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><strong><span style=\"text-decoration: underline;\">INVENTION<\/span><\/strong><\/th><th><strong><span style=\"text-decoration: underline;\">INVENTOR<\/span><\/strong><\/th><th><strong><span style=\"text-decoration: underline;\">PROFESSION<\/span><\/strong><\/th><\/tr><\/thead><tbody><tr><td><span style=\"font-weight: 400;\">Rubik\u2019s Cube<\/span><\/td><td><span style=\"font-weight: 400;\">Erno Rubik<\/span><\/td><td><span style=\"font-weight: 400;\">architect<\/span><\/td><\/tr><tr><td>Carburetor<\/td><td><span style=\"font-weight: 400;\">Janos Csonka<\/span><\/td><td><span style=\"font-weight: 400;\">engineer<\/span><\/td><\/tr><tr><td>Carburetor<\/td><td><span style=\"font-weight: 400;\">Donat Banki<\/span><\/td><td><span style=\"font-weight: 400;\">engineer<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Ballpen<\/span><\/td><td><span style=\"font-weight: 400;\">Laszlo Biro<\/span><\/td><td><span style=\"font-weight: 400;\">journalist<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">Dynamo<\/span><\/td><td><span style=\"font-weight: 400;\">Anyos Jedlik<\/span><\/td><td><span style=\"font-weight: 400;\">engineer<\/span><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now we can finally see that the Rubik\u2019s Cube was invented by an architect!<\/p>\n\n\n\n<p>(<em>Extra (not SQL-related) task: find out what these 5 inventions have in common!<\/em>)<\/p>\n\n\n\n<p>Perfect, but what does this look like in practice?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Get some data!<\/h2>\n\n\n\n<p>To put SQL JOIN into practice, we will get some data first.<\/p>\n\n\n\n<p>Open up SQL Workbench! We are gonna <a href=\"https:\/\/data36.com\/create-table-sql\/\">create<\/a> two new temporary data tables: <code>playlist<\/code> and <code>toplist<\/code>. Run these two queries in SQL Workbench one by one.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE playlist (\n  artist VARCHAR,\n  song VARCHAR);<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE toplist (\n  tophit VARCHAR,\n  play INT);<\/pre>\n\n\n\n<p>The new SQL tables have been created. Now, load some data into them!<\/p>\n\n\n\n<p>Run these two queries one by one in your SQL Workbench:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT INTO playlist (artist,song) VALUES\n  ('ABBA','Dancing Queen'),\n  ('ABBA','Gimme!'),\n  ('ABBA','The Winner Takes It All'),\n  ('ABBA','Mamma Mia'),\n  ('ABBA','Take a Chance On Me'),\n  ('Tove Lo','Cool Girl'),\n  ('Tove Lo','Stay High'),\n  ('Tove Lo','Talking Body'),\n  ('Tove Lo','Habits'),\n  ('Tove Lo','True Disaster'),\n  ('Avicii','Wake Me Up'),\n  ('Avicii','Waiting For Love'),\n  ('Avicii','The Nights'),\n  ('Avicii','Hey Brother'),\n  ('Avicii','Levels'),\n  ('Zara Larsson','Lush Life');<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT INTO toplist (tophit,play) VALUES\n  ('Dancing Queen',95145796),\n  ('Gimme!',32785696),\n  ('The Winner Takes It All',34458597),\n  ('Mamma Mia',47901900),\n  ('Take a Chance On Me',30654536),\n  ('Cool Girl',227055115),\n  ('Stay High',263901766),\n  ('Talking Body',272334711),\n  ('Habits',214685822),\n  ('True Disaster',27028538),\n  ('Wake Me Up',520259542),\n  ('Waiting For Love',399906192),\n  ('The Nights',278063930),\n  ('Hey Brother',321270703),\n  ('Levels',206004691),\n  ('Despacito',519689490);<\/pre>\n\n\n\n<p><em>Note: This is actually real data that I pulled from Spotify when I wrote this article.<\/em><\/p>\n\n\n\n<p>Let\u2019s see whether everything works fine&#8230; Query the tables with:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  artist,\n  song\nFROM\n  playlist;<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"636\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/playlist-demo-1024x636.png\" alt=\"playlist demo\" class=\"wp-image-1161\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/playlist-demo-1024x636.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/playlist-demo-300x186.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/playlist-demo-768x477.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/playlist-demo-973x604.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/playlist-demo-508x316.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  tophit,\n  play\nFROM\n  toplist;<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" width=\"1024\" height=\"632\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/toplist-demo-1024x632.png\" alt=\"toplist demo\" class=\"wp-image-1162\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/toplist-demo-1024x632.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/toplist-demo-300x185.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/toplist-demo-768x474.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/toplist-demo-973x601.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/toplist-demo-508x314.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><em>Note: Why did I use the column names and why didn\u2019t I use&nbsp;<code>SELECT *<\/code> instead? Find out from my <a href=\"https:\/\/data36.com\/sql-best-practices-data-analysts\/\">SQL best practices<\/a> article!<\/em><\/p>\n\n\n\n<p>Ah, cool, some nice songs from some nice artists, that people have played a lot already&#8230;<\/p>\n\n\n\n<p><em><strong>An important note: when you close SQL Workbench, these temporary data tables will disappear. If you do so, you have to <code>CREATE<\/code> them again and then <code>INSERT<\/code> data into them!<\/strong><\/em> <strong><em>If you want to keep them<\/em><\/strong>, <strong><em>just type and run this command in SQL Workbench:<\/em><\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">COMMIT;<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">SQL JOIN &#8211; the basics<\/h2>\n\n\n\n<p>Let\u2019s perform our first SQL JOIN!<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT *\nFROM toplist\nJOIN playlist\nON tophit = song;<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" width=\"1024\" height=\"636\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-simple-way-1024x636.png\" alt=\"SQL JOIN simple way\" class=\"wp-image-1163\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-simple-way-1024x636.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-simple-way-300x186.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-simple-way-768x477.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-simple-way-973x604.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-simple-way-508x315.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>First of all, take a look at the results!<\/p>\n\n\n\n<p>We have four columns, two plus two from the two tables. Makes sense, right?<\/p>\n\n\n\n<p>The values in the <code>song<\/code> and <code>tophit<\/code> columns are the same. In fact, that was the column that we have joined on, so this is not a big surprise. But hey, we have just merged two tables!<\/p>\n\n\n\n<p>Let\u2019s see what has happened code-wise:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>SELECT *<\/code> &#8211;\u00bb We want to select every column&#8230;<\/li>\n\n\n\n<li><code>FROM toplist<\/code> &#8211;\u00bb &#8230;from the <code>toplist<\/code> data table&#8230;<\/li>\n\n\n\n<li><code><strong>JOIN playlist<\/strong><\/code> &#8211;\u00bb &#8230;but we also want to merge the <code>playlist<\/code> table to the the <code>toplist<\/code> table&#8230;<\/li>\n\n\n\n<li><code><strong>ON tophit = song;<\/strong><\/code> &#8211;\u00bb &#8230;and we want to connect those lines, where the value of the <code>tophit<\/code> column is matching with the value of the <code>song<\/code> column.<\/li>\n<\/ul>\n\n\n\n<p>Not that complicated&#8230; Yet. \ud83d\ude42<\/p>\n\n\n\n<p>Now there is one important question here&#8230;<\/p>\n\n\n\n<p>If you haven\u2019t&nbsp;realized it yet: there is one song (<em><code>Zara Larsson<\/code>: <code>Lush Life<\/code><\/em>) that exists in the <code>playlist<\/code> table, but not in the <code>toplist<\/code> table. And there is another one (<em><code>Despacito<\/code>, <code>519689490<\/code><\/em>) that exists in the <code>toplist<\/code> table, but not in the <code>playlist<\/code> table. I did this on purpose, because I wanted to demonstrate what happens when you have to deal with partially missing data, which actually happens quite often in real-life data science projects.<\/p>\n\n\n\n<p>To understand that, take a look at this Venn-diagram. It shows what happens with the data after <code>JOIN<\/code>-ing the two SQL tables.<\/p>\n\n\n\n<p>When you use the <em>default<\/em> <code>JOIN<\/code>, your query will manage your data like this:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"682\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-1024x682.png\" alt=\"SQL JOIN INNER\" class=\"wp-image-1164\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-1024x682.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-973x648.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-508x338.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN.png 1636w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">INNER SQL JOIN<\/figcaption><\/figure>\n\n\n\n<p>As you can see: <strong>the default SQL JOIN keeps only the data that occurs in both tables.<\/strong>&nbsp;So <code>Despacito<\/code> and <code>Zara Larsson<\/code> were removed.<\/p>\n\n\n\n<p>This method is called <code>INNER JOIN<\/code> but since this is what we use most often, in SQL it has been implemented as the default behaviour of the <code>JOIN<\/code> keyword.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">SQL JOIN &#8211; same query, better syntax<\/h2>\n\n\n\n<p>Remember that in the <a href=\"https:\/\/data36.com\/sql-best-practices-data-analysts\/\">SQL best practices article<\/a>&nbsp;I strongly recommended not to use <code>*<\/code> in a <code>SELECT<\/code> statement&#8230; It&#8217;s better to use the actual names of the columns instead. And it\u2019s even more important when you use <code>JOIN<\/code>s in your queries! <\/p>\n\n\n\n<p>Instead of this query&#8230;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT *\nFROM toplist\nJOIN playlist\nON tophit = song;<\/pre>\n\n\n\n<p>&#8230;I recommend&nbsp;using this one:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n   <strong>tophit,<\/strong>\n  <strong> play,<\/strong>\n  <strong> artist,<\/strong>\n  <strong> song<\/strong>\nFROM toplist\nJOIN playlist\nON tophit = song;<\/pre>\n\n\n\n<p>And to be honest, this is still not the most bulletproof syntax.<\/p>\n\n\n\n<p>The problem is that when you look at this code, you can\u2019t instantly see what columns belong to which table. Thus I like to add the table names also to the column names. It works with dot notation and it&#8217;s really simple.<\/p>\n\n\n\n<p>For instance:<\/p>\n\n\n\n<p>If the <code>tophit<\/code> column is in the <code>toplist<\/code> table, then instead of <code>tophit<\/code>, I\u2019ll write: <code>toplist.tophit<\/code>.<\/p>\n\n\n\n<p>If you apply this to all column names in the above query:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  <strong>toplist.<\/strong>tophit,\n  <strong>toplist.<\/strong>play,\n  <strong>playlist.<\/strong>artist,\n  <strong>playlist.<\/strong>song\nFROM toplist\nJOIN playlist\nON <strong>toplist.<\/strong>tophit = <strong>playlist.<\/strong>song;\n<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"632\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-1024x632.png\" alt=\"SQL JOIN formatting\" class=\"wp-image-1165\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-1024x632.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-300x185.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-768x474.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-973x601.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-508x314.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Much better. But one last small tweak! The <em><code>playlist.song<\/code><\/em> and the <em><code>toplist.tophit<\/code><\/em> columns are actually the same. We don\u2019t need both of them\u2026 so I&#8217;ll remove one:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  toplist.tophit,\n  toplist.play,\n  playlist.artist\nFROM toplist\nJOIN playlist\nON toplist.tophit = playlist.song;<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"635\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-2-1024x635.png\" alt=\"SQL JOIN formatting 2\" class=\"wp-image-1166\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-2-1024x635.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-2-300x186.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-2-768x476.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-2-973x604.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-JOIN-formatting-2-508x315.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Now this is finally a pretty decent SQL JOIN.<\/p>\n\n\n\t\t<div data-elementor-type=\"section\" data-elementor-id=\"7012\" class=\"elementor elementor-7012\" data-elementor-post-type=\"elementor_library\">\n\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-259c3993 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"259c3993\" data-element_type=\"section\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;}\">\n\t\t\t\t\t\t\t<div class=\"elementor-background-overlay\"><\/div>\n\t\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-679a79f1\" data-id=\"679a79f1\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e5ca0c1 elementor-widget elementor-widget-heading\" data-id=\"e5ca0c1\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">The Junior Data Scientist's First Month<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f5bc28c elementor-widget elementor-widget-text-editor\" data-id=\"f5bc28c\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><span style=\"color: var( --e-global-color-text ); font-family: 'PT Serif'; font-size: 1em; word-spacing: var( --e-global-typography-text-word-spacing );\">A 100% practical online course. A 6-week simulation of being a junior data scientist at a true-to-life startup.<\/span><\/p><p><i>&#8220;Solving real problems, getting real experience &#8211; just like in a real data science job.&#8221;<\/i><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6da23e4 elementor-align-center elementor-widget elementor-widget-button\" data-id=\"6da23e4\" data-element_type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-md\" href=\"https:\/\/data36.com\/the-junior-data-scientists-first-month-online-course\/\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Learn more...<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t\n\n\n\n<h2 class=\"wp-block-heading\">FULL JOIN<\/h2>\n\n\n\n<p>But you might ask: <em>\u201dWhat should we do to keep Despacito and Zara Larsson in the data set, even though they don\u2019t show up in both data tables?\u201d<\/em> And I\u2019d answer: great question!<\/p>\n\n\n\n<p>Get back to our Venn-diagram&#8230; This time, this is what we want to achieve:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"696\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-1024x696.png\" alt=\"SQL FULL JOIN\" class=\"wp-image-1167\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-1024x696.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-300x204.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-768x522.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-973x661.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-508x345.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN.png 1604w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">SQL FULL JOIN<\/figcaption><\/figure>\n\n\n\n<p>This version of the SQL JOINs is called <em><strong><code>FULL JOIN<\/code><\/strong><\/em><strong>,<\/strong> and to execute it, you should change only one tiny thing in our previous query: add the <code>FULL<\/code> keyword before the <code>JOIN<\/code> keyword.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  toplist.tophit,\n  toplist.play,\n  playlist.artist,\n  playlist.song\nFROM toplist\n<strong>FULL<\/strong> JOIN playlist\nON toplist.tophit = playlist.song;<\/pre>\n\n\n\n<p>Boom! The magic has happened!<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"634\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-example-1024x634.png\" alt=\"SQL FULL JOIN example\" class=\"wp-image-1168\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-example-1024x634.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-example-300x186.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-example-768x476.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-example-973x603.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-FULL-JOIN-example-508x315.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>As you can see, <code>Zara Larsson<\/code> and <code>Despacito<\/code> are there, but those fields that don\u2019t have data, stay empty. These empty fields are called <code>NULL<\/code> values in SQL. I have already mentioned briefly <code>NULL<\/code> and its importance (<a href=\"https:\/\/data36.com\/sql-functions-beginners-tutorial-ep3\/\">in the SQL functions article<\/a>) but I\u2019ll get back to that&nbsp;in more detail later!<\/p>\n\n\n\n<p>However, now you know: the fact that a given value doesn\u2019t exist in both data tables, doesn\u2019t mean that you can\u2019t <code>JOIN<\/code> them. You can with a <code>FULL JOIN<\/code>, but if you do so, the cells of the missing values will stay empty.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">LEFT JOIN AND RIGHT JOIN<\/h2>\n\n\n\n<p>And this brings us right to the next question.<\/p>\n\n\n\n<p>What if we want to apply the concept of <code>FULL JOIN<\/code> &#8212; but keeping the missing values only from one of the SQL tables?<\/p>\n\n\n\n<p>Back to the Venn-diagram! Either this\u2026<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"668\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-1024x668.png\" alt=\"SQL LEFT JOIN\" class=\"wp-image-1169\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-1024x668.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-300x196.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-768x501.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-973x635.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-508x331.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN.png 1616w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">LEFT JOIN<\/figcaption><\/figure>\n\n\n\n<p>\u2026 or this:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"693\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-RIGHT-JOIN-1024x693.png\" alt=\"SQL RIGHT JOIN\" class=\"wp-image-1170\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-RIGHT-JOIN-1024x693.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-RIGHT-JOIN-300x203.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-RIGHT-JOIN-768x520.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-RIGHT-JOIN-973x659.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-RIGHT-JOIN-508x344.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-RIGHT-JOIN.png 1580w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">RIGHT JOIN<\/figcaption><\/figure>\n\n\n\n<p>Before I reveal the the &#8220;big secret,&#8221; I\u2019d like to emphasize that <strong>this problem occurs quite often in real data projects too.<\/strong><\/p>\n\n\n\n<p>E.g. let\u2019s say, you are running an A\/B test\u00a0in which\u00a0you have two data tables.<\/p>\n\n\n\n<p><span style=\"text-decoration: underline;\"><strong>TABLE #1<\/strong><\/span><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><span style=\"text-decoration: underline;\">USER<\/span><\/th><th><span style=\"text-decoration: underline;\">BUCKET<\/span><\/th><\/tr><\/thead><tbody><tr><td><span style=\"font-weight: 400;\">user1<\/span><\/td><td><span style=\"font-weight: 400;\">A<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user2<\/span><\/td><td><span style=\"font-weight: 400;\">B<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user3<\/span><\/td><td><span style=\"font-weight: 400;\">A<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user4<\/span><\/td><td><span style=\"font-weight: 400;\">B<\/span><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><span style=\"text-decoration: underline;\"><strong>TABLE #2<\/strong><\/span><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><span style=\"text-decoration: underline;\">USER<\/span><\/th><th><span style=\"text-decoration: underline;\">TIMESTAMP<\/span><\/th><\/tr><\/thead><tbody><tr><td><span style=\"font-weight: 400;\">user1<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">19:00:00.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user2<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">19:32:11.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user4<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">19:44:54.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user5<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">19:48:23.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user1<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">19:59:01.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user5<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">20:01:10.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user2<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">20:04:32.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user4<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">20:09:54.000<\/span><\/td><\/tr><tr><td><span style=\"font-weight: 400;\">user1<\/span><\/td><td>2018-02-01 <span style=\"font-weight: 400;\">20:12:32.000<\/span><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>You want to include only 80% of your audience (e.g. <code>user1<\/code>, <code>user2<\/code>, <code>user3<\/code>, <code>user4<\/code> &#8212; but <strong><em>not<\/em><\/strong> <code>user5<\/code>) in your A\/B test, and you put these selected users into a table with the info about which bucket (<code>A<\/code> or <code>B<\/code>) they belong to.<\/p>\n\n\n\n<p>You have another table that works as a usage log (see more here: <a href=\"https:\/\/data36.com\/data-collection\/\">data collection<\/a>) and collects the feature usage for <em>all<\/em> users (<code>user1<\/code>, <code>user2<\/code>, <code>user3<\/code>, <code>user4<\/code> and <strong><em>also<\/em><\/strong> <code>user5<\/code>).<\/p>\n\n\n\n<p>To evaluate your A\/B test, you have to combine the two tables. You want to keep all users from TABLE #1 (even if they didn\u2019t use the feature, ergo didn\u2019t show up in the other table at all &#8212; like <code>user3<\/code>), but you don\u2019t want to keep the users who showed up only in the second table (used the feature, but not part of the A\/B test &#8212; like <code>user5<\/code>).<\/p>\n\n\n\n<p>What do you do? <\/p>\n\n\n\n<p>This:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"703\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-2-1024x703.png\" alt=\"SQL LEFT JOIN 2\" class=\"wp-image-1171\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-2-1024x703.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-2-300x206.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-2-768x527.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-2-973x668.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-2-508x349.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-2.png 1582w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">another LEFT JOIN example<\/figcaption><\/figure>\n\n\n\n<p>This is called <code>LEFT JOIN<\/code>. And if you want to perform a <code>LEFT JOIN<\/code>, you simply have to add a <code>LEFT<\/code> keyword to your <code>JOIN<\/code> keyword.<\/p>\n\n\n\n<p>Getting back to our <code>playlist<\/code> + <code>toplist<\/code> data sets, try something like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  toplist.tophit,\n  toplist.play,\n  playlist.artist\nFROM toplist\n<strong>LEFT<\/strong> JOIN playlist\nON toplist.tophit = playlist.song;<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"630\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-example-2-1024x630.png\" alt=\"SQL LEFT JOIN example 2\" class=\"wp-image-1172\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-example-2-1024x630.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-example-2-300x185.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-example-2-768x472.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-example-2-973x599.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-LEFT-JOIN-example-2-508x313.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>It keeps every line from the <code>toplist<\/code> table (that&#8217;s the <em>LEFT<\/em> table) even if it doesn\u2019t exist in the <code>playlist<\/code> table (which is the <em>RIGHT<\/em> table). But it keeps only those lines from the <code>playlist<\/code> table that exist in the <code>toplist<\/code> table too. Good!<\/p>\n\n\n\n<p>If you want to execute the opposite, you should do a <code>RIGHT JOIN<\/code> instead:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  toplist.tophit,\n  toplist.play,\n  playlist.artist\nFROM toplist\n<strong>RIGHT<\/strong> JOIN playlist\nON toplist.tophit = playlist.song;<\/pre>\n\n\n\n<p>Awesome!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Test yourself #1<\/h2>\n\n\n\n<p>You now know everything you have to know at this point about SQL JOIN! But the best way of learning is practicing! Here\u2019s an analysis, that you should perform on our <code>playlist<\/code> and <code>toplist<\/code> tables:<\/p>\n\n\n\n<p><strong>Given the information in the <em>playlist<\/em> and <em>toplist<\/em> data tables:<\/strong><br><strong>How many plays does each artist have in total?<\/strong><br>.<br>.<br>.<br>Here\u2019s my solution:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  playlist.artist,\n  SUM(toplist.play)\nFROM playlist\nFULL JOIN toplist\n  ON playlist.song = toplist.tophit\nGROUP BY artist;<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"402\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-1-1024x402.png\" alt=\"SQL test yourself 1\" class=\"wp-image-1173\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-1-1024x402.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-1-300x118.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-1-768x302.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-1-973x382.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-1-508x199.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-1.png 1966w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>And a short explanation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>SELECT<\/code> &#8211;\u00bb We select&#8230;<\/li>\n\n\n\n<li><code>playlist.artist,<\/code> &#8211;\u00bb the artists from the <code>playlist<\/code> table&#8230;<\/li>\n\n\n\n<li><code>SUM(toplist.play)<\/code> &#8211;\u00bb and we sum the number of plays from the <code>toplist<\/code> table\u2026<\/li>\n\n\n\n<li><code>FROM playlist<\/code> &#8211;\u00bb we actually specify that we will use the <code>playlist<\/code> table&#8230;<\/li>\n\n\n\n<li><code>FULL JOIN toplist<\/code> &#8211;\u00bb and we also want to merge the <code>toplist<\/code> table (using the empty fields too)&#8230;<\/li>\n\n\n\n<li><code>ON playlist.song = toplist.tophit<\/code> &#8211;\u00bb the <code>JOIN<\/code> matches those lines where the <code>song<\/code> and <code>tophit<\/code> columns have the same data&#8230;<\/li>\n\n\n\n<li><code>GROUP BY artist;<\/code> &#8211;\u00bb and <code>GROUP BY<\/code> refers to the <code>SUM<\/code> function, above &#8211; we want to see the sums by artist.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Test yourself #2<\/h2>\n\n\n\n<p>One more test, before I let you go!<br><strong>Print the top 5 ABBA songs ordered by number of plays!<\/strong><br>.<br>.<br>.<br>And the solution is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n  playlist.artist,\n  playlist.song,\n  toplist.play\nFROM playlist\nFULL JOIN toplist\n  ON playlist.song = toplist.tophit\nWHERE playlist.artist = 'ABBA'\nORDER BY toplist.play DESC;\n<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"456\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-2-1024x456.png\" alt=\"SQL test yourself 2\" class=\"wp-image-1174\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-2-1024x456.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-2-300x134.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-2-768x342.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-2-973x433.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-2-508x226.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/SQL-test-yourself-2.png 1968w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Well, nothing new here. \ud83d\ude09 (But if you need an explanation, just let me know and I\u2019ll give you one!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>SQL <code>JOIN<\/code> is really important and you will use it quite often as a data analyst or scientist.&nbsp;This article has given you a solid base of knowledge. Further down the road you will meet even more advanced applications, but using what you have learned from this article \u2013 combined with what you have learned from <a href=\"https:\/\/data36.com\/learn-sql-for-data-analysis-from-scratch\/\">the previous ones<\/a> \u2013 will cover most cases for now.<\/p>\n\n\n\n<p>And there is only one more article left from my <em><a href=\"https:\/\/data36.com\/learn-sql-data-analysis-scratch-redirect\/\">SQL for Data Analysis &#8211; Tutorial for Beginners<\/a><\/em> series. In that one, I\u2019ll introduce some intermediate SQL concepts:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>how to write a query-in-a-query &#8212; aka. <em>subquery<\/em><\/li>\n\n\n\n<li>SQL <code>HAVING<\/code> and<\/li>\n\n\n\n<li>SQL <code>CASE<\/code>!<\/li>\n<\/ul>\n\n\n\n<p><strong>Continue here: <a href=\"https:\/\/data36.com\/sql-data-analysis-advanced-tutorial-ep6\/\">Advanced SQL concepts<\/a>.<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you want to learn more about how to become a data scientist, take my 50-minute video course: <a href=\"https:\/\/data36.com\/how-to-become-a-data-scientist\/\">How to Become a Data Scientist.<\/a>&nbsp;(It&#8217;s&nbsp;free!)<\/li>\n\n\n\n<li>Also check out my 6-week online course: <a href=\"https:\/\/data36.com\/jds\/\">The Junior Data Scientist\u2019s First Month video course.<\/a><\/li>\n<\/ul>\n\n\n\n<p><em>Cheers,<\/em><br><strong><em>Tomi Mester<\/em><\/strong><\/p>\n\n\n\n<p><em>Cheers,<\/em><br><strong><em>Tomi Mester<\/em><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Combining tables is a key component in data science and analytics. And SQL is really good at it! Of course each case is different, but I run into data science tasks all the time in which joining two multi-million-row data tables took ~20-30 minutes in Python or bash&#8230; and not more than ~10-20 seconds in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1418,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[139],"tags":[24,79,53,13,90,73,92,112,29,65,118,30],"class_list":["post-1160","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-coding-data-science-analytics","tag-ab-test","tag-analytics","tag-collect","tag-data","tag-data-coding","tag-data-collection","tag-learn-to-code","tag-postgresql","tag-small-data","tag-sql","tag-sql-join","tag-tomi-mester"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)<\/title>\n<meta name=\"description\" content=\"Joining tables is key, when you are doing data analysis. And SQL is really good at it! In this article, you&#039;ll learn how SQL JOIN works!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)\" \/>\n<meta property=\"og:description\" content=\"Joining tables is a key component, when you are doing data analysis. And SQL is really good at it! Today I&#039;ll show you SQL JOIN and related queries...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/\" \/>\n<meta property=\"og:site_name\" content=\"Data36\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/data36\/\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/data36\" \/>\n<meta property=\"article:published_time\" content=\"2017-07-10T22:07:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-02-14T19:02:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/sqljoin.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Tomi Mester\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)\" \/>\n<meta name=\"twitter:description\" content=\"Joining tables is a key component, when you are doing data analysis. And SQL is really good at it! Today I&#039;ll show you SQL JOIN and related queries...\" \/>\n<meta name=\"twitter:creator\" content=\"@data36_com\" \/>\n<meta name=\"twitter:site\" content=\"@data36_com\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tomi Mester\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/\",\"url\":\"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/\",\"name\":\"SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)\",\"isPartOf\":{\"@id\":\"https:\/\/data36.com\/#website\"},\"datePublished\":\"2017-07-10T22:07:45+00:00\",\"dateModified\":\"2024-02-14T19:02:10+00:00\",\"author\":{\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e\"},\"description\":\"Joining tables is key, when you are doing data analysis. And SQL is really good at it! In this article, you'll learn how SQL JOIN works!\",\"breadcrumb\":{\"@id\":\"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/data36.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"SQL for Data Analysis &#8211; Tutorial for Beginners &#8211; ep5\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/data36.com\/#website\",\"url\":\"https:\/\/data36.com\/\",\"name\":\"Data36\",\"description\":\"Learn Data Science the Hard Way!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/data36.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e\",\"name\":\"Tomi Mester\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g\",\"caption\":\"Tomi Mester\"},\"description\":\"Tomi Mester is a data analyst and researcher. He\u2019s the author of the Data36 blog where he gives a sneak peek into online data analysts\u2019 best practices. He writes posts and tutorials on a weekly basis about data science, AB-testing, online research and data coding. Tomi is a guest blogger on Crazyegg, Hackernoon and Tech-In-Asia. You can meet him as a presenter on conferences like: Global E-commerce Summit, TEDx, Business Intelligence Forum, etc...\",\"sameAs\":[\"https:\/\/data36.com\",\"https:\/\/www.facebook.com\/data36\"],\"url\":\"https:\/\/data36.com\/author\/mestitomi\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)","description":"Joining tables is key, when you are doing data analysis. And SQL is really good at it! In this article, you'll learn how SQL JOIN works!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/","og_locale":"en_US","og_type":"article","og_title":"SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)","og_description":"Joining tables is a key component, when you are doing data analysis. And SQL is really good at it! Today I'll show you SQL JOIN and related queries...","og_url":"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/","og_site_name":"Data36","article_publisher":"https:\/\/www.facebook.com\/data36\/","article_author":"https:\/\/www.facebook.com\/data36","article_published_time":"2017-07-10T22:07:45+00:00","article_modified_time":"2024-02-14T19:02:10+00:00","og_image":[{"width":1200,"height":630,"url":"https:\/\/data36.com\/wp-content\/uploads\/2017\/07\/sqljoin.png","type":"image\/png"}],"author":"Tomi Mester","twitter_card":"summary_large_image","twitter_title":"SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)","twitter_description":"Joining tables is a key component, when you are doing data analysis. And SQL is really good at it! Today I'll show you SQL JOIN and related queries...","twitter_creator":"@data36_com","twitter_site":"@data36_com","twitter_misc":{"Written by":"Tomi Mester","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/","url":"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/","name":"SQL JOIN for Data Analysts (SQL tutorial for Beginners - ep5)","isPartOf":{"@id":"https:\/\/data36.com\/#website"},"datePublished":"2017-07-10T22:07:45+00:00","dateModified":"2024-02-14T19:02:10+00:00","author":{"@id":"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e"},"description":"Joining tables is key, when you are doing data analysis. And SQL is really good at it! In this article, you'll learn how SQL JOIN works!","breadcrumb":{"@id":"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/data36.com\/sql-join-data-analysis-tutorial-ep5\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/data36.com\/"},{"@type":"ListItem","position":2,"name":"SQL for Data Analysis &#8211; Tutorial for Beginners &#8211; ep5"}]},{"@type":"WebSite","@id":"https:\/\/data36.com\/#website","url":"https:\/\/data36.com\/","name":"Data36","description":"Learn Data Science the Hard Way!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data36.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e","name":"Tomi Mester","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data36.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g","caption":"Tomi Mester"},"description":"Tomi Mester is a data analyst and researcher. He\u2019s the author of the Data36 blog where he gives a sneak peek into online data analysts\u2019 best practices. He writes posts and tutorials on a weekly basis about data science, AB-testing, online research and data coding. Tomi is a guest blogger on Crazyegg, Hackernoon and Tech-In-Asia. You can meet him as a presenter on conferences like: Global E-commerce Summit, TEDx, Business Intelligence Forum, etc...","sameAs":["https:\/\/data36.com","https:\/\/www.facebook.com\/data36"],"url":"https:\/\/data36.com\/author\/mestitomi\/"}]}},"_links":{"self":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts\/1160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/comments?post=1160"}],"version-history":[{"count":1,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts\/1160\/revisions"}],"predecessor-version":[{"id":8870,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts\/1160\/revisions\/8870"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/media\/1418"}],"wp:attachment":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/media?parent=1160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/categories?post=1160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/tags?post=1160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}