{"id":5091,"date":"2020-05-15T16:11:10","date_gmt":"2020-05-15T16:11:10","guid":{"rendered":"https:\/\/data36.com\/?p=5091"},"modified":"2021-09-01T13:44:00","modified_gmt":"2021-09-01T13:44:00","slug":"plot-histogram-python-pandas","status":"publish","type":"post","link":"https:\/\/data36.com\/plot-histogram-python-pandas\/","title":{"rendered":"How to Plot a Histogram in Python (Using Pandas)"},"content":{"rendered":"\n<p>Plotting a histogram in Python is easier than you\u2019d think! And in this article, I&#8217;ll show you how.<\/p>\n\n\n\n<p>I have a strong opinion about visualization in Python, which is: <strong>it should be useful and not pretty<\/strong>.<\/p>\n\n\n\n<p>Why? Because the fancy data visualization for high-stakes presentations should happen in tools that are the best for it: <a href=\"https:\/\/data36.com\/data-visualisation-discovery-tableau\/\">Tableau<\/a>, <a href=\"https:\/\/data36.com\/10-minute-crash-course-google-data-studio\/\">Google Data Studio<\/a>, PowerBI, etc&#8230; Creating charts and graphs natively in Python should serve only one purpose: to make your data science tasks (e.g. 
prototyping machine learning models) easier and more intuitive.<\/p>\n\n\n\n<p>So in this tutorial, I&#8217;ll focus on how to plot a histogram in Python that&#8217;s:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>fast<\/li><li>easy<\/li><li>useful<\/li><li>and yeah&#8230; probably not the most beautiful (but not ugly, either).<\/li><\/ul>\n\n\n\n<p>The tool we will use for that is a function in our favorite Python data analytics library &#8212; <code>pandas<\/code> &#8212; and it&#8217;s called <code>.hist()<\/code>&#8230; But more about that in the article!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Download the code base!<\/h2>\n\n\n\n<p>Find the whole code base for this article (in Jupyter Notebook format) here:<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/tomimester\/python-histogram\/blob\/master\/plot-histogram-python-pandas.ipynb\">Plot histograms in Python (GitHub link)<\/a><\/p>\n\n\n\n<p>Download it from:&nbsp;<a href=\"https:\/\/github.com\/tomimester\/python-histogram\/archive\/master.zip\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"here (opens in a new tab)\">here<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Before we get started&#8230;<\/strong><\/h2>\n\n\n\n<p>In this article, I assume that you have some basic Python and pandas knowledge.<\/p>\n\n\n\n<p>If you don&#8217;t, I recommend starting with these articles:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><a href=\"https:\/\/data36.com\/python-libraries-packages-data-scientists\/\">Python libraries and packages for Data Scientists<\/a><\/li><li><a href=\"https:\/\/data36.com\/learn-python-for-data-science-from-scratch\/\">Learn Python from Scratch<\/a><\/li><li><a href=\"https:\/\/data36.com\/pandas-tutorial-1-basics-reading-data-files-dataframes-data-selection\/\">Pandas Tutorial 1<\/a> (Basics)<\/li><li><a href=\"https:\/\/data36.com\/pandas-tutorial-2-aggregation-and-grouping\/\">Pandas Tutorial 2<\/a> (Aggregation and grouping)<\/li><li><a 
href=\"https:\/\/data36.com\/pandas-tutorial-3-important-data-formatting-methods-merge-sort-reset_index-fillna\/\">Pandas Tutorial 3<\/a> (Data Formatting)<\/li><\/ol>\n\n\n\n<p>Also, this is a hands-on tutorial, so it&#8217;s best if you code along with me!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is a histogram?<\/strong><\/h2>\n\n\n\n<p>Let&#8217;s start with the basics!<\/p>\n\n\n\n<p>What is a histogram and how is it useful?<\/p>\n\n\n\n<p><strong>A histogram shows the number of occurrences of different values in a dataset. At first glance, it is very similar to a bar chart.<\/strong><\/p>\n\n\n\n<p><strong>It looks like this:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png\" alt=\"histogram example\" class=\"wp-image-5093\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>But a histogram is more than a simple bar chart.<\/strong><\/p>\n\n\n\n<p>Let me give you an example, and you&#8217;ll immediately see why.<\/p>\n\n\n\n<p>Let&#8217;s say that you run a gym and you have 250 clients. For some reason, you want to analyze their <em>heights<\/em>.
Good!<\/p>\n\n\n\n<p>You have the individual data points &#8211; the height of each and every client in one big Python list:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">height = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174, 184, 163, 174, 173, 182, 169, 174, 170, 176, 179, 169, 182, 181, 179, 181, 171, 175, 170, 174, 179, 171, 173, 171, 170, 171, 175, 169, 177, 185, 180, 174, 170, 171, 186, 176, 172, 177, 188, 176, 179, 177, 173, 169, 173, 174, 179, 181, 181, 177, 181, 171, 183, 179, 174, 178, 175, 182, 185, 189, 167, 167, 172, 176, 181, 177, 163, 174, 180, 177, 180, 174, 174, 177, 178, 177, 176, 171, 178, 176, 182, 183, 177, 173, 172, 178, 176, 173, 176, 172, 180, 173, 183, 178, 179, 169, 177, 180, 170, 174, 176, 167, 177, 181, 170, 178, 168, 175, 166, 182, 178, 175, 171, 183, 187, 164, 183, 185, 178, 168, 181, 174, 172, 168, 179, 180, 172, 179, 169, 180, 176, 174, 175, 181, 180, 179, 176, 176, 179, 177, 180, 174, 161, 182, 189, 178, 175, 175, 175, 176, 169, 172, 170, 177, 174, 178, 174, 181, 177, 189, 164, 172, 181, 191, 174, 176, 174, 183, 174, 180, 174, 168, 177, 179, 183, 175, 172, 179, 177, 177, 175, 182, 178, 187, 182, 179, 166, 179, 178, 180, 182, 173, 180, 172, 187, 168, 165, 166, 170, 169, 187, 174, 167, 182, 172, 168, 181, 179, 173, 184, 176, 185, 179, 185, 176, 168, 190, 172, 174, 171, 174, 177, 177, 179, 186, 175, 168, 168, 172, 165, 180, 173, 174, 175, 167, 170, 180, 179, 173, 186, 168]<br><\/pre>\n\n\n\n<p><em>Note: it&#8217;s in centimeters, folks!<\/em><\/p>\n\n\n\n<p>Looking at 250 data points is not very intuitive, is it?<\/p>\n\n\n\n<p>As we&#8217;ve discussed in the <a href=\"https:\/\/data36.com\/statistical-averages-mean-median-mode\/\">statistical averages<\/a> and <a href=\"https:\/\/data36.com\/statistical-variability-standard-deviation-percentiles-histogram\/\">statistical variability<\/a> articles, you have to &#8220;compress&#8221; these numbers into a few values that are easier to understand yet describe your dataset well 
enough. These could be:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>mean: <code>175.952<\/code><\/strong><\/li><li><strong>median: <code>176<\/code><\/strong><\/li><li><strong>mode: <code>174<\/code><\/strong><\/li><li><strong>standard deviation: <code>5.65<\/code><\/strong><\/li><li><strong>10th percentile: <code>168<\/code><\/strong><\/li><li><strong>90th percentile: <code>183<\/code><\/strong><\/li><\/ul>\n\n\n\n<p>Based on these values, you can get a pretty good sense of your data\u2026<\/p>\n\n\n\n<p>But if you plot a histogram, too, you can also visualize the distribution of your data points.<\/p>\n\n\n\n<p>For the dataset above, a histogram would look like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png\" alt=\"histogram example\" class=\"wp-image-5093\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>It&#8217;s very visual, very intuitive, and tells you even more than the averages and variability measures above. I love it!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Bins and ranges. A histogram is not the same as a bar chart!<\/strong><\/h2>\n\n\n\n<p>You have probably noticed that the height dataset has only ~25-30 unique values.
If you simply counted the unique values in the dataset and put the counts on a bar chart, you would get this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-vs-bar-chart-1024x683.png\" alt=\"histogram vs bar chart\" class=\"wp-image-5095\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-vs-bar-chart-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-vs-bar-chart-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-vs-bar-chart-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-vs-bar-chart-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-vs-bar-chart-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-vs-bar-chart.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>Bar chart that shows the frequency of unique values in the dataset<\/em><\/figcaption><\/figure>\n\n\n\n<p>But when you plot a histogram, there&#8217;s one more initial step: <strong>these unique values will be grouped into ranges.<\/strong> These ranges are called bins or buckets, and in pandas (just like in matplotlib), the default number of bins is <code>10<\/code>.
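<\/p>\n\n\n\n<p>To make the grouping step more concrete, here&#8217;s a minimal sketch in plain Python (no pandas or numpy needed). It takes the first 20 values of the height list above and compares the two approaches: counting each unique value (what the bar chart shows) versus sorting the values into 10 equal-width ranges between the minimum and the maximum (what a histogram shows). The binning rule is an assumption modeled on the documented default of <code>numpy.histogram<\/code> (10 bins, with the last bin also including the maximum), which matplotlib, and through it pandas, relies on. The <code>equal_width_bins()<\/code> helper is hypothetical and not part of any library.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
from collections import Counter

# First 20 values of the height list above (in centimeters)
heights = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174,
           184, 163, 174, 173, 182, 169, 174, 170, 176, 179]


def equal_width_bins(values, bins=10):
    # Hypothetical helper: split the min..max range into equal-width
    # ranges and count how many values fall into each one.
    # Assumes at least two distinct values (otherwise width would be 0).
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bin index of v; the maximum itself goes into the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


# Bar-chart approach: one bar per unique value
unique_counts = Counter(heights)
# Histogram approach: values grouped into 10 ranges (bins)
edges, counts = equal_width_bins(heights)

print(len(unique_counts), 'unique values')
for left, right, n in zip(edges, edges[1:], counts):
    print(f'{left:.1f} - {right:.1f}: {n}')
```
<\/pre>\n\n\n\n<p>The real implementations handle floating-point edge cases more carefully, so treat this as an illustration of the idea, not as a replacement for <code>.hist()<\/code>.<\/p>\n\n\n\n<p>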
So after the grouping, your histogram looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png\" alt=\"histogram example\" class=\"wp-image-5093\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>As I said: pretty similar to a bar chart &#8212; but not the same!<\/p>\n\n\n\n<p><strong>When is this grouping-into-ranges concept useful?<\/strong><\/p>\n\n\n\n<p>For instance, when you have way too many unique values in your dataset. (In big data projects, it won&#8217;t be ~25-30 as in our example&#8230; more like 25-30 <em>million<\/em> unique values.)<\/p>\n\n\n\n<p>Let&#8217;s imagine, for example, that you measure the heights of your clients with a laser meter and record the first decimal place, too.
Like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">height = [185.7, 172.3, 172.8, 169.6, 181.2, 162.2, 186.5, 171.4, 177.9, 174.5, 184.8, 163.6, 174.1, 173.7, 182.8, 169.4, 175.0, 170.7, 176.3, 179.5, 169.4, 182.9, 181.4, 179.0, 181.4, 171.9, 175.3, 170.4, 174.4, 179.2, 171.9, 173.6, 171.9, 170.9, 172.0, 175.9, 169.3, 177.4, 186.0, 180.5, 174.8, 170.7, 171.5, 186.2, 176.3, 172.2, 177.1, 188.6, 176.7, 179.7, 177.8, 173.9, 169.1, 173.9, 174.7, 179.5, 181.0, 181.6, 177.7, 181.3, 171.5, 183.5, 179.1, 174.2, 178.9, 175.5, 182.8, 185.1, 189.1, 167.6, 167.3, 173.0, 177.0, 181.3, 177.9, 163.9, 174.2, 181.0, 177.4, 180.6, 174.7, 174.8, 177.1, 178.5, 177.2, 176.7, 172.0, 178.3, 176.7, 182.8, 183.2, 177.1, 173.7, 172.2, 178.5, 176.5, 173.9, 176.3, 172.3, 180.2, 173.3, 183.3, 178.4, 179.6, 169.4, 177.0, 180.4, 170.3, 174.4, 176.2, 167.8, 177.9, 181.1, 170.8, 178.1, 168.1, 175.8, 166.3, 182.7, 178.5, 175.9, 171.3, 183.6, 187.8, 164.9, 183.4, 185.8, 178.0, 168.8, 181.2, 174.9, 172.4, 168.6, 179.3, 180.8, 172.3, 179.1, 169.1, 180.8, 176.3, 174.9, 175.4, 181.2, 180.5, 179.2, 176.8, 176.5, 179.7, 177.4, 180.1, 174.1, 161.4, 182.2, 189.1, 178.6, 175.4, 175.2, 175.3, 176.1, 169.3, 172.9, 170.0, 177.5, 174.2, 179.0, 175.0, 181.9, 177.3, 189.1, 164.6, 172.1, 181.4, 191.2, 174.5, 176.3, 174.6, 184.0, 174.3, 180.1, 174.1, 168.4, 177.9, 179.0, 183.8, 175.3, 172.3, 179.4, 177.4, 177.7, 175.6, 183.0, 178.2, 187.4, 182.7, 180.0, 166.2, 179.6, 178.5, 180.9, 182.3, 173.6, 180.9, 172.6, 187.7, 168.0, 165.4, 166.1, 170.7, 169.3, 187.7, 174.0, 167.9, 182.7, 172.5, 168.6, 181.3, 179.7, 173.4, 184.4, 176.8, 185.7, 179.0, 185.4, 176.7, 168.7, 190.7, 172.7, 174.8, 171.8, 174.8, 177.5, 177.2, 180.0, 186.8, 175.3, 168.6, 168.9, 172.0, 166.0, 181.0, 173.0, 174.1, 176.0, 167.6, 170.8, 180.0, 179.7, 173.3, 186.9, 168.2]<\/pre>\n\n\n\n<p>This is the very same dataset as it was before&#8230; only one decimal more accurate.<\/p>\n\n\n\n<p>But because of that tiny difference, now you have not ~25 but 
~150 unique values. So if you count the occurrences of each value and put them on a bar chart now, you get this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"908\" height=\"596\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/bar-chart-is-no-historgram.png\" alt=\"bar chart is no histogram\" class=\"wp-image-5096\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/bar-chart-is-no-historgram.png 908w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/bar-chart-is-no-historgram-300x197.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/bar-chart-is-no-historgram-768x504.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/bar-chart-is-no-historgram-508x333.png 508w\" sizes=\"(max-width: 908px) 100vw, 908px\" \/><\/figure>\n\n\n\n<p>Ouch&#8230;<\/p>\n\n\n\n<p>Even in this case, though, a histogram conveniently does the grouping for you: values that are close to each other are counted together and plotted as ranges (bins):<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png\" alt=\"histogram example\" class=\"wp-image-5093\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/histogram-example.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Beautiful\u2026 but more importantly: useful!<\/p>\n\n\n\n<h2
class=\"wp-block-heading\"><strong>How to plot a histogram in Python (step by step)<\/strong><\/h2>\n\n\n\n<p>Now that you know what a histogram is and why it is useful, it&#8217;s time to learn how to plot one using Python. Many Python libraries can do that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>pandas<\/code><\/li><li><code>matplotlib<\/code><\/li><li><code>seaborn<\/code><\/li><li>&#8230;<\/li><\/ul>\n\n\n\n<p>But I&#8217;ll go with the simplest solution: I&#8217;ll use the <code>.hist()<\/code> function that&#8217;s built into pandas. As I said in the introduction: you don&#8217;t need anything fancy here&#8230; What you need is a histogram that&#8217;s useful and informative <em>for you<\/em> and for your data science tasks.<\/p>\n\n\n\n<p>Under the hood, the pandas <code>.hist()<\/code> function is built on top of matplotlib. (See more info in the <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.hist.html\">documentation<\/a>.) So the chart you get is more or less the same as what you&#8217;d get by using matplotlib directly&#8230; The syntax is similar, too, but a little closer to the pandas logic you&#8217;re used to.
So in my opinion, it&#8217;s better for your learning curve to get familiar with this solution.<\/p>\n\n\n\n<p>Either way, let&#8217;s see how this works!<\/p>\n\n\n\n<p><em>Note: if you are looking for something eye-catching, check out the seaborn Python dataviz library.<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step #1: Import pandas and numpy, and set matplotlib<\/strong><\/h3>\n\n\n\n<p>One of the advantages of using the built-in pandas histogram function is that you don&#8217;t have to <code>import<\/code> any other libraries than the usual: <code>numpy<\/code> and <code>pandas<\/code>.<\/p>\n\n\n\n<p>At the very beginning of your project (and of your Jupyter Notebook), run these two lines:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import numpy as np\nimport pandas as pd<\/pre>\n\n\n\n<p>Great!
numpy and pandas are imported and ready to use.<\/p>\n\n\n\n<p>And don&#8217;t forget to add this line, too, so your charts are displayed inside your Jupyter Notebook:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">%matplotlib inline<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2152\" height=\"264\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/import-numpy-pandas-matplotlib.png\" alt=\"import numpy pandas matplotlib\" class=\"wp-image-5097\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/import-numpy-pandas-matplotlib.png 2152w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/import-numpy-pandas-matplotlib-300x37.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/import-numpy-pandas-matplotlib-768x94.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/import-numpy-pandas-matplotlib-1024x126.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/import-numpy-pandas-matplotlib-973x119.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/import-numpy-pandas-matplotlib-508x62.png 508w\" sizes=\"(max-width: 2152px) 100vw, 2152px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step #2: Get the data!<\/strong><\/h3>\n\n\n\n<p>As I said, in this tutorial, I assume that you have some basic Python and pandas knowledge. So I also assume that you know how to access your data using Python.
<em>(If you don&#8217;t, go back to the top of this article and check out the tutorials I linked there.)<\/em><\/p>\n\n\n\n<p>For this tutorial, you don&#8217;t have to open any files: I&#8217;ve used a random generator to generate the data points of the height dataset.<\/p>\n\n\n\n<p>If you want to work with the exact same dataset as I do (and I recommend doing so), copy-paste these lines into a cell of your Jupyter Notebook:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mu = 168  # mean<br>sigma = 5  # standard deviation<br>sample = 250<br>np.random.seed(0)<br>height_f = np.random.normal(mu, sigma, sample).astype(int)<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">mu = 176  # mean<br>sigma = 6  # standard deviation<br>sample = 250<br>np.random.seed(1)<br>height_m = np.random.normal(mu, sigma, sample).astype(int)<\/pre>\n\n\n\n<p>Run them!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"305\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/getting-the-data-in-python-histogram-1024x305.png\" alt=\"getting the data in python histogram\" class=\"wp-image-5098\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/getting-the-data-in-python-histogram-1024x305.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/getting-the-data-in-python-histogram-300x89.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/getting-the-data-in-python-histogram-768x228.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/getting-the-data-in-python-histogram-973x289.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/getting-the-data-in-python-histogram-508x151.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>For now, you don&#8217;t need to know exactly what happened above. (I&#8217;ll write a separate article about the <code>np.random<\/code> module.) Just know that this generated two datasets, with 250 data points in each.
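<\/p>\n\n\n\n<p>If you&#8217;re curious what those two cells do conceptually, here&#8217;s a rough, stdlib-only analogue. It is a sketch built on a few assumptions: it uses Python&#8217;s built-in <code>random<\/code> module instead of <code>np.random<\/code>, so the numbers it produces will <strong>not<\/strong> match the numpy arrays above, and the <code>generate_heights()<\/code> function is hypothetical. What it does show is the same idea: draw values from a normal distribution with a given mean and standard deviation, cut off the decimals (similar to <code>.astype(int)<\/code>), and fix the seed so the &#8220;random&#8221; data is reproducible.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
import random


def generate_heights(mu, sigma, sample, seed):
    # A dedicated generator with a fixed seed: same seed, same sequence
    rng = random.Random(seed)
    # Draw from a normal distribution and truncate to whole centimeters
    # (int() drops the decimals, similar to numpy's .astype(int))
    return [int(rng.gauss(mu, sigma)) for _ in range(sample)]


# Different names than in the tutorial, so the numpy arrays are not overwritten
demo_f = generate_heights(168, 5, 250, seed=0)
demo_m = generate_heights(176, 6, 250, seed=1)

# Re-running with the same seed reproduces exactly the same data points
print(demo_f == generate_heights(168, 5, 250, seed=0))  # True
print(len(demo_f), len(demo_m))  # 250 250
```
<\/pre>\n\n\n\n<p>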
And because I fixed the seed of the random generator (with the <code>np.random.seed()<\/code> line), you&#8217;ll get exactly the same numpy arrays, with exactly the same data points, as I have.<\/p>\n\n\n\n<p>In the <code>height_f<\/code> dataset you&#8217;ll get 250 height values of female clients of our hypothetical gym.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"462\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-f-python-histogram-data-generated-1024x462.png\" alt=\"height f python histogram data generated\" class=\"wp-image-5099\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-f-python-histogram-data-generated-1024x462.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-f-python-histogram-data-generated-300x135.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-f-python-histogram-data-generated-768x347.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-f-python-histogram-data-generated-973x439.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-f-python-histogram-data-generated-508x229.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In the <code>height_m<\/code> dataset there are 250 height values of male clients.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"462\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-m-histogram-data-generated-1024x462.png\" alt=\"height m histogram data generated\" class=\"wp-image-5100\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-m-histogram-data-generated-1024x462.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-m-histogram-data-generated-300x135.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-m-histogram-data-generated-768x346.png 768w,
https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-m-histogram-data-generated-973x439.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/height-m-histogram-data-generated-508x229.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step #3: Prepare the data!<\/strong><\/h3>\n\n\n\n<p>The more complex your data science project is, the more preparation you&#8217;ll need before you can actually plot a histogram in Python.<\/p>\n\n\n\n<p>Preparing your data is usually more than 80% of the job&#8230;<\/p>\n\n\n\n<p>But in this simpler case, you don&#8217;t have to worry about data cleaning (removing duplicates, filling empty values, etc.). You just need to turn your <code>height_m<\/code> and <code>height_f<\/code> data into a pandas DataFrame.<\/p>\n\n\n\n<p>Run this line:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym = pd.DataFrame({'height_f': height_f, 'height_m': height_m})<\/pre>\n\n\n\n<p>Great!<\/p>\n\n\n\n<p>Now we have the heights of female and male gym members in one 250-row dataframe.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"482\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/pandas-dataframe-before-plotting-in-python-1024x482.png\" alt=\"pandas dataframe before plotting in python\" class=\"wp-image-5101\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/pandas-dataframe-before-plotting-in-python-1024x482.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/pandas-dataframe-before-plotting-in-python-300x141.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/pandas-dataframe-before-plotting-in-python-768x361.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/pandas-dataframe-before-plotting-in-python-973x458.png 973w,
https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/pandas-dataframe-before-plotting-in-python-508x239.png 508w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>[OPTIONAL] Basics: Plotting line charts and bar charts in Python using pandas<\/strong><\/h3>\n\n\n\n<p>Before we plot the histogram itself, I want to show you how to plot a line chart and a bar chart that show the frequency of the different values in the dataset\u2026 so you&#8217;ll be able to compare the different approaches.<\/p>\n\n\n\n<p>And of course, if you have never plotted anything in pandas before, creating a simpler line chart first can be handy.<\/p>\n\n\n\n<p>To put your data on a chart, just type the <code>.plot()<\/code> function right after the pandas dataframe you want to visualize. <strong>By default, <code>.plot()<\/code> draws a line chart.<\/strong><\/p>\n\n\n\n<p>If you <code>.plot()<\/code> the <code>gym<\/code> dataframe as it is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.plot()<\/pre>\n\n\n\n<p>you&#8217;ll get this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-line-chart-python-1024x683.png\" alt=\"plot line chart python\" class=\"wp-image-5102\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-line-chart-python-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-line-chart-python-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-line-chart-python-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-line-chart-python-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-line-chart-python-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-line-chart-python.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"
\/><\/figure>\n\n\n\n<p>Uhh. Messy.<\/p>\n\n\n\n<p>On the y-axis, you can see the different values of the <code>height_m<\/code> and <code>height_f<\/code> datasets. And the x-axis shows the index of the dataframe, which is not very useful in this case.<\/p>\n\n\n\n<p>So let&#8217;s tweak this further!<\/p>\n\n\n\n<p>To get what we actually want (the number of occurrences of each unique value in the dataset), we have to work a bit more with the original dataset. Let&#8217;s add a <code>.groupby()<\/code> with a <code>.count()<\/code> aggregate function. (I wrote more about these in <a href=\"https:\/\/data36.com\/pandas-tutorial-2-aggregation-and-grouping\/\">this pandas tutorial<\/a>.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.groupby('height_m').count()<\/pre>\n\n\n\n<p>Because <code>gym<\/code> has only one other column, the counts end up in the <code>height_f<\/code> column: each number is simply how many rows share that <code>height_m<\/code> value. That&#8217;s also why the legend of the resulting chart says <code>height_f<\/code>.<\/p>\n\n\n\n<p>If you plot the output of this, you&#8217;ll get a much nicer line chart:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.groupby('height_m').count().plot()<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-line-chart-python-1024x683.png\" alt=\"plot frequency line chart python\" class=\"wp-image-5103\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-line-chart-python-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-line-chart-python-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-line-chart-python-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-line-chart-python-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-line-chart-python-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-line-chart-python.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>frequency of values<\/figcaption><\/figure>\n\n\n\n<p>This is
closer to what we wanted\u2026 except that line charts are meant for showing trends. If you want to compare different values, you should use a bar chart instead.<\/p>\n\n\n\n<p>To turn your line chart into a bar chart, just add the <code>bar<\/code> keyword:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.groupby('height_m').count().plot.bar()<\/pre>\n\n\n\n<p>or:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.groupby('height_m').count().plot(kind='bar')<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-1-python-1024x683.png\" alt=\"plot frequency bar chart 1 python\" class=\"wp-image-5105\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-1-python-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-1-python-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-1-python-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-1-python-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-1-python-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-1-python.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>And of course, you can do the same for the <code>height_f<\/code> column separately:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.groupby('height_f').count().plot.bar()<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-2-python-1024x683.png\" alt=\"plot frequency bar chart 2 python\" class=\"wp-image-5106\"
srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-2-python-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-2-python-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-2-python-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-2-python-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-2-python-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-frequency-bar-chart-2-python.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>This is how you visualize the occurrence of each unique value on a bar chart in Python\u2026<\/p>\n\n\n\n<p><strong>But this is still not a histogram, right!?<\/strong><\/p>\n\n\n\n<p>So&#8230;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step #4: Plot a histogram in Python!<\/strong><\/h3>\n\n\n\n<p>Once you have your pandas dataframe with the values in it, it&#8217;s extremely easy to put that on a histogram.<\/p>\n\n\n\n<p>Type this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.hist()<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"522\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histograms-python-1024x522.png\" alt=\"plot histograms python\" class=\"wp-image-5108\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histograms-python-1024x522.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histograms-python-300x153.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histograms-python-768x392.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histograms-python-973x496.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histograms-python-508x259.png 508w\" sizes=\"(max-width: 1024px) 100vw, 
1024px\" \/><figcaption>plotting histograms in Python<\/figcaption><\/figure>\n\n\n\n<p>Yepp, compared to the bar chart solution above, the <code>.hist()<\/code> function does a ton of cool things for you, automatically:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>It does the grouping.<\/strong><br>When using <code>.hist()<\/code> there is no need for the initial <code>.groupby()<\/code> function! <code>.hist()<\/code> automatically groups your data into bins. (By default, into 10 bins.)<br><em>Note: again, &#8220;grouping into bins&#8221; is not the same as &#8220;grouping by unique values&#8221; &#8212; as a bin usually contains a range of values.<\/em><\/li><li><strong>It does the counting.<\/strong> (No need for <code>.count()<\/code> function either.)<\/li><li><strong>It plots a histogram for each column<\/strong> in your dataframe that has numerical values in it.<\/li><\/ol>\n\n\n\n<p><strong>So plotting a histogram (in Python, at least) is definitely a very convenient way to visualize the distribution of your data.<\/strong><\/p>\n\n\n\n<p>If you want a different amount of bins\/buckets than the default 10, you can set that as a parameter. 
E.g.:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>gym.hist(bins=20)<\/strong><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"717\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/python-histograms-with-20-bins-1024x717.png\" alt=\"python histograms with 20 bins\" class=\"wp-image-5109\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/python-histograms-with-20-bins-1024x717.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/python-histograms-with-20-bins-300x210.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/python-histograms-with-20-bins-768x538.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/python-histograms-with-20-bins-973x681.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/python-histograms-with-20-bins-508x356.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/python-histograms-with-20-bins.png 1148w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Bonus: Plot your histograms on the same chart!<\/strong><\/h2>\n\n\n\n<p>Sometimes, you want to plot histograms in Python to compare two different columns of your dataframe.<\/p>\n\n\n\n<p>In that case, it&#8217;s handy to put these histograms on the very same chart instead of next to each other.<\/p>\n\n\n\n<p>It takes only a small modification of the code we used in the previous section:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.plot.hist(bins=20)<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-1024x683.png\" alt=\"plot pandas histogram\" class=\"wp-image-5110\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-1024x683.png 1024w, 
https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><em>Note: in this version, you call <code>.hist()<\/code> on <code>.plot<\/code> (<code>gym.plot.hist()<\/code>), not directly on the dataframe.<\/em><\/p>\n\n\n\n<p>Since these histograms overlap each other, I recommend making them partly transparent with the <code>alpha<\/code> parameter. <code>alpha=0.7<\/code> sets their opacity to 70% (i.e. 30% transparency):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">gym.plot.hist(bins=20, <strong>alpha=0.7<\/strong>)<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-transparency-1024x683.png\" alt=\"plot pandas histogram transparency\" class=\"wp-image-5111\" srcset=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-transparency-1024x683.png 1024w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-transparency-300x200.png 300w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-transparency-768x512.png 768w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-transparency-973x649.png 973w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-transparency-508x339.png 508w, https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-pandas-histogram-transparency.png 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>This way, you can see both histograms, even where they overlap.<\/p>\n\n\n\n<h2 
class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>This is it!<br>Just as I promised: plotting a histogram in Python is easy\u2026 as long as you want to keep it simple. You can make this complicated by adding more parameters to display everything more nicely.<\/p>\n\n\n\n<p>But you don&#8217;t have to\u2026<\/p>\n\n\n\n<p>Anyway, these were the basics. Just use the <code>.hist()<\/code> or the <code>.plot.hist()<\/code> functions on the dataframe that contains your data points and you&#8217;ll get beautiful histograms that will show you the distribution of your data.<\/p>\n\n\n\n<p>And don&#8217;t stop here, continue with the <a href=\"https:\/\/data36.com\/scatter-plot-pandas-matplotlib\/\">pandas tutorial episode #5<\/a> where I&#8217;ll show you <a href=\"https:\/\/data36.com\/scatter-plot-pandas-matplotlib\/\">how to plot a scatter plot in pandas<\/a>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you want to learn more about how to become a data scientist, take my 50-minute video course: <a href=\"https:\/\/data36.com\/how-to-become-a-data-scientist\/\">How to Become a Data Scientist.<\/a>&nbsp;(It&#8217;s&nbsp;free!)<\/li>\n\n\n\n<li>Also check out my 6-week online course: <a href=\"https:\/\/data36.com\/jds\/\">The Junior Data Scientist\u2019s First Month video course.<\/a><\/li>\n<\/ul>\n\n\n\n<p><em>Cheers,<\/em><br><strong><em>Tomi Mester<\/em><\/strong><\/p>\n\n\n\n<p><em>Cheers,<\/em><br><strong><em>Tomi Mester<\/em><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Plotting a histogram in Python is easier than you\u2019d think! And in this article, I&#8217;ll show you how. I have a strong opinion about visualization in Python, which is: it should be useful and not pretty. Why? 
Because the fancy data visualization for high-stakes presentations should happen in tools that are the best for it: [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1418,"comment_status":"open","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"footnotes":""},"categories":[139],"tags":[],"class_list":["post-5091","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-coding-data-science-analytics"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to Plot a Histogram in Python Using Pandas (Tutorial)<\/title>\n<meta name=\"description\" content=\"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I&#039;ll show you how. Follow these 4 easy steps!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data36.com\/plot-histogram-python-pandas\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to plot a histogram in Python using pandas (tutorial)\" \/>\n<meta property=\"og:description\" content=\"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I&#039;ll show you how. 
Follow these 4 easy steps!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data36.com\/plot-histogram-python-pandas\/\" \/>\n<meta property=\"og:site_name\" content=\"Data36\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/data36\/\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/data36\" \/>\n<meta property=\"article:published_time\" content=\"2020-05-15T16:11:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-09-01T13:44:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histogram-python.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Tomi Mester\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"How to plot a histogram in Python using pandas (tutorial)\" \/>\n<meta name=\"twitter:description\" content=\"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I&#039;ll show you how. Follow these 4 easy steps!\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histogram-python.png\" \/>\n<meta name=\"twitter:creator\" content=\"@data36_com\" \/>\n<meta name=\"twitter:site\" content=\"@data36_com\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tomi Mester\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/data36.com\/plot-histogram-python-pandas\/\",\"url\":\"https:\/\/data36.com\/plot-histogram-python-pandas\/\",\"name\":\"How to Plot a Histogram in Python Using Pandas (Tutorial)\",\"isPartOf\":{\"@id\":\"https:\/\/data36.com\/#website\"},\"datePublished\":\"2020-05-15T16:11:10+00:00\",\"dateModified\":\"2021-09-01T13:44:00+00:00\",\"author\":{\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e\"},\"description\":\"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I'll show you how. Follow these 4 easy steps!\",\"breadcrumb\":{\"@id\":\"https:\/\/data36.com\/plot-histogram-python-pandas\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/data36.com\/plot-histogram-python-pandas\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/data36.com\/plot-histogram-python-pandas\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/data36.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Plot a Histogram in Python (Using Pandas)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/data36.com\/#website\",\"url\":\"https:\/\/data36.com\/\",\"name\":\"Data36\",\"description\":\"Learn Data Science the Hard Way!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/data36.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e\",\"name\":\"Tomi 
Mester\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/data36.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g\",\"caption\":\"Tomi Mester\"},\"description\":\"Tomi Mester is a data analyst and researcher. He\u2019s the author of the Data36 blog where he gives a sneak peek into online data analysts\u2019 best practices. He writes posts and tutorials on a weekly basis about data science, AB-testing, online research and data coding. Tomi is a guest blogger on Crazyegg, Hackernoon and Tech-In-Asia. You can meet him as a presenter on conferences like: Global E-commerce Summit, TEDx, Business Intelligence Forum, etc...\",\"sameAs\":[\"https:\/\/data36.com\",\"https:\/\/www.facebook.com\/data36\"],\"url\":\"https:\/\/data36.com\/author\/mestitomi\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Plot a Histogram in Python Using Pandas (Tutorial)","description":"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I'll show you how. Follow these 4 easy steps!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data36.com\/plot-histogram-python-pandas\/","og_locale":"en_US","og_type":"article","og_title":"How to plot a histogram in Python using pandas (tutorial)","og_description":"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I'll show you how. 
Follow these 4 easy steps!","og_url":"https:\/\/data36.com\/plot-histogram-python-pandas\/","og_site_name":"Data36","article_publisher":"https:\/\/www.facebook.com\/data36\/","article_author":"https:\/\/www.facebook.com\/data36","article_published_time":"2020-05-15T16:11:10+00:00","article_modified_time":"2021-09-01T13:44:00+00:00","og_image":[{"width":1200,"height":630,"url":"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histogram-python.png","type":"image\/png"}],"author":"Tomi Mester","twitter_card":"summary_large_image","twitter_title":"How to plot a histogram in Python using pandas (tutorial)","twitter_description":"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I'll show you how. Follow these 4 easy steps!","twitter_image":"https:\/\/data36.com\/wp-content\/uploads\/2020\/05\/plot-histogram-python.png","twitter_creator":"@data36_com","twitter_site":"@data36_com","twitter_misc":{"Written by":"Tomi Mester","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/data36.com\/plot-histogram-python-pandas\/","url":"https:\/\/data36.com\/plot-histogram-python-pandas\/","name":"How to Plot a Histogram in Python Using Pandas (Tutorial)","isPartOf":{"@id":"https:\/\/data36.com\/#website"},"datePublished":"2020-05-15T16:11:10+00:00","dateModified":"2021-09-01T13:44:00+00:00","author":{"@id":"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e"},"description":"Plotting a histogram in Python is easier than you\u2019d think! And in this article, I'll show you how. 
Follow these 4 easy steps!","breadcrumb":{"@id":"https:\/\/data36.com\/plot-histogram-python-pandas\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data36.com\/plot-histogram-python-pandas\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/data36.com\/plot-histogram-python-pandas\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/data36.com\/"},{"@type":"ListItem","position":2,"name":"How to Plot a Histogram in Python (Using Pandas)"}]},{"@type":"WebSite","@id":"https:\/\/data36.com\/#website","url":"https:\/\/data36.com\/","name":"Data36","description":"Learn Data Science the Hard Way!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data36.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/data36.com\/#\/schema\/person\/cbc505eee4cecd9d74a2c0f0d00d356e","name":"Tomi Mester","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data36.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8b782b29236065ff5e1c0e47a8bdb6ba?s=96&d=mm&r=g","caption":"Tomi Mester"},"description":"Tomi Mester is a data analyst and researcher. He\u2019s the author of the Data36 blog where he gives a sneak peek into online data analysts\u2019 best practices. He writes posts and tutorials on a weekly basis about data science, AB-testing, online research and data coding. Tomi is a guest blogger on Crazyegg, Hackernoon and Tech-In-Asia. 
You can meet him as a presenter on conferences like: Global E-commerce Summit, TEDx, Business Intelligence Forum, etc...","sameAs":["https:\/\/data36.com","https:\/\/www.facebook.com\/data36"],"url":"https:\/\/data36.com\/author\/mestitomi\/"}]}},"_links":{"self":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts\/5091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/comments?post=5091"}],"version-history":[{"count":0,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/posts\/5091\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/media\/1418"}],"wp:attachment":[{"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/media?parent=5091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/categories?post=5091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data36.com\/wp-json\/wp\/v2\/tags?post=5091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}