{"@attributes":{"version":"2.0"},"channel":{"title":"staticnotes.org","link":"\/","description":"Recent blog posts from Rob on staticnotes.org","generator":"Hugo -- gohugo.io","language":"en-US","lastBuildDate":"Sun, 18 Jan 2026 20:00:00 +0000","image":{"url":"https:\/\/staticnotes.org\/favicon.ico","title":"staticnotes.org"},"item":[{"title":"UK house price changes in real terms","link":"\/posts\/uk-house-prices\/","pubDate":"Sun, 11 Jan 2026 00:00:00 +0000","guid":"\/posts\/uk-house-prices\/","description":"<p>This post is a response to a newspaper article I read this morning. Every month the FT is reporting on how the housing market in the UK is performing as people love to talk about it:<\/p>\n<blockquote>\n<p><strong>UK house prices rise less than expected in 2025 as growth slows<\/strong><\/p>\n<p>UK house prices rose by 0.6 per cent in 2025 after a slowdown at the end of the year, according to lender Nationwide. [&hellip;] Prices fell 0.4 per cent between November and December to an average of \u00a3271,068. Both figures were below analysts\u2019 expectations of a 1.2 per cent annual rise and a 0.1 per cent month-on-month expansion.    <em>FT Weekend (02.01.2026)<\/em><\/p>\n<\/blockquote>\n<p>I am asking myself how useful this statistic is for the average reader, especially for a reader who wants to buy or sell a property.<\/p>\n<p>Two issues are immediately obvious:<\/p>\n<ol>\n<li>The average yearly house price change across all of the UK might not agree with the house price performance in the areas I live in or I want to move to. There are also difference by house type, e.g. flat vs. detached house vs. terraced house. Different types of property might even perform in opposite directions.<\/li>\n<li>British people often view property, at least partially, as an investment opportunity. However, in those cases it would help to at least report the real (inflation-adjusted) house price change, not just the nominal change. 
An increase of 0.6% in nominal house prices sounds different when one doesn&rsquo;t mention that inflation in 2025 was around 3.6% (UK CPI). Meanwhile, other forms of investment did quite well. The FTSE 100 nominally returned 21.4% (which, to be fair, was a positive outlier this year compared to its average of 6% over the last 10 years).<\/li>\n<\/ol>\n<p>Fortunately, it is not that difficult to compute real house prices that match my situation more closely. The raw data is openly available via API:<\/p>\n<ol>\n<li>The HM Land Registry registers the ownership of property in the UK and provides aggregate monthly data of the average transactions by region and property type.<\/li>\n<li>The Office for National Statistics (ONS) provides monthly inflation data (CPI) for the UK.<\/li>\n<\/ol>\n<p>So, in this post I want to compute and visualize how flat and detached house prices in specific areas performed both nominally and in real terms.<\/p>\n <h2 id=\"querying-the-housing-price-data\">\n  <a href=\"#querying-the-housing-price-data\" class=\"header-link\">\n    Querying the housing price data\n  <\/a>\n<\/h2><p>I am using the UK House Price Index of the Land Registry API:<\/p>\n<p>\n<a href=\"http:\/\/landregistry.data.gov.uk\/data\/ukhpi\/region\" \n\n    target=\"_blank\"\n    >http:\/\/landregistry.data.gov.uk\/data\/ukhpi\/region\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                
d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/p>\n<p>This allows me to query the monthly average house price value for different regions and property types. Important to note here is that this data is based on the succesful property transactions in a region which we use here as an approximation of the average value of similar properties in the region (that were or weren&rsquo;t sold).<\/p>\n<p>I am based in London and interested how the housing market performs close to me or areas that I am interested living in. So I will query the data for all London boroughs as well as some South England regions around London: Kent, Surrey, Buckinghamshire, and Oxfordshire.<\/p>\n<p>I then resample the monthly data to quarterly data by taking the mean across the 3 months of a quarter.<\/p>\n<p>I then store all the data in parquet files. 
You can find them \n<a href=\"https:\/\/gitlab.com\/frankRi89\/blog\/-\/tree\/601de34ababe1dc8d3a4aaaeb30a6bbb740de3b4\/notebooks\/uk-house-prices\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n <h2 id=\"inspecting-the-data\">\n  <a href=\"#inspecting-the-data\" class=\"header-link\">\n    Inspecting the data\n  <\/a>\n<\/h2><p>Let&rsquo;s have a look at how the data is stored in the file:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">pandas<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">pd<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">warnings<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">matplotlib.pyplot<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">plt<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span 
class=\"n\">set_option<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;display.float_format&#39;<\/span><span class=\"p\">,<\/span> <span class=\"k\">lambda<\/span> <span class=\"n\">x<\/span><span class=\"p\">:<\/span> <span class=\"s1\">&#39;<\/span><span class=\"si\">%.2f<\/span><span class=\"s1\">&#39;<\/span> <span class=\"o\">%<\/span> <span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">warnings<\/span><span class=\"o\">.<\/span><span class=\"n\">simplefilter<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;always&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">category<\/span><span class=\"o\">=<\/span><span class=\"ne\">UserWarning<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>Here I am loading the file for the different counties that I am interested in:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_regions<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;.\/data\/uk_house_prices.parquet&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_regions<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;is_london_borough&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"kc\">False<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_regions<\/span><span class=\"o\">.<\/span><span class=\"n\">sample<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">,<\/span><span class=\"n\">ignore_index<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span 
class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>Quarter<\/th>\n      <th>Region<\/th>\n      <th>Price<\/th>\n      <th>PropertyType<\/th>\n      <th>is_london_borough<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>2023Q4<\/td>\n      <td>Surrey<\/td>\n      <td>517233.67<\/td>\n      <td>All<\/td>\n      <td>False<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>2011Q3<\/td>\n      <td>Kent<\/td>\n      <td>192827.67<\/td>\n      <td>All<\/td>\n      <td>False<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>2014Q2<\/td>\n      <td>Oxfordshire<\/td>\n      <td>277195.00<\/td>\n      <td>Semi_detached<\/td>\n      <td>False<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>2013Q4<\/td>\n      <td>London<\/td>\n      <td>415838.00<\/td>\n      <td>Semi_detached<\/td>\n      <td>False<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>2021Q3<\/td>\n      <td>Buckinghamshire<\/td>\n      <td>431147.67<\/td>\n      <td>All<\/td>\n      <td>False<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>and here I am loading the file for the London boroughs:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_boroughs<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;.\/data\/london_borough_prices.parquet&#39;<\/span><span 
class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_boroughs<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;is_london_borough&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"kc\">True<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_boroughs<\/span><span class=\"o\">.<\/span><span class=\"n\">sample<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">,<\/span><span class=\"n\">ignore_index<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>Quarter<\/th>\n      <th>Region<\/th>\n      <th>Price<\/th>\n      <th>PropertyType<\/th>\n      <th>is_london_borough<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>2020Q4<\/td>\n      <td>Hillingdon<\/td>\n      <td>406610.67<\/td>\n      <td>Terraced<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>2024Q4<\/td>\n      <td>Redbridge<\/td>\n      <td>490087.67<\/td>\n      <td>All<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>2017Q2<\/td>\n      <td>Lewisham<\/td>\n      <td>910228.33<\/td>\n      <td>Detached<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>2011Q3<\/td>\n      <td>Islington<\/td>\n      <td>369223.67<\/td>\n      <td>Flat<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>2010Q3<\/td>\n      <td>Enfield<\/td>\n      <td>661067.33<\/td>\n      
<td>Detached<\/td>\n      <td>True<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>You can see that the dataset shows the average quarterly housing transaction values by region. The column <code>PropertyType<\/code> shows what type of property was sold (Flat, Terraced, Detached, Semi_detached).<\/p>\n <h2 id=\"computing-real-house-prices\">\n  <a href=\"#computing-real-house-prices\" class=\"header-link\">\n    Computing real house prices\n  <\/a>\n<\/h2><p>Next we adjust the nominal house prices from the <em>Land Registry<\/em> for inflation and compute the real house prices in terms of &ldquo;2025 pounds&rdquo;.\nI have queried the Office for National Statistics API to get the UK&rsquo;s quarterly CPI data and stored it in a parquet file. I am loading this data here:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">cpi_df<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;.\/data\/cpi_data.parquet&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">cpi_df<\/span><span class=\"o\">.<\/span><span class=\"n\">tail<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>Quarter<\/th>\n      <th>CPI<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>146<\/th>\n      <td>2024Q3<\/td>\n     
 <td>134.10<\/td>\n    <\/tr>\n    <tr>\n      <th>147<\/th>\n      <td>2024Q4<\/td>\n      <td>135.20<\/td>\n    <\/tr>\n    <tr>\n      <th>148<\/th>\n      <td>2025Q1<\/td>\n      <td>136.00<\/td>\n    <\/tr>\n    <tr>\n      <th>149<\/th>\n      <td>2025Q2<\/td>\n      <td>138.50<\/td>\n    <\/tr>\n    <tr>\n      <th>150<\/th>\n      <td>2025Q3<\/td>\n      <td>139.20<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>With the consumer price index, we can compute the real price of housing. This real price answers the question: &ldquo;How much would the property cost in the most recent quarter&rsquo;s pounds?&rdquo; If there was inflation, your money is worth less, and the same house costs more in today&rsquo;s money terms. If there was deflation, your money is worth more, and the same house costs less in today&rsquo;s terms (all else equal).<\/p>\n<p>The formula to compute the real house price in terms of the current CPI:<\/p>\n<p><code>real_price_current = nominal_price_t * (CPI_current \/ CPI_t)<\/code><\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Merging the dataframes here to do real price calculation in one step<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">concat<\/span><span class=\"p\">([<\/span><span class=\"n\">df_regions<\/span><span class=\"p\">,<\/span> <span class=\"n\">df_boroughs<\/span><span class=\"p\">])<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Get the most recent CPI value<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">current_cpi<\/span> <span class=\"o\">=<\/span> <span 
class=\"n\">cpi_df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;CPI&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">iloc<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">latest_quarter<\/span> <span class=\"o\">=<\/span> <span class=\"n\">cpi_df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Quarter&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">iloc<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Merge CPI df with house prices df<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">merge<\/span><span class=\"p\">(<\/span><span class=\"n\">cpi_df<\/span><span class=\"p\">,<\/span> <span class=\"n\">on<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;Quarter&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">how<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;left&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># For quarters without CPI data yet, use the most recent CPI<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;CPI&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;CPI&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span 
class=\"n\">fillna<\/span><span class=\"p\">(<\/span><span class=\"n\">current_cpi<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Calculate real prices using above formula <\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;RealPrice&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Price&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"n\">current_cpi<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;CPI&#39;<\/span><span class=\"p\">])<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">sample<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">,<\/span><span class=\"n\">ignore_index<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>Quarter<\/th>\n      <th>Region<\/th>\n      <th>Price<\/th>\n      <th>PropertyType<\/th>\n      <th>is_london_borough<\/th>\n      <th>CPI<\/th>\n      <th>RealPrice<\/th>\n    <\/tr>\n  <\/thead>\n  
<tbody>\n    <tr>\n      <th>0<\/th>\n      <td>2012Q3<\/td>\n      <td>City Of Westminster<\/td>\n      <td>2138137.67<\/td>\n      <td>Semi_detached<\/td>\n      <td>True<\/td>\n      <td>96.10<\/td>\n      <td>3097073.50<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>2011Q3<\/td>\n      <td>Croydon<\/td>\n      <td>280930.00<\/td>\n      <td>Semi_detached<\/td>\n      <td>True<\/td>\n      <td>93.80<\/td>\n      <td>416902.52<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>2016Q3<\/td>\n      <td>Bromley<\/td>\n      <td>898828.67<\/td>\n      <td>Detached<\/td>\n      <td>True<\/td>\n      <td>100.90<\/td>\n      <td>1240009.42<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>2017Q4<\/td>\n      <td>Ealing<\/td>\n      <td>1160573.67<\/td>\n      <td>Detached<\/td>\n      <td>True<\/td>\n      <td>104.60<\/td>\n      <td>1544472.80<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>2025Q2<\/td>\n      <td>City Of London<\/td>\n      <td>792392.33<\/td>\n      <td>Flat<\/td>\n      <td>True<\/td>\n      <td>138.50<\/td>\n      <td>796397.20<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>As you can see, we now have the column <code>RealPrice<\/code>, which gives us the mean house transaction price per quarter and region in real terms. Now we can visualize the data to finally see what happened.<\/p>\n <h2 id=\"visualizing-nominal-and-real-house-prices-over-time\">\n  <a href=\"#visualizing-nominal-and-real-house-prices-over-time\" class=\"header-link\">\n    Visualizing nominal and real house prices over time\n  <\/a>\n<\/h2><p>My goal is to plot the nominal and real house prices over time. To keep the charts visually simpler, I am only selecting a few London boroughs that I am interested in and only looking at data since 2015. 
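<\/p>\n<p>As a quick sanity check of the real-price formula, here is the City Of Westminster row from the table above redone by hand (the nominal price and CPI are copied from that row; 139.20 is the latest CPI from the CPI table):<\/p>

```python
# City Of Westminster, 2012Q3 (values from the table above)
nominal = 2138137.67
cpi_2012q3 = 96.10
cpi_latest = 139.20  # 2025Q3, the most recent CPI

real = nominal * (cpi_latest / cpi_2012q3)
print(round(real, 2))  # about 3,097,073.50, matching the RealPrice column
```

<p>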
For London boroughs I am looking at Flats and Terraced houses and for the counties I am only looking at detached houses.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Quarter&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">&gt;=<\/span> <span class=\"s1\">&#39;2015Q1&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">SELECTED_BOROUGHS<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s1\">&#39;Ealing&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s1\">&#39;Hackney&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s1\">&#39;Hammersmith And Fulham&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s1\">&#39;Hounslow&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s1\">&#39;Islington&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s1\">&#39;Richmond Upon Thames&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s1\">&#39;Wandsworth&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"p\">]<\/span>\n<\/span><\/span><span 
class=\"line\"><span class=\"cl\"><span class=\"n\">df_boroughs<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Region&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">isin<\/span><span class=\"p\">(<\/span><span class=\"n\">SELECTED_BOROUGHS<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> <span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;PropertyType&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">isin<\/span><span class=\"p\">([<\/span><span class=\"s1\">&#39;Flat&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Terraced&#39;<\/span><span class=\"p\">]))]<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_regions<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;is_london_borough&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">==<\/span> <span class=\"kc\">False<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> <span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;PropertyType&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">==<\/span> <span class=\"s1\">&#39;Detached&#39;<\/span><span class=\"p\">)]<\/span><span class=\"o\">.<\/span><span class=\"n\">copy<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>In the file <code>utils.py<\/code> I added some code to create the plots and tables. You can safely skip this if not interested. 
Importing them here in the notebook:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">from<\/span> <span class=\"nn\">utils<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">create_table<\/span><span class=\"p\">,<\/span> <span class=\"n\">plot_prices_by_type<\/span>\n<\/span><\/span><\/code><\/pre><\/div> <h3 id=\"detached-houses-in-south-england-counties\">\n  <a href=\"#detached-houses-in-south-england-counties\" class=\"header-link\">\n    Detached Houses in South England Counties\n  <\/a>\n<\/h3><p>I am first looking at the house price performance of detached houses in five different English counties. I am plotting both the nominal prices (dashed line) and the real prices (solid line).<\/p>\n<p>I am also creating a table with the real price changes vs. 1 year ago, 2 years ago, 5 years ago, 10 years ago.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">fig<\/span><span class=\"p\">,<\/span> <span class=\"n\">axes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">subplots<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">figsize<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"mi\">14<\/span><span class=\"p\">,<\/span> <span class=\"mi\">8<\/span><span class=\"p\">))<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plot_prices_by_type<\/span><span class=\"p\">(<\/span><span class=\"n\">df_regions<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Detached&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">axes<\/span><span class=\"p\">,<\/span> <span 
class=\"s1\">&#39;Price&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Detached Houses in South England counties&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>\n\n<img src=\"output_21_0.png\" alt=\"png\" \/>\n<\/p>\n<p>You can see that within the last 10 years most counties saw a decent increase in nominal terms and a decline in real house prices for detached properties. The only exception is Kent with a small real increase of 0.9%. London and Surrey both lost around 10% in real terms.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">regions<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sorted<\/span><span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Region&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">unique<\/span><span class=\"p\">())<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">gt_regions_flat<\/span> <span class=\"o\">=<\/span> <span class=\"n\">create_table<\/span><span class=\"p\">(<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">df_regions<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Detached&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"s1\">&#39;South England Counties&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">regions<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">gt_regions_flat<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div id=\"vuiiiwlslp\" 
style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>\n#vuiiiwlslp table {\n          font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;\n          -webkit-font-smoothing: antialiased;\n          -moz-osx-font-smoothing: grayscale;\n        }\n<p>#vuiiiwlslp thead, tbody, tfoot, tr, td, th { border-style: none !important; }\ntr { background-color: transparent !important; }\n#vuiiiwlslp p { margin: 0 !important; padding: 0 !important; }\n#vuiiiwlslp .gt_table { display: table !important; border-collapse: collapse !important; line-height: normal !important; margin-left: auto !important; margin-right: auto !important; color: #333333 !important; font-size: 12px !important; font-weight: normal !important; font-style: normal !important; background-color: #FFFFFF !important; width: auto !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #A8A8A8 !important; border-right-style: none !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #A8A8A8 !important; border-left-style: none !important; border-left-width: 2px !important; border-left-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_caption { padding-top: 4px !important; padding-bottom: 4px !important; }\n#vuiiiwlslp .gt_title { color: #333333 !important; font-size: 125% !important; font-weight: initial !important; padding-top: 4px !important; padding-bottom: 4px !important; padding-left: 5px !important; padding-right: 5px !important; border-bottom-color: #FFFFFF !important; border-bottom-width: 0 !important; }\n#vuiiiwlslp .gt_subtitle { color: #333333 !important; font-size: 85% !important; font-weight: initial !important; padding-top: 3px 
!important; padding-bottom: 5px !important; padding-left: 5px !important; padding-right: 5px !important; border-top-color: #FFFFFF !important; border-top-width: 0 !important; }\n#vuiiiwlslp .gt_heading { background-color: #FFFFFF !important; text-align: center !important; border-bottom-color: #FFFFFF !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_bottom_border { border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_col_headings { border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_col_heading { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: normal !important; text-transform: inherit !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: bottom !important; padding-top: 5px !important; padding-bottom: 5px !important; padding-left: 5px !important; padding-right: 5px !important; overflow-x: hidden !important; }\n#vuiiiwlslp .gt_column_spanner_outer { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: normal !important; text-transform: inherit !important; padding-top: 0 
!important; padding-bottom: 0 !important; padding-left: 4px !important; padding-right: 4px !important; }\n#vuiiiwlslp .gt_column_spanner_outer:first-child { padding-left: 0 !important; }\n#vuiiiwlslp .gt_column_spanner_outer:last-child { padding-right: 0 !important; }\n#vuiiiwlslp .gt_column_spanner { border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; vertical-align: bottom !important; padding-top: 5px !important; padding-bottom: 5px !important; overflow-x: hidden !important; display: inline-block !important; width: 100% !important; }\n#vuiiiwlslp .gt_spanner_row { border-bottom-style: hidden !important; }\n#vuiiiwlslp .gt_group_heading { padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: middle !important; text-align: left !important; }\n#vuiiiwlslp .gt_empty_group_heading { padding: 0.5px !important; color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; vertical-align: middle !important; }\n#vuiiiwlslp .gt_from_md&gt; 
:first-child { margin-top: 0 !important; }\n#vuiiiwlslp .gt_from_md&gt; :last-child { margin-bottom: 0 !important; }\n#vuiiiwlslp .gt_row { padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; margin: 10px !important; border-top-style: solid !important; border-top-width: 1px !important; border-top-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: middle !important; overflow-x: hidden !important; }\n#vuiiiwlslp .gt_stub { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-right-style: solid !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; padding-left: 5px !important; padding-right: 5px !important; }\n#vuiiiwlslp .gt_stub_row_group { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-right-style: solid !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; padding-left: 5px !important; padding-right: 5px !important; vertical-align: top !important; }\n#vuiiiwlslp .gt_row_group_first td { border-top-width: 2px !important; }\n#vuiiiwlslp .gt_row_group_first th { border-top-width: 2px !important; }\n#vuiiiwlslp .gt_striped { color: #333333 !important; background-color: #F4F4F4 !important; }\n#vuiiiwlslp .gt_table_body { border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_grand_summary_row { color: 
#333333 !important; background-color: #FFFFFF !important; text-transform: inherit !important; padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; }\n#vuiiiwlslp .gt_first_grand_summary_row_bottom { border-top-style: double !important; border-top-width: 6px !important; border-top-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_last_grand_summary_row_top { border-bottom-style: double !important; border-bottom-width: 6px !important; border-bottom-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_sourcenotes { color: #333333 !important; background-color: #FFFFFF !important; border-bottom-style: none !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 2px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; }\n#vuiiiwlslp .gt_sourcenote { font-size: 90% !important; padding-top: 4px !important; padding-bottom: 4px !important; padding-left: 5px !important; padding-right: 5px !important; text-align: left !important; }\n#vuiiiwlslp .gt_left { text-align: left !important; }\n#vuiiiwlslp .gt_center { text-align: center !important; }\n#vuiiiwlslp .gt_right { text-align: right !important; font-variant-numeric: tabular-nums !important; }\n#vuiiiwlslp .gt_font_normal { font-weight: normal !important; }\n#vuiiiwlslp .gt_font_bold { font-weight: bold !important; }\n#vuiiiwlslp .gt_font_italic { font-style: italic !important; }\n#vuiiiwlslp .gt_super { font-size: 65% !important; }\n#vuiiiwlslp .gt_footnote_marks { font-size: 75% !important; vertical-align: 0.4em !important; position: initial !important; }\n#vuiiiwlslp .gt_asterisk { font-size: 100% !important; vertical-align: 0 !important; }<\/p>\n<p><\/style><\/p>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" 
data-quarto-bootstrap=\"false\">\n<thead>\n  <tr class=\"gt_heading\">\n    <td colspan=\"11\" class=\"gt_heading gt_title gt_font_normal\">South England Counties<\/td>\n  <\/tr>\n  <tr class=\"gt_heading\">\n    <td colspan=\"11\" class=\"gt_heading gt_subtitle gt_font_normal gt_bottom_border\">Detached properties - Real prices (CPI-adjusted) - Q4 2025 vs 1, 2, 5, 10 years ago<\/td>\n  <\/tr>\n<tr class=\"gt_col_headings gt_spanner_row\">\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_left\" rowspan=\"2\" colspan=\"1\" scope=\"col\" id=\"\"><\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2025-(Current)\">\n    <span class=\"gt_column_spanner\">2025 (Current)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2024-(1y-ago)\">\n    <span class=\"gt_column_spanner\">2024 (1y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2023-(2y-ago)\">\n    <span class=\"gt_column_spanner\">2023 (2y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2020-(5y-ago)\">\n    <span class=\"gt_column_spanner\">2020 (5y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2015-(10y-ago)\">\n    <span class=\"gt_column_spanner\">2015 (10y ago)<\/span>\n  <\/th>\n<\/tr>\n<tr class=\"gt_col_headings\">\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2025-(Current)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2025-(Current)_RealChg\">%<\/th>\n  <th 
class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2024-(1y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2024-(1y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2023-(2y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2023-(2y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2020-(5y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2020-(5y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2015-(10y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2015-(10y-ago)_RealChg\">%<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"gt_table_body\">\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Buckinghamshire<\/th>\n    <td class=\"gt_row gt_right\">&#163;853,221<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;883,178<\/td>\n    <td style=\"color: #000000; background-color: #ecb4b1;\" class=\"gt_row gt_right\">\u22123.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;894,115<\/td>\n    <td style=\"color: #000000; background-color: #ebaeab;\" class=\"gt_row gt_right\">\u22124.6<\/td>\n    <td class=\"gt_row gt_right\">&#163;974,110<\/td>\n    <td style=\"color: #000000; background-color: #e58882;\" class=\"gt_row gt_right\">\u221212.4<\/td>\n    <td class=\"gt_row 
gt_right\">&#163;924,706<\/td>\n    <td style=\"color: #000000; background-color: #e99f9b;\" class=\"gt_row gt_right\">\u22127.7<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Kent<\/th>\n    <td class=\"gt_row gt_right\">&#163;605,363<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;620,638<\/td>\n    <td style=\"color: #000000; background-color: #edb9b6;\" class=\"gt_row gt_right\">\u22122.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;633,552<\/td>\n    <td style=\"color: #000000; background-color: #ebafac;\" class=\"gt_row gt_right\">\u22124.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;692,871<\/td>\n    <td style=\"color: #000000; background-color: #e58681;\" class=\"gt_row gt_right\">\u221212.6<\/td>\n    <td class=\"gt_row gt_right\">&#163;599,898<\/td>\n    <td style=\"color: #000000; background-color: #f0cac8;\" class=\"gt_row gt_right\">0.9<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">London<\/th>\n    <td class=\"gt_row gt_right\">&#163;1,132,911<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,178,785<\/td>\n    <td style=\"color: #000000; background-color: #ecb2af;\" class=\"gt_row gt_right\">\u22123.9<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,179,862<\/td>\n    <td style=\"color: #000000; background-color: #ecb1ae;\" class=\"gt_row gt_right\">\u22124.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,331,199<\/td>\n    <td style=\"color: #000000; background-color: #e37b76;\" class=\"gt_row gt_right\">\u221214.9<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,265,328<\/td>\n    <td style=\"color: #000000; background-color: #e7918d;\" class=\"gt_row gt_right\">\u221210.5<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Oxfordshire<\/th>\n    <td class=\"gt_row gt_right\">&#163;680,124<\/td>\n  
  <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;675,726<\/td>\n    <td style=\"color: #000000; background-color: #f0c8c6;\" class=\"gt_row gt_right\">0.7<\/td>\n    <td class=\"gt_row gt_right\">&#163;676,613<\/td>\n    <td style=\"color: #000000; background-color: #efc8c6;\" class=\"gt_row gt_right\">0.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;757,910<\/td>\n    <td style=\"color: #000000; background-color: #e7928e;\" class=\"gt_row gt_right\">\u221210.3<\/td>\n    <td class=\"gt_row gt_right\">&#163;723,218<\/td>\n    <td style=\"color: #000000; background-color: #eaa8a4;\" class=\"gt_row gt_right\">\u22126.0<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Surrey<\/th>\n    <td class=\"gt_row gt_right\">&#163;975,634<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,007,309<\/td>\n    <td style=\"color: #000000; background-color: #ecb6b3;\" class=\"gt_row gt_right\">\u22123.1<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,023,827<\/td>\n    <td style=\"color: #000000; background-color: #ebaeab;\" class=\"gt_row gt_right\">\u22124.7<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,118,116<\/td>\n    <td style=\"color: #000000; background-color: #e58681;\" class=\"gt_row gt_right\">\u221212.7<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,083,197<\/td>\n    <td style=\"color: #000000; background-color: #e7948f;\" class=\"gt_row gt_right\">\u22129.9<\/td>\n  <\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n <h3 id=\"flats-in-london-boroughs\">\n  <a href=\"#flats-in-london-boroughs\" class=\"header-link\">\n    Flats in London Boroughs\n  <\/a>\n<\/h3><p>Next, I am looking at flats in London. 
Again I am plotting nominal and real quarterly prices for selected London boroughs and creating a table with the corresponding numbers.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">fig<\/span><span class=\"p\">,<\/span> <span class=\"n\">axes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">subplots<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">figsize<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"mi\">14<\/span><span class=\"p\">,<\/span> <span class=\"mi\">8<\/span><span class=\"p\">))<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plot_prices_by_type<\/span><span class=\"p\">(<\/span><span class=\"n\">df_boroughs<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Flat&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">axes<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Price&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Flats in selected London boroughs&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>\n\n<img src=\"output_25_0.png\" alt=\"png\" \/>\n<\/p>\n<p>Both the chart and the table show somewhat stable or slightly increasing nominal prices, but drastic reductions in real prices. Hammersmith\/Fulham and Wandsworth are both down about 30% in real terms since 2015. 
Moreover, compared to last year and two years ago, we still see price declines in real terms.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">regions<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sorted<\/span><span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Region&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">unique<\/span><span class=\"p\">())<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">gt_regions_flat<\/span> <span class=\"o\">=<\/span> <span class=\"n\">create_table<\/span><span class=\"p\">(<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">df_boroughs<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Flat&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"s1\">&#39;Selected London Boroughs&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">regions<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">gt_regions_flat<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div id=\"zldtaermwk\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>\n#zldtaermwk table {\n          font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;\n          -webkit-font-smoothing: antialiased;\n          -moz-osx-font-smoothing: grayscale;\n        }\n<p>#zldtaermwk thead, tbody, tfoot, tr, td, th { border-style: none 
!important; }\ntr { background-color: transparent !important; }\n#zldtaermwk p { margin: 0 !important; padding: 0 !important; }\n#zldtaermwk .gt_table { display: table !important; border-collapse: collapse !important; line-height: normal !important; margin-left: auto !important; margin-right: auto !important; color: #333333 !important; font-size: 12px !important; font-weight: normal !important; font-style: normal !important; background-color: #FFFFFF !important; width: auto !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #A8A8A8 !important; border-right-style: none !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #A8A8A8 !important; border-left-style: none !important; border-left-width: 2px !important; border-left-color: #D3D3D3 !important; }\n#zldtaermwk .gt_caption { padding-top: 4px !important; padding-bottom: 4px !important; }\n#zldtaermwk .gt_title { color: #333333 !important; font-size: 125% !important; font-weight: initial !important; padding-top: 4px !important; padding-bottom: 4px !important; padding-left: 5px !important; padding-right: 5px !important; border-bottom-color: #FFFFFF !important; border-bottom-width: 0 !important; }\n#zldtaermwk .gt_subtitle { color: #333333 !important; font-size: 85% !important; font-weight: initial !important; padding-top: 3px !important; padding-bottom: 5px !important; padding-left: 5px !important; padding-right: 5px !important; border-top-color: #FFFFFF !important; border-top-width: 0 !important; }\n#zldtaermwk .gt_heading { background-color: #FFFFFF !important; text-align: center !important; border-bottom-color: #FFFFFF !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: 
#D3D3D3 !important; }\n#zldtaermwk .gt_bottom_border { border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; }\n#zldtaermwk .gt_col_headings { border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; }\n#zldtaermwk .gt_col_heading { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: normal !important; text-transform: inherit !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: bottom !important; padding-top: 5px !important; padding-bottom: 5px !important; padding-left: 5px !important; padding-right: 5px !important; overflow-x: hidden !important; }\n#zldtaermwk .gt_column_spanner_outer { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: normal !important; text-transform: inherit !important; padding-top: 0 !important; padding-bottom: 0 !important; padding-left: 4px !important; padding-right: 4px !important; }\n#zldtaermwk .gt_column_spanner_outer:first-child { padding-left: 0 !important; }\n#zldtaermwk .gt_column_spanner_outer:last-child { padding-right: 0 !important; }\n#zldtaermwk .gt_column_spanner { border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; vertical-align: bottom !important; padding-top: 5px !important; padding-bottom: 5px !important; 
overflow-x: hidden !important; display: inline-block !important; width: 100% !important; }\n#zldtaermwk .gt_spanner_row { border-bottom-style: hidden !important; }\n#zldtaermwk .gt_group_heading { padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: middle !important; text-align: left !important; }\n#zldtaermwk .gt_empty_group_heading { padding: 0.5px !important; color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; vertical-align: middle !important; }\n#zldtaermwk .gt_from_md&gt; :first-child { margin-top: 0 !important; }\n#zldtaermwk .gt_from_md&gt; :last-child { margin-bottom: 0 !important; }\n#zldtaermwk .gt_row { padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; margin: 10px !important; border-top-style: solid !important; border-top-width: 1px !important; border-top-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; 
border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: middle !important; overflow-x: hidden !important; }\n#zldtaermwk .gt_stub { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-right-style: solid !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; padding-left: 5px !important; padding-right: 5px !important; }\n#zldtaermwk .gt_stub_row_group { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-right-style: solid !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; padding-left: 5px !important; padding-right: 5px !important; vertical-align: top !important; }\n#zldtaermwk .gt_row_group_first td { border-top-width: 2px !important; }\n#zldtaermwk .gt_row_group_first th { border-top-width: 2px !important; }\n#zldtaermwk .gt_striped { color: #333333 !important; background-color: #F4F4F4 !important; }\n#zldtaermwk .gt_table_body { border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; }\n#zldtaermwk .gt_grand_summary_row { color: #333333 !important; background-color: #FFFFFF !important; text-transform: inherit !important; padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; }\n#zldtaermwk .gt_first_grand_summary_row_bottom { border-top-style: double !important; border-top-width: 6px !important; border-top-color: #D3D3D3 !important; }\n#zldtaermwk .gt_last_grand_summary_row_top { border-bottom-style: double !important; border-bottom-width: 6px !important; 
border-bottom-color: #D3D3D3 !important; }\n#zldtaermwk .gt_sourcenotes { color: #333333 !important; background-color: #FFFFFF !important; border-bottom-style: none !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 2px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; }\n#zldtaermwk .gt_sourcenote { font-size: 90% !important; padding-top: 4px !important; padding-bottom: 4px !important; padding-left: 5px !important; padding-right: 5px !important; text-align: left !important; }\n#zldtaermwk .gt_left { text-align: left !important; }\n#zldtaermwk .gt_center { text-align: center !important; }\n#zldtaermwk .gt_right { text-align: right !important; font-variant-numeric: tabular-nums !important; }\n#zldtaermwk .gt_font_normal { font-weight: normal !important; }\n#zldtaermwk .gt_font_bold { font-weight: bold !important; }\n#zldtaermwk .gt_font_italic { font-style: italic !important; }\n#zldtaermwk .gt_super { font-size: 65% !important; }\n#zldtaermwk .gt_footnote_marks { font-size: 75% !important; vertical-align: 0.4em !important; position: initial !important; }\n#zldtaermwk .gt_asterisk { font-size: 100% !important; vertical-align: 0 !important; }<\/p>\n<p><\/style><\/p>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n<thead>\n  <tr class=\"gt_heading\">\n    <td colspan=\"11\" class=\"gt_heading gt_title gt_font_normal\">Selected London Boroughs<\/td>\n  <\/tr>\n  <tr class=\"gt_heading\">\n    <td colspan=\"11\" class=\"gt_heading gt_subtitle gt_font_normal gt_bottom_border\">Flat properties - Real prices (CPI-adjusted) - Q4 2025 vs 1, 2, 5, 10 years ago<\/td>\n  <\/tr>\n<tr class=\"gt_col_headings gt_spanner_row\">\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_left\" rowspan=\"2\" colspan=\"1\" 
scope=\"col\" id=\"\"><\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2025-(Current)\">\n    <span class=\"gt_column_spanner\">2025 (Current)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2024-(1y-ago)\">\n    <span class=\"gt_column_spanner\">2024 (1y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2023-(2y-ago)\">\n    <span class=\"gt_column_spanner\">2023 (2y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2020-(5y-ago)\">\n    <span class=\"gt_column_spanner\">2020 (5y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2015-(10y-ago)\">\n    <span class=\"gt_column_spanner\">2015 (10y ago)<\/span>\n  <\/th>\n<\/tr>\n<tr class=\"gt_col_headings\">\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2025-(Current)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2025-(Current)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2024-(1y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2024-(1y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2023-(2y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" 
scope=\"col\" id=\"2023-(2y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2020-(5y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2020-(5y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2015-(10y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2015-(10y-ago)_RealChg\">%<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"gt_table_body\">\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Ealing<\/th>\n    <td class=\"gt_row gt_right\">&#163;404,408<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;419,269<\/td>\n    <td style=\"color: #000000; background-color: #ecb4b1;\" class=\"gt_row gt_right\">\u22123.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;419,733<\/td>\n    <td style=\"color: #000000; background-color: #ecb3b0;\" class=\"gt_row gt_right\">\u22123.7<\/td>\n    <td class=\"gt_row gt_right\">&#163;489,258<\/td>\n    <td style=\"color: #000000; background-color: #e16f69;\" class=\"gt_row gt_right\">\u221217.3<\/td>\n    <td class=\"gt_row gt_right\">&#163;521,670<\/td>\n    <td style=\"color: #000000; background-color: #dd554e;\" class=\"gt_row gt_right\">\u221222.5<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Hackney<\/th>\n    <td class=\"gt_row gt_right\">&#163;546,932<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;554,931<\/td>\n    <td style=\"color: #000000; background-color: #eebebc;\" class=\"gt_row gt_right\">\u22121.4<\/td>\n    <td class=\"gt_row 
gt_right\">&#163;583,301<\/td>\n    <td style=\"color: #000000; background-color: #eaa6a3;\" class=\"gt_row gt_right\">\u22126.2<\/td>\n    <td class=\"gt_row gt_right\">&#163;685,095<\/td>\n    <td style=\"color: #000000; background-color: #df615a;\" class=\"gt_row gt_right\">\u221220.2<\/td>\n    <td class=\"gt_row gt_right\">&#163;667,622<\/td>\n    <td style=\"color: #000000; background-color: #e16b65;\" class=\"gt_row gt_right\">\u221218.1<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Hammersmith And Fulham<\/th>\n    <td class=\"gt_row gt_right\">&#163;577,801<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;639,856<\/td>\n    <td style=\"color: #000000; background-color: #e79591;\" class=\"gt_row gt_right\">\u22129.7<\/td>\n    <td class=\"gt_row gt_right\">&#163;707,815<\/td>\n    <td style=\"color: #000000; background-color: #e06a63;\" class=\"gt_row gt_right\">\u221218.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;760,674<\/td>\n    <td style=\"color: #000000; background-color: #dc4e46;\" class=\"gt_row gt_right\">\u221224.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;883,098<\/td>\n    <td style=\"color: #000000; background-color: #808080;\" class=\"gt_row gt_right\">\u221234.6<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Hounslow<\/th>\n    <td class=\"gt_row gt_right\">&#163;367,011<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;375,391<\/td>\n    <td style=\"color: #000000; background-color: #edbab7;\" class=\"gt_row gt_right\">\u22122.2<\/td>\n    <td class=\"gt_row gt_right\">&#163;373,890<\/td>\n    <td style=\"color: #000000; background-color: #eebcb9;\" class=\"gt_row gt_right\">\u22121.8<\/td>\n    <td class=\"gt_row gt_right\">&#163;441,401<\/td>\n    <td style=\"color: #000000; background-color: #e2716b;\" 
class=\"gt_row gt_right\">\u221216.9<\/td>\n    <td class=\"gt_row gt_right\">&#163;419,450<\/td>\n    <td style=\"color: #000000; background-color: #e58782;\" class=\"gt_row gt_right\">\u221212.5<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Islington<\/th>\n    <td class=\"gt_row gt_right\">&#163;566,442<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;599,386<\/td>\n    <td style=\"color: #000000; background-color: #ebaaa6;\" class=\"gt_row gt_right\">\u22125.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;623,593<\/td>\n    <td style=\"color: #000000; background-color: #e89893;\" class=\"gt_row gt_right\">\u22129.2<\/td>\n    <td class=\"gt_row gt_right\">&#163;725,791<\/td>\n    <td style=\"color: #000000; background-color: #dd5851;\" class=\"gt_row gt_right\">\u221222.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;771,081<\/td>\n    <td style=\"color: #000000; background-color: #da4139;\" class=\"gt_row gt_right\">\u221226.5<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Richmond Upon Thames<\/th>\n    <td class=\"gt_row gt_right\">&#163;483,018<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;524,730<\/td>\n    <td style=\"color: #000000; background-color: #e99e9a;\" class=\"gt_row gt_right\">\u22127.9<\/td>\n    <td class=\"gt_row gt_right\">&#163;520,433<\/td>\n    <td style=\"color: #000000; background-color: #e9a19e;\" class=\"gt_row gt_right\">\u22127.2<\/td>\n    <td class=\"gt_row gt_right\">&#163;601,897<\/td>\n    <td style=\"color: #000000; background-color: #df635c;\" class=\"gt_row gt_right\">\u221219.8<\/td>\n    <td class=\"gt_row gt_right\">&#163;633,108<\/td>\n    <td style=\"color: #000000; background-color: #dc4f48;\" class=\"gt_row gt_right\">\u221223.7<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left 
gt_stub\">Wandsworth<\/th>\n    <td class=\"gt_row gt_right\">&#163;540,034<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;568,240<\/td>\n    <td style=\"color: #000000; background-color: #ebada9;\" class=\"gt_row gt_right\">\u22125.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;603,492<\/td>\n    <td style=\"color: #000000; background-color: #e7918c;\" class=\"gt_row gt_right\">\u221210.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;715,552<\/td>\n    <td style=\"color: #000000; background-color: #db4b43;\" class=\"gt_row gt_right\">\u221224.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;761,426<\/td>\n    <td style=\"color: #FFFFFF; background-color: #d8352c;\" class=\"gt_row gt_right\">\u221229.1<\/td>\n  <\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n <h3 id=\"terraced-houses-in-london-boroughs\">\n  <a href=\"#terraced-houses-in-london-boroughs\" class=\"header-link\">\n    Terraced houses in London Boroughs\n  <\/a>\n<\/h3><p>Next, I am looking at terraced houses in the same London boroughs. 
My expectation is that they did better than flats, especially since they became more popular during the COVID pandemic, when people wanted a garden.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">fig<\/span><span class=\"p\">,<\/span> <span class=\"n\">axes<\/span> <span class=\"o\">=<\/span> <span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">subplots<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">figsize<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"mi\">14<\/span><span class=\"p\">,<\/span> <span class=\"mi\">8<\/span><span class=\"p\">))<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plot_prices_by_type<\/span><span class=\"p\">(<\/span><span class=\"n\">df_boroughs<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Terraced&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">axes<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Price&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Terraced houses in selected London boroughs&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>\n\n<img src=\"output_29_0.png\" alt=\"png\" \/>\n<\/p>\n<p>Indeed, terraced houses did better in nominal terms, with all boroughs showing gains. Again, in real terms we see price decreases over all comparison horizons (1, 2, 5 and 10 years). 
However, the declines are less drastic than for flats in the same locations.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">regions<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">sorted<\/span><span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Region&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">unique<\/span><span class=\"p\">())<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">gt_regions_flat<\/span> <span class=\"o\">=<\/span> <span class=\"n\">create_table<\/span><span class=\"p\">(<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">df_boroughs<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;Terraced&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"s1\">&#39;Selected London Boroughs&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">regions<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">gt_regions_flat<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div id=\"ppuntbiqeb\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>\n#ppuntbiqeb table {\n          font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;\n          -webkit-font-smoothing: antialiased;\n          -moz-osx-font-smoothing: grayscale;\n        }\n<p>#ppuntbiqeb thead, tbody, tfoot, tr, td, th { border-style: none !important; }\ntr { 
background-color: transparent !important; }\n#ppuntbiqeb p { margin: 0 !important; padding: 0 !important; }\n#ppuntbiqeb .gt_table { display: table !important; border-collapse: collapse !important; line-height: normal !important; margin-left: auto !important; margin-right: auto !important; color: #333333 !important; font-size: 12px !important; font-weight: normal !important; font-style: normal !important; background-color: #FFFFFF !important; width: auto !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #A8A8A8 !important; border-right-style: none !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #A8A8A8 !important; border-left-style: none !important; border-left-width: 2px !important; border-left-color: #D3D3D3 !important; }\n#ppuntbiqeb .gt_caption { padding-top: 4px !important; padding-bottom: 4px !important; }\n#ppuntbiqeb .gt_title { color: #333333 !important; font-size: 125% !important; font-weight: initial !important; padding-top: 4px !important; padding-bottom: 4px !important; padding-left: 5px !important; padding-right: 5px !important; border-bottom-color: #FFFFFF !important; border-bottom-width: 0 !important; }\n#ppuntbiqeb .gt_subtitle { color: #333333 !important; font-size: 85% !important; font-weight: initial !important; padding-top: 3px !important; padding-bottom: 5px !important; padding-left: 5px !important; padding-right: 5px !important; border-top-color: #FFFFFF !important; border-top-width: 0 !important; }\n#ppuntbiqeb .gt_heading { background-color: #FFFFFF !important; text-align: center !important; border-bottom-color: #FFFFFF !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; 
}\n#ppuntbiqeb .gt_bottom_border { border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; }\n#ppuntbiqeb .gt_col_headings { border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; }\n#ppuntbiqeb .gt_col_heading { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: normal !important; text-transform: inherit !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: bottom !important; padding-top: 5px !important; padding-bottom: 5px !important; padding-left: 5px !important; padding-right: 5px !important; overflow-x: hidden !important; }\n#ppuntbiqeb .gt_column_spanner_outer { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: normal !important; text-transform: inherit !important; padding-top: 0 !important; padding-bottom: 0 !important; padding-left: 4px !important; padding-right: 4px !important; }\n#ppuntbiqeb .gt_column_spanner_outer:first-child { padding-left: 0 !important; }\n#ppuntbiqeb .gt_column_spanner_outer:last-child { padding-right: 0 !important; }\n#ppuntbiqeb .gt_column_spanner { border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; vertical-align: bottom !important; padding-top: 5px !important; padding-bottom: 5px !important; overflow-x: hidden 
!important; display: inline-block !important; width: 100% !important; }\n#ppuntbiqeb .gt_spanner_row { border-bottom-style: hidden !important; }\n#ppuntbiqeb .gt_group_heading { padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: middle !important; text-align: left !important; }\n#ppuntbiqeb .gt_empty_group_heading { padding: 0.5px !important; color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; vertical-align: middle !important; }\n#ppuntbiqeb .gt_from_md&gt; :first-child { margin-top: 0 !important; }\n#ppuntbiqeb .gt_from_md&gt; :last-child { margin-bottom: 0 !important; }\n#ppuntbiqeb .gt_row { padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; margin: 10px !important; border-top-style: solid !important; border-top-width: 1px !important; border-top-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 1px !important; border-left-color: #D3D3D3 !important; border-right-style: none 
!important; border-right-width: 1px !important; border-right-color: #D3D3D3 !important; vertical-align: middle !important; overflow-x: hidden !important; }\n#ppuntbiqeb .gt_stub { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-right-style: solid !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; padding-left: 5px !important; padding-right: 5px !important; }\n#ppuntbiqeb .gt_stub_row_group { color: #333333 !important; background-color: #FFFFFF !important; font-size: 100% !important; font-weight: initial !important; text-transform: inherit !important; border-right-style: solid !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; padding-left: 5px !important; padding-right: 5px !important; vertical-align: top !important; }\n#ppuntbiqeb .gt_row_group_first td { border-top-width: 2px !important; }\n#ppuntbiqeb .gt_row_group_first th { border-top-width: 2px !important; }\n#ppuntbiqeb .gt_striped { color: #333333 !important; background-color: #F4F4F4 !important; }\n#ppuntbiqeb .gt_table_body { border-top-style: solid !important; border-top-width: 2px !important; border-top-color: #D3D3D3 !important; border-bottom-style: solid !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; }\n#ppuntbiqeb .gt_grand_summary_row { color: #333333 !important; background-color: #FFFFFF !important; text-transform: inherit !important; padding-top: 8px !important; padding-bottom: 8px !important; padding-left: 5px !important; padding-right: 5px !important; }\n#ppuntbiqeb .gt_first_grand_summary_row_bottom { border-top-style: double !important; border-top-width: 6px !important; border-top-color: #D3D3D3 !important; }\n#ppuntbiqeb .gt_last_grand_summary_row_top { border-bottom-style: double !important; border-bottom-width: 6px !important; border-bottom-color: #D3D3D3 
!important; }\n#ppuntbiqeb .gt_sourcenotes { color: #333333 !important; background-color: #FFFFFF !important; border-bottom-style: none !important; border-bottom-width: 2px !important; border-bottom-color: #D3D3D3 !important; border-left-style: none !important; border-left-width: 2px !important; border-left-color: #D3D3D3 !important; border-right-style: none !important; border-right-width: 2px !important; border-right-color: #D3D3D3 !important; }\n#ppuntbiqeb .gt_sourcenote { font-size: 90% !important; padding-top: 4px !important; padding-bottom: 4px !important; padding-left: 5px !important; padding-right: 5px !important; text-align: left !important; }\n#ppuntbiqeb .gt_left { text-align: left !important; }\n#ppuntbiqeb .gt_center { text-align: center !important; }\n#ppuntbiqeb .gt_right { text-align: right !important; font-variant-numeric: tabular-nums !important; }\n#ppuntbiqeb .gt_font_normal { font-weight: normal !important; }\n#ppuntbiqeb .gt_font_bold { font-weight: bold !important; }\n#ppuntbiqeb .gt_font_italic { font-style: italic !important; }\n#ppuntbiqeb .gt_super { font-size: 65% !important; }\n#ppuntbiqeb .gt_footnote_marks { font-size: 75% !important; vertical-align: 0.4em !important; position: initial !important; }\n#ppuntbiqeb .gt_asterisk { font-size: 100% !important; vertical-align: 0 !important; }<\/p>\n<p><\/style><\/p>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n<thead>\n  <tr class=\"gt_heading\">\n    <td colspan=\"11\" class=\"gt_heading gt_title gt_font_normal\">Selected London Boroughs<\/td>\n  <\/tr>\n  <tr class=\"gt_heading\">\n    <td colspan=\"11\" class=\"gt_heading gt_subtitle gt_font_normal gt_bottom_border\">Terraced properties - Real prices (CPI-adjusted) - Q4 2025 vs 1, 2, 5, 10 years ago<\/td>\n  <\/tr>\n<tr class=\"gt_col_headings gt_spanner_row\">\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_left\" rowspan=\"2\" colspan=\"1\" scope=\"col\" id=\"\"><\/th>\n 
 <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2025-(Current)\">\n    <span class=\"gt_column_spanner\">2025 (Current)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2024-(1y-ago)\">\n    <span class=\"gt_column_spanner\">2024 (1y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2023-(2y-ago)\">\n    <span class=\"gt_column_spanner\">2023 (2y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2020-(5y-ago)\">\n    <span class=\"gt_column_spanner\">2020 (5y ago)<\/span>\n  <\/th>\n  <th class=\"gt_center gt_columns_top_border gt_column_spanner_outer\" rowspan=\"1\" colspan=\"2\" scope=\"colgroup\" id=\"2015-(10y-ago)\">\n    <span class=\"gt_column_spanner\">2015 (10y ago)<\/span>\n  <\/th>\n<\/tr>\n<tr class=\"gt_col_headings\">\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2025-(Current)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2025-(Current)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2024-(1y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2024-(1y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2023-(2y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" 
id=\"2023-(2y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2020-(5y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2020-(5y-ago)_RealChg\">%<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2015-(10y-ago)_Real\">Price (\u00a3)<\/th>\n  <th class=\"gt_col_heading gt_columns_bottom_border gt_right\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"2015-(10y-ago)_RealChg\">%<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"gt_table_body\">\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Ealing<\/th>\n    <td class=\"gt_row gt_right\">&#163;690,513<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;697,849<\/td>\n    <td style=\"color: #000000; background-color: #eec0be;\" class=\"gt_row gt_right\">\u22121.1<\/td>\n    <td class=\"gt_row gt_right\">&#163;690,002<\/td>\n    <td style=\"color: #000000; background-color: #efc6c3;\" class=\"gt_row gt_right\">0.1<\/td>\n    <td class=\"gt_row gt_right\">&#163;788,622<\/td>\n    <td style=\"color: #000000; background-color: #e58782;\" class=\"gt_row gt_right\">\u221212.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;800,316<\/td>\n    <td style=\"color: #000000; background-color: #e4817c;\" class=\"gt_row gt_right\">\u221213.7<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Hackney<\/th>\n    <td class=\"gt_row gt_right\">&#163;972,274<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;966,520<\/td>\n    <td style=\"color: #000000; background-color: #efc8c6;\" class=\"gt_row gt_right\">0.6<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,006,975<\/td>\n    <td style=\"color: 
#000000; background-color: #ecb4b1;\" class=\"gt_row gt_right\">\u22123.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,158,935<\/td>\n    <td style=\"color: #000000; background-color: #e2756f;\" class=\"gt_row gt_right\">\u221216.1<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,073,030<\/td>\n    <td style=\"color: #000000; background-color: #e79792;\" class=\"gt_row gt_right\">\u22129.4<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Hammersmith And Fulham<\/th>\n    <td class=\"gt_row gt_right\">&#163;1,111,278<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,221,472<\/td>\n    <td style=\"color: #000000; background-color: #e89894;\" class=\"gt_row gt_right\">\u22129.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,334,009<\/td>\n    <td style=\"color: #000000; background-color: #e2726c;\" class=\"gt_row gt_right\">\u221216.7<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,392,109<\/td>\n    <td style=\"color: #000000; background-color: #df615a;\" class=\"gt_row gt_right\">\u221220.2<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,547,731<\/td>\n    <td style=\"color: #FFFFFF; background-color: #d83930;\" class=\"gt_row gt_right\">\u221228.2<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Hounslow<\/th>\n    <td class=\"gt_row gt_right\">&#163;625,089<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;622,316<\/td>\n    <td style=\"color: #000000; background-color: #efc7c5;\" class=\"gt_row gt_right\">0.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;612,967<\/td>\n    <td style=\"color: #000000; background-color: #f1cfcd;\" class=\"gt_row gt_right\">2.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;704,956<\/td>\n    <td style=\"color: #000000; background-color: #e68d88;\" class=\"gt_row gt_right\">\u221211.3<\/td>\n    <td 
class=\"gt_row gt_right\">&#163;637,839<\/td>\n    <td style=\"color: #000000; background-color: #edbbb9;\" class=\"gt_row gt_right\">\u22122.0<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Islington<\/th>\n    <td class=\"gt_row gt_right\">&#163;1,124,856<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,175,162<\/td>\n    <td style=\"color: #000000; background-color: #ecb0ad;\" class=\"gt_row gt_right\">\u22124.3<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,214,208<\/td>\n    <td style=\"color: #000000; background-color: #e9a19d;\" class=\"gt_row gt_right\">\u22127.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,380,766<\/td>\n    <td style=\"color: #000000; background-color: #e06963;\" class=\"gt_row gt_right\">\u221218.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,380,624<\/td>\n    <td style=\"color: #000000; background-color: #e06963;\" class=\"gt_row gt_right\">\u221218.5<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Richmond Upon Thames<\/th>\n    <td class=\"gt_row gt_right\">&#163;890,118<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;935,936<\/td>\n    <td style=\"color: #000000; background-color: #ebadaa;\" class=\"gt_row gt_right\">\u22124.9<\/td>\n    <td class=\"gt_row gt_right\">&#163;918,218<\/td>\n    <td style=\"color: #000000; background-color: #edb6b3;\" class=\"gt_row gt_right\">\u22123.1<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,036,584<\/td>\n    <td style=\"color: #000000; background-color: #e47f7a;\" class=\"gt_row gt_right\">\u221214.1<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,033,227<\/td>\n    <td style=\"color: #000000; background-color: #e4807b;\" class=\"gt_row gt_right\">\u221213.9<\/td>\n  <\/tr>\n  <tr>\n    <th class=\"gt_row gt_left gt_stub\">Wandsworth<\/th>\n    <td 
class=\"gt_row gt_right\">&#163;965,788<\/td>\n    <td style=\"color: #000000; background-color: #efc5c3;\" class=\"gt_row gt_right\">0.0<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,000,163<\/td>\n    <td style=\"color: #000000; background-color: #ecb4b1;\" class=\"gt_row gt_right\">\u22123.4<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,050,418<\/td>\n    <td style=\"color: #000000; background-color: #e99d99;\" class=\"gt_row gt_right\">\u22128.1<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,214,261<\/td>\n    <td style=\"color: #000000; background-color: #df5f59;\" class=\"gt_row gt_right\">\u221220.5<\/td>\n    <td class=\"gt_row gt_right\">&#163;1,219,378<\/td>\n    <td style=\"color: #000000; background-color: #de5e57;\" class=\"gt_row gt_right\">\u221220.8<\/td>\n  <\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n <h1 id=\"winners-and-losers-in-real-terms-last-3-years\">\n  <a href=\"#winners-and-losers-in-real-terms-last-3-years\" class=\"header-link\">\n    Winners and Losers in real terms (last 3 years)\n  <\/a>\n<\/h1><p>Lastly, I want to see how the housing markets in the different London boroughs compare to each other. I am looking at prices from 3 years ago across all property types. (At first I wanted to use 5 years, but I preferred to steer clear of the COVID years, which might introduce short-term anomalies into the result.) 
I am comparing real prices even though for a relative comparison between boroughs nominal prices would be fine as well (they all experienced the same inflation).<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">selected<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;is_london_borough&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">==<\/span> <span class=\"kc\">True<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> <span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Quarter&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">isin<\/span><span class=\"p\">([<\/span><span class=\"s1\">&#39;2025Q4&#39;<\/span><span class=\"p\">,<\/span><span class=\"s1\">&#39;2022Q4&#39;<\/span><span class=\"p\">]))<\/span> <span class=\"o\">&amp;<\/span> <span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;PropertyType&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">==<\/span> <span class=\"s1\">&#39;All&#39;<\/span><span class=\"p\">)]<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_wide<\/span> <span class=\"o\">=<\/span> <span class=\"n\">selected<\/span><span class=\"o\">.<\/span><span class=\"n\">pivot<\/span><span class=\"p\">(<\/span><span class=\"n\">index<\/span><span class=\"o\">=<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;Region&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;PropertyType&#39;<\/span><span class=\"p\">],<\/span> <span 
class=\"n\">columns<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;Quarter&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;RealPrice&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">reset_index<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_wide<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;%_real_price_change&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">df_wide<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;2025Q4&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">df_wide<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;2022Q4&#39;<\/span><span class=\"p\">])<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">df_wide<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;2022Q4&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_wide<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;%_real_price_change&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">False<\/span><span class=\"p\">,<\/span><span class=\"n\">ignore_index<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: 
right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th>Quarter<\/th>\n      <th>Region<\/th>\n      <th>PropertyType<\/th>\n      <th>2022Q4<\/th>\n      <th>2025Q4<\/th>\n      <th>%_real_price_change<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>Lewisham<\/td>\n      <td>All<\/td>\n      <td>529841.12<\/td>\n      <td>503542.00<\/td>\n      <td>-0.05<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>Havering<\/td>\n      <td>All<\/td>\n      <td>479610.47<\/td>\n      <td>450760.00<\/td>\n      <td>-0.06<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>Southwark<\/td>\n      <td>All<\/td>\n      <td>647219.21<\/td>\n      <td>607299.00<\/td>\n      <td>-0.06<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>Waltham Forest<\/td>\n      <td>All<\/td>\n      <td>569427.61<\/td>\n      <td>533922.00<\/td>\n      <td>-0.06<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>Haringey<\/td>\n      <td>All<\/td>\n      <td>697080.71<\/td>\n      <td>649650.00<\/td>\n      <td>-0.07<\/td>\n    <\/tr>\n    <tr>\n      <th>5<\/th>\n      <td>Hackney<\/td>\n      <td>All<\/td>\n      <td>681376.13<\/td>\n      <td>627538.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>6<\/th>\n      <td>Hounslow<\/td>\n      <td>All<\/td>\n      <td>588525.22<\/td>\n      <td>540799.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>7<\/th>\n      <td>Sutton<\/td>\n      <td>All<\/td>\n      <td>493375.92<\/td>\n      <td>452726.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>8<\/th>\n      <td>Hillingdon<\/td>\n      <td>All<\/td>\n      <td>526316.99<\/td>\n      <td>482852.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>9<\/th>\n      <td>Barking And Dagenham<\/td>\n      <td>All<\/td>\n      <td>387895.94<\/td>\n      <td>354709.00<\/td>\n      <td>-0.09<\/td>\n    
<\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>You can see that even the best-performing borough, Lewisham, saw real house prices fall by 5% over the last 3 years. Overall, many of these top 10 boroughs lie further out from central London and are generally more affordable areas. Both Hackney and Southwark are more central, but have recently undergone gentrification, which might have counteracted the downward price pressure seen in other central boroughs.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_wide<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;%_real_price_change&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">False<\/span><span class=\"p\">,<\/span> <span class=\"n\">ignore_index<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">tail<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th>Quarter<\/th>\n      <th>Region<\/th>\n      <th>PropertyType<\/th>\n      <th>2022Q4<\/th>\n      <th>2025Q4<\/th>\n      <th>%_real_price_change<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>23<\/th>\n      <td>Barnet<\/td>\n      <td>All<\/td>\n      <td>706765.75<\/td>\n      <td>606006.00<\/td>\n      <td>-0.14<\/td>\n    <\/tr>\n    <tr>\n      <th>24<\/th>\n      
<td>Camden<\/td>\n      <td>All<\/td>\n      <td>928639.42<\/td>\n      <td>792454.00<\/td>\n      <td>-0.15<\/td>\n    <\/tr>\n    <tr>\n      <th>25<\/th>\n      <td>Croydon<\/td>\n      <td>All<\/td>\n      <td>464581.92<\/td>\n      <td>393479.00<\/td>\n      <td>-0.15<\/td>\n    <\/tr>\n    <tr>\n      <th>26<\/th>\n      <td>Lambeth<\/td>\n      <td>All<\/td>\n      <td>654875.76<\/td>\n      <td>550720.00<\/td>\n      <td>-0.16<\/td>\n    <\/tr>\n    <tr>\n      <th>27<\/th>\n      <td>Hammersmith And Fulham<\/td>\n      <td>All<\/td>\n      <td>883837.60<\/td>\n      <td>741308.00<\/td>\n      <td>-0.16<\/td>\n    <\/tr>\n    <tr>\n      <th>28<\/th>\n      <td>Wandsworth<\/td>\n      <td>All<\/td>\n      <td>834257.35<\/td>\n      <td>695867.00<\/td>\n      <td>-0.17<\/td>\n    <\/tr>\n    <tr>\n      <th>29<\/th>\n      <td>Tower Hamlets<\/td>\n      <td>All<\/td>\n      <td>595217.88<\/td>\n      <td>470209.00<\/td>\n      <td>-0.21<\/td>\n    <\/tr>\n    <tr>\n      <th>30<\/th>\n      <td>Kensington And Chelsea<\/td>\n      <td>All<\/td>\n      <td>1637183.90<\/td>\n      <td>1194726.00<\/td>\n      <td>-0.27<\/td>\n    <\/tr>\n    <tr>\n      <th>31<\/th>\n      <td>City Of Westminster<\/td>\n      <td>All<\/td>\n      <td>1273329.16<\/td>\n      <td>889935.00<\/td>\n      <td>-0.30<\/td>\n    <\/tr>\n    <tr>\n      <th>32<\/th>\n      <td>City Of London<\/td>\n      <td>All<\/td>\n      <td>993657.65<\/td>\n      <td>607399.00<\/td>\n      <td>-0.39<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>Some of the worst-performing boroughs over the last 3 years are the City of London, Westminster, Kensington &amp; Chelsea, and Hammersmith &amp; Fulham, which are all very expensive &lsquo;prestige&rsquo; areas.<\/p>\n <h2 id=\"conclusion\">\n  <a href=\"#conclusion\" class=\"header-link\">\n    Conclusion\n  <\/a>\n<\/h2><p>This analysis shows that a prospective home buyer or seller only gains limited information from a national-level house price 
statistic. Instead, I showed that there are, unsurprisingly, vast differences in price developments between property types and locations.<\/p>\n<p>Moreover, in real terms, almost none of the considered regions saw increasing house prices over the last 10 years (notable exception: Kent).<\/p>\n <h2 id=\"appendix\">\n  <a href=\"#appendix\" class=\"header-link\">\n    Appendix\n  <\/a>\n<\/h2><p>Since someone asked, here are the real price changes compared with 3 years ago for all London boroughs:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_wide<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;%_real_price_change&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">False<\/span><span class=\"p\">,<\/span> <span class=\"n\">ignore_index<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th>Quarter<\/th>\n      <th>Region<\/th>\n      <th>PropertyType<\/th>\n      <th>2022Q4<\/th>\n      <th>2025Q4<\/th>\n      <th>%_real_price_change<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>Lewisham<\/td>\n      <td>All<\/td>\n      <td>529841.12<\/td>\n      <td>503542.00<\/td>\n      <td>-0.05<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>Havering<\/td>\n      <td>All<\/td>\n      <td>479610.47<\/td>\n      <td>450760.00<\/td>\n   
   <td>-0.06<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>Southwark<\/td>\n      <td>All<\/td>\n      <td>647219.21<\/td>\n      <td>607299.00<\/td>\n      <td>-0.06<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>Waltham Forest<\/td>\n      <td>All<\/td>\n      <td>569427.61<\/td>\n      <td>533922.00<\/td>\n      <td>-0.06<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>Haringey<\/td>\n      <td>All<\/td>\n      <td>697080.71<\/td>\n      <td>649650.00<\/td>\n      <td>-0.07<\/td>\n    <\/tr>\n    <tr>\n      <th>5<\/th>\n      <td>Hackney<\/td>\n      <td>All<\/td>\n      <td>681376.13<\/td>\n      <td>627538.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>6<\/th>\n      <td>Hounslow<\/td>\n      <td>All<\/td>\n      <td>588525.22<\/td>\n      <td>540799.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>7<\/th>\n      <td>Sutton<\/td>\n      <td>All<\/td>\n      <td>493375.92<\/td>\n      <td>452726.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>8<\/th>\n      <td>Hillingdon<\/td>\n      <td>All<\/td>\n      <td>526316.99<\/td>\n      <td>482852.00<\/td>\n      <td>-0.08<\/td>\n    <\/tr>\n    <tr>\n      <th>9<\/th>\n      <td>Barking And Dagenham<\/td>\n      <td>All<\/td>\n      <td>387895.94<\/td>\n      <td>354709.00<\/td>\n      <td>-0.09<\/td>\n    <\/tr>\n    <tr>\n      <th>10<\/th>\n      <td>Merton<\/td>\n      <td>All<\/td>\n      <td>677772.16<\/td>\n      <td>618798.00<\/td>\n      <td>-0.09<\/td>\n    <\/tr>\n    <tr>\n      <th>11<\/th>\n      <td>Harrow<\/td>\n      <td>All<\/td>\n      <td>590633.55<\/td>\n      <td>538433.00<\/td>\n      <td>-0.09<\/td>\n    <\/tr>\n    <tr>\n      <th>12<\/th>\n      <td>Bromley<\/td>\n      <td>All<\/td>\n      <td>581127.96<\/td>\n      <td>529749.00<\/td>\n      <td>-0.09<\/td>\n    <\/tr>\n    <tr>\n      <th>13<\/th>\n      <td>Redbridge<\/td>\n      <td>All<\/td>\n      <td>524996.77<\/td>\n      <td>476764.00<\/td>\n     
 <td>-0.09<\/td>\n    <\/tr>\n    <tr>\n      <th>14<\/th>\n      <td>Richmond Upon Thames<\/td>\n      <td>All<\/td>\n      <td>870795.06<\/td>\n      <td>785086.00<\/td>\n      <td>-0.10<\/td>\n    <\/tr>\n    <tr>\n      <th>15<\/th>\n      <td>Enfield<\/td>\n      <td>All<\/td>\n      <td>533218.77<\/td>\n      <td>480183.00<\/td>\n      <td>-0.10<\/td>\n    <\/tr>\n    <tr>\n      <th>16<\/th>\n      <td>Bexley<\/td>\n      <td>All<\/td>\n      <td>456886.19<\/td>\n      <td>410002.00<\/td>\n      <td>-0.10<\/td>\n    <\/tr>\n    <tr>\n      <th>17<\/th>\n      <td>Islington<\/td>\n      <td>All<\/td>\n      <td>757742.76<\/td>\n      <td>677305.00<\/td>\n      <td>-0.11<\/td>\n    <\/tr>\n    <tr>\n      <th>18<\/th>\n      <td>Ealing<\/td>\n      <td>All<\/td>\n      <td>645069.13<\/td>\n      <td>572575.00<\/td>\n      <td>-0.11<\/td>\n    <\/tr>\n    <tr>\n      <th>19<\/th>\n      <td>Greenwich<\/td>\n      <td>All<\/td>\n      <td>533193.50<\/td>\n      <td>472599.00<\/td>\n      <td>-0.11<\/td>\n    <\/tr>\n    <tr>\n      <th>20<\/th>\n      <td>Brent<\/td>\n      <td>All<\/td>\n      <td>639480.63<\/td>\n      <td>558093.00<\/td>\n      <td>-0.13<\/td>\n    <\/tr>\n    <tr>\n      <th>21<\/th>\n      <td>Kingston Upon Thames<\/td>\n      <td>All<\/td>\n      <td>657636.32<\/td>\n      <td>573489.00<\/td>\n      <td>-0.13<\/td>\n    <\/tr>\n    <tr>\n      <th>22<\/th>\n      <td>Newham<\/td>\n      <td>All<\/td>\n      <td>471479.67<\/td>\n      <td>405808.00<\/td>\n      <td>-0.14<\/td>\n    <\/tr>\n    <tr>\n      <th>23<\/th>\n      <td>Barnet<\/td>\n      <td>All<\/td>\n      <td>706765.75<\/td>\n      <td>606006.00<\/td>\n      <td>-0.14<\/td>\n    <\/tr>\n    <tr>\n      <th>24<\/th>\n      <td>Camden<\/td>\n      <td>All<\/td>\n      <td>928639.42<\/td>\n      <td>792454.00<\/td>\n      <td>-0.15<\/td>\n    <\/tr>\n    <tr>\n      <th>25<\/th>\n      <td>Croydon<\/td>\n      <td>All<\/td>\n      <td>464581.92<\/td>\n      <td>393479.00<\/td>\n  
    <td>-0.15<\/td>\n    <\/tr>\n    <tr>\n      <th>26<\/th>\n      <td>Lambeth<\/td>\n      <td>All<\/td>\n      <td>654875.76<\/td>\n      <td>550720.00<\/td>\n      <td>-0.16<\/td>\n    <\/tr>\n    <tr>\n      <th>27<\/th>\n      <td>Hammersmith And Fulham<\/td>\n      <td>All<\/td>\n      <td>883837.60<\/td>\n      <td>741308.00<\/td>\n      <td>-0.16<\/td>\n    <\/tr>\n    <tr>\n      <th>28<\/th>\n      <td>Wandsworth<\/td>\n      <td>All<\/td>\n      <td>834257.35<\/td>\n      <td>695867.00<\/td>\n      <td>-0.17<\/td>\n    <\/tr>\n    <tr>\n      <th>29<\/th>\n      <td>Tower Hamlets<\/td>\n      <td>All<\/td>\n      <td>595217.88<\/td>\n      <td>470209.00<\/td>\n      <td>-0.21<\/td>\n    <\/tr>\n    <tr>\n      <th>30<\/th>\n      <td>Kensington And Chelsea<\/td>\n      <td>All<\/td>\n      <td>1637183.90<\/td>\n      <td>1194726.00<\/td>\n      <td>-0.27<\/td>\n    <\/tr>\n    <tr>\n      <th>31<\/th>\n      <td>City Of Westminster<\/td>\n      <td>All<\/td>\n      <td>1273329.16<\/td>\n      <td>889935.00<\/td>\n      <td>-0.30<\/td>\n    <\/tr>\n    <tr>\n      <th>32<\/th>\n      <td>City Of London<\/td>\n      <td>All<\/td>\n      <td>993657.65<\/td>\n      <td>607399.00<\/td>\n      <td>-0.39<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n <h2 id=\"jupyter-notebook\">\n  <a href=\"#jupyter-notebook\" class=\"header-link\">\n    Jupyter Notebook\n  <\/a>\n<\/h2><p>You can find the jupyter notebook for this post \n<a href=\"https:\/\/gitlab.com\/frankRi89\/blog\/-\/tree\/main\/notebooks\/uk-house-prices\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path 
fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n"},{"title":"Building a local data warehouse with DuckDB, dbt, and Superset","link":"\/posts\/duckdb-dbt-superset\/","pubDate":"Sun, 09 Mar 2025 15:00:00 +0000","guid":"\/posts\/duckdb-dbt-superset\/","description":"<p>In previous blog posts, I described two DuckDB use cases for data scientists and data engineers: \n<a href=\"\/posts\/duckdb-for-data-scientists\/\" \n>Querying remote parquet files\n<\/a> and \n<a href=\"\/posts\/duckdb-large-datasets\/\" \n>processing larger-than memory datasets\n<\/a>.<\/p>\n<p>Now I want to explore if I can set up a local and open-source version of the analytics data stack that we use at my company. At work, we are using Snowflake, dbt cloud, and Google&rsquo;s Looker which cost us several thousand EUR per month. 
I am going to use the following open-source tools in my local setup:<\/p>\n<table>\n  <thead>\n      <tr>\n          <th><\/th>\n          <th>cloud<\/th>\n          <th>local<\/th>\n      <\/tr>\n  <\/thead>\n  <tbody>\n      <tr>\n          <td>analytical database<\/td>\n          <td>Snowflake<\/td>\n          <td>DuckDB<\/td>\n      <\/tr>\n      <tr>\n          <td>data modelling tool<\/td>\n          <td>dbt Cloud<\/td>\n          <td>local dbt<\/td>\n      <\/tr>\n      <tr>\n          <td>dashboard tool<\/td>\n          <td>Looker<\/td>\n          <td>Apache Superset<\/td>\n      <\/tr>\n  <\/tbody>\n<\/table>\n<p>Figure 1 shows how the components work together.<\/p>\n<p>\n\n<figure>\n  <div class=\"image-wrapper\">\n  <img src=\"duckdb_dbt_superset_setup.png\" alt=\"Local data warehouse\" loading=\"lazy\" \/>\n  <figcaption>Figure 1. Components of my local data warehouse stack.<\/figcaption>\n  <\/div>\n<\/figure>\n<\/p>\n<p>I will use a toy visualization problem to demonstrate the setup. My goal is to load the race time dataset that I used in \n<a href=\"\/posts\/vatternrundan-results\/\" \n>Data analysis: V\u00e4tternrundan 2024 results\n<\/a> into DuckDB. I then use dbt to create data models and Superset to create an interactive dashboard to visualize the data.<\/p>\n <h2 id=\"loading-the-raw-data-into-duckdb\">\n  <a href=\"#loading-the-raw-data-into-duckdb\" class=\"header-link\">\n    Loading the raw data into DuckDB\n  <\/a>\n<\/h2><p>I want to use DuckDB for all data processing. Therefore, I will load the raw race time data into DuckDB.<\/p>\n<p>I start by installing DuckDB as my analytical database. 
Since I use homebrew as a package manager, I can run:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">brew install duckdb\n<\/span><\/span><\/code><\/pre><\/div><p>I then create a persistent database (a file on my machine):<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">duckdb database.duckdb\n<\/span><\/span><\/code><\/pre><\/div><p>and create a new DuckDB table from the parquet file that contains the raw data.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">CREATE<\/span><span class=\"w\"> <\/span><span class=\"k\">TABLE<\/span><span class=\"w\"> <\/span><span class=\"n\">vatternrundan<\/span><span class=\"w\"> <\/span><span class=\"k\">AS<\/span><span class=\"w\"> <\/span><span class=\"k\">select<\/span><span class=\"w\"> <\/span><span class=\"n\">startnumber<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">city<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">country<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">result_time<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">start_time<\/span><span class=\"w\"> <\/span><span class=\"k\">from<\/span><span class=\"w\"> <\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;results_vatternrundan24.parquet&#39;<\/span><span class=\"p\">);<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>To verify that everything works, I run:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span 
class=\"k\">describe<\/span><span class=\"w\"> <\/span><span class=\"k\">table<\/span><span class=\"w\"> <\/span><span class=\"n\">vatternrundan<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"err\">\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">column_name<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">column_type<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">  <\/span><span class=\"k\">null<\/span><span class=\"w\">   <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">   <\/span><span class=\"k\">key<\/span><span class=\"w\">   <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">default<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">  <\/span><span class=\"n\">extra<\/span><span class=\"w\">  <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u2502<\/span><span class=\"w\">   <\/span><span class=\"nb\">varchar<\/span><span class=\"w\">   <\/span><span class=\"err\">\u2502<\/span><span 
class=\"w\">   <\/span><span class=\"nb\">varchar<\/span><span class=\"w\">   <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">varchar<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">varchar<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">varchar<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">varchar<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">startnumber<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">BIGINT<\/span><span class=\"w\">      <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">YES<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    
<\/span><span class=\"err\">\u2502<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">city<\/span><span class=\"w\">        <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">VARCHAR<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">YES<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">country<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">VARCHAR<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">YES<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span 
class=\"n\">result_time<\/span><span class=\"w\"> <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">BIGINT<\/span><span class=\"w\">      <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">YES<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">start_time<\/span><span class=\"w\">  <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"nb\">VARCHAR<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"n\">YES<\/span><span class=\"w\">     <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\"> <\/span><span class=\"k\">NULL<\/span><span class=\"w\">    <\/span><span class=\"err\">\u2502<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span 
class=\"err\">\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div> <h2 id=\"setting-up-dbt-for-data-transformations\">\n  <a href=\"#setting-up-dbt-for-data-transformations\" class=\"header-link\">\n    Setting up dbt for data transformations\n  <\/a>\n<\/h2><p>I want to use dbt to create data models on top of the raw data. I start by creating a new poetry project to install dependencies:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">poetry init\n<\/span><\/span><span class=\"line\"><span class=\"cl\">poetry add dbt-duckdb\n<\/span><\/span><\/code><\/pre><\/div><p>Then I initialize the dbt project <code>local_warehouse<\/code> with:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">poetry run dbt init local_warehouse\n<\/span><\/span><\/code><\/pre><\/div><p>Next, I configure dbt to use DuckDB as the data processing backend. In my work setup, this would point at a Snowflake instance instead. 
I point at the local database file in the dbt profiles file <code>profiles.yml<\/code>:<\/p>\n<div class=\"highlight\" title=\"profiles.yml\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-yml\" data-lang=\"yml\"><span class=\"line\"><span class=\"cl\"><span class=\"nt\">local_warehouse<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">  <\/span><span class=\"nt\">outputs<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"nt\">dev<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">      <\/span><span class=\"nt\">type<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">duckdb<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">      <\/span><span class=\"nt\">path<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">..\/duckdb\/database.duckdb<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">  <\/span><span class=\"nt\">target<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">dev<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div> <h3 id=\"dbt-models\">\n  <a href=\"#dbt-models\" class=\"header-link\">\n    dbt models\n  <\/a>\n<\/h3><p>I define the DuckDB table <code>vatternrundan<\/code> as a dbt source in <code>sources.yml<\/code>:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-yml\" data-lang=\"yml\"><span class=\"line\"><span class=\"cl\"><span class=\"nt\">version<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"m\">2<\/span><span class=\"w\">\n<\/span><\/span><\/span><span 
class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"nt\">sources<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">  <\/span>- <span class=\"nt\">name<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">vatternrundan_db<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"nt\">schema<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">main <\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"nt\">tables<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span>- <span class=\"nt\">name<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">vatternrundan<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>This means I can now refer to this raw data in dbt.\nOn top of it, I want to create two more models: 
One thin staging layer over the raw data <code>stg_vatternrundan.sql<\/code>, and one refined model that aggregates the rider data by country <code>average_speed_by_country.sql<\/code>.<\/p>\n<p>The staging model is stored in the dbt project as <code>models\/staging\/stg_vatternrundan.sql<\/code>:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">select<\/span><span class=\"w\"> \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"n\">startnumber<\/span><span class=\"p\">::<\/span><span class=\"n\">int64<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">startnumber<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"n\">city<\/span><span class=\"p\">::<\/span><span class=\"nb\">varchar<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">city<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"n\">country<\/span><span class=\"p\">::<\/span><span class=\"nb\">varchar<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">country<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"n\">to_microseconds<\/span><span class=\"p\">((<\/span><span class=\"n\">result_time<\/span><span class=\"o\">\/<\/span><span class=\"mi\">1000<\/span><span class=\"p\">)::<\/span><span class=\"n\">int64<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span 
class=\"n\">result_time<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">from<\/span><span class=\"w\"> <\/span><span class=\"err\">{{<\/span><span class=\"w\"> <\/span><span class=\"k\">source<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;vatternrundan_db&#39;<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"s1\">&#39;vatternrundan&#39;<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"err\">}}<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>As you can see, I refer to the DuckDB source table, define the column types, and do some light conversions.<\/p>\n<p>The aggregation model is stored in <code>models\/refined\/average_speed_by_country.sql<\/code>:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">with<\/span><span class=\"w\"> <\/span><span class=\"n\">speeds<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"k\">select<\/span><span class=\"w\"> <\/span><span class=\"n\">startnumber<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"n\">country<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"mi\">60<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"n\">datepart<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;hours&#39;<\/span><span class=\"p\">,<\/span><span 
class=\"w\"> <\/span><span class=\"n\">result_time<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">datepart<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;minutes&#39;<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">result_time<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">result_time_minutes<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"mi\">315<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">60<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"n\">result_time_minutes<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">average_speed<\/span><span class=\"p\">,<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"k\">from<\/span><span class=\"w\"> <\/span><span class=\"err\">{{<\/span><span class=\"k\">ref<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;stg_vatternrundan&#39;<\/span><span class=\"p\">)<\/span><span class=\"err\">}}<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"p\">)<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">select<\/span><span class=\"w\"> <\/span><span class=\"n\">country<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"k\">count<\/span><span class=\"p\">(<\/span><span 
class=\"o\">*<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">num_riders<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">mean<\/span><span class=\"p\">(<\/span><span class=\"n\">average_speed<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">average_speed_of_country<\/span><span class=\"w\"> \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">from<\/span><span class=\"w\"> <\/span><span class=\"n\">speeds<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">group<\/span><span class=\"w\"> <\/span><span class=\"k\">by<\/span><span class=\"w\"> <\/span><span class=\"n\">country<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">order<\/span><span class=\"w\"> <\/span><span class=\"k\">by<\/span><span class=\"w\"> <\/span><span class=\"n\">average_speed_of_country<\/span><span class=\"w\"> <\/span><span class=\"k\">desc<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>I use this dbt model to compute the average rider speed for every country in the dataset.<\/p>\n<p>Now that I have defined my dbt models, I configure dbt to run them by adding the following to <code>dbt_project.yml<\/code>:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-yml\" data-lang=\"yml\"><span class=\"line\"><span class=\"cl\"><span class=\"nt\">models<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">  <\/span><span class=\"nt\">local_warehouse<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span 
class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"nt\">staging<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">      <\/span><span class=\"nt\">+schema<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">staging<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">      <\/span><span class=\"nt\">+materialized<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">table<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"nt\">refined<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">      <\/span><span class=\"nt\">+schema<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">refined<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">      <\/span><span class=\"nt\">+materialized<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">table<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>This tells dbt how to materialize the dbt models in DuckDB. 
So let&rsquo;s run dbt to build the models:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">poetry run dbt run --profiles-dir<span class=\"o\">=<\/span>.\n<\/span><\/span><\/code><\/pre><\/div> <h3 id=\"check-tables-were-created\">\n  <a href=\"#check-tables-were-created\" class=\"header-link\">\n    Check tables were created\n  <\/a>\n<\/h3><p>I can verify in DuckDB that these tables have been created.<\/p>\n<p>Run<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">duckdb database.duckdb\n<\/span><\/span><\/code><\/pre><\/div><p>followed by:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">show<\/span><span class=\"w\"> <\/span><span class=\"k\">all<\/span><span class=\"w\"> <\/span><span class=\"n\">tables<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>which should now show:<\/p>\n<pre tabindex=\"0\"><code>database.main_staging.stg_vatternrundan\ndatabase.main_refined.average_speed_by_country\n<\/code><\/pre> <h2 id=\"superset-for-data-analytics-dashboards\">\n  <a href=\"#superset-for-data-analytics-dashboards\" class=\"header-link\">\n    Superset for data analytics dashboards\n  <\/a>\n<\/h2><p>After modelling my data, I want to create a dashboard to visualize the data. I use Apache Superset as my dashboarding tool. Superset works with many analytical databases. 
However, setting it up to work with DuckDB is a bit clunky, and I had to troubleshoot quite a bit.<\/p>\n <h3 id=\"install-superset-with-duckdb-support\">\n  <a href=\"#install-superset-with-duckdb-support\" class=\"header-link\">\n    Install Superset with DuckDB support\n  <\/a>\n<\/h3><p>I follow the \n<a href=\"https:\/\/superset.apache.org\/docs\/quickstart\/\" \n\n    target=\"_blank\"\n    >Quickstart Guide\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> with some modifications. First, I download the git repository and check out the latest tagged commit:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">git clone https:\/\/github.com\/apache\/superset\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">cd<\/span> superset\n<\/span><\/span><span class=\"line\"><span class=\"cl\">git checkout tags\/4.1.1\n<\/span><\/span><\/code><\/pre><\/div><p>I need to make two modifications before building the container.<\/p>\n<ol>\n<li>Install <code>duckdb-engine<\/code> inside the container. This will later allow us to select DuckDB as a database in Superset. 
To do this, add<\/li>\n<\/ol>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-yml\" data-lang=\"yml\"><span class=\"line\"><span class=\"cl\"><span class=\"l\">RUN pip install duckdb-engine<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>to the <code>Dockerfile<\/code>.<\/p>\n<ol start=\"2\">\n<li>I need to make the DuckDB database file available in the Docker container. I add its path as a volume. Modify <code>docker-compose-image-tag.yml<\/code>:<\/li>\n<\/ol>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-yml\" data-lang=\"yml\"><span class=\"line\"><span class=\"cl\"><span class=\"nt\">superset<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"p\">[<\/span><span class=\"l\">...]<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"nt\">volumes<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">        <\/span>- <span class=\"l\">.\/docker:\/app\/docker<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">        <\/span>- <span class=\"l\">superset_home:\/app\/superset_home<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">        <\/span>- <span class=\"l\">&lt;local-machine-path-to-duckdb-database&gt;:\/app\/duckdb  <\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>and force the Superset container to build from the Dockerfile and not from the image:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-yml\" data-lang=\"yml\"><span class=\"line\"><span class=\"cl\"><span class=\"nt\">superset<\/span><span 
class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"p\">[<\/span><span class=\"l\">...]<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">    <\/span><span class=\"nt\">build<\/span><span class=\"p\">:<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">        <\/span><span class=\"nt\">context<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">.<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">        <\/span><span class=\"nt\">dockerfile<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l\">Dockerfile<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"c\">#   image: apache\/superset: comment out or remove<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>Then build the container with<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">docker compose build\n<\/span><\/span><\/code><\/pre><\/div><p>and start Superset with<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sh\" data-lang=\"sh\"><span class=\"line\"><span class=\"cl\">docker compose -f docker-compose-image-tag.yml up\n<\/span><\/span><\/code><\/pre><\/div><p>Then, I can access Superset in my browser at <code>http:\/\/localhost:8088<\/code> and log in with username <code>admin<\/code> and password <code>admin<\/code>.<\/p>\n <h3 id=\"configure-superset\">\n  <a href=\"#configure-superset\" class=\"header-link\">\n    Configure Superset\n  <\/a>\n<\/h3><p>Now, I can add DuckDB as a database in the Superset UI. 
I navigate to <code>Settings<\/code> -&gt; <code>Data<\/code> -&gt; <code>Database Connections<\/code> -&gt; <code>+ Database<\/code> -&gt; <code>DuckDB<\/code><\/p>\n<p>and add the path to the DuckDB database file as the <code>SQLAlchemy URI<\/code>:<\/p>\n<pre tabindex=\"0\"><code>duckdb:\/\/\/\/app\/duckdb\/database.duckdb?access_mode=READ_ONLY\n<\/code><\/pre><p>Clicking on <code>Test Connection<\/code> should respond with &ldquo;Connection looks good!&rdquo;.<\/p>\n <h3 id=\"add-datasets-in-superset\">\n  <a href=\"#add-datasets-in-superset\" class=\"header-link\">\n    Add datasets in Superset\n  <\/a>\n<\/h3><p>In the navbar under <code>Datasets<\/code> I can add the DuckDB tables <code>stg_vatternrundan<\/code> and <code>average_speed_by_country<\/code> as new Superset datasets.<\/p>\n <h3 id=\"create-dashboard-in-superset\">\n  <a href=\"#create-dashboard-in-superset\" class=\"header-link\">\n    Create dashboard in Superset\n  <\/a>\n<\/h3><p>Creating the dashboard is self-explanatory.\nIn the navbar, I select <code>Dashboards<\/code> and add a new Dashboard called &ldquo;V\u00e4tternrundan Dashboard&rdquo;. Next, I create two charts:<\/p>\n<ul>\n<li>number of riders per country<\/li>\n<li>average speed of riders per country<\/li>\n<\/ul>\n<p>Just select the relevant Superset Dataset, build the charts, and add them to the dashboard. Figure 2 shows how I did it.<\/p>\n<p>\n\n<figure>\n  <div class=\"image-wrapper\">\n  <img src=\"superset_dashboard.jpg\" alt=\"Superset Dashboard\" loading=\"lazy\" \/>\n  <figcaption>Figure 2. Visualization of V\u00e4tternrundan rider speed by country in Apache Superset.<\/figcaption>\n  <\/div>\n<\/figure>\n<\/p>\n <h2 id=\"conclusion\">\n  <a href=\"#conclusion\" class=\"header-link\">\n    Conclusion\n  <\/a>\n<\/h2><p>The combination of DuckDB, dbt, and Apache Superset is a local version of &ldquo;the modern data stack&rdquo;. 
This can be useful for personal projects and for experimenting with data transformations and visualizations.<\/p>\n<p>I found that the integration of DuckDB and dbt worked seamlessly. On the other hand, connecting Superset to DuckDB was clunky. Superset would benefit from better DuckDB support out of the box.<\/p>\n <h2 id=\"troubleshooting\">\n  <a href=\"#troubleshooting\" class=\"header-link\">\n    Troubleshooting\n  <\/a>\n<\/h2><ul>\n<li>DuckDB only allows one connection with read-write access, but multiple concurrent read-only connections. Because connections are read-write by default, make sure no more than one client (e.g. Superset and the DuckDB CLI in your terminal) is connected to your DuckDB database at the same time.<\/li>\n<li>Make sure to connect Superset to DuckDB in read-only mode, i.e. add <code>access_mode=READ_ONLY<\/code> to the SQLAlchemy URI.<\/li>\n<\/ul>\n"},{"title":"Book notes: Poor Charlie\u2019s Almanack","link":"\/posts\/poor-charlies-almanack-notes\/","pubDate":"Fri, 21 Feb 2025 00:00:00 +0000","guid":"\/posts\/poor-charlies-almanack-notes\/","description":"<p>The book is a collection of speeches that Charlie Munger, partner of Warren Buffett at Berkshire Hathaway, gave over the years (1986 - 2007) at universities and institutions.<\/p>\n<p>I found four interesting themes across the speeches:<\/p>\n <h2 id=\"advocating-interdisciplinary-collaboration-between-university-departments\">\n  <a href=\"#advocating-interdisciplinary-collaboration-between-university-departments\" class=\"header-link\">\n    Advocating interdisciplinary collaboration between university departments\n  <\/a>\n<\/h2><p>He believes that the social sciences, especially economics and psychology, focus on overly narrow theoretical problems. Academics suffer from a <em>man-with-a-hammer syndrome<\/em>. He suggests that they collaborate more across departments, e.g. economics borrowing from psychology, and within departments, e.g. 
macroeconomists shouldn&rsquo;t avoid microeconomic explanations. Moreover, the social sciences should try to incorporate findings and models from the hard sciences (physics, chemistry, etc.) and attribute them properly.<\/p>\n <h2 id=\"fraud-and-advice-for-endowment-funds\">\n  <a href=\"#fraud-and-advice-for-endowment-funds\" class=\"header-link\">\n    Fraud and advice for endowment funds\n  <\/a>\n<\/h2><p>In some of his talks, he uses fictional examples of companies to show the problems with accounting and financial engineering fraud.<\/p>\n<p>He thinks that charitable foundations and endowment funds should as much as possible avoid wasteful investment practices. He criticizes the trend of university endowment funds employing layers of analysts and consultants and investing through funds of funds. These multilayered systems erode a large chunk of the endowment funds&rsquo; returns when compared to simpler alternatives, like unlevered domestic equity indices. He directly criticizes the practices of his audience of consultants and fund managers in talk six, but he uses humour and self-deprecation to get away with it.<\/p>\n <h2 id=\"inverted-advice-for-graduates\">\n  <a href=\"#inverted-advice-for-graduates\" class=\"header-link\">\n    (Inverted) Advice for Graduates\n  <\/a>\n<\/h2><p>In his commencement speeches for university graduates, he gives advice on <strong>how to guarantee misery<\/strong> in life.<\/p>\n<ol>\n<li>Be unreliable.<\/li>\n<li>Don&rsquo;t learn from others&rsquo; mistakes. Instead, repeat the common mistakes of others (join a cult, drive while drunk, gamble) and don&rsquo;t learn from people who came before you.<\/li>\n<li>Give up after adversity and failures. Just give up when the inevitable hard times occur.<\/li>\n<li>Don&rsquo;t ever invert. Don&rsquo;t attempt to learn from thinking about achieving the opposite of your goals. 
Don&rsquo;t try to be objective.<\/li>\n<\/ol>\n <h2 id=\"mental-checklist-to-analyse-problems\">\n  <a href=\"#mental-checklist-to-analyse-problems\" class=\"header-link\">\n    Mental checklist to analyse problems\n  <\/a>\n<\/h2><p>A recurring topic in his speeches is a checklist of mental models and human biases. He developed this checklist over time and iterated through it when analysing a problem or evaluating an investment. Appropriate checklists and \n<a href=\"https:\/\/fs.blog\/inversion\/\" \n\n    target=\"_blank\"\n    >the inversion technique\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> are the two main tools that he promotes for better thinking.<\/p>\n<p>Here are 25 human biases from talk eleven: \n<a href=\"https:\/\/fs.blog\/great-talks\/psychology-human-misjudgment\/\" \n\n    target=\"_blank\"\n    >The Psychology of Human Misjudgment\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" 
xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n<ol>\n<li>\n<p><strong>Reward and Punishment Superresponse Tendency<\/strong>: If you want to persuade people, appeal to their own interests. Ensure the incentives of people you work with are aligned with the outcome you want. Don&rsquo;t reward them for metrics that they can easily game.<\/p>\n<\/li>\n<li>\n<p><strong>Liking\/Loving Tendency<\/strong>: People are seeking love and approval from other people. Moreover, we favour people and products that are merely associated with the target of our affection. We can use this to our advantage by liking truly admirable people or ideas.<\/p>\n<\/li>\n<li>\n<p><strong>Disliking\/Hating Tendency<\/strong>: People can have a tendency to dislike things that are different from them, as well as products or people associated with the object of their dislike.<\/p>\n<\/li>\n<li>\n<p><strong>Doubt-Avoidance Tendency<\/strong>: Our brains are conditioned to quickly remove doubt after reaching a first decision. This tendency should be countered by forcing a delay for reflection before an important decision, e.g. jury decisions in court.<\/p>\n<\/li>\n<li>\n<p><strong>Inconsistency-Avoidance Tendency<\/strong>: A tendency to stick to previous conclusions, habits, and ideas to avoid change. Practically, this means that it is much easier to prevent a bad habit than to change it. 
To counter this tendency, we should force the discussion of counterarguments before a decision can be made.\nThis tendency can be used to manipulate people (see also \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Ben_Franklin_effect\" \n\n    target=\"_blank\"\n    >Ben Franklin effect\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n                style=\"height: 0.7em; width: 0.7em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n                class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n                viewBox=\"0 0 640 512\">\n                <path fill=\"currentColor\"\n                    d=\"M640 51.2l-.3 12.2c-28.1 .8-45 15.8-55.8 40.3-25 57.8-103.3 240-155.3 358.6H415l-81.9-193.1c-32.5 63.6-68.3 130-99.2 193.1-.3 .3-15 0-15-.3C172 352.3 122.8 243.4 75.8 133.4 64.4 106.7 26.4 63.4 .2 63.7c0-3.1-.3-10-.3-14.2h161.9v13.9c-19.2 1.1-52.8 13.3-43.3 34.2 21.9 49.7 103.6 240.3 125.6 288.6 15-29.7 57.8-109.2 75.3-142.8-13.9-28.3-58.6-133.9-72.8-160-9.7-17.8-36.1-19.4-55.8-19.7V49.8l142.5 .3v13.1c-19.4 .6-38.1 7.8-29.4 26.1 18.9 40 30.6 68.1 48.1 104.7 5.6-10.8 34.7-69.4 48.1-100.8 8.9-20.6-3.9-28.6-38.6-29.4 .3-3.6 0-10.3 .3-13.6 44.4-.3 111.1-.3 123.1-.6v13.6c-22.5 .8-45.8 12.8-58.1 31.7l-59.2 122.8c6.4 16.1 63.3 142.8 69.2 156.7L559.2 91.8c-8.6-23.1-36.4-28.1-47.2-28.3V49.6l127.8 1.1 .2 .5z\">\n                <\/path>\n            <\/svg>\n        <\/span>\n        \n    \n<\/a>, \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Consistency_%28negotiation%29\" \n\n    target=\"_blank\"\n    >Cialdini&rsquo;s consistency principle\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n                style=\"height: 0.7em; width: 0.7em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n                class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" 
xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n                viewBox=\"0 0 640 512\">\n                <path fill=\"currentColor\"\n                    d=\"M640 51.2l-.3 12.2c-28.1 .8-45 15.8-55.8 40.3-25 57.8-103.3 240-155.3 358.6H415l-81.9-193.1c-32.5 63.6-68.3 130-99.2 193.1-.3 .3-15 0-15-.3C172 352.3 122.8 243.4 75.8 133.4 64.4 106.7 26.4 63.4 .2 63.7c0-3.1-.3-10-.3-14.2h161.9v13.9c-19.2 1.1-52.8 13.3-43.3 34.2 21.9 49.7 103.6 240.3 125.6 288.6 15-29.7 57.8-109.2 75.3-142.8-13.9-28.3-58.6-133.9-72.8-160-9.7-17.8-36.1-19.4-55.8-19.7V49.8l142.5 .3v13.1c-19.4 .6-38.1 7.8-29.4 26.1 18.9 40 30.6 68.1 48.1 104.7 5.6-10.8 34.7-69.4 48.1-100.8 8.9-20.6-3.9-28.6-38.6-29.4 .3-3.6 0-10.3 .3-13.6 44.4-.3 111.1-.3 123.1-.6v13.6c-22.5 .8-45.8 12.8-58.1 31.7l-59.2 122.8c6.4 16.1 63.3 142.8 69.2 156.7L559.2 91.8c-8.6-23.1-36.4-28.1-47.2-28.3V49.6l127.8 1.1 .2 .5z\">\n                <\/path>\n            <\/svg>\n        <\/span>\n        \n    \n<\/a>).<\/p>\n<\/li>\n<li>\n<p><strong>Curiosity Tendency<\/strong>: Humans have a general tendency to be curious, which can be supercharged with today&rsquo;s access to information. This fortunate tendency should be used to counteract other psychological tendencies.<\/p>\n<\/li>\n<li>\n<p><strong>Kantian Fairness Tendency<\/strong>: People have a tendency to behave fairly in direct interactions (following Kant&rsquo;s categorical imperative).<\/p>\n<\/li>\n<li>\n<p><strong>Envy\/Jealousy Tendency<\/strong>: People&rsquo;s tendency to envy someone else&rsquo;s status, wealth, or compensation. &ldquo;It is not greed that drives the world but envy.&rdquo; (Buffett)<\/p>\n<\/li>\n<li>\n<p><strong>Reciprocation Tendency<\/strong>: People have a tendency to reciprocate favours and disfavours. This can also be used for manipulation, e.g. 
a salesman could do you a small favour to get a much better outcome in a negotiation.<\/p>\n<\/li>\n<li>\n<p><strong>Influence-from-Mere-Association Tendency<\/strong>: Valuing something by its association with an unrelated factor, idea, or concept. Examples:<\/p>\n<ul>\n<li>association of quality with the highest price<\/li>\n<li>purchasing of luxury items to boost status<\/li>\n<li>advertising of products with unrelated but positive images<\/li>\n<li>associating one&rsquo;s ability with past successes and making bad decisions<\/li>\n<li>thinking worse of someone because they are a competitor<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Simple Pain-Avoiding Psychological Denial<\/strong>: Denying reality because it is too painful to accept, e.g. addiction, bankruptcy.<\/p>\n<\/li>\n<li>\n<p><strong>Excessive Self-Regard Tendency<\/strong>: People&rsquo;s tendency to overestimate their abilities, decisions, and possessions. Moreover, they prefer people who are similar to them.<\/p>\n<\/li>\n<li>\n<p><strong>Overoptimism Tendency<\/strong>: A tendency to be overly optimistic about the future, especially if one has done well in the past.<\/p>\n<\/li>\n<li>\n<p><strong>Deprival-Superreaction Tendency<\/strong>: Reacting more strongly to losses than to gains, e.g. the pain of losing $10 outweighs the pleasure of gaining $10. Irrational overreaction to threatened loss of status, territory, love, friendship, or property.<\/p>\n<\/li>\n<li>\n<p><strong>Social-Proof Tendency<\/strong>: A tendency to act and think the same way as people around you (groupthink).<\/p>\n<\/li>\n<li>\n<p><strong>Contrast-Misreaction Tendency<\/strong>: Making bad decisions by anchoring on an irrelevant comparison. 
Examples:<\/p>\n<ul>\n<li>Adding $1000 of useless add-ons to a car only because the car costs $65k.<\/li>\n<li>A real estate agent presenting 3 terrible and expensive houses, then showing a merely bad house to make it look more desirable.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Stress-Influence Tendency<\/strong>: Light stress can increase performance temporarily, while heavy stress can cause dysfunctional thinking and bad decision-making.<\/p>\n<\/li>\n<li>\n<p><strong>Availability-Misweighting Tendency<\/strong>: Tendency to overweight or overvalue things (people, decisions, work, ideas) that are close or readily accessible to you. This also holds for metrics that are easy to measure and actions that are easy to take. This can be countered by following checklists of actions and by considering more difficult (or less accessible) alternatives.<\/p>\n<\/li>\n<li>\n<p><strong>Use-It-or-Lose-It Tendency<\/strong>: Over time, our skills and knowledge fade. Therefore, we should deliberately train and repeat the skills we want to retain. Write them down as a checklist and work through them regularly.<\/p>\n<\/li>\n<li>\n<p><strong>Drug-Misinfluence Tendency<\/strong>: Most people can&rsquo;t handle drugs responsibly over a long period of time. It is not worth trying to prove that you can.<\/p>\n<\/li>\n<li>\n<p><strong>Senescence-Misinfluence Tendency:<\/strong> Older people have a harder time learning new skills. 
Knowing this, the best counter is to actively maintain the accumulated knowledge (see 19).<\/p>\n<\/li>\n<li>\n<p><strong>Authority-Misinfluence Tendency<\/strong>: Tendency to blindly follow the leader (see also: \n<a href=\"https:\/\/psychsafety.com\/the-hippo\/\" \n\n    target=\"_blank\"\n    >HiPPO effect\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>).<\/p>\n<\/li>\n<li>\n<p><strong>Twaddle Tendency<\/strong>: Some people waste time talking about things they are not experts in. Try to separate these people from the experts, and follow the experts.<\/p>\n<\/li>\n<li>\n<p><strong>Reason-Respecting Tendency:<\/strong> People can learn better when they can think through the reasons behind a directive or action. Therefore, when giving orders, explain your reasoning.<\/p>\n<\/li>\n<li>\n<p><strong>Lollapalooza Tendency<\/strong>: Often multiple human biases act together to drive a certain behaviour or outcome.<\/p>\n<\/li>\n<\/ol>\n <h2 id=\"conclusion\">\n  <a href=\"#conclusion\" class=\"header-link\">\n    Conclusion\n  <\/a>\n<\/h2><p>I gave only 3\/5 stars because the format of reading his talks didn&rsquo;t appeal to me. 
I was already aware of most of the human biases listed by Munger. Some are discussed in \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Thinking,_Fast_and_Slow\" \n\n    target=\"_blank\"\n    >Thinking, Fast and Slow (2011)\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n                style=\"height: 0.7em; width: 0.7em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n                class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n                viewBox=\"0 0 640 512\">\n                <path fill=\"currentColor\"\n                    d=\"M640 51.2l-.3 12.2c-28.1 .8-45 15.8-55.8 40.3-25 57.8-103.3 240-155.3 358.6H415l-81.9-193.1c-32.5 63.6-68.3 130-99.2 193.1-.3 .3-15 0-15-.3C172 352.3 122.8 243.4 75.8 133.4 64.4 106.7 26.4 63.4 .2 63.7c0-3.1-.3-10-.3-14.2h161.9v13.9c-19.2 1.1-52.8 13.3-43.3 34.2 21.9 49.7 103.6 240.3 125.6 288.6 15-29.7 57.8-109.2 75.3-142.8-13.9-28.3-58.6-133.9-72.8-160-9.7-17.8-36.1-19.4-55.8-19.7V49.8l142.5 .3v13.1c-19.4 .6-38.1 7.8-29.4 26.1 18.9 40 30.6 68.1 48.1 104.7 5.6-10.8 34.7-69.4 48.1-100.8 8.9-20.6-3.9-28.6-38.6-29.4 .3-3.6 0-10.3 .3-13.6 44.4-.3 111.1-.3 123.1-.6v13.6c-22.5 .8-45.8 12.8-58.1 31.7l-59.2 122.8c6.4 16.1 63.3 142.8 69.2 156.7L559.2 91.8c-8.6-23.1-36.4-28.1-47.2-28.3V49.6l127.8 1.1 .2 .5z\">\n                <\/path>\n            <\/svg>\n        <\/span>\n        \n    \n<\/a> and \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Influence:_Science_and_Practice\" \n\n    target=\"_blank\"\n    >Influence: Science and Practice (2001)\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n                style=\"height: 0.7em; width: 0.7em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n                class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n                viewBox=\"0 0 640 
512\">\n                <path fill=\"currentColor\"\n                    d=\"M640 51.2l-.3 12.2c-28.1 .8-45 15.8-55.8 40.3-25 57.8-103.3 240-155.3 358.6H415l-81.9-193.1c-32.5 63.6-68.3 130-99.2 193.1-.3 .3-15 0-15-.3C172 352.3 122.8 243.4 75.8 133.4 64.4 106.7 26.4 63.4 .2 63.7c0-3.1-.3-10-.3-14.2h161.9v13.9c-19.2 1.1-52.8 13.3-43.3 34.2 21.9 49.7 103.6 240.3 125.6 288.6 15-29.7 57.8-109.2 75.3-142.8-13.9-28.3-58.6-133.9-72.8-160-9.7-17.8-36.1-19.4-55.8-19.7V49.8l142.5 .3v13.1c-19.4 .6-38.1 7.8-29.4 26.1 18.9 40 30.6 68.1 48.1 104.7 5.6-10.8 34.7-69.4 48.1-100.8 8.9-20.6-3.9-28.6-38.6-29.4 .3-3.6 0-10.3 .3-13.6 44.4-.3 111.1-.3 123.1-.6v13.6c-22.5 .8-45.8 12.8-58.1 31.7l-59.2 122.8c6.4 16.1 63.3 142.8 69.2 156.7L559.2 91.8c-8.6-23.1-36.4-28.1-47.2-28.3V49.6l127.8 1.1 .2 .5z\">\n                <\/path>\n            <\/svg>\n        <\/span>\n        \n    \n<\/a>. However, what I found impressive is that he talked about and used these biases already 20 years ago, before they were widely discussed. I also liked that in many speeches he directly criticizes the audience that invited him to speak. 
However, he does it in a charming and humorous way.<\/p>\n"},{"title":"How related posts are computed","link":"\/posts\/how-recommendations-work\/","pubDate":"Sun, 09 Feb 2025 00:00:00 +0000","guid":"\/posts\/how-recommendations-work\/","description":"<p>The number next to related posts at the bottom of each page is the advertised post&rsquo;s &ldquo;similarity&rdquo; to the currently viewed page (from 1.0 to -1.0).<\/p>\n<p>I am using the following process to compute related posts locally:<\/p>\n<ol>\n<li>Summarize every post and TIL using local Llama 3.1 (<code>llama3.1:8b-instruct-q5_0<\/code> via Ollama) with the following prompt:<\/li>\n<\/ol>\n<div class=\"promptbox\">\n    <div class=\"promptbox-title\">\n        Prompt\n    <\/div>\n    <div class=\"promptbox-content\">\n        <p>You are an analyst and editor with many years of experience in reading and synthesizing content.<\/p>\n<p>Here is a blog post:<\/p>\n<p><code>&lt;BLOGPOST&gt;<\/code><\/p>\n<p>{ blog_post }<\/p>\n<p><code>&lt;\/BLOGPOST&gt;<\/code><\/p>\n<p>Please create a comprehensive and concise summary of the blog post. Focus on the main concepts, key details, and central arguments.<\/p>\n<p><code>&lt;INSTRUCTIONS&gt;<\/code><\/p>\n<ul>\n<li>Include any specific technologies, methods, or frameworks mentioned.<\/li>\n<li>Don&rsquo;t use more than 7 sentences.<\/li>\n<li>Respond in plaintext. Don&rsquo;t add formatting or linebreak characters to your response.<\/li>\n<li>Don&rsquo;t repeat the instructions of the task. 
Respond directly.<\/li>\n<\/ul>\n<p><code>&lt;\/INSTRUCTIONS&gt;<\/code><\/p>\n      <\/div> \n<\/div>\n<ol start=\"2\">\n<li>Embed the summary using ChromaDB&rsquo;s default embedding model <code>all-MiniLM-L6-v2<\/code> and store the embedding and metadata about the post in a persistent \n<a href=\"https:\/\/www.trychroma.com\/\" \n\n    target=\"_blank\"\n    >ChromaDB\n<\/a> vector database (a file on my computer).<\/li>\n<li>Compute the \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cosine_similarity\" \n\n    target=\"_blank\"\n    >cosine similarity\n<\/a> between the embeddings for each pair of posts. A score of 1.0 indicates parallel vectors (pointing in the same direction), a score of 0.0 orthogonal vectors, and a score of -1.0 opposite vectors.<\/li>\n<li>Write a YAML file that includes, for every post, a link to the most similar post and its similarity score.<\/li>\n<li>Use a Hugo partial to render the most similar posts from the YAML file at the bottom of each page.<\/li>\n<\/ol>\n <h3 id=\"noteworthy\">\n  <a href=\"#noteworthy\" class=\"header-link\">\n    Noteworthy\n  <\/a>\n<\/h3><ul>\n<li>Perhaps not surprisingly, Llama 3.1 8b performed much better than Llama 3.2 3b. The latter would hallucinate quite a few parts of the summary.<\/li>\n<li>I had to explicitly instruct the model not to restate the task. Otherwise, every summary would have started with &ldquo;Here is your concise blog post summary in not more than 7 sentences.&rdquo;, which would have created some artificial similarity in the embeddings.<\/li>\n<li>Most blog posts are shorter than 15,000 characters (~4,000 tokens), which I had to work out to configure the model&rsquo;s context window correctly.<\/li>\n<li>ChromaDB&rsquo;s default embedding model <code>all-MiniLM-L6-v2<\/code> has 384 dimensions and a maximum input sequence length of 256 BERT tokens (~100&ndash;150 words). 
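Steps 3 and 4 above boil down to very little code. Here is a minimal pure-Python sketch with made-up 3-dimensional vectors and illustrative slugs — the real embeddings have 384 dimensions, and in practice ChromaDB can also return distances directly from its query API:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" keyed by post slug
# (all-MiniLM-L6-v2 actually produces 384-dimensional vectors).
embeddings = {
    "4000-weeks": [0.9, 0.1, 0.1],
    "how-big-things-get-done": [0.8, 0.2, 0.1],
    "duckdb-large-datasets": [0.1, 0.9, 0.3],
}

# For every post, find the most similar other post and its score.
related = {
    slug: max(
        ((other, cosine_similarity(vec, other_vec))
         for other, other_vec in embeddings.items() if other != slug),
        key=lambda pair: pair[1],
    )
    for slug, vec in embeddings.items()
}
```

The `related` mapping is essentially what ends up in the YAML file that the Hugo partial reads.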
I checked some of the summaries and they were around 160&ndash;220 BERT tokens. There is the option to use another embedding model with a higher maximum sequence length. A good embedding model leaderboard to help with that choice is available \n<a href=\"https:\/\/huggingface.co\/spaces\/mteb\/leaderboard\" \n\n    target=\"_blank\"\n    >here\n<\/a>.<\/li>\n<\/ul>\n <h3 id=\"example\">\n  <a href=\"#example\" class=\"header-link\">\n    Example\n  <\/a>\n<\/h3><p>For the blog post \n<a href=\"\/posts\/4000-weeks\/\" \n>Book notes: Four Thousand Weeks by Oliver Burkeman\n<\/a> the LLM came up with this summary, which was then embedded:<\/p>\n<details class=\"llmbox llm-llama\">\n    <summary class=\"llmbox-title\">\n        Model: llama3.1:8b-instruct-q5_0 - response \n    <\/summary>\n    <div class=\"llmbox-content\">\n<p>&ldquo;The blog post discusses the book &ldquo;Four Thousand Weeks&rdquo; by Oliver Burkeman, which explores the concept of time and how to make the most of our limited lives. 
The author argues that modern society&rsquo;s focus on productivity and efficiency is flawed, as it leads to a never-ending cycle of work and stress. Instead, he advocates for embracing our finitude and focusing on the present moment. Key concepts include &ldquo;atelic activities&rdquo; (enjoying leisure time for its own sake), &ldquo;radical incrementalism&rdquo; (making small progress towards goals), and &ldquo;cosmic insignificance therapy&rdquo; (putting life&rsquo;s problems into perspective by considering the vastness of the universe). The author also suggests tools for embracing finitude, such as keeping a done list to focus on completed tasks. Overall, the book encourages readers to reevaluate their priorities and find meaning in the present moment.&rdquo;<\/p>\n<\/div>\n<\/details>\n <h3 id=\"embeddings\">\n  <a href=\"#embeddings\" class=\"header-link\">\n    Embeddings\n  <\/a>\n<\/h3><p>I can visualize the embeddings in two dimensions using \n<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.manifold.TSNE.html\" \n\n    target=\"_blank\"\n    >t-SNE\n<\/a>:<\/p>\n<p>\n\n<figure>\n  <div 
class=\"image-wrapper\">\n  <img src=\"embeddings_visualization.png\" alt=\"Embeddings in 2D space\" loading=\"lazy\" \/>\n  <figcaption>Figure 1. Plot of blog post summary embeddings in t-SNE space.<\/figcaption>\n  <\/div>\n<\/figure>\n<\/p>\n<p>The plot looks somewhat reasonable. The code-heavy jupyter notebook posts are at the bottom, clustered around <code>duckdb-large-datasets<\/code>. The book reviews (<code>4000-weeks<\/code>, <code>how_big_things-get-done<\/code>, <code>how-to-win-friends<\/code>) are fairly close together. SQL related posts are clustered at the top. On the other hand, I would have expected <code>reading-and-note-taking<\/code> to be closer to <code>writing-well<\/code>.<\/p>\n <h3 id=\"code\">\n  <a href=\"#code\" class=\"header-link\">\n    Code\n  <\/a>\n<\/h3><p>I am using \n<a href=\"https:\/\/gitlab.com\/frankRi89\/blog\/-\/blob\/04aca65a2c58058e76df07f121a232885549ce1e\/code\/posts-similarity\/src\/similarity_generator.py\" \n\n    target=\"_blank\"\n    >this langchain script\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> to compute the recommendations.<\/p>\n"},{"title":"Book notes: How Big Things Get Done by Bent Flyvbjerg and Dan 
Gardner","link":"\/posts\/how-big-things-get-done\/","pubDate":"Sun, 02 Feb 2025 00:00:00 +0000","guid":"\/posts\/how-big-things-get-done\/","description":"<p>\n<a href=\"https:\/\/itu.dk\/flyvbjerg\" \n\n    target=\"_blank\"\n    >Bent Flyvbjerg\n<\/a> is an economics professor at the IT University of Copenhagen. He maintains a database of megaprojects (power plants, opera halls, tunnels, airports) and their planned and realized timelines and budgets. He researches the reasons why modern megaprojects often fail to deliver on time and on budget. 
It has recently become quite popular to \n<a href=\"https:\/\/patrickcollison.com\/fast\" \n\n    target=\"_blank\"\n    >discuss\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> the apparent decrease in speed with which large-scale projects are realized. Examples are easy to find, e.g. 
the delayed \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/High_Speed_2\" \n\n    target=\"_blank\"\n    >High Speed 2 (HS2)\n<\/a> project or the time and cost overruns of the \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Construction_of_Berlin_Brandenburg_Airport\" \n\n    target=\"_blank\"\n    >Berlin Brandenburg Airport (BER)\n<\/a>. Counter-examples are less common (or less newsworthy), e.g. the Notre-Dame reconstruction, completed on schedule in 2024.<\/p>\n<p>In their book, Flyvbjerg\/Gardner explain the common reasons for cost and time overruns of megaprojects and how to mitigate them. I think we can apply the majority of their learnings to smaller home projects (a kitchen renovation) and work projects (a cloud migration) as well.<\/p>\n <h2 id=\"planning-is-cheap-building-is-expensive\">\n  <a href=\"#planning-is-cheap-building-is-expensive\" class=\"header-link\">\n    Planning is cheap, building is expensive\n  <\/a>\n<\/h2><p>In large construction projects, planning is a lot cheaper than building. Once a badly planned project is underway, unforeseen problems will be discovered. Delays tend to cause further delays. The total project time can in itself be a source of additional risk and cost.<\/p>\n<p>When politicians plan prestigious infrastructure projects, they have a bias for strategic misrepresentation of the cost and time to get support for it. 
Therefore, the government and the public need to scrutinize the project&rsquo;s goal and its planning to ensure that the plan is realistic and enough alternatives have been considered. Only get involved in such a project if it has the people and funds, including contingencies, to succeed.<\/p>\n<p>Since planning is a lot cheaper than building for large projects, Flyvbjerg\/Gardner advise investing significant time into the planning phase:<\/p>\n<ul>\n<li>understand the objective behind the proposed project (assume an outside view)<\/li>\n<li>explore alternatives and don&rsquo;t commit to the first available solution (resist quick action bias)<\/li>\n<li>don&rsquo;t forecast using the best-case scenario, instead use similar past projects to anchor your estimate<\/li>\n<li>experiment and iterate in the planning phase using digital modelling and simulations<\/li>\n<\/ul>\n <h3 id=\"understand-the-why\">\n  <a href=\"#understand-the-why\" class=\"header-link\">\n    Understand the why\n  <\/a>\n<\/h3><p><em>The project is not a goal in itself, it is how the goal is achieved<\/em>. We want to talk to the stakeholder to understand the goal. Is the project even the right approach? Good planning needs to explore the problem before jumping to the solution. This includes considering alternatives.<\/p>\n<p>There is a good example in the book about a bridge project. The original goal of the project is to connect an island with the mainland. By jumping to the bridge idea, the stakeholders ignored alternatives, e.g. a tunnel, ferries, a helipad. Or maybe a physical connection is not required. If it&rsquo;s about improved communication, maybe a high-speed broadband connection achieves the goal.<\/p>\n<p>Another framing of this idea in product development is to work backwards from the customer. 
Understand their needs and problems, before coming up with any solution.<\/p>\n <h3 id=\"reduce-uncertainty-via-experimentation\">\n  <a href=\"#reduce-uncertainty-via-experimentation\" class=\"header-link\">\n    Reduce uncertainty via experimentation\n  <\/a>\n<\/h3><p>A crucial part of the planning phase is to experiment with the solution. Ideally, we can simulate and iterate on the project, e.g. use 3D models of the building, low fidelity designs to show customers, or low effort versions of the animation movie we are trying to produce.<\/p>\n<p>The simulation ensures that the majority of aspects of our project are scrutinized before the building phase begins. We can assume that the project will run into problems, so we want most of the problems to occur during the <em>cheap<\/em> planning phase.<\/p>\n <h3 id=\"experience-in-people-and-technology\">\n  <a href=\"#experience-in-people-and-technology\" class=\"header-link\">\n    Experience in people and technology\n  <\/a>\n<\/h3><p>To maximize the chance of a successful project we should maximize <em>experience<\/em>. Flyvbjerg\/Gardner consider both experience in <em>key people<\/em> and in <em>technology<\/em>.\nWe should try to ensure that key people have experience in similar projects. Ask: &ldquo;Have they done it before?&rdquo;.<\/p>\n<blockquote>\n<p>&ldquo;Technology is &lsquo;frozen experience&rsquo;.&rdquo;<\/p>\n<\/blockquote>\n<p>All things being equal, we should use tried and tested off-the-shelf technology instead of shiny new technology. 
If we can, we should use existing designs and operational processes.<span class=\"sidenote-number\"><small class=\"sidenote\">This is the same idea behind McKinley&rsquo;s advice to \n<a href=\"https:\/\/mcfunley.com\/choose-boring-technology\" \n\n    target=\"_blank\"\n    >Choose Boring Technology\n<\/a>.<\/small><\/span><\/p>\n<p>The construction of the Empire State Building is used throughout the book as a positive example. One reason for this was the architect William Lamb, who insisted on using only proven technology and created a design that allowed repeatable, non-custom work steps. Moreover, the construction company Starrett Brothers and Eken had built several similar skyscrapers on time and on budget before.<\/p>\n <h3 id=\"reference-class-forecasting\">\n  <a href=\"#reference-class-forecasting\" class=\"header-link\">\n    Reference-class forecasting\n  <\/a>\n<\/h3><p>After the project is planned, we need to forecast the project duration and its cost. Unfortunately, planners often forecast too optimistically. Instead of deriving the forecast from the project alone, the authors suggest using <em>reference-class forecasting<\/em>. 
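The core of reference-class forecasting — anchoring the estimate on the outcomes of comparable past projects rather than on our own plan — fits in a few lines. The overrun ratios below are invented for illustration; in practice they would come from a database of comparable projects like Flyvbjerg's:

```python
# Hypothetical realized-cost / budgeted-cost ratios of comparable past projects.
reference_class = [1.05, 1.20, 1.10, 1.60, 1.15, 1.30, 1.00, 2.10]

bottom_up_estimate = 1_000_000  # our own (optimistic) project estimate

# Anchor on the reference class instead of on our own plan.
mean_overrun = sum(reference_class) / len(reference_class)
anchored_forecast = bottom_up_estimate * mean_overrun

# A more conservative anchor: the budget that 80% of comparable projects stayed within.
p80_overrun = sorted(reference_class)[int(0.8 * len(reference_class))]
p80_forecast = bottom_up_estimate * p80_overrun
```

The method's final step — adjusting from the anchor for genuinely project-specific factors — remains a judgment call.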
The idea is to find a set of comparable projects and to anchor our forecast on their outcomes, then adjust from that anchor.<\/p>\n<p>However, good forecasting cannot protect the project from fat-tailed risks. Since those risks can kill our project, we need to identify the (known) high-risk events and try to mitigate them.<\/p>\n<blockquote>\n<p>&ldquo;Successful project leaders focus every day on not losing, while keeping a keen eye on the [&hellip;] goal they are trying to achieve.&rdquo;<\/p>\n<\/blockquote>\n <h2 id=\"after-the-planning-is-done-build-quickly\">\n  <a href=\"#after-the-planning-is-done-build-quickly\" class=\"header-link\">\n    After the planning is done, build quickly\n  <\/a>\n<\/h2><p>The project duration is in itself a source of budget and time risk.\nAfter planning, simulating, and forecasting, we therefore need to act fast once the building phase starts. To do this, we should ideally use an experienced team and a modular building pattern.<\/p>\n <h3 id=\"build-with-an-experienced-team\">\n  <a href=\"#build-with-an-experienced-team\" class=\"header-link\">\n    Build with an experienced team\n  <\/a>\n<\/h3><p>If possible, hire what the authors call a masterbuilder: someone who has experience in similar projects and can pick the right team. If such a team doesn&rsquo;t exist, you or they need to create it.<\/p>\n<p>Moreover, we need to ensure that the incentives of participating contractors are aligned with ours by sharing risks and rewards, e.g. by paying a bonus for early completion. Don&rsquo;t always pick the contractor that submits the lowest bid, because the lowest bid doesn&rsquo;t necessarily lead to the lowest cost. 
Moreover, we will try to choose companies that we have successfully collaborated with before.<\/p>\n <h3 id=\"whats-our-lego\">\n  <a href=\"#whats-our-lego\" class=\"header-link\">\n    What&rsquo;s our Lego?\n  <\/a>\n<\/h3><p>Nuclear power plant construction belongs to the class of projects most likely to exceed cost and time budgets. Flyvbjerg\/Gardner argue that this is because you are building one large thing with few repeatable, standardized parts. Solar power plants are the opposite. The core ingredient, the solar modules, can be produced rapidly and repeatedly in factories, and then assembled in a modular fashion.<\/p>\n<p>When possible, we should try to build in modules. We can then produce and assemble the modules in a repeatable process. This delivers value in stages. After we have completed a module, we can use the learnings to iterate on the module&rsquo;s design and assembly process. So we should always ask ourselves:<\/p>\n<blockquote>\n<p>&ldquo;What&rsquo;s our basic building block, the thing we will repeatedly make, becoming smarter and better each time. What&rsquo;s our Lego?&rdquo;<\/p>\n<\/blockquote>\n <h2 id=\"additional-thoughts-about-the-book\">\n  <a href=\"#additional-thoughts-about-the-book\" class=\"header-link\">\n    Additional thoughts about the book\n  <\/a>\n<\/h2> <h3 id=\"incentives-and-identity\">\n  <a href=\"#incentives-and-identity\" class=\"header-link\">\n    Incentives and identity\n  <\/a>\n<\/h3><p>There is a section in the book that discusses why the Heathrow Terminal 5 project was completed on time. The authors argue that a large part was the alignment of incentives between the project managers, the contractors, and the workers. Contractors were aligned via contractual bonuses and established working relationships. Workers were treated well, their feedback was actively encouraged, and they felt they were contributing to a historic project in their country. 
Not part of the book, but the Notre-Dame reconstruction after the fire was also accomplished on time. I believe that the history of the building and the feeling of being part of a national project are very powerful motivators that aligned the participants of the project.<\/p>\n <h3 id=\"cost-overruns-by-project-type\">\n  <a href=\"#cost-overruns-by-project-type\" class=\"header-link\">\n    Cost overruns by project type\n  <\/a>\n<\/h3><p>I found the table in the book that shows the mean cost overrun by project type interesting.<\/p>\n<p>The table lists, for each project type, the mean (base-rate) cost overrun, the fatness of the distribution (the share of projects in the upper tail, i.e. with at least a 50% overrun), and the mean cost overrun of the projects in that tail.<\/p>\n<table>\n  <thead>\n      <tr>\n          <th style=\"text-align: center\">Project type<\/th>\n          <th style=\"text-align: center\">Mean cost overrun (%)<\/th>\n          <th style=\"text-align: center\">% of projects in tail (&gt;= 50% overrun)<\/th>\n          <th style=\"text-align: center\">Mean overrun of projects in tail (%)<\/th>\n      <\/tr>\n  <\/thead>\n  <tbody>\n      <tr>\n          <td style=\"text-align: center\">Nuclear storage<\/td>\n          <td style=\"text-align: center\">238<\/td>\n          <td style=\"text-align: center\">48<\/td>\n          <td style=\"text-align: center\">427<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Olympic Games<\/td>\n          <td style=\"text-align: center\">157<\/td>\n          <td style=\"text-align: center\">76<\/td>\n          <td style=\"text-align: center\">200<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Nuclear power<\/td>\n          <td style=\"text-align: center\">120<\/td>\n          <td style=\"text-align: center\">55<\/td>\n          <td style=\"text-align: center\">204<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">IT<\/td>\n          <td style=\"text-align: 
center\">73<\/td>\n          <td style=\"text-align: center\">18<\/td>\n          <td style=\"text-align: center\">447<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">&hellip;<\/td>\n          <td style=\"text-align: center\">&hellip;<\/td>\n          <td style=\"text-align: center\">&hellip;<\/td>\n          <td style=\"text-align: center\">&hellip;<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Buildings<\/td>\n          <td style=\"text-align: center\">62<\/td>\n          <td style=\"text-align: center\">39<\/td>\n          <td style=\"text-align: center\">206<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Rail<\/td>\n          <td style=\"text-align: center\">39<\/td>\n          <td style=\"text-align: center\">28<\/td>\n          <td style=\"text-align: center\">116<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Airport<\/td>\n          <td style=\"text-align: center\">39<\/td>\n          <td style=\"text-align: center\">43<\/td>\n          <td style=\"text-align: center\">88<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Tunnels<\/td>\n          <td style=\"text-align: center\">37<\/td>\n          <td style=\"text-align: center\">28<\/td>\n          <td style=\"text-align: center\">103<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">&hellip;<\/td>\n          <td style=\"text-align: center\">&hellip;<\/td>\n          <td style=\"text-align: center\">&hellip;<\/td>\n          <td style=\"text-align: center\">&hellip;<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Wind energy<\/td>\n          <td style=\"text-align: center\">13<\/td>\n          <td style=\"text-align: center\">7<\/td>\n          <td style=\"text-align: center\">97<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Energy transmission<\/td>\n          <td style=\"text-align: 
8<">
center\">8<\/td>\n          <td style=\"text-align: center\">4<\/td>\n          <td style=\"text-align: center\">166<\/td>\n      <\/tr>\n      <tr>\n          <td style=\"text-align: center\">Solar power<\/td>\n          <td style=\"text-align: center\">1<\/td>\n          <td style=\"text-align: center\">2<\/td>\n          <td style=\"text-align: center\">50<\/td>\n      <\/tr>\n  <\/tbody>\n<\/table>\n<p>It makes sense that (non-standard, non-modular) nuclear projects have the highest base rate for cost overruns. On the opposite side are wind energy and solar power projects, which can be pre-produced in factories and assembled on-site. Organizing the Olympic Games suffers from the fact that they are highly complex and usually held in a city that hasn&rsquo;t hosted them before (inexperience). As a tech worker, I am intrigued to see IT projects so high on the list. Moreover, when IT projects land in the tail, they incur a mean cost overrun of 447%, the highest among all project types.<\/p>\n <h3 id=\"learnings-for-software-projects\">\n  <a href=\"#learnings-for-software-projects\" class=\"header-link\">\n    Learnings for software projects\n  <\/a>\n<\/h3><p>How applicable is the advice to day-to-day data and software projects? I think much of it applies. I found that chunking larger projects into small value-delivering modules is a great way to ensure continuous progress (especially in large refactoring projects).<\/p>\n<p>The emphasis on planning vs. building is probably not as relevant in normal day-to-day software projects. There is less of a cost difference between planning and building a feature (in both cases mostly the software engineers&rsquo; time) compared to construction projects.<\/p>\n"},{"title":"LLM prompt heuristics that definitely maybe work","link":"\/posts\/prompt-heuristics\/","pubDate":"Fri, 11 Oct 2024 14:56:00 +0000","guid":"\/posts\/prompt-heuristics\/","description":"<p>Effective prompt writing for large language models continues to be a dark art. 
Having read the prompt engineering blogs from Meta, Anthropic, and OpenAI and watched some of Anthropic&rsquo;s prompt discussions online, I find it feels more like design than engineering. Even employees from the same lab don&rsquo;t agree on which tricks actually work. If you have time, watch \n<a href=\"https:\/\/www.youtube.com\/watch?v=hkhDdcM5V94&amp;pp=ygUqcHJvbXB0IGVuZ2luZWVyaW5nIG1hc3RlcmNsYXNzIGFpIGVuZ2luZWVy\" \n\n    target=\"_blank\"\n    >this video\n    \n\n        \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.9em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 448 512\">\n            <path fill=\"currentColor\"\n                d=\"M549.7 124.1c-6.3-23.7-24.8-42.3-48.3-48.6C458.8 64 288 64 288 64S117.2 64 74.6 75.5c-23.5 6.3-42 24.9-48.3 48.6-11.4 42.9-11.4 132.3-11.4 132.3s0 89.4 11.4 132.3c6.3 23.7 24.8 41.5 48.3 47.8C117.2 448 288 448 288 448s170.8 0 213.4-11.5c23.5-6.3 42-24.2 48.3-47.8 11.4-42.9 11.4-132.3 11.4-132.3s0-89.4-11.4-132.3zm-317.5 213.5V175.2l142.7 81.2-142.7 81.2z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n\n    \n    \n<\/a> of one of Anthropic&rsquo;s prompt engineers running a prompting masterclass. While he gives the audience good heuristics, he is careful not to make any definite statements.<\/p>\n<p>I don&rsquo;t believe it is a good investment to try to become a world-class prompter. All heuristics are highly dependent on the model architecture, training data, and training procedure. This means that with every iteration of the models, heuristics could become obsolete or even harmful to performance.<\/p>\n<p>However, the current attention-based architecture doesn&rsquo;t seem to be going anywhere\nsoon. 
Therefore, it is reasonable to expect that we can continue to use the prompt context to help the model move the embeddings of the user task in directions that carry nuance and information about the domain of the task.<\/p>\n<p>So I am collecting the advice into a few guidelines that I can use with the current models. I will caveat this by saying that in 90% of my use cases the model response is parsed by a human: me. Therefore, I\nam not as worried about hallucinations as someone who puts the model outputs in front of their customers or inside data parsing pipelines.<\/p>\n<p>Here is what seems to work in October 2024 with Claude Sonnet 3.5, GPT-4o, and\nLlama 3.2.<\/p>\n <h3 id=\"prompt-specificity\">\n  <a href=\"#prompt-specificity\" class=\"header-link\">\n    Prompt specificity\n  <\/a>\n<\/h3><ul>\n<li>Make the prompt as specific to the task as you can. This is probably the\nbiggest return on your time. One good piece of advice given in \n<a href=\"https:\/\/www.youtube.com\/watch?v=T9aRN5JkmL8&amp;t=1769s&amp;pp=ygUccHJvbXB0IGVuZ2luZWVyaW5nIGFudGhyb3BpYw%3D%3D\" \n\n    target=\"_blank\"\n    >this discussion\n    \n\n        \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.9em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 448 512\">\n            <path fill=\"currentColor\"\n                d=\"M549.7 124.1c-6.3-23.7-24.8-42.3-48.3-48.6C458.8 64 288 64 288 64S117.2 64 74.6 75.5c-23.5 6.3-42 24.9-48.3 48.6-11.4 42.9-11.4 132.3-11.4 132.3s0 89.4 11.4 132.3c6.3 23.7 24.8 41.5 48.3 47.8C117.2 448 288 448 288 448s170.8 0 213.4-11.5c23.5-6.3 42-24.2 48.3-47.8 11.4-42.9 11.4-132.3 11.4-132.3s0-89.4-11.4-132.3zm-317.5 213.5V175.2l142.7 81.2-142.7 81.2z\">\n            <\/path>\n        <\/svg>\n    
<\/span>\n\n    \n    \n<\/a> is to\nimagine printing out the prompt and giving it to a new hire at your company. Then see if they could solve the task. This forces you to give all necessary context and constraints of the task.<\/li>\n<li>Components of a role prompt: the role, the organisation it is part of, its\nperspective, and the perspective of the person or organisation being addressed.<\/li>\n<li>No lazy role prompts. Make the role and the context clear.\n<pre tabindex=\"0\"><code>Bad: You are a cab driver.\nGood: You are a cab driver, driving people as your full-time job in London for twenty years, and are knowledgeable about the city, its roads, and its sights. \n<\/code><\/pre><\/li>\n<li>If the model is used inside a product, tell it about the product:\n<pre tabindex=\"0\"><code>Bad: You are an assistant writing document summaries.\nGood: You are an assistant used in a product for law firms that summarizes\nlegal documents.\n<\/code><\/pre><\/li>\n<\/ul>\n <h3 id=\"order-of-prompt-components\">\n  <a href=\"#order-of-prompt-components\" class=\"header-link\">\n    Order of prompt components\n  <\/a>\n<\/h3><p>Arrange the components of your prompt in the following order:<\/p>\n<ol>\n<li>Specific role and context description (put into the system prompt if the model is used\ninteractively)<\/li>\n<li>Input data, e.g. documents, code snippets, CSV files<\/li>\n<li>Task description<\/li>\n<li>(Optional:) Good examples and bad examples<\/li>\n<li>Task constraints<\/li>\n<\/ol>\n <h3 id=\"use-xml-tags-for-prompt-components\">\n  <a href=\"#use-xml-tags-for-prompt-components\" class=\"header-link\">\n    Use XML tags for prompt components\n  <\/a>\n<\/h3><p>Use XML tags to separate the different components of a prompt.<\/p>\n<pre tabindex=\"0\"><code>You are an experienced post-doc at a reputable research institute in the US. 
You are an expert in the research field of the following paper:\n\n&lt;Research_paper&gt;\n{{research_paper}}  \n&lt;\/Research_paper&gt;\n\nYour task is to summarize the findings of the research paper given to you in\n&lt;Research_paper&gt; tags.\n\n&lt;Instructions&gt;\n- List between 1 and 5 of the most important findings of the paper. Don&#39;t list more\n  than 5 findings.\n- more instructions...\n&lt;\/Instructions&gt; \n<\/code><\/pre> <h3 id=\"using-examples-few-shot-prompting\">\n  <a href=\"#using-examples-few-shot-prompting\" class=\"header-link\">\n    Using examples (few-shot prompting)\n  <\/a>\n<\/h3><p>It can help to provide both good and bad examples for a specific task with an\nexplanation of why they are good or bad. In the above example we could provide\nan example research paper together with a summary of findings we wrote ourselves as a\ngood example.<\/p>\n <h3 id=\"instructions\">\n  <a href=\"#instructions\" class=\"header-link\">\n    Instructions\n  <\/a>\n<\/h3><ul>\n<li>Response length limit: instead of writing &ldquo;Be concise&rdquo;, tell the model what that means in your context, e.g. &ldquo;Answer in no more than 4 sentences.&rdquo;<\/li>\n<li>Avoid open-ended instructions.<\/li>\n<li>Instruct for style, formatting, and restrictions.<\/li>\n<li>Instruct it to provide sources of evidence to reduce hallucinations.<\/li>\n<li>Ask it to respond with a chain of thought to increase performance.\n<pre tabindex=\"0\"><code>You are a logician and love to solve logic puzzles. Carefully read the following puzzle.\n\n&lt;PUZZLE&gt;\nSimon is looking at Charlie. Charlie is looking at Sarah. You know that Simon is married and Sarah is not married. 
Is a married person looking at an unmarried person?\n&lt;\/PUZZLE&gt;\nLet&#39;s think step by step before giving an answer.\n<\/code><\/pre><\/li>\n<\/ul>\n <h3 id=\"grammar-and-style\">\n  <a href=\"#grammar-and-style\" class=\"header-link\">\n    Grammar and style\n  <\/a>\n<\/h3><p>Avoid typos and wrong punctuation as they <em>can<\/em> degrade the quality of the response.<\/p>\n <h3 id=\"parse-able-output\">\n  <a href=\"#parse-able-output\" class=\"header-link\">\n    Parse-able output\n  <\/a>\n<\/h3><p>LLMs often add a preamble at the beginning or an epilogue at the end of a response. If you want to force the model to respond only with valid JSON, you can:<\/p>\n<ol>\n<li>Use the &ldquo;Prefill Claude&rsquo;s response&rdquo; feature<\/li>\n<li>Ask the model to put the JSON into &lt;JSON&gt;&lt;\/JSON&gt; tags and then extract that\nblock from the response<\/li>\n<li>Prefill the response yourself by adding at the end of your prompt: <code>Here is the JSON: {<\/code>. The opening brace conditions the model to start the\nanswer with the first JSON key. You then need to prepend the &ldquo;{&rdquo; to the\nresponse to make it valid.<\/li>\n<\/ol>\n <h3 id=\"other-tricks\">\n  <a href=\"#other-tricks\" class=\"header-link\">\n    Other tricks\n  <\/a>\n<\/h3><ul>\n<li>\n<p>If your prompt includes logic that could be handled in code, handle it in\ncode.<\/p>\n<pre tabindex=\"0\"><code>Bad: You are part of a role-playing game that is used to train customer support agents at a Fortune 500 company. \n\nYou can assume any of the following roles based on user input. \n\nIf the user asks for Role1, assume the role of a customer asking questions about the company&#39;s products. 
If the user asks for Role2, assume the role of a helpful customer support agent.\n<\/code><\/pre><p>Good: Define a separate prompt for each role and use code to switch\nprompts based on the role the operator wants the model to assume.<\/p>\n<\/li>\n<li>\n<p>Give the model a way out if it doesn&rsquo;t know the answer.<\/p>\n<pre tabindex=\"0\"><code>Good: {{prompt}}\nIf something weird happens and you are unsure about what to do, simply print out\n&#34;UNSURE&#34;.\n<\/code><\/pre><\/li>\n<li>\n<p>Use a temperature of 0 for fact-based, less creative tasks.<\/p>\n<\/li>\n<\/ul>\n <h3 id=\"iterative-prompt-design\">\n  <a href=\"#iterative-prompt-design\" class=\"header-link\">\n    Iterative prompt design\n  <\/a>\n<\/h3><ul>\n<li>You can use the LLM to generate examples for a task and select the good\nexamples. Then use those examples in the prompt that is used &ldquo;in production&rdquo;.<\/li>\n<li>If the model responds incorrectly, tell it about the mistake and ask it how\nyou should modify the prompt.<\/li>\n<\/ul>\n"},{"title":"Processing 112M rows of steam reviews locally with DuckDB","link":"\/posts\/duckdb-large-datasets\/","pubDate":"Sat, 28 Sep 2024 13:43:40 +0000","guid":"\/posts\/duckdb-large-datasets\/","description":"<p>In \n<a href=\"\/posts\/duckdb-for-data-scientists\/\" \n>DuckDB use cases for data scientists: Querying remote S3 files\n<\/a> I wrote about how I use DuckDB as a convenient way to query data from CSV or Parquet files in S3. Another use case for data scientists and data engineers is DuckDB&rsquo;s ability to process larger-than-memory data on your local machine. 
For us data scientists this fills the gap between<\/p>\n<ul>\n<li>small data that you can transform with pandas<\/li>\n<li>big data that typically requires a multi-node processing engine like PySpark.<\/li>\n<\/ul>\n<p>In this post I am going to run some exploratory queries against this \n<a href=\"https:\/\/www.kaggle.com\/datasets\/kieranpoc\/steam-reviews\/\" \n\n    target=\"_blank\"\n    >kaggle Steam review dataset\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>. It contains 112M rows of Steam game reviews and comes as an uncompressed 46GB CSV file (17GB compressed). Let&rsquo;s find out how my MacBook copes with it. 
Later I am going to run a small comparison against \n<a href=\"https:\/\/docs.pola.rs\/\" \n\n    target=\"_blank\"\n    >polars\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>, a DataFrame library, which is often mentioned for out-of-memory processing.<span class=\"sidenote-number\"><small class=\"sidenote\">I am not comparing against pandas because its inability to work with large datasets was one of the reasons why I explored this topic. 
However, you could do this \n<a href=\"https:\/\/pandas.pydata.org\/docs\/user_guide\/scale.html#use-chunking\" \n\n    target=\"_blank\"\n    >with chunking\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/small><\/span><\/p>\n <h2 id=\"useful-commands-and-settings\">\n  <a href=\"#useful-commands-and-settings\" class=\"header-link\">\n    Useful commands and settings\n  <\/a>\n<\/h2><p>Before I start, here are some useful DuckDB commands and settings that I often use:<\/p>\n<table>\n  <thead>\n      <tr>\n          <th>Command<\/th>\n          <th>What is it good for?<\/th>\n      <\/tr>\n  <\/thead>\n  <tbody>\n      <tr>\n          <td><code>.timer on<\/code><\/td>\n          <td>Prints the execution time after each SQL command<\/td>\n      <\/tr>\n      <tr>\n          <td><code>FROM duckdb_memory();<\/code><\/td>\n          <td>Gives an overview of how much memory is used by DuckDB<\/td>\n      <\/tr>\n      <tr>\n          <td><code>SET enable_progress_bar = true;<\/code><\/td>\n          <td>Prints a progress bar for query runs<\/td>\n      <\/tr>\n      <tr>\n          <td><code>SET memory_limit = '2GB';<\/code><\/td>\n          <td>Controls how much data DuckDB 
can keep in RAM.<\/td>\n      <\/tr>\n      <tr>\n          <td><code>.mode line<\/code><\/td>\n          <td>Prints each row as one <code>column = value<\/code> pair per line (useful with many columns). Default: <code>.mode duckbox<\/code><\/td>\n      <\/tr>\n      <tr>\n          <td><code>SUMMARIZE (select *)<\/code><\/td>\n          <td>Summarizes the contents of a table.<\/td>\n      <\/tr>\n      <tr>\n          <td><code>.exit<\/code><\/td>\n          <td>Stops DuckDB.<\/td>\n      <\/tr>\n  <\/tbody>\n<\/table>\n <h2 id=\"steam-review-dataset\">\n  <a href=\"#steam-review-dataset\" class=\"header-link\">\n    Steam review dataset\n  <\/a>\n<\/h2><p>First, I want to familiarize myself with the dataset. I downloaded the 46GB file <code>all_reviews.csv<\/code> to my 2023 MacBook Pro M3 with 18GB RAM. After starting a non-persistent session with the <code>duckdb<\/code> command, I use the above commands to activate the timer and line mode. I then take a look at an example row with:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"w\"> <\/span><span class=\"k\">select<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"k\">from<\/span><span class=\"w\"> <\/span><span class=\"n\">read_csv<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;all_reviews.csv&#39;<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">ignore_errors<\/span><span class=\"o\">=<\/span><span class=\"k\">true<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">limit<\/span><span class=\"w\"> <\/span><span class=\"mi\">1<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><pre tabindex=\"0\"><code>              recommendationid = 148919893\n                         appid = 10\n                          game = Counter-Strike\n                
author_steamid = 76561199036724879\n        author_num_games_owned = 0\n            author_num_reviews = 3\n       author_playtime_forever = 197\nauthor_playtime_last_two_weeks = 197\n     author_playtime_at_review = 197\n            author_last_played = 1698336369\n                      language = russian\n                        review = \u0441\u0442\u0430\u0440\u043e\u0441\u0442\u044c\n             timestamp_created = 1698336397\n             timestamp_updated = 1698336397\n                      voted_up = 1\n                      votes_up = 0\n                   votes_funny = 0\n           weighted_vote_score = 0.0\n                 comment_count = 0\n                steam_purchase = 1\n             received_for_free = 0\n   written_during_early_access = 0\n         hidden_in_steam_china = 1\n          steam_china_location = \n<\/code><\/pre><p>This gives me an overview of the file columns and an idea of their content.<\/p>\n <h2 id=\"processing-the-file-directly\">\n  <a href=\"#processing-the-file-directly\" class=\"header-link\">\n    Processing the file directly\n  <\/a>\n<\/h2><p>Rather than slowly loading all data into DuckDB, I want to test DuckDB&rsquo;s stream processing capabilities. Instead of fully materializing the data in memory, the execution engine reads and processes the data in chunks.<span class=\"sidenote-number\"><small class=\"sidenote\">This is a useful feature if you want to convert larger-than-memory files from one format to another format, e.g. CSV to Parquet. 
But that&rsquo;s not what we are here for today.<\/small><\/span>.<\/p>\n<p>I am going to use two aggregation queries for my benchmark:<\/p>\n<ol>\n<li>\n<p>The first query counts the number of English reviews per steam account and sorts them from highest to lowest<\/p>\n<pre tabindex=\"0\"><code class=\"language-terminal\" data-lang=\"terminal\">D select author_steamid, \nCOUNT(*) AS num_reviews  \nFROM read_csv(&#39;all_reviews.csv&#39;, ignore_errors = true)  \nWHERE language = &#39;english&#39; \nGROUP BY author_steamid \nORDER BY num_reviews DESC;\n100% \u2595\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u258f \n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502  author_steamid   \u2502 num_reviews \u2502\n\u2502       int64       \u2502    int64    \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 76561198030784015 \u2502        9674 \u2502\n\u2502 76561198024340430 \u2502        5930 \u2502\n\u2502 76561198067298289 \u2502        5534 \u2502\n\u2502 76561198094803808 \u2502        4341 \u2502\n\u2502 76561198125392509 \u2502        4124 \u2502\n\u2502 76561198027267313 \u2502        4124 \u2502\n\u2502 76561197960373660 \u2502        3350 \u2502\n\u2502 76561197970602587 \u2502        3212 \u2502\n\u2502 76561197961017729 \u2502        2810 \u2502\n\u2502 76561198155150242 \u2502        2554 \u2502\n\u2502 
76561198066590240 \u2502        2490 \u2502\n\u2502 76561198045381877 \u2502        2280 \u2502\n\u2502 76561198069159152 \u2502        2216 \u2502\n\u2502 76561197960319772 \u2502        1978 \u2502\n\u2502 76561198062813911 \u2502        1940 \u2502\n\u2502 76561198025731804 \u2502        1928 \u2502\n\u2502 76561198137285867 \u2502        1898 \u2502\n\u2502 76561198043135631 \u2502        1883 \u2502\n\u2502 76561198036629241 \u2502        1875 \u2502\n\u2502 76561198055119582 \u2502        1843 \u2502\n\u2502         \u00b7         \u2502           \u00b7 \u2502\n\u2502         \u00b7         \u2502           \u00b7 \u2502\n\u2502         \u00b7         \u2502           \u00b7 \u2502\n\u2502 76561198111175247 \u2502           1 \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502    15324507 rows (40 shown)     \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\nRun Time (s): real 22.556 user 166.382276 sys 9.815767\n<\/code><\/pre><p>This query processes the 112M rows in a surprisingly short 22.5s. The most active Steam user has written 9674 game reviews. 
You can find them \n<a href=\"https:\/\/www.steamidfinder.com\/lookup\/76561198030784015\/\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n<\/li>\n<li>\n<p>I made the second query intentionally more complex. I filter for reviews that contain the word <code>love<\/code> and I also compute the mean of the <code>votes_up<\/code> score and the sum of the <code>votes_funny<\/code> score for each user. 
I then filter the aggregation using <code>having<\/code> for only users that have a <code>sum_votes_funny<\/code> score of more than 100.<\/p>\n<pre tabindex=\"0\"><code class=\"language-terminal\" data-lang=\"terminal\">D SELECT author_steamid, \n    COUNT(*) AS num_reviews,\n    avg(votes_up) as mean_votes_up, \n    sum(votes_funny) as sum_votes_funny,\n    FROM read_csv(&#39;all_reviews.csv&#39;, ignore_errors = true) \n    WHERE language = &#39;english&#39; and review LIKE &#39;%love%&#39;\n    GROUP BY author_steamid \n    having sum_votes_funny &gt; 100 \n    ORDER BY num_reviews DESC; \n100% \u2595\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u258f \n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502  author_steamid   \u2502 num_reviews \u2502   mean_votes_up    \u2502 sum_votes_funny \u2502\n\u2502       int64       \u2502    int64    \u2502       double       \u2502     int128      
\u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 76561198155150242 \u2502         777 \u2502   1.09009009009009 \u2502             362 \u2502\n\u2502 76561198042406453 \u2502         665 \u2502 13.478195488721804 \u2502             341 \u2502\n\u2502 76561198043135631 \u2502         475 \u2502 2.9410526315789474 \u2502             116 \u2502\n\u2502 76561198149437416 \u2502         352 \u2502 1.2386363636363635 \u2502             256 \u2502\n\u2502 76561198007343154 \u2502         342 \u2502  4.038011695906433 \u2502             375 \u2502\n\u2502 76561198066590240 \u2502         273 \u2502  9.293040293040294 \u2502             247 \u2502\n\u2502 76561197970314107 \u2502         241 \u2502  38.15767634854772 \u2502             319 \u2502\n\u2502 76561197961017729 \u2502         237 \u2502 10.278481012658228 \u2502             251 \u2502\n\u2502 76561197981638563 \u2502         231 \u2502 26.372294372294373 \u2502             509 \u2502\n\u2502 76561197972040704 \u2502         223 \u2502  8.560538116591928 \u2502             123 \u2502\n\u2502 76561197992694498 \u2502         223 \u2502   73.1390134529148 \u2502             759 \u2502\n\u2502 76561197970761123 \u2502         222 \u2502 22.603603603603602 \u2502             150 \u2502\n\u2502 76561198043609914 \u2502         217 \u2502 18.792626728110598 \u2502             129 \u2502\n\u2502 76561198053422627 \u2502         202 \u2502  17.04950495049505 \u2502             284 \u2502\n\u2502 76561198007888370 \u2502         196 \u2502  93.96938775510205 \u2502            3345 \u2502\n\u2502 76561198040884867 
\u2502         191 \u2502  38.41884816753927 \u2502             509 \u2502\n\u2502 76561198817597644 \u2502         166 \u2502 22.542168674698797 \u2502             119 \u2502\n\u2502 76561198356141989 \u2502         166 \u2502 19.246987951807228 \u2502             216 \u2502\n\u2502 76561198031599084 \u2502         165 \u2502 17.163636363636364 \u2502             171 \u2502\n\u2502 76561198011647032 \u2502         159 \u2502  6.345911949685535 \u2502             192 \u2502\n\u2502         \u00b7         \u2502           \u00b7 \u2502                \u00b7   \u2502              \u00b7  \u2502\n\u2502         \u00b7         \u2502           \u00b7 \u2502                \u00b7   \u2502              \u00b7  \u2502\n\u2502         \u00b7         \u2502           \u00b7 \u2502                \u00b7   \u2502              \u00b7  \u2502\n\u2502 76561198106232693 \u2502           1 \u2502               96.0 \u2502             155 \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 2383 rows (40 shown)                                         4 columns \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\nRun Time (s): real 26.648 user 213.432443 sys 
7.281861\n<\/code><\/pre><p>With ~26s this is on par with the previous query. User <code>76561198155150242<\/code> used the word &ldquo;love&rdquo; in 777 reviews and collected 362 funny upvotes across their reviews.<\/p>\n<\/li>\n<\/ol>\n<p>These timings show how incredibly convenient DuckDB is to analyse large datasets. &lt;30s is still in the realm where I can interactively work with the dataset and explore different queries.<span class=\"sidenote-number\"><small class=\"sidenote\">You can push DuckDB to the limit by using a blocking operator, e.g. a rank() window function. This query will be slow because the entire input needs to be buffered to compute the result.<\/small><\/span><\/p>\n<p>When I explore data it is more common that I work in a notebook instead of the CLI. Fortunately, I can run the same queries using the \n<a href=\"https:\/\/duckdb.org\/docs\/guides\/python\/install\" \n\n    target=\"_blank\"\n    >DuckDB Python client\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>. 
In the next section I am going to compare it against polars, another relatively new kid on the block for out-of-memory analytics.<\/p>\n <h2 id=\"working-in-a-jupyter-notebook\">\n  <a href=\"#working-in-a-jupyter-notebook\" class=\"header-link\">\n    Working in a Jupyter notebook\n  <\/a>\n<\/h2> <h3 id=\"duckdb-python-client\">\n  <a href=\"#duckdb-python-client\" class=\"header-link\">\n    DuckDB Python client\n  <\/a>\n<\/h3><p>I am going to write the code to run the above queries using the <code>duckdb<\/code> python package and return the results as a <code>pandas<\/code> dataframe.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">duckdb<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">pandas<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">pd<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Connect to DuckDB<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">conn<\/span> <span class=\"o\">=<\/span> <span class=\"n\">duckdb<\/span><span class=\"o\">.<\/span><span class=\"n\">connect<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;:memory:&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Define the queries<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">query1<\/span> <span class=\"o\">=<\/span> <span class=\"s2\">&#34;&#34;&#34;\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">SELECT author_steamid, COUNT(*) AS num_reviews 
\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">FROM read_csv(&#39;.\/steam_reviews\/all_reviews\/all_reviews.csv&#39;, ignore_errors = true) \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">WHERE language = &#39;english&#39; \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">GROUP BY author_steamid \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">ORDER BY num_reviews DESC \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">&#34;&#34;&#34;<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">query2<\/span> <span class=\"o\">=<\/span> <span class=\"s2\">&#34;&#34;&#34;\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">SELECT author_steamid, \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">COUNT(*) AS num_reviews,\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">avg(votes_up) as mean_votes_up, \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">sum(votes_funny) as sum_votes_funny,\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">FROM read_csv(&#39;.\/steam_reviews\/all_reviews\/all_reviews.csv&#39;, ignore_errors = true) \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">WHERE language = &#39;english&#39; and review LIKE &#39;<\/span><span class=\"si\">%lo<\/span><span class=\"s2\">ve%&#39;\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">GROUP BY author_steamid \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">having sum_votes_funny &gt; 100 \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">ORDER BY 
num_reviews DESC \n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"s2\">&#34;&#34;&#34;<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"o\">%%<\/span><span class=\"n\">timeit<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Execute the query<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">result1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">conn<\/span><span class=\"o\">.<\/span><span class=\"n\">execute<\/span><span class=\"p\">(<\/span><span class=\"n\">query1<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Convert the result to a pandas DataFrame<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">result1<\/span><span class=\"o\">.<\/span><span class=\"n\">df<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>16.6 s \u00b1 627 ms per loop (mean \u00b1 std. dev. 
of 7 runs, 1 loop each)\n<\/code><\/pre>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df1<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>author_steamid<\/th>\n      <th>num_reviews<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>76561198030784015<\/td>\n      <td>9674<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>76561198024340430<\/td>\n      <td>5930<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>76561198067298289<\/td>\n      <td>5534<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>76561198094803808<\/td>\n      <td>4341<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>76561198027267313<\/td>\n      <td>4124<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"o\">%%<\/span><span class=\"n\">timeit<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Execute the query<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">result2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">conn<\/span><span class=\"o\">.<\/span><span class=\"n\">execute<\/span><span 
class=\"p\">(<\/span><span class=\"n\">query2<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Convert the result to a pandas DataFrame<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">result2<\/span><span class=\"o\">.<\/span><span class=\"n\">df<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>19.9 s \u00b1 343 ms per loop (mean \u00b1 std. dev. of 7 runs, 1 loop each)\n<\/code><\/pre>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df2<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>author_steamid<\/th>\n      <th>num_reviews<\/th>\n      <th>mean_votes_up<\/th>\n      <th>sum_votes_funny<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>76561198155150242<\/td>\n      <td>777<\/td>\n      <td>1.090090<\/td>\n      <td>362.0<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>76561198042406453<\/td>\n      <td>665<\/td>\n      <td>13.478195<\/td>\n      <td>341.0<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>76561198043135631<\/td>\n      <td>475<\/td>\n      <td>2.941053<\/td>\n      <td>116.0<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>76561198149437416<\/td>\n      
<td>352<\/td>\n      <td>1.238636<\/td>\n      <td>256.0<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>76561198007343154<\/td>\n      <td>342<\/td>\n      <td>4.038012<\/td>\n      <td>375.0<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>Not surprisingly this gives the same results as the queries run from the CLI. Having the aggregated results as a pandas dataframe allows me to use it in downstream work. I don&rsquo;t need to learn another syntax.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"c1\">## clear memory <\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">gc<\/span> \n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"k\">del<\/span> <span class=\"n\">result1<\/span><span class=\"p\">,<\/span> <span class=\"n\">result2<\/span><span class=\"p\">,<\/span> <span class=\"n\">df1<\/span><span class=\"p\">,<\/span> <span class=\"n\">df2<\/span><span class=\"p\">;<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">gc<\/span><span class=\"o\">.<\/span><span class=\"n\">collect<\/span><span class=\"p\">();<\/span>\n<\/span><\/span><\/code><\/pre><\/div> <h3 id=\"polars\">\n  <a href=\"#polars\" class=\"header-link\">\n    Polars\n  <\/a>\n<\/h3><p>Since polars is often brought up as a faster and better pandas, I want to do a performance comparison. 
Polars supports lazy dataframes, which allows me to define operations on the dataframe without loading it fully into memory first.<\/p>\n<p>Below are the two equivalent queries written using polars syntax.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span>  <span class=\"nn\">polars<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">pl<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Create a lazy DataFrame<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_lazy<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">scan_csv<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;.\/steam_reviews\/all_reviews\/all_reviews.csv&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ignore_errors<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"o\">%%<\/span><span class=\"n\">time<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Define the query using lazy operations<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">result_polars1<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"n\">df_lazy<\/span><span class=\"o\">.<\/span><span class=\"n\">filter<\/span><span class=\"p\">(<\/span><span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">col<\/span><span 
class=\"p\">(<\/span><span class=\"s2\">&#34;language&#34;<\/span><span class=\"p\">)<\/span> <span class=\"o\">==<\/span> <span class=\"s2\">&#34;english&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">group_by<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;author_steamid&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">agg<\/span><span class=\"p\">(<\/span><span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">len<\/span><span class=\"p\">()<\/span><span class=\"o\">.<\/span><span class=\"n\">alias<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;num_reviews&#34;<\/span><span class=\"p\">))<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">sort<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;num_reviews&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">descending<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    \n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Execute the query and collect the results<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_polars1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">result_polars1<\/span><span class=\"o\">.<\/span><span class=\"n\">collect<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_polars1<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span 
class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>CPU times: user 1min 3s, sys: 22.1 s, total: 1min 26s\nWall time: 2min 7s\n<\/code><\/pre>\n<div><style>\n.dataframe > thead > tr,\n.dataframe > tbody > tr {\n  text-align: right;\n  white-space: pre-wrap;\n}\n<\/style>\n<small>shape: (5, 2)<\/small><table border=\"1\" class=\"dataframe\"><thead><tr><th>author_steamid<\/th><th>num_reviews<\/th><\/tr><tr><td>i64<\/td><td>u32<\/td><\/tr><\/thead><tbody><tr><td>76561198030784015<\/td><td>9822<\/td><\/tr><tr><td>76561198024340430<\/td><td>5983<\/td><\/tr><tr><td>76561198067298289<\/td><td>5577<\/td><\/tr><tr><td>76561198094803808<\/td><td>4408<\/td><\/tr><tr><td>76561198125392509<\/td><td>4203<\/td><\/tr><\/tbody><\/table><\/div>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"o\">%%<\/span><span class=\"n\">time<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">result_polars2<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"n\">df_lazy<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">filter<\/span><span class=\"p\">(<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"p\">(<\/span><span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">col<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;language&#34;<\/span><span class=\"p\">)<\/span> <span class=\"o\">==<\/span> <span class=\"s2\">&#34;english&#34;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> \n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">col<\/span><span 
class=\"p\">(<\/span><span class=\"s2\">&#34;review&#34;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">str<\/span><span class=\"o\">.<\/span><span class=\"n\">contains<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;love&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">group_by<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;author_steamid&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">agg<\/span><span class=\"p\">([<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">len<\/span><span class=\"p\">()<\/span><span class=\"o\">.<\/span><span class=\"n\">alias<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;num_reviews&#34;<\/span><span class=\"p\">),<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">col<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;votes_up&#34;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">mean<\/span><span class=\"p\">()<\/span><span class=\"o\">.<\/span><span class=\"n\">alias<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;mean_votes_up&#34;<\/span><span class=\"p\">),<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">col<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;votes_funny&#34;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">sum<\/span><span class=\"p\">()<\/span><span class=\"o\">.<\/span><span 
class=\"n\">alias<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;sum_votes_funny&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"p\">])<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">filter<\/span><span class=\"p\">(<\/span><span class=\"n\">pl<\/span><span class=\"o\">.<\/span><span class=\"n\">col<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;sum_votes_funny&#34;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&gt;<\/span> <span class=\"mi\">100<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"o\">.<\/span><span class=\"n\">sort<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;num_reviews&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">descending<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># Execute the query and collect the results<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_polars2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">result_polars2<\/span><span class=\"o\">.<\/span><span class=\"n\">collect<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df_polars2<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>&lt;timed exec&gt;:9: DeprecationWarning: `pl.count()` is deprecated. 
Please use `pl.len()` instead.\n\n\nCPU times: user 1min 21s, sys: 42.7 s, total: 2min 4s\nWall time: 5min 14s\n<\/code><\/pre>\n<div><style>\n.dataframe > thead > tr,\n.dataframe > tbody > tr {\n  text-align: right;\n  white-space: pre-wrap;\n}\n<\/style>\n<small>shape: (5, 4)<\/small><table border=\"1\" class=\"dataframe\"><thead><tr><th>author_steamid<\/th><th>num_reviews<\/th><th>mean_votes_up<\/th><th>sum_votes_funny<\/th><\/tr><tr><td>i64<\/td><td>u32<\/td><td>f64<\/td><td>i64<\/td><\/tr><\/thead><tbody><tr><td>76561198155150242<\/td><td>786<\/td><td>1.094148<\/td><td>369<\/td><\/tr><tr><td>76561198042406453<\/td><td>673<\/td><td>13.43685<\/td><td>345<\/td><\/tr><tr><td>76561198043135631<\/td><td>480<\/td><td>2.922917<\/td><td>116<\/td><\/tr><tr><td>76561198149437416<\/td><td>360<\/td><td>1.216667<\/td><td>256<\/td><\/tr><tr><td>76561198007343154<\/td><td>354<\/td><td>4.014124<\/td><td>381<\/td><\/tr><\/tbody><\/table><\/div>\n<p>Now we can compare DuckDB vs. polars execution times for the two queries:<\/p>\n<table>\n  <thead>\n      <tr>\n          <th>wall time<\/th>\n          <th>DuckDB<\/th>\n          <th>polars<\/th>\n      <\/tr>\n  <\/thead>\n  <tbody>\n      <tr>\n          <td>query 1<\/td>\n          <td>17s<\/td>\n          <td>2min 7s<\/td>\n      <\/tr>\n      <tr>\n          <td>query 2<\/td>\n          <td>20s<\/td>\n          <td>5min 14s<\/td>\n      <\/tr>\n  <\/tbody>\n<\/table>\n <h2 id=\"conclusion\">\n  <a href=\"#conclusion\" class=\"header-link\">\n    Conclusion\n  <\/a>\n<\/h2><p>This investigation shows that DuckDB is a powerful and convenient tool to process larger-than-memory datasets on a single machine. As a data scientist this is useful, because I can focus on exploring the data in the early exploration phase. I don&rsquo;t yet have to spend time setting up more complex tools or work on a remote machine. 
The quick comparison with polars also shows DuckDB&rsquo;s speed advantage and that it can be used as a drop-in replacement when aggregating data in Jupyter notebooks.<\/p>\n"},{"title":"Interest rate expectations","link":"\/posts\/interest-rate-expecations\/","pubDate":"Sat, 14 Sep 2024 00:00:00 +0000","guid":"\/posts\/interest-rate-expecations\/","description":"<p>This morning, the top headline on the front page of my weekly newspaper was:<\/p>\n<blockquote>\n<p><strong>Bets rise on bumper rate cut by Fed<\/strong><\/p>\n<p>Investors have sharply increased their bets on half percentage point interest rate cut by the Federal reserve next week as the US central bank prepares to lower borrowing cost for the first time in more than four years. Traders in swaps markets are pricing in a 43 percent chance the Fed will opt for a bumper cut in a bid to prevent high rates damaging the economy. -  <em>FT Weekend (14.9.2024)<\/em><\/p>\n<\/blockquote>\n<p>I often read news about expectations on how <em>the Fed<\/em> will set interest rates. I have a vague sense that they are derived from the prices of some interest-rate futures. I asked myself: if I wanted to look up the market&rsquo;s expectation for next week, where would I actually go? This detail is regularly glossed over in news articles. It turns out that the maths behind the expectation approximation is fairly simple. Let us work it out.<\/p>\n <h2 id=\"the-federal-funds-rate\">\n  <a href=\"#the-federal-funds-rate\" class=\"header-link\">\n    The Federal Funds Rate\n  <\/a>\n<\/h2><p>Banks are required to put a certain percentage of their deposits into accounts at a Federal Reserve Bank. This is to maintain liquidity to cover depositors&rsquo; withdrawals in the short term. This reserve requirement is loosely a percentage of the bank&rsquo;s deposits at the end of the day averaged over a two-week period. 
If a bank has excess reserve balances at the end of the day, it can lend them overnight to another bank that is short of reserves. Banks do this a lot and negotiate an interest rate for each overnight loan. The volume-weighted median rate of all these deals for the day is the effective federal funds rate (EFFR).<\/p>\n<p>The Fed&rsquo;s federal funds rate (FFR) target range is set by the Federal Open Market Committee and is the desired range for the EFFR. While the Fed can&rsquo;t set the EFFR directly, it can influence it via its own deposit interest rates or changes to the money supply. This tells us what today&rsquo;s FFR range and EFFR are (both are published by the Fed). But how do we compute the market&rsquo;s expectations of changes to the FFR range?<\/p>\n <h2 id=\"federal-funds-futures\">\n  <a href=\"#federal-funds-futures\" class=\"header-link\">\n    Federal Funds Futures\n  <\/a>\n<\/h2><p>We can infer expectations of FFR range changes at the next committee meeting by looking at how FFR futures contracts are priced. These are traded on the Chicago Mercantile Exchange (CME). 
You can view the monthly future quotes on the \n<a href=\"https:\/\/www.cmegroup.com\/markets\/interest-rates\/stirs\/30-day-federal-fund.quotes.html\" \n\n    target=\"_blank\"\n    >CME website\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n<p>These futures are used by banks and fixed-income portfolio managers to hedge against short-term interest rate fluctuations. The 30-day futures are monthly contracts that are settled on the last business day of every month. 
The contract price payable is the arithmetic mean of the daily EFFR during the contract month as \n<a href=\"https:\/\/www.newyorkfed.org\/markets\/reference-rates\/effr\" \n\n    target=\"_blank\"\n    >reported by\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> the Federal Reserve Bank of New York, subtracted from 100. For example, if the average interest rate was 3.5% for a given month, then the contract price would be 100 - 3.5 = 96.5. Unfortunately, the minimum contract size is the price times $4,167, which for this example would be 96.5 * $4,167 = $402,115.50. 
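<\/p>
<p>As a quick sanity check of this pricing convention, here is a minimal Python sketch (my own illustration; <code>future_quote<\/code> is a hypothetical helper, not part of any CME API):<\/p>

```python
# Sketch of the 30-day Fed Funds futures pricing convention:
# the quote is 100 minus the monthly average EFFR, and the
# contract value is the quote times $4,167.

def future_quote(avg_effr_pct: float) -> float:
    """Futures quote implied by a monthly average EFFR (in percent)."""
    return 100.0 - avg_effr_pct

quote = future_quote(3.5)        # 100 - 3.5 = 96.5
contract_value = quote * 4167    # dollar value of one contract
print(f"quote = {quote}, contract value = ${contract_value:,.2f}")
```

<p>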
A bit too high for my personal hedging needs.<\/p>\n <h2 id=\"what-do-the-federal-funds-futures-hold\">\n  <a href=\"#what-do-the-federal-funds-futures-hold\" class=\"header-link\">\n    What do the Federal Funds Future(s) hold?\n  <\/a>\n<\/h2><p>Now that we know about the EFFR and Federal Funds futures, we can collect all ingredients to compute the expectations:<\/p>\n<ol>\n<li>The current FFR range was set in the July meeting to 5.25% - 5.5%.<\/li>\n<li>The dates of the next committee meetings are published here on \n<a href=\"https:\/\/www.federalreserve.gov\/monetarypolicy\/fomccalendars.htm\" \n\n    target=\"_blank\"\n    >federalreserve.gov\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>. The remaining meetings this year are scheduled for: 18.9.2024, 7.11.2024, 18.12.2024.<\/li>\n<li>The expected EFFR from the futures contracts for the month of the next committee meeting and the following month. 
For the remaining year they were on the 14.9.2024:<\/li>\n<\/ol>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">pandas<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">pd<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">DataFrame<\/span><span class=\"p\">(<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"p\">{<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"s1\">&#39;MONTH&#39;<\/span> <span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"s1\">&#39;09-2024&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;10-2024&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;11-2024&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;12-2024&#39;<\/span><span class=\"p\">],<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"s1\">&#39;PRICE&#39;<\/span> <span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mf\">94.81<\/span><span class=\"p\">,<\/span> <span class=\"mf\">95.03<\/span><span class=\"p\">,<\/span> <span class=\"mf\">95.315<\/span><span class=\"p\">,<\/span> <span class=\"mf\">95.58<\/span><span class=\"p\">],<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">        <span class=\"s1\">&#39;MEETING&#39;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mi\">18<\/span><span class=\"p\">,<\/span> <span class=\"kc\">None<\/span><span class=\"p\">,<\/span> <span class=\"mi\">7<\/span><span class=\"p\">,<\/span> 
<span class=\"mi\">18<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">     <span class=\"p\">}<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">set_index<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;MONTH&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">display<\/span><span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>PRICE<\/th>\n      <th>MEETING<\/th>\n    <\/tr>\n    <tr>\n      <th>MONTH<\/th>\n      <th><\/th>\n      <th><\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>09-2024<\/th>\n      <td>94.810<\/td>\n      <td>18.0<\/td>\n    <\/tr>\n    <tr>\n      <th>10-2024<\/th>\n      <td>95.030<\/td>\n      <td>NaN<\/td>\n    <\/tr>\n    <tr>\n      <th>11-2024<\/th>\n      <td>95.315<\/td>\n      <td>7.0<\/td>\n    <\/tr>\n    <tr>\n      <th>12-2024<\/th>\n      <td>95.580<\/td>\n      <td>18.0<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>The next month without a committee meeting is October. This means that October&rsquo;s average EFFR can only be impacted by the September committee meeting. 
This means that the October futures price of $95.03 implies an expected average EFFR of 100 - 95.03 = 4.97% for October.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">r_october_avg<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">100<\/span> <span class=\"o\">-<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">at<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;10-2024&#39;<\/span><span class=\"p\">,<\/span><span class=\"s1\">&#39;PRICE&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">r_september_avg<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">100<\/span> <span class=\"o\">-<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">at<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;09-2024&#39;<\/span><span class=\"p\">,<\/span><span class=\"s1\">&#39;PRICE&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>This rate is also the expected EFFR on the last day of September (and the first day of November).<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">r_september_end<\/span> <span class=\"o\">=<\/span> <span class=\"n\">r_october_avg<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>We are interested in the implied difference in EFFR between the beginning and the end of the month. If there is a difference, it would imply that the Fed committee changed the FFR range.<\/p>\n<p>Keep in mind that the contract price reflects the arithmetic mean of the realized EFFR on every day in September. Assume that the Fed committee does lower the rate on the 18th of the month; then we will have 18 days at the higher rate and 30-18 = 12 days at the lower rate. 
All of that is captured in the future price (the average EFFR) for September.<\/p>\n<p>Therefore we can use:<\/p>\n<p>\\[\nEFFR(\\text{average of Sep}) = \\frac{N}{30} EFFR(\\text{start of Sep}) + \\frac{30-N}{30} EFFR(\\text{end of Sep})\n\\]<\/p>\n<p>\\[\n\\Leftrightarrow EFFR(\\text{start of Sep}) = \\left( EFFR(\\text{average of Sep}) - \\frac{30-N}{30}\\cdot EFFR(\\text{end of Sep})\\right) \\cdot \\frac{30}{N}\n\\]<\/p>\n<p>where \\(N\\) is the number of days before the committee meeting in September.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">days_month<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">30<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">days_before_meeting<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">at<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;09-2024&#39;<\/span><span class=\"p\">,<\/span><span class=\"s1\">&#39;MEETING&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">days_after_meeting<\/span> <span class=\"o\">=<\/span> <span class=\"n\">days_month<\/span> <span class=\"o\">-<\/span> <span class=\"n\">days_before_meeting<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">r_september_start<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">r_september_avg<\/span> <span class=\"o\">-<\/span> <span class=\"p\">(<\/span><span class=\"n\">days_after_meeting<\/span><span class=\"o\">\/<\/span><span class=\"n\">days_month<\/span> <span class=\"o\">*<\/span> <span class=\"n\">r_september_end<\/span><span class=\"p\">)<\/span> <span class=\"p\">)<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"n\">days_month<\/span><span 
class=\"o\">\/<\/span><span class=\"n\">days_before_meeting<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s1\">&#39;The expected EFFR for the start of September is <\/span><span class=\"si\">{<\/span><span class=\"n\">r_september_start<\/span><span class=\"si\">:<\/span><span class=\"s1\">.2f<\/span><span class=\"si\">}<\/span><span class=\"s1\">%.&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>The expected EFFR for the start of September is 5.34%.\n<\/code><\/pre>\n<p>Since today is the 14th of September, we can already look up the realized EFFR up until today. If today were the 25th of August, then we would need to calculate the expected \\(EFFR(\\text{start of Sep})\\) with the above formula.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">r_september_change<\/span> <span class=\"o\">=<\/span> <span class=\"n\">r_september_end<\/span> <span class=\"o\">-<\/span> <span class=\"n\">r_september_start<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s1\">&#39;Comparing the expected EFFR at beginning and end of month we have a delta of <\/span><span class=\"si\">{<\/span><span class=\"n\">r_september_change<\/span><span class=\"si\">:<\/span><span class=\"s1\">.2f<\/span><span class=\"si\">}<\/span><span class=\"s1\">.&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>Comparing the expected EFFR at beginning and end of month we have a delta of -0.37.\n<\/code><\/pre>\n<p>The market&rsquo;s expectation for September is that we will have a 37bps lower EFFR at the end of the month. 
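As a cross-check, the bootstrapping steps above can be condensed into one self-contained snippet (the futures prices and meeting day are hardcoded from the table; every month is assumed to have 30 days, as in the post):

```python
# Recap of the calculation above with the table values hardcoded.
price_sep, price_oct, meeting_day = 94.81, 95.03, 18
days = 30  # simplifying assumption: every month has 30 days

r_sep_avg = 100 - price_sep  # September futures imply the average EFFR
r_sep_end = 100 - price_oct  # October has no meeting, so its average
                             # equals the end-of-September rate
n = meeting_day              # days before the September meeting
r_sep_start = (r_sep_avg - (days - n) / days * r_sep_end) * (days / n)

print(f"start: {r_sep_start:.2f}%, implied change: {r_sep_end - r_sep_start:.2f}")
# start: 5.34%, implied change: -0.37
```
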
If we buy the futures contract today, we will make money if the FED committee cuts the rate in such a way that the realized EFFR decreases by more than 37bps compared to the beginning of the month.<\/p>\n<p>Traditionally, the committee changes the FFR range by 25bps (however, 50 and 100 basis point changes do occur, e.g. after 9\/11, after the 2007-2008 housing market crash, and during the COVID-19 pandemic).<\/p>\n<p>However, we can see that the market prices in a 37bps lower EFFR. This means we can calculate the probability of a 25bps cut vs. a 50bps cut.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s1\">&#39;The price implies <\/span><span class=\"si\">{<\/span><span class=\"n\">r_september_change<\/span> <span class=\"o\">\/<\/span> <span class=\"o\">-<\/span><span class=\"mf\">0.25<\/span><span class=\"si\">:<\/span><span class=\"s1\">.2f<\/span><span class=\"si\">}<\/span><span class=\"s1\"> x 25bps rate cuts.&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>The price implies 1.47 x 25bps rate cuts.\n<\/code><\/pre>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">probability_of_25_bps_cut<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span> <span class=\"o\">-<\/span> <span class=\"p\">(<\/span><span class=\"n\">r_september_change<\/span><span class=\"o\">\/<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"mf\">0.25<\/span><span class=\"p\">)<\/span> <span class=\"o\">%<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span 
class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s1\">&#39;This implies a probability of <\/span><span class=\"si\">{<\/span><span class=\"mi\">100<\/span> <span class=\"o\">*<\/span> <span class=\"n\">probability_of_25_bps_cut<\/span><span class=\"si\">:<\/span><span class=\"s1\">.2f<\/span><span class=\"si\">}<\/span><span class=\"s1\">% that the committee announces a 25bps rate cut.&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>This implies a probability of 53.33% that the committee announces a 25bps rate cut.\n<\/code><\/pre>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">probability_of_50_bps_cut<\/span> <span class=\"o\">=<\/span>  <span class=\"p\">(<\/span><span class=\"n\">r_september_change<\/span><span class=\"o\">\/<\/span><span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"mf\">0.25<\/span><span class=\"p\">)<\/span> <span class=\"o\">%<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s1\">&#39;This implies a probability of <\/span><span class=\"si\">{<\/span><span class=\"mi\">100<\/span> <span class=\"o\">*<\/span> <span class=\"n\">probability_of_50_bps_cut<\/span><span class=\"si\">:<\/span><span class=\"s1\">.2f<\/span><span class=\"si\">}<\/span><span class=\"s1\">% that the committee announces a 50bps rate cut.&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>This implies a probability of 46.67% that the committee announces a 50bps rate cut.\n<\/code><\/pre>\n<p>The calculation confirms the newspaper article&rsquo;s claim that the rate decision is priced as a coin flip between a 25bps and a 50bps rate cut.<\/p>\n<p><strong>Update 
18.9.2024<\/strong>: The FOMC \n<a href=\"https:\/\/www.federalreserve.gov\/newsevents\/pressreleases\/monetary20240918a.htm\" \n\n    target=\"_blank\"\n    >decided\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> on a 50bps cut to a target range of 4.75% - 5.0%.<\/p>\n <h2 id=\"what-about-europe\">\n  <a href=\"#what-about-europe\" class=\"header-link\">\n    What about Europe?\n  <\/a>\n<\/h2><p>If you are interested in the expectations for the interest rates set by the Bank of England, you can use \n<a href=\"https:\/\/www.ice.com\/products\/66380299\/One-Month-SONIA-Index-Futures\" \n\n    target=\"_blank\"\n    >One Month SONIA Index Futures\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                
d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>. Similarly, for the ECB interest rates expectations you can check \n<a href=\"https:\/\/www.ice.com\/products\/83046794\/ECB-Dated-ESTR-Futures\" \n\n    target=\"_blank\"\n    >ECB Dated ESTR Futures\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n <h2 id=\"jupyter-notebook\">\n  <a href=\"#jupyter-notebook\" class=\"header-link\">\n    Jupyter Notebook\n  <\/a>\n<\/h2><p>You can find the jupyter notebook for this post \n<a href=\"https:\/\/gitlab.com\/frankRi89\/blog\/-\/tree\/main\/notebooks\/interest-rate-expectations\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" 
data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n"},{"title":"Clearing up confusion around IPython, ipykernel, and Jupyter notebooks","link":"\/posts\/jupyter-confusion\/","pubDate":"Mon, 26 Aug 2024 00:00:00 +0000","guid":"\/posts\/jupyter-confusion\/","description":"<p>One of my big recurring time sinks while doing data science work used to be trying to get my colleagues&rsquo; jupyter notebooks to run on my machine. 
The main contributing factors:<\/p>\n<ul>\n<li>My team generally uses poetry environments to improve reproducibility, but sometimes dependencies aren&rsquo;t specified<\/li>\n<li>I use VS Code with the Jupyter extension to edit notebooks, which requires more configuration than running the Jupyter web UI.<\/li>\n<li>I lacked a clear understanding of the differences between <code>IPython<\/code>, <code>ipykernel<\/code>, and <code>jupyter<\/code>, and of which python environments are being used when running a notebook.<\/li>\n<\/ul>\n<p>So this is my attempt at a \n<a href=\"https:\/\/www.swyx.io\/friendcatchers\" \n\n    target=\"_blank\"\n    >friendcatcher\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>. 
I hope this saves you a few minutes the next time you run into similar issues.<\/p>\n <h2 id=\"the-different-components\">\n  <a href=\"#the-different-components\" class=\"header-link\">\n    The different components\n  <\/a>\n<\/h2><p>Let&rsquo;s distinguish the components that play a role in running a jupyter notebook:<\/p>\n<ul>\n<li><strong>Jupyter Notebook platform<\/strong>: A web-based interactive computing platform that supports different languages via different kernels, e.g. for python (ipykernel), Julia (IJulia), R (IRKernel)<\/li>\n<li><strong>Jupyter notebook<\/strong> (extension <code>.ipynb<\/code>): A document in JSON format that holds metadata and cell code.<\/li>\n<li><strong>IPython command shell<\/strong>: The shell has two components:\n<ol>\n<li>An interactive Python shell. You can start it with <code>ipython<\/code>. It&rsquo;s like the default python REPL but with enhanced features, e.g. object introspection, tab completion, input history, magic commands, etc.<\/li>\n<li>A jupyter kernel <code>ipykernel<\/code>. This is the backend process where user python code runs; it can be connected to different frontends. One frontend is the IPython shell itself; another is a Jupyter notebook. You can install <code>ipykernel<\/code> as a standalone package into your python environment.<\/li>\n<\/ol>\n<\/li>\n<li><strong><code>jupyter<\/code> python package<\/strong>. This is a metapackage which installs the notebook, qtconsole, and ipykernel.<\/li>\n<\/ul>\n<pre class=\"mermaid\">graph TD;\n\ncolab(Google Colab UI) <--> ipykernel(ipykernel)\nvs(VS Code UI) <--> ipykernel\nui(jupyter notebook UI) <-->  server\nserver(jupyter server) <-->  ipykernel\nipykernel <-->  ipython[IPython] \n<\/pre>\n <h2 id=\"jupyter-kernels-vs-shell-environment\">\n  <a href=\"#jupyter-kernels-vs-shell-environment\" class=\"header-link\">\n    Jupyter kernels vs. 
shell environment\n  <\/a>\n<\/h2><p>One source of confusion is that a jupyter kernel can point to a different python executable than your shell environment does.<\/p>\n<p>To get an overview of available jupyter executables and kernels you can use:<\/p>\n<ul>\n<li><strong>List all available jupyter executables in your system:<\/strong>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ <span class=\"nb\">type<\/span> -a jupyter\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<li><strong>List all available jupyter kernels:<\/strong>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ jupyter kernelspec list\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<\/ul>\n<p>Every jupyter kernel folder includes a <code>kernel.json<\/code> file that links to the python executable that is being used. Note this can be different from the python executable referenced by your current shell. 
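To make the kernel.json indirection concrete, here is a minimal sketch that writes a sample kernelspec to a temp directory and reads back the interpreter path. The `argv` path is made up; `jupyter kernelspec list` shows where your real kernelspecs live:

```python
import json
import tempfile
from pathlib import Path

# A minimal kernel.json like the ones `jupyter kernelspec list` points at;
# written to a temp dir here so the snippet is self-contained.
spec = {
    "argv": ["/home/rob/notebook_project/.venv/bin/python",  # made-up path
             "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python (envname)",
    "language": "python",
}
kernel_dir = Path(tempfile.mkdtemp())
(kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=1))

# argv[0] is the interpreter the kernel launches; compare it with
# `type python` in your shell to spot a mismatch.
executable = json.loads((kernel_dir / "kernel.json").read_text())["argv"][0]
print(executable)
```
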
Moreover, the shell environment of a Jupyter notebook uses the python executable used to <em>launch<\/em> the notebook.<\/p>\n<ul>\n<li><strong>Print path of currently used python executable:<\/strong>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ <span class=\"nb\">type<\/span> python\n<\/span><\/span><\/code><\/pre><\/div>or in a notebook cell:\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">!type python\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<li><strong>Print path of python executable of current kernel:<\/strong>\nIn a notebook cell:\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\">import sys\n<\/span><\/span><span class=\"line\"><span class=\"cl\">sys.executable\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<\/ul>\n<p>You can create new kernels using the <code>ipykernel<\/code> package:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ python -m ipykernel install --user --name envname --display-name <span class=\"s2\">&#34;Python (envname)&#34;<\/span>\n<\/span><\/span><\/code><\/pre><\/div> <h2 id=\"dependency-management\">\n  <a href=\"#dependency-management\" class=\"header-link\">\n    Dependency management\n  <\/a>\n<\/h2><p>Since I use VS Code as my frontend I just need to add the <code>ipykernel<\/code> package into the virtual environment that I use to manage all other dependencies used to run the notebook. 
This ensures that the same python executable is used for the kernel and the shell environment.<span class=\"sidenote-number\"><small class=\"sidenote\">This is well explained \n<a href=\"https:\/\/jakevdp.github.io\/blog\/2017\/12\/05\/installing-python-packages-from-jupyter\/\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/small><\/span><\/p>\n<p>These are the steps to create a new environment for a jupyter notebook:<\/p>\n<ol>\n<li>Create project folder:\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ mkdir notebook_project\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<li>Create new virtual environment in the folder, then activate it\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ <span class=\"nb\">cd<\/span> notebook_project\n<\/span><\/span><span class=\"line\"><span class=\"cl\">$ python3 -m venv .venv\n<\/span><\/span><span class=\"line\"><span class=\"cl\">$ <span class=\"nb\">source<\/span> 
.venv\/bin\/activate\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<li>Install <code>ipykernel<\/code> (and other dependencies) using pip (make sure the venv is activated):\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ python3 -m pip install ipykernel\n<\/span><\/span><span class=\"line\"><span class=\"cl\">$ python3 -m pip install pandas\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<li>Create a new notebook\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ touch mynotebook.ipynb\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<li>Open the notebook in VS Code and in the top right corner select <code>Select Kernel<\/code> &ndash;&gt; <code>Python Environment<\/code> &ndash;&gt; <code>.venv (.venv\/bin\/python)<\/code><\/li>\n<li>You should now be able to run the notebook and use the pandas package inside the notebook.<\/li>\n<li>To add new dependencies:\n<ul>\n<li>Use the terminal: <code>$ python3 -m pip install &lt;package_name&gt;<\/code><\/li>\n<li>Install from within a notebook cell:\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">sys<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"err\">!<\/span><span class=\"p\">{<\/span><span class=\"n\">sys<\/span><span class=\"o\">.<\/span><span class=\"n\">executable<\/span><span class=\"p\">}<\/span> <span class=\"o\">-<\/span><span class=\"n\">m<\/span> <span class=\"n\">pip<\/span> <span class=\"n\">install<\/span> <span class=\"o\">&lt;<\/span><span class=\"n\">package_name<\/span><span class=\"o\">&gt;<\/span>\n<\/span><\/span><\/code><\/pre><\/div><\/li>\n<li>Specify your dependencies in a 
<code>requirements.txt<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>I use \n<a href=\"https:\/\/python-poetry.org\/\" \n\n    target=\"_blank\"\n    >poetry\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> for virtual environments and dependency management. 
So in step 2 I would instead use:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ <span class=\"nb\">cd<\/span> notebook_project\n<\/span><\/span><span class=\"line\"><span class=\"cl\">$ poetry init\n<\/span><\/span><\/code><\/pre><\/div><p>and install packages via:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ poetry add ipykernel\n<\/span><\/span><span class=\"line\"><span class=\"cl\">$ poetry add pandas\n<\/span><\/span><\/code><\/pre><\/div><p>If I want to use the default Jupyter UI, I can install the <code>jupyter<\/code>  metapackage into my environment and then start the UI with:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-bash\" data-lang=\"bash\"><span class=\"line\"><span class=\"cl\">$ poetry add jupyter\n<\/span><\/span><span class=\"line\"><span class=\"cl\">$ poetry run jupyter notebook\n<\/span><\/span><\/code><\/pre><\/div><p><strong>Links<\/strong><\/p>\n<ul>\n<li>\n<a href=\"https:\/\/jakevdp.github.io\/blog\/2017\/12\/05\/installing-python-packages-from-jupyter\/\" \n\n    target=\"_blank\"\n    >https:\/\/jakevdp.github.io\/blog\/2017\/12\/05\/installing-python-packages-from-jupyter\/\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                
d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/li>\n<li>\n<a href=\"https:\/\/code.visualstudio.com\/docs\/datascience\/jupyter-kernel-management\" \n\n    target=\"_blank\"\n    >https:\/\/code.visualstudio.com\/docs\/datascience\/jupyter-kernel-management\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/li>\n<li>\n<a href=\"https:\/\/docs.jupyter.org\/en\/latest\/projects\/architecture\/content-architecture.html\" \n\n    target=\"_blank\"\n    >https:\/\/docs.jupyter.org\/en\/latest\/projects\/architecture\/content-architecture.html\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt 
fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/li>\n<\/ul>\n"},{"title":"Data analysis: V\u00e4tternrundan 2024 results","link":"\/posts\/vatternrundan-results\/","pubDate":"Sat, 03 Aug 2024 00:00:00 +0000","guid":"\/posts\/vatternrundan-results\/","description":"<p>In 2024, I was foolish enough to participate in the \n<a href=\"https:\/\/vatternrundan.se\/en\/\" \n\n    target=\"_blank\"\n    >V\u00e4tternrundan\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>, which is a 315km distance bike sportive around Sweden&rsquo;s second biggest lake. 
It took me and my friends a little less than 11.5 hours of cycling (15h including breaks).<\/p>\n<p>Since this event is a \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cyclosportive\" \n\n    target=\"_blank\"\n    >sportive\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n                style=\"height: 0.7em; width: 0.7em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n                class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n                viewBox=\"0 0 640 512\">\n                <path fill=\"currentColor\"\n                    d=\"M640 51.2l-.3 12.2c-28.1 .8-45 15.8-55.8 40.3-25 57.8-103.3 240-155.3 358.6H415l-81.9-193.1c-32.5 63.6-68.3 130-99.2 193.1-.3 .3-15 0-15-.3C172 352.3 122.8 243.4 75.8 133.4 64.4 106.7 26.4 63.4 .2 63.7c0-3.1-.3-10-.3-14.2h161.9v13.9c-19.2 1.1-52.8 13.3-43.3 34.2 21.9 49.7 103.6 240.3 125.6 288.6 15-29.7 57.8-109.2 75.3-142.8-13.9-28.3-58.6-133.9-72.8-160-9.7-17.8-36.1-19.4-55.8-19.7V49.8l142.5 .3v13.1c-19.4 .6-38.1 7.8-29.4 26.1 18.9 40 30.6 68.1 48.1 104.7 5.6-10.8 34.7-69.4 48.1-100.8 8.9-20.6-3.9-28.6-38.6-29.4 .3-3.6 0-10.3 .3-13.6 44.4-.3 111.1-.3 123.1-.6v13.6c-22.5 .8-45.8 12.8-58.1 31.7l-59.2 122.8c6.4 16.1 63.3 142.8 69.2 156.7L559.2 91.8c-8.6-23.1-36.4-28.1-47.2-28.3V49.6l127.8 1.1 .2 .5z\">\n                <\/path>\n            <\/svg>\n        <\/span>\n        \n    \n<\/a>, not a race, the organizers only publish finishing times via a bib number search on \n<a href=\"https:\/\/vatternrundan.se\/en\/participants-results\/\" \n\n    target=\"_blank\"\n    >their website\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" 
xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>. However, I was curious to understand how well my group did overall.\nTo get my hands on the data I wrote a small scraping script to collect all finish times from the results page.<\/p>\n<p>I can use this data to find out:<\/p>\n<ul>\n<li>information about the participants, e.g. what countries had a lot of starters<\/li>\n<li>distribution of race times, e.g. how fast is the median rider<\/li>\n<li>what percentage of people finished the race<\/li>\n<\/ul>\n <h2 id=\"scraping-the-dataset\">\n  <a href=\"#scraping-the-dataset\" class=\"header-link\">\n    Scraping the dataset\n  <\/a>\n<\/h2><p>Since getting the timing information from the results page involves several clicks for each start number, I use \n<a href=\"https:\/\/www.selenium.dev\/\" \n\n    target=\"_blank\"\n    >selenium\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                
d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> to automate the browser actions. The script performs these actions for every start number between 1 and 30,000 and collects the timing information. Some start numbers are unused, and many riders didn&rsquo;t finish the event. All the data is stored in a parquet file that is used in the next section. I also had to do some light data cleaning, e.g. converting the scraped time strings to timedeltas and removing entries for unused start numbers.<\/p>\n <h2 id=\"loading-the-dataset\">\n  <a href=\"#loading-the-dataset\" class=\"header-link\">\n    Loading the dataset\n  <\/a>\n<\/h2><p>I load the data from the parquet file into a pandas DataFrame. 
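<\/p>
<p>The string-to-timedelta conversion from the cleaning step above can be done with <code>pd.to_timedelta<\/code>. This is a minimal sketch, not the actual scraping script; the raw column name <code>result_time_raw<\/code> and the sample rows are assumptions for illustration:<\/p>

```python
import pandas as pd

def clean_results(df: pd.DataFrame) -> pd.DataFrame:
    """Light cleaning of scraped results (column names are hypothetical)."""
    out = df.copy()
    # "15:11:00" -> Timedelta; empty or malformed strings become NaT
    out["result_time"] = pd.to_timedelta(out["result_time_raw"], errors="coerce")
    # drop rows for start numbers that were never handed out (no rider data)
    return out[out["initials"].notna()]

# tiny illustrative input: start number 9999 was never used
raw = pd.DataFrame({
    "startnumber": [5121, 5122, 9999],
    "initials": ["JK", "MI", None],
    "result_time_raw": ["15:11:00", "14:35:00", ""],
})
clean = clean_results(raw)
```

<p>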
I perform a few additional data cleaning steps, so it is easier to work with the dataset.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">pandas<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">pd<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">numpy<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">np<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">datetime<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">dt<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">matplotlib.pyplot<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">plt<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">seaborn<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">sns<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">from<\/span> <span class=\"nn\">matplotlib.ticker<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">FuncFormatter<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;results_vatternrundan24.parquet&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time_minutes&#39;<\/span><span class=\"p\">]<\/span> 
<span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">dt<\/span><span class=\"o\">.<\/span><span class=\"n\">total_seconds<\/span><span class=\"p\">()<\/span> <span class=\"o\">\/<\/span> <span class=\"mi\">60<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;average_speed&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span><span class=\"o\">\/<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time_minutes&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">315<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">60<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"o\">~<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">isna<\/span><span class=\"p\">()<\/span> <span class=\"o\">&amp;<\/span> <span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;timing_consistent&#39;<\/span><span class=\"p\">])<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s1\">&#39;The dataset contains <\/span><span class=\"si\">{<\/span><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span 
class=\"n\">shape<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span><span class=\"si\">}<\/span><span class=\"s1\"> entries.&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>The dataset contains 15813 entries.\n<\/code><\/pre>\n <h2 id=\"analysing-the-dataset\">\n  <a href=\"#analysing-the-dataset\" class=\"header-link\">\n    Analysing the dataset\n  <\/a>\n<\/h2><p>The dataset has the following columns:<\/p>\n<ul>\n<li><code>startnumber<\/code>: the rider&rsquo;s bib number<\/li>\n<li><code>initials<\/code>: initials of the rider<\/li>\n<li><code>club<\/code>: club of the rider<\/li>\n<li><code>city<\/code>: home city of the rider<\/li>\n<li><code>country<\/code>: home country of the rider<\/li>\n<li><code>result_time<\/code>: full duration between crossing the start and finish lines (includes breaks at food stations)<\/li>\n<li><code>average_speed<\/code>: average speed in km\/h<\/li>\n<li><code>is_sub9<\/code>: whether the rider was part of a sub9 group that started later in the day (after 11:30 on Saturday)<\/li>\n<li><code>has_finished<\/code>: a boolean indicating whether the rider completed the event or dropped out<\/li>\n<\/ul>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">display<\/span><span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[[<\/span><span class=\"s1\">&#39;startnumber&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;initials&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;club&#39;<\/span><span class=\"p\">,<\/span><span class=\"s1\">&#39;city&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;country&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;average_speed&#39;<\/span><span class=\"p\">,<\/span> <span 
class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">))<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">dtypes<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>startnumber<\/th>\n      <th>initials<\/th>\n      <th>club<\/th>\n      <th>city<\/th>\n      <th>country<\/th>\n      <th>result_time<\/th>\n      <th>average_speed<\/th>\n      <th>has_finished<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>40<\/th>\n      <td>5121<\/td>\n      <td>JK<\/td>\n      <td>-<\/td>\n      <td>Mantorp<\/td>\n      <td>SE<\/td>\n      <td>0 days 15:11:00<\/td>\n      <td>20.746432<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>41<\/th>\n      <td>5122<\/td>\n      <td>MI<\/td>\n      <td>-<\/td>\n      <td>Sk\u00e4rblacka<\/td>\n      <td>SE<\/td>\n      <td>0 days 15:11:00<\/td>\n      <td>20.746432<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>42<\/th>\n      <td>5123<\/td>\n      <td>TP<\/td>\n      <td>-<\/td>\n      <td>H\u00f6\u00f6r<\/td>\n      <td>SE<\/td>\n      <td>0 days 14:35:00<\/td>\n      <td>21.600000<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>43<\/th>\n      <td>5124<\/td>\n      <td>KH<\/td>\n      <td>Trimgutta<\/td>\n      <td>L\u00f8venstad<\/td>\n      <td>NO<\/td>\n      <td>0 days 14:47:00<\/td>\n      <td>21.307779<\/td>\n      <td>True<\/td>\n    <\/tr>\n    <tr>\n      <th>44<\/th>\n    
  <td>5125<\/td>\n      <td>HN<\/td>\n      <td>-<\/td>\n      <td>Jessheim<\/td>\n      <td>NO<\/td>\n      <td>NaT<\/td>\n      <td>NaN<\/td>\n      <td>False<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<pre><code>startnumber                      int64\ninitials                        object\nclub                            object\ncity                            object\ncountry                         object\nresult_time            timedelta64[ns]\nstart_time                      object\nstation_records                 object\ntiming_consistent                 bool\nis_sub9                           bool\nresult_time_minutes            float64\naverage_speed                  float64\nhas_finished                      bool\ndtype: object\n<\/code><\/pre>\n <h3 id=\"the-riders\">\n  <a href=\"#the-riders\" class=\"header-link\">\n    The riders\n  <\/a>\n<\/h3><p>Let&rsquo;s first look at the top countries, cities, and clubs that are in this dataset. Not surprisingly, Sweden (SE) has the most participants, followed by Germany (DE), Norway (NO), and Finland (FI).<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">groupby<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;country&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">agg<\/span><span class=\"p\">(<\/span><span class=\"n\">num_starters<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;startnumber&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;count&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">num_finishers<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;sum&#39;<\/span><span class=\"p\">),<\/span> 
<span class=\"n\">perc_finished<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;mean&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">mean_result_time<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;mean&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">fastest_finisher<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;min&#39;<\/span><span class=\"p\">))<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;num_starters&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">False<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>num_starters<\/th>\n      <th>num_finishers<\/th>\n      <th>perc_finished<\/th>\n      <th>mean_result_time<\/th>\n      <th>fastest_finisher<\/th>\n    <\/tr>\n    <tr>\n      <th>country<\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>SE<\/th>\n      <td>12035<\/td>\n      <td>9904<\/td>\n      
<td>0.822933<\/td>\n      <td>0 days 13:38:44.486094316<\/td>\n      <td>0 days 05:15:00<\/td>\n    <\/tr>\n    <tr>\n      <th>DE<\/th>\n      <td>1437<\/td>\n      <td>1174<\/td>\n      <td>0.816980<\/td>\n      <td>0 days 14:30:54.331914893<\/td>\n      <td>0 days 07:37:00<\/td>\n    <\/tr>\n    <tr>\n      <th>NO<\/th>\n      <td>846<\/td>\n      <td>658<\/td>\n      <td>0.777778<\/td>\n      <td>0 days 11:15:29.272727272<\/td>\n      <td>0 days 07:16:00<\/td>\n    <\/tr>\n    <tr>\n      <th>FI<\/th>\n      <td>402<\/td>\n      <td>335<\/td>\n      <td>0.833333<\/td>\n      <td>0 days 12:56:00.537313432<\/td>\n      <td>0 days 08:09:00<\/td>\n    <\/tr>\n    <tr>\n      <th>DK<\/th>\n      <td>346<\/td>\n      <td>281<\/td>\n      <td>0.812139<\/td>\n      <td>0 days 13:24:26.501766784<\/td>\n      <td>0 days 07:57:00<\/td>\n    <\/tr>\n    <tr>\n      <th>GB<\/th>\n      <td>269<\/td>\n      <td>194<\/td>\n      <td>0.721190<\/td>\n      <td>0 days 14:24:04.307692307<\/td>\n      <td>0 days 02:54:00<\/td>\n    <\/tr>\n    <tr>\n      <th>NL<\/th>\n      <td>87<\/td>\n      <td>77<\/td>\n      <td>0.885057<\/td>\n      <td>0 days 14:25:03.896103896<\/td>\n      <td>0 days 08:53:00<\/td>\n    <\/tr>\n    <tr>\n      <th><\/th>\n      <td>43<\/td>\n      <td>36<\/td>\n      <td>0.837209<\/td>\n      <td>0 days 15:36:35<\/td>\n      <td>0 days 09:01:00<\/td>\n    <\/tr>\n    <tr>\n      <th>US<\/th>\n      <td>38<\/td>\n      <td>25<\/td>\n      <td>0.657895<\/td>\n      <td>0 days 13:39:55.384615384<\/td>\n      <td>0 days 09:19:00<\/td>\n    <\/tr>\n    <tr>\n      <th>PL<\/th>\n      <td>38<\/td>\n      <td>30<\/td>\n      <td>0.789474<\/td>\n      <td>0 days 12:52:26<\/td>\n      <td>0 days 09:20:00<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>The cities with the most participants are all Swedish: Stockholm, G\u00f6teborg, Link\u00f6ping, Uppsala, Malm\u00f6.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code 
class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">groupby<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;city&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">agg<\/span><span class=\"p\">(<\/span><span class=\"n\">num_starters<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;startnumber&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;count&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">num_finishers<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;sum&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">perc_finished<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;mean&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">mean_result_time<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;mean&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">fastest_finisher<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;min&#39;<\/span><span class=\"p\">))<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;num_starters&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">False<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span 
class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>num_starters<\/th>\n      <th>num_finishers<\/th>\n      <th>perc_finished<\/th>\n      <th>mean_result_time<\/th>\n      <th>fastest_finisher<\/th>\n    <\/tr>\n    <tr>\n      <th>city<\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>Stockholm<\/th>\n      <td>851<\/td>\n      <td>726<\/td>\n      <td>0.853114<\/td>\n      <td>0 days 13:37:09.752066115<\/td>\n      <td>0 days 07:31:00<\/td>\n    <\/tr>\n    <tr>\n      <th>G\u00f6teborg<\/th>\n      <td>613<\/td>\n      <td>516<\/td>\n      <td>0.841762<\/td>\n      <td>0 days 13:30:09.266409266<\/td>\n      <td>0 days 07:35:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Link\u00f6ping<\/th>\n      <td>378<\/td>\n      <td>313<\/td>\n      <td>0.828042<\/td>\n      <td>0 days 13:48:16.815286624<\/td>\n      <td>0 days 07:31:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Uppsala<\/th>\n      <td>323<\/td>\n      <td>268<\/td>\n      <td>0.829721<\/td>\n      <td>0 days 13:50:19.029850746<\/td>\n      <td>0 days 07:37:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Malm\u00f6<\/th>\n      <td>233<\/td>\n      <td>191<\/td>\n      <td>0.819742<\/td>\n      <td>0 days 13:52:36.125654450<\/td>\n      <td>0 days 07:31:00<\/td>\n    <\/tr>\n    <tr>\n      <th>\u00d6rebro<\/th>\n      <td>230<\/td>\n      <td>184<\/td>\n      <td>0.800000<\/td>\n      <td>0 days 13:34:19.677419354<\/td>\n      <td>0 days 08:19:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Motala<\/th>\n      <td>223<\/td>\n      
<td>186<\/td>\n      <td>0.834081<\/td>\n      <td>0 days 14:07:49.354838709<\/td>\n      <td>0 days 07:35:00<\/td>\n    <\/tr>\n    <tr>\n      <th>V\u00e4ster\u00e5s<\/th>\n      <td>186<\/td>\n      <td>142<\/td>\n      <td>0.763441<\/td>\n      <td>0 days 13:25:54.929577464<\/td>\n      <td>0 days 07:37:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Lund<\/th>\n      <td>168<\/td>\n      <td>138<\/td>\n      <td>0.821429<\/td>\n      <td>0 days 13:56:18.260869565<\/td>\n      <td>0 days 07:16:00<\/td>\n    <\/tr>\n    <tr>\n      <th>J\u00f6nk\u00f6ping<\/th>\n      <td>147<\/td>\n      <td>124<\/td>\n      <td>0.843537<\/td>\n      <td>0 days 13:49:49.354838709<\/td>\n      <td>0 days 08:55:00<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>The following clubs had the most riders.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">groupby<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;club&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">agg<\/span><span class=\"p\">(<\/span><span class=\"n\">city<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;city&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;first&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">num_starters<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;startnumber&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;count&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">num_finishers<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;sum&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">perc_finished<\/span> <span 
class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;mean&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">mean_result_time<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;mean&#39;<\/span><span class=\"p\">),<\/span> <span class=\"n\">fastest_finisher<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;min&#39;<\/span><span class=\"p\">))<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;num_starters&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">False<\/span><span class=\"p\">)[<\/span><span class=\"mi\">1<\/span><span class=\"p\">:<\/span><span class=\"mi\">10<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>city<\/th>\n      <th>num_starters<\/th>\n      <th>num_finishers<\/th>\n      <th>perc_finished<\/th>\n      <th>mean_result_time<\/th>\n      <th>fastest_finisher<\/th>\n    <\/tr>\n    <tr>\n      <th>club<\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n      <th><\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>IMOVEFORCANCER<\/th>\n      <td>Karlskrona<\/td>\n      <td>69<\/td>\n      <td>58<\/td>\n      <td>0.840580<\/td>\n 
     <td>0 days 09:54:26.896551724<\/td>\n      <td>0 days 07:37:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Fredrikshof<\/th>\n      <td>Stockholm<\/td>\n      <td>56<\/td>\n      <td>48<\/td>\n      <td>0.857143<\/td>\n      <td>0 days 10:46:42.500000<\/td>\n      <td>0 days 09:20:00<\/td>\n    <\/tr>\n    <tr>\n      <th>\u00d6rebrocyklisterna<\/th>\n      <td>Kumla<\/td>\n      <td>55<\/td>\n      <td>42<\/td>\n      <td>0.763636<\/td>\n      <td>0 days 11:31:47.142857142<\/td>\n      <td>0 days 08:04:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Team Kung\u00e4lv<\/th>\n      <td>Kung\u00e4lv<\/td>\n      <td>43<\/td>\n      <td>29<\/td>\n      <td>0.674419<\/td>\n      <td>0 days 10:37:20<\/td>\n      <td>0 days 08:53:00<\/td>\n    <\/tr>\n    <tr>\n      <th>schulz sportreisen<\/th>\n      <td>Dresden<\/td>\n      <td>40<\/td>\n      <td>36<\/td>\n      <td>0.900000<\/td>\n      <td>0 days 13:47:45<\/td>\n      <td>0 days 10:20:00<\/td>\n    <\/tr>\n    <tr>\n      <th>KCE - K\u00e4vlinge Cykelentusiaster<\/th>\n      <td>K\u00e4vlinge<\/td>\n      <td>38<\/td>\n      <td>32<\/td>\n      <td>0.842105<\/td>\n      <td>0 days 11:50:01.875000<\/td>\n      <td>0 days 09:41:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Fredrikshofs IF Cykelklubb<\/th>\n      <td>Enskede<\/td>\n      <td>35<\/td>\n      <td>30<\/td>\n      <td>0.857143<\/td>\n      <td>0 days 11:29:52<\/td>\n      <td>0 days 09:44:00<\/td>\n    <\/tr>\n    <tr>\n      <th>F\u00f6rsvarsmakten<\/th>\n      <td>\u00d6rebro<\/td>\n      <td>32<\/td>\n      <td>29<\/td>\n      <td>0.906250<\/td>\n      <td>0 days 13:51:00<\/td>\n      <td>0 days 09:21:00<\/td>\n    <\/tr>\n    <tr>\n      <th>Team Sportia Uppsala<\/th>\n      <td>Uppsala<\/td>\n      <td>31<\/td>\n      <td>29<\/td>\n      <td>0.935484<\/td>\n      <td>0 days 10:50:47.586206896<\/td>\n      <td>0 days 09:24:00<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n <h3 id=\"number-of-finishers\">\n  <a href=\"#number-of-finishers\" 
class=\"header-link\">\n    Number of finishers\n  <\/a>\n<\/h3><p>315 km is a long distance. Let&rsquo;s check how many of the participants finished the event:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">num_starters<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">num_finishers<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">query<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s2\">&#34;Among the <\/span><span class=\"si\">{<\/span><span class=\"n\">num_starters<\/span><span class=\"si\">}<\/span><span class=\"s2\"> starting riders in the dataset, we found recorded and consistent finishing times for <\/span><span class=\"si\">{<\/span><span class=\"n\">num_finishers<\/span><span class=\"si\">}<\/span><span class=\"s2\">. 
<\/span><span class=\"se\">\\\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"se\"><\/span><span class=\"s2\">This is a finishing rate of <\/span><span class=\"si\">{<\/span><span class=\"mi\">100<\/span> <span class=\"o\">*<\/span> <span class=\"n\">num_finishers<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">num_starters<\/span><span class=\"si\">:<\/span><span class=\"s2\">.2f<\/span><span class=\"si\">}<\/span><span class=\"s2\">%.&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>Among the 15813 starting riders in the dataset, we found recorded and consistent finishing times for 12920. This is a finishing rate of 81.70%.\n<\/code><\/pre>\n <h3 id=\"the-podium-and-top-10\">\n  <a href=\"#the-podium-and-top-10\" class=\"header-link\">\n    The podium and top 10\n  <\/a>\n<\/h3><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">fastest_rider<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">query<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s1\">&#39;The fastest rider <\/span><span class=\"si\">{<\/span><span class=\"n\">fastest_rider<\/span><span class=\"o\">.<\/span><span class=\"n\">initials<\/span><span class=\"o\">.<\/span><span class=\"n\">values<\/span><span 
class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span><span class=\"si\">}<\/span><span class=\"s1\"> from <\/span><span class=\"si\">{<\/span><span class=\"n\">fastest_rider<\/span><span class=\"o\">.<\/span><span class=\"n\">city<\/span><span class=\"o\">.<\/span><span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span><span class=\"si\">}<\/span><span class=\"s1\"> only needed <\/span><span class=\"si\">{<\/span><span class=\"n\">dt<\/span><span class=\"o\">.<\/span><span class=\"n\">timedelta<\/span><span class=\"p\">(<\/span><span class=\"n\">microseconds<\/span><span class=\"o\">=<\/span> <span class=\"nb\">float<\/span><span class=\"p\">(<\/span><span class=\"n\">fastest_rider<\/span><span class=\"o\">.<\/span><span class=\"n\">result_time<\/span><span class=\"o\">.<\/span><span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">])<\/span><span class=\"o\">\/<\/span><span class=\"mi\">1000<\/span><span class=\"p\">)<\/span><span class=\"si\">}<\/span><span class=\"s1\"> to complete the 315km. <\/span><span class=\"se\">\\\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"se\"><\/span><span class=\"s1\">That is a mindblowing average speed of <\/span><span class=\"si\">{<\/span><span class=\"p\">(<\/span><span class=\"n\">fastest_rider<\/span><span class=\"o\">.<\/span><span class=\"n\">average_speed<\/span><span class=\"o\">.<\/span><span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"mi\">0<\/span><span class=\"p\">])<\/span><span class=\"si\">:<\/span><span class=\"s1\">.1f<\/span><span class=\"si\">}<\/span><span class=\"s1\"> km\/h.&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>The fastest rider JL from Lund only needed 7:16:00 to complete the 315km. 
That is a mindblowing average speed of 43.3 km\/h.\n<\/code><\/pre>\n<p>The 10 fastest riders were below 7h 32min.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">query<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">)[[<\/span><span class=\"s1\">&#39;startnumber&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;initials&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;city&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;club&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;country&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;average_speed&#39;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>startnumber<\/th>\n      <th>initials<\/th>\n      <th>city<\/th>\n      <th>club<\/th>\n      <th>country<\/th>\n      <th>result_time<\/th>\n      
<th>average_speed<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>16311<\/th>\n      <td>27012<\/td>\n      <td>JL<\/td>\n      <td>Lund<\/td>\n      <td>Kjekkas IF<\/td>\n      <td>SE<\/td>\n      <td>0 days 07:16:00<\/td>\n      <td>43.348624<\/td>\n    <\/tr>\n    <tr>\n      <th>16301<\/th>\n      <td>27002<\/td>\n      <td>TJ<\/td>\n      <td>Slattum<\/td>\n      <td>-<\/td>\n      <td>NO<\/td>\n      <td>0 days 07:16:00<\/td>\n      <td>43.348624<\/td>\n    <\/tr>\n    <tr>\n      <th>16304<\/th>\n      <td>27005<\/td>\n      <td>EO<\/td>\n      <td>Nittedal<\/td>\n      <td>-<\/td>\n      <td>NO<\/td>\n      <td>0 days 07:16:00<\/td>\n      <td>43.348624<\/td>\n    <\/tr>\n    <tr>\n      <th>16308<\/th>\n      <td>27009<\/td>\n      <td>BHB<\/td>\n      <td>Follebu<\/td>\n      <td>-<\/td>\n      <td>NO<\/td>\n      <td>0 days 07:16:00<\/td>\n      <td>43.348624<\/td>\n    <\/tr>\n    <tr>\n      <th>16310<\/th>\n      <td>27011<\/td>\n      <td>DT<\/td>\n      <td>Oslo<\/td>\n      <td>-<\/td>\n      <td>NO<\/td>\n      <td>0 days 07:16:00<\/td>\n      <td>43.348624<\/td>\n    <\/tr>\n    <tr>\n      <th>16316<\/th>\n      <td>27017<\/td>\n      <td>JK<\/td>\n      <td>Lillehammer<\/td>\n      <td>-<\/td>\n      <td>NO<\/td>\n      <td>0 days 07:16:00<\/td>\n      <td>43.348624<\/td>\n    <\/tr>\n    <tr>\n      <th>16314<\/th>\n      <td>27015<\/td>\n      <td>SD<\/td>\n      <td>Oslo - Norway<\/td>\n      <td>-<\/td>\n      <td>NO<\/td>\n      <td>0 days 07:16:00<\/td>\n      <td>43.348624<\/td>\n    <\/tr>\n    <tr>\n      <th>16781<\/th>\n      <td>27482<\/td>\n      <td>HD<\/td>\n      <td>Kalmar<\/td>\n      <td>-<\/td>\n      <td>SE<\/td>\n      <td>0 days 07:31:00<\/td>\n      <td>41.906874<\/td>\n    <\/tr>\n    <tr>\n      <th>16782<\/th>\n      <td>27483<\/td>\n      <td>TE<\/td>\n      <td>Rockneby<\/td>\n      <td>-<\/td>\n      <td>SE<\/td>\n      <td>0 days 07:31:00<\/td>\n      <td>41.906874<\/td>\n    <\/tr>\n    <tr>\n      
<th>16783<\/th>\n      <td>27484<\/td>\n      <td>JL<\/td>\n      <td>Eker\u00f6<\/td>\n      <td>-<\/td>\n      <td>SE<\/td>\n      <td>0 days 07:31:00<\/td>\n      <td>41.906874<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>The 10 fastest riders that didn&rsquo;t ride as part of a registered sub-9 team were:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">query<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished and not is_sub9&#39;<\/span><span class=\"p\">)[[<\/span><span class=\"s1\">&#39;startnumber&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;initials&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;city&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;club&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;country&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;average_speed&#39;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n    .dataframe thead th {\n        text-align: right;\n    }\n<\/style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr 
style=\"text-align: right;\">\n      <th><\/th>\n      <th>startnumber<\/th>\n      <th>initials<\/th>\n      <th>city<\/th>\n      <th>club<\/th>\n      <th>country<\/th>\n      <th>result_time<\/th>\n      <th>average_speed<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>8860<\/th>\n      <td>16661<\/td>\n      <td>MN<\/td>\n      <td>Haslum<\/td>\n      <td>Team T\u00f8ff i Tryne<\/td>\n      <td>NO<\/td>\n      <td>0 days 08:02:00<\/td>\n      <td>39.211618<\/td>\n    <\/tr>\n    <tr>\n      <th>8850<\/th>\n      <td>16651<\/td>\n      <td>HL<\/td>\n      <td>Oslo<\/td>\n      <td>Team T\u00f8ff i Trynet<\/td>\n      <td>NO<\/td>\n      <td>0 days 08:02:00<\/td>\n      <td>39.211618<\/td>\n    <\/tr>\n    <tr>\n      <th>28298<\/th>\n      <td>18529<\/td>\n      <td>MS<\/td>\n      <td>Otalampi<\/td>\n      <td>Team Wassu<\/td>\n      <td>FI<\/td>\n      <td>0 days 08:12:00<\/td>\n      <td>38.414634<\/td>\n    <\/tr>\n    <tr>\n      <th>28297<\/th>\n      <td>18528<\/td>\n      <td>JK<\/td>\n      <td>Kotka<\/td>\n      <td>Team Wassu<\/td>\n      <td>FI<\/td>\n      <td>0 days 08:12:00<\/td>\n      <td>38.414634<\/td>\n    <\/tr>\n    <tr>\n      <th>28295<\/th>\n      <td>18526<\/td>\n      <td>JP<\/td>\n      <td>Helsinki<\/td>\n      <td>Team Wassu<\/td>\n      <td>FI<\/td>\n      <td>0 days 08:12:00<\/td>\n      <td>38.414634<\/td>\n    <\/tr>\n    <tr>\n      <th>8849<\/th>\n      <td>16650<\/td>\n      <td>SG<\/td>\n      <td>Eiksmarka<\/td>\n      <td>Team T\u00f8ff i Trynet<\/td>\n      <td>NO<\/td>\n      <td>0 days 08:17:00<\/td>\n      <td>38.028169<\/td>\n    <\/tr>\n    <tr>\n      <th>28296<\/th>\n      <td>18527<\/td>\n      <td>JS<\/td>\n      <td>Hamina<\/td>\n      <td>Team Wassu<\/td>\n      <td>FI<\/td>\n      <td>0 days 08:21:00<\/td>\n      <td>37.724551<\/td>\n    <\/tr>\n    <tr>\n      <th>8861<\/th>\n      <td>16662<\/td>\n      <td>FA<\/td>\n      <td>Oslo<\/td>\n      <td>Team T\u00f8ff i Trynet<\/td>\n      
<td>NO<\/td>\n      <td>0 days 08:23:00<\/td>\n      <td>37.574553<\/td>\n    <\/tr>\n    <tr>\n      <th>11817<\/th>\n      <td>23218<\/td>\n      <td>WG<\/td>\n      <td>Lemgo<\/td>\n      <td>RC Sprintax Bielefeld<\/td>\n      <td>DE<\/td>\n      <td>0 days 08:24:00<\/td>\n      <td>37.500000<\/td>\n    <\/tr>\n    <tr>\n      <th>11819<\/th>\n      <td>23220<\/td>\n      <td>CB<\/td>\n      <td>Bielefeld<\/td>\n      <td>RC Sprintax Bielefeld<\/td>\n      <td>DE<\/td>\n      <td>0 days 08:24:00<\/td>\n      <td>37.500000<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n <h3 id=\"lanterne-rouge\">\n  <a href=\"#lanterne-rouge\" class=\"header-link\">\n    Lanterne rouge\n  <\/a>\n<\/h3><p>The slowest finisher took almost 28h.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">query<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished&#39;<\/span><span class=\"p\">)[[<\/span><span class=\"s1\">&#39;startnumber&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;initials&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;city&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;country&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;average_speed&#39;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">sort_values<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ascending<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">tail<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span 
class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n    .dataframe thead th {\n        text-align: right;\n    }\n<\/style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>startnumber<\/th>\n      <th>initials<\/th>\n      <th>city<\/th>\n      <th>country<\/th>\n      <th>result_time<\/th>\n      <th>average_speed<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>3416<\/th>\n      <td>2367<\/td>\n      <td>AH<\/td>\n      <td>Danderyd<\/td>\n      <td>SE<\/td>\n      <td>1 days 02:37:00<\/td>\n      <td>11.834690<\/td>\n    <\/tr>\n    <tr>\n      <th>1464<\/th>\n      <td>415<\/td>\n      <td>MP<\/td>\n      <td>\u00d6stersund<\/td>\n      <td>SE<\/td>\n      <td>1 days 02:49:00<\/td>\n      <td>11.746426<\/td>\n    <\/tr>\n    <tr>\n      <th>1465<\/th>\n      <td>416<\/td>\n      <td>ZYP<\/td>\n      <td>Stockholm<\/td>\n      <td>SE<\/td>\n      <td>1 days 02:49:00<\/td>\n      <td>11.746426<\/td>\n    <\/tr>\n    <tr>\n      <th>1260<\/th>\n      <td>211<\/td>\n      <td>HB<\/td>\n      <td>Svedala<\/td>\n      <td>SE<\/td>\n      <td>1 days 03:36:00<\/td>\n      <td>11.413043<\/td>\n    <\/tr>\n    <tr>\n      <th>1109<\/th>\n      <td>60<\/td>\n      <td>KH<\/td>\n      <td>Skara<\/td>\n      <td>SE<\/td>\n      <td>1 days 03:44:00<\/td>\n      <td>11.358173<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n <h3 id=\"finishing-times-excluding-sub-9-groups\">\n  <a href=\"#finishing-times-excluding-sub-9-groups\" class=\"header-link\">\n    Finishing times (excluding sub-9 groups)\n  <\/a>\n<\/h3><p>To get a better sense of the finishing times, I plot the histogram and get some summary statistics of the <code>result_time<\/code> column for every finisher.\nI only consider the 
times of the regular riders below. This means excluding the sub-9 cycling teams that start separately later on Saturday.\nWith a little less than 15h total (11.5h moving time) my group was slower than the median rider. Next time, we might want to take shorter breaks.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">times<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">query<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;has_finished and not is_sub9&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">times<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">describe<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>count                        12175\nmean     0 days 13:55:05.829979466\nstd      0 days 02:57:47.200342638\nmin                0 days 08:02:00\n25%                0 days 11:37:00\n50%                0 days 13:44:00\n75%                0 days 15:48:00\nmax                1 days 03:44:00\nName: result_time, dtype: object\n<\/code><\/pre>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s2\">&#34;To be among the top 5% fastest finishers, you need to beat <\/span><span class=\"si\">{<\/span><span class=\"n\">times<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">quantile<\/span><span class=\"p\">(<\/span><span class=\"n\">q<\/span><span 
class=\"o\">=<\/span><span class=\"mf\">0.05<\/span><span class=\"p\">)<\/span><span class=\"si\">}<\/span><span class=\"s2\">.&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s2\">&#34;To be among the top 10% fastest finishers, you need to beat <\/span><span class=\"si\">{<\/span><span class=\"n\">times<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">quantile<\/span><span class=\"p\">(<\/span><span class=\"n\">q<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.1<\/span><span class=\"p\">)<\/span><span class=\"si\">}<\/span><span class=\"s2\">.&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"sa\">f<\/span><span class=\"s2\">&#34;To be among the top 25% fastest finishers, you need to beat <\/span><span class=\"si\">{<\/span><span class=\"n\">times<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;result_time&#39;<\/span><span class=\"p\">]<\/span><span class=\"o\">.<\/span><span class=\"n\">quantile<\/span><span class=\"p\">(<\/span><span class=\"n\">q<\/span><span class=\"o\">=<\/span><span class=\"mf\">0.25<\/span><span class=\"p\">)<\/span><span class=\"si\">}<\/span><span class=\"s2\">.&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>To be among the top 5% fastest finishers, you need to beat 0 days 09:44:00.\nTo be among the top 10% fastest finishers, you need to beat 0 days 10:10:00.\nTo be among the top 25% fastest finishers, you need to beat 0 days 11:37:00.\n<\/code><\/pre>\n<p>To plot the distribution of result times, I bin the result times in minutes and add vertical lines marking the median and the 0.1 and 0.25 quantiles. 
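The plotting code further down bins a `result_time_minutes` column that is never derived in the excerpt. As a minimal sketch (using a few synthetic times, not the real dataset), this is one way that column could be computed from the `result_time` timedeltas, together with a quick way to place your own finish time in the distribution:

```python
import pandas as pd

# Synthetic stand-in for the race results; result_time is a timedelta
# column as in the post (the real dataset is not reproduced here).
times = pd.DataFrame({
    "result_time": pd.to_timedelta(
        ["08:02:00", "11:37:00", "13:44:00", "15:48:00", "27:44:00"]
    )
})

# One way the result_time_minutes column used for binning could be derived:
times["result_time_minutes"] = times["result_time"].dt.total_seconds() / 60

# Share of finishers faster than a given time, e.g. a ride just under 15h:
my_time = pd.Timedelta("14:55:00")
share_faster = (times["result_time"] < my_time).mean()
print(f"{share_faster:.0%} of these finishers were faster")
```

Any equivalent conversion (e.g. dividing the timedelta by `pd.Timedelta(minutes=1)`) works just as well; the point is only that the histogram needs a plain numeric axis.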
It&rsquo;s not surprising that the distribution is right-skewed with many riders that have very long finishing times, but nobody faster than 8 hours.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"k\">def<\/span> <span class=\"nf\">minutes_to_hours_minutes<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">pos<\/span><span class=\"p\">):<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"s2\">&#34;&#34;&#34;Define a function to convert minutes to HH:MM. Used for the axis labelling.&#34;&#34;&#34;<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"n\">hours<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">int<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">\/\/<\/span> <span class=\"mi\">60<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"n\">minutes<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">int<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span> <span class=\"o\">%<\/span> <span class=\"mi\">60<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">    <span class=\"k\">return<\/span> <span class=\"sa\">f<\/span><span class=\"s2\">&#34;<\/span><span class=\"si\">{<\/span><span class=\"n\">hours<\/span><span class=\"si\">:<\/span><span class=\"s2\">02d<\/span><span class=\"si\">}<\/span><span class=\"s2\">:<\/span><span class=\"si\">{<\/span><span class=\"n\">minutes<\/span><span class=\"si\">:<\/span><span class=\"s2\">02d<\/span><span class=\"si\">}<\/span><span class=\"s2\">&#34;<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span 
class=\"c1\"># create histogram plot of result time distribution<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">figure<\/span><span class=\"p\">(<\/span><span class=\"n\">figsize<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span class=\"p\">,<\/span> <span class=\"mi\">6<\/span><span class=\"p\">))<\/span> \n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">sns<\/span><span class=\"o\">.<\/span><span class=\"n\">histplot<\/span><span class=\"p\">(<\/span><span class=\"n\">data<\/span><span class=\"o\">=<\/span><span class=\"n\">times<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"o\">=<\/span><span class=\"s2\">&#34;result_time_minutes&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">binwidth<\/span><span class=\"o\">=<\/span><span class=\"mi\">15<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">gca<\/span><span class=\"p\">()<\/span><span class=\"o\">.<\/span><span class=\"n\">xaxis<\/span><span class=\"o\">.<\/span><span class=\"n\">set_major_formatter<\/span><span class=\"p\">(<\/span><span class=\"n\">FuncFormatter<\/span><span class=\"p\">(<\/span><span class=\"n\">minutes_to_hours_minutes<\/span><span class=\"p\">))<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># draw colored vertical percentile lines<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">axvline<\/span><span class=\"p\">(<\/span><span class=\"n\">times<\/span><span class=\"o\">.<\/span><span class=\"n\">result_time_minutes<\/span><span 
class=\"o\">.<\/span><span class=\"n\">median<\/span><span class=\"p\">(),<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;red&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">linestyle<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;--&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">linewidth<\/span><span class=\"o\">=<\/span><span class=\"mf\">1.5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">axvline<\/span><span class=\"p\">(<\/span><span class=\"n\">times<\/span><span class=\"o\">.<\/span><span class=\"n\">result_time_minutes<\/span><span class=\"o\">.<\/span><span class=\"n\">quantile<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.25<\/span><span class=\"p\">),<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;green&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">linestyle<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;--&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">linewidth<\/span><span class=\"o\">=<\/span><span class=\"mf\">1.5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">axvline<\/span><span class=\"p\">(<\/span><span class=\"n\">times<\/span><span class=\"o\">.<\/span><span class=\"n\">result_time_minutes<\/span><span class=\"o\">.<\/span><span class=\"n\">quantile<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.1<\/span><span class=\"p\">),<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;orange&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">linestyle<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;--&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">linewidth<\/span><span 
class=\"o\">=<\/span><span class=\"mf\">1.5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">text<\/span><span class=\"p\">(<\/span><span class=\"n\">times<\/span><span class=\"o\">.<\/span><span class=\"n\">result_time_minutes<\/span><span class=\"o\">.<\/span><span class=\"n\">median<\/span><span class=\"p\">(),<\/span> <span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">ylim<\/span><span class=\"p\">()[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"mf\">0.95<\/span><span class=\"p\">,<\/span> <span class=\"sa\">f<\/span><span class=\"s1\">&#39;Median&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">         <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;red&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ha<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;left&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">va<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;bottom&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">text<\/span><span class=\"p\">(<\/span><span class=\"n\">times<\/span><span class=\"o\">.<\/span><span class=\"n\">result_time_minutes<\/span><span class=\"o\">.<\/span><span class=\"n\">quantile<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.25<\/span><span class=\"p\">),<\/span> <span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">ylim<\/span><span class=\"p\">()[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"mf\">0.95<\/span><span class=\"p\">,<\/span> <span class=\"sa\">f<\/span><span class=\"s1\">&#39;Top 25%&#39;<\/span><span 
class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">         <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;green&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ha<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;center&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">va<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;bottom&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">text<\/span><span class=\"p\">(<\/span><span class=\"n\">times<\/span><span class=\"o\">.<\/span><span class=\"n\">result_time_minutes<\/span><span class=\"o\">.<\/span><span class=\"n\">quantile<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.1<\/span><span class=\"p\">),<\/span> <span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">ylim<\/span><span class=\"p\">()[<\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"mf\">0.95<\/span><span class=\"p\">,<\/span> <span class=\"sa\">f<\/span><span class=\"s1\">&#39;Top 10%&#39;<\/span><span class=\"p\">,<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">         <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;orange&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">ha<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;right&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">va<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;bottom&#39;<\/span><span class=\"p\">);<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">xlabel<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;Finish time (incl. 
breaks)&#39;<\/span><span class=\"p\">);<\/span>\n<\/span><\/span><\/code><\/pre><\/div><figure><img src=\"output_31_0.png\">\n<\/figure>\n\n<p>If you are planning to race next year&rsquo;s Vatternrundan, this distribution might give you an indication of what to expect and a time to aim for.<\/p>\n <h3 id=\"jupyter-notebook\">\n  <a href=\"#jupyter-notebook\" class=\"header-link\">\n    Jupyter Notebook\n  <\/a>\n<\/h3><p>You can find the jupyter notebook and the datasets for this post \n<a href=\"https:\/\/gitlab.com\/frankRi89\/blog\/-\/tree\/main\/notebooks\/vatternrundan_results\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n"},{"title":"How to better remember books","link":"\/posts\/reading-and-note-taking\/","pubDate":"Sun, 26 May 2024 00:00:00 +0000","guid":"\/posts\/reading-and-note-taking\/","description":"<p>In this article I am going to describe the system that I use to retain more information from the non-fiction books that I read. 
There seem to be two schools of thought about how to best retain the content of non-fiction books.\nThe first approach is best described by the quote:<\/p>\n<blockquote>\n<p>&ldquo;I cannot remember the books I\u2019ve read any more than the meals I have eaten; even so, they have made me.&rdquo;<\/p>\n<p><em>- Ralph Waldo Emerson<\/em><\/p>\n<\/blockquote>\n<p>By reading a lot, our brain retains what it finds interesting. Our prior beliefs are updated and specific ideas are reinforced within us when we encounter the same or related concepts in different sources.<\/p>\n<p>Some argue that proper retention can only be achieved by actively working through the material while reading it. This second approach commonly involves taking notes of the major ideas of the book.<\/p>\n<p>While I do experience <em>surprise connection<\/em> moments occasionally, there are two reasons why I invest time in taking good notes:<\/p>\n<ol>\n<li>The first approach has an inherent survivorship bias. I do remember concepts from books and I am able to link them to other things. This always feels great. However, I don&rsquo;t know about the missed opportunities of ideas that I have forgotten and that will never resurface.<\/li>\n<li>Reading a non-fiction book is quite a significant time investment. For me it takes between a few days and several weeks. Taking at least basic notes seems to be a marginal additional cost for the certainty of improved retention and a better ability to digest the book in the future.<\/li>\n<\/ol>\n<p>I use a simple multi-pass approach that involves both analog and digital notes. I mostly read physical books, because I like to browse books on my shelf that I have read in the past. However, you can easily adapt this approach to ebooks.<\/p>\n<ol>\n<li>Skim the table of contents, and try to understand the structure of the book.<\/li>\n<li>Read the book and use a pencil to mark interesting sentences or paragraphs with a line in the margin (Fig. 1). 
In rare cases I use double lines to highlight especially noteworthy ideas. This helps me later to quickly re-familiarize myself with the main concepts of a chapter without having to fully re-read it. I don&rsquo;t typically write a lot of notes in the margin because there is usually not a lot of space available.<\/li>\n<\/ol>\n<p>\n\n<figure>\n  <div class=\"image-wrapper\">\n  <img src=\"notes_in_margin.jpg\" alt=\"Margin notes in book\" loading=\"lazy\" \/>\n  <figcaption>Figure 1. I use a pencil to mark relevant sentences in the margin and occasionally add some notes.<\/figcaption>\n  <\/div>\n<\/figure>\n<\/p>\n<ol start=\"3\">\n<li>After finishing one or several chapters, I review the marked paragraphs and summarize the main points of each chapter in the empty space at the beginning or end of the chapter (Fig. 2). By writing the notes into the book I get two benefits. Firstly, there is a higher chance for these analog notes to still be accessible to me in a decade. Secondly, these notes might be interesting to friends that I lend the book to.<\/li>\n<\/ol>\n<p>\n\n<figure>\n  <div class=\"image-wrapper\">\n  <img src=\"notes_in_book.jpg\" alt=\"Chapter Notes in Book\" loading=\"lazy\" \/>\n  <figcaption>Figure 2. 
I summarize the main points of each chapter on the blank space at the beginning or end of the chapter.<\/figcaption>\n  <\/div>\n<\/figure>\n<\/p>\n<ol start=\"4\">\n<li>Once I finished the book, I transfer the chapter notes into a markdown file (in \n<a href=\"https:\/\/joplinapp.org\/\" \n\n    target=\"_blank\"\n    >Joplin\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>). While copying the notes I sometimes rework or modify them. I might also link to other digital notes that have a connection to the book.<\/li>\n<li>Optional: If I want to share my notes with an audience I will write about the major learnings from the notes and bring them into context with concepts from other books\/articles\/experiences.<\/li>\n<li>Optional: Most books will have some learnings that I want to periodically and actively recall. 
For those I create atomic \n<a href=\"https:\/\/apps.ankiweb.net\/\" \n\n    target=\"_blank\"\n    >Anki cards\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> which will prompt me to regularly review the concepts. 
I am trying \n<a href=\"https:\/\/andymatuschak.org\/prompts\/\" \n\n    target=\"_blank\"\n    >to write good prompts\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> for these cards. This can be a mix of facts like &ldquo;Where was Napoleon born?&rdquo; or higher level concepts I don&rsquo;t want to forget, e.g. &ldquo;Give an example for an infinite game as defined by J. P. Carse&rdquo;.<\/li>\n<\/ol>\n<p>The most important rule for this note taking system is that I arbitrarily break the rules for certain books and skip some of these steps when I feel they are not worth the time. However, this system ensures that I do multiple passes of the content, that I have both durable analog and searchable digital notes, and that I am periodically prompted for the most interesting concepts.<\/p>\n"},{"title":"A great weekend","link":"\/posts\/great-weekend\/","pubDate":"Sun, 25 Feb 2024 00:00:00 +0000","guid":"\/posts\/great-weekend\/","description":"<p>After a stressful work week I sometimes fall into the trap of trying to maximize how I use my days off. This makes me restless. 
I sometimes need to remind myself that the weekend is not about squeezing as many productive hours as possible into the day, but rather about just living, relaxing, and embracing a little bit of boredom and randomness.<\/p>\n<p>So here is a list of activities that make a weekend great:<\/p>\n<ul>\n<li>walking in nature, e.g. hiking on a local trail or going to the nearest park or forest<\/li>\n<li>exploring a new neighborhood in your city without a particular goal in mind<\/li>\n<li>cooking a meal for your partner or friends<\/li>\n<li>completing a household chore at the beginning of your day<\/li>\n<li>meeting a friend for a coffee\/museum walk<\/li>\n<li>small talk with someone in your local community, e.g. one of your neighbors<\/li>\n<li>messaging or calling a friend who moved far away<\/li>\n<li>taking time for an activity (running, cycling, swimming, gym, local club meetup)<\/li>\n<li>playing a board game<\/li>\n<li>learning a new board game (e.g. Hive, Robinson Crusoe, Wingspan)<\/li>\n<li>watching a documentary with full attention<\/li>\n<li>fixing something broken at home, e.g. 
your bicycle<\/li>\n<li>reading a fiction or history book<\/li>\n<li>reading the weekly edition of a newspaper (try a section of the paper you would normally skip)<\/li>\n<li>reading or deleting interesting blog posts that you have saved on your phone for later<\/li>\n<li>listening to a music genre that you would not normally listen to<\/li>\n<li>following a yoga \/ stretching \/ meditation routine on youtube<\/li>\n<li>learning and taking notes about a topic that is <em>not<\/em> relevant for your work or career<\/li>\n<li>practicing a different language<\/li>\n<li>talking to your parents<\/li>\n<li>volunteering at a local event<\/li>\n<\/ul>\n"},{"title":"DuckDB use cases for data scientists: Querying remote S3 files","link":"\/posts\/duckdb-for-data-scientists\/","pubDate":"Sun, 25 Feb 2024 00:00:00 +0000","guid":"\/posts\/duckdb-for-data-scientists\/","description":"<p>\n<a href=\"https:\/\/duckdb.org\/\" \n\n    target=\"_blank\"\n    >DuckDB\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> is a pretty cool <em>in-process<\/em> OLAP analytical database that I started to spin up on the fly for quick data analysis. 
What SQLite is to <em>Postgres<\/em>, DuckDB is to <em>Snowflake<\/em>. It is a single executable without dependencies and stores databases in local files.<\/p>\n<p>I can think of four use cases for data science work:<\/p>\n<ol>\n<li>DuckDB supports larger-than-memory workloads by loading data sequentially. You can use it to analyse datasets that are too large for Pandas (and too small to justify PySpark).<\/li>\n<li>I can query CSV, parquet, and JSON files directly from remote endpoints, e.g. S3, using SQL.<\/li>\n<li>I can replace Snowflake queries with DuckDB queries in unit \/ integration tests.<\/li>\n<li>I can set up a \n<a href=\"https:\/\/duckdb.org\/2022\/10\/12\/modern-data-stack-in-a-box.html\" \n\n    target=\"_blank\"\n    >(DuckDB + dbt)\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> data warehouse for local development.<\/li>\n<\/ol>\n<p>Here I want to share my workflow for the second use case. Inspecting parquet files in AWS S3 is a pain because I can&rsquo;t easily view them in the AWS console. For the past few months I have used DuckDB to load, inspect, and analyse parquet files from the command line. 
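<\/p>\n<p>As a sketch of the first use case (the file name here is made up), a single query can aggregate a CSV file that would be too large for a Pandas DataFrame, because DuckDB streams the file in chunks instead of loading it into memory at once:<\/p>\n<pre tabindex=\"0\"><code>SELECT event_type, count(*) AS num_events\nFROM read_csv_auto(&#39;big_clickdata.csv&#39;)\nGROUP BY event_type;\n<\/code><\/pre><p>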
I found this reduced my cognitive load in situations where I want to quickly check a remote file, because I don&rsquo;t have to download the parquet file and write a Python script to inspect it.<\/p>\n <h2 id=\"installing-duckdb\">\n  <a href=\"#installing-duckdb\" class=\"header-link\">\n    Installing DuckDB\n  <\/a>\n<\/h2><p>I use macOS and can install DuckDB via Homebrew: <code>brew install duckdb<\/code>. To work with remote files I also install the <code>httpfs<\/code> extension.<\/p>\n<p>I start the DuckDB shell with <code>duckdb<\/code> and run the SQL commands:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">INSTALL<\/span><span class=\"w\"> <\/span><span class=\"n\">httpfs<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">LOAD<\/span><span class=\"w\"> <\/span><span class=\"n\">httpfs<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div> <h2 id=\"authentication-with-aws\">\n  <a href=\"#authentication-with-aws\" class=\"header-link\">\n    Authentication with AWS\n  <\/a>\n<\/h2><p>To load files from S3, I need to configure AWS credentials. I am assuming here that your workplace has configured AWS SSO temporary credentials, but this also works with static credentials (<code>ACCESS_KEY_ID<\/code>, <code>SECRET_ACCESS_KEY<\/code>). There are two ways of doing this:<\/p>\n <h3 id=\"1-set-aws-credentials-in-the-session\">\n  <a href=\"#1-set-aws-credentials-in-the-session\" class=\"header-link\">\n    1. 
Set AWS credentials in the session:\n  <\/a>\n<\/h3><p>I can set AWS credentials inside a DuckDB session like this:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">LOAD<\/span><span class=\"w\"> <\/span><span class=\"n\">httpfs<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">SET<\/span><span class=\"w\"> <\/span><span class=\"n\">s3_region<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"s1\">&#39;eu-west-2&#39;<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">SET<\/span><span class=\"w\"> <\/span><span class=\"n\">s3_access_key_id<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;???&#39;<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"k\">SET<\/span><span class=\"w\"> <\/span><span class=\"n\">s3_secret_access_key<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;???&#39;<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div> <h3 id=\"2-let-aws-vault-handle-credentials\">\n  <a href=\"#2-let-aws-vault-handle-credentials\" class=\"header-link\">\n    2. 
Let aws-vault handle credentials\n  <\/a>\n<\/h3><p>I use the AWS credentials management tool <code>aws-vault<\/code> (see \n<a href=\"https:\/\/github.com\/99designs\/aws-vault\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>) which manages my temporary AWS credentials in the background and exposes them to my shell sub-process. I start the duckdb shell with <code>aws-vault exec [profile-name] -- duckdb<\/code>, which ensures that AWS credentials are automatically set and updated.<\/p>\n <h2 id=\"querying-parquet-files\">\n  <a href=\"#querying-parquet-files\" class=\"header-link\">\n    Querying parquet files\n  <\/a>\n<\/h2><p>Let&rsquo;s assume that I have multiple parquet files stored in an S3 bucket <code>work-project<\/code> and they all have the same schema:<\/p>\n<pre tabindex=\"0\"><code>s3:\/\/work-project\/clickdata_001.parquet\ns3:\/\/work-project\/clickdata_002.parquet\n[...]\ns3:\/\/work-project\/clickdata_100.parquet\n<\/code><\/pre><p>These files might be placed in this S3 bucket as part of an ETL pipeline before they are loaded into a table in a data warehouse. 
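<\/p>\n<p>As an aside, I can first check which objects in the bucket actually match a pattern with DuckDB&rsquo;s <code>glob<\/code> table function (a quick sanity check against the hypothetical bucket above):<\/p>\n<pre tabindex=\"0\"><code>SELECT * FROM glob(&#39;s3:\/\/work-project\/*.parquet&#39;);\n<\/code><\/pre><p>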
Let&rsquo;s imagine I want to investigate a bug in the pipeline and need to inspect the files.<\/p>\n <h3 id=\"inspect-schema-of-parquet-file\">\n  <a href=\"#inspect-schema-of-parquet-file\" class=\"header-link\">\n    Inspect schema of parquet file\n  <\/a>\n<\/h3><p>To familiarize myself with the schema, I run <code>parquet_schema<\/code>:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-sql\" data-lang=\"sql\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">LOAD<\/span><span class=\"w\"> <\/span><span class=\"n\">httpfs<\/span><span class=\"p\">;<\/span><span class=\"w\"> <\/span><span class=\"c1\">-- once at the beginning of the session\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"><\/span><span class=\"k\">SELECT<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"k\">FROM<\/span><span class=\"w\"> <\/span><span class=\"n\">parquet_schema<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;s3:\/\/work-project\/clickdata_001.parquet&#39;<\/span><span class=\"p\">);<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>which will return:<\/p>\n<pre tabindex=\"0\"><code>clicked_at::timestamp\nuser_id::int\nevent_type::string\n<\/code><\/pre> <h3 id=\"query-a-parquet-file\">\n  <a href=\"#query-a-parquet-file\" class=\"header-link\">\n    Query a parquet file\n  <\/a>\n<\/h3><p>I can use DuckDB&rsquo;s PostgreSQL-like SQL dialect to query the parquet files with the <code>read_parquet<\/code> function, e.g.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-SQL\" data-lang=\"SQL\"><span class=\"line\"><span class=\"cl\"><span class=\"c1\">--- count number of events in the file\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"><\/span><span class=\"k\">SELECT<\/span><span class=\"w\"> <\/span><span class=\"k\">count<\/span><span 
class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">FROM<\/span><span class=\"w\"> <\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;s3:\/\/work-project\/clickdata_001.parquet&#39;<\/span><span class=\"p\">);<\/span><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\">\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"w\"><\/span><span class=\"c1\">--- find the last event time of a particular user\n<\/span><\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"><\/span><span class=\"k\">SELECT<\/span><span class=\"w\"> <\/span><span class=\"k\">max<\/span><span class=\"p\">(<\/span><span class=\"n\">clicked_at<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">FROM<\/span><span class=\"w\"> <\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;s3:\/\/work-project\/clickdata_001.parquet&#39;<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">where<\/span><span class=\"w\"> <\/span><span class=\"n\">user_id<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"mi\">1234<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div> <h3 id=\"query-multiple-parquet-files\">\n  <a href=\"#query-multiple-parquet-files\" class=\"header-link\">\n    Query multiple parquet files\n  <\/a>\n<\/h3><p>Say I want to find the users with the highest number of events across all files. 
I can use \n<a href=\"https:\/\/duckdb.org\/docs\/data\/multiple_files\/overview.html#glob-syntax\" \n\n    target=\"_blank\"\n    >glob syntax\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> with <code>*<\/code> to run a query against all files as if they were one table:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-SQL\" data-lang=\"SQL\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">SELECT<\/span><span class=\"w\"> <\/span><span class=\"n\">user_id<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"k\">count<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"n\">num_events<\/span><span class=\"w\"> <\/span><span class=\"k\">FROM<\/span><span class=\"w\"> <\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;s3:\/\/work-project\/*.parquet&#39;<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">group<\/span><span class=\"w\"> <\/span><span class=\"k\">by<\/span><span class=\"w\"> <\/span><span 
class=\"n\">user_id<\/span><span class=\"w\"> <\/span><span class=\"k\">order<\/span><span class=\"w\"> <\/span><span class=\"k\">by<\/span><span class=\"w\"> <\/span><span class=\"n\">num_events<\/span><span class=\"w\"> <\/span><span class=\"k\">desc<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div> <h3 id=\"create-table-from-files\">\n  <a href=\"#create-table-from-files\" class=\"header-link\">\n    Create table from files\n  <\/a>\n<\/h3><p>Let&rsquo;s assume that all clickdata files together have a size of 500MB. It would be annoying if I had to download these files for every query that I want to run. Let&rsquo;s instead create a table from the files.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-SQL\" data-lang=\"SQL\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">create<\/span><span class=\"w\"> <\/span><span class=\"k\">table<\/span><span class=\"w\"> <\/span><span class=\"n\">clickdata<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"k\">select<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"k\">from<\/span><span class=\"w\"> <\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;s3:\/\/work-project\/*.parquet&#39;<\/span><span class=\"p\">);<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>This table is stored in memory until we close the shell, which allows me to run different queries against it until I am done with my analysis, e.g.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-SQL\" data-lang=\"SQL\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">select<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"k\">from<\/span><span class=\"w\"> <\/span><span 
class=\"n\">clickdata<\/span><span class=\"p\">;<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>I can also store the table in a local database on my computer in case I need to work with the data for a longer time.\nThe following SQL statement<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-SQL\" data-lang=\"SQL\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">EXPORT<\/span><span class=\"w\"> <\/span><span class=\"k\">DATABASE<\/span><span class=\"w\"> <\/span><span class=\"s1\">&#39;mydatabase&#39;<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">FORMAT<\/span><span class=\"w\"> <\/span><span class=\"n\">PARQUET<\/span><span class=\"p\">);<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>will create a local folder <code>mydatabase<\/code> which stores the tables currently in memory. In a later duckdb session I can reload the table from the database using the following command<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-SQL\" data-lang=\"SQL\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">IMPORT<\/span><span class=\"w\"> <\/span><span class=\"k\">DATABASE<\/span><span class=\"w\"> <\/span><span class=\"s1\">&#39;mydatabase&#39;<\/span><span class=\"p\">;<\/span><span class=\"w\"> \n<\/span><\/span><\/span><\/code><\/pre><\/div> <h3 id=\"query-column-statistics\">\n  <a href=\"#query-column-statistics\" class=\"header-link\">\n    Query column statistics\n  <\/a>\n<\/h3><p>A nice feature of the parquet file format is that it stores \n<a href=\"https:\/\/arrow.apache.org\/docs\/python\/generated\/pyarrow.parquet.Statistics.html\" \n\n    target=\"_blank\"\n    >statistics about each column\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" 
data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>, e.g. <code>min<\/code>, <code>max<\/code>, <code>null_count<\/code>, <code>distinct_count<\/code>.<span class=\"sidenote-number\"><small class=\"sidenote\"> If you are interested, you can inspect parquet file metadata using <code>select * from parquet_metadata('file.parquet');<\/code><\/small><\/span>\nAssume the click event dataset is several GB large and I want to identify the earliest <code>clicked_at<\/code> time. I would run the query below.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-SQL\" data-lang=\"SQL\"><span class=\"line\"><span class=\"cl\"><span class=\"k\">SELECT<\/span><span class=\"w\"> <\/span><span class=\"k\">min<\/span><span class=\"p\">(<\/span><span class=\"n\">clicked_at<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">FROM<\/span><span class=\"w\"> <\/span><span class=\"n\">read_parquet<\/span><span class=\"p\">(<\/span><span class=\"s1\">&#39;s3:\/\/work-project\/*.parquet&#39;<\/span><span class=\"p\">);<\/span><span class=\"w\">\n<\/span><\/span><\/span><\/code><\/pre><\/div><p>Fortunately, DuckDB uses the column statistics stored in the parquet files to compute the answer without having to download the whole dataset from S3. 
Simon Willison \n<a href=\"https:\/\/til.simonwillison.net\/duckdb\/remote-parquet\" \n\n    target=\"_blank\"\n    >shows\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> a more extreme example of this feature.<\/p>\n<p>DuckDB is a great tool for quick investigations of remotely hosted files. Especially after having configured automatic handling of AWS credentials I can spin up a duckdb shell with one command. I hope it becomes a time saver for you too.<\/p>\n"},{"title":"How to display jupyter notebooks on your hugo blog","link":"\/posts\/hugo-and-jupyter\/","pubDate":"Sat, 20 Jan 2024 00:00:00 +0000","guid":"\/posts\/hugo-and-jupyter\/","description":"<p>I like the simplicity and ease of use of Hugo, the static site generator that powers this blog.\nHowever, as a data scientist I want to be able to make an argument using code, data, and graphs in a Jupyter notebook. This post explains my setup how to convert a Jupyter notebook into a blog post for this website, such as \n<a href=\"\/posts\/covid_bipartisan_bayesian\/\" \n>this\n<\/a>. 
Before I started, my requirements were:<\/p>\n<ul>\n<li>I wanted to be able to easily share both the original notebook (and accompanying data) as well as the website version.<\/li>\n<li>I wanted to regenerate the website version from the notebook file with at most one command and without any post-processing. This is important because I usually work iteratively on notebooks.<\/li>\n<\/ul>\n <h2 id=\"writing-the-notebook\">\n  <a href=\"#writing-the-notebook\" class=\"header-link\">\n    Writing the notebook\n  <\/a>\n<\/h2><p>The source code for this blog is available in this \n<a href=\"https:\/\/gitlab.com\/frankRi89\/blog\" \n\n    target=\"_blank\"\n    >Gitlab repository\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>. I created a new top-level folder <code>notebooks\/<\/code> and a subfolder for each notebook project I want to post about. 
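<\/p>\n<p>For illustration, the resulting layout looks roughly like this (the <code>data\/<\/code> subfolder is just an example):<\/p>\n<pre tabindex=\"0\"><code>notebooks\/\n  covid_bipartisan_bayesian\/\n    covid_bipartisan_bayesian.ipynb\n    data\/\n<\/code><\/pre><p>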
This allows me to easily link to the Jupyter notebook and data to make it reproducible.<\/p>\n<p>It is important that you start the notebook with a cell that contains the front matter of the blog post, e.g.<\/p>\n<pre tabindex=\"0\"><code># My notebook title\n\nDate: 2018-06-01  \nAuthor: firstname lastname  \nCategories: category1, category2  \nTags: tag1, tag2, tag3  \n&lt;!--eofm--&gt;\n<\/code><\/pre><p>The <code>&lt;!--eofm--&gt;<\/code> is important to divide the front matter from the rest of the notebook.<\/p>\n <h2 id=\"converting-to-markdown\">\n  <a href=\"#converting-to-markdown\" class=\"header-link\">\n    Converting to markdown\n  <\/a>\n<\/h2><p>To align with the markdown-based workflow of Hugo we need to convert the notebook to markdown. The script  \n<a href=\"https:\/\/github.com\/vlunot\/nb2hugo\" \n\n    target=\"_blank\"\n    >nb2hugo\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> does exactly that. 
We can install it, for example, via <code>pip install nb2hugo<\/code>.<\/p>\n<p>To convert the notebook to markdown, we navigate into the notebook folder and run<\/p>\n<pre tabindex=\"0\"><code>nb2hugo covid_bipartisan_bayesian.ipynb --site-dir \/Users\/rob\/hugoblog\/ --section posts\n<\/code><\/pre><p>This command writes the output markdown file <code>covid_bipartisan_bayesian.md<\/code> into the specified folder <code>posts\/<\/code>.<\/p>\n <h2 id=\"adjusting-the-website-css\">\n  <a href=\"#adjusting-the-website-css\" class=\"header-link\">\n    Adjusting the website CSS\n  <\/a>\n<\/h2><p>I made two more adjustments to the style of this blog to better display the &ldquo;markdown-ified&rdquo; notebook. Both pandas code and displayed dataframes are quite wide, so I:<\/p>\n<ul>\n<li>increased the content width slightly to <code>800px<\/code>,<\/li>\n<li>decreased the font-size for the text in the code cells to <code>0.75rem<\/code>.<\/li>\n<\/ul>\n<p>While still not perfect, I think this gives a decent result.<\/p>\n"},{"title":"Reproducing Nate Silver's regression analysis on COVID death rates","link":"\/posts\/covid_bipartisan_bayesian\/","pubDate":"Mon, 20 Nov 2023 00:00:00 +0000","guid":"\/posts\/covid_bipartisan_bayesian\/","description":"<p>I spent some time recently studying causal inference methods. 
Two great resources for this are:<\/p>\n<ul>\n<li>\n<a href=\"https:\/\/matheusfacure.github.io\/python-causality-handbook\/landing-page.html\" \n\n    target=\"_blank\"\n    >Causal Inference for The Brave and True\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> (frequentist view)<\/li>\n<li>\n<a href=\"https:\/\/xcelab.net\/rm\/statistical-rethinking\/\" \n\n    target=\"_blank\"\n    >Statistical Rethinking\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n 
           <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> (Bayesian view).<\/li>\n<\/ul>\n<p>When I learn about an unfamiliar topic, I always try to look for real-world examples. In this case I keep my eyes open for articles sharing statistical findings whose results I can attempt to reproduce.<\/p>\n<p>In Nate Silver&rsquo;s recent article \n<a href=\"https:\/\/www.natesilver.net\/p\/fine-ill-run-a-regression-analysis\" \n\n    target=\"_blank\"\n    >Fine, I&rsquo;ll run a regression analysis. But it won&rsquo;t make you happy\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> he shows the effect of state partisanship and COVID vaccination rates on COVID death rates. The models used are simple linear regressions with one to three independent variables (partisanship, age, vaccination rates).<\/p>\n<p>Silver&rsquo;s point in the article is only partially about his finding that state partisanship is a good predictor for COVID death rates in that state. Instead, he tries to dismiss critics who argue that simple models are not valid when they leave out additional variables, e.g. age, co-morbidities. 
While he agrees that one has to justify a particular model design, he argues that simple models can deliver true insights while standing up to scrutiny. Moreover, he adds that &ldquo;there&rsquo;s a general tendency [in the profession] to overfit models.&rdquo;<\/p>\n<p>This post is not about Silver&rsquo;s point or the political aspect of the result. I simply wanted to reproduce the findings in the article.<\/p>\n<p>In his article Silver runs four linear regressions:<\/p>\n<ol>\n<li>impact of state partisanship on COVID death rates<\/li>\n<li>impact of state partisanship and state age on COVID death rates<\/li>\n<li>impact of state partisanship, state age, and state vaccination rates on COVID death rates<\/li>\n<li>impact of state age and state vaccination rates on COVID death rates<\/li>\n<\/ol>\n<p>We need to collect data for every US state on partisanship, age structure, vaccination rates, and COVID death rates. It turned out, perhaps not surprisingly, that finding and assembling the dataset from open sources took significantly more time than setting up the regressions. 
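The four regression specifications can be fitted with ordinary least squares. As a minimal sketch, here is one way to run them with statsmodels' formula API; the column names (`biden`, `perc_over_65`, `vaccinated_per_hundred`, `deaths_per_million`) are my own placeholders mirroring the variables assembled below, and the data is synthetic, so the fitted coefficients are meaningless stand-ins:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: 50 rows, one per state. The real dataframe is
# assembled later in this post from open sources; here the values are random.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "biden": rng.uniform(-30, 30, n),                # Biden 2020 margin (pct points)
    "perc_over_65": rng.uniform(11, 21, n),          # share of population aged 65+
    "vaccinated_per_hundred": rng.uniform(50, 95, n),
    "deaths_per_million": rng.uniform(500, 3500, n),
})

# The four model specifications listed above
specs = [
    "deaths_per_million ~ biden",
    "deaths_per_million ~ biden + perc_over_65",
    "deaths_per_million ~ biden + perc_over_65 + vaccinated_per_hundred",
    "deaths_per_million ~ perc_over_65 + vaccinated_per_hundred",
]
for spec in specs:
    res = smf.ols(spec, data=df).fit()
    print(f"{spec}: R^2 = {res.rsquared:.3f}")
```

The formula API keeps each specification readable as a one-line string, which makes comparing nested models like these straightforward.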
I had three issues when I tried to assemble the data from the sources that the article links to:<\/p>\n<ul>\n<li>one source didn&rsquo;t allow me to download the data as a file and I didn&rsquo;t want to scrape or copy-paste from the website<\/li>\n<li>one source didn&rsquo;t allow me to view the data for the same reference day as in the article<\/li>\n<li>one source didn&rsquo;t provide the data as a file for free<\/li>\n<\/ul>\n<p>So in some cases I had to use different sources.<\/p>\n <h1 id=\"building-the-data-set\">\n  <a href=\"#building-the-data-set\" class=\"header-link\">\n    Building the data set\n  <\/a>\n<\/h1><p>We are building the data set from four sources:<\/p>\n<ol>\n<li>US states COVID cases and deaths: \n<a href=\"https:\/\/github.com\/nytimes\/covid-19-data\/blob\/62ef34cfcb60214be873a38d73619da9ea57d50b\/us-states.csv\" \n\n    target=\"_blank\"\n    >https:\/\/github.com\/nytimes\/covid-19-data\/blob\/62ef34cfcb60214be873a38d73619da9ea57d50b\/us-states.csv\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/li>\n<li>US states age statistics: \n<a href=\"https:\/\/www.prb.org\/resources\/which-us-states-are-the-oldest\/\" \n\n    target=\"_blank\"\n    
>https:\/\/www.prb.org\/resources\/which-us-states-are-the-oldest\/\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/li>\n<li>US states vaccination rates: \n<a href=\"https:\/\/web.archive.org\/web\/20241113065036\/https:\/\/ourworldindata.org\/us-states-vaccinations\" \n\n    target=\"_blank\"\n    >ourworldindata.org (archive.org version)\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    
\n<\/a><\/li>\n<li>US states election results 2020: \n<a href=\"https:\/\/www.kaggle.com\/code\/paultimothymooney\/2020-usa-election-vote-percentages-by-state\/output\" \n\n    target=\"_blank\"\n    >https:\/\/www.kaggle.com\/code\/paultimothymooney\/2020-usa-election-vote-percentages-by-state\/output\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/li>\n<\/ol>\n <h3 id=\"covid-death-rates\">\n  <a href=\"#covid-death-rates\" class=\"header-link\">\n    COVID death rates\n  <\/a>\n<\/h3><p>We want to calculate the COVID death rates (COVID deaths per 1M population) between two points in time (1 February 2021, when vaccines became widely available, and 23 March 2023, as in the article). 
Silver links to \n<a href=\"https:\/\/www.worldometers.info\/coronavirus\/country\/us\/\" \n\n    target=\"_blank\"\n    >worldometers.info\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> as his dataset for COVID deaths per state. However, I couldn&rsquo;t find a way to download the data in time-series form (numbers for each day) without scraping the website. Instead, we use the equivalent numbers published by the New York Times. 
Let&rsquo;s load the CSV file into a pandas dataframe:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">pandas<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">pd<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">numpy<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">np<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">statsmodels.api<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">sm<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"kn\">import<\/span> <span class=\"nn\">matplotlib.pyplot<\/span> <span class=\"k\">as<\/span> <span class=\"nn\">plt<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_csv<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;us-states.csv&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;date&#39;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">to_datetime<\/span><span class=\"p\">(<\/span><span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;date&#39;<\/span><span class=\"p\">])<\/span>\n<\/span><\/span><span class=\"line\"><span 
class=\"cl\"><span class=\"c1\"># create two dataframes for the two dates of interest and rename the columns<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths_february<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">date<\/span> <span class=\"o\">==<\/span> <span class=\"s1\">&#39;2021-02-01&#39;<\/span><span class=\"p\">][[<\/span><span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;deaths&#34;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">rename<\/span><span class=\"p\">(<\/span><span class=\"n\">columns<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"s2\">&#34;deaths&#34;<\/span> <span class=\"p\">:<\/span> <span class=\"s2\">&#34;deaths_2021_02_01&#34;<\/span><span class=\"p\">})<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths_last<\/span> <span class=\"o\">=<\/span> <span class=\"n\">df<\/span><span class=\"p\">[<\/span><span class=\"n\">df<\/span><span class=\"o\">.<\/span><span class=\"n\">date<\/span> <span class=\"o\">==<\/span> <span class=\"s1\">&#39;2023-03-23&#39;<\/span><span class=\"p\">][[<\/span><span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;deaths&#34;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">rename<\/span><span class=\"p\">(<\/span><span class=\"n\">columns<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"s2\">&#34;deaths&#34;<\/span> <span class=\"p\">:<\/span> <span class=\"s2\">&#34;deaths_2023_03_23&#34;<\/span><span class=\"p\">})<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>With the two dataframes <code>deaths_february<\/code> and <code>deaths_last<\/code> we can calculate the number of deaths in this period.<\/p>\n<div 
class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths<\/span> <span class=\"o\">=<\/span> <span class=\"n\">deaths_last<\/span><span class=\"o\">.<\/span><span class=\"n\">merge<\/span><span class=\"p\">(<\/span><span class=\"n\">deaths_february<\/span><span class=\"p\">,<\/span> <span class=\"n\">how<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;left&#39;<\/span><span class=\"p\">,<\/span> <span class=\"n\">on<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;state&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">dropna<\/span><span class=\"p\">(<\/span><span class=\"n\">axis<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;index&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;deaths_after_vaccine&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">deaths<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;deaths_2023_03_23&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"n\">deaths<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;deaths_2021_02_01&#34;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>Now we can look at the state numbers for COVID deaths between 1 February 2021 and 23 March 2023:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: 
right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>state<\/th>\n      <th>deaths_2023_03_23<\/th>\n      <th>deaths_2021_02_01<\/th>\n      <th>deaths_after_vaccine<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>Alabama<\/td>\n      <td>21631<\/td>\n      <td>7688.0<\/td>\n      <td>13943.0<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>Alaska<\/td>\n      <td>1438<\/td>\n      <td>253.0<\/td>\n      <td>1185.0<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>Arizona<\/td>\n      <td>33190<\/td>\n      <td>13124.0<\/td>\n      <td>20066.0<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>Arkansas<\/td>\n      <td>13068<\/td>\n      <td>4895.0<\/td>\n      <td>8173.0<\/td>\n    <\/tr>\n    <tr>\n      <th>5<\/th>\n      <td>California<\/td>\n      <td>104277<\/td>\n      <td>41284.0<\/td>\n      <td>62993.0<\/td>\n    <\/tr>\n    <tr>\n      <th>6<\/th>\n      <td>Colorado<\/td>\n      <td>14245<\/td>\n      <td>5737.0<\/td>\n      <td>8508.0<\/td>\n    <\/tr>\n    <tr>\n      <th>7<\/th>\n      <td>Connecticut<\/td>\n      <td>12270<\/td>\n      <td>7119.0<\/td>\n      <td>5151.0<\/td>\n    <\/tr>\n    <tr>\n      <th>8<\/th>\n      <td>Delaware<\/td>\n      <td>3352<\/td>\n      <td>1101.0<\/td>\n      <td>2251.0<\/td>\n    <\/tr>\n    <tr>\n      <th>9<\/th>\n      <td>District of Columbia<\/td>\n      <td>1432<\/td>\n      <td>916.0<\/td>\n      <td>516.0<\/td>\n    <\/tr>\n    <tr>\n      <th>10<\/th>\n      <td>Florida<\/td>\n      <td>87141<\/td>\n      <td>26684.0<\/td>\n      <td>60457.0<\/td>\n    <\/tr>\n    <tr>\n      <th>11<\/th>\n      <td>Georgia<\/td>\n      <td>41055<\/td>\n      <td>13821.0<\/td>\n      <td>27234.0<\/td>\n    <\/tr>\n    <tr>\n      <th>12<\/th>\n      <td>Guam<\/td>\n      <td>416<\/td>\n      <td>130.0<\/td>\n      <td>286.0<\/td>\n    <\/tr>\n    
<tr>\n      <th>13<\/th>\n      <td>Hawaii<\/td>\n      <td>1851<\/td>\n      <td>407.0<\/td>\n      <td>1444.0<\/td>\n    <\/tr>\n    <tr>\n      <th>14<\/th>\n      <td>Idaho<\/td>\n      <td>5456<\/td>\n      <td>1737.0<\/td>\n      <td>3719.0<\/td>\n    <\/tr>\n    <tr>\n      <th>15<\/th>\n      <td>Illinois<\/td>\n      <td>41618<\/td>\n      <td>21273.0<\/td>\n      <td>20345.0<\/td>\n    <\/tr>\n    <tr>\n      <th>16<\/th>\n      <td>Indiana<\/td>\n      <td>26179<\/td>\n      <td>9989.0<\/td>\n      <td>16190.0<\/td>\n    <\/tr>\n    <tr>\n      <th>17<\/th>\n      <td>Iowa<\/td>\n      <td>10770<\/td>\n      <td>4906.0<\/td>\n      <td>5864.0<\/td>\n    <\/tr>\n    <tr>\n      <th>18<\/th>\n      <td>Kansas<\/td>\n      <td>10232<\/td>\n      <td>3809.0<\/td>\n      <td>6423.0<\/td>\n    <\/tr>\n    <tr>\n      <th>19<\/th>\n      <td>Kentucky<\/td>\n      <td>18348<\/td>\n      <td>3995.0<\/td>\n      <td>14353.0<\/td>\n    <\/tr>\n    <tr>\n      <th>20<\/th>\n      <td>Louisiana<\/td>\n      <td>18835<\/td>\n      <td>8912.0<\/td>\n      <td>9923.0<\/td>\n    <\/tr>\n    <tr>\n      <th>21<\/th>\n      <td>Maine<\/td>\n      <td>2981<\/td>\n      <td>595.0<\/td>\n      <td>2386.0<\/td>\n    <\/tr>\n    <tr>\n      <th>22<\/th>\n      <td>Maryland<\/td>\n      <td>16672<\/td>\n      <td>7154.0<\/td>\n      <td>9518.0<\/td>\n    <\/tr>\n    <tr>\n      <th>23<\/th>\n      <td>Massachusetts<\/td>\n      <td>24441<\/td>\n      <td>14607.0<\/td>\n      <td>9834.0<\/td>\n    <\/tr>\n    <tr>\n      <th>24<\/th>\n      <td>Michigan<\/td>\n      <td>42311<\/td>\n      <td>15527.0<\/td>\n      <td>26784.0<\/td>\n    <\/tr>\n    <tr>\n      <th>25<\/th>\n      <td>Minnesota<\/td>\n      <td>14964<\/td>\n      <td>6270.0<\/td>\n      <td>8694.0<\/td>\n    <\/tr>\n    <tr>\n      <th>26<\/th>\n      <td>Mississippi<\/td>\n      <td>13431<\/td>\n      <td>6056.0<\/td>\n      <td>7375.0<\/td>\n    <\/tr>\n    <tr>\n      <th>27<\/th>\n      <td>Missouri<\/td>\n     
 <td>23998<\/td>\n      <td>7182.0<\/td>\n      <td>16816.0<\/td>\n    <\/tr>\n    <tr>\n      <th>28<\/th>\n      <td>Montana<\/td>\n      <td>3701<\/td>\n      <td>1235.0<\/td>\n      <td>2466.0<\/td>\n    <\/tr>\n    <tr>\n      <th>29<\/th>\n      <td>Nebraska<\/td>\n      <td>5068<\/td>\n      <td>2031.0<\/td>\n      <td>3037.0<\/td>\n    <\/tr>\n    <tr>\n      <th>30<\/th>\n      <td>Nevada<\/td>\n      <td>12093<\/td>\n      <td>4281.0<\/td>\n      <td>7812.0<\/td>\n    <\/tr>\n    <tr>\n      <th>31<\/th>\n      <td>New Hampshire<\/td>\n      <td>3018<\/td>\n      <td>1059.0<\/td>\n      <td>1959.0<\/td>\n    <\/tr>\n    <tr>\n      <th>32<\/th>\n      <td>New Jersey<\/td>\n      <td>36097<\/td>\n      <td>21513.0<\/td>\n      <td>14584.0<\/td>\n    <\/tr>\n    <tr>\n      <th>33<\/th>\n      <td>New Mexico<\/td>\n      <td>9110<\/td>\n      <td>3297.0<\/td>\n      <td>5813.0<\/td>\n    <\/tr>\n    <tr>\n      <th>34<\/th>\n      <td>New York<\/td>\n      <td>80109<\/td>\n      <td>43354.0<\/td>\n      <td>36755.0<\/td>\n    <\/tr>\n    <tr>\n      <th>35<\/th>\n      <td>North Carolina<\/td>\n      <td>29746<\/td>\n      <td>9397.0<\/td>\n      <td>20349.0<\/td>\n    <\/tr>\n    <tr>\n      <th>36<\/th>\n      <td>North Dakota<\/td>\n      <td>2529<\/td>\n      <td>1447.0<\/td>\n      <td>1082.0<\/td>\n    <\/tr>\n    <tr>\n      <th>37<\/th>\n      <td>Northern Mariana Islands<\/td>\n      <td>41<\/td>\n      <td>2.0<\/td>\n      <td>39.0<\/td>\n    <\/tr>\n    <tr>\n      <th>38<\/th>\n      <td>Ohio<\/td>\n      <td>42061<\/td>\n      <td>11233.0<\/td>\n      <td>30828.0<\/td>\n    <\/tr>\n    <tr>\n      <th>39<\/th>\n      <td>Oklahoma<\/td>\n      <td>16549<\/td>\n      <td>3564.0<\/td>\n      <td>12985.0<\/td>\n    <\/tr>\n    <tr>\n      <th>40<\/th>\n      <td>Oregon<\/td>\n      <td>9451<\/td>\n      <td>1980.0<\/td>\n      <td>7471.0<\/td>\n    <\/tr>\n    <tr>\n      <th>41<\/th>\n      <td>Pennsylvania<\/td>\n      <td>50701<\/td>\n      
<td>21741.0<\/td>\n      <td>28960.0<\/td>\n    <\/tr>\n    <tr>\n      <th>42<\/th>\n      <td>Puerto Rico<\/td>\n      <td>5848<\/td>\n      <td>1836.0<\/td>\n      <td>4012.0<\/td>\n    <\/tr>\n    <tr>\n      <th>43<\/th>\n      <td>Rhode Island<\/td>\n      <td>3915<\/td>\n      <td>2173.0<\/td>\n      <td>1742.0<\/td>\n    <\/tr>\n    <tr>\n      <th>44<\/th>\n      <td>South Carolina<\/td>\n      <td>20192<\/td>\n      <td>7283.0<\/td>\n      <td>12909.0<\/td>\n    <\/tr>\n    <tr>\n      <th>45<\/th>\n      <td>South Dakota<\/td>\n      <td>3222<\/td>\n      <td>1778.0<\/td>\n      <td>1444.0<\/td>\n    <\/tr>\n    <tr>\n      <th>46<\/th>\n      <td>Tennessee<\/td>\n      <td>29035<\/td>\n      <td>9660.0<\/td>\n      <td>19375.0<\/td>\n    <\/tr>\n    <tr>\n      <th>47<\/th>\n      <td>Texas<\/td>\n      <td>94518<\/td>\n      <td>37405.0<\/td>\n      <td>57113.0<\/td>\n    <\/tr>\n    <tr>\n      <th>48<\/th>\n      <td>Utah<\/td>\n      <td>5316<\/td>\n      <td>1669.0<\/td>\n      <td>3647.0<\/td>\n    <\/tr>\n    <tr>\n      <th>49<\/th>\n      <td>Vermont<\/td>\n      <td>939<\/td>\n      <td>175.0<\/td>\n      <td>764.0<\/td>\n    <\/tr>\n    <tr>\n      <th>50<\/th>\n      <td>Virgin Islands<\/td>\n      <td>130<\/td>\n      <td>24.0<\/td>\n      <td>106.0<\/td>\n    <\/tr>\n    <tr>\n      <th>51<\/th>\n      <td>Virginia<\/td>\n      <td>23782<\/td>\n      <td>6474.0<\/td>\n      <td>17308.0<\/td>\n    <\/tr>\n    <tr>\n      <th>52<\/th>\n      <td>Washington<\/td>\n      <td>15905<\/td>\n      <td>4404.0<\/td>\n      <td>11501.0<\/td>\n    <\/tr>\n    <tr>\n      <th>53<\/th>\n      <td>West Virginia<\/td>\n      <td>8132<\/td>\n      <td>2028.0<\/td>\n      <td>6104.0<\/td>\n    <\/tr>\n    <tr>\n      <th>54<\/th>\n      <td>Wisconsin<\/td>\n      <td>16485<\/td>\n      <td>6439.0<\/td>\n      <td>10046.0<\/td>\n    <\/tr>\n    <tr>\n      <th>55<\/th>\n      <td>Wyoming<\/td>\n      <td>2014<\/td>\n      <td>596.0<\/td>\n      
<td>1418.0<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n <h3 id=\"age-characteristics\">\n  <a href=\"#age-characteristics\" class=\"header-link\">\n    Age characteristics\n  <\/a>\n<\/h3><p>We load data on the US states&rsquo; population age characteristics and calculate the proportion of each state&rsquo;s population aged 65 or older. We would expect that states with an older population are more severely affected by COVID (all else being equal).<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">states_population<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_csv<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;states_population_age_2020.csv&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">thousands<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;,&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">drop<\/span><span class=\"p\">([<\/span><span class=\"s2\">&#34;Rank&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;Population Ages 65+ (percent of state population)&#34;<\/span><span class=\"p\">],<\/span> <span class=\"n\">axis<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">states_population<\/span> <span class=\"o\">=<\/span> <span class=\"n\">states_population<\/span><span class=\"o\">.<\/span><span class=\"n\">rename<\/span><span class=\"p\">(<\/span><span class=\"n\">columns<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"s2\">&#34;State&#34;<\/span> <span class=\"p\">:<\/span> <span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;Total Resident Population (thousands)&#34;<\/span><span class=\"p\">:<\/span> <span 
class=\"s2\">&#34;total_population_thousands&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;Population Ages 65+ (thousands)&#34;<\/span> <span class=\"p\">:<\/span> <span class=\"s2\">&#34;population_age65_thousands&#34;<\/span><span class=\"p\">})<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">states_population<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;perc_over_65&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span>  <span class=\"n\">states_population<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;population_age65_thousands&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">states_population<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;total_population_thousands&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"mf\">100.0<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>Next we add the <code>perc_over_65<\/code> column to the deaths dataframe.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># drop due to missing population data for DC, Guam, Puerto Rico, Northern Mariana Islands, Virgin Islands<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths_and_states<\/span> <span class=\"o\">=<\/span> <span class=\"n\">deaths<\/span><span class=\"o\">.<\/span><span class=\"n\">merge<\/span><span class=\"p\">(<\/span><span class=\"n\">states_population<\/span><span class=\"p\">[[<\/span><span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;total_population_thousands&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;perc_over_65&#34;<\/span><span class=\"p\">]],<\/span> <span class=\"n\">on<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;state&#39;<\/span><span 
class=\"p\">,<\/span> <span class=\"n\">how<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;left&#39;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">dropna<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>Moreover, to use the same variable scaling as in the article, we scale the <code>deaths_after_vaccine<\/code> variable from absolute numbers to be COVID deaths per million population.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths_and_states<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;deaths_per_million&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">deaths_and_states<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;deaths_after_vaccine&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">((<\/span><span class=\"n\">deaths_and_states<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;total_population_thousands&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">1000<\/span><span class=\"p\">)<\/span> <span class=\"o\">\/<\/span> <span class=\"mi\">1_000_000<\/span> <span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># dataframe with deaths, population, and age variable<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">deaths_and_states<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        
vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>state<\/th>\n      <th>deaths_2023_03_23<\/th>\n      <th>deaths_2021_02_01<\/th>\n      <th>deaths_after_vaccine<\/th>\n      <th>total_population_thousands<\/th>\n      <th>perc_over_65<\/th>\n      <th>deaths_per_million<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>Alabama<\/td>\n      <td>21631<\/td>\n      <td>7688.0<\/td>\n      <td>13943.0<\/td>\n      <td>4922.0<\/td>\n      <td>17.757009<\/td>\n      <td>2832.791548<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>Alaska<\/td>\n      <td>1438<\/td>\n      <td>253.0<\/td>\n      <td>1185.0<\/td>\n      <td>731.0<\/td>\n      <td>13.132695<\/td>\n      <td>1621.067031<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>Arizona<\/td>\n      <td>33190<\/td>\n      <td>13124.0<\/td>\n      <td>20066.0<\/td>\n      <td>7421.0<\/td>\n      <td>18.515025<\/td>\n      <td>2703.948255<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>Arkansas<\/td>\n      <td>13068<\/td>\n      <td>4895.0<\/td>\n      <td>8173.0<\/td>\n      <td>3031.0<\/td>\n      <td>17.683933<\/td>\n      <td>2696.469812<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>California<\/td>\n      <td>104277<\/td>\n      <td>41284.0<\/td>\n      <td>62993.0<\/td>\n      <td>39368.0<\/td>\n      <td>15.179841<\/td>\n      <td>1600.106686<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n <h3 id=\"presidential-election-results\">\n  <a href=\"#presidential-election-results\" class=\"header-link\">\n    Presidential election results\n  <\/a>\n<\/h3><p>Similar to the article we want to generate a variable <code>biden<\/code> which indicates Joe Biden&rsquo;s margin of victory over 
Donald Trump in the 2020 elections.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># load election results by state for 2020 presidential election<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">election_results<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_csv<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;democratic_vs_republican_votes_by_usa_state_2020.csv&#34;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">drop<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;usa_state_code&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">axis<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">election_results<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;biden&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">election_results<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;percent_democrat&#34;<\/span><span class=\"p\">]<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">50<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">election_results<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">10<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      
<th><\/th>\n      <th>state<\/th>\n      <th>DEM<\/th>\n      <th>REP<\/th>\n      <th>usa_state<\/th>\n      <th>percent_democrat<\/th>\n      <th>biden<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>Alabama<\/td>\n      <td>843473<\/td>\n      <td>1434159<\/td>\n      <td>Alabama<\/td>\n      <td>37.032892<\/td>\n      <td>-12.967108<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>Alaska<\/td>\n      <td>45758<\/td>\n      <td>80999<\/td>\n      <td>Alaska<\/td>\n      <td>36.098993<\/td>\n      <td>-13.901007<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>Arizona<\/td>\n      <td>1643664<\/td>\n      <td>1626679<\/td>\n      <td>Arizona<\/td>\n      <td>50.259682<\/td>\n      <td>0.259682<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>Arkansas<\/td>\n      <td>420985<\/td>\n      <td>761251<\/td>\n      <td>Arkansas<\/td>\n      <td>35.609218<\/td>\n      <td>-14.390782<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>California<\/td>\n      <td>9315259<\/td>\n      <td>4812735<\/td>\n      <td>California<\/td>\n      <td>65.934760<\/td>\n      <td>15.934760<\/td>\n    <\/tr>\n    <tr>\n      <th>5<\/th>\n      <td>Colorado<\/td>\n      <td>1753416<\/td>\n      <td>1335253<\/td>\n      <td>Colorado<\/td>\n      <td>56.769307<\/td>\n      <td>6.769307<\/td>\n    <\/tr>\n    <tr>\n      <th>6<\/th>\n      <td>Connecticut<\/td>\n      <td>1059252<\/td>\n      <td>699079<\/td>\n      <td>Connecticut<\/td>\n      <td>60.241900<\/td>\n      <td>10.241900<\/td>\n    <\/tr>\n    <tr>\n      <th>7<\/th>\n      <td>Delaware<\/td>\n      <td>295413<\/td>\n      <td>199857<\/td>\n      <td>Delaware<\/td>\n      <td>59.646859<\/td>\n      <td>9.646859<\/td>\n    <\/tr>\n    <tr>\n      <th>8<\/th>\n      <td>District of Columbia<\/td>\n      <td>258561<\/td>\n      <td>14449<\/td>\n      <td>District of Columbia<\/td>\n      <td>94.707520<\/td>\n      <td>44.707520<\/td>\n    <\/tr>\n    <tr>\n      
<th>9<\/th>\n      <td>Florida<\/td>\n      <td>5294767<\/td>\n      <td>5667834<\/td>\n      <td>Florida<\/td>\n      <td>48.298456<\/td>\n      <td>-1.701544<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># merge to deaths dataframe<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">death_and_election<\/span> <span class=\"o\">=<\/span> <span class=\"n\">deaths_and_states<\/span><span class=\"o\">.<\/span><span class=\"n\">merge<\/span><span class=\"p\">(<\/span><span class=\"n\">election_results<\/span><span class=\"p\">[[<\/span><span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;biden&#34;<\/span><span class=\"p\">]],<\/span> <span class=\"n\">how<\/span> <span class=\"o\">=<\/span> <span class=\"s2\">&#34;left&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">on<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;state&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div> <h3 id=\"vaccination-rates\">\n  <a href=\"#vaccination-rates\" class=\"header-link\">\n    Vaccination rates\n  <\/a>\n<\/h3><p>Lastly, we add the vaccination rates per state after the pandemic was over. 
I use the vaccination rates from 10 May 2023, as this snapshot was easily available.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">vaccinations_raw<\/span> <span class=\"o\">=<\/span> <span class=\"n\">pd<\/span><span class=\"o\">.<\/span><span class=\"n\">read_csv<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;us_state_vaccinations.csv&#34;<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">rename<\/span><span class=\"p\">(<\/span><span class=\"n\">columns<\/span><span class=\"o\">=<\/span><span class=\"p\">{<\/span><span class=\"s2\">&#34;location&#34;<\/span><span class=\"p\">:<\/span> <span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">})<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">vaccinations_raw<\/span><span class=\"o\">.<\/span><span class=\"n\">replace<\/span><span class=\"p\">({<\/span><span class=\"s1\">&#39;New York State&#39;<\/span><span class=\"p\">:<\/span> <span class=\"s1\">&#39;New York&#39;<\/span><span class=\"p\">},<\/span> <span class=\"n\">inplace<\/span><span class=\"o\">=<\/span><span class=\"kc\">True<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># choose the date and relevant columns<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">vaccinations<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vaccinations_raw<\/span><span class=\"p\">[<\/span><span class=\"n\">vaccinations_raw<\/span><span class=\"o\">.<\/span><span class=\"n\">date<\/span> <span class=\"o\">==<\/span> <span class=\"s1\">&#39;2023-05-10&#39;<\/span><span class=\"p\">][[<\/span><span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span><span class=\"s2\">&#34;people_vaccinated_per_hundred&#34;<\/span><span
class=\"p\">]]<\/span>\n<\/span><\/span><\/code><\/pre><\/div><p>Next, we merge the vaccination rates with the dataframe to get our final dataset.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">dataset<\/span> <span class=\"o\">=<\/span> <span class=\"n\">death_and_election<\/span><span class=\"o\">.<\/span><span class=\"n\">merge<\/span><span class=\"p\">(<\/span><span class=\"n\">vaccinations<\/span><span class=\"p\">,<\/span> <span class=\"n\">how<\/span> <span class=\"o\">=<\/span> <span class=\"s2\">&#34;left&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">on<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;state&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">dataset<\/span><span class=\"p\">[[<\/span><span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;deaths_per_million&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;biden&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;perc_over_65&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;people_vaccinated_per_hundred&#34;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">head<\/span><span class=\"p\">(<\/span><span class=\"mi\">5<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n<pre><code>.dataframe tbody tr th {\n    vertical-align: top;\n}\n\n.dataframe thead th {\n    text-align: right;\n}\n<\/code><\/pre>\n<p><\/style><\/p>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th><\/th>\n      <th>state<\/th>\n      <th>deaths_per_million<\/th>\n      <th>biden<\/th>\n      <th>perc_over_65<\/th>\n      
<th>people_vaccinated_per_hundred<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <th>0<\/th>\n      <td>Alabama<\/td>\n      <td>2832.791548<\/td>\n      <td>-12.967108<\/td>\n      <td>17.757009<\/td>\n      <td>65.12<\/td>\n    <\/tr>\n    <tr>\n      <th>1<\/th>\n      <td>Alaska<\/td>\n      <td>1621.067031<\/td>\n      <td>-13.901007<\/td>\n      <td>13.132695<\/td>\n      <td>73.23<\/td>\n    <\/tr>\n    <tr>\n      <th>2<\/th>\n      <td>Arizona<\/td>\n      <td>2703.948255<\/td>\n      <td>0.259682<\/td>\n      <td>18.515025<\/td>\n      <td>78.37<\/td>\n    <\/tr>\n    <tr>\n      <th>3<\/th>\n      <td>Arkansas<\/td>\n      <td>2696.469812<\/td>\n      <td>-14.390782<\/td>\n      <td>17.683933<\/td>\n      <td>70.09<\/td>\n    <\/tr>\n    <tr>\n      <th>4<\/th>\n      <td>California<\/td>\n      <td>1600.106686<\/td>\n      <td>15.934760<\/td>\n      <td>15.179841<\/td>\n      <td>85.07<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n<\/div>\n<p>I am also storing the final dataset as a parquet file in case you want to play with it. 
To find it, follow the link at the bottom of the page.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">dataset<\/span><span class=\"p\">[[<\/span><span class=\"s2\">&#34;state&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;deaths_per_million&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;biden&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;perc_over_65&#34;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&#34;people_vaccinated_per_hundred&#34;<\/span><span class=\"p\">]]<\/span><span class=\"o\">.<\/span><span class=\"n\">to_parquet<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;death_partisanship_final_dataset.parquet&#34;<\/span><span class=\"p\">,<\/span> <span class=\"n\">engine<\/span> <span class=\"o\">=<\/span> <span class=\"s1\">&#39;pyarrow&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div> <h1 id=\"regressions\">\n  <a href=\"#regressions\" class=\"header-link\">\n    Regressions\n  <\/a>\n<\/h1><p>With the assembled dataset we can now reproduce Nate Silver&rsquo;s findings.<\/p>\n <h3 id=\"1-regression-state-partisanship-on-death-rates\">\n  <a href=\"#1-regression-state-partisanship-on-death-rates\" class=\"header-link\">\n    1. Regression: state partisanship on death rates\n  <\/a>\n<\/h3><p>Silver starts with a one-variable baseline model. And indeed <code>biden<\/code>, Joe Biden&rsquo;s winning margin, is a statistically significant predictor of COVID deaths. 
According to the model, a one-percentage-point increase in Biden&rsquo;s winning margin is associated with roughly 30 fewer COVID deaths per million population.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[[<\/span><span class=\"s1\">&#39;biden&#39;<\/span><span class=\"p\">]]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;deaths_per_million&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># add a constant term to the model<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">add_constant<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"c1\"># run regression<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">model<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">OLS<\/span><span class=\"p\">(<\/span><span class=\"n\">y<\/span><span class=\"p\">,<\/span> <span class=\"n\">x<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">fit<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">print_model<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model<\/span><span class=\"o\">.<\/span><span class=\"n\">summary<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span
class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">print_model<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>                            OLS Regression Results                            \n==============================================================================\nDep. Variable:     deaths_per_million   R-squared:                       0.286\nModel:                            OLS   Adj. R-squared:                  0.272\nMethod:                 Least Squares   F-statistic:                     19.27\nDate:                Sun, 14 Jan 2024   Prob (F-statistic):           6.22e-05\nTime:                        18:05:09   Log-Likelihood:                -381.14\nNo. Observations:                  50   AIC:                             766.3\nDf Residuals:                      48   BIC:                             770.1\nDf Model:                           1                                         \nCovariance Type:            nonrobust                                         \n==============================================================================\n                 coef    std err          t      P&gt;|t|      [0.025      0.975]\n------------------------------------------------------------------------------\nconst       2031.3050     72.204     28.133      0.000    1886.129    2176.481\nbiden        -29.7773      6.784     -4.389      0.000     -43.418     -16.137\n==============================================================================\nOmnibus:                        0.718   Durbin-Watson:                   2.263\nProb(Omnibus):                  0.698   Jarque-Bera (JB):                0.732\nSkew:                          -0.266   Prob(JB):                        0.694\nKurtosis:                       2.737   Cond. No.                         
10.8\n==============================================================================\n\nNotes:\n[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n<\/code><\/pre>\n<p>We can also plot the data set and the regression line to get a more intuitive overview of the result:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">fig<\/span><span class=\"p\">,<\/span> <span class=\"n\">ax<\/span> <span class=\"o\">=<\/span> <span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">subplots<\/span><span class=\"p\">(<\/span><span class=\"n\">figsize<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"mi\">9<\/span><span class=\"p\">,<\/span> <span class=\"mi\">9<\/span><span class=\"p\">))<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">ax<\/span><span class=\"o\">.<\/span><span class=\"n\">scatter<\/span><span class=\"p\">(<\/span><span class=\"n\">dataset<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;biden&#34;<\/span><span class=\"p\">],<\/span>  <span class=\"n\">dataset<\/span><span class=\"p\">[<\/span><span class=\"s2\">&#34;deaths_per_million&#34;<\/span><span class=\"p\">])<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">xlabel<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;Margin <\/span><span class=\"si\">% f<\/span><span class=\"s2\">or Biden (positive values mean Democrats won)&#34;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">ylabel<\/span><span class=\"p\">(<\/span><span class=\"s2\">&#34;Post vaccine (Feb 2021) deaths per 1M population&#34;<\/span><span 
class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">b<\/span><span class=\"p\">,<\/span> <span class=\"n\">m<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model<\/span><span class=\"o\">.<\/span><span class=\"n\">params<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">ax<\/span><span class=\"o\">.<\/span><span class=\"n\">axline<\/span><span class=\"p\">(<\/span><span class=\"n\">xy1<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"p\">),<\/span> <span class=\"n\">slope<\/span><span class=\"o\">=<\/span><span class=\"n\">m<\/span><span class=\"p\">,<\/span> <span class=\"n\">color<\/span><span class=\"o\">=<\/span><span class=\"s1\">&#39;red&#39;<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">plt<\/span><span class=\"o\">.<\/span><span class=\"n\">grid<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><\/code><\/pre><\/div><figure><img src=\"\/posts\/covid_bipartisan_bayesian\/output_30_0.png\">\n<\/figure>\n\n <h3 id=\"2-regression-state-partisanship-and-age-on-death-rates\">\n  <a href=\"#2-regression-state-partisanship-and-age-on-death-rates\" class=\"header-link\">\n    2. Regression: state partisanship and age on death rates\n  <\/a>\n<\/h3><p>Silver&rsquo;s article is a response to criticism that the model above is not valuable because different states have different age structures and that may explain most of the variation in death rates. Silver remarks that the results hold even when controlling for age. We can add age <code>perc_over_65<\/code> to the model and re-run the regression and indeed we get the same result. 
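<\/p>
<p>As an aside, there is no magic in these two calls: <code>sm.add_constant<\/code> prepends a column of ones, and <code>sm.OLS<\/code> solves an ordinary least-squares problem. A hedged sketch on synthetic data (made-up coefficients, same shapes as our 50-state dataset, not the real estimates):<\/p>

```python
import numpy as np

# synthetic data: 50 observations, two predictors, known coefficients plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = 250.0 + x @ np.array([-33.0, 100.0]) + rng.normal(scale=50.0, size=50)

# equivalent of sm.add_constant: prepend an intercept column of ones
X = np.column_stack([np.ones(len(x)), x])

# equivalent of sm.OLS(y, X).fit().params: solve min ||X @ beta - y||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] is the intercept; beta[1:] recover the slopes up to noise
```

<p>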
<code>biden<\/code> is still significant with a similar coefficient.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">x2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[[<\/span><span class=\"s1\">&#39;biden&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;perc_over_65&#39;<\/span><span class=\"p\">]]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">y2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;deaths_per_million&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">x2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">add_constant<\/span><span class=\"p\">(<\/span><span class=\"n\">x2<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">model2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">OLS<\/span><span class=\"p\">(<\/span><span class=\"n\">y2<\/span><span class=\"p\">,<\/span> <span class=\"n\">x2<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">fit<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">print_model2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model2<\/span><span class=\"o\">.<\/span><span class=\"n\">summary<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">print_model2<\/span><span 
class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>                            OLS Regression Results                            \n==============================================================================\nDep. Variable:     deaths_per_million   R-squared:                       0.397\nModel:                            OLS   Adj. R-squared:                  0.371\nMethod:                 Least Squares   F-statistic:                     15.44\nDate:                Sun, 14 Jan 2024   Prob (F-statistic):           6.99e-06\nTime:                        18:05:12   Log-Likelihood:                -376.95\nNo. Observations:                  50   AIC:                             759.9\nDf Residuals:                      47   BIC:                             765.6\nDf Model:                           2                                         \nCovariance Type:            nonrobust                                         \n================================================================================\n                   coef    std err          t      P&gt;|t|      [0.025      0.975]\n--------------------------------------------------------------------------------\nconst          256.0210    609.788      0.420      0.677    -970.714    1482.756\nbiden          -32.9481      6.397     -5.151      0.000     -45.817     -20.079\nperc_over_65   101.6118     34.690      2.929      0.005      31.824     171.400\n==============================================================================\nOmnibus:                        1.194   Durbin-Watson:                   2.318\nProb(Omnibus):                  0.551   Jarque-Bera (JB):                1.061\nSkew:                          -0.162   Prob(JB):                        0.588\nKurtosis:                       2.365   Cond. No.                         
162.\n==============================================================================\n\nNotes:\n[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n<\/code><\/pre>\n <h3 id=\"3-regression-state-partisanship-age-and-vaccination-rates-on-death-rates\">\n  <a href=\"#3-regression-state-partisanship-age-and-vaccination-rates-on-death-rates\" class=\"header-link\">\n    3. Regression: state partisanship, age, and vaccination rates on death rates\n  <\/a>\n<\/h3><p>Of course, your political leaning by itself does not determine your chance of dying from COVID. The assumption behind the model is that Republicans were less likely to get vaccinated, and that unvaccinated people have a higher COVID mortality. We can check what happens if we add the vaccination rate as a variable to the model.<\/p>\n<p>We can see below that the vaccination rate is significant and explains away the effect of the <code>biden<\/code> variable.<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">x3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[[<\/span><span class=\"s1\">&#39;biden&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;perc_over_65&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;people_vaccinated_per_hundred&#39;<\/span><span class=\"p\">]]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">y3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;deaths_per_million&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">x3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">add_constant<\/span><span class=\"p\">(<\/span><span
class=\"n\">x3<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">model3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">OLS<\/span><span class=\"p\">(<\/span><span class=\"n\">y3<\/span><span class=\"p\">,<\/span> <span class=\"n\">x3<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">fit<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">print_model3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model3<\/span><span class=\"o\">.<\/span><span class=\"n\">summary<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">print_model3<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>                            OLS Regression Results                            \n==============================================================================\nDep. Variable:     deaths_per_million   R-squared:                       0.450\nModel:                            OLS   Adj. R-squared:                  0.414\nMethod:                 Least Squares   F-statistic:                     12.55\nDate:                Sun, 14 Jan 2024   Prob (F-statistic):           4.01e-06\nTime:                        18:05:13   Log-Likelihood:                -374.63\nNo. 
Observations:                  50   AIC:                             757.3\nDf Residuals:                      46   BIC:                             764.9\nDf Model:                           3                                         \nCovariance Type:            nonrobust                                         \n=================================================================================================\n                                    coef    std err          t      P&gt;|t|      [0.025      0.975]\n-------------------------------------------------------------------------------------------------\nconst                          1528.3900    841.133      1.817      0.076    -164.723    3221.503\nbiden                           -15.9990     10.110     -1.583      0.120     -36.349       4.351\nperc_over_65                    114.7298     34.042      3.370      0.002      46.207     183.253\npeople_vaccinated_per_hundred   -18.5072      8.743     -2.117      0.040     -36.106      -0.908\n==============================================================================\nOmnibus:                        0.262   Durbin-Watson:                   2.192\nProb(Omnibus):                  0.877   Jarque-Bera (JB):                0.454\nSkew:                           0.027   Prob(JB):                        0.797\nKurtosis:                       2.537   Cond. No.                     1.08e+03\n==============================================================================\n\nNotes:\n[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n[2] The condition number is large, 1.08e+03. This might indicate that there are\nstrong multicollinearity or other numerical problems.\n<\/code><\/pre>\n <h3 id=\"4-regression-age-and-vaccination-rates-on-death-rates\">\n  <a href=\"#4-regression-age-and-vaccination-rates-on-death-rates\" class=\"header-link\">\n    4. 
Regression: age and vaccination rates on death rates\n  <\/a>\n<\/h3><p>In a final step, we remove the correlated variable <code>biden<\/code> from the model to demonstrate the impact of age and vaccination rate on death rate:<\/p>\n<div class=\"highlight\"><pre tabindex=\"0\" class=\"chroma\"><code class=\"language-python\" data-lang=\"python\"><span class=\"line\"><span class=\"cl\"><span class=\"n\">x4<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[[<\/span><span class=\"s1\">&#39;perc_over_65&#39;<\/span><span class=\"p\">,<\/span> <span class=\"s1\">&#39;people_vaccinated_per_hundred&#39;<\/span><span class=\"p\">]]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">y4<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dataset<\/span><span class=\"p\">[<\/span><span class=\"s1\">&#39;deaths_per_million&#39;<\/span><span class=\"p\">]<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">x4<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">add_constant<\/span><span class=\"p\">(<\/span><span class=\"n\">x4<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">model4<\/span> <span class=\"o\">=<\/span> <span class=\"n\">sm<\/span><span class=\"o\">.<\/span><span class=\"n\">OLS<\/span><span class=\"p\">(<\/span><span class=\"n\">y4<\/span><span class=\"p\">,<\/span> <span class=\"n\">x4<\/span><span class=\"p\">)<\/span><span class=\"o\">.<\/span><span class=\"n\">fit<\/span><span class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\">\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"n\">print_model4<\/span> <span class=\"o\">=<\/span> <span class=\"n\">model4<\/span><span class=\"o\">.<\/span><span class=\"n\">summary<\/span><span 
class=\"p\">()<\/span>\n<\/span><\/span><span class=\"line\"><span class=\"cl\"><span class=\"nb\">print<\/span><span class=\"p\">(<\/span><span class=\"n\">print_model4<\/span><span class=\"p\">)<\/span>\n<\/span><\/span><\/code><\/pre><\/div><pre><code>                            OLS Regression Results                            \n==============================================================================\nDep. Variable:     deaths_per_million   R-squared:                       0.420\nModel:                            OLS   Adj. R-squared:                  0.396\nMethod:                 Least Squares   F-statistic:                     17.03\nDate:                Sun, 14 Jan 2024   Prob (F-statistic):           2.74e-06\nTime:                        18:05:14   Log-Likelihood:                -375.95\nNo. Observations:                  50   AIC:                             757.9\nDf Residuals:                      47   BIC:                             763.6\nDf Model:                           2                                         \nCovariance Type:            nonrobust                                         \n=================================================================================================\n                                    coef    std err          t      P&gt;|t|      [0.025      0.975]\n-------------------------------------------------------------------------------------------------\nconst                          2386.5532    653.203      3.654      0.001    1072.478    3700.629\nperc_over_65                    117.0242     34.551      3.387      0.001      47.516     186.532\npeople_vaccinated_per_hundred   -29.4653      5.423     -5.434      0.000     -40.374     -18.556\n==============================================================================\nOmnibus:                        0.653   Durbin-Watson:                   2.211\nProb(Omnibus):                  0.722   Jarque-Bera (JB):                0.527\nSkew:                       
    0.244   Prob(JB):                        0.768\nKurtosis:                       2.881   Cond. No.                         829.\n==============================================================================\n\nNotes:\n[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n<\/code><\/pre>\n <h3 id=\"jupyter-notebook\">\n  <a href=\"#jupyter-notebook\" class=\"header-link\">\n    Jupyter Notebook\n  <\/a>\n<\/h3><p>You can find the jupyter notebook and the datasets for this post \n<a href=\"https:\/\/gitlab.com\/frankRi89\/blog\/-\/tree\/main\/notebooks\/covid_bipartisian\" \n\n    target=\"_blank\"\n    >here\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a>.<\/p>\n"},{"title":"On Writing Well - 80\/20 Checklist","link":"\/posts\/writing-well-checklist\/","pubDate":"Mon, 14 Aug 2023 00:00:00 +0000","guid":"\/posts\/writing-well-checklist\/","description":"<p>This is a checklist that I use at work to quickly sense check my drafts of strategy documents, tech proposals, PR reviews, project feedback, and other places where effective writing is helpful. 
It lists suggestions from the book \n<a href=\"https:\/\/www.goodreads.com\/en\/book\/show\/53343\" \n\n    target=\"_blank\"\n    >On Writing Well\n    \n\n        \n    <span style=\"white-space: nowrap\">&thinsp;<svg\n        style=\"height: 0.7em; width: 0.9em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n        class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n        viewBox=\"0 0 448 512\">\n        <path fill=\"currentColor\"\n            d=\"M299.9 191.2c5.1 37.3-4.7 79-35.9 100.7-22.3 15.5-52.8 14.1-70.8 5.7-37.1-17.3-49.5-58.6-46.8-97.2 4.3-60.9 40.9-87.9 75.3-87.5 46.9-.2 71.8 31.8 78.2 78.3zM448 88v336c0 30.9-25.1 56-56 56H56c-30.9 0-56-25.1-56-56V88c0-30.9 25.1-56 56-56h336c30.9 0 56 25.1 56 56zM330 313.2s-.1-34-.1-217.3h-29v40.3c-.8 .3-1.2-.5-1.6-1.2-9.6-20.7-35.9-46.3-76-46-51.9 .4-87.2 31.2-100.6 77.8-4.3 14.9-5.8 30.1-5.5 45.6 1.7 77.9 45.1 117.8 112.4 115.2 28.9-1.1 54.5-17 69-45.2 .5-1 1.1-1.9 1.7-2.9 .2 .1 .4 .1 .6 .2 .3 3.8 .2 30.7 .1 34.5-.2 14.8-2 29.5-7.2 43.5-7.8 21-22.3 34.7-44.5 39.5-17.8 3.9-35.6 3.8-53.2-1.2-21.5-6.1-36.5-19-41.1-41.8-.3-1.6-1.3-1.3-2.3-1.3h-26.8c.8 10.6 3.2 20.3 8.5 29.2 24.2 40.5 82.7 48.5 128.2 37.4 49.9-12.3 67.3-54.9 67.4-106.3z\">\n        <\/path>\n    <\/svg>\n<\/span>\n\n\n\n    \n<\/a> that I covered in this \n<a href=\"\/posts\/writing-well\/\" \n>blog post\n<\/a>. As I mentioned in that blog post, I am not striving for perfection. 
Instead, I want to be able to identify the main parts of my first draft that I can improve within 5&ndash;20 min of editing time.<\/p>\n <h2 id=\"clear-thinking\">\n  <a href=\"#clear-thinking\" class=\"header-link\">\n    Clear Thinking\n  <\/a>\n<\/h2><ul>\n<li>I am clear about what the main point of the article is, who my audience is, and why they should care.<\/li>\n<\/ul>\n <h2 id=\"structure\">\n  <a href=\"#structure\" class=\"header-link\">\n    Structure\n  <\/a>\n<\/h2><ul>\n<li>\n<p>I have paid special attention to the first sentence and made it interesting.<\/p>\n<\/li>\n<li>\n<p>The first paragraph hooks the reader by being fresh, novel, paradoxical, humorous, surprising, or unusual, or by starting with an interesting fact or question.<\/p>\n<\/li>\n<li>\n<p>The first paragraph tells the reader what the article is about and why they should care.<\/p>\n<\/li>\n<li>\n<p>Every paragraph is kept reasonably short and captures one logical idea.<\/p>\n<\/li>\n<li>\n<p>The last sentence of each paragraph entices the reader to keep reading.<\/p>\n<\/li>\n<li>\n<p>The article is not longer than it needs to be. It does not attempt to cover every aspect of the topic.<\/p>\n<\/li>\n<li>\n<p>I paid attention to the end. It happens in a fitting, unexpected, or surprising way that should keep the reader thinking about the text.<\/p>\n<\/li>\n<\/ul>\n <h2 id=\"unity\">\n  <a href=\"#unity\" class=\"header-link\">\n    Unity\n  <\/a>\n<\/h2><ul>\n<li>The text is mostly written in one tense.<\/li>\n<li>The reader is addressed with the same pronoun.<\/li>\n<li>The tone does not change (casual vs. formal, neutral vs. involved).<\/li>\n<\/ul>\n <h2 id=\"words\">\n  <a href=\"#words\" class=\"header-link\">\n    Words\n  <\/a>\n<\/h2><ul>\n<li>Every word is essential for the sentence and doing new work.<\/li>\n<li>Unnecessary adverbs are avoided.<\/li>\n<li>Unnecessary adjectives are avoided, e.g. 
&ldquo;diligent code review&rdquo;.<\/li>\n<li>Words that have shorter alternatives are replaced, e.g. \u201cassistance\u201d (help), \u201cnumerous\u201d (many), \u201cfacilitate\u201d (ease), \u201csufficient\u201d (enough), \u201cattempt\u201d (try).<\/li>\n<li>Words that inflate importance are avoided, e.g. \u201cwith the possible exception of\u201d (except), \u201cdue to the fact that\u201d (because).<\/li>\n<li>Small qualifier words are removed: \u201ca bit,\u201d \u201ca little,\u201d \u201csort of,\u201d \u201ckind of,\u201d \u201crather,\u201d \u201cquite,\u201d \u201cvery,\u201d \u201ctoo,\u201d \u201cpretty much,\u201d \u201cin a sense\u201d. Be confident in what you write.<\/li>\n<li>Active verbs are used over passive verbs, e.g. \u201cJoe documented the architecture\u201d over \u201cThe architecture was documented by Joe\u201d.<\/li>\n<li>Concept nouns are replaced by active verbs. Instead of \u201cThe monitoring system is used to detect data drift.\u201d use \u201cWe monitor our data to detect drift.\u201d<\/li>\n<li>Verbs are precise: \u201cThe CEO resigned\u201d instead of \u201cThe CEO left\u201d.<\/li>\n<\/ul>\n <h2 id=\"style\">\n  <a href=\"#style\" class=\"header-link\">\n    Style\n  <\/a>\n<\/h2><ul>\n<li>Sentences are kept short.<\/li>\n<li>Sentences link logically to the next. If not, a link is explicitly provided.<\/li>\n<li>Avoid exclamation points unless for effect.<\/li>\n<li>Use contractions like &ldquo;I&rsquo;ll&rdquo;, &ldquo;I&rsquo;ve&rdquo;, but not &ldquo;I&rsquo;d&rdquo; as this can mean both &ldquo;I had&rdquo; and &ldquo;I would&rdquo;.<\/li>\n<li>Always use &ldquo;that&rdquo; over &ldquo;which&rdquo;.<\/li>\n<\/ul>\n <h2 id=\"tone\">\n  <a href=\"#tone\" class=\"header-link\">\n    Tone\n  <\/a>\n<\/h2><ul>\n<li>Avoid business lingo and concept nouns, e.g. &ldquo;incident management procedures&rdquo;. Replace them with active verbs and plain talk.<\/li>\n<li>Try to write in a human way. 
Make people do things using active verbs.<\/li>\n<li>Resist trying to sound smart in work documents.<\/li>\n<li>Avoid sexism in language: &ldquo;Software Engineers can spend more time with their families.&rdquo; instead of &ldquo;Software Engineers can spend more time with their wives and children.&rdquo;<\/li>\n<\/ul>\n"},{"title":"Writing well as a skill for working well remotely","link":"\/posts\/writing-well\/","pubDate":"Sat, 12 Aug 2023 00:00:00 +0000","guid":"\/posts\/writing-well\/","description":"<p>Writing a good blog post that summarizes William Zinsser&rsquo;s classic book \n<a href=\"https:\/\/www.goodreads.com\/en\/book\/show\/53343\" \n\n    target=\"_blank\"\n    >On Writing Well\n    \n\n        \n    <span style=\"white-space: nowrap\">&thinsp;<svg\n        style=\"height: 0.7em; width: 0.9em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n        class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n        viewBox=\"0 0 448 512\">\n        <path fill=\"currentColor\"\n            d=\"M299.9 191.2c5.1 37.3-4.7 79-35.9 100.7-22.3 15.5-52.8 14.1-70.8 5.7-37.1-17.3-49.5-58.6-46.8-97.2 4.3-60.9 40.9-87.9 75.3-87.5 46.9-.2 71.8 31.8 78.2 78.3zM448 88v336c0 30.9-25.1 56-56 56H56c-30.9 0-56-25.1-56-56V88c0-30.9 25.1-56 56-56h336c30.9 0 56 25.1 56 56zM330 313.2s-.1-34-.1-217.3h-29v40.3c-.8 .3-1.2-.5-1.6-1.2-9.6-20.7-35.9-46.3-76-46-51.9 .4-87.2 31.2-100.6 77.8-4.3 14.9-5.8 30.1-5.5 45.6 1.7 77.9 45.1 117.8 112.4 115.2 28.9-1.1 54.5-17 69-45.2 .5-1 1.1-1.9 1.7-2.9 .2 .1 .4 .1 .6 .2 .3 3.8 .2 30.7 .1 34.5-.2 14.8-2 29.5-7.2 43.5-7.8 21-22.3 34.7-44.5 39.5-17.8 3.9-35.6 3.8-53.2-1.2-21.5-6.1-36.5-19-41.1-41.8-.3-1.6-1.3-1.3-2.3-1.3h-26.8c.8 10.6 3.2 20.3 8.5 29.2 24.2 40.5 82.7 48.5 128.2 37.4 49.9-12.3 67.3-54.9 67.4-106.3z\">\n        <\/path>\n    <\/svg>\n<\/span>\n\n\n\n    \n<\/a> is a stressful activity. 
You don&rsquo;t want to break his rules while writing about them. I will attempt this anyway since I learned a lot from the book that I can use while working with my team.<\/p>\n<p>One of the benefits of a largely remote company is the ability to discuss proposals, strategies, documentation, and technical review documents asynchronously. This allows me and my collaborators to think through and respond to arguments in our own time, which <del>can<\/del> should reduce overall meeting time.<\/p>\n<p>However, this requires us to write our ideas and arguments down. By doing this well, I can have productive and engaging discussions with my team, which will increase the chance of a project&rsquo;s success.<\/p>\n<p>William Zinsser argues in his book that an average reader has an attention span of 30 seconds and that their attention is competing with a dozen alternative activities. He argues that the key to good writing is <strong>clear and engaging structure<\/strong>, <strong>simplicity<\/strong>, and, in the context of business writing, <strong>humanity<\/strong>. I will share my takeaways and give you a list that I use at work to check whether I follow Zinsser&rsquo;s advice.<\/p>\n <h2 id=\"structure---clear-thinking-produces-clear-writing\">\n  <a href=\"#structure---clear-thinking-produces-clear-writing\" class=\"header-link\">\n    Structure - Clear thinking produces clear writing\n  <\/a>\n<\/h2><p>When writing, you should think clearly about what you want to say. Illogical and bad writing is a result of being unsure about the purpose and structure of the text.\nWe can all relate. Writing is an iterative process. You often start an article with different ideas and arguments than when you reach the end. During the writing process you come up with better arguments that you want to include or that you can explain better. 
This is natural, and you should enjoy the process of editing and deleting to ensure your first draft gets a coherent structure.<\/p>\n<p>Ask yourself: &ldquo;Did I write what I wanted to say?&rdquo; and &ldquo;Can my reader easily follow my narrative from the first paragraph to the last?&rdquo; The second question relates to the earlier idea of a lazy reader who will stop reading if they are not hooked and need to spend brain power to follow your train of thought. How do you avoid that?<\/p>\n<p><strong>The most important sentence of your writing is the first one.<\/strong> It must induce the reader to read the second sentence. The second sentence must do the same for the third sentence, and so on. <strong>The goal is to hook the reader in the first paragraph<\/strong>. Zinsser calls this &ldquo;the lead&rdquo; and it should achieve two things:<\/p>\n<ol>\n<li>To be an effective hook it should be fresh, novel, paradoxical, humorous, surprising, or unusual, or start with an interesting fact or question. It needs to force the reader to keep reading.<\/li>\n<li>It needs to tell the reader what the article is about and why they should care.<\/li>\n<\/ol>\n<p>Once the reader is hooked, ensure that the last sentence of each paragraph entices them to continue to the next. The end of a paragraph is often a natural stopping point. Paragraphs should be kept short and reflect one idea. Once you have said what you wanted to say, stop. Be comfortable dropping material. Decide which part of the subject you want to cover. Cover it well and then stop.<\/p>\n<p>How do you end your piece the right way? Try to encapsulate the main idea of the text and <strong>end in a fitting, unexpected, or surprising way<\/strong>. 
Like a good dessert, the last paragraph or sentence should be a joy in itself and linger for a moment after the end.<\/p>\n <h2 id=\"simplicity---make-it-easy-for-your-readers\">\n  <a href=\"#simplicity---make-it-easy-for-your-readers\" class=\"header-link\">\n    Simplicity - Make it easy for your readers\n  <\/a>\n<\/h2><p>Like good structure, simplicity helps your reader stay engaged. Unfortunately, while you write you will accumulate clutter. This can be words that do not add value or sentences that are difficult to follow. You can achieve simplicity by ensuring that:<\/p>\n<ol>\n<li>every word in a sentence is doing new work<\/li>\n<li>sentences are short and logically linked together<\/li>\n<li>you have unity in choice of tense, pronouns, and style<\/li>\n<\/ol>\n<p>Zinsser argues that <strong>writing improves in proportion to the number of things you can keep out of it that should not be there<\/strong>. Be critical of the words you choose and avoid:<\/p>\n<ul>\n<li>Words that do not do extra work, e.g. \u201ca personal friend of mine\u201d does not add more than &ldquo;a friend&rdquo;.<\/li>\n<li>Unnecessary adverbs. &ldquo;I wrote up the documentation&rdquo;. &ldquo;up&rdquo; is not required. &ldquo;My algorithm is decidedly better than brute-force&rdquo;. &ldquo;decidedly&rdquo; does not add anything.<\/li>\n<li>Unnecessary adjectives, e.g. do not write &ldquo;diligent code review&rdquo; unless your company does not care about code reviews.<\/li>\n<li>Words that have shorter alternatives, e.g. \u201cassistance\u201d (help), \u201cnumerous\u201d (many), \u201cfacilitate\u201d (ease), \u201csufficient\u201d (enough), \u201cattempt\u201d (try).<\/li>\n<li>Words that inflate importance, e.g. 
\u201cwith the possible exception of\u201d (except), \u201cdue to the fact that\u201d (because), \u201che totally lacked the ability to\u201d (he couldn\u2019t), \u201cfor the purpose of\u201d (for).<\/li>\n<li>Small words that qualify how you feel, how you think, and what you saw: \u201ca bit,\u201d \u201ca little,\u201d \u201csort of,\u201d \u201ckind of,\u201d \u201crather,\u201d \u201cquite,\u201d \u201cvery,\u201d \u201ctoo,\u201d \u201cpretty much,\u201d \u201cin a sense\u201d.<\/li>\n<li>Passive verbs. Prefer active ones, e.g. &ldquo;Joe documented the architecture&rdquo; over &ldquo;The architecture was documented by Joe&rdquo;.<\/li>\n<li>Nouns that express a concept. They are common in bad writing, where verbs that tell what somebody did would be clearer. Instead of &ldquo;The monitoring system is used to detect data drift.&rdquo; use &ldquo;We monitor our data to detect drift.&rdquo;<\/li>\n<li>Imprecise verbs. Write &ldquo;Start a company&rdquo; instead of &ldquo;Set up a company&rdquo;, and &ldquo;The CEO resigned&rdquo; or &ldquo;The CEO was fired&rdquo; instead of &ldquo;The CEO stepped down&rdquo;.<\/li>\n<\/ul>\n<p>After applying these rules, your sentences should be clear and stripped of clutter. Now your job is to ensure that sentence B follows logically from sentence A. Also ensure that sentence F does not repeat the argument made in sentence A. If the connections are not clear, provide the missing link. My favorite quote in the book is on the question of sentence length. Zinsser advises: <strong>&ldquo;If you want to write long sentences, be a genius.&rdquo;<\/strong><\/p>\n<p>Aside from a careful choice of words and logical sentences, you should strive for unity in:<\/p>\n<ul>\n<li>tense. Stick to one tense.<\/li>\n<li>pronouns. Use the same pronoun to address your reader.<\/li>\n<li>tone. Is it casual or formal, involved or detached, ironic or amused?<\/li>\n<li>style. 
Are you writing a Wikipedia entry, a personal travel story, or a tech strategy?<\/li>\n<\/ul>\n<p>Decide on these points at the beginning and do not change them unless necessary.<\/p>\n <h2 id=\"humanity---plain-talk-not-vanity\">\n  <a href=\"#humanity---plain-talk-not-vanity\" class=\"header-link\">\n    Humanity - Plain talk, not vanity\n  <\/a>\n<\/h2><p>After you have internalized the advice from the previous two sections, you can start applying it at work. What could go wrong?<\/p>\n<p>People at work have a tendency to write in a pretentious style, e.g. &ldquo;The company uses evaluative procedures for our objectives based on our KPIs.&rdquo;<\/p>\n<p>Instead of dead concept nouns like &ldquo;evaluative procedures&rdquo; use active verbs and plain talk, e.g. &ldquo;We will evaluate our progress based on our KPIs.&rdquo; You should aim to stay natural and write the way you talk. A good test is to ensure your colleagues can visualize who is doing what when they read your sentence.<\/p>\n<p>To incorporate Zinsser&rsquo;s advice in my own writing at work, I created \n<a href=\"\/posts\/writing-well-checklist\/\" \n>this checklist\n<\/a> that I can refer to after completing a first draft. I am not striving for perfection. I don&rsquo;t want to win the Nobel Prize in Literature or write for The New Yorker. 
Instead, I want to get \n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Pareto_principle\" \n\n    target=\"_blank\"\n    >80%\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n                style=\"height: 0.7em; width: 0.7em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n                class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n                viewBox=\"0 0 640 512\">\n                <path fill=\"currentColor\"\n                    d=\"M640 51.2l-.3 12.2c-28.1 .8-45 15.8-55.8 40.3-25 57.8-103.3 240-155.3 358.6H415l-81.9-193.1c-32.5 63.6-68.3 130-99.2 193.1-.3 .3-15 0-15-.3C172 352.3 122.8 243.4 75.8 133.4 64.4 106.7 26.4 63.4 .2 63.7c0-3.1-.3-10-.3-14.2h161.9v13.9c-19.2 1.1-52.8 13.3-43.3 34.2 21.9 49.7 103.6 240.3 125.6 288.6 15-29.7 57.8-109.2 75.3-142.8-13.9-28.3-58.6-133.9-72.8-160-9.7-17.8-36.1-19.4-55.8-19.7V49.8l142.5 .3v13.1c-19.4 .6-38.1 7.8-29.4 26.1 18.9 40 30.6 68.1 48.1 104.7 5.6-10.8 34.7-69.4 48.1-100.8 8.9-20.6-3.9-28.6-38.6-29.4 .3-3.6 0-10.3 .3-13.6 44.4-.3 111.1-.3 123.1-.6v13.6c-22.5 .8-45.8 12.8-58.1 31.7l-59.2 122.8c6.4 16.1 63.3 142.8 69.2 156.7L559.2 91.8c-8.6-23.1-36.4-28.1-47.2-28.3V49.6l127.8 1.1 .2 .5z\">\n                <\/path>\n            <\/svg>\n        <\/span>\n        \n    \n<\/a> of the possible improvements. 
By sanity-checking my work against the list, I can stand out among my peers and collaborate successfully with my team.<\/p>\n"},{"title":"Book notes: How to Win Friends and Influence People by Dale Carnegie","link":"\/posts\/how-to-win-friends\/","pubDate":"Mon, 29 May 2023 00:00:00 +0000","guid":"\/posts\/how-to-win-friends\/","description":"<p>I remember listening to the audiobook version of Dale Carnegie&rsquo;s &ldquo;How to Win Friends and Influence People&rdquo; in 2015 and finding his advice intuitive and applicable to real-life situations.\nHowever, as time passed I seem to have largely forgotten the content. Coincidentally, in the past months this book has been mentioned to me again by friends and by guests on podcasts that I listen to. So I decided to give it another read, refresh my memory, and summarize the main points.<\/p>\n<p>The book, first published by Carnegie in 1936, is often called <em>the original self-help book<\/em>. It is easy to see why, as it follows the typical structure of books in its category:<\/p>\n<ul>\n<li>the advice in a chapter can be summarized in one sentence<\/li>\n<li>3&ndash;5 stories per chapter illustrate successful application of the advice and describe some of its nuances in more detail<\/li>\n<li>the stories typically involve one of the following:\n<ol>\n<li>a famous, admired historic personality (Benjamin Franklin, John D. Rockefeller, Andrew Carnegie, Abraham Lincoln)<\/li>\n<li>a salesperson \/ business owner (car dealership, lumber mill, retail shop)<\/li>\n<li>husband-wife, father-son, mother-daughter relationships<\/li>\n<\/ol>\n<\/li>\n<li>the stories explain how the protagonist solves their problem using the chapter&rsquo;s principle and in many cases gets an unexpected additional benefit (&ldquo;My landlord reduced my rent and on top of that offered to do renovations on the facade.&rdquo;, &ldquo;We resolved our disputes on the repair bill with five customers. 
All of them bought new cars from us within the next two years.&rdquo;)<\/li>\n<\/ul>\n<p>What I like about this book is that, in contrast to some modern self-help books, this simple structure is not hidden from the reader. After the chapter&rsquo;s first story explaining the principle, you can decide to read the next story or skip to the last paragraph of the chapter.<\/p>\n<p>It is not surprising that this book has sold more than 16 million copies. The advice is simple and timeless, the book has a clear structure, and the stories are intentionally diverse enough to apply to a wide audience.<\/p>\n<p>On the flip side, the book does not contain any references to controlled scientific experiments to verify the impact of the presented principles. I am not sure how much this matters though, because in many cases the principle can be summarized with &ldquo;Don&rsquo;t be an asshole, try to see it from the other person&rsquo;s point of view, and be open to finding a solution together that allows the other person to save face.&rdquo;, which will make you a more pleasant person to be around in any case.<\/p>\n<p>The book is organized into four parts, each with a number of principles to follow.<\/p>\n <h2 id=\"part-1-fundamental-techniques-in-handling-people\">\n  <a href=\"#part-1-fundamental-techniques-in-handling-people\" class=\"header-link\">\n    Part 1: Fundamental Techniques in Handling People\n  <\/a>\n<\/h2> <h3 id=\"principle-1-1-dont-criticise-condemn-or-complain\">\n  <a href=\"#principle-1-1-dont-criticise-condemn-or-complain\" class=\"header-link\">\n    Principle 1-1: Don&rsquo;t criticise, condemn, or complain.\n  <\/a>\n<\/h3><p>People are not creatures of logic. They have prejudices, do not see the full picture, and are motivated by pride and vanity. Trying to change their mind by criticizing them directly, or condemning their behaviour, will only create resentment towards you, in some cases lasting a lifetime. 
Moreover, they will not change their behaviour due to criticism. Instead of condemning people, we should try to understand them and figure out why they do what they do. This creates sympathy, tolerance, and kindness.<\/p>\n<p>This also applies if there is a big power or status gap. Benjamin Franklin: &ldquo;I will speak ill of no man.&rdquo; and Carlyle: &ldquo;A great man shows his greatness by the way he treats the little men.&rdquo;<span class=\"sidenote-number\"><small class=\"sidenote\"> It would be interesting to discuss how this can be aligned with the open, direct, and candid feedback that is currently being popularized in the tech industry.<\/small><\/span><\/p>\n <h3 id=\"principle-1-2-give-honest-and-sincere-appreciation\">\n  <a href=\"#principle-1-2-give-honest-and-sincere-appreciation\" class=\"header-link\">\n    Principle 1-2: Give honest and sincere appreciation.\n  <\/a>\n<\/h3><p>Humans&rsquo; deepest urge is the &ldquo;desire to be important&rdquo; (Dewey). This desire made people like Rockefeller pursue and build fortunes and motivated writers like Dickens to write immortal novels. This desire can be found in everyone. Charles Schwab attributed his success as a business leader to &ldquo;my ability to arouse enthusiasm among my people, and the best way to develop that in a person is by appreciation and encouragement.&rdquo; He adds: &ldquo;There is nothing else that kills the ambitions of a person as criticism from superiors. I never criticise anyone. I believe in giving a person incentive to work.&rdquo;<\/p>\n<p>What is the difference between praise and flattery? One is <em>sincere<\/em> and the other <em>insincere<\/em>. One is <em>unselfish<\/em>, the other <em>selfish<\/em>.<\/p>\n<p>Use sincere appreciation not only at work with your colleagues and direct reports, but also with your friends. 
Moreover, when was the last time you expressed to your waiter your sincere appreciation of the food they served you?<\/p>\n<p>All our associates are human beings and hunger for appreciation. Give it to them, sincerely.<span class=\"sidenote-number\"><small class=\"sidenote\"> I noticed that sincere praise is a key element in creating a motivating atmosphere in my teams. Still, I sometimes struggle to find the right moment to deliver this praise, especially when a project is not fully completed and the outcome is not yet clear or visible.<\/small><\/span><\/p>\n <h3 id=\"principle-1-3-arouse-in-the-other-person-an-eager-want\">\n  <a href=\"#principle-1-3-arouse-in-the-other-person-an-eager-want\" class=\"header-link\">\n    Principle 1-3: Arouse in the other person an eager want.\n  <\/a>\n<\/h3><p>The only way to influence other people is to talk about what <em>they<\/em> want and show them how to get it. Harry A. Overstreet: &ldquo;Action springs out of what we fundamentally desire [&hellip;] and the best piece of advice which can be given to would-be persuaders, is: First, arouse in the other person an eager want. He who can do this has the whole world with him.&rdquo; Henry Ford agrees: &ldquo;If there is any one secret of success, it lies in the ability to get the other person&rsquo;s point of view and see things from that person&rsquo;s angle as well as from your own.&rdquo;<\/p>\n<p>Carnegie applies this principle to distinguish good from bad salespeople. Bad salespeople only think about what they want. Good salespeople are interested in solving their clients&rsquo; problems. To do this, they need to be able to understand their clients&rsquo; point of view. If they achieve this, they don&rsquo;t have to sell to them. The clients will buy.<span class=\"sidenote-number\"><small class=\"sidenote\"> This is important when discussing career plans with your reports. 
The easiest way to motivate someone is to develop a career goal to work towards.<\/small><\/span><\/p>\n <h2 id=\"part-2-six-ways-to-make-people-like-you\">\n  <a href=\"#part-2-six-ways-to-make-people-like-you\" class=\"header-link\">\n    Part 2: Six ways to make people like you\n  <\/a>\n<\/h2><p>This part focuses on conversations, meeting new people, and leaving a positive impression.<\/p>\n <h3 id=\"principle-2-1-become-genuinely-interested-in-other-people\">\n  <a href=\"#principle-2-1-become-genuinely-interested-in-other-people\" class=\"header-link\">\n    Principle 2-1: Become genuinely interested in other people.\n  <\/a>\n<\/h3><p>You can make more friends in two months by becoming genuinely interested in other people than you can in two years by trying to get other people interested in you. People are interested in themselves.<\/p>\n<p>One of the secrets of Theodore Roosevelt&rsquo;s popularity, the reason even his servants loved him, was his genuine interest in them and their concerns. This holds true for presidents as well as for salespeople. Carnegie claims: &ldquo;I have discovered that one can win the attention, time, and cooperation of the most sought-after people by being genuinely interested in them.&rdquo;<\/p>\n<p>It&rsquo;s not only about showing interest. If you want to make friends, do things for them: invest time and energy, be unselfish and thoughtful. Remember your friends&rsquo; and business contacts&rsquo; birthdays and send them a telegram (maybe nowadays a WhatsApp message). Remember what they tell you about their families, and greet people with animation and enthusiasm. Show them you are happy to meet and spend time with them.<\/p>\n <h3 id=\"principle-2-2-smile\">\n  <a href=\"#principle-2-2-smile\" class=\"header-link\">\n    Principle 2-2: Smile.\n  <\/a>\n<\/h3><p>Actions speak louder than words, and a smile says: &ldquo;I like you. You make me happy. 
I am glad to see you.&rdquo;<\/p>\n<p>Carnegie emphasizes this point as important for customer support roles, business people, and everyone in their daily life. Not only because it makes you feel positive, but also because people tend to reciprocate and smile back at you.<\/p>\n<p>Even when faced with criticism from a colleague at work, a customer, or a friend, a smile can be disarming and defuse an argument.<\/p>\n<p>Carnegie also points out that often, counter-intuitively, the smile produces the positive feeling. So smiling can also be used as an active tool to regulate negative feelings.<\/p>\n <h3 id=\"principle-2-3-remember-that-a-persons-name-is-to-that-person-the-sweetest-and-most-important-sound-in-any-language\">\n  <a href=\"#principle-2-3-remember-that-a-persons-name-is-to-that-person-the-sweetest-and-most-important-sound-in-any-language\" class=\"header-link\">\n    Principle 2-3: Remember that a person&rsquo;s name is to that person the sweetest and most important sound in any language.\n  <\/a>\n<\/h3><p>People love hearing their name in conversation. Use it often; it creates intimacy and shows that you care about them.<\/p>\n<p>This is especially important when meeting new people. Make an effort to remember their name during the evening and use it at least once.<span class=\"sidenote-number\"><small class=\"sidenote\"> How to remember names at a dinner party? Two things worked for me: Use an alliteration that describes them: &ldquo;Big Ben&rdquo;, &ldquo;Extroverted Emily&rdquo;. Or think of them performing an activity together with someone else with the same name: &ldquo;Michael shooting hoops with Michael Jordan&rdquo;, &ldquo;Robin taking photos of Aunt Robin&rdquo;.<\/small><\/span><\/p>\n <h3 id=\"principle-2-4-be-a-good-listener-encourage-others-to-talk-about-themselves\">\n  <a href=\"#principle-2-4-be-a-good-listener-encourage-others-to-talk-about-themselves\" class=\"header-link\">\n    Principle 2-4: Be a good listener. 
Encourage others to talk about themselves.\n  <\/a>\n<\/h3><p>This principle is similar to Principle 2-1. &ldquo;Few people are proof against the implied flattery of exclusive attention&rdquo;. In conversation, listen intently, be genuinely interested, and try to learn something from the other person. Sometimes a good conversationalist is merely a good listener. Ask them questions that they will enjoy answering.<\/p>\n<p>In a business context: Encourage them to talk about themselves and their accomplishments. Even when dealing with a dispute, it can help to let the other person talk themselves out and be receptive to their issue before trying to find a solution.<\/p>\n <h3 id=\"principle-2-5-talk-in-terms-of-the-other-persons-interests\">\n  <a href=\"#principle-2-5-talk-in-terms-of-the-other-persons-interests\" class=\"header-link\">\n    Principle 2-5: Talk in terms of the other person&rsquo;s interests.\n  <\/a>\n<\/h3><p>This principle is mostly discussed in a business context. When talking to decision makers, it can be useful to dedicate more of the time to the other person&rsquo;s interests and hobbies than to the actual problem or decision.<\/p>\n <h3 id=\"principle-2-6-make-the-other-person-feel-important---and-do-it-sincerely\">\n  <a href=\"#principle-2-6-make-the-other-person-feel-important---and-do-it-sincerely\" class=\"header-link\">\n    Principle 2-6: Make the other person feel important - and do it sincerely.\n  <\/a>\n<\/h3><p>&ldquo;The deepest principle in human nature is the craving to be appreciated.&rdquo; Therefore, follow the law: &ldquo;<em>Always make the other person feel important.<\/em>&rdquo;\nand the Golden Rule: &ldquo;Give unto others what we would have others give unto us.&rdquo;<\/p>\n<p>Carnegie emphasizes that this is especially important in situations of status gaps, e.g. 
the ambassador talking to the waitress or the CEO discussing an issue with a factory worker.<\/p>\n<p>Remember: Everyone considers themselves important, very important.<\/p>\n <h2 id=\"part-3-how-to-win-people-to-your-way-of-thinking\">\n  <a href=\"#part-3-how-to-win-people-to-your-way-of-thinking\" class=\"header-link\">\n    Part 3: How to win people to your way of thinking\n  <\/a>\n<\/h2><p>This part focuses on convincing other people of your ideas and arguments.<\/p>\n <h3 id=\"principle-3-1-the-only-way-to-get-the-best-of-an-argument-is-to-avoid-it\">\n  <a href=\"#principle-3-1-the-only-way-to-get-the-best-of-an-argument-is-to-avoid-it\" class=\"header-link\">\n    Principle 3-1: The only way to get the best of an argument is to avoid it.\n  <\/a>\n<\/h3><p>You can&rsquo;t win an argument. If you lose it, you lose it. If you win it, you have made your opponent feel inferior and hurt their pride, and they will resent you. Even a small argument can significantly hurt future cooperation.<\/p>\n<p>So winning arguments means short-term triumph at the cost of your opponent&rsquo;s goodwill. This is relevant at work but also in personal life. Nobody likes the know-it-all insisting on their opinion at a dinner party.<\/p>\n<p>How do you keep a disagreement from becoming an argument?<\/p>\n<ol>\n<li>Welcome the disagreement: There is a chance you can avoid a serious mistake.<\/li>\n<li>Distrust your first instinctive impression: Don&rsquo;t become defensive; stay open to it.<\/li>\n<li>Control your temper.<\/li>\n<li>Listen first: Let your opponent finish talking. Do not resist, defend, or debate.<\/li>\n<li>Look for areas of agreement.<\/li>\n<li>Be honest: Look for areas where you can admit errors. 
Apologize.<\/li>\n<li>Promise to think over your opponent&rsquo;s ideas and study them carefully.<\/li>\n<li>Thank your opponent for their interest.<\/li>\n<li>Postpone action to give both sides time to think through the problem.<\/li>\n<\/ol>\n <h3 id=\"principle-3-2-show-respect-for-the-other-persons-opinion-never-say-you-are-wrong\">\n  <a href=\"#principle-3-2-show-respect-for-the-other-persons-opinion-never-say-you-are-wrong\" class=\"header-link\">\n    Principle 3-2: Show respect for the other person&rsquo;s opinion. Never say, &lsquo;You are wrong.&rsquo;\n  <\/a>\n<\/h3><p>When you tell people &ldquo;You are wrong&rdquo;, you won&rsquo;t make them want to agree with you or change their minds. Keep in mind that telling them they are wrong can also be done with a look or an intonation.<\/p>\n<p>Galileo said: &ldquo;You cannot teach a man anything. You can only help him to find it within himself.&rdquo; From the book <em>The Mind in the Making<\/em>: &ldquo;We sometimes find ourselves changing our minds without any resistance or heavy emotion, but if we are told we are wrong, we resent the imputation and harden our hearts. [&hellip;] when someone proposes to rob us of their [our beliefs&rsquo;] companionship. It is obviously not the ideas themselves that are dear to us, but our self-esteem which is threatened.&rdquo;<\/p>\n<p>Telling someone they are wrong only succeeds in stripping them of self-dignity. Instead, show openness to the other person&rsquo;s complaint or opinion. Try to guide them to the solution instead of just telling them. Ask friendly and cooperative questions.<\/p>\n <h3 id=\"principle-3-3-if-you-are-wrong-admit-it-quickly-and-emphatically\">\n  <a href=\"#principle-3-3-if-you-are-wrong-admit-it-quickly-and-emphatically\" class=\"header-link\">\n    Principle 3-3: If you are wrong, admit it quickly and emphatically.\n  <\/a>\n<\/h3><p>When you know you are wrong, don&rsquo;t try to defend yourself. 
Admit it emphatically.\nIn some circumstances it might even be helpful to overdo the apology to give your opponent the chance to show generosity.<\/p>\n <h3 id=\"principle-3-4-begin-in-a-friendly-way\">\n  <a href=\"#principle-3-4-begin-in-a-friendly-way\" class=\"header-link\">\n    Principle 3-4: Begin in a friendly way.\n  <\/a>\n<\/h3><p>Start a debate with a &ldquo;drop of honey&rdquo;. Assure the other party that you are their sincere friend. It will make it more difficult for them to behave in an unfair way.<\/p>\n <h3 id=\"principle-3-5-get-the-other-person-saying-yes-yes-immediately\">\n  <a href=\"#principle-3-5-get-the-other-person-saying-yes-yes-immediately\" class=\"header-link\">\n    Principle 3-5: Get the other person saying &lsquo;yes, yes&rsquo; immediately.\n  <\/a>\n<\/h3><p>Don&rsquo;t begin a discussion on the topics where you differ; start with the things you agree on. At the beginning of the debate a skillful speaker should try to get a number of &ldquo;Yes&rdquo; responses.<\/p>\n<p>This aims to avoid the psychological pattern where your colleague, girlfriend, or customer says &ldquo;No&rdquo;. The &ldquo;No&rdquo; creates a position which they will struggle to give up without seeming inconsistent.<\/p>\n <h3 id=\"principle-3-6-let-the-other-person-do-a-great-deal-of-the-talking\">\n  <a href=\"#principle-3-6-let-the-other-person-do-a-great-deal-of-the-talking\" class=\"header-link\">\n    Principle 3-6: Let the other person do a great deal of the talking.\n  <\/a>\n<\/h3><p>When debating, let the other party talk themselves out. Don&rsquo;t interrupt them; let them lay out their ideas. 
Have a sincere and open mind.<\/p>\n<p>When discussing achievements, let your friends and colleagues talk about theirs; don&rsquo;t try to outdo them.<\/p>\n <h3 id=\"principle-3-7-let-the-other-person-feel-that-the-idea-is-his-or-hers\">\n  <a href=\"#principle-3-7-let-the-other-person-feel-that-the-idea-is-his-or-hers\" class=\"header-link\">\n    Principle 3-7: Let the other person feel that the idea is his or hers.\n  <\/a>\n<\/h3><p>Instead of trying to force an idea onto someone else, it is wiser to make suggestions and let the other person think out the conclusion. Don&rsquo;t sell something to other people; show them how it can solve their problems. Let them sell themselves.<\/p>\n <h3 id=\"principle-3-8-try-honestly-to-see-things-from-the-other-persons-point-of-view\">\n  <a href=\"#principle-3-8-try-honestly-to-see-things-from-the-other-persons-point-of-view\" class=\"header-link\">\n    Principle 3-8: Try honestly to see things from the other person&rsquo;s point of view.\n  <\/a>\n<\/h3><p>Success in dealing with people depends on your ability to sympathetically assume the other person&rsquo;s viewpoint. You can create cooperativeness in conversation when you show that you consider the other person&rsquo;s ideas and feelings as important as your own.<\/p>\n <h3 id=\"principle-3-9-be-sympathetic-with-the-other-persons-ideas-and-desires\">\n  <a href=\"#principle-3-9-be-sympathetic-with-the-other-persons-ideas-and-desires\" class=\"header-link\">\n    Principle 3-9: Be sympathetic with the other person&rsquo;s ideas and desires.\n  <\/a>\n<\/h3><p>People are hungering for sympathy; give it to them.<\/p>\n <h3 id=\"principle-3-10-appeal-to-the-nobler-motives\">\n  <a href=\"#principle-3-10-appeal-to-the-nobler-motives\" class=\"header-link\">\n    Principle 3-10: Appeal to the nobler motives.\n  <\/a>\n<\/h3><p>A person usually has two reasons for doing things: one that sounds good and a real one.\nThey will think of the real one themselves. 
So to change people, appeal to their nobler motives.<\/p>\n<p>Most people will react favourably if you make them feel that you consider them honest, upright, and fair.<\/p>\n <h3 id=\"principle-3-11-dramatise-your-ideas\">\n  <a href=\"#principle-3-11-dramatise-your-ideas\" class=\"header-link\">\n    Principle 3-11: Dramatise your ideas.\n  <\/a>\n<\/h3><p>Use visuals, sound, and examples to dramatise your ideas. Carnegie gives the example of a salesperson for checkout systems dramatically throwing pennies on the ground while explaining to a prospective customer that with their old checkout systems &ldquo;they are literally throwing pennies away&rdquo;.<\/p>\n <h3 id=\"principle-3-12-throw-down-a-challenge\">\n  <a href=\"#principle-3-12-throw-down-a-challenge\" class=\"header-link\">\n    Principle 3-12: Throw down a challenge.\n  <\/a>\n<\/h3><p>Charles Schwab: &ldquo;The way to get things done is to stimulate competition. I do not mean in a money-getting way, but in the desire to excel.&rdquo; People tend to be interested in a healthy amount of competition. It&rsquo;s a chance for self-expression and for creating a feeling of importance. 
It&rsquo;s important that this is perceived by your team as playful and without sinister motives.<\/p>\n <h2 id=\"part-4-be-a-leader-how-to-change-people-without-giving-offence-or-arousing-resentment\">\n  <a href=\"#part-4-be-a-leader-how-to-change-people-without-giving-offence-or-arousing-resentment\" class=\"header-link\">\n    Part 4: Be a leader: How to change people without giving offence or arousing resentment\n  <\/a>\n<\/h2><p>This part of the book is directed towards leading and helping other people in a business context.<\/p>\n <h3 id=\"principle-4-1-begin-with-praise-and-honest-appreciation\">\n  <a href=\"#principle-4-1-begin-with-praise-and-honest-appreciation\" class=\"header-link\">\n    Principle 4-1: Begin with praise and honest appreciation.\n  <\/a>\n<\/h3><p>When giving feedback, start with something positive.<\/p>\n <h3 id=\"principle-4-2-call-attention-to-peoples-mistakes-indirectly\">\n  <a href=\"#principle-4-2-call-attention-to-peoples-mistakes-indirectly\" class=\"header-link\">\n    Principle 4-2: Call attention to people&rsquo;s mistakes indirectly.\n  <\/a>\n<\/h3><p>When criticizing, you want to get your point across without creating bitter resentment. This works better when you point out mistakes indirectly. Use &lsquo;AND&rsquo; instead of &lsquo;BUT&rsquo;. 
Instead of &ldquo;We are really proud of you, John, for raising your grades, but if you had worked harder on algebra your results would have been better&rdquo;, you should say: &ldquo;We are really proud of you, John, for raising your grades, and by continuing your efforts next term, your algebra grade can be up with all the others.&rdquo;<\/p>\n <h3 id=\"principle-4-3-talk-about-your-own-mistakes-before-criticising-the-other-person\">\n  <a href=\"#principle-4-3-talk-about-your-own-mistakes-before-criticising-the-other-person\" class=\"header-link\">\n    Principle 4-3: Talk about your own mistakes before criticising the other person.\n  <\/a>\n<\/h3><p>Admitting one&rsquo;s own mistakes first can help others change their behaviour more easily.<\/p>\n <h3 id=\"principle-4-4-ask-questions-instead-of-giving-direct-orders\">\n  <a href=\"#principle-4-4-ask-questions-instead-of-giving-direct-orders\" class=\"header-link\">\n    Principle 4-4: Ask questions instead of giving direct orders.\n  <\/a>\n<\/h3><p>Don&rsquo;t give direct orders. Instead, phrase your request as a suggestion: &ldquo;You might consider this&rdquo; or &ldquo;What do you think of this?&rdquo; This saves a person&rsquo;s pride and gives them a feeling of importance. People are more likely to accept an order if they played a part in the decision.<\/p>\n <h3 id=\"principle-4-5-let-the-other-person-save-face\">\n  <a href=\"#principle-4-5-let-the-other-person-save-face\" class=\"header-link\">\n    Principle 4-5: Let the other person save face.\n  <\/a>\n<\/h3><p>Even if we think another person is wrong, we only destroy their ego by causing them to lose face. 
What matters is not what you think of them, it&rsquo;s what they think about themselves.<\/p>\n <h3 id=\"principle-4-6-praise-the-slightest-improvement-and-praise-every-improvement-be-hearty-in-your-approbation-and-lavish-in-your-praise\">\n  <a href=\"#principle-4-6-praise-the-slightest-improvement-and-praise-every-improvement-be-hearty-in-your-approbation-and-lavish-in-your-praise\" class=\"header-link\">\n    Principle 4-6: Praise the slightest improvement and praise every improvement. Be &lsquo;hearty in your approbation and lavish in your praise&rsquo;.\n  <\/a>\n<\/h3><p>Praising and recognizing good work will reinforce that behaviour. We all crave attention and recognition (see principle 2-4), and will do almost anything to get it. Carnegie makes it very clear that this only works with sincere and specific praise. So it needs to be very clear what exactly about the project or the behaviour was worth the praise.<\/p>\n <h3 id=\"principle-4-7-give-the-other-person-a-fine-reputation-to-live-up-to\">\n  <a href=\"#principle-4-7-give-the-other-person-a-fine-reputation-to-live-up-to\" class=\"header-link\">\n    Principle 4-7: Give the other person a fine reputation to live up to.\n  <\/a>\n<\/h3><p>If you want a person to improve in a certain aspect, pretend that they already have this trait. They will likely try hard to live up to it.<\/p>\n<p>This is especially powerful for combating a dip in the work quality of a good colleague. Reminding them of their past quality encourages them to get back to that level.<\/p>\n <h3 id=\"principle-4-8-use-encouragement-make-the-fault-seem-easy-to-correct\">\n  <a href=\"#principle-4-8-use-encouragement-make-the-fault-seem-easy-to-correct\" class=\"header-link\">\n    Principle 4-8: Use encouragement. Make the fault seem easy to correct.\n  <\/a>\n<\/h3><p>When you tell your husband, employee or friend that they are doing something stupid, you destroy every incentive to try and improve. 
Instead, be liberal with encouragement, make it seem easy, show them that you think they can do it, and they will work to excel at the task.<\/p>\n <h3 id=\"principle-4-9-make-the-other-person-happy-about-doing-the-thing-you-suggest\">\n  <a href=\"#principle-4-9-make-the-other-person-happy-about-doing-the-thing-you-suggest\" class=\"header-link\">\n    Principle 4-9: Make the other person happy about doing the thing you suggest.\n  <\/a>\n<\/h3><p>This could, for example, be a title that indicates the importance of the job. (RS: I am not sure how applicable this is in today&rsquo;s business world). When trying to change the attitude or behaviour of a report, keep the following in mind:<\/p>\n<ol>\n<li>Be sincere: It should be about the benefits for the person.<\/li>\n<li>Be empathetic: Make sure you know the goals of the other person.<\/li>\n<li>Consider the benefits that the other person will receive through the attitude change.<\/li>\n<li>Match those benefits to the person&rsquo;s goals.<\/li>\n<li>When you make your requests, make sure to express that the other person will personally benefit by doing it.<\/li>\n<\/ol>\n"},{"title":"Book notes: Four Thousand Weeks by Oliver Burkeman","link":"\/posts\/4000-weeks\/","pubDate":"Tue, 10 Jan 2023 00:00:00 +0000","guid":"\/posts\/4000-weeks\/","description":"<p>I read this book in December 2022, at which point I had lived ~1700\/4000 weeks. Here are my summary notes for every chapter as well as some overall takeaways and questions for the author.<\/p>\n <h2 id=\"introduction-in-the-long-run-were-all-dead\">\n  <a href=\"#introduction-in-the-long-run-were-all-dead\" class=\"header-link\">\n    Introduction: In the long run, we&rsquo;re all dead\n  <\/a>\n<\/h2><ul>\n<li>Most people live only 4000 - 4700 weeks. This feels short or limited from within one&rsquo;s own lifetime. 
It is almost negligible on the cosmic timescale.<\/li>\n<li>This means that proper time management should be on our minds and one of our major concerns.<\/li>\n<li>Modern time management advice focuses on how to get more done in the same time. The author thinks this is a flawed approach, as he describes time as an <strong>unstoppable conveyor belt<\/strong> (Edward T. Hall).\n<ul>\n<li>It brings new tasks as quickly as we can dispatch old ones<\/li>\n<li>Becoming more productive just causes the belt to speed up<\/li>\n<li>New work appears magically as we tick things off the to-do list. Someone who answers email requests quickly at work will get a reputation for it and receive even more requests.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n <h2 id=\"part-1-choosing-to-choose\">\n  <a href=\"#part-1-choosing-to-choose\" class=\"header-link\">\n    Part 1: Choosing to Choose\n  <\/a>\n<\/h2> <h3 id=\"1-the-limit-embracing-life\">\n  <a href=\"#1-the-limit-embracing-life\" class=\"header-link\">\n    1. The limit-embracing life\n  <\/a>\n<\/h3><ul>\n<li>\n<p>This chapter shows us that our time and what we can do with it is very limited. Instead of pretending that we can have everything we want (in the future) through ever more work, we should embrace the limitedness and dedicate more time to thinking about what matters to us and how to spend our time wisely and <strong>in the moment<\/strong>. We should not live for a future version of our life that might not materialize, because we will always shift the goalposts further.<\/p>\n<\/li>\n<li>\n<p>Instead of just living our lives as they unfold, it is difficult not to value each moment according to how useful it will be in the future. We prioritise future benefits over current enjoyment.<\/p>\n<ul>\n<li>e.g. using spare time to study a topic that might help at work<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Our relationship with time is the result of our avoiding the painful constraints of reality. 
We need to confront ourselves with the limitedness of time and accept that at every point we need to make tough choices that limit the set of things we can do or experience.<\/p>\n<ul>\n<li>e.g. visiting &ldquo;all&rdquo; of India, SE Asia, sailing everywhere in the Mediterranean, hiking in the Alps\/Pacific Crest Trail\/Dolomites, seeing the French, Italian, Spanish countryside<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Procrastination and working hard can both be seen as avoidance strategies to keep feeling limitless.<\/p>\n<\/li>\n<\/ul>\n <h3 id=\"2-the-efficiency-trap\">\n  <a href=\"#2-the-efficiency-trap\" class=\"header-link\">\n    2. The efficiency trap\n  <\/a>\n<\/h3><ul>\n<li>Accept that you can only focus on a small slice of the experiences the world has to offer. Accept that there will be more work tasks on your desk than you could ever complete. Instead, focus both at work and in your spare time on the few meaningful activities that matter most to you and that have the greatest consequences.<\/li>\n<li>Trying to complete your to-do list fully probably means that you are working on low-effort, low-value tasks a lot and pushing the important, hard tasks to the back. 
Take the time to evaluate the order of tasks on your list.<\/li>\n<li>Daniel Markovits (Yale University): Even the winners who make it to elite universities and then reap the highest salaries find that their reward is the unending pressure to work with high intensity in order to maintain the income and status that have come to seem like prerequisites for the lives they want to lead (<em>\n<a href=\"https:\/\/www.theatlantic.com\/magazine\/archive\/2019\/09\/meritocracys-miserable-winners\/594760\/\" \n\n    target=\"_blank\"\n    >Markovits: How Life Became an Endless, Terrible Competition - The Atlantic\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a><\/em>)<\/li>\n<li>If you succeed in fitting more work into your time, the goalposts start to shift. New goods, desires, and tasks start to become important or obligatory. If your boss notices that you do great work quickly, they will give you more tasks and responsibility. (Note: Getting more responsibility might be desired at the beginning of a career.)<\/li>\n<\/ul>\n <h3 id=\"3-facing-finitude\">\n  <a href=\"#3-facing-finitude\" class=\"header-link\">\n    3. 
Facing Finitude\n  <\/a>\n<\/h3><ul>\n<li>We should marvel at the fact that we exist and that no next day is guaranteed. Moreover, the very fact that our experiences are transient and not containable or extendable (like the exact feeling of watching a sunset during a holiday) gives them value. If we were able to replay the experiences infinitely, they would lose their preciousness. The amazing fact that we are alive now puts small day-to-day annoyances into perspective.<\/li>\n<\/ul>\n <h3 id=\"4-becoming-a-better-procrastinator\">\n  <a href=\"#4-becoming-a-better-procrastinator\" class=\"header-link\">\n    4. Becoming a better procrastinator\n  <\/a>\n<\/h3><ul>\n<li>A good procrastinator focuses on the important items and decides what tasks to neglect. They also focus only on up to three important (high-, not medium-value) tasks at a time. They also accept that whatever task they attempt will fall short of their perfect standards. Therefore, there is no reason not to start it.<\/li>\n<li>The author also advocates that one should settle in one&rsquo;s romantic life. Mostly because one eventually <em>has<\/em> to settle (the fantasy of a limitless perfect life will not come), but also because people who do settle are generally happier once they have done it. You should also make it difficult for yourself to back out of the relationship by committing fully: house, family, marriage.<\/li>\n<li>Three principles:\n<ol>\n<li><strong>Pay yourself first when it comes to time<\/strong>. Jessica Abel: &ldquo;If you try to find time for your most valued activities by first dealing with everything else, in the hope that some time will be left at the end of the day, you&rsquo;ll be disappointed.&rdquo; There is no moment in the future when you&rsquo;ll magically be done with everything and have loads of free time.<\/li>\n<li><strong>Limit your work in progress.<\/strong> Focus on no more than 3 items on your list at the same time. 
Otherwise you&rsquo;ll switch to easier tasks when the meaningful ones get difficult.<\/li>\n<li><strong>Resist the allure of middling priorities:<\/strong> Don&rsquo;t do the kind-of-important tasks, as they will distract you from the truly meaningful ones.<\/li>\n<\/ol>\n<\/li>\n<\/ul>\n <h3 id=\"5-the-watermelon-problem\">\n  <a href=\"#5-the-watermelon-problem\" class=\"header-link\">\n    5. The Watermelon Problem\n  <\/a>\n<\/h3><ul>\n<li>You won&rsquo;t be able to have full control over your attention, but you should be able to retain some control over it in order to have meaningful experiences. This means limiting social media usage and only using it deliberately, as you won&rsquo;t be able to win against an army of paid experts trying to steal your attention (it&rsquo;s just too good).<\/li>\n<\/ul>\n <h3 id=\"6-the-intimate-interrupter\">\n  <a href=\"#6-the-intimate-interrupter\" class=\"header-link\">\n    6. The intimate interrupter\n  <\/a>\n<\/h3><ul>\n<li>When we stop work to look at social media, it is true that social media apps are exploiting us. It is also true that we have an intimate interrupter in us that promises us an easier life if only we avert our attention from the current challenging task to something mindless.<\/li>\n<li>When you try to focus on something important, you are forced to face your limits, an experience that is uncomfortable precisely because the task at hand is so important.<\/li>\n<li>Zen Buddhists say that &ldquo;the entirety of human suffering can be explained by the effort to resist paying full attention to the way things are going, because we wish they were going differently.&rdquo;<\/li>\n<\/ul>\n <h2 id=\"part-2-beyond-control\">\n  <a href=\"#part-2-beyond-control\" class=\"header-link\">\n    Part 2: Beyond control\n  <\/a>\n<\/h2> <h3 id=\"7-we-never-really-have-time\">\n  <a href=\"#7-we-never-really-have-time\" class=\"header-link\">\n    7. 
We never really have time\n  <\/a>\n<\/h3><ul>\n<li>The future is not guaranteed, so we should stop spending our time worrying about whether it will conform to our desires and instead focus on the only bit we can control, which is the present moment.<\/li>\n<li>Plans should be viewed as a statement of intent instead of a contract with the future. The future does not have to comply, both in personal life and at work.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Hofstadter%27s_law\" \n\n    target=\"_blank\"\n    >Hofstadter&rsquo;s Law\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n                style=\"height: 0.7em; width: 0.7em; margin-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n                class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n                viewBox=\"0 0 640 512\">\n                <path fill=\"currentColor\"\n                    d=\"M640 51.2l-.3 12.2c-28.1 .8-45 15.8-55.8 40.3-25 57.8-103.3 240-155.3 358.6H415l-81.9-193.1c-32.5 63.6-68.3 130-99.2 193.1-.3 .3-15 0-15-.3C172 352.3 122.8 243.4 75.8 133.4 64.4 106.7 26.4 63.4 .2 63.7c0-3.1-.3-10-.3-14.2h161.9v13.9c-19.2 1.1-52.8 13.3-43.3 34.2 21.9 49.7 103.6 240.3 125.6 288.6 15-29.7 57.8-109.2 75.3-142.8-13.9-28.3-58.6-133.9-72.8-160-9.7-17.8-36.1-19.4-55.8-19.7V49.8l142.5 .3v13.1c-19.4 .6-38.1 7.8-29.4 26.1 18.9 40 30.6 68.1 48.1 104.7 5.6-10.8 34.7-69.4 48.1-100.8 8.9-20.6-3.9-28.6-38.6-29.4 .3-3.6 0-10.3 .3-13.6 44.4-.3 111.1-.3 123.1-.6v13.6c-22.5 .8-45.8 12.8-58.1 31.7l-59.2 122.8c6.4 16.1 63.3 142.8 69.2 156.7L559.2 91.8c-8.6-23.1-36.4-28.1-47.2-28.3V49.6l127.8 1.1 .2 .5z\">\n                <\/path>\n            <\/svg>\n        <\/span>\n        \n    \n<\/a>, i.e. 
&ldquo;Any task you are planning to tackle will always take longer than you expect, even when you take into account Hofstadter&rsquo;s Law&rdquo;, should be taken into account when planning a task with a deadline.<\/li>\n<\/ul>\n <h3 id=\"8-you-are-here\">\n  <a href=\"#8-you-are-here\" class=\"header-link\">\n    8. You are here\n  <\/a>\n<\/h3><ul>\n<li>We are focusing too much on our future and view the present only as a means to reach that better, less stressful, fantasy future that never arrives, thereby missing life. (When-I-finally mindset)<\/li>\n<li>We should treat any meaningful activity as if it were the last instance of it, because inevitably that will be true at some point (last time talking to someone, last time seeing the sea, last time visiting a place).<\/li>\n<li>Contrary to the above: Don&rsquo;t try too hard to make an experience meaningful; you are setting yourself up for failure. Realise that you are living in the moment anyway, whether you want to or not. There is no reason to force it.<\/li>\n<\/ul>\n <h3 id=\"9-rediscovering-rest\">\n  <a href=\"#9-rediscovering-rest\" class=\"header-link\">\n    9. Rediscovering Rest\n  <\/a>\n<\/h3><ul>\n<li>Use at least some sizeable amount of your leisure time on <strong>atelic activities<\/strong>, i.e. things we enjoy for their own sake, e.g. hiking, meeting friends or pursuing a random hobby. They should not have any goals attached, like profiting from them in the future (networking) or personal growth (studying Bayesian statistics).<\/li>\n<li>This goes against the modern trend of investing leisure time for future payoff, which in turn goes against ancient (Aristotle&rsquo;s) wisdom.<\/li>\n<li>Rest for the sake of rest and enjoy lazy hours for their own sake.\nA good hobby should probably feel a bit embarrassing; that&rsquo;s a good sign you are doing it for its own sake.<\/li>\n<\/ul>\n <h3 id=\"10-the-impatience-spiral\">\n  <a href=\"#10-the-impatience-spiral\" class=\"header-link\">\n    10. 
The Impatience Spiral\n  <\/a>\n<\/h3><ul>\n<li>We are becoming more impatient and have shorter attention spans.<\/li>\n<li>A good example is reading: people report feeling overwhelmed when trying to pick up a book. One reason seems to be that the act of reading takes longer than they would like (in fact, the reading time is not in the reader&rsquo;s control). We are too impatient to let loose and give ourselves to the task.<\/li>\n<li>When you finally face the truth that you can&rsquo;t dictate how fast things go, you stop trying to outrun your anxiety, and your anxiety is transformed.<\/li>\n<\/ul>\n <h3 id=\"11-staying-on-the-bus\">\n  <a href=\"#11-staying-on-the-bus\" class=\"header-link\">\n    11. Staying on the Bus\n  <\/a>\n<\/h3><ul>\n<li>We are made so uneasy by the experience of allowing reality to unfold at its own speed that when we are faced with a problem we rush towards a resolution - any resolution, as long as we can tell ourselves we are dealing with the situation, thereby maintaining our feeling of staying in control.<\/li>\n<li>Three principles of patience:<\/li>\n<\/ul>\n<ol>\n<li>Accept that there is no state of &lsquo;having no problems&rsquo;. Therefore, stop trying to reach that state by working too much. What is a problem, really? It&rsquo;s simply something you address yourself to.<\/li>\n<li>Embrace radical incrementalism. Make the task a small, regular part of your daily routine.<\/li>\n<li>Stay on the bus. Everyone starts out badly and has to grow by failing, copying others and accumulating experience. Only with a lot of patience will you reach the state where <em>distinctive<\/em> work begins.<\/li>\n<\/ol>\n <h3 id=\"12-the-loneliness-of-the-digital-nomad\">\n  <a href=\"#12-the-loneliness-of-the-digital-nomad\" class=\"header-link\">\n    12. The Loneliness of the Digital Nomad\n  <\/a>\n<\/h3><ul>\n<li>The value of time also depends on its network value. How close are your friends and family? 
Do you have time off when they have time off?<\/li>\n<li>Study: People in long-term unemployment get a happiness boost when the weekend arrives, i.e. when their friends become available.<\/li>\n<li>Your time should be in sync with the time of the people you care about.<\/li>\n<li>You can push your life further in the direction of a communal sort of freedom (rather than individual freedom):\n<ul>\n<li>remove flexibility from your schedule by joining local sports teams or campaign groups<\/li>\n<li>prioritise activities in the physical instead of the digital world<\/li>\n<li>every once in a while, let your friends and family dictate your schedule<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n <h3 id=\"13-cosmic-insignificance-therapy\">\n  <a href=\"#13-cosmic-insignificance-therapy\" class=\"header-link\">\n    13. Cosmic Insignificance Therapy\n  <\/a>\n<\/h3><ul>\n<li>Human history has the time span of the blink of an eye in cosmic terms. If you are stressed by work, relationships, or daily encounters, you can zoom out to realize that all these problems shrink to irrelevance.<\/li>\n<li>Humans have an &lsquo;egocentricity bias&rsquo;, i.e. they overestimate their relevance in the world. This unfortunately leads to an unrealistic definition of what it means to use your finite time well.<\/li>\n<li>Once you realise your own insignificance, you realise that you have been holding yourself to too high a standard. This is liberating because many ordinary things you are already doing can be meaningful, you have just devalued them so far: preparing meals for your children, or pursuing any career that makes the world a tiny bit better, even without grand career aspirations.<\/li>\n<\/ul>\n <h3 id=\"14-the-human-disease\">\n  <a href=\"#14-the-human-disease\" class=\"header-link\">\n    14. The Human Disease\n  <\/a>\n<\/h3><ul>\n<li>\n<p>Accept that you can and will fail in relationships, work, etc. 
because it is often out of your control.<\/p>\n<\/li>\n<li>\n<p>A life spent chasing security with respect to time, when such security is unattainable, can only ever feel provisional.<\/p>\n<\/li>\n<li>\n<p>Fitting quote from Marie-Louise von Franz (psychologist):<\/p>\n<blockquote>\n<p>&ldquo;There is a strange attitude and feeling that one is <strong>not yet<\/strong> in real life. For the time being one is doing this or that, but whether it is a woman or a job, it is <strong>not yet<\/strong> what is really wanted, and there is always the fantasy that sometime in the future the real thing will come about&hellip;The one thing dreaded throughout by such a type of man is to be bound to anything whatever. There is a terrific fear of being pinned down, of entering space and time completely, and of being the unique man that one is.&rdquo;<\/p>\n<\/blockquote>\n<\/li>\n<li>\n<p>You have to admit defeat. In exchange for acceptance you get to actually <em>be<\/em> there.<\/p>\n<\/li>\n<\/ul>\n <h2 id=\"afterword\">\n  <a href=\"#afterword\" class=\"header-link\">\n    Afterword\n  <\/a>\n<\/h2><ul>\n<li>We are giving up the master hope: that somehow this <em>isn&rsquo;t<\/em> it, that this is just a dress rehearsal, and that one day you&rsquo;ll feel truly confident that you have what it takes.<\/li>\n<li>The average human lifespan is absurdly, terrifyingly, insultingly short. But that isn&rsquo;t a reason for living in anxiety about how to spend your limited time. It&rsquo;s a cause for relief: you can give up the quest to become the optimised, infinitely capable person you are supposed to be. 
Instead you can roll up your sleeves and work on what is gloriously possible.<\/li>\n<\/ul>\n <h2 id=\"10-tools-for-embracing-your-finitude\">\n  <a href=\"#10-tools-for-embracing-your-finitude\" class=\"header-link\">\n    10 Tools for Embracing Your Finitude\n  <\/a>\n<\/h2><ol>\n<li>Adopt a fixed-volume approach to productivity\n<ul>\n<li>Keep 2 to-do lists: one open (everything) and one closed (only 10 tasks at most)<\/li>\n<li>Establish pre-determined time boundaries for work<\/li>\n<\/ul>\n<\/li>\n<li>Focus on one project at a time<\/li>\n<li>Decide in advance what to fail at<\/li>\n<li>Keep a done list to focus on what you have already completed<\/li>\n<li>Consciously pick your battles in activism and charity<\/li>\n<li>Use boring, single-purpose technology<\/li>\n<li>Pay more attention to every moment<\/li>\n<li>Be a <em>researcher<\/em> in relationships. Try to determine &ldquo;Who is this person that sits next to me?&rdquo;<\/li>\n<li>If a generous impulse arises in your mind (check in on a friend, give money), act on it right away instead of putting it off for later<\/li>\n<li>Practice doing nothing through meditation<\/li>\n<\/ol>\n <h2 id=\"readers-notes\">\n  <a href=\"#readers-notes\" class=\"header-link\">\n    Reader&rsquo;s notes\n  <\/a>\n<\/h2><ul>\n<li>\n<p>I sometimes suffer from analysis paralysis. For example, it&rsquo;s difficult for me to make vacation decisions because I always feel that I need to use my free time optimally to get the best experience, e.g. trying to find the best, most authentic country, hotel, activity, restaurant.<\/p>\n<\/li>\n<li>\n<p>I suffer from the behaviour discussed in Chapter 8. I focus a lot on the road to a fantasy future instead of properly valuing the present. In a healthy dose, I believe this is a strength of mine and has led me to some personal and academic success. 
However, I see that this can quickly become harmful or regrettable.<\/p>\n<\/li>\n<li>\n<p>I really like the concept of atelic activities, and that a good way of identifying them is that they are probably the slightly embarrassing hobbies.<\/p>\n<\/li>\n<li>\n<p>Fitting quote from a \n<a href=\"https:\/\/news.ycombinator.com\/item?id=34385766\" \n\n    target=\"_blank\"\n    >hacker news thread\n    \n        <span style=\"white-space: nowrap\">&thinsp;<svg\n            style=\"height: 0.7em; width: 0.7em; padding-left: -0.2em;\" focusable=\"false\" data-prefix=\"fas\" data-icon=\"external-link-alt\"\n            class=\"svg-inline--fa fa-external-link-alt fa-w-16\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"\n            viewBox=\"0 0 512 512\">\n            <path fill=\"currentColor\"\n                d=\"M432,320H400a16,16,0,0,0-16,16V448H64V128H208a16,16,0,0,0,16-16V80a16,16,0,0,0-16-16H48A48,48,0,0,0,0,112V464a48,48,0,0,0,48,48H400a48,48,0,0,0,48-48V336A16,16,0,0,0,432,320ZM488,0h-128c-21.37,0-32.05,25.91-17,41l35.73,35.73L135,320.37a24,24,0,0,0,0,34L157.67,377a24,24,0,0,0,34,0L435.28,133.32,471,169c15,15,41,4.5,41-17V24A24,24,0,0,0,488,0Z\">\n            <\/path>\n        <\/svg>\n    <\/span>\n        \n    \n<\/a> on the book:<\/p>\n<blockquote>\n<p>&ldquo;If you are depressed you are living in the past. If you are anxious you are living in the future. If you are at peace you are living in the present.&rdquo;<\/p>\n<p><em>- Lao Tzu<\/em><\/p>\n<\/blockquote>\n<\/li>\n<\/ul>\n <h3 id=\"questions-for-the-author\">\n  <a href=\"#questions-for-the-author\" class=\"header-link\">\n    Questions for the author\n  <\/a>\n<\/h3><ul>\n<li>How do you balance time invested in the <em>now<\/em> vs. time invested in the future? How do you know what the right level is? There must be an optimal path, e.g. if you have to work hard for one year to make enough money to retire afterwards, shouldn&rsquo;t that be considered worthwhile? 
The advice of the book seems aimed at people with comfortable, settled careers, or at self-employed people like the author.<\/li>\n<\/ul>\n"}]}}