{"title":"Matt's Dev Blog - Programming","link":[{"@attributes":{"href":"https:\/\/mattsegal.dev\/","rel":"alternate"}},{"@attributes":{"href":"https:\/\/mattsegal.dev\/feeds\/programming.atom.xml","rel":"self"}}],"id":"https:\/\/mattsegal.dev\/","updated":"2022-05-03T12:00:00+10:00","entry":[{"title":"How I hunt down (and fix) errors in production","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/prod-bug-hunt.html","rel":"alternate"}},"published":"2022-05-03T12:00:00+10:00","updated":"2022-05-03T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2022-05-03:\/prod-bug-hunt.html","summary":"<p>Once you\u2019ve deployed your web app to prod there is a moment of satisfaction: a brief respite where you can reflect on your hard work. You sit, adoringly refreshing the homepage of www.mysite.com to watch it load over and over. It\u2019s beautiful, perfect, timeless. A glittering \u2026<\/p>","content":"<p>Once you\u2019ve deployed your web app to prod there is a moment of satisfaction: a brief respite where you can reflect on your hard work. You sit, adoringly refreshing the homepage of www.mysite.com to watch it load over and over. It\u2019s beautiful, perfect, timeless. A glittering crystal palace of logic and reason. Then people start to actually use it in earnest and you begin to receive messages like this in Slack:<\/p>\n<blockquote>\n<p>Hey Matt. I am not getting reply emails for case ABC123 Jane Doe<\/p>\n<\/blockquote>\n<p>Ideally, with a <a href=\"https:\/\/mattsegal.dev\/django-monitoring-stack.html\">solid monitoring stack<\/a>, you will be alerted of bugs and crashes as they happen, but some may still slip through the cracks. In any case, you\u2019ve got to find and fix these issues promptly or your users will learn to distrust you and your software, kicking off a feedback loop of negative perception. 
Best to nip this in the bud.<\/p>\n<p>So a user has told you about a bug in production, and you\u2019ve gotta fix it - how do you figure out what went wrong? Where do you start? In this post I\u2019ll walk you through an illustrative example of hunting down a bug in our email system.<\/p>\n<h2>The problem<\/h2>\n<p>So this was the message I got over Slack from a user of my website:<\/p>\n<blockquote>\n<p>Hey Matt. I am not getting reply emails for case ABC123 Jane Doe<\/p>\n<\/blockquote>\n<p>A user was not receiving an email, despite their client insisting that they had sent the email. That\u2019s all I know so far...<\/p>\n<h2>More detail<\/h2>\n<p>... and it\u2019s not quite enough. I know the case number but that\u2019s not enough to track any error messages efficiently. I followed up with my user to check:<\/p>\n<ul>\n<li>what address was used to send the email (eg. jane.doe@gmail.com)<\/li>\n<li>when they attempted to send the email (over the weekend apparently)<\/li>\n<\/ul>\n<p>With this info in hand I can focus my search on a particular time range and sender address.<\/p>\n<h2>Knowledge of the system<\/h2>\n<p>There\u2019s one more piece of info you need to have before you start digging into log files and such: what are the components of the email-receiving system? I assembled this one myself, but under other circumstances, in a team setting, I might ask around to build a complete picture of the system. 
In this case it looks like this:<\/p>\n<p><img alt=\"email-system\" src=\"https:\/\/mattsegal.dev\/img\/prod-bug-hunt\/email-system.png\"><\/p>\n<p>In brief:<\/p>\n<ul>\n<li>The client sends an email from their email client<\/li>\n<li>The email travels through the mystical email realm<\/li>\n<li>SendGrid (SaaS product) receives the email via SMTP<\/li>\n<li>SendGrid sends the email content to a webhook URL on my webserver as an HTTP POST request<\/li>\n<li>My web application ingests the POST request and stores the relevant bits in a database table<\/li>\n<\/ul>\n<p>Inside the web server there\u2019s a pretty standard \u201c3 tier\u201d setup:<\/p>\n<ul>\n<li>NGINX receives all web traffic, sends requests onwards to the app server<\/li>\n<li>Gunicorn app server running the Django web application<\/li>\n<li>A database hosting all the Django tables (including email content)<\/li>\n<\/ul>\n<p><img alt=\"web server\" src=\"https:\/\/mattsegal.dev\/img\/prod-bug-hunt\/webserver.png\"><\/p>\n<h2>My approach<\/h2>\n<p>So, the hunt begins for evidence of this missing email, but where to start looking? One needs a search strategy.  In this case, my intuition is to check the \u201cstart\u201d and \u201cend\u201d points of this system and work my way inwards. My reasoning is:<\/p>\n<ul>\n<li>if we definitely knew that SendGrid did not receive the email, then there\u2019d be no point checking anywhere downstream (saving time)<\/li>\n<li>if we knew that the database contained the email (or it was showing up on the website itself!) then there\u2019d be no point checking upstream services like SendGrid or NGINX (saving time)<\/li>\n<\/ul>\n<p>So do you start upstream or downstream? I think you do whatever\u2019s most convenient and practical. <\/p>\n<p>Of course you may have a special system-specific knowledge that leads you towards checking one particular component first (eg. 
\u201cour code is garbage it\u2019s probably our code, let\u2019s check that first\u201d), which is a cool and smart thing to do. Gotta exploit that domain knowledge.<\/p>\n<h2>Did SendGrid get the email?<\/h2>\n<p>In this case it seemed easiest to check SendGrid\u2019s fancy web UI for evidence of an email failing to be received or something. I had a click around and found their reporting on this matter to be... pretty fucking useless to be honest.<\/p>\n<p><img alt=\"Sendgrid chart\" src=\"https:\/\/mattsegal.dev\/img\/prod-bug-hunt\/sendgrid-chart.png\"><\/p>\n<p>This is all I could find - so I\u2019ve learned that we usually get emails. Reassuring but not very helpful in this case. They have good reporting on email sending, but this dashboard was disappointingly vague.<\/p>\n<h2>Is the email in the database?<\/h2>\n<p>After checking SendGrid (most upstream) I then checked to see if the database (most downstream) had received the email content.<\/p>\n<p>As an aside, I also checked if the email was showing up in the web UI, which it wasn\u2019t (maybe my user got confused and looked at the wrong case?). It\u2019s good to quickly check for stupid obvious things just in case.<\/p>\n<p>Since we don\u2019t have a high volume of emails I was able to check the db by just eyeballing the Django admin page. If we were getting many emails per day I would have instead run a query in the Django shell via the ORM (or run an SQL query directly on the db).<\/p>\n<p><img alt=\"Django admin page\" src=\"https:\/\/mattsegal.dev\/img\/prod-bug-hunt\/django-admin.png\"><\/p>\n<p>It wasn\u2019t there &gt;:(<\/p>\n<h2>Did my code explode?<\/h2>\n<p>So far we know that <em>maybe<\/em> SendGrid got the email and it\u2019s definitely not in the database. Since it was easy to do I quickly scanned my error monitoring logs (using <a href=\"https:\/\/sentry.io\/for\/python\/\">Sentry<\/a>) for any relevant errors. Nothing. 
No relevant application errors were found during the expected time period.<\/p>\n<p><img alt=\"Sentry error logs\" src=\"https:\/\/mattsegal.dev\/img\/prod-bug-hunt\/sentry-errors.png\"><\/p>\n<p><strong>Aside<\/strong>: yes my Sentry issue inbox is a mess. I know, it's bad. Think of it like an email inbox with 200 unread emails, most of them spam, but maybe a few important ones in the pile. For both emails and error reports, it's best to have a clean inbox.<\/p>\n<p><strong>Aside<\/strong>: ideally I would get Slack notifications for any production errors and investigate them as they happen but Sentry recently made Slack integration a paid feature and I haven\u2019t decided whether to upgrade or move.<\/p>\n<h2>Did NGINX receive the POST request?<\/h2>\n<p>Looking back upstream, I wanted to know if I could find anything interesting in the NGINX logs. If you\u2019re not familiar with webserver logfiles I give a rundown in <a href=\"https:\/\/mattsegal.dev\/django-gunicorn-nginx-logging.html\">this article<\/a> covering a typical Django stack.<\/p>\n<p>All my server logs get sent to SumoLogic, a log aggregator (explained in the \u201clog aggregation\u201d section of <a href=\"https:\/\/mattsegal.dev\/django-monitoring-stack.html\">this article<\/a>), where I can search through them in a web UI.<\/p>\n<p>I checked the NGINX access logs for all incoming requests to the email webhook path in the relevant timeframe and found nothing interesting. This shows NGINX is receiving email data in general, which is good.<\/p>\n<p><img alt=\"Sumologic search of access logs\" src=\"https:\/\/mattsegal.dev\/img\/prod-bug-hunt\/sumologic-access-search.png\"><\/p>\n<p>Next I checked the NGINX error logs... 
and found a clue!<\/p>\n<p><img alt=\"Sumologic search of error logs\" src=\"https:\/\/mattsegal.dev\/img\/prod-bug-hunt\/sumologic-error-search.png\"><\/p>\n<p>For those who don\u2019t want to squint at the screenshot above this was the error log:<\/p>\n<blockquote>\n<p>2022\/04\/30 02:38:40 [error] 30616#30616: *129401 client intended to send too large body: 21770024 bytes, client: 172.70.135.74, server: www.mysite.com, request: \"POST \/email\/receive\/ HTTP\/1.1\", host: \"www.mysite.com\"<\/p>\n<\/blockquote>\n<p>This error, which occurs when receiving a POST request to the webhook URL, lines up with the time that the client apparently sent the email. So it seems likely that this is related to the email problem.<\/p>\n<h2>What is going wrong?<\/h2>\n<p>I googled the error message and found <a href=\"https:\/\/stackoverflow.com\/questions\/44741514\/nginx-error-client-intended-to-send-too-large-body\">this StackOverflow post<\/a>. It seems that NGINX limits the size of requests that it will receive (which is configurable via the nginx.conf file). I checked my NGINX config and I had a limit of 20MB set. Checking my email ingestion code, it seems like all the file attachments are included in the HTTP request body. So... my guess was that the client sending the email attached more than 20MB of attachments (an uncompressed phone camera image is ~5MB) and NGINX refused to receive that request. Most email providers (eg Gmail) allow ~25MB of attachments per email.<\/p>\n<h2>Testing the hypothesis<\/h2>\n<p>I actually didn\u2019t do this because I got a little over-excited and immediately wrote and pushed a fix. <\/p>\n<p>What I should have done is verify that the problem I had in mind actually exists. I should have tried to send a 21MB email to our staging server to see if I could reproduce the error, plus asked my user to ask the client if she was sending large files in her email.<\/p>\n<p>Oops. 
A small fuckup given I think the error message is pretty clear about what the problem is.<\/p>\n<h2>The fix<\/h2>\n<p>The fix was pretty simple, as it often is in these cases: I bumped up the NGINX request size limit (<code>client_max_body_size<\/code>) to 60MB. That might be a little excessive, perhaps 30MB would have been fine, but whatever. I updated the config file in source control and deployed it to the staging and prod environments. I tested that I can send larger files by sending a 24MB email attachment to the staging server.<\/p>\n<h2>Aftermath<\/h2>\n<p>We\u2019ve asked the client to re-send her email. Hopefully it comes through and all is well.<\/p>\n<p>I checked further back in SumoLogic and this is not the first time this error has happened, meaning we\u2019ve dropped a few emails. I\u2019ll need to notify the team about this. <\/p>\n<p>If I had more time to spend on this project I\u2019d consider adding some kind of alert to NGINX error logs so that we\u2019d see them pop up in Slack - maybe SumoLogic offers this, I haven\u2019t checked. <\/p>\n<p>Another option would be going with an alternative to SendGrid that had more useful reporting on failed webhook delivery attempts.<\/p>\n<h2>Overview<\/h2>\n<p>Although it can sometimes be stressful, finding and fixing these problems can also be a lot of fun. 
It\u2019s like a detective game where you are searching for clues to crack the case.<\/p>\n<p>In summary, my advice for productively hunting down errors in production is:<\/p>\n<ul>\n<li>Gather info from the user who reported the error<\/li>\n<li>Mentally sketch a map of the system<\/li>\n<li>Check each system component for clues, using a search strategy<\/li>\n<li>Use these clues to develop a hypothesis about what went wrong<\/li>\n<li>Test the hypothesis if you can (before writing a fix)<\/li>\n<li>Build, test, ship a fix (then check it's fixed)<\/li>\n<li>Tell your users the good news<\/li>\n<\/ul>\n<p>Importantly I was only able to solve this issue because I had access to my server log files. A good server monitoring setup makes these issues much quicker and less painful to crack. If you want to know what monitoring tools I like to use in my projects, check out <a href=\"https:\/\/mattsegal.dev\/django-monitoring-stack.html\">my Django monitoring stack<\/a>.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"DevOps in academic research","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/devops-academic-research.html","rel":"alternate"}},"published":"2021-11-21T12:00:00+11:00","updated":"2021-11-21T12:00:00+11:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2021-11-21:\/devops-academic-research.html","summary":"<p>I'd like to share some things I've learned and done in the 18 months I worked as a \"Research DevOps Specialist\" for a team of infectious disease <a href=\"https:\/\/www.bmj.com\/about-bmj\/resources-readers\/publications\/epidemiology-uninitiated\/1-what-epidemiology\">epidemiologists<\/a>.\nPrior to this job I'd worked as a web developer for four years and I'd found that the day-to-day had become quite \u2026<\/p>","content":"<p>I'd like to share some things I've learned and done in the 18 months I worked as a \"Research DevOps Specialist\" for a team of infectious disease <a 
href=\"https:\/\/www.bmj.com\/about-bmj\/resources-readers\/publications\/epidemiology-uninitiated\/1-what-epidemiology\">epidemiologists<\/a>.\nPrior to this job I'd worked as a web developer for four years and I'd found that the day-to-day had become quite routine. Web dev is a mature field where most of the hard problems have been solved. Looking for something new, I started a new job at a local university in early 2020. The job was created when my colleagues wrote ~20k lines of Python code and then found out what a pain in the ass it is to maintain a medium-sized codebase. It's the usual story: the code is fragile, it's slow, it's easy to break things, changes are hard to make. I don't think this situation is anyone's fault per se: it arises naturally whenever you write a big pile of code.<\/p>\n<p>In the remainder of this post I'll talk about the application we were working on and the awesome, transformative, &lt;superlative&gt; power of:<\/p>\n<ul>\n<li>mapping your workflow<\/li>\n<li>an automated test suite<\/li>\n<li>performance improvements<\/li>\n<li>task automation<\/li>\n<li>visualisation tools; and<\/li>\n<li>data management<\/li>\n<\/ul>\n<p>If you're a web developer, you might be interested to see how familiar practices can be applied in different contexts. If you're an academic who uses computers in your work, then you might be interested to learn how some ideas from software development can help you be more effective.<\/p>\n<h2>The application in question<\/h2>\n<p>We were working on a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Compartmental_models_in_epidemiology\">compartmental<\/a> infectious disease model to simulate the spread of tuberculosis. Around March 2020 the team quickly pivoted to modelling COVID-19 as well (surprise!). 
There's documentation <a href=\"http:\/\/summerepi.com\/\">here<\/a> with <a href=\"http:\/\/summerepi.com\/examples\/index.html\">examples<\/a> if you want to poke around.<\/p>\n<p>In brief, it works like this: you feed the model some data for a target region (population, demographics, disease attributes) and then you simulate what's going to happen in the future (infections, deaths, etc). This kind of modelling is useful for exploring different scenarios, such as \"what would happen if we closed all the schools?\" or \"how should we roll out our vaccine?\". These results are presented to stakeholders, usually from some national health department, via a PowerBI dashboard. Alternatively the results are included in a fancy academic paper as graphs and tables.<\/p>\n<p><img alt=\"notifications\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/notifications.png\"><\/p>\n<p>(Note: \"notifications\" are the infected cases that we know about)<\/p>\n<p>A big part of our workflow was model calibration. This is where we would build a disease model with variable input parameters, such as the \"contact rate\" (proportional to how infectious the disease is), and then try to learn the best value of those parameters given some historical data (such as a timeseries of the number of cases). We did this calibration using a technique called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Markov_chain_Monte_Carlo\">Markov chain Monte Carlo<\/a> (MCMC). MCMC has many nice statistical properties, but requires running the model 1000 to 10,000 times - which is quite computationally expensive.<\/p>\n<p><img alt=\"calibration\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/calibration.png\"><\/p>\n<p>This all sounds cool, right? It was! The problem is that when I started, the codebase just hadn't been getting the care it needed given its size and complexity. It was becoming unruly and unmanageable. 
Trying to read and understand the code was stressing me out.<\/p>\n<p>Furthermore, running calibrations was <em>slow<\/em>. It could take days or weeks. There was a lot of manual toil where someone needed to upload the application to the university computer cluster, babysit the run and download the outputs, and then post-process the results on their laptop. The execution of the code itself took days or weeks. This time-sink is a problem when you're trying to submit an academic paper and a reviewer is like \"hey can you just re-run everything with this one small change\" and that means re-running days or weeks of computation.<\/p>\n<p>So there were definitely some pain points and room for improvement when I started.<\/p>\n<h2>Improving our workflow with DevOps<\/h2>\n<p>The team knew that there were problems and everybody wanted to improve the way we worked. If I could point to any key factor in our later successes it would be their willingness to change and openness to new things.<\/p>\n<p>I took a \"DevOps\" approach to my role (it was in the job title after all). What do I mean by DevOps? This <a href=\"https:\/\/www.atlassian.com\/devops\/what-is-devops\">article<\/a> sums it up well:<\/p>\n<blockquote>\n<p>a set of practices that works to automate and integrate the processes between [different teams], so they can build, test, and release software faster and more reliably<\/p>\n<\/blockquote>\n<p>Traditionally this refers to work done by Software <strong>Dev<\/strong>elopers and IT <strong>Op<\/strong>eration<strong>s<\/strong>, but I think it can be applied more broadly. In this case we had a software developer, a mathematician, an epidemiologist and a data visualisation expert working on a common codebase.<\/p>\n<p>A key technique of DevOps is to think about the entire system that produces finished work. You want to conceive of it as a kind of pipeline to be optimised end-to-end, rather than focusing on any efficiencies achieved by individuals in isolation. 
One is encouraged to explicitly map the flow of work through the system. Where does work come from? What stages does it need to flow through to be completed? Where are the bottlenecks? Importantly: what is the goal of the system?<\/p>\n<p>In this case, I determined that our goal was to produce robust academic research, in the form of published papers or reports. My key metric was to minimise \"time to produce a new piece of research\", since I believed that our team's biggest constraint was time, rather than materials or money or ideas or something else. Another key metric was \"number of errors\", which should be zero: it's bad to publish incorrect research.<\/p>\n<p>If you want to read more about DevOps I recommend checking out <a href=\"https:\/\/www.goodreads.com\/book\/show\/17255186-the-phoenix-project\">The Phoenix Project<\/a> and\/or <a href=\"https:\/\/www.amazon.com.au\/Goal-Process-Ongoing-Improvement\/dp\/0884271951\">The Goal<\/a> (the audiobooks are decent).<\/p>\n<h2>Mapping the workflow<\/h2>\n<p>As I mentioned, you want to conceive of your team's work as a kind of pipeline. So what was our pipeline? After chatting with my colleagues I came up with something like this:<\/p>\n<p><img alt=\"workflow\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/autumn-workflow.png\"><\/p>\n<p>It took several discussions to nail this process down. People typically have decent models of how they work floating around in their heads, but it's not common to write it out explicitly like this. Getting this workflow on paper gave us some clear targets for improvement. 
For example:<\/p>\n<ul>\n<li>Updating a model required tedious manual testing to check for regressions<\/li>\n<li>The update\/calibrate cycle was the key bottleneck, because calibration ran slowly and manual steps were required to run long jobs on the compute cluster<\/li>\n<li>Post processing was done manually and was typically only done by the one person who knew the correct scripts to run<\/li>\n<\/ul>\n<h2>Testing the codebase<\/h2>\n<p>My first concern was testing. When I started there were no automated tests for the code. There were a few little scripts and \"test functions\" which you could run manually, but nothing that could be run as a part of <a href=\"https:\/\/www.atlassian.com\/continuous-delivery\/continuous-integration\">continuous integration<\/a>.<\/p>\n<p>This was a problem. Without tests, errors will inevitably creep into the code. As the complexity of the codebase increases, it becomes infeasible to manually check that everything is working since there are too many things to check. In general writing code that is correct the first time isn't too hard - it's not breaking it later that's difficult.<\/p>\n<p>In the context of disease modelling, automated tests are even more important than usual because the correctness of the output cannot be easily verified. The whole point of the system is to calculate an output that would be infeasible for a human to produce. Compare this scenario to web development where the desired output is usually known and easily verified. You can usually load up a web page and click a few buttons to check that the app works.<\/p>\n<h2>Smoke Tests<\/h2>\n<p>So where did I start? Trying to add tests to an untested codebase with thousands of lines of code is very intimidating. I couldn't simply sit down and write unit tests for every little bit of functionality because it would have taken weeks. So instead I wrote \"smoke tests\". A smoke test runs some code and checks that it doesn't crash. 
For example:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"k\">def<\/span> <span class=\"nf\">test_covid_malaysia<\/span><span class=\"p\">():<\/span>\n    <span class=\"sd\">&quot;&quot;&quot;Ensure the Malaysia region model can run without crashing&quot;&quot;&quot;<\/span>\n    <span class=\"c1\"># Load model configuration.<\/span>\n    <span class=\"n\">region<\/span> <span class=\"o\">=<\/span> <span class=\"n\">get_region<\/span><span class=\"p\">(<\/span><span class=\"s2\">&quot;malaysia&quot;<\/span><span class=\"p\">)<\/span>\n    <span class=\"c1\"># Build the model with default parameters.<\/span>\n    <span class=\"n\">model<\/span> <span class=\"o\">=<\/span> <span class=\"n\">region<\/span><span class=\"o\">.<\/span><span class=\"n\">build_model<\/span><span class=\"p\">()<\/span>\n    <span class=\"c1\"># Run the model, don&#39;t check the outputs.<\/span>\n    <span class=\"n\">model<\/span><span class=\"o\">.<\/span><span class=\"n\">run_model<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre><\/div>\n\n<p>To some this may look criminally stupid, but these tests give fantastic bang-for-buck. They don't tell you whether the model outputs are correct, but they only take a few minutes to write. These tests catch all sorts of stupid bugs: like someone trying to add a number to a string, undefined variables, bad filepaths, etc. They don't help so much in reducing semantic errors, but they do help with development speed.<\/p>\n<h2>Continuous Integration<\/h2>\n<p>A lack of testing is the kind of problem that people don't know they have. When you tell someone \"hey we need to start writing tests!\" the typical reaction is \"hmm yeah sure I guess, sounds nice...\" and internally they're thinking \"... but I've got more important shit to do\". 
You can try browbeating them by telling them how irresponsible they're being etc, but that's unlikely to actually get anyone to write and run tests on their own time.<\/p>\n<p>So how to convince people that testing is valuable? You can <em>show<\/em> them, with the magic of \u2728continuous integration\u2728. Our code was hosted in GitHub so I set up GitHub Actions to automatically run the new smoke tests on every commit to master. I've written a short guide on how to do this <a href=\"https:\/\/mattsegal.dev\/pytest-on-github-actions.html\">here<\/a>.<\/p>\n<p>This setup makes tests visible to everyone. There's a little tick or cross next to every commit and, importantly, next to the name of the person who broke the code.<\/p>\n<p><img alt=\"test-failures\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/test-failures.png\"><\/p>\n<p>With this system in place we eventually developed new norms around keeping the tests passing. People would say \"Oops! I broke the tests!\" and it became normal to run the tests locally and fix them if they were broken. It was a little harder to encourage people to invest time in writing new tests.<\/p>\n<p>Once I became more familiar with the codebase I eventually wrote integration and unit tests for the critical modules. I've written a bit more about some testing approaches I used <a href=\"https:\/\/mattsegal.dev\/alternate-test-styles.html\">here<\/a>.<\/p>\n<p>Something that stood out to me in this process was that perhaps the most valuable thing I did in that job was one of the easiest things to do. Setting up continuous integration with GitHub took me an hour or two, but it's been paying dividends for ~2 years since. 
How hard something is to do and how valuable it is are different things.<\/p>\n<h2>Performance improvements<\/h2>\n<p>The code was too slow and the case for improving performance was clear. Slowness can be subjective, I've <a href=\"https:\/\/mattsegal.dev\/is-django-too-slow.html\">written a little<\/a> about the different meanings of \"slow\" in backend web dev, but in this case having to wait 2+ days for a calibration result was obviously way too slow and was our biggest productivity bottleneck.<\/p>\n<p>The core of the problem was that a MCMC calibration had to run the model over 1000 times. When I started, a single model run took about 2 minutes. Doing that 1000 times means ~33 hours of runtime per calibration. Our team's mathematician worked on trying to make our MCMC algorithm more sample-efficient, while I tried to push down the 2 minute inner loop.<\/p>\n<p>It wasn't hard to do better, since performance optimisation hadn't been a priority so far. 
I used Python's cProfile module, plus a few visualisation tools to find the hot parts of the code and speed them up. <a href=\"https:\/\/julien.danjou.info\/guide-to-python-profiling-cprofile-concrete-case-carbonara\/\">This article<\/a> was a lifesaver. In broad strokes, these were the kinds of changes that improved performance:<\/p>\n<ul>\n<li>Avoid redundant re-calculation in for-loops<\/li>\n<li>Switching data structures for more efficient value look-ups (eg. converting a list to a dict)<\/li>\n<li>Converting for-loops to matrix operations (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Vectorization\">vectorisation<\/a>)<\/li>\n<li>Applying JIT optimisation to hot, pure, numerical functions (<a href=\"https:\/\/numba.pydata.org\/\">Numba<\/a>)<\/li>\n<li>Caching function return values (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Memoization\">memoization<\/a>)<\/li>\n<li>Caching data read from disk<\/li>\n<\/ul>\n<p>This work was heaps of fun. It felt like I was playing a video game. Profile, change, profile, change, always trying to get a new high score. Initially there were lots of easy, huge wins, but it became harder to push the needle over time.<\/p>\n<p>After several months the code was 10x to 40x faster, running a model in 10s or less, meaning we could run 1000 iterations in a few hours, rather than over a day. This had a big impact on our ability to run calibrations for weekly reports, but the effects of this speedup were felt more broadly. To borrow a phrase: \"more is different\". Our tests ran faster. CI was more snappy and people were happier to run the tests locally, since they would take 10 seconds rather than 2 minutes to complete. Dev work was faster since you could tweak some code, run it, and view the outputs in seconds. In general, these performance improvements opened up other opportunities for working better that weren't obvious from the outset.<\/p>\n<p>There were some performance regressions over time as the code evolved. 
To try and fight these slowdowns I added automatic <a href=\"https:\/\/github.com\/benchmark-action\/github-action-benchmark\">benchmarking<\/a> to our continuous integration pipeline.<\/p>\n<h2>Task automation<\/h2>\n<p>Once our calibration process could run in hours instead of days we started to notice new bottlenecks in our workflow. Notably, running a calibration involved a lot of manual steps which were not documented, meaning that only one person knew how to do it.<\/p>\n<p>Interacting with the university's <a href=\"https:\/\/slurm.schedmd.com\/documentation.html\">Slurm<\/a> cluster was also a pain. The compute was free but we were at the mercy of the scheduler, which decided when our code would actually run, and the APIs for running and monitoring jobs were arcane and clunky.<\/p>\n<p>Calibrations didn't always run well so this cycle could repeat several times before we got an acceptable result that we would want to use.<\/p>\n<p>Finally, there wasn't a systematic method for recording input and output data for a given model run. It would be hard to reproduce a given model run 6 months later.<\/p>\n<p>The process worked something like this when I started:<\/p>\n<p><img alt=\"old workflow\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/old-workflow.png\"><\/p>\n<p>It was possible to automate most of these steps. 
After a lot of thrashing around on my part, we ended up with a workflow that looks like this.<\/p>\n<p><img alt=\"new workflow\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/new-workflow.png\"><\/p>\n<p>In brief:<\/p>\n<ul>\n<li>A disease modeller would update the code and push it to GitHub<\/li>\n<li>Then they could load up a webpage and trigger a job by filling out a form<\/li>\n<li>The calibration and any other post-processing would run \"in the cloud\"<\/li>\n<li>The final results would be available on a website<\/li>\n<li>The data vis guy could pull down the results and push them to PowerBI<\/li>\n<\/ul>\n<p>There were many benefits to this new workflow. There were no more manual tasks. The process could be run by anyone on the team. We could easily run multiple calibrations in parallel (and often did). We also created standard diagnostic plots that would be automatically generated for each calibration run (similar to <a href=\"https:\/\/wandb.ai\/site\">Weights and Biases<\/a> for machine learning). For example, these plots show how the model parameters change over the course of an MCMC calibration run.<\/p>\n<p><img alt=\"parameter traces\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/param-traces.png\"><\/p>\n<p>I won't go into too much detail on the exact implementation of this cloud pipeline. Not my cleanest work, but it did work. 
It was a collection of Python scripts that hacked together several tools:<\/p>\n<ul>\n<li><a href=\"https:\/\/buildkite.com\/home\">Buildkite<\/a> for task automation (it's really great)<\/li>\n<li>AWS EC2 for compute<\/li>\n<li>AWS S3 for storing data<\/li>\n<li><a href=\"https:\/\/github.com\/boto\/boto3\">boto3<\/a> for managing transient servers<\/li>\n<li><a href=\"https:\/\/nextjs.org\/\">NextJS<\/a> for building the static results website<\/li>\n<\/ul>\n<p>If I could build it again I'd consider using something like <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/concept-ml-pipelines\">Azure ML pipelines<\/a>. See below for an outline of the cloud architecture if you're curious.<\/p>\n<p><img alt=\"new workflow, detailed\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/new-workflow-detailed.png\"><\/p>\n<h2>Visualisation tools<\/h2>\n<p>Our models had a lot of stuff that needed to be visualised: inputs, outputs, and calibration targets. Our prior approach was to run a Python script which used <a href=\"https:\/\/matplotlib.org\/\">matplotlib<\/a> to dump all the required plots into a folder. So the development loop to visualise something was:<\/p>\n<ul>\n<li>Edit the model code, run the model<\/li>\n<li>Run a Python script on the model outputs<\/li>\n<li>Open up a folder and look at the plots inside<\/li>\n<\/ul>\n<p>It's not terrible but there's some friction and toil in there.<\/p>\n<p><a href=\"https:\/\/jupyter.org\">Jupyter notebooks<\/a> were a contender in this space, but I chose to use <a href=\"https:\/\/streamlit.io\/\">Streamlit<\/a>, because many of our plots were routine and standardised. With Streamlit, you can use Python to build web dashboards that generate plots based on a user's input. This was useful for disease modellers to quickly check a bunch of different diagnostic plots when working on the model on their laptop. 
Given it's all Python (no JavaScript), my colleagues were able to independently add their own plots. This tool went from interesting idea to a key fixture of our workflow over a few months.<\/p>\n<p><img alt=\"streamlit dashboard\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/streamlit.png\"><\/p>\n<p>A key feature of Streamlit is \"hot reloading\", which is where the code that generates the dashboard automatically re-runs when you change it. This means you can adjust a plot by editing the Python code, hit \"save\" and the changes will appear in your web browser. This quick feedback loop sped up plotting tasks considerably.<\/p>\n<p><strong>Aside:<\/strong> This isn't super relevant but while we're here I just want to show off this visualisation I made of an agent based model simulating the spread of a disease through a bunch of households.<\/p>\n<p><img alt=\"agent based model\" src=\"https:\/\/mattsegal.dev\/img\/devops-academia\/abm.gif\"><\/p>\n<h2>Data management<\/h2>\n<p>We had quite a variety of data flying around. Demographic inputs like population size, model parameters, calibration targets and the model outputs.<\/p>\n<p>We had a lot of model input parameters stored as YAML files and it was hard to keep them all consistent. We had like, a hundred YAML files when I left.\nTo catch errors early I used <a href=\"https:\/\/docs.python-cerberus.org\/en\/stable\/\">Cerberus<\/a> and later <a href=\"https:\/\/pydantic-docs.helpmanual.io\/\">Pydantic<\/a> to validate parameters as they were loaded from disk.\nI wrote smoke tests, which were run in CI, to check that none of these files were invalid. 
I wrote more about this approach <a href=\"https:\/\/mattsegal.dev\/cerberus-config-validation.html\">here<\/a>, although now I prefer Pydantic to Cerberus because it's a little less verbose.<\/p>\n<p>We had a lot of 3rd party inputs for our modelling such as <a href=\"https:\/\/www.google.com\/covid19\/mobility\/\">Google mobility data<\/a>, <a href=\"https:\/\/population.un.org\/wpp\/\">UN World Population<\/a> info, <a href=\"https:\/\/github.com\/kieshaprem\/synthetic-contact-matrices\">social mixing matrices<\/a>. Initially this data was kept in source control as a random scattering of undocumented .csv and .xls files. Pre-processing was done manually using some Python scripts. I pushed to get all of the source data properly documented and consolidated into a single folder and tried to encourage a standard framework for pre-processing all of our inputs with a single script. As our input data grew to 100s of megabytes I moved these CSV files to GitHub's <a href=\"https:\/\/git-lfs.github.com\/\">Git LFS<\/a>, since our repo was getting quite hefty and slow to download (&gt;400MB).<\/p>\n<p>In the end I hand-rolled a lot of functionality that I probably shouldn't have. If you want to organise and standardise all your input data, I recommend checking out <a href=\"https:\/\/dvc.org\/\">Data Version Control<\/a>.<\/p>\n<p>Finally I used AWS S3 to store all of the outputs, intermediate values, log files and plots produced by cloud jobs. Each job was stored using a key that included the model name, region name, timestamp and git commit. This was very helpful for debugging and convenient for everybody on the team to access via our results website. The main downside was that I had to occasionally manually prune ~100GB of results from S3 to keep our cloud bills low.<\/p>\n<h2>Wrapping Up<\/h2>\n<p>Overall I look back on this job fondly. You might have noticed that I've written thousands of words about it. There were some downsides specific to the academic environment. 
There was an emphasis on producing novel results, especially in the context of COVID in 2020, and as a consequence there were a lot of \"one off\" tasks and analyses. The codebase was constantly evolving and it felt like I was always trying to catch up. It was cool working on things that I'd never done before where I didn't know what the solution was. I drew a lot of inspiration from machine learning and data science.<\/p>\n<p>Thanks for reading. If this sounds cool and you think you might like working as a software developer in academia, then go pester some academics.<\/p>\n<p>If you read this and were like \"wow! we should get this guy working for us!\", I've got good news. I am looking for projects to work on as a freelance web developer. See <a href=\"https:\/\/mattsegal.com.au\/\">here<\/a> for more details.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"How to compress images for a webpage","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/webpage-image-compressiom.html","rel":"alternate"}},"published":"2021-05-14T12:00:00+10:00","updated":"2021-05-14T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2021-05-14:\/webpage-image-compressiom.html","summary":"<p>Often when you're creating a website, a client or designer will provide you with large images that are 2-5MB in size and thousands of pixels wide.\nThe large file size of these images will make them slow to load on your webpage, making it seem slow and broken.<\/p>\n<p>This video \u2026<\/p>","content":"<p>Often when you're creating a website, a client or designer will provide you with large images that are 2-5MB in size and thousands of pixels wide.\nThe large file size of these images will make them slow to load on your webpage, making it seem slow and broken.<\/p>\n<p>This video shows you a quick browser-only workflow for cropping, resizing and compressing these images so that they will load more quickly on a webpage.\nIt's not very advanced, but it 
doesn't need to be. Here I convert images from ~2MB to ~100kB, which is a ~20x reduction in file size.<\/p>\n<div class=\"yt-embed\">\n    <iframe \n        src=\"https:\/\/www.youtube.com\/embed\/ZtzdpWQzidM\" \n        frameborder=\"0\" \n        allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" \n        allowfullscreen\n    >\n    <\/iframe>\n<\/div>","category":{"@attributes":{"term":"Programming"}}},{"title":"How to highlight unused Python variables in VS Code","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/pylance-vscode.html","rel":"alternate"}},"published":"2020-10-09T12:00:00+11:00","updated":"2020-10-09T12:00:00+11:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-10-09:\/pylance-vscode.html","summary":"<p>I make a lot of stupid mistakes when I'm working on Python code. I tend to:<\/p>\n<ul>\n<li>make typos in variable names<\/li>\n<li>accidentally delete a variable that's used somewhere else<\/li>\n<li>leave unused variables lying around when they should be deleted<\/li>\n<\/ul>\n<p>It's easy to accidentally create code like in the image below \u2026<\/p>","content":"<p>I make a lot of stupid mistakes when I'm working on Python code. I tend to:<\/p>\n<ul>\n<li>make typos in variable names<\/li>\n<li>accidentally delete a variable that's used somewhere else<\/li>\n<li>leave unused variables lying around when they should be deleted<\/li>\n<\/ul>\n<p>It's easy to accidentally create code like in the image below, where you have unused variables (<code>y<\/code>, <code>z<\/code>, <code>q<\/code>) and references to variables that aren't defined yet (<code>z<\/code>).<\/p>\n<p><img alt=\"foo-before\" src=\"https:\/\/mattsegal.dev\/img\/pylance\/foo-before.png\"><\/p>\n<p>You'll catch these issues when you eventually try to run this function, but it's best\nto be able to spot them instantly. 
I want my editor to show me something that looks like this:<\/p>\n<p><img alt=\"foo-after\" src=\"https:\/\/mattsegal.dev\/img\/pylance\/foo-after.png\"><\/p>\n<p>Here you can see that the vars <code>y<\/code>, <code>z<\/code> and <code>q<\/code> are greyed out, to show that they're not used. The undefined reference to <code>z<\/code> is highlighted with a yellow squiggle. This kind of instant visual feedback means you can write better code, faster and with less mental overhead.<\/p>\n<p>Having your editor highlight unused variables can also help you remove clutter.\nFor example, it's common to have old imports that aren't used anymore, like <code>copy<\/code> and <code>requests<\/code> in this script:<\/p>\n<p><img alt=\"imports-before\" src=\"https:\/\/mattsegal.dev\/img\/pylance\/imports-before.png\"><\/p>\n<p>It's often hard to see what imports are being used just by looking, which is why it's nice to\nhave your editor tell you:<\/p>\n<p><img alt=\"imports-after\" src=\"https:\/\/mattsegal.dev\/img\/pylance\/imports-after.png\"><\/p>\n<p>You'll also note that there is an error in my import statement. <code>import copy from copy<\/code> isn't valid Python. This was an <em>unintentional mistake<\/em> in my example code that VS Code caught for me.<\/p>\n<h2>Setting this up with VS Code<\/h2>\n<p>You can get these variable highlights in VS Code very easily by installing <a href=\"https:\/\/devblogs.microsoft.com\/python\/announcing-pylance-fast-feature-rich-language-support-for-python-in-visual-studio-code\/\">PyLance<\/a>, an alternative \"language server\" for VS Code. 
A language server is a tool, which runs alongside the editor, that does <a href=\"https:\/\/en.wikipedia.org\/wiki\/Static_program_analysis\">static analysis<\/a> of your code.<\/p>\n<p>To get this language server, go into your extensions tab in VS Code, search for \"pylance\", install it, and then you'll see this popup:<\/p>\n<p><img alt=\"server-prompt\" src=\"https:\/\/mattsegal.dev\/img\/pylance\/server-prompt.png\"><\/p>\n<p>Click \"Yes, and reload\".<\/p>\n<h2>Alternatives<\/h2>\n<p>PyCharm does this kind of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Static_program_analysis\">static analysis<\/a> out of the box. I don't like PyCharm quite so much as VS Code, but it's a decent editor and many people swear by it. You can also get this feature by enabling a Python linter in VS Code like flake8, pylint or autopep8. I don't like twiddling with linters, but again other people enjoy using them.<\/p>\n<h2>Next steps<\/h2>\n<p>If you're looking for more Python productivity helpers, then check out my blog post on the <a href=\"https:\/\/mattsegal.dev\/python-formatting-with-black.html\">Black<\/a> auto-formatter.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"There's no one right way to test your code","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/alternate-test-styles.html","rel":"alternate"}},"published":"2020-07-11T12:00:00+10:00","updated":"2020-07-11T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-07-11:\/alternate-test-styles.html","summary":"<p>Today I read a Reddit thread where a beginner was stumbling over themself, apologizing for writing tests the \"wrong way\":<\/p>\n<blockquote>\n<p>I'm now writing some unit tests ... 
I know that the correct way would be to write tests first and then the code, but unfortunately it had to be done this \u2026<\/p><\/blockquote>","content":"<p>Today I read a Reddit thread where a beginner was stumbling over themself, apologizing for writing tests the \"wrong way\":<\/p>\n<blockquote>\n<p>I'm now writing some unit tests ... I know that the correct way would be to write tests first and then the code, but unfortunately it had to be done this way.<\/p>\n<\/blockquote>\n<p>This is depressing... what causes newbies to feel the need to <em>ask for forgiveness<\/em> when writing tests? You can tell the poster has either previously copped some snark or has seen someone else lectured online for not doing things the \"correct way\".<\/p>\n<p>I feel that people can be very prescriptive about how you should test your code, which is puzzling to me. There are so many different use-cases for automated tests that there cannot be one right way to do it. When you're reading blogs and forums you get the impression that you must write \"unit tests\" (the right way!) and that you need to do <a href=\"https:\/\/en.wikipedia.org\/wiki\/Test-driven_development\">test driven development<\/a>, or else you're some kind of idiot slacker.<\/p>\n<p>In this post I am going to focus on the quiet dominance of \"unit tests\" as the default way to test your code, and suggest some other testing styles that you can use.<\/p>\n<h2>You should write \"unit tests\"<\/h2>\n<p>People often say that you should write <strong>unit tests<\/strong> for your code. In brief, these tests check that some chunk of code returns a specific output for a given input. 
For example:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"c1\"># The function to be tested<\/span>\n<span class=\"k\">def<\/span> <span class=\"nf\">add<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span><span class=\"p\">:<\/span> <span class=\"nb\">int<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"p\">:<\/span> <span class=\"nb\">int<\/span><span class=\"p\">):<\/span>\n    <span class=\"sd\">&quot;&quot;&quot;Returns a added with b&quot;&quot;&quot;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span>\n\n\n<span class=\"c1\"># Some tests for `add`<\/span>\n<span class=\"k\">def<\/span> <span class=\"nf\">test_add__with_positive_numbers<\/span><span class=\"p\">():<\/span>\n    <span class=\"k\">assert<\/span> <span class=\"n\">add<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2<\/span><span class=\"p\">)<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">3<\/span>\n\n\n<span class=\"k\">def<\/span> <span class=\"nf\">test_add__with_zero<\/span><span class=\"p\">():<\/span>\n    <span class=\"k\">assert<\/span> <span class=\"n\">add<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">0<\/span><span class=\"p\">)<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">1<\/span>\n\n<span class=\"c1\"># etc. etc. etc<\/span>\n<\/code><\/pre><\/div>\n\n<p>This style of testing is great under the right circumstances, but these are not the only kind of test that you can, or should, write. Unfortunately the name \"unit test\" is used informally to refer to all automated testing of code. This misnomer leads beginners to believe that unit tests are the best, and maybe only, way to test.<\/p>\n<p>Let's start with what unit tests are good for. They favour a \"bottom-up\" style of coding. 
They're the most effective when you have lots of little chunks of code that you want to write, test independently, and then assemble into a bigger program.<\/p>\n<p>This is a perfect fit when you're writing code to deterministically transform data from one form into another, like parts of an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Extract,_transform,_load\">ETL pipeline<\/a> or a compiler. These tests work best when you're writing <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pure_function\">pure functions<\/a>, or code with limited <a href=\"https:\/\/en.wikipedia.org\/wiki\/Side_effect_(computer_science)\">side effects<\/a>.<\/p>\n<h2>When unit tests don't make sense<\/h2>\n<p>The main problem with unit tests is that you can't always break your code up into pretty little pure functions.<\/p>\n<p>When you start working on an existing legacy codebase there's no guarantee that the code is well-structured enough to allow for unit tests. Most commercial code that you'll encounter is legacy code, and a lot of legacy code is untested. I've encountered a fair few 2000+ line classes where reasoning about the effect of any one function is basically impossible because of all the shared state. You can't test a function if you don't know what it's supposed to do. These codebases cannot be rigorously unit tested straight away and need to be <a href=\"https:\/\/understandlegacycode.com\/\">gently massaged into a better shape over time<\/a>, which is a whole other can of worms.<\/p>\n<p>Another, very common, case where unit tests don't make much sense is when a lot of the heavy lifting is being done by a framework. This happens to me all the time when I'm writing web apps with the <a href=\"https:\/\/www.djangoproject.com\/\">Django<\/a> framework. In Django's REST Framework, we use a \"serializer\" class to validate Python objects and translate them into a JSON string. 
For example:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"kn\">from<\/span> <span class=\"nn\">django.db<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">models<\/span>\n<span class=\"kn\">from<\/span> <span class=\"nn\">rest_framework<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">serializers<\/span>\n<span class=\"kn\">from<\/span> <span class=\"nn\">rest_framework.renderers<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">JSONRenderer<\/span>\n\n<span class=\"c1\"># Create a data model that represents a person<\/span>\n<span class=\"k\">class<\/span> <span class=\"nc\">Person<\/span><span class=\"p\">(<\/span><span class=\"n\">models<\/span><span class=\"o\">.<\/span><span class=\"n\">Model<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">name<\/span> <span class=\"o\">=<\/span> <span class=\"n\">models<\/span><span class=\"o\">.<\/span><span class=\"n\">CharField<\/span><span class=\"p\">(<\/span><span class=\"n\">max_length<\/span><span class=\"o\">=<\/span><span class=\"mi\">64<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">email<\/span> <span class=\"o\">=<\/span> <span class=\"n\">models<\/span><span class=\"o\">.<\/span><span class=\"n\">EmailField<\/span><span class=\"p\">()<\/span>\n\n<span class=\"c1\"># Create a serializer that can map a Person to a JSON string<\/span>\n<span class=\"k\">class<\/span> <span class=\"nc\">PersonSerializer<\/span><span class=\"p\">(<\/span><span class=\"n\">serializers<\/span><span class=\"o\">.<\/span><span class=\"n\">ModelSerializer<\/span><span class=\"p\">):<\/span>\n    <span class=\"k\">class<\/span> <span class=\"nc\">Meta<\/span><span class=\"p\">:<\/span>\n        <span class=\"n\">model<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Person<\/span>\n        <span class=\"n\">fields<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"s2\">&quot;name&quot;<\/span><span class=\"p\">,<\/span> <span 
class=\"s2\">&quot;email&quot;<\/span><span class=\"p\">]<\/span>\n\n<span class=\"c1\"># Example usage.<\/span>\n<span class=\"n\">p<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Person<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span><span class=\"o\">=<\/span><span class=\"s2\">&quot;Matt&quot;<\/span><span class=\"p\">,<\/span> <span class=\"n\">email<\/span><span class=\"o\">=<\/span><span class=\"s2\">&quot;mattdsegal@gmail.com&quot;<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">ps<\/span> <span class=\"o\">=<\/span> <span class=\"n\">PersonSerializer<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"p\">)<\/span>\n<span class=\"c1\"># No is_valid() call here: it only applies when the serializer is given data= to validate.<\/span>\n<span class=\"n\">JSONRenderer<\/span><span class=\"p\">()<\/span><span class=\"o\">.<\/span><span class=\"n\">render<\/span><span class=\"p\">(<\/span><span class=\"n\">ps<\/span><span class=\"o\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">)<\/span>\n<span class=\"c1\"># b&#39;{&quot;name&quot;:&quot;Matt&quot;,&quot;email&quot;:&quot;mattdsegal@gmail.com&quot;}&#39;<\/span>\n<\/code><\/pre><\/div>\n\n<p>In this case, there's barely anything for you to actually test.\nDon't get me wrong, you <em>could<\/em> write unit tests for this code, but anything you write is just a re-hash of the definitions of the <code>Person<\/code> and <code>PersonSerializer<\/code>. All the interesting stuff is handled by the framework. Any \"unit test\" of this code is really just a test of the 3rd party code, which <a href=\"https:\/\/github.com\/encode\/django-rest-framework\/tree\/master\/tests\">already has heaps of tests<\/a>. 
In this case, writing unit tests is just adding extra boilerplate to your codebase, when the whole point of using a framework was to save you time.<\/p>\n<p>So if \"unit tests\" don't always make sense, what else can you do? There are other styles of testing that you can use. I'll highlight my two favourites: <strong>smoke tests<\/strong> and <strong>integration tests<\/strong>.<\/p>\n<h2>Quick 'n dirty smoke tests<\/h2>\n<p>Some of the value of an automated test is checking that the code runs at all. A smoke test runs some code and checks that it doesn't crash. Smoke tests are really, really easy to write and maintain and they catch 50% of bugs (made up number). These kinds of tests are great for when:<\/p>\n<ul>\n<li>your app has many potential code-paths<\/li>\n<li>you are using interpreted languages like JavaScript or Python which often crash at runtime<\/li>\n<li>you don't know or can't predict what the output of your code will be<\/li>\n<\/ul>\n<p>Here's a smoke test for a neural network. 
All it does is construct the network and feed it some random garbage data, making sure that it doesn't crash and that the outputs are the correct shape:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"k\">def<\/span> <span class=\"nf\">test_processes_noise<\/span><span class=\"p\">():<\/span>\n    <span class=\"n\">input_shape<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">80<\/span><span class=\"p\">,<\/span> <span class=\"mi\">256<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">inputs<\/span> <span class=\"o\">=<\/span> <span class=\"n\">get_random_input<\/span><span class=\"p\">(<\/span><span class=\"n\">input_shape<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">outputs<\/span> <span class=\"o\">=<\/span> <span class=\"n\">MyNeuralNet<\/span><span class=\"p\">(<\/span><span class=\"n\">inputs<\/span><span class=\"p\">)<\/span>\n    <span class=\"k\">assert<\/span> <span class=\"n\">outputs<\/span><span class=\"o\">.<\/span><span class=\"n\">shape<\/span> <span class=\"o\">==<\/span> <span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">80<\/span><span class=\"p\">,<\/span> <span class=\"mi\">256<\/span><span class=\"p\">)<\/span>\n<\/code><\/pre><\/div>\n\n<p>This is valuable because runtime errors due to stupid mistakes are very common when building a neural net. A mismatch in array dimensions somewhere in the network is a common stumbling block. Typically it might take minutes of runtime before your code crashes due to all the data loading and processing that needs to happen before the broken code is executed. 
With smoke tests like this, you can check for stupid errors in seconds instead of minutes.<\/p>\n<p>In a more web-development focused example, here's a Django smoke test that loops over a bunch of urls and checks that they all respond to GET requests with happy \"200\" HTTP status codes, without validating any of the data that is returned:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"nd\">@pytest<\/span><span class=\"o\">.<\/span><span class=\"n\">mark<\/span><span class=\"o\">.<\/span><span class=\"n\">django_db<\/span>\n<span class=\"k\">def<\/span> <span class=\"nf\">test_urls_work<\/span><span class=\"p\">(<\/span><span class=\"n\">client<\/span><span class=\"p\">):<\/span>\n    <span class=\"sd\">&quot;&quot;&quot;Ensure all urls return 200&quot;&quot;&quot;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"n\">url<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">SMOKE_TEST_URLS<\/span><span class=\"p\">:<\/span>\n        <span class=\"n\">response<\/span> <span class=\"o\">=<\/span> <span class=\"n\">client<\/span><span class=\"o\">.<\/span><span class=\"n\">get<\/span><span class=\"p\">(<\/span><span class=\"n\">url<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">assert<\/span> <span class=\"n\">response<\/span><span class=\"o\">.<\/span><span class=\"n\">status_code<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">200<\/span>\n<\/code><\/pre><\/div>\n\n<p>Maybe you don't have time to write detailed tests for all your web app's endpoints, but a quick smoke test like this will at least exercise your code and check for stupid errors.<\/p>\n<p>This crude style of testing is both fine and good. Don't let people shame you for writing smoke tests. If you do nothing but write smoke tests for your app, you'll still be getting a sizeable benefit from your test suite.<\/p>\n<h2>High level integration tests<\/h2>\n<p>To me, integration tests are when you test a whole feature, end-to-end. 
You are testing a system of components (functions, classes, modules, libraries) and the <em>integrations<\/em> between them. I think this style of testing can provide more bang-for-buck than a set of unit tests, because the integration tests cover a lot of different components with less code, and they check for behaviours that you actually care about. This is more \"top down\" approach to testing, compared to the \"bottom up\" style of unit tests.<\/p>\n<p>Calling back to my earlier Django example, an integration test wouldn't test any independent behaviour of the the <code>Person<\/code> or <code>PersonSerializer<\/code> classes. Instead, we would test them by exercising a code path where they are used in combination. For example, we would want to make sure that a GET request asking for a specific Person by their id returns the correct data. Here's the API code to be tested:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"c1\"># Data model<\/span>\n<span class=\"k\">class<\/span> <span class=\"nc\">Person<\/span><span class=\"p\">(<\/span><span class=\"n\">models<\/span><span class=\"o\">.<\/span><span class=\"n\">Model<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">name<\/span> <span class=\"o\">=<\/span> <span class=\"n\">models<\/span><span class=\"o\">.<\/span><span class=\"n\">CharField<\/span><span class=\"p\">(<\/span><span class=\"n\">max_length<\/span><span class=\"o\">=<\/span><span class=\"mi\">64<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">email<\/span> <span class=\"o\">=<\/span> <span class=\"n\">models<\/span><span class=\"o\">.<\/span><span class=\"n\">EmailField<\/span><span class=\"p\">()<\/span>\n\n<span class=\"c1\"># Maps data model to JSON string<\/span>\n<span class=\"k\">class<\/span> <span class=\"nc\">PersonSerializer<\/span><span class=\"p\">(<\/span><span class=\"n\">serializers<\/span><span class=\"o\">.<\/span><span class=\"n\">ModelSerializer<\/span><span class=\"p\">):<\/span>\n    
<span class=\"k\">class<\/span> <span class=\"nc\">Meta<\/span><span class=\"p\">:<\/span>\n        <span class=\"n\">model<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Person<\/span>\n        <span class=\"n\">fields<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"s2\">&quot;name&quot;<\/span><span class=\"p\">,<\/span> <span class=\"s2\">&quot;email&quot;<\/span><span class=\"p\">]<\/span>\n\n<span class=\"c1\"># API endpoint for Person<\/span>\n<span class=\"k\">class<\/span> <span class=\"nc\">PersonViewSet<\/span><span class=\"p\">(<\/span><span class=\"n\">viewsets<\/span><span class=\"o\">.<\/span><span class=\"n\">RetrieveAPIView<\/span><span class=\"p\">):<\/span>\n    <span class=\"n\">serializer_class<\/span> <span class=\"o\">=<\/span> <span class=\"n\">PersonSerializer<\/span>\n    <span class=\"n\">queryset<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Person<\/span><span class=\"o\">.<\/span><span class=\"n\">objects<\/span><span class=\"o\">.<\/span><span class=\"n\">all<\/span><span class=\"p\">()<\/span>\n\n<span class=\"c1\"># Attach API endpoint to a URL path<\/span>\n<span class=\"n\">router<\/span> <span class=\"o\">=<\/span> <span class=\"n\">routers<\/span><span class=\"o\">.<\/span><span class=\"n\">SimpleRouter<\/span><span class=\"p\">()<\/span>\n<span class=\"n\">router<\/span><span class=\"o\">.<\/span><span class=\"n\">register<\/span><span class=\"p\">(<\/span><span class=\"s2\">&quot;person&quot;<\/span><span class=\"p\">,<\/span> <span class=\"n\">PersonViewSet<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">urlpatterns<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">path<\/span><span class=\"p\">(<\/span><span class=\"s2\">&quot;api&quot;<\/span><span class=\"p\">,<\/span> <span class=\"n\">include<\/span><span class=\"p\">(<\/span><span class=\"n\">router<\/span><span class=\"o\">.<\/span><span class=\"n\">urls<\/span><span 
class=\"p\">))]<\/span>\n<\/code><\/pre><\/div>\n\n<p>And here's a short integration test for the code above. It uses Django's <a href=\"https:\/\/docs.djangoproject.com\/en\/3.0\/topics\/testing\/tools\/#the-test-client\">test client<\/a> to simulate an HTTP GET request to our view and validate the data that is returned:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"nd\">@pytest<\/span><span class=\"o\">.<\/span><span class=\"n\">mark<\/span><span class=\"o\">.<\/span><span class=\"n\">django_db<\/span>\n<span class=\"k\">def<\/span> <span class=\"nf\">test_person_get<\/span><span class=\"p\">(<\/span><span class=\"n\">client<\/span><span class=\"p\">):<\/span>\n    <span class=\"sd\">&quot;&quot;&quot;Ensure a user can retrieve a person&#39;s data by id&quot;&quot;&quot;<\/span>\n    <span class=\"n\">p<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Person<\/span><span class=\"o\">.<\/span><span class=\"n\">objects<\/span><span class=\"o\">.<\/span><span class=\"n\">create<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span><span class=\"o\">=<\/span><span class=\"s2\">&quot;Matt&quot;<\/span><span class=\"p\">,<\/span> <span class=\"n\">email<\/span><span class=\"o\">=<\/span><span class=\"s2\">&quot;mattdsegal@gmail.com&quot;<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">url<\/span> <span class=\"o\">=<\/span> <span class=\"n\">reverse<\/span><span class=\"p\">(<\/span><span class=\"s2\">&quot;person-detail&quot;<\/span><span class=\"p\">,<\/span> <span class=\"n\">args<\/span><span class=\"o\">=<\/span><span class=\"p\">[<\/span><span class=\"n\">p<\/span><span class=\"o\">.<\/span><span class=\"n\">id<\/span><span class=\"p\">])<\/span>\n    <span class=\"n\">response<\/span> <span class=\"o\">=<\/span> <span class=\"n\">client<\/span><span class=\"o\">.<\/span><span class=\"n\">get<\/span><span class=\"p\">(<\/span><span class=\"n\">url<\/span><span class=\"p\">)<\/span>\n    <span 
class=\"k\">assert<\/span> <span class=\"n\">response<\/span><span class=\"o\">.<\/span><span class=\"n\">status_code<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">200<\/span>\n    <span class=\"k\">assert<\/span> <span class=\"n\">response<\/span><span class=\"o\">.<\/span><span class=\"n\">data<\/span> <span class=\"o\">==<\/span> <span class=\"p\">{<\/span>\n        <span class=\"s2\">&quot;name&quot;<\/span><span class=\"p\">:<\/span> <span class=\"s2\">&quot;Matt&quot;<\/span><span class=\"p\">,<\/span>\n        <span class=\"s2\">&quot;email&quot;<\/span><span class=\"p\">:<\/span> <span class=\"s2\">&quot;mattdsegal@gmail.com&quot;<\/span><span class=\"p\">,<\/span>\n    <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div>\n\n<p>This integration test is exercising the code of the <code>Person<\/code> data model, the <code>PersonSerializer<\/code> data mapping and the <code>PersonViewSet<\/code> API endpoint all in one go.<\/p>\n<p>A valid criticism of this style of testing is that if the integration test fails, it's not always clear <em>why<\/em> it failed. This is typically a non-issue, since you can get to the bottom of a failure by reading the error message and spending a few minutes poking the code with a debugger.<\/p>\n<h2>Next steps<\/h2>\n<p>Testing code is an art that requires you to apply judgement to your specific situation. There's a bunch of styles and methodologies for testing your code and your choice depends on your codebase, your app's risk profile and your time constraints. I think you can cultivate this judgement by trying out different techniques. If you haven't already, try a new style of testing on your codebase and see if you like it.<\/p>\n<p>I've enjoyed poking around the <a href=\"https:\/\/understandlegacycode.com\/changing-untested-code\">Understand Legacy Code<\/a> blog, which suggests quite a few novel testing methods that I've never heard of. 
I've got my eye on the \"<a href=\"https:\/\/understandlegacycode.com\/approval-tests\/\">approval test<\/a>\" for a codebase I'm currently working on.<\/p>\n<p>If you're interested in reading more about automated testing with Python, then you might enjoy this post I wrote on how to <a href=\"https:\/\/mattsegal.dev\/pytest-on-github-actions.html\">automatically run your tests on every commit with GitHub Actions<\/a>.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"How to polish your GitHub projects when you're looking for a job","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/github-resume-polish.html","rel":"alternate"}},"published":"2020-06-17T12:00:00+10:00","updated":"2020-06-17T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-06-17:\/github-resume-polish.html","summary":"<p>When you're going for your first programming job, you don't have any work experience or references to show that you can write code. You might not even have a relevant degree (I didn't). What you <em>can<\/em> do is write some code and throw it up on GitHub to demonstrate to \u2026<\/p>","content":"<p>When you're going for your first programming job, you don't have any work experience or references to show that you can write code. You might not even have a relevant degree (I didn't). What you <em>can<\/em> do is write some code and throw it up on GitHub to demonstrate to employers that you can build a complete app all by yourself.<\/p>\n<p>A lot of junior devs don't know how to show off their projects on GitHub. They spend <em>hours and hours<\/em> writing code and then forget to do some basic things to make their project seem interesting. In this post I want to share some tips that you can apply in a few hours to make an existing project much more effective at getting you an interview.<\/p>\n<h3>Remove all the clutter<\/h3>\n<p>Your project should only contain source code, plus the minimum files required to run it. 
It should not contain:<\/p>\n<ul>\n<li>Editor config files (.idea, .vscode)<\/li>\n<li>Database files (eg. SQLite)<\/li>\n<li>Random documents (.pdf, .xls)<\/li>\n<li>Media files (images, videos, audio)<\/li>\n<li>Build outputs and artifacts (*.dll files, *.exe, etc)<\/li>\n<li>Bytecode (eg. *.pyc files for Python)<\/li>\n<li>Log files (eg. *.log)<\/li>\n<\/ul>\n<p>Having these files in your repo makes you look sloppy. Professional developers don't like finding random crap cluttering up their codebase.\nYou can keep these files out of your git repo using a <a href=\"https:\/\/www.atlassian.com\/git\/tutorials\/saving-changes\/gitignore\">.gitignore<\/a> file. If you already have these files inside your repo, make sure to delete them. If you're using <code>bash<\/code> you can use <code>find<\/code> to delete all files that match a pattern, like Python bytecode files ending in <code>.pyc<\/code>.<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>find . -name &quot;*.pyc&quot; -delete\n<\/code><\/pre><\/div>\n\n<p>You can achieve a similar result in Windows PowerShell, but it'll be a little more verbose.<\/p>\n<p>Sometimes you do need to keep some media files, documents or even small databases in your source control. This is okay to do as long as it's an essential part of running, testing or documenting the code, as opposed to random clutter that you forgot to remove or gitignore. A good example of non-code files that you should keep in source control is website static files, like favicons and fonts.<\/p>\n<h3 id=\"readme\">Write a README<\/h3>\n\n<p>Your project <em>must<\/em> have a README file. This is a file in the root of your project's repository called <code>README.md<\/code>. It's a text file written in <a href=\"https:\/\/github.com\/adam-p\/markdown-here\/wiki\/markdown-cheatsheet\">Markdown<\/a> that gives a quick overview of what your project is and what it does. 
Not having a README makes your project seem crappy, and many people, including me, may close the browser window without checking any code if there isn't one present.<\/p>\n<p>Here's <a href=\"https:\/\/github.com\/anikalegal\/clerk\">one I prepared earlier<\/a>, and <a href=\"https:\/\/github.com\/AnikaLegal\/intake\">here's another<\/a>. They're not\nperfect, but I hope they give you a general idea of what to do.<\/p>\n<p>One hour of paying attention to your project's README is worth 20 extra hours of coding, when it comes to impressing hiring managers. You know when people mindlessly write that they have \"excellent communication skills\" on their resume? No one believes that - it's far too easy to just say that. Don't <em>tell them<\/em> that you have excellent communication skills, <em>show them<\/em> when you write an excellent README.<\/p>\n<p>Enough of me waffling about why you should write a README, what do you put in it?<\/p>\n<p>First, you should describe what your project does at a high level: what problem it solves. Is it a command line tool that plays music? Is it a website that finds you low prices on Amazon? Is it a Reddit bot that reminds people? A reader should be able to read the first few sentences and decide if it's something they might want to use. You should summarize the main features of your project in this section.<\/p>\n<p>A key point to remember is that the employer or recruiter reading your GitHub is both lazy and time-poor. They might not read past the first few sentences... they might not even read the code! They may well assume that your project works without checking anything. Before you rush to pack your README with features that don't exist, you scallywag, note that they may ask you more about your project in a job interview. So, uh... don't lie about anything.<\/p>\n<p>Beyond a basic overview of your project, it's also good to outline the high-level architecture of your code - how it's structured. 
For example, in a Django web app, you could explain the different apps that you've implemented and their responsibilities.<\/p>\n<p>If your project is a website, then you can also talk about the production infrastructure that your website runs on. For example:<\/p>\n<blockquote>\n<p>This website is deployed to a DigitalOcean virtual machine. The Django app runs inside a Gunicorn WSGI app server and depends on a Postgres database. A separate Celery worker process runs offline tasks. Redis is responsible for both caching and serving as a task broker.<\/p>\n<\/blockquote>\n<p>Or for something a little more simple:<\/p>\n<blockquote>\n<p>This project is a static webpage that is hosted on Netlify<\/p>\n<\/blockquote>\n<p>Simply indicating that you know how to deploy your application makes you look good. \"Isn't that obvious though?\" - you may ask. No, it's not obvious and you need to be explicit.<\/p>\n<p>A little warning on READMEs: they're for other people to read, not you. Do not include personal to-dos or notes to yourself in your README. Put those somewhere else, like Trello or Workflowy.<\/p>\n<h3>Add a screenshot<\/h3>\n<p>Add a screenshot of your website or tool and embed it in the README. It'll take you 10 minutes and it makes it look way better. Store the screenshot in a \"docs\" folder and embed it in your README using Markdown. If it's a command line app you can use <a href=\"https:\/\/asciinema.org\/\">asciinema<\/a> to record the tool in action. If your project has a GUI then you can quickly record yourself using the website with <a href=\"https:\/\/www.loom.com\/my-videos\">Loom<\/a>. This will make your project seem much more impressive for only a small amount of effort.<\/p>\n<h3>Give instructions for other developers<\/h3>\n<p>You should include instructions on how other devs can get started using your project. 
This is important because it demonstrates that you can document project setup instructions, and also because someone may actually try to run your code. These instructions should state what tools are required to run your project. For example:<\/p>\n<ul>\n<li>You will need Python 3 and pip installed<\/li>\n<li>You will need yarn and node v11+<\/li>\n<li>You will need docker and docker-compose<\/li>\n<\/ul>\n<p>Next you should explain the steps, with explicit command line examples if possible, that are required to get the app built or running. If your project has external libraries that need to be installed, then you should have a file that specifies these dependencies, like a <code>requirements.txt<\/code> (Python) or <code>package.json<\/code> (Node) or <code>Dockerfile<\/code> \/ <code>docker-compose.yaml<\/code> (Docker).<\/p>\n<p>You should also include instructions on how to run your automated tests.\nYou have some tests, right? More on that later.<\/p>\n<p>If you've scripted your project's deployment, you can mention how to do it here, if you like.<\/p>\n<h3>Have a nice, readable commit history<\/h3>\n<p>If possible, your git commit history should tell a story about what you've been working on.\nEach commit should represent a distinct unit of work, and the commit message should explain what work was done.\nFor example your commit messages could look like this:<\/p>\n<ul>\n<li>Added smoke tests for payment API<\/li>\n<li>Refactored image compression<\/li>\n<li>Added Windows compatibility<\/li>\n<\/ul>\n<p>There are differing opinions amongst devs on what exactly makes a \"good\" commit message, but it's very, very clear what bad commit messages look like:<\/p>\n<ul>\n<li>zzzz<\/li>\n<li>add code<\/li>\n<li>more code<\/li>\n<li>fuck<\/li>\n<li>remove shitty code<\/li>\n<li>fuckfuckfuckfuck<\/li>\n<li>still broken<\/li>\n<li>fuck Windows<\/li>\n<li>zzz<\/li>\n<li>adsafsf<\/li>\n<li>broken<\/li>\n<\/ul>\n<p>I for one have written my fair share of \"zzz\"s. 
This tip is hard to implement if you've already written all your commits. If you're feeling brave, or if you need to remove a few \"fucks\", you can re-write your commit history with <code>git rebase<\/code>. Be warned though, you can lose your code if you screw this up.<\/p>\n<h3>Fix your formatting<\/h3>\n<p>If I see inconsistent indentation or other poor formatting in someone's code, my opinion of their programming ability drops dramatically.\nIs this fair? Maybe, maybe not, but that's how it is. Make sure all your code sticks to your language's standard styling conventions.\nIf you don't know what those are, find out, you'll need to learn them eventually.\nFixing bad coding style is much easier to do if you use a linter or auto-formatter.<\/p>\n<h3>Add linting or formatting<\/h3>\n<p>This one is a bonus, but it's reasonably quick to do. Grab your language community's favorite linter and run it over your code.\nSomething like <code>eslint<\/code> for JavaScript or <code>flake8<\/code> for Python.\nFor those not in the know, a linter is a program that identifies style issues in your code.\nYou run it over your codebase and it yells at you if you do anything wrong. You think your impostor syndrome is bad?\nTry using a tool that screams at you about all your shitty style choices.\nThese tools are quite common in-industry and using one will help you stand out from other junior devs.<\/p>\n<p>Even better than a linter, try using an auto-formatter. I prefer these personally.\nThese tools automatically re-write your code so it conforms with a standard style.\nExamples include <a href=\"https:\/\/golang.org\/cmd\/gofmt\/\">gofmt<\/a> for Go, <a href=\"https:\/\/github.com\/psf\/black\">Black<\/a> for Python and\n<a href=\"https:\/\/prettier.io\/\">Prettier<\/a> for JavaScript. 
I've written more about getting started with Black <a href=\"https:\/\/mattsegal.dev\/python-formatting-with-black.html\">here<\/a>.<\/p>\n<p>Whatever you choose, make sure you document how to run the linter or formatting tool in your README.<\/p>\n<h3>Write some tests<\/h3>\n<p>Automated code testing is an important part of writing reliable professional-grade software.\nIf you want someone to pay you money to be a professional software developer, then you should demonstrate\nthat you know what a unit test is and how to write one. You don't need to write 100s of tests or get a high test coverage,\nbut write a <em>few<\/em> at least.<\/p>\n<p>Needless to say, explain how to run your tests in your README.<\/p>\n<h3>Add automated tests<\/h3>\n<p>If you want to look super fancy then you can run your automated tests in GitHub Actions.\nThis isn't a must-have but it looks nice.\nIt'll take you 30 minutes if you've already written some tests and you can put a cool \"tests passing\" badge in your README that looks really good.\nI've written more on how to do this <a href=\"https:\/\/mattsegal.dev\/pytest-on-github-actions.html\">here<\/a>.<\/p>\n<h3>Deploy your project<\/h3>\n<p>If your project is a website then make sure it's deployed and available online.\nIf you have deployed it, make sure there's a link to the live site in the README.\nThis could be a large undertaking, taking hours or days, especially if you haven't done this before, so\nI'll leave it to you to decide if it's worthwhile.<\/p>\n<p>If your project is a Django app and you want to get it online, then you might like my guide on <a href=\"https:\/\/mattsegal.dev\/simple-django-deployment.html\">simple Django deployments<\/a>.<\/p>\n<h3>Add documentation<\/h3>\n<p>This is a high effort endeavour so I don't really recommend it if you're just trying to quickly improve the appeal of your project.\nThat said, building HTML documentation with something like <a 
href=\"https:\/\/www.sphinx-doc.org\/en\/master\/\">Sphinx<\/a> and hosting it on <a href=\"https:\/\/pages.github.com\/\">GitHub Pages<\/a> looks pretty pro. This only really makes sense if your app is reasonably complicated and requires documentation.<\/p>\n<h3>Next steps<\/h3>\n<p>I mention GitHub a lot in this post, but the same tips apply for projects hosted on Bitbucket and GitLab. All these tips also apply to employer-supplied coding tests that are hosted on GitHub, although I'd caution you not to spend too much time jazzing up coding tests: too many beautiful submissions end up in the garbage.<\/p>\n<p>Now you should have a few things you can do to spiff up your projects before you show them to prospective employers. I think it's important to make sure that the code that you've spent hours on isn't overlooked or dismissed because you didn't write a README.<\/p>\n<p>Good luck, and please don't hesitate to mail me money if this post helps you get a job.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"Studying programming: where to start","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/self-study-starting.html","rel":"alternate"}},"published":"2020-05-16T12:00:00+10:00","updated":"2020-05-16T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-05-16:\/self-study-starting.html","summary":"<p>You have zero programming knowledge and you want to start learning to code.\nWhere do you start?<\/p>\n<p>Maybe you want to learn enough to get yourself a coding job, or you're planning to study computer science in the future and you\nwant to try it out before you start your \u2026<\/p>","content":"<p>You have zero programming knowledge and you want to start learning to code.\nWhere do you start?<\/p>\n<p>Maybe you want to learn enough to get yourself a coding job, or you're planning to study computer science in the future and you\nwant to try it out before you start your course.\nMaybe you just want to automate a few 
things here and there.<\/p>\n<p>This post will outline a path to getting comfortable with coding.\nIt's certainly not the only way to learn programming, it's just the advice that I would give anybody who asked me.<\/p>\n<h3>Learning the basics<\/h3>\n<p>First you'll need to pick a programming language to learn.\nThis won't be the only language you ever learn if you pursue programming long-term,\nI've been coding for about 5 years and I know roughly three-and-a-half to six languages, depending on what counts as a \"language\", so this\nchoice isn't forever.<\/p>\n<p>If you haven't already picked something, learn Python.\nIt's one of the easier languages to get started with, has a syntax that almost looks\nlike natural language and you can get a lot done with it.\nThere are also a <em>lot<\/em> of good beginner resources for learning Python.<\/p>\n<p>You should follow a bare-basics course or book to get started, like <a href=\"https:\/\/realpython.com\/learning-paths\/python3-introduction\/\">Real Python's introductory course<\/a> or the often-recommended <a href=\"https:\/\/automatetheboringstuff.com\/\">Automate the Boring Stuff with Python book<\/a>. There are <a href=\"https:\/\/www.reddit.com\/r\/learnpython\/wiki\/index#wiki_new_to_programming.3F\">dozens of books and courses<\/a> you can choose, so just pick one and learn the basics. 
I've also written about some <a href=\"https:\/\/mattsegal.dev\/windows-setup-programming.html\">other tools<\/a> that you should look into when programming (specifically on Windows).<\/p>\n<p>\"The basics\", which I have mentioned so far, will include:<\/p>\n<ul>\n<li>installing Python on your computer<\/li>\n<li>running Python scripts on the \"command line\" (CLI)<\/li>\n<li>variables, data types, simple data structures<\/li>\n<li>control flow (if, else, for, while)<\/li>\n<li>printing output to the CLI<\/li>\n<li>reading user input from the CLI<\/li>\n<li>reading from and writing to files<\/li>\n<\/ul>\n<p>It's going to feel very simple and kind of dumb. After a week of messing around you <em>might<\/em> be able to build a simple text-based calculator.\nThat's pretty normal: your first programs will not be very impressive at all.\nMaybe you spent your first week just-trying-to-fucking-install-Python, which isn't that abnormal either.<\/p>\n<h3>Many small challenges<\/h3>\n<p>Once you've read a book or followed a course and gotten the basics down, you need to start setting small challenges for yourself. You can't just do tutorials forever... well you <em>can<\/em>, but you'll always be dependent on them to learn new things. You need to come up with your own problems and build your own solutions. You might start by building:<\/p>\n<ul>\n<li>a script that asks your name and prints it back to you<\/li>\n<li>a text-based calculator that helps you add, multiply, divide<\/li>\n<li>a script that tells you the number of days between two dates<\/li>\n<li>a script that prints out your workout for the day<\/li>\n<\/ul>\n<p>The point isn't that these are particularly useful or impressive programs, it's that you set a challenge for yourself and then build a working solution.\nThis is something you should do over and over. 
You will get stuck and need to search Google and check <a href=\"https:\/\/stackoverflow.com\/\">Stack Overflow<\/a>, read the official <a href=\"https:\/\/docs.python.org\/3\/\">Python documentation<\/a>, and ask for help on <a href=\"https:\/\/www.reddit.com\/r\/learnprogramming\/\">\/r\/learnprogramming<\/a> and <a href=\"https:\/\/www.reddit.com\/r\/learnpython\/\">\/r\/learnpython<\/a>. Getting stuck, and then getting unstuck is a part of being a programmer, and tutorials that hold your hand won't teach you how to solve your own problems. Since you are doing lots of small challenges, it's OK if you need to give up and try something easier before coming back to it later.<\/p>\n<p>Don't get me wrong, tutorials are <em>great<\/em> resources for learning a specific skill, but you cannot learn from them alone.<\/p>\n<p>If you're having trouble inventing your own challenges, then check out <a href=\"https:\/\/www.codeabbey.com\/\">Code Abbey<\/a>, which has a bajillion problems to solve with a wide range of difficulties.<\/p>\n<p>Once you're comfortable with the basic syntax of Python and solving simple problems, you can slowly ramp up the complexity of the problems and start playing around with third party libraries. 
You might eventually:<\/p>\n<ul>\n<li>make a script that pulls data from a webpage and prints it to a screen<\/li>\n<li>write a simple website that stores your \"to-dos\" in a database<\/li>\n<li>write a script that finds and deletes duplicate photos on your laptop<\/li>\n<li>play around with a new language, like JavaScript, which is the language that you need to program websites<\/li>\n<li>learn to create webpages with HTML and CSS<\/li>\n<li>read and write data from a spreadsheet (Excel, Google Sheets)<\/li>\n<li><a href=\"https:\/\/www.khanacademy.org\/computing\/computer-programming\/sql\">learn SQL<\/a> to manage databases<\/li>\n<\/ul>\n<p>You won't know how to solve these problems to start, which is part of the process: Googling around until you figure out how to achieve your goal.\nI don't actually remember very much about the coding tools and languages that I use day to day - I'm just a fucking gun at Googling things. I'm a professional <a href=\"https:\/\/www.djangoproject.com\/\">Django<\/a> developer (most of the time) and when I'm working on Django projects I will check the <a href=\"https:\/\/docs.djangoproject.com\/en\/3.0\/\">documentation<\/a> at least once an hour. 
Developing your Google-fu and learning how to read documentation will be vital for your programming abilities.<\/p>\n<h3>More advanced theory<\/h3>\n<p>If you get comfortable with coding and want to get a head start on your studies, then you should try some more advanced online courses.\nEven if you aren't going to study computer science at school, <a href=\"https:\/\/mattsegal.dev\/self-study-tools-vs-concepts.html\">there are great benefits to learning some theory<\/a>.<\/p>\n<p>I recommend choosing something that teaches computer science concepts, not just how to use particular tools.\nI think <a href=\"https:\/\/www.coursera.org\/learn\/principles-of-computing-1\">Principles of Computing<\/a> is a fantastic course for dipping your toes into computer science,\nwith engaging coursework and a focus on the practical skills that a software engineer needs. <a href=\"https:\/\/mattsegal.dev\/nand-to-tetris.html\">Nand2Tetris<\/a> is awesome, but a little more challenging. There is an <em>absurd<\/em> number of free online computer science courses, these two aren't the only good ones, they're just the ones I have personally done and recommend.<\/p>\n<h3>Learn some more advanced tools and frameworks<\/h3>\n<p>Eventually you'll want to do something practical with your coding, and you won't want to build everything from scratch.\nWith a solid grounding in Python and basic programming, you can branch out to learn about:<\/p>\n<ul>\n<li>scientific computing (NumPy, SciPy)<\/li>\n<li>data science tools (pandas, matplotlib, Jupyter notebooks)<\/li>\n<li>web development (Flask, Django)<\/li>\n<li>web server admin (Linux, SSH, SCP, apt, AWS, etc)<\/li>\n<li>databases (SQLite, Postgres, SQLAlchemy)<\/li>\n<li>frontend JavaScript frameworks (React, Vue, Angular)<\/li>\n<\/ul>\n<p>I recommend that you learn these tools with a small project in mind, to better motivate and guide your study.<\/p>\n<h3>Conclusion<\/h3>\n<p>So in summary, I recommend you:<\/p>\n<ul>\n<li>learn the 
basics of Python by following a beginner's course or book<\/li>\n<li>do lots of small self-guided coding challenges<\/li>\n<li>try some more advanced self-guided challenges and new tools<\/li>\n<li>learn some computer science theory<\/li>\n<li>learn some advanced tools and frameworks<\/li>\n<\/ul>","category":{"@attributes":{"term":"Programming"}}},{"title":"Studying programming: pace yourself","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/self-study-pacing.html","rel":"alternate"}},"published":"2020-05-15T12:00:00+10:00","updated":"2020-05-15T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-05-15:\/self-study-pacing.html","summary":"<p>You can learn programming all by yourself and get a coding job. Just you, your laptop and the internet.\nIt's great! You don't have to pay thousands of dollars for a degree and you can work at your own pace.<\/p>\n<p>There's a problem with this approach though: with no teacher \u2026<\/p>","content":"<p>You can learn programming all by yourself and get a coding job. Just you, your laptop and the internet.\nIt's great! You don't have to pay thousands of dollars for a degree and you can work at your own pace.<\/p>\n<p>There's a problem with this approach though: with no teacher or course to guide you it's not clear\nhow much work you need to do every day. There are no professors giving weekly lectures or tutors setting homework.<\/p>\n<p>You just do as much study as you're motivated to do. Are you doing enough? Could you do more?\nThese questions can eat at you, creating guilt and anxiety when you spend time on non-programming activities.\nThere's no clear line between work, study and play. 
There's no campus or workplace to go to and no-one is keeping you accountable.<\/p>\n<p>I've written before about <a href=\"https:\/\/mattsegal.dev\/self-study-mindset-enthusiasm.html\">how to choose<\/a>\nwhat to study and whether to focus on <a href=\"https:\/\/mattsegal.dev\/self-study-tools-vs-concepts.html\">theory or practice<\/a>.\nIn this post I want to discuss the question: how much should you be working?<\/p>\n<p>In general my advice will be to pace yourself. Slow and steady - don't burn out. I can't pin down\nthis quote exactly, but it goes something like:<\/p>\n<blockquote>\n<p>You overestimate what you can do in a day, and underestimate what you can do in a year<\/p>\n<\/blockquote>\n<h3>Actual advice<\/h3>\n<p>Let's be specific though. I think you should aim for four hours a day of total study.\nThat's what worked for me personally.\nIt might not seem like much, but that's four hours <em>every day<\/em>.\nYou might think that 8 hours of work per day is the gold standard, but learning is much harder than working.<\/p>\n<p>That four hour figure was for people who are studying full time.\nOf course not everyone has the luxury to dedicate themselves to learning a new profession.\nIf you're working a job as well then I'd aim for 1-2 hours a day at most.<\/p>\n<h3>Study some theory<\/h3>\n<p>Of those four hours I recommend you spend 1-2 hours doing some sort of course work. This might be watching\nlectures, reading a textbook or doing assigned coursework. 
I'm talking about courses from <a href=\"https:\/\/www.coursera.org\/\">Coursera<\/a>,\n<a href=\"https:\/\/www.udemy.com\/\">Udemy<\/a>, <a href=\"https:\/\/ocw.mit.edu\/index.htm\">MIT OpenCourseWare<\/a>, or even just good old YouTube videos.\nPassively reading and watching videos is very mentally draining and I doubt you can actively absorb information and take notes for more than two hours a day.\nYou can sit in front of a video like a zombie for as many hours as you want, but I think most people only have a couple of hours of high quality\npassive learning in them every day. Maybe you're special (good for you!), but if you're not special that's OK.<\/p>\n<h3>Do some practice<\/h3>\n<p>I think you should spend the rest of your time on some sort of coding project, which I've described <a href=\"https:\/\/mattsegal.dev\/self-study-mindset-enthusiasm.html\">elsewhere<\/a>. I suggest coding after some theory, because learning theory is harder. If you've found a project that you're interested in, then it shouldn't be hard to spend 2-3 hours writing code. Some days you'll get totally stuck and give up a little early - that's normal. Other days you'll be having fun and the next 10 hours will fly by. What's important is that you don't <em>force yourself<\/em> to sit and code for 8 hours a day when you're not having fun.\nIf you find yourself repeatedly unable to spend 2-3 hours a day working on your code, then you need to take a step back and figure out what's blocking you:<\/p>\n<ul>\n<li>Do you hate the language you're using? Try a new language<\/li>\n<li>Is the project you chose for yourself too hard? Try something easier<\/li>\n<\/ul>\n<p>In general you should be taking a meta perspective on your work. Don't try and just slog through your problems. 
If 2-3 hours of coding a day isn't fun,\nthen figure out how to make it fun.<\/p>\n<h3>Immerse yourself<\/h3>\n<p>You can accumulate a lot of passive programming knowledge without doing a lot of work <em>per-se<\/em>.\nIf you can find programming-related entertainment that you enjoy consuming, then it won't feel like you're\nstudying. For example, I used to really enjoy reading <a href=\"https:\/\/www.joelonsoftware.com\/\">Joel Spolsky's blog<\/a>\nand listening to podcasts like <a href=\"https:\/\/www.programmingthrowdown.com\/\">Programming Throwdown<\/a> and <a href=\"https:\/\/softwareengineeringdaily.com\/\">Software Engineering Daily<\/a>.\nI'd just listen to these podcasts while I was walking around. It was just casual consumption, not a study thing: there were no weekly \"podcast goals\" that had to be met.\nMaybe blogs and podcasts aren't your thing, but there are also books and YouTube videos a-plenty that are both entertaining and <em>slightly<\/em> informative.\nHell, I even picked up lingo from being subbed to <a href=\"https:\/\/www.reddit.com\/r\/programminghumor\/\">\/r\/programminghumor<\/a>.<\/p>\n<p>It's really hard to say how long it takes any given person to get a software job from scratch.\nThere's a lot of variance involved including your local job market plus pure luck and coincidence.\nIn general getting a job will probably take longer than you expect, so it's important to work consistently.\nYou should be in it for the long haul: pace yourself.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"Studying programming: tools or theory?","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/self-study-tools-vs-concepts.html","rel":"alternate"}},"published":"2020-05-10T12:00:00+10:00","updated":"2020-05-10T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-05-10:\/self-study-tools-vs-concepts.html","summary":"<p>When you're studying web development you have a lot to learn and limited time.\nOne 
of the hard choices that you'll need to make is whether you learn tools or concepts.\nShould you study data structures and algorithms to be a web developer?\nIt seems kind of esoteric.\nDo you \u2026<\/p>","content":"<p>When you're studying web development you have a lot to learn and limited time.\nOne of the hard choices that you'll need to make is whether you learn tools or concepts.\nShould you study data structures and algorithms to be a web developer?\nIt seems kind of esoteric.\nDo you just need to learn a bunch of the latest tools and frameworks to be productive?\nI'm going to argue that you need both: learning concepts makes you better at using tools, and using tools motivates you to learn concepts.<\/p>\n<h3>The case for learning tools<\/h3>\n<p>The case for learning tools and frameworks is the strongest so let's get it out of the way: they make you more productive.\nI can use <a href=\"https:\/\/www.djangoproject.com\/\">Django<\/a> to build a website with\nauthentication, permissions, HTML templating, database models, form validation, etc. in half a day.\nWriting any one of these features from scratch would take me days at the very least.\nYou do not want to invent the 2020 programmer's toolchain from scratch, not if you want to get anything done.<\/p>\n<p>In addition, employers want you to know how to use tools.\nProgrammers get paid to ship valuable code, not to know a bunch of stuff.\nJob advertisements are primarily a list of <a href=\"https:\/\/reactjs.org\/\">React<\/a>, <a href=\"https:\/\/spring.io\/\">Spring<\/a>, <a href=\"https:\/\/webpack.js.org\/\">Webpack<\/a>, <a href=\"https:\/\/nuxtjs.org\/\">NuxtJS<\/a>, Django, <a href=\"https:\/\/rubyonrails.org\/\">Rails<\/a>, etc.\nContrary to how it might seem, you can get these jobs without knowing every technology on the list,\nbut you do need to know at least some of them.\nGood luck getting a coding job if you don't know Git.<\/p>\n<p>Ok, so we're done right? Tools win, fuck ideas. 
Learn Git, get money.<\/p>\n<h3>The case for learning ideas<\/h3>\n<p>You can't just learn tools and frameworks. If you do not know <a href=\"https:\/\/www.youtube.com\/watch?v=DTQV7_HwF58\">how the internet works<\/a>,\nthen you're going to spend your time as a web developer swimming in a meaningless word-soup of \"DNS\", \"TCP\" and \"Headers\".\nDjango's database model structure is going to be very confusing if you don't know what \"database normalisation\" is.\nHow will you debug issues that don't already have a StackOverflow post written for them?<\/p>\n<p>Ok cool, so you need to learn some basic internet stuff, but do you really need to learn about computational complexity?\nDo you have to be able to <a href=\"https:\/\/twitter.com\/mxcl\/status\/608682016205344768?lang=en\">invert a binary tree<\/a>?<\/p>\n<p>Well, no: you don't <em>have<\/em> to learn these theoretical computer-sciency concepts to get a job as a programmer.\nThat said, I think it's in your interest to learn theoretical stuff.\nLearning computer-sciency concepts helps you learn new tools faster and use them better.\nIf you've learned a little bit of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Functional_programming\">functional programming<\/a> then you'll find a lot of familiar concepts when reading the <a href=\"https:\/\/redux.js.org\/basics\/reducers\">Redux documentation<\/a>:<\/p>\n<blockquote>\n<p>The reducer is a pure function that takes the previous state and an action, and returns the next state.<\/p>\n<\/blockquote>\n<p>If you haven't been exposed to functional programming concepts, then words like \"state\", \"pure function\" and \"immutability\"\nare going to be complete gibberish. 
Functional programming is infamous for this kind of techno-babble:<\/p>\n<blockquote>\n<p><a href=\"https:\/\/stackoverflow.com\/questions\/3870088\/a-monad-is-just-a-monoid-in-the-category-of-endofunctors-whats-the-problem\">A monad is just a monoid in the category of endofunctors, what's the problem?<\/a><\/p>\n<\/blockquote>\n<p>The authors of the Redux docs have the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Curse_of_knowledge\">curse of knowledge<\/a>. They either don't know that they need to explain these terms, or they don't care to.\nYou might not have bothered to learn about functional programming, but they did.<\/p>\n<p>Similarly, you don't need to understand hash functions to use Git, but the string of crazy numbers and letters\nin your history is going to be quite disorienting: what the fuck is e2cbf1addc70652c4d63fdb5a81720024c9f2677 supposed to mean?<\/p>\n<p>Even simple ideas like the idea of a \"tree\" data structure help you work with the computer filesystem more easily.\nYou might know that recursion is a good method for \"walking\" trees. Pattern-matching a programming problem\nto a data structure will help you come up with solutions much faster.<\/p>\n<p>You can't know beforehand which computer science concepts will be useful.\nAs far as I can tell, functional programming got \"cool\" and baked into some frontend tools in the last five years or so. I don't know what's next. 
You need to get a broad base of knowledge to navigate and demystify the programming landscape.<\/p>\n<p>Ok, so you should:<\/p>\n<ul>\n<li>isolate yourself in a log cabin for four years<\/li>\n<li>study computer science<\/li>\n<li>return to civilisation<\/li>\n<li>learn Git<\/li>\n<li>get money<\/li>\n<\/ul>\n<p>...right?<\/p>\n<h3>What to learn first?<\/h3>\n<p>You can't sit down and just learn all of computer science, downloading it all into your brain\nlike Neo hooked into the Matrix.\nYou'll also struggle to learn new tools and frameworks without some computer science fundamentals.\nSo, what to do?<\/p>\n<p>I think you should try a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Spiral_approach\">spiral approach<\/a> to learning.\nYou should learn some theory, then explore some new tools, then try to build something practical.\nRepeat over and over.\nYou won't necessarily learn everything in the \"right order\", but new ideas from one area will influence another.\nYou might:<\/p>\n<ul>\n<li>run into performance bottlenecks in your code and get interested in computational complexity<\/li>\n<li>read about \"pure functions\" in the Redux docs and explore functional programming<\/li>\n<li>complete a course on <a href=\"https:\/\/mattsegal.dev\/nand-to-tetris.html\">compilers<\/a> and finally understand what all those pesky .class, .pyc and .dll files are doing on your computer<\/li>\n<\/ul>\n<p>This might seem like a random and haphazard approach, and it kind of is, but I don't think learning\nprogramming should be viewed as a big list of \"things you must do\". I've written more about that <a href=\"https:\/\/mattsegal.dev\/self-study-mindset-enthusiasm.html\">in this post<\/a>.<\/p>\n<p>If you are learning programming and you have only focused on learning frameworks and tools, then I encourage you to mix in some theoretical online courses as well. 
If you're immersed in a university-style curriculum and haven't tried any modern programming tools - start using them now!<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"Studying programming: what to learn next?","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/self-study-mindset-enthusiasm.html","rel":"alternate"}},"published":"2020-05-08T12:00:00+10:00","updated":"2020-05-08T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-05-08:\/self-study-mindset-enthusiasm.html","summary":"<p>A lot of people trying to teach themselves programming have an anxiety\nabout what they should be learning. There is an endless array\nof options - you've seen these ridiculous <a href=\"https:\/\/github.com\/prakhar1989\/awesome-courses\">lists of online courses<\/a>, right?\nThere's too much to learn and not enough time! You don't want to waste time learning \u2026<\/p>","content":"<p>A lot of people trying to teach themselves programming have an anxiety\nabout what they should be learning. There is an endless array\nof options - you've seen these ridiculous <a href=\"https:\/\/github.com\/prakhar1989\/awesome-courses\">lists of online courses<\/a>, right?\nThere's too much to learn and not enough time! You don't want to waste time learning something that doesn't matter.<\/p>\n<p>This dilemma can manifest as a general sense of dread about the task ahead of you,\nor it can lead you to ruminate over specific technologies: should I learn Java or Python? Flask or Django?\nWhich framework is best? What should I learn to get myself a job?<\/p>\n<h3>Follow your enthusiasm<\/h3>\n<p>I recommend you dissolve this question by learning about things that interest you.\nStuff that you're enthusiastic about. Stuff that <strong>gets you going<\/strong>.\nYou wanted to learn programming for a reason, right? Why was that -\nwhat about it seems cool to you? 
Work on that!<\/p>\n<p>If you don't know what's cool about programming, then you should explore the landscape: sample lots of things until you find something you like. Try lots of small projects:\nmake a webpage in HTML, do some hacking challenges, learn about databases, or functional programming, etc.\nYou can use <a href=\"https:\/\/marginalrevolution.com\/marginalrevolution\/2019\/08\/reading-and-rabbit-holes.html\">this \"rabbit holes\" technique<\/a>\nto pose some interesting questions for yourself.<\/p>\n<p>You might have a goal like \"I want to be a backend web developer\". I think you can work towards this goal while still learning things that interest you. For example, if you're into hacking at the moment, you can learn about hacking webservers.<\/p>\n<h3>It's all just practice anyway<\/h3>\n<p>Here's the thing, it doesn't really matter what you learn next, at least not early on.\nAs a beginner coder, you're going to fucking suck at everything you do, so you might as well have fun while you're sucking.\nYour first few months will just be learning to program, in general, and the most important thing to do is to write a lot of code.<\/p>\n<p>If you're interested in the content you're learning and the code you're writing then you will do so\nmuch more practice than if you're just grinding. Conversely, you'll burn out if you're forcing yourself to work on\nsomething you don't care about. This isn't some airy-fairy feel-good advice telling you to \"follow your dreams\" - you will do more productive work if you're interested in what you're doing.<\/p>\n<p>Why is practice so important? In programming there are a bunch of meta-skills that you don't learn deliberately,\nbut you'll learn them by writing a lot of code. 
I'm talking about\nhow to debug your code, how to find solutions to problems online, how to read documentation.\nThere are \"coding muscles\" in your brain that you need to exercise.\nPicking up these meta-skills is more important than knowing some specific web framework, like Ruby on Rails (RoR).<\/p>\n<p>Here's why: if you try to learn RoR early it'll take you weeks to learn the basics of the framework.\nYou'll struggle to navigate the online documentation and command-line tools.\nYou'll make Ruby language syntax errors while trying to learn framework-specific concepts.\nOn the other hand, you can learn the basics of RoR in a weekend after 6 months of programming.\nYou'll only need to pick up the RoR-specific stuff because you already have a solid background in coding.<\/p>\n<p>Just to clarify: I'm not saying that you shouldn't learn Ruby on Rails in your first week of learning to code. I'm saying that you shouldn't <em>force yourself<\/em> to learn Ruby on Rails because you're trying to optimise for getting a job in the shortest time possible.<\/p>\n<h3>There's practice, then there's <em>practice<\/em><\/h3>\n<p>A note on \"practice\". I said that it's important for a beginner, and suggested that writing a lot of code is good practice.\nI think that's mostly true, but there are some things you can do to get better faster:<\/p>\n<ul>\n<li>Deliberately try out new skills and techniques in your work: \"in this project, I'm going to try the 'object oriented' style I just learned\". In another project you might try the \"functional style\".<\/li>\n<li>Re-visit your old projects and think about how\/if you could make them better with your new skills. 
You'll learn a lot by trying to read the code you wrote a month ago.<\/li>\n<li>Try to get some feedback on your code from more experienced developers.<\/li>\n<\/ul>\n<h3>Start by building things<\/h3>\n<p>When I said \"learn about hacking webservers\" earlier, you might imagine yourself reading a textbook on hacking and then reading a textbook on webservers. Maybe that works for you, but I don't like that approach. I think you should start your learning journey with a practical challenge. In this hacking example it might be to complete the first stage of the <a href=\"https:\/\/overthewire.org\/wargames\/\">Over the Wire wargames<\/a>. There are two reasons for this.<\/p>\n<p>Firstly, you do not know what you need to learn. How could you? Having a practical goal grounds you in reality and forces you to confront your ignorance. Here's an example: I literally did not know what \"backend web development\" was when I was building my <a href=\"https:\/\/mattslinks.xyz\/\">first website<\/a> (this ~v10). Even so, I wanted new list items to remain on the page after I refreshed it in my browser. To get that to work, I learned a lot about backend web dev: HTTP, APIs, Linux, virtual machines, web servers, JSON, web frameworks, WSGI, etc. I had no idea that I needed to know what a \"JSON\" was when I set out on that path, but my practical project led me there.<\/p>\n<p>Secondly, it's much easier to learn things when you have an implementation in mind. If you are building something, then when you need to learn a new concept you'll think:<\/p>\n<blockquote>\n<p>Hmm, I need to learn more about what JSON and HTTP are to get this task done.<\/p>\n<\/blockquote>\n<p>You'll be motivated because you will quickly be able to use that new knowledge. Contrast that to someone handing you a massive list of web technologies and telling you to study them all. Do you think you'll be motivated to slog through that list?<\/p>\n<p>To clarify: I'm not telling you not to read textbooks. 
I'm saying that you should read\ntextbooks after you've found a practical problem that inspires you to learn more. <a href=\"https:\/\/mattsegal.dev\/nand-to-tetris.html\">Nand2Tetris<\/a> is a great example of this. It's an online course where they trick you into reading a <a href=\"https:\/\/www.nand2tetris.org\/book\">textbook<\/a> by first asking you to build a computer.<\/p>\n<h3>Conclusion<\/h3>\n<p>Next time you catch yourself agonising over your next coding project, new framework or online course,\njust ask yourself: is this fun? Will I learn something new? If so, go for it!<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"Keeping your config files valid with Python","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/cerberus-config-validation.html","rel":"alternate"}},"published":"2020-05-03T12:00:00+10:00","updated":"2020-05-03T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-05-03:\/cerberus-config-validation.html","summary":"<p>It's common to use a config file for your Python projects:\nsome sort of JSON or YAML document that defines how your program behaves. 
Something like this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"c1\"># my-config.yaml<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">num_iters<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">30<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">population_size<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">20000<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">cycle_type<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"s\">&quot;long&quot;<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">use_gpu<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">true<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">plots<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"p p-Indicator\">[<\/span><span class=\"nv\">population<\/span><span class=\"p p-Indicator\">,<\/span><span class=\"w\"> <\/span><span class=\"nv\">infections<\/span><span class=\"p p-Indicator\">,<\/span><span class=\"w\"> <\/span><span class=\"nv\">cost<\/span><span class=\"p p-Indicator\">]<\/span><span class=\"w\"><\/span>\n<\/code><\/pre><\/div>\n\n<p>Storing config in a file \u2026<\/p>","content":"<p>It's common to use a config file for your Python projects:\nsome sort of JSON or YAML document that defines how your program behaves. 
Something like this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"c1\"># my-config.yaml<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">num_iters<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">30<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">population_size<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">20000<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">cycle_type<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"s\">&quot;long&quot;<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">use_gpu<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">true<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">plots<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"p p-Indicator\">[<\/span><span class=\"nv\">population<\/span><span class=\"p p-Indicator\">,<\/span><span class=\"w\"> <\/span><span class=\"nv\">infections<\/span><span class=\"p p-Indicator\">,<\/span><span class=\"w\"> <\/span><span class=\"nv\">cost<\/span><span class=\"p p-Indicator\">]<\/span><span class=\"w\"><\/span>\n<\/code><\/pre><\/div>\n\n<p>Storing config in a file is nice because it lets you separate your input data from the code itself,\nbut it sucks when a bad config file crashes your program. What <em>really<\/em> sucks is when:<\/p>\n<ul>\n<li>You don't know exactly which bad value caused the crash, or how to fix it<\/li>\n<li>The bad config crashes your program minutes or hours after you first ran it<\/li>\n<li>Other users write invalid config, then tell you your code is broken<\/li>\n<\/ul>\n<p>A related issue is when you have complex data structures flying around inside your code: lists of dicts, dicts of lists, dicts of dicts of dicts.\nYou just have to pray that all the data is structured the way you want it. 
Sometimes you forget how it's supposed to look in the first place.<\/p>\n<p>You might have tried validating this data yourself using \"assert\" statements, \"if\"s and \"ValueError\"s, but it quickly gets tedious and ugly.<\/p>\n<h3>Cerberus<\/h3>\n<p>When I run into these kinds of problems, I tend to pull out <a href=\"https:\/\/docs.python-cerberus.org\/en\/stable\/\">Cerberus<\/a>\nto stop the bleeding. It's a small Python library that can validate data according to some schema at runtime.\nIt's pretty simple to use (as per their docs):<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"kn\">from<\/span> <span class=\"nn\">cerberus<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">Validator<\/span>\n\n<span class=\"n\">schema<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"s1\">&#39;name&#39;<\/span><span class=\"p\">:<\/span> <span class=\"p\">{<\/span><span class=\"s1\">&#39;type&#39;<\/span><span class=\"p\">:<\/span> <span class=\"s1\">&#39;string&#39;<\/span><span class=\"p\">}}<\/span>\n<span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Validator<\/span><span class=\"p\">(<\/span><span class=\"n\">schema<\/span><span class=\"p\">)<\/span>\n<span class=\"n\">v<\/span><span class=\"o\">.<\/span><span class=\"n\">validate<\/span><span class=\"p\">({<\/span><span class=\"s1\">&#39;name&#39;<\/span><span class=\"p\">:<\/span> <span class=\"s1\">&#39;john doe&#39;<\/span><span class=\"p\">})<\/span> <span class=\"c1\"># True<\/span>\n<span class=\"n\">v<\/span><span class=\"o\">.<\/span><span class=\"n\">validate<\/span><span class=\"p\">({<\/span><span class=\"s1\">&#39;name&#39;<\/span><span class=\"p\">:<\/span> <span class=\"s1\">&#39;aaaa&#39;<\/span><span class=\"p\">})<\/span> <span class=\"c1\"># True<\/span>\n<span class=\"n\">v<\/span><span class=\"o\">.<\/span><span class=\"n\">validate<\/span><span class=\"p\">({<\/span><span class=\"s1\">&#39;name&#39;<\/span><span 
class=\"p\">:<\/span> <span class=\"mi\">1<\/span><span class=\"p\">})<\/span> <span class=\"c1\"># False<\/span>\n<span class=\"n\">v<\/span><span class=\"o\">.<\/span><span class=\"n\">errors<\/span> <span class=\"c1\"># {&#39;name&#39;: [&#39;must be of string type&#39;]}<\/span>\n<\/code><\/pre><\/div>\n\n<p>You can use this tool to validate all of your loaded config at the <em>start<\/em> of your program: giving early feedback to the user\nand printing a sensible error message that tells them how to solve the problem (\"Look at 'name', make it a string!\").\nThis is much better than some obscure ValueError that bubbles up from 6 function calls deep.\nIt's still not a great experience for non-programmers, but coders will appreciate the clarity.<\/p>\n<p>The Cerberus schema is just a Python dictionary that you define.\nEven so, it's quite a powerful system for how basic it is. You can use Cerberus schemas to validate complicated nested data structures if you want to,\neven adding custom validation functions and type definitions.\nIt's particularly nice because it allows you to declare how your data should look, rather than writing a hundred \"if\" statements.<\/p>\n<p>Here's an example: a YAML config file for training a neural network might look like <a href=\"https:\/\/gist.github.com\/MattSegal\/d813f8d7848b5459f95f5eeacf581d2a\">this<\/a> and\nyou could build a validator for that config like <a href=\"https:\/\/gist.github.com\/MattSegal\/fea30d10d26ef666f3a572e97f03c339\">this<\/a>. 
Since everything is just dicts, there's no reason you can't also write your schema as a YAML or JSON (<a href=\"https:\/\/gist.github.com\/MattSegal\/b855659ff40533a9d13935a3ca632f63\">example<\/a>).\nLuckily Cerberus will validate your schema before applying it, so there is no endless recursion of \"who validates the validators?\".<\/p>\n<h3>Schema as documentation<\/h3>\n<p>I think that defining data schemas using Cerberus gets really useful when lots of different people need to use your config files.\nThe schema that you've defined also serves as documentation on how to write a correct config file: add a few explanatory comments and you've got some quick-n-dirty docs.\nIt's not a perfect strategy for writing docs but it has one fantastic property: the documentation cannot lie, because it <em>actually runs as code<\/em>.<\/p>\n<p>I was recently working on an in-house CLI tool for builds and deployment that was written in Python.\nI had devs from other teams using the file and I couldn't always show them how to use it in-person.\nEven worse I was constantly updating the tool based on feature requests and the config was evolving over time.\nOnce I had written a Cerberus schema for the tool's config files, I was able to link to the\nschema from our documentation. 
In addition, I was able to run regression tests on \"wild\" config files\nby pulling them down from our source control and checking that they were still valid.<\/p>\n<h3>Limitations<\/h3>\n<p>There's no denying that these schemas are very, very verbose: you need to write a lot of text to define even simple data structures.\nI think this verbosity is caused by the fact that the tool uses built-in Python data structures, rather than an object-oriented DSL.\nIt's quick and easy to get started, but that comes at a cost.<\/p>\n<p>Another issue is that you can abuse this tool by using it as a half-assed type system.\nIt gives you no type hints or static compilation errors in your IDE: everything happens when the code runs.\nSome code quality problems are better solved by investing in static analysis and using tools like <a href=\"http:\/\/mypy-lang.org\/\">mypy<\/a>.<\/p>\n<p>Finally, using Cerberus to validate config files and big data structures can hide underlying issues.\nI think of it like slapping a bandaid on a problem. It stops the bleeding, but you should also clean up all the broken glass on the floor.\nWhy do you have all these config files in the first place? Why are you shipping around these big crazy data structures in your code?\nIt's good to ask these questions and consider alternative solutions.<\/p>\n<h3>Next steps<\/h3>\n<p>Give Cerberus a try in your next CLI tool or data science project, you're a quick pip install and a schema definition\nfrom validating your config files.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"8 helpful tools for programming on Windows","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/windows-setup-programming.html","rel":"alternate"}},"published":"2020-05-02T12:00:00+10:00","updated":"2020-05-02T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-05-02:\/windows-setup-programming.html","summary":"<p>Software development on Windows can be a pain. 
Not because of any issues with C#, .NET\nor the operating system, but simply because the tools surrounding your work can be quite clunky by default.\nI'm talking about the lack of a package manager, PowerShell's ugly blue terminal with no tabs \u2026<\/p>","content":"<p>Software development on Windows can be a pain. Not because of any issues with C#, .NET\nor the operating system, but simply because the tools surrounding your work can be quite clunky by default.\nI'm talking about the lack of a package manager, PowerShell's ugly blue terminal with no tabs and a bunch of \"missing\" tools (git, ssh).\nIt's like a living room where all the furniture is perfectly positioned to stub your toe.<\/p>\n<p>That said, you can have a pretty nice developer experience if you install a few tools.\nThis post goes over my preferred setup on a new Windows laptop. It's not a definitive guide, just some tips and\ntricks that I've picked up from other devs that I've worked with. Hopefully you find some of them useful.<\/p>\n<p>The post below summarises everything in this video.<\/p>\n<div class=\"yt-embed\">\n    <iframe \n        src=\"https:\/\/www.youtube.com\/embed\/wMJJp1PbQQA\" \n        frameborder=\"0\" \n        allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" \n        allowfullscreen\n    >\n    <\/iframe>\n<\/div>\n\n<h3>ConEmu console emulator<\/h3>\n<p><a href=\"https:\/\/conemu.github.io\/\">ConEmu<\/a> is my #1 favourite tool for Windows. 
It allows you to:<\/p>\n<ul>\n<li>Open many PowerShell tabs in one window<\/li>\n<li>Show and hide the terminal with a hotkey (ctrl-`)<\/li>\n<li>Split your windows into sub-windows using hotkeys (ctrl-shift-(o|e))<\/li>\n<li>Open different shells in one window (PowerShell, Git Bash, cmd)<\/li>\n<li>Customise your terminal (different colors etc)<\/li>\n<li>Open PowerShell as Admin automatically<\/li>\n<\/ul>\n<p>It's like removing a rock from your shoe: an ugly blue rock.\nSome people also like to use <a href=\"https:\/\/cmder.net\/\">Cmder<\/a> for the same use-case.<\/p>\n<h3>Everything search<\/h3>\n<p>Windows Explorer search is so horribly broken in 2020 that you <em>hope<\/em> Microsoft is trolling you,\nbecause the alternative is just sad. In any case <a href=\"https:\/\/www.voidtools.com\/support\/everything\/\">Everything<\/a>\ngives you very fast search of all your files and folders, including that pesky InternalToolChain.dll\nwhich has gone missing.<\/p>\n<p>I believe it runs in the background all the time, quietly indexing your files.\nI do not know how this affects your workstation's performance.<\/p>\n<h3>Chocolatey package manager<\/h3>\n<p><a href=\"https:\/\/chocolatey.org\/\">Chocolatey<\/a> is the (unofficial) package manager for Windows.\n<a href=\"https:\/\/www.nuget.org\/\">NuGet<\/a> is good for installing your .NET libraries, while <code>choco<\/code> is good for everything else.\nIt's great for quickly installing tools and automating the process. 
It's quite easy to <a href=\"https:\/\/chocolatey.org\/install\">install<\/a>.<\/p>\n<p>To install a tool like Everything, you can just <a href=\"https:\/\/chocolatey.org\/search?q=everything\">search for it<\/a> then run the install from the CLI:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">choco<\/span> <span class=\"n\">install<\/span> <span class=\"n\">everything<\/span>\n<\/code><\/pre><\/div>\n\n<p>In fact, once you've got choco installed, you can install all of the other tools on this list with:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">choco<\/span> <span class=\"n\">install<\/span> <span class=\"n\">git<\/span> <span class=\"n\">-y<\/span>\n<span class=\"n\">choco<\/span> <span class=\"n\">install<\/span> <span class=\"n\">conemu<\/span> <span class=\"n\">-y<\/span>\n<span class=\"n\">choco<\/span> <span class=\"n\">install<\/span> <span class=\"n\">everything<\/span> <span class=\"n\">-y<\/span>\n<span class=\"n\">choco<\/span> <span class=\"n\">install<\/span> <span class=\"n\">poshgit<\/span> <span class=\"n\">-y<\/span>\n<span class=\"n\">choco<\/span> <span class=\"n\">install<\/span> <span class=\"n\">vscode<\/span> <span class=\"n\">-y<\/span>\n<span class=\"n\">choco<\/span> <span class=\"n\">install<\/span> <span class=\"n\">ag<\/span> <span class=\"n\">-y<\/span>\n<\/code><\/pre><\/div>\n\n<p>Try not to install anything with Chocolatey if it already exists: things can get weird. 
You can always run <code>Get-Command<\/code> in PowerShell to check for existing executables:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"nb\">Get-Command<\/span> <span class=\"n\">python<\/span>\n<\/code><\/pre><\/div>\n\n<h3>Visual Studio Code<\/h3>\n<p><a href=\"https:\/\/code.visualstudio.com\/\">Visual Studio Code<\/a> is a text editor that strikes a great balance between being full-featured and overly bloated.\nThis is an obvious proposition to more experienced developers, but there are a lot of beginners out there editing their files in <code>notepad.exe<\/code>.\nI personally prefer it to slimmer alternatives like Sublime Text 3 and hulking behemoths like PyCharm or Visual Studio.<\/p>\n<p>A really cool feature of VSCode on Windows is that it's quite command-line friendly. You can open the current folder in VSCode from the CLI with:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">code<\/span> <span class=\"p\">.<\/span>\n<\/code><\/pre><\/div>\n\n<p>or you can open a single file like this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">code<\/span> <span class=\"n\">my<\/span><span class=\"o\">-file<\/span><span class=\"p\">.<\/span><span class=\"n\">txt<\/span>\n<\/code><\/pre><\/div>\n\n<p>I'm quite a fan of the <a href=\"https:\/\/marketplace.visualstudio.com\/items?itemName=AndreyVolosovich.monokai-st3\">Monokai ST3 theme<\/a> and the <a href=\"https:\/\/github.com\/PKief\/vscode-material-icon-theme\">Material Icon Theme<\/a>, plus a bajillion other language-specific extensions.<\/p>\n<p><a href=\"https:\/\/www.sublimetext.com\/3\">Sublime Text 3<\/a> is a popular alternative with a rich plugin ecosystem but fewer features out of the box.\nSome people also like <a href=\"https:\/\/notepad-plus-plus.org\/downloads\/\">Notepad++<\/a>, a decision I don't really understand, but as the name suggests,\nit beats the shit out of just using Notepad.<\/p>\n<h3>PowerShell 
setup<\/h3>\n<p>There are a few tricks to getting PowerShell into a usable state on a new Windows machine.\nThe first thing is to always open it as Administrator, if you can.\nOnce PowerShell is open, I like to set the \"execution policy\", which allows you to run scripts:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"nb\">Set-ExecutionPolicy<\/span> <span class=\"n\">-ExecutionPolicy<\/span> <span class=\"n\">Bypass<\/span>\n<\/code><\/pre><\/div>\n\n<p>Now you can put some PowerShell in a script, like myscript.ps1:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"c\"># myscript.ps1<\/span>\n<span class=\"nb\">Write-Host<\/span> <span class=\"s2\">&quot;Hello World!&quot;<\/span>\n<\/code><\/pre><\/div>\n\n<p>and then run it from your PowerShell terminal:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"p\">.\/<\/span><span class=\"n\">myscript<\/span><span class=\"p\">.<\/span><span class=\"n\">ps1<\/span>\n<span class=\"c\"># Hello World!<\/span>\n<\/code><\/pre><\/div>\n\n<p>Finally, I like to configure my profile, which is a script that runs at the start of every PowerShell session.\nThis is where you can add things like welcome messages, function definitions and module imports.\nTo set up your profile, just open <code>$profile<\/code>, add some stuff and then save the file. 
For example:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">code<\/span> <span class=\"nv\">$profile<\/span>\n<\/code><\/pre><\/div>\n\n<p>One other handy PowerShell tip while I'm here: you can open folders in Explorer with <code>explorer<\/code>, and, if you're not using VSCode,\nyou can still edit files using <code>notepad<\/code>:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">explorer<\/span> <span class=\"p\">.<\/span>\n<span class=\"n\">explorer<\/span> <span class=\"s2\">&quot;C:\\Users\\mattd&quot;<\/span>\n<span class=\"n\">notepad<\/span> <span class=\"s2\">&quot;secret-plot.txt&quot;<\/span>\n<\/code><\/pre><\/div>\n\n<h3>Git for Windows<\/h3>\n<p>This one also seems kind of obvious if you're already using <a href=\"https:\/\/git-scm.com\/download\/win\">Git<\/a>,\nand if you're not using Git then why would you bother?\nWait! There's more than just <code>git<\/code> in Git for Windows. The install also gives you:<\/p>\n<ul>\n<li>Git Bash, which is a bash shell that can run scripts<\/li>\n<li><code>ssh<\/code> for logging into Linux servers<\/li>\n<li><code>scp<\/code> for transferring files to Linux servers<\/li>\n<li><code>ssh-keygen<\/code> for generating SSH keys<\/li>\n<li>A bunch of nice Unix tools like <code>find<\/code><\/li>\n<\/ul>\n<p>You'll never need to use <a href=\"https:\/\/www.putty.org\/\">PuTTY<\/a> again! You might scoff at my promotion of Git Bash:<\/p>\n<blockquote>\n<p>Fool! 
Doesn't he know about <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/wsl\/install-win10\">Windows Subsystem for Linux<\/a>?<\/p>\n<\/blockquote>\n<p>WSL seems nice, but a lot of workplaces won't let you install it, even though they will allow Git.<\/p>\n<h3>Posh-Git: a PowerShell environment for Git<\/h3>\n<p><a href=\"https:\/\/github.com\/dahlbyk\/posh-git\">Posh-Git<\/a> gives you a nice little Git status message in your command prompt.\nIt tells you the current branch that you're on and the number of uncommitted changes. It's quite convenient. To install it:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"nb\">Install-Module<\/span> <span class=\"n\">posh-git<\/span> <span class=\"n\">-Scope<\/span> <span class=\"n\">CurrentUser<\/span> <span class=\"n\">-Force<\/span>\n<\/code><\/pre><\/div>\n\n<p>And then to activate it:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"nb\">Import-Module<\/span> <span class=\"n\">posh-git<\/span>\n<\/code><\/pre><\/div>\n\n<p>Note that it will only display your Git status when the current directory is a Git repo.\nConsider placing the import statement into your PowerShell profile so that it loads automatically for you.<\/p>\n<h3>The Silver Searcher<\/h3>\n<p>The <a href=\"https:\/\/github.com\/ggreer\/the_silver_searcher\">Silver Searcher<\/a> is a nice little CLI tool for quickly finding all\ninstances of a string in a folder. For example, if I want to find all instances of \"probably\" in my \"content\" folder:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">ag<\/span> <span class=\"n\">-i<\/span> <span class=\"n\">probably<\/span> <span class=\"n\">content<\/span>\n<\/code><\/pre><\/div>\n\n<p>Its main appeal is speed: it's really fucking fast. 
Git grep is also a contender if you're working in a Git repository:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">git<\/span> <span class=\"n\">grep<\/span> <span class=\"n\">probably<\/span>\n<\/code><\/pre><\/div>\n\n<h3>Next steps<\/h3>\n<p>Despite Steve Ballmer's <a href=\"https:\/\/www.youtube.com\/watch?v=Vhh_GeBPOhs\">pro-dev yelling<\/a> in the 2000s, Microsoft\ndropped the ball on making Windows nice for developers. Nevertheless you can use these tools, and others, to cobble together an environment where\nbuilding software isn't like stubbing your toe on every piece of furniture. If you haven't tried these tools already, then I encourage you to install them and have a play around. You might find your coding experience a little less painful.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"Run your Python unit tests via GitHub actions","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/pytest-on-github-actions.html","rel":"alternate"}},"published":"2020-04-27T12:00:00+10:00","updated":"2020-04-27T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-04-27:\/pytest-on-github-actions.html","summary":"<p>You've written some unit tests for your Python app. Good for you! There are dozens of us, dozens!\nYou don't always remember to run your tests, or worse, your colleagues don't always remember to run them.<\/p>\n<p>Wouldn't it be nice to automatically run unit tests on every commit to GitHub \u2026<\/p>","content":"<p>You've written some unit tests for your Python app. Good for you! There are dozens of us, dozens!\nYou don't always remember to run your tests, or worse, your colleagues don't always remember to run them.<\/p>\n<p>Wouldn't it be nice to automatically run unit tests on every commit to GitHub? 
What about on every pull request?\nYou can do this with <a href=\"https:\/\/github.com\/features\/actions\">GitHub Actions<\/a>.\nYou'd be able to hunt down commits that broke the build, and if you're feeling blamey, <em>who<\/em> broke the build.\nSounds complicated, but it's not.\nSounds like it might cost money, but the free version has ~30 hours of execution per month.\nLet me show you how to set this up.<\/p>\n<p>There is example code for this blog post <a href=\"https:\/\/github.com\/MattSegal\/actions-python-tests\">here<\/a>.<\/p>\n<h3>Setting up your project<\/h3>\n<p>I'm going to assume that:<\/p>\n<ul>\n<li>You have some Python code<\/li>\n<li>You use Git, and your code is already in a GitHub repository<\/li>\n<\/ul>\n<p>If you're already running unit tests locally you can skip this section.\nOtherwise, your Python project's folder looks something like this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>.\n\u251c\u2500\u2500 env                     Python virtualenv\n\u251c\u2500\u2500 requirements.txt        Python requirements\n\u251c\u2500\u2500 README.md               Project description\n\u2514\u2500\u2500 stuff.py                Your code\n<\/code><\/pre><\/div>\n\n<p>If you don't have tests already, I recommend trying pytest (and adding it to your requirements.txt).<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>pip install pytest\n<\/code><\/pre><\/div>\n\n<p>You'll need at least one test<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"c1\"># test_stuff.py<\/span>\n<span class=\"kn\">from<\/span> <span class=\"nn\">stuff<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">run_stuff<\/span>\n\n<span class=\"k\">def<\/span> <span class=\"nf\">test_run_stuff<\/span><span class=\"p\">():<\/span>\n    <span class=\"n\">result<\/span> <span class=\"o\">=<\/span> <span class=\"n\">run_stuff<\/span><span class=\"p\">()<\/span>\n    <span class=\"k\">assert<\/span> <span class=\"n\">result<\/span> <span 
class=\"o\">==<\/span> <span class=\"mi\">1<\/span>\n<\/code><\/pre><\/div>\n\n<p>You'll want to make sure your tests run and pass locally:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>pytest\n<\/code><\/pre><\/div>\n\n<h3>Set up your Action<\/h3>\n<p>You'll need to create a new file in a new folder: <code>.github\/workflows\/ci.yml<\/code>.\nYou can learn more about these config files <a href=\"https:\/\/help.github.com\/en\/actions\">here<\/a>.\nHere's an example file:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"nt\">name<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">Project Tests<\/span><span class=\"w\"><\/span>\n<span class=\"nt\">on<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">  <\/span><span class=\"nt\">push<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"nt\">branches<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">      <\/span><span class=\"p p-Indicator\">-<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">master<\/span><span class=\"w\"><\/span>\n<span class=\"w\">  <\/span><span class=\"nt\">pull_request<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"nt\">branches<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">      <\/span><span class=\"p p-Indicator\">-<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">master<\/span><span class=\"w\"><\/span>\n\n<span class=\"nt\">jobs<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">  <\/span><span class=\"nt\">build<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"nt\">runs-on<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar 
l-Scalar-Plain\">ubuntu-latest<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"nt\">steps<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">      <\/span><span class=\"p p-Indicator\">-<\/span><span class=\"w\"> <\/span><span class=\"nt\">uses<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">actions\/checkout@v2<\/span><span class=\"w\"><\/span>\n<span class=\"w\">      <\/span><span class=\"p p-Indicator\">-<\/span><span class=\"w\"> <\/span><span class=\"nt\">name<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">Set up Python 3.6<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"nt\">uses<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">actions\/setup-python@v1<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"nt\">with<\/span><span class=\"p\">:<\/span><span class=\"w\"><\/span>\n<span class=\"w\">          <\/span><span class=\"nt\">python-version<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">3.6<\/span><span class=\"w\"><\/span>\n<span class=\"w\">      <\/span><span class=\"p p-Indicator\">-<\/span><span class=\"w\"> <\/span><span class=\"nt\">name<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">Install dependencies<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"nt\">run<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"p p-Indicator\">|<\/span><span class=\"w\"><\/span>\n<span class=\"w\">          <\/span><span class=\"no\">python -m pip install --upgrade pip<\/span><span class=\"w\"><\/span>\n<span class=\"w\">          <\/span><span class=\"no\">pip install -r requirements.txt<\/span><span 
class=\"w\"><\/span>\n<span class=\"w\">      <\/span><span class=\"p p-Indicator\">-<\/span><span class=\"w\"> <\/span><span class=\"nt\">name<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">Test with pytest<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"nt\">run<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"l l-Scalar l-Scalar-Plain\">pytest -vv<\/span><span class=\"w\"><\/span>\n<\/code><\/pre><\/div>\n\n<p>Now your project looks like this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>.\n\u251c\u2500\u2500 .github                 GitHub hidden folder\n|   \u2514\u2500\u2500 workflows           Some other folder\n|       \u2514\u2500\u2500 ci.yml          GitHub Actions config\n\u251c\u2500\u2500 env                     Python virtualenv\n\u251c\u2500\u2500 requirements.txt        Python requirements\n\u251c\u2500\u2500 README.md               Project description\n\u251c\u2500\u2500 test_stuff.py           pytest unit tests\n\u2514\u2500\u2500 stuff.py                Your code\n<\/code><\/pre><\/div>\n\n<p>Commit your changes, push it up to GitHub and watch your tests run!<\/p>\n<p>Sometimes they fail:<\/p>\n<div class=\"loom-embed\"><iframe src=\"https:\/\/www.loom.com\/embed\/c46a3b978fa441b2a50abbe9d7d2a1ef\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen style=\"position: absolute; top: 0; left: 0; width: 100%; height: 100%;\"><\/iframe><\/div>\n\n<p>Sometimes they pass:<\/p>\n<div class=\"loom-embed\"><iframe src=\"https:\/\/www.loom.com\/embed\/f06b6150b74445159e665f0b3ba92c2a\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen style=\"position: absolute; top: 0; left: 0; width: 100%; height: 100%;\"><\/iframe><\/div>\n\n<h3>Add a badge to your README<\/h3>\n<p>You can add a \"badge\" to your project's README.md.\nAssuming your project was hosted at 
https:\/\/github.com\/MyName\/my-project\/, you can add this\nto your README.md file:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>![](https:\/\/github.com\/MyName\/my-project\/workflows\/Project%20Tests\/badge.svg)\n<\/code><\/pre><\/div>\n\n<h3>Next steps<\/h3>\n<p>Write some tests, run them locally, and then let GitHub run them for you on every commit from now on.\nIf you get stuck, check out <a href=\"https:\/\/github.com\/MattSegal\/actions-python-tests\">this minimal reference<\/a> or the <a href=\"https:\/\/help.github.com\/en\/actions\">Actions docs<\/a>.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"Never think about Python formatting again","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/python-formatting-with-black.html","rel":"alternate"}},"published":"2020-04-24T12:00:00+10:00","updated":"2020-04-24T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-04-24:\/python-formatting-with-black.html","summary":"<p>At some point you realise that formatting your Python code is important.\nYou want your code to be readable, but what's the <em>right<\/em> way to format it?\nYou recognise that it's much harder to read this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">some_things<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"s2\">&quot;carrots&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span><span class=\"mi\">2<\/span> <span class=\"p\">],<\/span>\n<span class=\"s2\">&quot;apples&quot;<\/span><span class=\"p\">:[<\/span>\n<span class=\"mi\">3<\/span><span class=\"p\">,<\/span><span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span>\n<span class=\"p\">],<\/span> <span class=\"s2\">&quot;pears&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[]<\/span> <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div>\n\n<p>than it is to 
\u2026<\/p>","content":"<p>At some point you realise that formatting your Python code is important.\nYou want your code to be readable, but what's the <em>right<\/em> way to format it?\nYou recognise that it's much harder to read this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">some_things<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"s2\">&quot;carrots&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span><span class=\"mi\">2<\/span> <span class=\"p\">],<\/span>\n<span class=\"s2\">&quot;apples&quot;<\/span><span class=\"p\">:[<\/span>\n<span class=\"mi\">3<\/span><span class=\"p\">,<\/span><span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span>\n<span class=\"p\">],<\/span> <span class=\"s2\">&quot;pears&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[]<\/span> <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div>\n\n<p>than it is to read this:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">some_things<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span>\n    <span class=\"s2\">&quot;carrots&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2<\/span><span class=\"p\">],<\/span>\n    <span class=\"s2\">&quot;apples&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">],<\/span>\n    <span class=\"s2\">&quot;pears&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[],<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div>\n\n<p>or... wait should it be like this instead? 
Hmm...<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">some_things<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span>\n    <span class=\"s2\">&quot;carrots&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2<\/span><span class=\"p\">],<\/span>\n    <span class=\"s2\">&quot;apples&quot;<\/span><span class=\"p\">:<\/span>  <span class=\"p\">[<\/span><span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">],<\/span>\n    <span class=\"s2\">&quot;pears&quot;<\/span><span class=\"p\">:<\/span>   <span class=\"p\">[],<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div>\n\n<p>nah, nah, wait a sec, maybe it would be better if we kept it on one line to save space...<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"n\">some_things<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"s2\">&quot;carrots&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2<\/span><span class=\"p\">],<\/span> <span class=\"s2\">&quot;apples&quot;<\/span><span class=\"p\">:<\/span>  <span class=\"p\">[<\/span><span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">],<\/span> <span class=\"s2\">&quot;pears&quot;<\/span><span class=\"p\">:<\/span> <span class=\"p\">[]}<\/span>\n<\/code><\/pre><\/div>\n\n<p>Umm, is that line too long though? We could do this for hours.<\/p>\n<p>Formatting your code <em>is<\/em> important, but it's easy to get lost in the details.\nYou want your code to look professional, but it can be a time-sink. 
It's easy to:<\/p>\n<ul>\n<li>spend time experimenting with different formatting styles<\/li>\n<li>spend ages twiddling with linter (eg. PyLint) rules, and then spend cumulative hours tweaking your code to make the linter stop yelling at you<\/li>\n<li>fight a co-worker to the death on top of a castle tower in a thunderstorm over the proper way to lay out brackets<\/li>\n<\/ul>\n<p>This is all just incidental bullshit though. It's a distraction from your real work: laying out brackets one way or another isn't going to make your software run any better (but if the closing bracket isn't on its own new line then I'll gut you like the dog you are!).<\/p>\n<p>Is there a way to avoid this mess? How can we get rid of all this incidental work?<\/p>\n<h3>Give black a try<\/h3>\n<p><a href=\"https:\/\/github.com\/psf\/black\/\">Black<\/a> is a tool that auto-formats your Python code. You run black over all your .py files and the correct formatting is applied for you. It's like <a href=\"https:\/\/prettier.io\/\">prettier<\/a>, but for Python instead of JavaScript.<\/p>\n<p>Importantly, Black has minimal configuration. You basically only get to choose the maximum line length that you want, and everything else is decided by the formatter. It's the \"uncompromising Python code formatter\". This means you don't get to choose what formatting style you use, but it also means you don't need to decide either: once you've adopted Black, you <em>never need to think about Python formatting again<\/em>. No more config files, no more arguing with your coworkers. Spend your time on more valuable things, like what your code is doing.<\/p>\n<p>Is it safe to just run your whole codebase through this tool? I think so. Black compares the Python <a href=\"https:\/\/en.wikipedia.org\/wiki\/Abstract_syntax_tree\">abstract syntax tree<\/a> of the code before and after the changes, just to make sure it didn't change or break anything. 
In the last few jobs I've worked at, I've walked in, made the case for Black (politely), and run it over the whole codebase. It's never caused any issues.<\/p>\n<p>Here are some of the other benefits of Black:<\/p>\n<ul>\n<li><strong>Less work when coding<\/strong>: all the time you spend manually formatting your code can now be spent elsewhere.<\/li>\n<li><strong>More productive pull requests<\/strong>: the person reviewing your code can't <a href=\"https:\/\/en.wiktionary.org\/wiki\/bikeshedding\">bikeshed<\/a> your formatting, because it's out of your hands - instead they'll need to actually look at what your code is doing.<\/li>\n<li><strong>Smaller diffs<\/strong>: there will be no formatting changes in your diffs, so the only changes left are meaningful ones. In addition, the Black formatting style is optimised around minimising diffs.<\/li>\n<li><strong>Keep the linter off your back<\/strong>: if you are also using a linter like flake8, then Black will help you avoid basic <a href=\"https:\/\/www.python.org\/dev\/peps\/pep-0008\/\">PEP 8<\/a> errors.<\/li>\n<li><strong>Auto format on save in your IDE<\/strong>: This one is huuuuge. You can set up Black to reformat your code <em>as you write it<\/em>. I've found this helps me write code much faster.<\/li>\n<\/ul>\n<h3>Running Black<\/h3>\n<p>First, install it:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>pip install black\n<\/code><\/pre><\/div>\n\n<p>Then you run it with a path as an argument:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>black .\n<\/code><\/pre><\/div>\n\n<p>Then it mangles all of your code!<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>reformatted \/home\/matt\/code\/redbubble\/colors.py\nreformatted \/home\/matt\/code\/redbubble\/fuzzer.py\nreformatted \/home\/matt\/code\/redbubble\/image.py\nreformatted \/home\/matt\/code\/redbubble\/sierpinski.py\nAll done! 
\u2728 \ud83c\udf70 \u2728\n4 files reformatted, 2 files left unchanged.\n<\/code><\/pre><\/div>\n\n<p>You can mess around a little bit with the line length config, or using pyproject.toml, but that's basically it.<\/p>\n<p>If you're running CI and you want to check for correct formatting, you can use:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code>black --check .\n<\/code><\/pre><\/div>\n\n<p>It returns exit code 0 if the formatting is correct, and exit code 1 if it's not.<\/p>\n<h3>Format on save<\/h3>\n<p>Format on save is incredible: it's been a big productivity boost for me. In VSCode you can add the following settings to format on save with black:<\/p>\n<div class=\"highlight\"><pre><span><\/span><code><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">  <\/span><span class=\"nt\">&quot;python.formatting.provider&quot;<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"s2\">&quot;black&quot;<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">  <\/span><span class=\"nt\">&quot;editor.formatOnSave&quot;<\/span><span class=\"p\">:<\/span><span class=\"w\"> <\/span><span class=\"kc\">true<\/span><span class=\"w\"><\/span>\n<span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<\/code><\/pre><\/div>\n\n<p>I don't know about other editors, but I've set this up in PyCharm as well. Once that's done then any save will format the document. Here's an example:<\/p>\n<div class=\"loom-embed\"><iframe src=\"https:\/\/www.loom.com\/embed\/a5914312a4ff44d188f019bb63e19bf7\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen style=\"position: absolute; top: 0; left: 0; width: 100%; height: 100%;\"><\/iframe><\/div>\n\n<h3>Limitations<\/h3>\n<p>Black is just a formatter, not a linter, so it does not perform linting functions. 
It will not complain about unused variables, imports and other linty stuff.<\/p>\n<p>It will also not do import sorting like <a href=\"https:\/\/github.com\/timothycrosley\/isort\">isort<\/a>. In fact, Black and isort can fight over how imports should be formatted, if you're running both of them. You can resolve it by running isort then black, or vice versa, but it can make CI tests a little awkward.<\/p>\n<p>Finally, it's \"in beta\", which as far as I can tell just means \"you should expect some formatting to change in the future\".<\/p>\n<h3>Summary<\/h3>\n<p>Black is awesome, it'll save you time and brain cycles, go forth and use it on all your Python code.<\/p>","category":{"@attributes":{"term":"Programming"}}},{"title":"Nand to Tetris is a great course","link":{"@attributes":{"href":"https:\/\/mattsegal.dev\/nand-to-tetris.html","rel":"alternate"}},"published":"2020-04-17T12:00:00+10:00","updated":"2020-04-17T12:00:00+10:00","author":{"name":"Matthew Segal"},"id":"tag:mattsegal.dev,2020-04-17:\/nand-to-tetris.html","summary":"<p>Everyone who learns programming at some point stops and asks - how does this actually work?\nYou might know how to write and run code, but <em>what's actually happening inside the computer<\/em>?\nIt can seem unfathomable.<\/p>\n<p>Some people don't care about what's happening under the hood. Their code works, it gets \u2026<\/p>","content":"<p>Everyone who learns programming at some point stops and asks - how does this actually work?\nYou might know how to write and run code, but <em>what's actually happening inside the computer<\/em>?\nIt can seem unfathomable.<\/p>\n<p>Some people don't care about what's happening under the hood. 
Their code works, it gets the job done, why would you bother drilling into the details?\nI'm like that sometimes, and you can get a long way coding without knowing the fundamentals of computing,\nbut there is a certain clarity and confidence that you can only get from knowing that you could <a href=\"https:\/\/www.youtube.com\/watch?v=SbO0tqH8f5I\">build a computer from scratch<\/a>... if you had the time.<\/p>\n<h3>You too can build a computer<\/h3>\n<p>I'm glad to say that you <em>may well<\/em> have the time to build a computer from scratch, since that's what you do in the online course Nand to Tetris (<a href=\"https:\/\/www.nand2tetris.org\/\">website<\/a>, <a href=\"https:\/\/www.coursera.org\/learn\/build-a-computer\">Coursera<\/a>). The course takes you through 12 projects, about 1 week each, where you incrementally build:<\/p>\n<ul>\n<li>a CPU<\/li>\n<li>a RAM chip<\/li>\n<li>a full computer<\/li>\n<li>an assembly language<\/li>\n<li>a virtual machine<\/li>\n<li>a high-level language<\/li>\n<li>an operating system<\/li>\n<\/ul>\n<p>All of this is done on your computer using tools provided by the course. Once you've done these projects you will understand the building blocks of a computer, from the RAM and CPU, through assembly, up to the compiler that translates your programming language of choice. It's a powerful course that will unlock a whole new perspective on computer programming for you. I believe that bang-for-buck it's probably the best online course for someone who is a self-taught programmer. It's practical, fun and mostly oriented around building things.<\/p>\n<p>I found that the course took a few weeks to really get into gear. The initial content on boolean logic and arithmetic can be a little dry, but if you can get through that, the course becomes more interesting and rewarding. 
It's a pretty cool feeling to run a program that is executed by a system that you wrote, from the compiler to the VM to the assembly code to the CPU.<\/p>\n<h3>What you need<\/h3>\n<p>The course description is a little over-optimistic in my opinion:<\/p>\n<blockquote>\n<p>This is a self-contained course: all the knowledge necessary to succeed in the course and build the computer system will be given as part of the learning experience.<\/p>\n<\/blockquote>\n<p>It's <em>mostly<\/em> self-contained, but really you either need to be an intermediate programmer, or very gung-ho. I'm not trying to talk you out of the course if you're new to coding, just know it's going to be challenging and you might get stuck from time-to-time. I believe that anyone can get through it if they are determined.<\/p>\n<p>The whole course consists of projects and there's automated testing of your work. If you're not in the habit of doing it already, I strongly recommend learning how to write unit tests in your programming language of choice. It's 1000x faster to write your own tests and run them on your computer than to upload your code to Coursera and let them verify whether your code works. You will write bugs, and you want to minimise the feedback loop required to find them.<\/p>\n<p>If you have <em>lots<\/em> of free time and you want to line up a 1-2 punch of theory and practice, you could also watch Harry Porter's <a href=\"https:\/\/www.youtube.com\/playlist?list=PLbtzT1TYeoMjNOGEiaRmm_vMIwUAidnQz\">Theory of Computation<\/a> videos, which teach you the mathsy theoretical underpinnings of computer science.<\/p>\n<h3>Do it! 
(if you can)<\/h3>\n<p>Not everyone has the spare time to commit to 12 weeks of programming projects, but if you do, I encourage you to give this course a try.\nKnowing how a computer works from chip-to-compiler is a nugget of knowledge that will be useful for your whole life.\nI can't say the same for learning the latest JavaScript toolchain.\n<a href=\"https:\/\/www.coursera.org\/learn\/build-a-computer\">Give it a try<\/a>!<\/p>","category":{"@attributes":{"term":"Programming"}}}]}