Matt's Dev Blog - DevOps

A breakdown of how NGINX is configured with Django

2020-07-31T12:00:00+10:00

You are trying to deploy your Django web app to the internet. You have never done this before, so you follow a guide like this one. The guide gives you many instructions, which include installing and configuring an "NGINX reverse proxy". At some point you mutter to yourself:

What-the-hell is an NGINX? Eh, whatever, let's keep reading.

You will have to copy-paste some weird gobbledygook into a file, which looks like this:

# NGINX site config file at /etc/nginx/sites-available/myproject
server {
    listen 80;
    server_name foo.com;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect http://127.0.0.1:8000 http://foo.com;
    }
    location /static/ {
        root /home/myuser/myproject;
    }
}

What is all this stuff? What is it supposed to do?

Most people do their first Django deployment as a learning exercise. You want to understand what you are doing, so that you can fix problems if you get stuck and so you don't need to rely on guides in the future. In this post I'll break down the elements of this NGINX config and how it ties in with Django, so that you can confidently debug, update and extend it in the future.

What is this file supposed to achieve?

This scary-looking config file sets up NGINX so that it acts as the entrypoint to your Django application. Explaining why you might choose to use NGINX is a topic too expansive for this post, so I'm just going to stick to explaining how it works.

NGINX is a completely separate program from your Django app. It is running inside its own process, while Django is running inside a WSGI server process, such as Gunicorn. In this post I will sometimes refer to Gunicorn and Django interchangeably.

All HTTP requests that hit your Django app have to go through NGINX first.

NGINX listens for incoming HTTP requests on port 80 and HTTPS requests on port 443. When a new request comes in:

NGINX looks at the request, checks some rules, and sends it on to your WSGI server, which is usually listening on localhost, port 8000
Your Django app will process the request and eventually produce a response
Your WSGI server will send the response back to NGINX; and then
NGINX will send the response back out to the original requesting client

You can also configure NGINX to serve static files, like images, directly from the filesystem, so that requests for these assets don't need to go through Django.

You can adjust the rules in NGINX so that it selectively routes requests to multiple app servers. You could, for example, run a WordPress site and a Django app from the same server.

Now that you have a general idea of what NGINX is supposed to do, let's go over the config file that makes this happen.

Server block

The top level block in the NGINX config file is the virtual server. The main utility of virtual servers is that they allow you to sort incoming requests based on the port and hostname. Let's start by looking at a basic server block:

server {
    # Listen on port 80 for incoming requests.
    listen 80;
    # Return status code 200 with text "Hello World".
    return 200 'Hello World';
}

Let me show you some example requests. Say we're on the same server as NGINX and we send a GET request using the command line tool curl.

curl localhost
# Hello World

This curl command sends the following HTTP request to localhost, port 80:

GET / HTTP/1.1
Host: localhost
User-Agent: curl/7.58.0

We will get the following HTTP response back from NGINX, with a 200 OK status code and "Hello World" in the body:

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 11

Hello World

We can also request some random path and we get the same result:

curl localhost/some/path/on/website
# Hello World

With curl sending this HTTP request:

GET /some/path/on/website HTTP/1.1
Host: localhost
User-Agent: curl/7.58.0

and we get back the same response as before:

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 11

Hello World
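If the raw request text above looks a bit abstract, here is a small Python sketch that pulls one apart into its method, path and headers. The parse_request helper is a hypothetical name I made up for illustration, and real servers like NGINX and Gunicorn do a lot more validation than this:

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, path, headers)."""
    # Everything before the first blank line is the request line plus headers.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


raw = (
    "GET /some/path/on/website HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "User-Agent: curl/7.58.0\r\n"
    "\r\n"
)
method, path, headers = parse_request(raw)
print(method, path, headers["host"])  # GET /some/path/on/website localhost
```

The Host header is the part to keep an eye on, because NGINX uses it to choose between virtual servers.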

Simple so far, but not very interesting. Let's start to mix it up with multiple server blocks.

Multiple virtual servers

You can add more than one virtual server in NGINX:

# All requests to foo.com return a 200 OK status code
server {
    listen 80;
    server_name foo.com;
    return 200 'Welcome to foo.com!';
}

# Any other requests get a 404 Not Found page
server {
    listen 80 default_server;
    return 404;
}

NGINX uses the server_name directive to check the Host header of incoming requests and match the request to a virtual server. Your web browser will usually set this header automatically for you. You can set up a particular virtual server to be the default choice (default_server) if no other ones match the incoming request. You can use this feature to host multiple Django apps on a single server. All you need to do is set up your DNS to get multiple domain names to point to a single server, and then add a virtual server for each Django app.

Let's test out the config above. If we send a request to localhost, we'll get a 404 status code from the default server:

curl localhost
# <html>
#   <head><title>404 Not Found</title></head>
#   ...
# </html>

This is the request that gets sent:

GET / HTTP/1.1
Host: localhost
User-Agent: curl/7.58.0

Our request was matched to the default server because the Host header we sent didn't match foo.com. Let's try setting the Host header to foo.com:

curl localhost --header "Host: foo.com"
# Welcome to foo.com!

This is the request that gets sent:

GET / HTTP/1.1
Host: foo.com
User-Agent: curl/7.58.0

Now we are directed to the foo.com virtual server because we sent the correct Host header in our request. Finally, we can see that setting a random Host header sends us to the default server:

curl localhost --header "Host: fasfsadfs.com"
# <html>
#   <head><title>404 Not Found</title></head>
#   ...
# </html>
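The matching behaviour in these curl examples can be modelled in a few lines of Python. This is only a rough sketch: pick_server is a hypothetical helper, and it handles exact names only, while the real NGINX matching also supports things like wildcard and regex server names.

```python
def pick_server(host_header, server_names, default="default_server"):
    """Pick a virtual server by exact, case-insensitive Host match."""
    # Ignore any ":port" suffix, e.g. "foo.com:80" becomes "foo.com".
    host = host_header.split(":", 1)[0].lower()
    return host if host in server_names else default


server_names = {"foo.com"}
for host in ["localhost", "foo.com", "fasfsadfs.com"]:
    print(host, "->", pick_server(host, server_names))
# localhost -> default_server
# foo.com -> foo.com
# fasfsadfs.com -> default_server
```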

There's more that you can do with virtual servers in NGINX, but what we've covered so far should be enough for you to understand their typical usage with Django.

Location blocks

Within a virtual server you can route the request based on the path.

server {
    listen 80;
    # Requests to any other path get a 200 OK response
    location / {
        return 200 'Cool!';
    }
    # Requests to /forbidden get 403 Forbidden response
    location /forbidden {
        return 403;
    }
}

Under this configuration, any requested path that starts with /forbidden will return a 403 Forbidden status code, and everything else will return Cool! Let's try it out:

curl localhost
# Cool!
curl localhost/blah/blah/blah
# Cool!
curl localhost/forbidden
# <html>
# <head><title>403 Forbidden</title></head>
# ...
# </html>

curl localhost/forbidden/blah/blah/blah
# <html>
# <head><title>403 Forbidden</title></head>
# ...
# </html>
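The results above come from prefix matching: when several plain prefix locations match a path, the longest one wins. Here is a minimal Python sketch of that rule. The pick_location function is a hypothetical name, and it deliberately ignores the exact-match and regex location modifiers that NGINX also supports.

```python
def pick_location(path, prefixes):
    """Return the longest prefix in `prefixes` that `path` starts with."""
    matches = [prefix for prefix in prefixes if path.startswith(prefix)]
    return max(matches, key=len) if matches else None


prefixes = ["/", "/forbidden"]
for path in ["/", "/blah/blah/blah", "/forbidden", "/forbidden/blah/blah/blah"]:
    print(path, "->", pick_location(path, prefixes))
# / -> /
# /blah/blah/blah -> /
# /forbidden -> /forbidden
# /forbidden/blah/blah/blah -> /forbidden
```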

Now that we've covered server and location blocks it should be easier to make sense of some of the config that I showed you at the start of this post:

server {
    listen 80;
    server_name foo.com;
    location / {
        # Do something...
    }
    location /static/ {
        # Do something...
    }
}

Next we'll dig into the connection between NGINX and our WSGI server.

Reverse proxy location

As mentioned earlier, NGINX acts as a reverse proxy for Django.

This reverse proxy setup is configured within this location block:

location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_redirect http://127.0.0.1:8000 http://foo.com;
}

In the next few sections I will break down the directives in this block so that you understand what is going on. You might also find the NGINX documentation on reverse proxies helpful for understanding this config.

Proxy pass

The proxy_pass directive tells NGINX to send all requests for that location to the specified address. For example, if your WSGI server was running on localhost (which has IP 127.0.0.1), port 8000, then you would use this config:

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}

You can also point proxy_pass at a Unix domain socket, with Gunicorn listening on that socket, which is very similar to using localhost except it doesn't use up a port number and it's a bit faster:

server {
    listen 80;
    location / {
        proxy_pass http://unix:/home/user/my-socket-file.sock;
    }
}

Seems simple enough - you just point NGINX at your WSGI server, so... what was all that other crap? Why do you set proxy_set_header and proxy_redirect? That's what we'll discuss next.

NGINX is lying to you

As a reverse proxy, NGINX will receive HTTP requests from clients and then send those requests to our Gunicorn WSGI server. The problem is that NGINX hides information from our WSGI server. The HTTP request that Gunicorn receives is not the same as the one that NGINX received from the client.

Let me give you an example, which is illustrated above. You, the client, have an IP of 12.34.56.78 and you go to https://foo.com in your web browser and try to load the page. The request hits the server on port 443 and is read by NGINX. At this stage, NGINX knows that:

the protocol is HTTPS
the client has an IP address of 12.34.56.78
the request is for the host foo.com

NGINX then sends the request onwards to Gunicorn. When Gunicorn receives this request, it thinks:

the protocol is HTTP, not HTTPS, because the connection between NGINX and Gunicorn is not encrypted
the client has the IP address 127.0.0.1, because that's the address NGINX is using
the host is 127.0.0.1:8000 because NGINX said so

Some of this lost information is useful, and we want to force NGINX to send it to our WSGI server. That's what these lines are for:

proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

Next, I will explain each line in more detail.

Setting the Host header

Django would like to know the value of the Host header so that various bits of the framework, like ALLOWED_HOSTS or HttpRequest.get_host, can work. The problem is that NGINX does not pass the client's original Host header to proxied servers by default. Instead, it sets the Host header to the address used in proxy_pass.

For example, when I'm using proxy_pass like I did in the previous section, and I send a request with the Host header to NGINX like this:

curl localhost --header "Host: foo.com"

Then NGINX receives the HTTP request, which looks like this:

GET / HTTP/1.1
Host: foo.com
User-Agent: curl/7.58.0

and then NGINX sends an HTTP request to your WSGI server, like this:

GET / HTTP/1.0
Host: 127.0.0.1:8000
User-Agent: curl/7.58.0

Notice something? That rat-fuck-excuse-for-a-webserver sent different headers to our WSGI server! I'm sure there is a good reason for this behaviour, but it's not what we want because it breaks some Django functionality. We can fix this by using the proxy_set_header as follows:

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8000;
        # Ensure original Host header is forwarded to our Django app.
        proxy_set_header Host $host;
    }
}

Now NGINX will send the desired headers to Django:

GET / HTTP/1.0
Host: foo.com
User-Agent: curl/7.58.0

Gunicorn will read this Host header and provide it to you in your Django views via the request.META object:

# views.py
from django.http import HttpResponse

def my_view(request):
    host = request.META['HTTP_HOST']
    print(host)  # Eg. "foo.com"
    return HttpResponse(f"Got host {host}")

Setting the X-Forwarded-Whatever headers

The Host header isn't the only useful information that NGINX does not pass to Gunicorn. We would also like the protocol and source IP address of the client request to be passed to our WSGI server. We achieve this with these two lines:

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

I just want to point out that these header names are completely arbitrary. You can send any header you want with the format X-Insert-Words-Here to Gunicorn and it will parse it and send it onwards to Django. For example, you could set the header to be X-Matt-Is-Cool as follows:

proxy_set_header X-Matt-Is-Cool 'it is true';

Now NGINX will include this header with every request it sends to Gunicorn. When Gunicorn parses the HTTP request it reads the headers into a Python dictionary, which ends up in the HttpRequest object that Django passes to your view. Each header name is upper-cased, its hyphens are replaced with underscores and it is given an HTTP_ prefix, so in this case X-Matt-Is-Cool gets turned into the key HTTP_X_MATT_IS_COOL in request.META. For example:

# views.py
from django.http import HttpResponse

def my_view(request):
    # Prints value of X-Matt-Is-Cool header included by NGINX
    print(request.META["HTTP_X_MATT_IS_COOL"])  # it is true
    return HttpResponse("Hello World")
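The renaming from header name to request.META key follows a simple rule, which can be sketched in plain Python. The meta_key function is a hypothetical helper written for this post, not something Django exposes; it just mirrors the conversion described in the Django docs for HttpRequest.META.

```python
def meta_key(header_name):
    """Convert an HTTP header name to the key used in request.META."""
    key = header_name.upper().replace("-", "_")
    # Per the Django docs, these two headers are stored without the prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


print(meta_key("X-Matt-Is-Cool"))     # HTTP_X_MATT_IS_COOL
print(meta_key("X-Forwarded-Proto"))  # HTTP_X_FORWARDED_PROTO
print(meta_key("Host"))               # HTTP_HOST
```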

This means you can add in whatever custom headers you like to your NGINX config, but for now let's focus on getting the protocol and client IP address to your Django app.

Setting the X-Forwarded-Proto header

Django sometimes needs to know whether the incoming request is secure (HTTPS) or not (HTTP). For example, some features of the SecurityMiddleware class check for HTTPS. The problem is, of course, that NGINX is always telling Django that the client's request to the server is not secure, even when it is. This problem always crops up for me when I'm implementing pagination, and the "next" URL has http:// instead of https:// like it should.

Our fix for this is to put the client request protocol into a header called X-Forwarded-Proto:

proxy_set_header X-Forwarded-Proto $scheme;

Then you need to set up the SECURE_PROXY_SSL_HEADER setting to read this header in your settings.py file:

SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

Now Django can tell the difference between incoming HTTP requests and HTTPS requests.
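To show roughly what this setting changes, here is a simplified Python model of the decision. It is my own sketch, not Django's actual code: detect_scheme is a hypothetical function, and the meta dictionary stands in for request.META.

```python
def detect_scheme(meta, proxy_ssl_header=None):
    """Simplified model of choosing between "http" and "https"."""
    if proxy_ssl_header is not None:
        key, secure_value = proxy_ssl_header
        value = meta.get(key)
        if value is not None:
            # Trust the header that the proxy set on the request.
            return "https" if value.strip() == secure_value else "http"
    # Otherwise fall back to the scheme of the NGINX to Gunicorn connection.
    return meta.get("wsgi.url_scheme", "http")


meta = {"wsgi.url_scheme": "http", "HTTP_X_FORWARDED_PROTO": "https"}
print(detect_scheme(meta))                                       # http
print(detect_scheme(meta, ("HTTP_X_FORWARDED_PROTO", "https")))  # https
```

Without the setting, only the unencrypted hop between NGINX and Gunicorn is visible, which is why the pagination links come out as http://.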

Setting the X-Forwarded-For header

Now let's talk about determining the client's IP address. As mentioned before, NGINX will always lie to you and say that the client IP address is 127.0.0.1. If you don't care about client IP addresses, then you don't care about this header. You don't need to set it if you don't want to. Knowing the client IP might be useful sometimes. For example, if you want to guess at where they are located, or if you are building one of those What's My IP? websites.

You can set the X-Forwarded-For header to tell Gunicorn the original IP address of the client:

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

As described earlier, the header X-Forwarded-For gets turned into the key HTTP_X_FORWARDED_FOR in your request object. For example:

# views.py
from django.http import HttpResponse

def my_view(request):
    # Prints client IP address: "12.34.56.78"
    print(request.META["HTTP_X_FORWARDED_FOR"])
    # Prints NGINX IP address: "127.0.0.1", ie. localhost
    print(request.META["REMOTE_ADDR"])
    return HttpResponse("Hello World")

Does this seem kind of underwhelming? Maybe a little pointless? As I said before, if you don't care about client IP addresses, then this header isn't for you.
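One wrinkle if you do care: the $proxy_add_x_forwarded_for variable appends the address NGINX saw to any X-Forwarded-For header that the client already sent, so the value can be a comma-separated list. Here is a minimal sketch of reading it, assuming NGINX is the only proxy in front of Django. The client_ip helper is a hypothetical name, and meta stands in for request.META.

```python
def client_ip(meta):
    """Return the address appended by our own NGINX, if there is one."""
    forwarded = meta.get("HTTP_X_FORWARDED_FOR", "")
    addresses = [part.strip() for part in forwarded.split(",") if part.strip()]
    if addresses:
        # The last entry is the one NGINX added; earlier entries were
        # supplied by the client and could be made up.
        return addresses[-1]
    return meta.get("REMOTE_ADDR")


print(client_ip({"HTTP_X_FORWARDED_FOR": "12.34.56.78", "REMOTE_ADDR": "127.0.0.1"}))
# 12.34.56.78
print(client_ip({"HTTP_X_FORWARDED_FOR": "1.2.3.4, 12.34.56.78", "REMOTE_ADDR": "127.0.0.1"}))
# 12.34.56.78
```

If there are more proxies in the chain, such as a load balancer in front of NGINX, picking the right entry needs more care than this.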

Proxy redirect

Let's cover the final line of the Django reverse proxy config: proxy_redirect. The NGINX docs for this directive are here.

proxy_redirect http://127.0.0.1:8000 http://foo.com;

This directive is used when handling redirects that are issued by Django. For example, you might have a webpage that used to live at path old/page/, but you moved it to new/page/. You want to send any user that asked for old/page/ to new/page/. To achieve this you could write a Django view like this:

# views.py
from django.http import HttpResponseRedirect

def redirect_view(request):
    return HttpResponseRedirect("/new/page/")

When a user asks for old/page/, this view will send them an HTTP response with a 302 redirect status code:

HTTP/1.1 302 Found
Location: /new/page/

Your web browser will follow the Location response header to the new page. A problem occurs when your Django app includes the WSGI server's address and port in the Location header:

HTTP/1.1 302 Found
Location: http://127.0.0.1:8000/new/page/

This is a problem because the client's browser will try to go to that address, and it will fail because the WSGI server is not on the same server as the client.

Here's the thing: I have never actually seen this happen, and I'm having trouble thinking of a common scenario where this would happen. Send me an email if you know where this issue crops up. Anyway, using proxy_redirect helps in the hypothetical case where Django does include the WSGI address in a redirect's Location header.

The directive rewrites the header using the syntax:

proxy_redirect redirect replacement;

So, for example, if there was a redirect response like this:

HTTP/1.1 302 Found
Location: http://127.0.0.1:8000/new/page/

and you set up your proxy_redirect like this

proxy_redirect http://127.0.0.1:8000/ https://foo.com/blog/;

then the outgoing response would be re-written to this:

HTTP/1.1 302 Found
Location: https://foo.com/blog/new/page/

I guess this directive might be useful in some situations? I'm not really sure.
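For what it's worth, the rewrite itself is just a prefix swap, which can be sketched in Python like this. The rewrite_location function is a hypothetical helper, and it only models the simple "redirect replacement" form of the directive, not its default, off or regex forms.

```python
def rewrite_location(location, redirect, replacement):
    """Swap the `redirect` prefix of a Location header for `replacement`."""
    if location.startswith(redirect):
        return replacement + location[len(redirect):]
    # Anything that doesn't start with the prefix is left untouched.
    return location


print(rewrite_location(
    "http://127.0.0.1:8000/new/page/",
    "http://127.0.0.1:8000/",
    "https://foo.com/blog/",
))
# https://foo.com/blog/new/page/
```

Both strings end with a slash here, so the pieces join up without a doubled slash in the result.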

Static block

Earlier I mentioned that NGINX can serve static files directly from the filesystem.

This is a good idea because NGINX is much more efficient at doing this than your WSGI server will be. It means that your server will be able to respond faster to static file requests and handle more traffic. You can use this technique to put all of your Django app's static files into a folder like this:

/home/myuser/myproject 
└─ static               Your static files
    ├─ styles.css       CSS file
    ├─ main.js          JavaScript file
    └─ cat.png          A picture of a cat

Then you can set the /static/ location to serve files directly from this folder:

location /static/ {
    root /home/myuser/myproject;
}

Now a request to http://localhost/static/cat.png will cause NGINX to read from /home/myuser/myproject/static/cat.png, without sending a request to the WSGI server.
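The way the root directive builds that file path is worth spelling out: the full request path, including /static/, is appended to the root directory. A tiny Python sketch of the mapping, where resolve_with_root is a hypothetical helper:

```python
def resolve_with_root(root, uri):
    """With `root`, the whole request path is appended to the directory."""
    return root.rstrip("/") + uri


print(resolve_with_root("/home/myuser/myproject", "/static/cat.png"))
# /home/myuser/myproject/static/cat.png
```

This is why root points at /home/myuser/myproject rather than at the static folder itself.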

Next steps

Now you know what every line of your Django app's NGINX config is doing. Hopefully you will be able to use this knowledge to debug issues faster and customise your existing setup. If you have specific questions that weren't covered by this post, I recommend looking at the official NGINX documentation here.

If you liked this post then you might also like reading some other stuff I've written:

A simple guide to deploying a Django app
An overview of Django server setups
How to manage logs with Django, Gunicorn and NGINX
A mini rant on Django performance: Is Django too slow?
A little series on Postgres database backups 1, 2, 3

If you found some of the stuff about HTTP in this post confusing, I heartily recommend checking out Brian Will's "The Internet" videos to learn more about what HTTP, TCP, and ports are: part 1, part 2, part 3, part 4.


How to automate your Postgres database backups

2020-06-05T12:00:00+10:00

If you've got a web app running in production, then you'll want to take regular database backups, or else you risk losing all your data. Taking these backups manually is fine, but it's easy to forget to do it. It's better to remove the chance of human error and automate the whole process. To automate your backup and restore you will need three things:

A safe place to store your backup files
A script that creates the backups and uploads them to the safe place
A method to automatically run the backup script every day

A safe place for your database backup files

You don't want to store your backup files on the same server as your database. If your database server gets deleted, then you'll lose your backups as well. Instead, you should store your backups somewhere else, like a hard drive, your PC, or in the cloud.

I like using cloud object storage for this kind of use-case. If you haven't heard of "object storage" before: it's just a kind of cloud service where you can store a bunch of files. All major cloud providers offer this service:

Amazon's AWS has the Simple Storage Service (S3)
Microsoft's Azure has Storage
Google Cloud also has Storage
DigitalOcean has Spaces

These object storage services are very cheap at around 2c/GB/month, you'll never run out of disk space, they're easy to access from command line tools and they have very fast upload/download speeds, especially to/from other services hosted with the same cloud provider. I use these services a lot: this blog is being served from AWS S3.

I like using S3 simply because I'm quite familiar with it, so that's what we're going to use for the rest of this post. If you're not already familiar with using the AWS command-line, then check out this post I wrote about getting started with AWS S3 before you continue.

Creating a database backup script

In my previous post on database backups I showed you a small script to automatically take a backup using PostgreSQL:

#!/bin/bash
# Backs up mydatabase to a file.
TIME=$(date "+%s")
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
echo "Backing up $PGDATABASE to $BACKUP_FILE"
pg_dump --format=custom > "$BACKUP_FILE"
echo "Backup completed for $PGDATABASE"

I'm going to assume you have set up your Postgres database environment variables (PGHOST, etc) either in the script, or elsewhere, as mentioned in the previous post. Next we're going to get our script to upload all backups to AWS S3.

Uploading backups to AWS Simple Storage Service (S3)

We will be uploading our backups to S3 with the aws command line (CLI) tool. To get this tool to work, we need to set up our AWS credentials on the server by either using aws configure or by setting the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Once that's done we can use aws s3 cp to upload our backup files. Let's say we're using a bucket called "mydatabase-backups":

#!/bin/bash
# Backs up mydatabase to a file and then uploads it to AWS S3.
# Stop at the first failing command, so a failed dump is never uploaded.
set -e

# First, dump database backup to a file
TIME=$(date "+%s")
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
echo "Backing up $PGDATABASE to $BACKUP_FILE"
pg_dump --format=custom > "$BACKUP_FILE"

# Second, copy file to AWS S3
S3_BUCKET=s3://mydatabase-backups
S3_TARGET="$S3_BUCKET/$BACKUP_FILE"
echo "Copying $BACKUP_FILE to $S3_TARGET"
aws s3 cp "$BACKUP_FILE" "$S3_TARGET"

echo "Backup completed for $PGDATABASE"

You should be able to run this multiple times and see a new backup appear in your S3 bucket's webpage every time you do it. As a bonus, you can add a little one liner at the end of your script that checks for the last uploaded file to the S3 bucket:

BACKUP_RESULT=$(aws s3 ls $S3_BUCKET | tail -n 1)
echo "Latest S3 backup: $BACKUP_RESULT"
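If the bash string-building is hard to follow, the naming scheme used by the script can be sketched in Python. The backup_names function is a hypothetical helper for illustration only; it just builds the names and does not run pg_dump or talk to S3.

```python
import time


def backup_names(database, bucket, now=None):
    """Build the backup file name and its S3 target from a Unix timestamp."""
    timestamp = int(time.time() if now is None else now)
    backup_file = f"postgres_{database}_{timestamp}.pgdump"
    return backup_file, f"{bucket}/{backup_file}"


backup_file, s3_target = backup_names("mydatabase", "s3://mydatabase-backups", now=1557142655)
print(backup_file)  # postgres_mydatabase_1557142655.pgdump
print(s3_target)    # s3://mydatabase-backups/postgres_mydatabase_1557142655.pgdump
```

Putting the timestamp in the file name is what later lets us work out which backup is the newest.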

Once you're confident that your backup script works, we can move on to getting it to run every day.

Running cron jobs

Now we need to get our server to run this script every day, even when we're not around. The simplest way to do this on a Linux server is with cron. Cron can automatically run scripts for us on a schedule. We'll be using the crontab tool to set up our backup job.

You can read more about how to use crontab here. If you find that you're having issues setting up cron, you might also find this StackOverflow post useful.

Before we set up our daily database backup job, I suggest trying out a test script to make sure that your cron setup is working. For example, this script prints the current time when it is run:

#!/bin/bash
echo $(date)

Using nano, you can create a new file called ~/test.sh, save it, then make it executable as follows:

nano ~/test.sh
# Write out the time printing script in nano, save the file.
chmod +x ~/test.sh

Then you can test it out a little by running it a couple of times to check that it is printing the time:

~/test.sh
# Sat Jun  6 08:05:14 UTC 2020
~/test.sh
# Sat Jun  6 08:05:14 UTC 2020
~/test.sh
# Sat Jun  6 08:05:14 UTC 2020

Once you're confident that your test script works, you can create a cron job to run it every minute. Cron uses a special syntax to specify how often a job runs. These "cron expressions" are a pain to write by hand, so I use this tool to generate them. The cron expression for "every minute" is the inscrutable string "* * * * *". This is the crontab entry that we're going to use:

# Test crontab entry
SHELL=/bin/bash
* * * * * ~/test.sh &>> ~/time.log

The SHELL setting tells crontab to use bash to execute our command
The "* * * * *" entry tells cron to execute our command every minute
The command ~/test.sh &>> ~/time.log runs our test script ~/test.sh and then appends all output to a log file called ~/time.log

Enter the text above into your user's crontab file using the crontab editor:

crontab -e

Once you've saved your entry, you should then be able to view your crontab entry using the list command:

crontab -l
# SHELL=/bin/bash
# * * * * * ~/test.sh &>> ~/time.log

You can check that cron is actually trying to run your script by watching the system log:

tail -f /var/log/syslog | grep CRON
# Jun  6 11:17:01 swarm CRON[6908]: (root) CMD (~/test.sh &>> ~/time.log)
# Jun  6 11:17:01 swarm CRON[6908]: (root) CMD (~/test.sh &>> ~/time.log)

You can also watch your logfile to see that time is being written every minute:

tail -f ~/time.log
# Sat Jun 6 11:34:01 UTC 2020
# Sat Jun 6 11:35:01 UTC 2020
# Sat Jun 6 11:36:01 UTC 2020
# Sat Jun 6 11:37:01 UTC 2020

Once you're happy that you can run a test script every minute with cron, we can move on to running your database backup script daily.

Running our backup script daily

Now we're nearly ready to run our backup script using a cron job. There are a few changes that we'll need to make to our existing setup. First we need to write our database backup script to ~/backup.sh and make sure it is executable:

chmod +x ~/backup.sh

Then we need to change our crontab entry to run every day, which will be "0 0 * * *", and update our cron command to run our backup script. Our new crontab entry should be:

# Database backup crontab entry
SHELL=/bin/bash
0 0 * * * ~/backup.sh &>> ~/backup.log

Update your crontab with crontab -e. Now we wait! This script should run every night at midnight (server time) to take your database backups and upload them to AWS S3. If this isn't working, then change your cron expression so that it runs the script every minute, and use the steps I showed above to try and debug the problem.
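If you're wondering what the five parts of a cron expression mean, here is a small Python sketch that labels them. The describe_cron function is a hypothetical helper that only names the fields; it does not validate the values or work out when the job will next run.

```python
FIELD_NAMES = ["minute", "hour", "day_of_month", "month", "day_of_week"]


def describe_cron(expression):
    """Label the five whitespace-separated fields of a cron expression."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


print(describe_cron("0 0 * * *"))
# {'minute': '0', 'hour': '0', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

So "0 0 * * *" means minute 0 of hour 0 on every day, while "* * * * *" matches every minute.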

Hopefully it all runs OK and you will have plenty of daily database backups to roll back to if anything ever goes wrong.

Automatic restore from the latest backup

When disaster strikes and you need your backups, you could manually view your S3 bucket, download the backup file, upload it to the server and manually run the restore, which I documented in my previous post. This is totally fine, but as a bonus I thought it would be nice to include a script that automatically downloads the latest backup file and uses it to restore your database. This kind of script would be ideal for dumping production data into a test server. First I'll show you the script, then I'll explain how it works:

#!/bin/bash
echo -e "\nRestoring database $PGDATABASE from S3 backups"

# Find the latest backup file
S3_BUCKET=s3://mydatabase-backups
LATEST_FILE=$(aws s3 ls $S3_BUCKET | awk '{print $4}' | sort | tail -n 1)
echo -e "\nFound file $LATEST_FILE in bucket $S3_BUCKET"

# Restore from the latest backup file
echo -e "\nRestoring $PGDATABASE from $LATEST_FILE"
S3_TARGET=$S3_BUCKET/$LATEST_FILE
aws s3 cp $S3_TARGET - | pg_restore --dbname $PGDATABASE --clean --no-owner
echo -e "\nRestore completed"

I've assumed that all the Postgres environment variables (PGHOST, etc) are already set elsewhere.

There are three tasks that are done in this script:

finding the latest backup file in S3
downloading the backup file
restoring from the backup file

So the first part of this script is finding the latest database backup file. The way we know which file is the latest is because of the Unix timestamp which we added to the filename. The first command we use is aws s3 ls, which shows us all the files in our backup bucket:

aws s3 ls $S3_BUCKET
# 2019-04-04 10:04:58     112309 postgres_mydatabase_1554372295.pgdump
# 2019-04-06 07:48:53     112622 postgres_mydatabase_1554536929.pgdump
# 2019-04-14 07:24:02     113484 postgres_mydatabase_1555226638.pgdump
# 2019-05-06 11:37:39     115805 postgres_mydatabase_1557142655.pgdump

We then use awk to isolate the filename. awk is a text processing tool which I use occasionally, along with cut and sed to mangle streams of text into the shape I want. I hate them all, but they can be useful.

aws s3 ls $S3_BUCKET | awk '{print $4}'
# postgres_mydatabase_1554372295.pgdump
# postgres_mydatabase_1554536929.pgdump
# postgres_mydatabase_1555226638.pgdump
# postgres_mydatabase_1557142655.pgdump

We then run sort over this output to ensure that each line is sorted by the time. The aws CLI tool seems to sort this data by the uploaded time, but we want to use our timestamp, just in case a file was manually uploaded out-of-order:

aws s3 ls $S3_BUCKET | awk '{print $4}' | sort
# postgres_mydatabase_1554372295.pgdump
# postgres_mydatabase_1554536929.pgdump
# postgres_mydatabase_1555226638.pgdump
# postgres_mydatabase_1557142655.pgdump

We use tail to grab the last line of the output:

aws s3 ls $S3_BUCKET | awk '{print $4}' | sort | tail -n 1
# postgres_mydatabase_1557142655.pgdump

And there's our filename! We use the $() command-substitution thingy to capture the command output and store it in a variable:

LATEST_FILE=$(aws s3 ls $S3_BUCKET | awk '{print $4}' | sort | tail -n 1)
echo $LATEST_FILE
# postgres_mydatabase_1557142655.pgdump
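For comparison, the same awk, sort and tail pipeline can be sketched in Python. The latest_backup function is a hypothetical helper, and it assumes the listing text has the four-column shape shown earlier. Like sort, it compares names as plain text, which works here because all of the timestamps have the same number of digits.

```python
def latest_backup(listing):
    """Return the file name that sorts last in `aws s3 ls` style output."""
    names = []
    for line in listing.splitlines():
        columns = line.split()
        if len(columns) >= 4:
            # The fourth column is the file name, like awk's $4.
            names.append(columns[3])
    return max(names) if names else None


# Sample listing with the rows deliberately out of order.
listing = """\
2019-05-06 11:37:39     115805 postgres_mydatabase_1557142655.pgdump
2019-04-04 10:04:58     112309 postgres_mydatabase_1554372295.pgdump
2019-04-14 07:24:02     113484 postgres_mydatabase_1555226638.pgdump
"""
print(latest_backup(listing))  # postgres_mydatabase_1557142655.pgdump
```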

And that's part one of our script done: find the latest backup file. Now we need to download that file and use it to restore our database. We use the aws CLI to copy the backup file from S3 and stream the bytes into stdout. This literally prints out your whole backup file into the terminal:

S3_TARGET=$S3_BUCKET/$LATEST_FILE
aws s3 cp $S3_TARGET -
# xtshirt9.5.199.5.19k0ENCODINENCODING
# SET client_encoding = 'UTF8';
# false00
# ... etc ...

The - symbol is commonly used in shell scripting to mean "write to stdout". This isn't very useful on its own, but we can send that data to the pg_restore command via a pipe:

S3_TARGET=$S3_BUCKET/$LATEST_FILE
aws s3 cp $S3_TARGET - | pg_restore --dbname $PGDATABASE --clean --no-owner

And that's the whole script!

Next steps

Now you can set up automated backups for your Postgres database. Hopefully having these daily backups will take a weight off your mind. Don't forget to do a test restore every now and then, because backups are worthless if you aren't confident that they actually work.

If you want to learn more about the Unix shell tools I used in this post, then I recommend having a go at the Over the Wire Wargames, which teaches you about bash scripting and hacking at the same time.

An introduction to cloud file storage

2020-06-05T11:00:00+10:00

Sometimes when you're running a web app you will find that you have a lot of files on your server. All these files will start to feel like a burden. You might worry about losing them all if the server fails, or you might be concerned about running out of disk space. You might even have multiple servers that all need to access these files.

Wouldn't it be nice if solving all these issues were someone else's problem? You would pay a few cents a month so that you never need to think about this again, right? I like using cloud object storage for hosting most of my web app's files and backups. If you haven't heard of "object storage" before: it's just a kind of cloud service where you can store a bunch of files. All major cloud providers offer this service:

Amazon's AWS has the Simple Storage Service (S3)
Microsoft's Azure has Storage
Google Cloud also has Storage
DigitalOcean has Spaces

I like using S3 simply because I'm quite familiar with it, so that's what we're going to use for the rest of this post. The other services are probably great as well. This video will take you through how to get started with AWS S3.

As an update to this video: AWS also ships a self-contained CLI tool that doesn't need to be installed in a virtual environment, which you can read about here. Eg:

URL="https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"
curl $URL -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version

One great use-case for object storage like AWS S3 is hosting your database backups.

How to backup and restore a Postgres database

2020-06-04T12:00:00+10:00

You've deployed your Django web app to the internet. Grats! Now you have a fun new problem: your app's database is full of precious "live" data, and if you lose that data, it's gone forever. If your database gets blown away or corrupted, then you will need backups to restore your data. This post will go over how to backup and restore PostgreSQL, which is the database most commonly deployed with Django.

Not everyone needs backups. If your Django app is just a hobby project then losing all your data might not be such a big deal. That said, if your app is a critical part of a business, then losing your app's data could literally mean the end of the business - people losing their jobs and going bankrupt. So, at least some of the time, you don't want to lose all your data.

The good news is that backing up and restoring Postgres is pretty easy: you only need two commands, pg_dump and pg_restore. If you're using MySQL instead of Postgres, then you can do something very similar to the instructions in this post using mysqldump.

Taking database backups

I'm going to assume that you've already got a Postgres database running somewhere. You'll need to run the following code from a bash shell on a Linux machine that can access the database. In this example, let's say you're logged into the database server with ssh.

The first thing to do is set some Postgres-specific environment variables to specify your target database and login credentials. This is mostly for our convenience later on.

# The server Postgres is running on
export PGHOST=localhost
# The port Postgres is listening on
export PGPORT=5432
# The database you want to back up
export PGDATABASE=mydatabase
# The database user you are logging in as
export PGUSER=myusername
# The database user's password
export PGPASSWORD=mypassw0rd

You can test these environment variables by running a psql command to list all the tables in your app's database.

psql -c "\dt"

# Output:
# List of relations
# Schema | Name          | Type  | Owner
#--------+---------------+-------+--------
# public | auth_group    | table | myusername
# public | auth_group... | table | myusername
# public | auth_permi... | table | myusername
# public | django_adm... | table | myusername
# .. etc ..

If psql is missing you can install it on Ubuntu or Debian using apt:

sudo apt install postgresql-client
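
If psql is installed but still can't connect, a missing or un-exported variable is a likely culprit. As a rough sketch, a small helper like this reports which of the variables above haven't made it into the environment. The name require_pg_env is made up for illustration:

```shell
#!/bin/bash
# Sketch: report any of the Postgres environment variables that are missing.
# require_pg_env is a hypothetical helper name.
require_pg_env() {
    local missing=0 name
    for name in PGHOST PGPORT PGDATABASE PGUSER PGPASSWORD; do
        # printenv only sees exported variables, which are the ones psql gets
        if [ -z "$(printenv "$name")" ]; then
            echo "Missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: only run psql if everything is set
# require_pg_env && psql -c "\dt"
```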

Now we're ready to create a database dump with pg_dump. It's pretty simple to use because we set up those environment variables earlier. When you run pg_dump, it just spits out a bunch of SQL statements as hundreds, or even thousands of lines of text. You can take a look at the output using head to view the first 10 lines of text:

pg_dump | head

# Output:
# --
# -- PostgreSQL database dump
# --
# -- Dumped from database version 9.5.19
# -- Dumped by pg_dump version 9.5.19
# SET statement_timeout = 0;
# SET lock_timeout = 0;
# SET client_encoding = 'UTF8';

The SQL statements produced by pg_dump are instructions on how to re-create your database. You can turn this output into a backup by writing all this SQL text into a file:

pg_dump > mybackup.sql

That's it! You now have a database backup. You might have noticed that storing all your data as SQL statements is rather inefficient. You can compress this data by using the "custom" dump format:

pg_dump --format=custom > mybackup.pgdump

This "custom" format is ~3x smaller in terms of file size, but it's not as pretty for humans to read because it's now in some funky non-text binary format:

pg_dump --format=custom | head

# Output:
# xtshirt9.5.199.5.19k0ENCODINENCODING
# SET client_encoding = 'UTF8';
# false00
# ... etc ...

Finally, mybackup.pgdump is a crappy file name. It's not clear what is inside the file. Are we going to remember which database this is for? How do we know that this is the freshest copy? Let's add a timestamp plus a descriptive name to help us remember:

# Get Unix epoch timestamp
# Eg. 1591255548
TIME=$(date "+%s")
# Descriptive file name
# Eg. postgres_mydatabase_1591255548.pgdump
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
pg_dump --format=custom > $BACKUP_FILE
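
Because the timestamp is baked into the file name, you can recover it later with plain shell string handling. Here's a small sketch of that idea. Both function names are mine, they don't come from any tool:

```shell
#!/bin/bash
# Sketch: build a backup file name and pull the timestamp back out of it.
# Both function names are hypothetical helpers.

# Eg. backup_file_name mydatabase 1591255548
backup_file_name() {
    echo "postgres_${1}_${2}.pgdump"
}

# Eg. backup_timestamp postgres_mydatabase_1591255548.pgdump
backup_timestamp() {
    local name="${1%.pgdump}"   # strip the .pgdump extension
    echo "${name##*_}"          # keep everything after the last underscore
}

FILE=$(backup_file_name mydatabase 1591255548)
echo "$FILE"
# postgres_mydatabase_1591255548.pgdump
backup_timestamp "$FILE"
# 1591255548
```

On Linux, date -d @1591255548 will turn that number back into a human-readable date.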

Now you can run these commands every month, week, or day to get a snapshot of your data. If you wanted, you could write this whole thing into a bash script called backup.sh:

#!/bin/bash
# Backs up mydatabase to a file.
export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=mydatabase
export PGUSER=myusername
export PGPASSWORD=mypassw0rd
TIME=$(date "+%s")
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
echo "Backing up $PGDATABASE to $BACKUP_FILE"
pg_dump --format=custom > $BACKUP_FILE
echo "Backup completed"

You should avoid hardcoding passwords like I just did above. It's better to pass credentials in as a script argument or environment variable. The file /etc/environment is a nice place to store these kinds of credentials on a secure server.
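
Another option, which I'm adding here as a sketch rather than something the script above uses, is a Postgres password file. When PGPASSWORD isn't set, psql, pg_dump and pg_restore will look the password up in ~/.pgpass. Each line has the form hostname:port:database:username:password, and the file must be readable only by you or it gets ignored. The helper name write_pgpass is made up:

```shell
#!/bin/bash
# Sketch: store the database password in a .pgpass file instead of the script.
# write_pgpass is a hypothetical helper name; it overwrites the target file.
write_pgpass() {
    local file="${1:-$HOME/.pgpass}"
    # Format: hostname:port:database:username:password
    printf '%s\n' "${PGHOST}:${PGPORT}:${PGDATABASE}:${PGUSER}:${PGPASSWORD}" > "$file"
    # Postgres ignores the file unless it is private to its owner
    chmod 600 "$file"
}

# Usage: run once by hand with the PG* variables set,
# then drop the PGPASSWORD line from backup.sh
# write_pgpass
```

If your password contains a : or a \ then it needs a backslash in front of it in this file.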

Restoring your database from backups

It's pointless creating backups if you don't know how to use them to restore your data. There are three scenarios that I can think of where you want to run a restore:

You need to set up your database from scratch
You want to roll back your existing database to a previous time
You want to restore data in your dev environment

I'll go over these scenarios one at a time.

Restoring from scratch

Sometimes you can lose the database server and there is nothing left. Maybe you deleted it by accident, thinking it was a different server. Luckily you have your database backup file, and hopefully some automated configuration management to help you quickly set the server up again.

Once you've got the new server provisioned and PostgreSQL installed, you'll need to recreate the database and the user who owns it:

sudo -u postgres psql <<-EOF
    CREATE USER $PGUSER WITH PASSWORD '$PGPASSWORD';
    CREATE DATABASE $PGDATABASE WITH OWNER $PGUSER;
EOF

Then you can set up the same environment variables that we did earlier (PGHOST, etc.) and use pg_restore to restore your data. You'll probably see some errors and warnings, which is normal.

BACKUP_FILE=postgres_mydatabase_1591255548.pgdump
pg_restore --dbname $PGDATABASE $BACKUP_FILE

# Output:
# ... lots of errors ...
# pg_restore: WARNING:  no privileges were granted for "public"
# WARNING: errors ignored on restore: 1

I'm not 100% on what all these errors mean, but I believe they're mostly related to the restore script trying to modify Postgres objects that your user does not have permission to modify. If you're using a standard Django app this shouldn't be an issue. You can check that the restore actually worked by checking your tables with psql:

# Check the tables
psql -c "\dt"

# Output:
# List of relations
# Schema | Name          | Type  | Owner
#--------+---------------+-------+--------
# public | auth_group    | table | myusername
# public | auth_group... | table | myusername
# public | auth_permi... | table | myusername
# public | django_adm... | table | myusername
# .. etc ..

# Check the last migration
psql -c "SELECT * FROM django_migrations ORDER BY id DESC LIMIT 1"

# Output:
#  id |  app   | name      | applied
# ----+--------+-----------+---------------
#  20 | tshirt | 0003_a... | 2019-08-26...

There you go! Your database has been restored. Crisis averted.

Rolling back an existing database

If you want to roll your existing database back to a previous point in time, deleting all new data, then you will need to use the --clean flag, which drops your existing database tables before re-creating them (docs here):

BACKUP_FILE=postgres_mydatabase_1591255548.pgdump
pg_restore --clean --dbname $PGDATABASE $BACKUP_FILE
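
Since --clean throws away whatever is currently in those tables, it can be worth taking one last dump before you roll back. Here's a rough sketch, assuming the PG* variables are set as before. rollback_database is a name I made up, and --if-exists is a pg_restore flag that suppresses the "does not exist" errors you can otherwise get when --clean tries to drop something that isn't there:

```shell
#!/bin/bash
# Sketch: take a safety dump of the current data, then roll back.
# rollback_database is a hypothetical helper name.
rollback_database() {
    local backup_file="$1"
    local safety_file="pre_rollback_${PGDATABASE}_$(date "+%s").pgdump"
    echo "Saving current data to $safety_file"
    # Stop here if the safety dump fails, before anything gets dropped
    pg_dump --format=custom > "$safety_file" || return 1
    pg_restore --clean --if-exists --dbname "$PGDATABASE" "$backup_file"
}

# Usage:
# rollback_database postgres_mydatabase_1591255548.pgdump
```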

Restoring a dev environment

It's often beneficial to restore a testing or development database from a known backup. When you do this, you're not so worried about setting up the right user permissions. In this case you want to completely destroy and re-create the database to get a completely fresh start, and you want to use the --no-owner flag to ignore any database-user related stuff in the restore script:

sudo -u postgres psql -c "DROP DATABASE $PGDATABASE"
sudo -u postgres psql -c "CREATE DATABASE $PGDATABASE"
BACKUP_FILE=postgres_mydatabase_1591255548.pgdump
pg_restore --no-owner --dbname $PGDATABASE $BACKUP_FILE

I use this method quite often to pull non-sensitive data down from production environments to try and reproduce bugs that have occurred in prod. It's much easier to fix mysterious bugs when you have regular database backups, error reporting and centralized logging.

Next steps

I hope you now have the tools you need to back up and restore your Django app's Postgres database. If you want to read more, the Postgres docs have a good section on database backups.

Once you've got your head around database backups, you should automate the process to make it more reliable. I will show you how to do this in this follow-up post.

Cloudflare makes DNS slightly less painful

2020-04-18T12:00:00+10:00

When you're setting up a new website, there's a bunch of little tasks that you have to do that suck. They're important, but they don't give you the joy of creating something new, they're just... plumbing.

In particular I'm thinking of:

setting up your domain name with DNS records
encrypting your traffic with SSL
compressing and caching your static assets (CSS, JS) using a CDN

No one decided to learn web development because they were super stoked on DNS. The good news is that you can use Cloudflare (for free) to make all these plumbing tasks a little less painful.

In the rest of this post I'll go over the pros and cons of using Cloudflare, plus a short video guide on how to start using it.

What is Cloudflare

Cloudflare is a reverse proxy service that you put in-between your website visitors and your website's server. All requests that hit your website are routed through Cloudflare's servers first. This means that they can provide:

DNS record configuration: allowing you to set up A records, CNAMEs etc for your domain.
HTTP traffic encryption using SSL: All HTTP traffic between the end-user and Cloudflare's servers is encrypted with SSL (making it HTTPS).
Caching of static assets: Cloudflare will cache static assets like CSS and JS depending on the "Cache-Control" headers set by your origin server.
Compression of static assets: Cloudflare will compress static assets like CSS and JS so that your pages load and render faster.

This is a whooole lot of bullshit that I don't want to set up myself, if I can avoid it, so it's nice when Cloudflare handles it for me.

Cloudflare pros

In addition to the features I listed above, there are a few nice things I've found when using Cloudflare:

Free: It has a free plan which is sufficient for all the projects I've worked on so far
Easy to use: I think it's uncommonly easy to set up and use for tools in its field
CNAME flattening: They provide a handy DNS feature called "CNAME flattening", which means you can point your root domain name (eg. "mattsegal.dev") to other domain names (eg. an AWS S3 bucket website "mattsegal.dev.s3-blah.aws.com"). As far as I know only Cloudflare provides this feature.
Flexible SSL: Their "flexible SSL" feature is both a pro and a con. It works like this: traffic between your users and Cloudflare is encrypted, but traffic between Cloudflare and your servers is not encrypted. As long as you trust Cloudflare and the intermediate routers not to snoop on your packets, this is a nice setup. In this case setting up flexible SSL is as simple as toggling a button on the website. You can set up end-to-end encryption but that's a little more work. Let's Encrypt has made setting up SSL much easier and cheaper for developers, but it's still relatively complex compared to Cloudflare's "flexible" implementation.
Faster DNS updates?: I might be imagining things, but I find that updates to DNS records in Cloudflare seem to propagate faster than other services.
Analytics: They provide some basic analytics like unique visitors and download bandwidth, which is nice, I guess

Cloudflare cons

The main con I see for using Cloudflare is that you're not learning to use open source alternatives like self-hosted NGINX to do the same job. If you are an NGINX expert already then you're a big boy/girl and you can make your own decisions about what tools to use. If you're a newer developer and you've never set up a webserver like NGINX or Apache, then you're robbing yourself of useful infrastructure experience if you only ever use Cloudflare for everything.

That said, I think that newer developers should start deploying websites using services like Cloudflare, and then learn how to use tools like NGINX.

Another, more abstract downside, is that some double-digit percentage of the internet's websites use Cloudflare. If you're worried about centralization of control of the internet, then Cloudflare's growing consolidation of internet traffic is a concern. Personally I don't really care about that right now.

How to get started

This video shows you how to get set up with Cloudflare.

What now?

Once you've set up Cloudflare, you'll need to start creating some DNS records. I've written a guide on exactly this topic to help you get set up. I suggest you check it out so you can give your website a domain name.

DNS for beginners: how to give your site a domain name

2020-04-13T12:00:00+10:00

You are learning how to build a website and you want to give it a domain name like mycoolwebsite.com. It doesn't seem like a real website without a domain name, does it? How is anybody going to find your website without one? Setting up your domain is an important step for launching your website, but it's also a real pain if you're new to web development. I want to help make this job a little easier for you.

Typically you go to namecheap or GoDaddy or some other domain name vendor and you buy mycoolwebsite.com for 12 bucks a year - now you need to set it up. When you try to get started you are confronted by all these bizarre terms: "A record", "CNAME", "nameserver". It can be quite intimidating. The rest of this blog will show you the basics of how to set up your domain, with a few explanations sprinkled throughout.

Contents:

What the fuck is DNS?
I want my domain name to go to an IP address
I want my domain name to go to a different domain name
I want to give control of my domain name to another service

What the fuck is DNS?

I'll keep this short. I think CloudFlare explains it best:

The Domain Name System (DNS) is the phonebook of the Internet. Humans access information online through domain names, like nytimes.com or espn.com. Web browsers interact through Internet Protocol (IP) addresses. DNS translates domain names to IP addresses so browsers can load Internet resources.

DNS is a worldwide, online "phonebook" that translates human-friendly website names like "mattsegal.dev" into computer-friendly numbers like 192.168.1.1. You use the domain name system every day:

You type "mattsegal.dev" into your web browser and press "Enter"
Your computer will reach out into the domain name system and ask other computers to find out which IP address "mattsegal.dev" points to
Your computer eventually finds the correct IP address
Your web browser fetches a web page from that IP address

So, how do we get our website into this "phonebook"?

I want my domain name to go to an IP address

Sometimes you have an IP address like 11.22.33.44 and you want your domain name to send users to that IP. You want a mapping like this:

mycoolwebsite.com --> 11.22.33.44

You will need this when you are running software like WordPress, or your own custom web app. Your website is running on a server and that server has an IP address. For example, I have a website mattslinks.xyz which runs on a webserver which has a public IP of 167.99.78.141. My users (me, my girlfriend) don't want to type in 167.99.78.141 into our browsers to visit my site. We'd prefer to type in mattslinks.xyz, which is way easier to remember. So I need to set up a mapping using DNS:

mattslinks.xyz --> 167.99.78.141

So how do we set this up? We need an A record ("address record") to do this. An A record maps a domain name to an IP address. To set up an A record you need to go onto your domain name provider's website and enter the subdomain name you want plus the IP address that you want to point to.

What I've set up here is:

mattslinks.xyz --> 167.99.78.141
www.mattslinks.xyz --> 167.99.78.141

At this point you may yell "What the fuck is a subdomain!?" at your monitor. Please do, it's cathartic. The idea is that when you own mattslinks.xyz, you also own a near-infinite number of "child domains" which end in mattslinks.xyz. For example you can set up A records (and other DNS records) for all these domain names:

mattslinks.xyz ("root domain", sometimes written as "@")
www.mattslinks.xyz (a subdomain)
blog.mattslinks.xyz (a different subdomain)
cult.mattslinks.xyz
super.secret.clubhouse.mattslinks.xyz

Apparently you can do this up to 255 characters (including the dots) so this.is.a.very.long.domain.name.but.i.advise.against.doing.this.mattslinks.xyz is technically possible, but a stupid idea.

If you're serving a normal website, then it's pretty standard to add A records for both your root domain (mattslinks.xyz) and the "www" subdomain (www.mattslinks.xyz), because some people might put "www" in front of the domain name and we don't want them to miss our website.

Just in case this all seems a little too abstract and theoretical for you, here's a video of me setting some A records:

And then, 30 minutes later, checking if I've gone mad or not...

Finally, the record updates and I add a www subdomain

You might also be wondering about the TTL value. It's not that important, just set it to 3600. If you care to know, TTL stands for "time to live" and it represents how long other computers are allowed to cache your DNS record before they check for a fresh copy. So if it's 3600 (seconds), changes that you make to your DNS records can take up to an hour to show up on other people's computers.

So you have an A record set up, how do you check that it's working? The easiest way is to wait an hour or so and then use a 3rd party website like DNS checker. If you're a little more technical and have a bash shell handy you can also try using dig from your local machine.
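
If you go the dig route, a check might look something like this sketch. dig +short prints just the record values, one per line. The function name has_a_record is made up for illustration:

```shell
#!/bin/bash
# Sketch: check whether a domain's A record points at the IP you expect.
# has_a_record is a hypothetical helper name; it needs dig installed.
has_a_record() {
    local domain="$1" expected_ip="$2"
    # -x makes grep match the whole line, -F treats the IP as plain text
    dig +short "$domain" A | grep -qxF "$expected_ip"
}

# Usage:
# has_a_record mattslinks.xyz 167.99.78.141 && echo "A record is live"
```

Your local machine may still be holding on to an old cached answer, so a failure here right after you make a change doesn't necessarily mean you got the record wrong.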

I want my domain name to go to a different domain name

Sometimes your DNS needs are a little more complicated than just mapping a domain name to an IP address. Sometimes you want to do this instead:

prettyname.com --> ugly-name-for-pretty-site.ap-southeast2.amazon.aws.com

That is to say, you want users to type in www.prettyname.com and see the website which is hosted on ugly-name-for-pretty-site.ap-southeast2.amazon.aws.com, without them ever knowing about the hideous name that lies beneath.

For this problem you need a CNAME record ("canonical name"). A CNAME record is used to map from one domain name to another.

Here's an example of me setting up a CNAME record in CloudFlare:

I want to give control of my domain name to another service

Sometimes you want to give control of a domain to another service. This can happen when you're using a service like Squarespace or Webflow and you want them to set up all your DNS records for you, or if you want to use a different service (like CloudFlare) to manage your DNS.

The way to set this up is to set the name servers of your domain. Changing the name servers, as far as I can tell, gives the target servers full control of your domain. In this video, I'll show you some examples.

Conclusion

So there you go, some basic DNS-how-tos. With A records, CNAMES and name servers under your belt, you should be able to do ~70% of DNS tasks that you need in web development. Get a handle on TXT and MX records, and you're up to ~95%. DNS is horrible to work with, but it doesn't need to be confusing.

This certainly isn't the definitive guide on DNS, and I expect I made some technical errors in my explanations, but I hope you now have the tools to go out and set up some websites.

9 commands for debugging Django in Docker containers

2020-04-08T12:00:00+10:00

You want to get started "Dockerizing" your Django environment and you do a tutorial which shows you how to set it all up with docker-compose. You follow the listed commands and everything is working. Cool!

A few days later there's an error in your code and you want to debug the issue. What caused your dev environment to break? Is it your code? Is it a dependencies issue? Is it a Docker thing? How can you tell?

I've compiled a list of handy Docker commands that I whip out in these "what the fuck is happening!?!?" situations to help me get to the bottom of the issue:

Rebuild from scratch
Run a debugger
Get a bash shell in a running container
Get a bash shell in a brand new container
Run a script
Poke around inside of a PostgreSQL container
Watch some logs
View volumes
Destroy absolutely everything

Rebuild from scratch

Sometimes you want to rebuild your Docker image from scratch, just to make sure. Rebuilding with the --no-cache flag ensures that your Dockerfile is executed from start to finish, with no intermediate cached layers used.

For docker:

docker build --no-cache .

For docker-compose, assuming you have a "web" service:

docker-compose build --no-cache web

Run a debugger

You might notice that using docker-compose, Django's runserver and the pdb debugger together doesn't really work.

If you've plopped your debugger into a Django view for example:

def my_view(request):
    things = Thing.objects.all()
    result = do_stuff(things)
    # Launch Python command-line debugger
    import pdb;pdb.set_trace()
    return JsonResponse(result)

... and your docker-compose.yml file is something like this:

services:
  web:
    command: ./manage.py runserver
    # ... more stuff ...

... and you start your services like this:

docker-compose up web

Then your Python debugger will never work! When the view hits the pdb.set_trace() function, you'll always see this horrible error:

 # ... 10 million lines of stack trace ...
  File "/usr/lib/python3.6/bdb.py", line 51, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/lib/python3.6/bdb.py", line 70, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit

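This is an easy fix. The debugger, which is inside the Docker container, is trying to read commands from your terminal, which is outside of the Docker container, but docker-compose up doesn't connect your keyboard input to the container - hence the error. Using docker-compose run instead gives the container an interactive terminal, and the --service-ports flag publishes the service's ports like up would, so you can still reach runserver from your browser. More info here: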

docker-compose run --rm --service-ports web

Now when you hit the debugger you will get a functional, interactive pdb interface in your terminal.

Get a bash shell in a running container

Sometimes you want to poke around inside a container that is already running. You might want to cat a file, run ls or inspect the output of ps auxww. To get inside a running container you can use docker's exec command.

First, you need to get the running container's id:

docker ps

Which will get you an output like

CONTAINER ID    ...    NAMES
0dd3d893u8d3    ...    web
518f741c4415    ...    worker
0ce1cfd9c99f    ...    database

Say I wanted to poke around in the "worker" container, then I need to note its id of "518f741c4415" and then run bash using docker exec:

docker exec -it 518f741c4415 bash

It's a little easier if you're using docker-compose. If you want to get into an already running "web" container:

docker-compose exec web bash
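
If you're not using docker-compose and you get tired of copying ids around, docker ps can filter by name and print only the id. Here's a sketch of a helper built on that, with a made-up function name:

```shell
#!/bin/bash
# Sketch: open a bash shell in a running container, looked up by name.
# shell_into is a hypothetical helper name.
shell_into() {
    local name="$1" id
    # --filter name=... matches containers whose name contains $name,
    # --quiet prints only the ids, and head keeps the first match
    id=$(docker ps --quiet --filter "name=$name" | head -n 1)
    if [ -z "$id" ]; then
        echo "No running container matching '$name'" >&2
        return 1
    fi
    docker exec -it "$id" bash
}

# Usage:
# shell_into worker
```

If you know the container's exact name, you can also pass that to docker exec directly instead of the id.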

Get a bash shell in a brand new container

Sometimes you want to poke around inside a container that is based on an image, to see what is baked into the image. You can do this using docker or docker-compose.

For a service set up like this:

services:
  web:
    image: myimage:latest
    # ... more stuff ...

You can run the image myimage using docker:

docker run --rm -it myimage:latest bash

Or via docker-compose:

docker-compose run --rm web bash

Note the --rm flag, which will save you from having all these single use containers lying around, using up disk space.

Run a script

If you just want to run a script in a single-use, throw away container, you can use the run command as well. This is particularly useful for running management commands or unit tests:

docker-compose run --rm web ./manage.py migrate

Note: this only works if your container's default working dir contains ./manage.py.

Poke around inside of a PostgreSQL container

If you're using Django and docker-compose then you're likely running a PostgreSQL container, set up something like this:

services:
  database:
    image: postgres
    # ... more stuff ...
    environment:
      POSTGRES_HOST_AUTH_METHOD: "trust"

  web:
    command: ./manage.py runserver
    # ... more stuff ...
    environment:
      PGDATABASE: postgres
      PGUSER: postgres
      PGPASSWORD: password
      PGHOST: database
      PGPORT: 5432

Then you can use the psql command line from the web container to check out your database tables:

docker-compose run --rm web psql

Watch some logs

Sometimes you have a container, like a Celery worker or database, which is running in the background and you want to see its console output. Even better, you want to watch its console output in realtime. You can do this with logs. For example, if I want to follow the output of the "worker" container:

docker-compose logs --tail 100 -f worker

View volumes

Sometimes when you're having issues with volumes you want to double check what volumes you have and how they're set up. This is relatively straightforward.

To see all volumes:

docker volume ls

Which gets output like

DRIVER              VOLUME NAME
local               docker_postgres-data

And then to drill down into one volume:

docker volume inspect docker_postgres-data

Giving you something like

[
  {
    "CreatedAt": "2020-04-08T12:44:34+10:00",
    "Driver": "local",
    "Labels": {
      "com.docker.compose.project": "docker",
      "com.docker.compose.version": "1.23.1",
      "com.docker.compose.volume": "postgres-data"
    },
    "Mountpoint": "/var/lib/docker/volumes/docker_postgres-data/_data",
    "Name": "docker_postgres-data",
    "Options": null,
    "Scope": "local"
  }
]

If that doesn't help you, there's always the next step.

Destroy absolutely everything

There's a Docker command that removes all your "unused" data:

docker system prune

That's nice, it might free up some disk space, but what if you want to go full scorched-earth on your Docker environment? Like tear down Carthage and salt the fields so that nothing will ever grow again?

Here's a script I use occasionally when I just want to get rid of everything and start afresh:

# Stop all containers
docker kill $(docker ps -q)

# Remove all containers
docker rm $(docker ps -a -q)

# Remove all docker images
docker rmi $(docker images -q)

# Remove all volumes
docker volume rm $(docker volume ls -q)
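
One wrinkle: each of those commands complains if there's nothing to remove, because docker kill and friends need at least one argument. Here's a sketch of the same idea with a guard around each step. The names remove_all and nuke_docker are made up:

```shell
#!/bin/bash
# Sketch: the same cleanup, but each step is skipped when there is nothing to remove.
# remove_all and nuke_docker are hypothetical helper names.
remove_all() {
    local list_cmd="$1" remove_cmd="$2" ids
    # Unquoted on purpose: the command string splits into words
    ids=$($list_cmd)
    if [ -n "$ids" ]; then
        # Unquoted again so that each id becomes its own argument
        $remove_cmd $ids
    fi
}

nuke_docker() {
    remove_all "docker ps -q" "docker kill"              # stop all containers
    remove_all "docker ps -a -q" "docker rm"             # remove all containers
    remove_all "docker images -q" "docker rmi"           # remove all images
    remove_all "docker volume ls -q" "docker volume rm"  # remove all volumes
}

# Usage (this really does delete everything):
# nuke_docker
```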

Burn it all down I say! From the ashes, we will rebuild!

If this doesn't fix your issue, I recommend that you throw your laptop out a window, sell all your worldly possessions and start a new life in the wilderness.

Introduction to configuration management

2020-04-08T12:00:00+10:00

This is a talk I gave at the Melbourne Junior dev meetup:

Have you ever found a bug in prod, which wasn't caught earlier because of a missing folder, library, or file permission? It sucks! This talk goes over some practices and tools that you can use to keep your environments consistent and share knowledge with the rest of your team.
