Support for Reverse Proxy authentication backends (like authelia) #866

Merged
merged 4 commits into from
Jan 10, 2023

Conversation

ajgon
Contributor

@ajgon ajgon commented Sep 30, 2021

Summary

Adds support for reverse proxy authentication backends (like authelia) via a configurable HTTP header.

Related issues

#773

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

@pirate pirate marked this pull request as draft September 30, 2021 18:39
@pirate
Member

pirate commented Sep 30, 2021

Thanks for this PR and the documentation PR to go with it! It looks like it's on a good track so far.

Quick question: Where is REVERSE_PROXY_USER_HEADER actually handled? I don't see any code passing it to some auth mechanism?

@pirate pirate added size: medium is: enhancement status: wip Work is in-progress / has already been partially completed touches: API/CLI/Spec touches: configuration touches: docs why: security Intended to improve ArchiveBox security or data integrity labels Sep 30, 2021
@ajgon
Contributor Author

ajgon commented Oct 1, 2021

Thanks for this PR and the documentation PR to go with it! It looks like it's on a good track so far.

Quick question: Where is REVERSE_PROXY_USER_HEADER actually handled? I don't see any code passing it to some auth mechanism?

Everything is handled by Django's standard RemoteUserMiddleware class. I've just created a subclass of it to handle the customizable header and the whitelisting. I was basing this on this.
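For readers unfamiliar with the pattern: Django's RemoteUserMiddleware looks up the username in request.META under the key named by its header class attribute, and a subclass can override that attribute. A header sent by a proxy shows up in request.META upper-cased, with hyphens turned into underscores and an HTTP_ prefix. Below is a minimal sketch of that name conversion. The helper name header_to_meta_key is hypothetical and is not necessarily how this PR implements it.

```python
def header_to_meta_key(header_name: str) -> str:
    """Map an HTTP header name (e.g. 'Remote-User') to its WSGI/Django META key.

    Hypothetical helper for illustration; not taken from the PR itself.
    """
    return "HTTP_" + header_name.strip().upper().replace("-", "_")


print(header_to_meta_key("Remote-User"))    # HTTP_REMOTE_USER
print(header_to_meta_key("X-Remote-User"))  # HTTP_X_REMOTE_USER
```

A middleware subclass could then assign the converted value to its header attribute, so that a configured name such as Remote-User resolves to the key Django actually stores.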

@ajgon
Contributor Author

ajgon commented Oct 1, 2021

Let me also explain how this is supposed to work, to ensure that we're on the same page :) :

  • The user visits ArchiveBox, which is protected by an auth proxy (let's use Authelia as the example, but the pattern is the same for Vouch, Keycloak, etc.).
  • The user signs in to Authelia. Authelia takes care of authentication and is our "source of truth" for user identity. The idea is that we trust the proxy: it handled authentication properly and now wishes to let us know about it.
  • Authelia sends the login of the authenticated user in the Remote-User header (it also sends Remote-Email and some others, but the only one we are interested in is the user). This is the username of the user we wish to authenticate on the ArchiveBox side (so it has to match the database records).
    • Other auth systems use different headers (X-Remote-User, for example), which is why the header name is configurable via the REVERSE_PROXY_USER_HEADER config option.
  • Now, this is also a potential security flaw, as a malicious attacker can send their own Remote-User: admin header and try to log in, mimicking the proxy. Usually this shouldn't be a problem, because when using Authelia, ArchiveBox should never be exposed to the world directly; it should always be proxied through Authelia.
  • But we live in the real world, and misconfiguration accidents happen :( To add an extra layer of security, I introduced a REVERSE_PROXY_WHITELIST mechanism. It's a very simple concept: we only allow the given list of IP ranges (CIDRs) to actually provide the Remote-User header. The header is ignored when it comes from any other IP. Usually, this should be set to the Authelia IP only. For example, for Docker environments, setting this to 172.16.0.0/12 should usually be sufficient.
    • We are only interested in the IP of the proxy itself; we should never care about the client IP (so we should never use X-Forwarded-For or similar headers). This is handled by request.META.get('REMOTE_ADDR'), which is the raw IP of the connecting service (in this case, the proxy).
  • By default, REVERSE_PROXY_WHITELIST is set to an empty string, which means "blacklist everything". This effectively means the feature is disabled by default, so if somebody doesn't care about it and sets up ArchiveBox "out of the box", they are not exposed to any security issues, as the header will never be taken into consideration.

Hopefully this clears things up and explains what I'm trying to build here :)
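The whitelist check described in the list above can be sketched with Python's standard ipaddress module. This is an illustration under stated assumptions, not the PR's actual code: the function names parse_whitelist and trusted_remote_user are hypothetical, and the comma-separated format of the whitelist value is assumed.

```python
import ipaddress
from typing import Optional


def parse_whitelist(raw: str) -> list:
    """Parse a comma-separated CIDR list; an empty string yields no networks."""
    return [ipaddress.ip_network(part.strip()) for part in raw.split(",") if part.strip()]


def trusted_remote_user(meta: dict, header_key: str, whitelist: list) -> Optional[str]:
    """Return the proxy-supplied username only when REMOTE_ADDR is whitelisted.

    Only the address of the directly connecting peer is consulted;
    X-Forwarded-For and similar client-controlled headers are ignored.
    """
    try:
        addr = ipaddress.ip_address(meta.get("REMOTE_ADDR", ""))
    except ValueError:
        return None  # missing or malformed peer address: never trust the header
    if not any(addr in net for net in whitelist):
        return None  # an empty whitelist therefore rejects everything
    return meta.get(header_key) or None


whitelist = parse_whitelist("172.16.0.0/12")
request_meta = {"REMOTE_ADDR": "172.18.0.5", "HTTP_REMOTE_USER": "alice"}
print(trusted_remote_user(request_meta, "HTTP_REMOTE_USER", whitelist))  # alice
```

With an empty whitelist string, parse_whitelist returns an empty list and the header is ignored for every peer, which matches the "disabled by default" behaviour described above.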

@ajgon ajgon marked this pull request as ready for review April 20, 2022 09:40
@iarp

iarp commented Oct 4, 2022

@pirate @dugite-code Hi, any chance of this PR being merged soon? Or where are things left? It would be quite nice to get it through.

@dugite-code
Contributor

dugite-code commented Oct 6, 2022

I've been running this patched into the current version of ArchiveBox for a while. As long as the user already exists, the SSO works as expected. However, auto-created users do not have the correct permissions assigned, leaving them stuck on the login page with the message: You are authenticated as example.user, but are not authorized to access this page. Would you like to login to a different account? They can manually navigate to the public URL, where they show as logged in.

The user needs to be set as staff and at least given viewing permissions, although it appears you can still add/remove snapshots with just viewing permissions. Note: you cannot modify auto created users after the fact.

@iarp

iarp commented Oct 7, 2022

What about adding a server config option that adds the RemoteUserBackend to AUTHENTICATION_BACKENDS and adding a note to the config page with something along the lines of "If you enable this, any users using this option with ArchiveBox must be set with correct permissions of ___".

Another option could be to alter RemoteUserBackend.configure_user to add the necessary permissions and flags (depending on more server config options).

@ajvpot

ajvpot commented Feb 23, 2024

I've been running this patched into the current version of ArchiveBox for a while. As long as the user already exists, the SSO works as expected. However, auto-created users do not have the correct permissions assigned, leaving them stuck on the login page with the message: You are authenticated as example.user, but are not authorized to access this page. Would you like to login to a different account? They can manually navigate to the public URL, where they show as logged in.

The user needs to be set as staff and at least given viewing permissions, although it appears you can still add/remove snapshots with just viewing permissions. Note: you cannot modify auto created users after the fact.

I'm also experiencing this. Would it be possible to add another config option for which permissions to grant automatically created users?

@pirate
Member

pirate commented Feb 29, 2024

Would it be possible to add another config option for which permissions to grant automatically created users?

Yes, we recently added this same fix for LDAP auth here: #1335

I would just set is_staff=True and is_superuser=True for now, don't bother with more granular permissions as the rest of the codebase doesn't support row-level permissions yet.

Unfortunately I'm a bit overloaded with paying client work right now so I probably won't get around to implementing this myself, but if you submit a PR to add this I'd be happy to review it!
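For anyone picking this up, here is a rough sketch of the logic discussed above: granting is_staff and is_superuser to users that the remote-user backend has just created. It is written against a stand-in user class so it runs without Django. FakeUser and grant_admin_flags are hypothetical names, and this is not the code from #1335. In a real backend this logic would presumably live in an override of Django's RemoteUserBackend.configure_user hook, as suggested earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class FakeUser:
    """Stand-in for a Django user object, for illustration only."""
    username: str
    is_staff: bool = False
    is_superuser: bool = False
    saved: bool = False

    def save(self) -> None:
        # A real Django model would write to the database here.
        self.saved = True


def grant_admin_flags(user, created: bool = True):
    """Give a newly auto-created user staff and superuser rights.

    Existing users (created=False) are left untouched so that manual
    permission changes are not overwritten on every login.
    """
    if created:
        user.is_staff = True
        user.is_superuser = True
        user.save()
    return user


new_user = grant_admin_flags(FakeUser("example.user"))
print(new_user.is_staff, new_user.is_superuser)  # True True
```

Skipping existing users is a design choice of this sketch, not something the thread specifies.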

@pirate pirate removed the status: wip Work is in-progress / has already been partially completed label Feb 29, 2024
@a10kiloham

a10kiloham commented Jun 3, 2024

I'm getting CSRF errors when I try to do things like a re-crawl. From looking online, it seems like there ought to be a setting to allow TRUSTED_HOSTS, or otherwise USE_X_FORWARDED_HOST = True should be set in Django, I think?
I have my settings as follows:

  • ALLOWED_HOSTS=*
  • REVERSE_PROXY_WHITELIST=172.0.0.0/8
  • REVERSE_PROXY_USER_HEADER=Remote-User
    which I believe should be correct? This is using Authelia and Caddy fwiw.
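A side note on the whitelist value above: 172.0.0.0/8 is considerably wider than the 172.16.0.0/12 range suggested earlier in this thread for Docker setups. The quick check below uses Python's standard ipaddress module. The sample address 172.64.0.1 is arbitrary and chosen only because it falls outside the /12 range.

```python
import ipaddress

docker_private = ipaddress.ip_network("172.16.0.0/12")
configured = ipaddress.ip_network("172.0.0.0/8")

# The /12 range is fully contained in the /8 range...
print(docker_private.subnet_of(configured))  # True

# ...but the /8 range covers 16 times as many addresses.
print(configured.num_addresses // docker_private.num_addresses)  # 16

# An address outside 172.16.0.0/12 is still accepted by the /8 whitelist.
sample = ipaddress.ip_address("172.64.0.1")
print(sample in docker_private)  # False
print(sample in configured)      # True
```

If the proxy only ever connects from the Docker network, the narrower range is likely enough and trusts fewer peers.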

@pirate
Member

pirate commented Jun 4, 2024

It's possible, can you test if that change fixes it and report back?

  • find the archivebox source code location on your machine: archivebox version | grep PACKAGE_DIR
  • then edit PACKAGE_DIR/core/settings.py and add USE_X_FORWARDED_HOST = True
  • restart archivebox to see if that works

@lkubb

lkubb commented Jun 5, 2024

I'm getting CSRF errors

Seeing the same during a regular login with the dev docker container. I could fix it with the following in settings.py (no USE_X_FORWARDED_HOST necessary)

CSRF_TRUSTED_ORIGINS = ["https://my.archivebox.domain"]

Side note: The logout button does not work either since it results in a GET request, but needs a POST with CSRF token.

@a10kiloham

As a quick fix for this I just edited my docker-compose as follows (inserting my reverse proxied external address into the trusted origins line):

services:
    archivebox:
      build:
        context: .
        dockerfile_inline: |
          FROM archivebox/archivebox:dev
          RUN echo 'CSRF_TRUSTED_ORIGINS = ["https://archive.example.com"]' >> /app/archivebox/core/settings.py
      command: server --quick-init 0.0.0.0:8000
      container_name: archivebox

@Martsial

Martsial commented Jul 20, 2024

As a quick fix for this I just edited my docker-compose as follows (inserting my reverse proxied external address into the trusted origins line):

services:
    archivebox:
      build:
        context: .
        dockerfile_inline: |
          FROM archivebox/archivebox:dev
          RUN echo 'CSRF_TRUSTED_ORIGINS = ["https://archive.example.com"]' >> /app/archivebox/core/settings.py
      command: server --quick-init 0.0.0.0:8000
      container_name: archivebox

The same fix for k8s:

containers:
    - image: archivebox/archivebox:<tag>
      name: archivebox
      args: ["server", "--quick-init", "0.0.0.0:8000"]
      lifecycle:
        postStart:
          exec:
            command: ["/bin/sh", "-c", "echo 'CSRF_TRUSTED_ORIGINS = [\"https://example.org\"]' >> /app/archivebox/core/settings.py"]

@pirate
Member

pirate commented Aug 23, 2024

Ok this should be fixed now on dev: 9c35f3d

I've added CSRF_TRUSTED_ORIGINS to the available config, no need for the dockerfile override hack anymore.
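As a rough illustration of what such a config option has to produce: Django 4.0 and later expects every entry in CSRF_TRUSTED_ORIGINS to include a scheme (for example https://), so a bare hostname is not accepted. The sketch below turns a comma-separated string into that list and rejects entries without a scheme. The comma-separated format and the function name parse_trusted_origins are assumptions for illustration, not necessarily how the commit above handles it.

```python
from urllib.parse import urlsplit


def parse_trusted_origins(raw: str) -> list:
    """Split a comma-separated string into origins suitable for CSRF_TRUSTED_ORIGINS.

    Hypothetical helper; each entry must carry an http/https scheme and a host.
    """
    origins = []
    for part in raw.split(","):
        origin = part.strip().rstrip("/")
        if not origin:
            continue  # tolerate empty entries and trailing commas
        parts = urlsplit(origin)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"origin must include a scheme and host: {origin!r}")
        origins.append(origin)
    return origins


print(parse_trusted_origins("https://archive.example.com, https://example.org/"))
# ['https://archive.example.com', 'https://example.org']
```

Failing loudly on a missing scheme gives a clearer error than a CSRF failure that only shows up later at request time.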
