Question: :dev branch container gives permission denied errors on migration script (in /app) even after new install #1002

Open
agnosticlines opened this issue Jul 24, 2022 · 13 comments
Labels
size: easy status: done Work is completed and released (or scheduled to be released in the next version)

@agnosticlines

agnosticlines commented Jul 24, 2022

Hi there,

I've been using ArchiveBox for quite a while. Previously I was on the :latest tag, which works fine for most things, except that SingleFile doesn't work because the link to /usr/bin/chromium-browser is missing.

I see there's a fix in the Dockerfile, and when I switch to the :dev Docker image the link is correct. However, I cannot start the server due to permission errors. This happens regardless of PUID, PGID, or the host permissions of the data folder (777 for debugging). The issue appears to be that the /app/ folder has the wrong permissions.

Initial logs

Attaching to archivebox-archivebox-1
archivebox-archivebox-1  | Change in ownership detected, please be patient while we chown existing files
archivebox-archivebox-1  | This could take some time...
archivebox-archivebox-1  | chown: changing ownership of '/data/index.sqlite3': Permission denied
archivebox-archivebox-1  | chown: changing ownership of '/data/archive': Permission denied
archivebox-archivebox-1  | chown: changing ownership of '/data/logs/errors.log': Permission denied
archivebox-archivebox-1  | chown: changing ownership of '/data/logs': Permission denied
archivebox-archivebox-1  | chown: changing ownership of '/data/sources': Permission denied
archivebox-archivebox-1  | chown: changing ownership of '/data/ArchiveBox.conf': Permission denied
archivebox-archivebox-1  | chown: changing ownership of '/data': Permission denied
archivebox-archivebox-1  | chown: changing ownership of '/data': Permission denied
archivebox-archivebox-1  | [i] [2022-07-24 12:00:30] ArchiveBox v0.6.3: archivebox server --quick-init 0.0.0.0:8000
archivebox-archivebox-1  |     > /data
archivebox-archivebox-1  |
archivebox-archivebox-1  | [^] Verifying and updating existing ArchiveBox collection to v0.6.3...
archivebox-archivebox-1  | ----------------------------------------------------------------------
archivebox-archivebox-1  |
archivebox-archivebox-1  | [*] Verifying archive folder structure...
archivebox-archivebox-1  |     + ./archive, ./sources, ./logs...
archivebox-archivebox-1  |     + ./ArchiveBox.conf...
archivebox-archivebox-1  |
archivebox-archivebox-1  | [*] Verifying main SQL index and running any migrations needed...
archivebox-archivebox-1  | Traceback (most recent call last):
archivebox-archivebox-1  |   File "/usr/local/bin/archivebox", line 33, in <module>
archivebox-archivebox-1  |     sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
archivebox-archivebox-1  |   File "/app/archivebox/cli/__init__.py", line 140, in main
archivebox-archivebox-1  |     run_subcommand(
archivebox-archivebox-1  |   File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
archivebox-archivebox-1  |     module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
archivebox-archivebox-1  |   File "/app/archivebox/cli/archivebox_server.py", line 64, in main
archivebox-archivebox-1  |     server(
archivebox-archivebox-1  |   File "/app/archivebox/util.py", line 114, in typechecked_function
archivebox-archivebox-1  |     return func(*args, **kwargs)
archivebox-archivebox-1  |   File "/app/archivebox/main.py", line 1280, in server
archivebox-archivebox-1  |     run_subcommand('init', subcommand_args=['--quick'], stdin=None, pwd=out_dir)
archivebox-archivebox-1  |   File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
archivebox-archivebox-1  |     module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
archivebox-archivebox-1  |   File "/app/archivebox/cli/archivebox_init.py", line 43, in main
archivebox-archivebox-1  |     init(
archivebox-archivebox-1  |   File "/app/archivebox/util.py", line 114, in typechecked_function
archivebox-archivebox-1  |     return func(*args, **kwargs)
archivebox-archivebox-1  |   File "/app/archivebox/main.py", line 344, in init
archivebox-archivebox-1  |     for migration_line in apply_migrations(out_dir):
archivebox-archivebox-1  |   File "/app/archivebox/util.py", line 114, in typechecked_function
archivebox-archivebox-1  |     return func(*args, **kwargs)
archivebox-archivebox-1  |   File "/app/archivebox/index/sql.py", line 143, in apply_migrations
archivebox-archivebox-1  |     call_command("makemigrations", interactive=False, stdout=null)
archivebox-archivebox-1  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 168, in call_command
archivebox-archivebox-1  |     return command.execute(*args, **defaults)
archivebox-archivebox-1  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 371, in execute
archivebox-archivebox-1  |     output = self.handle(*args, **options)
archivebox-archivebox-1  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 85, in wrapped
archivebox-archivebox-1  |     res = handle_func(*args, **kwargs)
archivebox-archivebox-1  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/commands/makemigrations.py", line 182, in handle
archivebox-archivebox-1  |     self.write_migration_files(changes)
archivebox-archivebox-1  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/commands/makemigrations.py", line 220, in write_migration_files
archivebox-archivebox-1  |     with open(writer.path, "w", encoding='utf-8') as fh:
archivebox-archivebox-1  | PermissionError: [Errno 13] Permission denied: '/app/archivebox/core/migrations/0021_auto_20220724_1200.py'

This happens even if I set up a brand new directory with the :dev branch to create a new database and do setup/init.

If I create the container and spawn a shell to poke around, /app is owned by root:root, which should be fine, but I guess ArchiveBox writes the migration files out to the /app directory. What's interesting is that this happens even on a totally fresh install.

Here are the permissions:

$ id
uid=999(archivebox) gid=999(archivebox) groups=999(archivebox),29(audio),44(video)
$ ls -lahS /
total 84K
drwxr-xr-x   1 root       root       4.0K Jul 24 11:59 .
drwxr-xr-x   1 root       root       4.0K Jul 24 11:59 ..
drwxr-xr-x   1 root       root       4.0K Jun  9 08:09 app
drwxr-xr-x   1 root       root       4.0K Jun  9 00:22 bin
drwxr-xr-x   2 root       root       4.0K Mar 19 13:46 boot
drwxr-xr-x   3 archivebox archivebox 4.0K Jul 24 11:59 data
drwxr-xr-x   1 root       root       4.0K Jul 24 11:59 etc
drwxr-xr-x   1 root       root       4.0K Jun  9 00:00 home
drwxr-xr-x   1 root       root       4.0K Jun  9 00:02 lib
drwxr-xr-x   2 root       root       4.0K May 27 00:00 media
drwxr-xr-x   2 root       root       4.0K May 27 00:00 mnt
drwxr-xr-x   1 root       root       4.0K Jun  9 00:16 node
drwxr-xr-x   2 root       root       4.0K May 27 00:00 opt
drwx------   1 root       root       4.0K Jun  9 00:21 root
drwxr-xr-x   3 root       root       4.0K May 27 00:00 run
drwxr-xr-x   1 root       root       4.0K Jun  9 00:03 sbin
drwxr-xr-x   2 root       root       4.0K May 27 00:00 srv
drwxrwxrwt   1 root       root       4.0K Jun  9 00:30 tmp
drwxr-xr-x   1 root       root       4.0K May 27 00:00 usr
drwxr-xr-x   1 root       root       4.0K May 27 00:00 var
drwxr-xr-x   5 root       root       4.0K Jun  9 00:20 venv
drwxr-xr-x   5 root       root        340 Jul 24 11:59 dev
-rwxr-xr-x   1 root       root          0 Jul 24 11:59 .dockerenv
dr-xr-xr-x 135 root       root          0 Jul 24 11:59 proc
dr-xr-xr-x  12 root       root          0 Jul 24 11:59 sys
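
The same check can be done from Python inside the container. This is only a minimal sketch: `describe_write_access` is a hypothetical helper (not part of ArchiveBox), and the target path is simply the directory named in the traceback above.

```python
import os
import stat


def describe_write_access(path):
    """Summarize who owns `path` and whether the current process may write to it."""
    st = os.stat(path)
    return {
        "path": path,
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),      # human-readable, e.g. 'drwxr-xr-x'
        "writable": os.access(path, os.W_OK),   # evaluated for the real uid/gid
    }


if __name__ == "__main__":
    # Directory taken from the traceback; adjust if your layout differs.
    target = "/app/archivebox/core/migrations"
    if os.path.isdir(target):
        print(describe_write_access(target))
    else:
        print(f"{target} not found (not running inside the container?)")
```

Run as the archivebox user (uid 999 in the `id` output above), a root-owned directory with mode drwxr-xr-x would be expected to report `writable` as False, which is consistent with the PermissionError.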

Update:

May not be a permissions issue; the file doesn't seem to exist:

$ cat /app/archivebox/core/migrations/0021_auto_20220724_1159.py
cat: /app/archivebox/core/migrations/0021_auto_20220724_1159.py: No such file or directory

Not too sure what causes this. I looked at the migrations subdirectory and the files there are all in git anyway, so I'm guessing it's trying to execute a migration that doesn't exist yet? Maybe @pirate pushed a new build expecting a migration there when there isn't one yet?
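
A possible reading of the missing file: names like 0021_auto_20220724_1200.py follow Django's pattern for auto-generated migrations, which embeds the time `makemigrations` ran, so the file is being generated at startup rather than shipped in the image, and it never lands on disk because the write fails. The sketch below just decodes such a name; `parse_auto_migration` is a hypothetical helper for illustration.

```python
import re
from datetime import datetime

# Matches names such as 0021_auto_20220724_1200.py
AUTO_NAME = re.compile(r"^(?P<number>\d{4})_auto_(?P<stamp>\d{8}_\d{4})\.py$")


def parse_auto_migration(filename):
    """Return (number, generated_at) for an auto-named migration, else None."""
    match = AUTO_NAME.match(filename)
    if match is None:
        return None
    generated_at = datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M")
    return int(match.group("number")), generated_at


print(parse_auto_migration("0021_auto_20220724_1200.py"))
print(parse_auto_migration("0069_alter_archiveresult_extractor.py"))
```

The decoded time for the file in the traceback (12:00 on 2022-07-24) lines up with the startup timestamp in the log, `[2022-07-24 12:00:30]`, which fits the idea that each start tries to write a freshly named file.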

Update: Yeah, it is a permissions issue. Giving archivebox:archivebox ownership of /app/archivebox/core/migrations works:

$ archivebox server --quick-init 0.0.0.0:8000
[i] [2022-07-24 12:54:06] ArchiveBox v0.6.3: archivebox server --quick-init 0.0.0.0:8000
    > /data

[^] Verifying and updating existing ArchiveBox collection to v0.6.3...
----------------------------------------------------------------------

[*] Verifying archive folder structure...
    + ./archive, ./sources, ./logs...
    + ./ArchiveBox.conf...

[*] Verifying main SQL index and running any migrations needed...
    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, core, sessions
    Running migrations:
    Applying core.0021_auto_20220724_1254... OK

    √ ./index.sqlite3

[*] Checking links from indexes and archive folders (safe to Ctrl+C)...
    √ Loaded 4 links from existing main index.
    > Skipping full snapshot directory check (quick mode)

----------------------------------------------------------------------
[√] Done. Verified and updated the existing ArchiveBox collection.

    Hint: To view your archive index, run:
        archivebox server  # then visit http://127.0.0.1:8000

    To add new links, you can run:
        archivebox add < ~/some/path/to/list_of_links.txt

    For more usage and examples, run:
        archivebox help
@selim13

selim13 commented Sep 13, 2022

A quick and dirty workaround to fix permissions on start:

services:
  archivebox:
    image: archivebox/archivebox:dev
    entrypoint: /bin/bash
    command: -c "chown -R archivebox:archivebox /app/archivebox/core/migrations && /app/bin/docker_entrypoint.sh server --quick-init 0.0.0.0:8000"

@turian
Contributor

turian commented Sep 13, 2022

@selim13 Can you make this a PR? I have the same issue

@turian turian mentioned this issue Sep 14, 2022
6 tasks
@turian
Contributor

turian commented Sep 14, 2022

@agnosticlines @selim13 You can use my docker image turian/archivebox:migrations-0021 or wait for this PR to be merged: #1027 (based upon my branch https://github.com/turian/ArchiveBox/tree/feature/migrations-0021_auto_20220914_0934.py )

@pirate
Member

pirate commented Nov 28, 2022

Yeah, sorry, a bug got merged into dev with one of the recent PRs a few months ago, and I haven't had a chance to bisect and dig it out yet.

@pirate
Member

pirate commented Nov 28, 2022

This should be fixed now as I merged @turian's PR with the missing migration. Comment back if you're still having issues on the latest dev release.

@pirate pirate closed this as completed Nov 28, 2022
@canoziia

Hi, I found that this problem seems to still exist:

archivebox  | find: '/.config/chromium/Crash Reports/pending/': No such file or directory
archivebox  | [i] [2022-11-28 11:16:37] ArchiveBox v0.6.3: archivebox server --quick-init 0.0.0.0:15200
archivebox  |     > /data
archivebox  |
archivebox  | find: '/.config/chromium/Crash Reports/pending/': No such file or directory
archivebox  | [+] Initializing a new ArchiveBox v0.6.3 collection...
archivebox  | ----------------------------------------------------------------------
archivebox  |
archivebox  | [+] Building archive folder structure...
archivebox  |     + ./archive, ./sources, ./logs...
archivebox  |     + ./ArchiveBox.conf...
archivebox  | find: '/.config/chromium/Crash Reports/pending/': No such file or directory
archivebox  |
archivebox  | [+] Building main SQL index and running initial migrations...
archivebox  | Traceback (most recent call last):
archivebox  |   File "/usr/local/bin/archivebox", line 33, in <module>
archivebox  |     sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
archivebox  |   File "/app/archivebox/cli/__init__.py", line 140, in main
archivebox  |     run_subcommand(
archivebox  |   File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
archivebox  |     module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
archivebox  |   File "/app/archivebox/cli/archivebox_server.py", line 64, in main
archivebox  |     server(
archivebox  |   File "/app/archivebox/util.py", line 114, in typechecked_function
archivebox  |     return func(*args, **kwargs)
archivebox  |   File "/app/archivebox/main.py", line 1280, in server
archivebox  |     run_subcommand('init', subcommand_args=['--quick'], stdin=None, pwd=out_dir)
archivebox  |   File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
archivebox  |     module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
archivebox  |   File "/app/archivebox/cli/archivebox_init.py", line 43, in main
archivebox  |     init(
archivebox  |   File "/app/archivebox/util.py", line 114, in typechecked_function
archivebox  |     return func(*args, **kwargs)
archivebox  |   File "/app/archivebox/main.py", line 344, in init
archivebox  |     for migration_line in apply_migrations(out_dir):
archivebox  |   File "/app/archivebox/util.py", line 114, in typechecked_function
archivebox  |     return func(*args, **kwargs)
archivebox  |   File "/app/archivebox/index/sql.py", line 143, in apply_migrations
archivebox  |     call_command("makemigrations", interactive=False, stdout=null)
archivebox  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 168, in call_command
archivebox  |     return command.execute(*args, **defaults)
archivebox  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 371, in execute
archivebox  |     output = self.handle(*args, **options)
archivebox  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 85, in wrapped
archivebox  |     res = handle_func(*args, **kwargs)
archivebox  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/commands/makemigrations.py", line 182, in handle
archivebox  |     self.write_migration_files(changes)
archivebox  |   File "/usr/local/lib/python3.10/site-packages/django/core/management/commands/makemigrations.py", line 220, in write_migration_files
archivebox  |     with open(writer.path, "w", encoding='utf-8') as fh:
archivebox  | PermissionError: [Errno 13] Permission denied: '/app/archivebox/core/migrations/0021_auto_20221128_1116.py'

root@docker:~/app/archivebox# docker images
REPOSITORY                          TAG                 IMAGE ID       CREATED         SIZE
archivebox/archivebox               dev                 b7a570c267b8   8 hours ago     1.63GB

@pirate
Member

pirate commented Jun 13, 2023

I believe this issue should be fixed again, comment back if you're still having trouble and I can re-open it.

@pirate pirate closed this as completed Jun 13, 2023
@pirate added the labels "type: bug report", "size: easy", and "status: done" (Work is completed and released, or scheduled to be released in the next version) on Jun 13, 2023
@Astro1247

Hi! Looks like the trouble is back with migration 0069:

[*] Verifying main SQL index and running any migrations needed...
[!] Failed to create some migrations. Please open an issue and copy paste this output for help: [Errno 13] Permission denied: '/app/archivebox/core/migrations/0069_alter_archiveresult_extractor.py'

@pirate
Member

pirate commented Aug 22, 2024

OK, yeah, the issue is that it's trying to create a new migration that it shouldn't be creating. Don't run the workarounds posted above (don't worry if you did, it should be fine). It's an edge case for some setups that I need to fix in the code directly.

@pirate pirate reopened this Aug 22, 2024
@Astro1247

> OK, yeah, the issue is that it's trying to create a new migration that it shouldn't be creating. Don't run the workarounds posted above (don't worry if you did, it should be fine). It's an edge case for some setups that I need to fix in the code directly.

Unfortunately my instance is currently stuck and I can no longer access my archive. I was using 0.7.2, but at some point it stopped grabbing titles for all links (the logs said something like "title not found") and it was also really, really slow in most cases. So I tried updating to the latest dev, 0.8.2, and now I'm stuck with django.db.utils.IntegrityError: NOT NULL constraint failed: core_snapshot.created_by_id
I tried downgrading again, but hit a different problem each time. The latest was django.db.utils.IntegrityError: NOT NULL constraint failed: core_snapshot.created

I should have created a full instance backup before trying to update... :(
So for now I'm just hoping that once this issue is resolved, it will clear up some of the other issues too and I'll be able to use my archive again.

@pirate
Member

pirate commented Aug 22, 2024

@Astro1247, I understand it looks scary, but don't worry, you shouldn't lose data: Django migrations are deterministic and atomic (meaning that if they fail during some step, it won't leave your DB in a corrupted state). In the future you should definitely back up before installing any BETA releases, but I'll help fix it for now.
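
To illustrate what "atomic" means here, the sketch below uses Python's built-in sqlite3 module directly rather than Django's migration executor, so it shows the idea and not ArchiveBox's actual code. The `snapshot` table and the `run_atomically` helper are made up for the example. As far as I know, Django wraps each migration in a transaction on backends that support transactional DDL, such as SQLite.

```python
import sqlite3


def run_atomically(conn, statements):
    """Run all statements in one transaction; undo everything if any of them fails."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN/COMMIT/ROLLBACK above are fully in control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE snapshot (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.execute("INSERT INTO snapshot (url) VALUES ('https://example.com')")

ok = run_atomically(conn, [
    "ALTER TABLE snapshot ADD COLUMN title TEXT",   # schema change succeeds...
    "INSERT INTO snapshot (url) VALUES (NULL)",     # ...then NOT NULL fails
])

columns = [row[1] for row in conn.execute("PRAGMA table_info(snapshot)")]
rows = conn.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]
print(ok, columns, rows)
```

After the failed step, the schema change from the first statement is rolled back as well, so the table has its original columns and its original row.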

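I just pushed fixes for the two underlying issues you encountered (missing migration 0069 and created_by_id failure):

* [afe1307](https://github.com/ArchiveBox/ArchiveBox/commit/afe130761780f259b0841fe7dd2240cee6e02a31)
* [09553d83](https://github.com/ArchiveBox/ArchiveBox/commit/09553d83)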

You should be able to pull the latest :dev and run it, and it will pick up wherever it left off, and re-run the migrations needed to bring you up to the current version.

docker compose pull
docker compose down
docker compose up

@pirate
Member

pirate commented Aug 23, 2024

Just bumping this to send a new notification because I edited my previous comment quite a bit ^

@Astro1247

> @Astro1247, I understand it looks scary, but don't worry, you shouldn't lose data: Django migrations are deterministic and atomic (meaning that if they fail during some step, it won't leave your DB in a corrupted state). In the future you should definitely back up before installing any BETA releases, but I'll help fix it for now.
>
> I just pushed fixes for the two underlying issues you encountered (missing migration 0069 and created_by_id failure):
>
> * [afe1307](https://github.com/ArchiveBox/ArchiveBox/commit/afe130761780f259b0841fe7dd2240cee6e02a31)
> * [09553d83](https://github.com/ArchiveBox/ArchiveBox/commit/09553d83)
>
> You should be able to pull the latest :dev and run it, and it will pick up wherever it left off, and re-run the migrations needed to bring you up to the current version.
>
> docker compose pull
> docker compose down
> docker compose up

Thanks, it managed to recover the data (some of it? It said it loaded 530 links, and on the next line it said it ignored 530 invalid links). Even though it completely mixed up the data (the save dates are all new now), at least I'm able to view it, and ArchiveBox itself started.

Every start now shows this warning:

WARNING: Error loading /app/node_modules/.bin/readability-extractor: Expected /app/node_modules/.bin/readability-extractor version None, got 0.0.11
- ❌ Binary /app/node_modules/.bin/postlight-parser failed to load with error: None of the configured providers [env] were able to load binary: postlight-parser

Also, no new archiving tasks can complete. No matter how many attempts I make with different settings, they all fail with:

archivebox-1            |       > dom
archivebox-1            |         Extractor failed:
archivebox-1            |              Failed to save DOM
archivebox-1            |             [315:315:0829/175745.921310:FATAL:zygote_host_impl_linux.cc(126)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.

So no new data can be saved; it all fails with the same error, even though everything else looks okay. Everything seems fine over VNC too, but it fails on every archiving attempt :)

Also, all of these updates were meant to fix the failing title extraction, and... it still fails to extract any page title XD Even with everything on the latest version:

archivebox-1            |       > title
archivebox-1            |         Extractor failed:
archivebox-1            |              Unable to detect page title
archivebox-1            |         Run to see full output:
archivebox-1            |           docker run -it -v $PWD/data:/data archivebox/archivebox /bin/bash
archivebox-1            |             cd /data/archive/1724950329.479523;
archivebox-1            |             curl --silent --location --compressed --max-time 60 --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "https://github.com/ArchiveBox/ArchiveBox/issues/1002"
