
Poc setup django on init #515

Merged: 3 commits, merged Nov 28, 2020


cdvv7788
Contributor

Summary

Initializing Django on CLI init and removing it from internal functions.
The in-memory database may be needed later (once we start saving ArchiveResults to the database), but for now this seems to work and no tests are broken. Maybe we can remove all of the setup_django calls in this PR and use that as the basis for the next iteration. WDYT, @pirate?

Related issues #496 #510

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

cdvv7788 mentioned this pull request on Oct 26, 2020
cdvv7788 force-pushed the POC-setup-django-on-init branch from 5b9fd53 to f6ce1de on October 27, 2020 14:15
@pirate
Member

pirate commented Oct 28, 2020

These commands need to work in a non-data folder; make sure to test them with this PR:

cd /tmp
archivebox version
archivebox help
archivebox oneshot
archivebox init

Also make sure to check the whole init process in general. I ran into some trouble earlier trying to use v0.4.21 to read a collection, so I want to make sure this PR fixes that issue:

~/D/o/A/data ⨈(data) ⎇ (master) ❈5 # ll
> /Users/squash/Documents/opt/ArchiveBox/data

drwxr-xr-x     - squash staff 2020-10-25 21:01 --  .venv/
drwxr-xr-x     - squash staff 2020-09-11 12:51 --  archive/
drwxr-xr-x     - squash staff 2020-08-01 11:44 --  data/
drwxr-xr-x     - squash staff 2020-08-18 02:04 --  logs/
drwxr-xr-x     - squash staff 2020-10-24 22:50 --  node_modules/
drwxr-xr-x     - squash staff 2020-09-11 12:51 --  sources/
drwxr-xr-x     - squash staff 2020-08-18 00:53 --  static/
.rwxr-xr-x@    0 squash staff 2020-10-24 22:47 --  2020-10-25_index_old.html*
.rwxr-xr-x  869k squash staff 2020-09-11 12:51 --  2020-10-25_index_old.json*
.rw-------@   81 squash staff 2020-10-24 22:47 --  ArchiveBox.conf
.rwxr-xr-x    82 squash staff 2020-10-24 22:47 --  ArchiveBox.conf.bak*
.rwxr-xr-x   15k squash staff 2020-09-11 12:51 --  favicon.ico*
.rw-r--r--@ 302k squash staff 2020-10-24 22:48 --  index.html
.rwxr-xr-x  409k squash staff 2020-10-24 22:47 --  index.sqlite3*
.rw-r--r--   86k squash staff 2020-10-24 22:50 --  package-lock.json
.rwxr-xr-x    30 squash staff 2020-09-11 12:51 --  robots.txt*
~/D/o/A/data ⨈(data) ⎇ (master) ❈5 # archivebox help
Welcome to ArchiveBox v0.4.21!

To import an existing archive (from a previous version of ArchiveBox):
    1. cd into your data dir OUTPUT_DIR (usually ArchiveBox/output) and run:
    2. archivebox init

To start a new archive:
    1. Create an empty directory, then cd into it and run:
    2. archivebox init

For more information, see the documentation here:
    https://github.com/pirate/ArchiveBox/wiki
➜ ~/D/o/A/data ⨈(data) ⎇ (master) ❈5 # archivebox init
[i] [2020-10-28 00:48:29] ArchiveBox v0.4.21: archivebox init
    > /Users/squash/Documents/opt/ArchiveBox/data

[X] This folder appears to already have files in it, but no index.json is present.

    You must run init in a completely empty directory, or an existing data folder.

    Hint: To import an existing data folder make sure to cd into the folder first, 
    then run and run 'archivebox init' to pick up where you left off.

    (Always make sure your data folder is backed up first before updating ArchiveBox)
[2]

(It's not recognizing the collection on v0.4.21, and it won't let me run init to update it because it can't find index.json.)

@cdvv7788
Contributor Author

@pirate, you are trying to run init in an archive that has already been updated to v0.5. What do you expect to happen in this case? Should it look for the *_index_old.json files and try to copy them to index.json?
I re-tested all of the commands, and the whole test suite is passing. There was an issue with the Docker build (which I would like you to review), but other than that, everything seems to be in order.
I will remove all of the setup_django calls in the codebase and see if this still holds.

@pirate
Member

pirate commented Oct 29, 2020

It should throw an error if the version in the DB is greater than (or otherwise incompatible with) the version being run, rather than claiming there is no index in the current folder.
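A minimal sketch of that check, assuming the collection's version string can already be read from the database (how it is stored is not shown in this thread). parse_version, check_index_version and IncompatibleIndexError are hypothetical names. Only the "newer than the running version" case is handled; what counts as otherwise incompatible would be a project-specific rule.

```python
from typing import Tuple


class IncompatibleIndexError(Exception):
    """Hypothetical error for a collection written by a newer version."""


def parse_version(version: str) -> Tuple[int, ...]:
    # "0.4.21" -> (0, 4, 21); assumes plain dotted integers (no "v" prefix or
    # pre-release tags) and pads to three parts so "0.5" compares equal to "0.5.0".
    parts = [int(part) for part in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_index_version(index_version: str, running_version: str) -> None:
    """Raise a clear error instead of reporting a missing index when the collection is newer."""
    if parse_version(index_version) > parse_version(running_version):
        raise IncompatibleIndexError(
            f"This collection was created by version {index_version}, "
            f"which is newer than the running version {running_version}. "
            "Upgrade before running init here."
        )
```

With this, check_index_version("0.5.0", "0.4.21") raises IncompatibleIndexError, while an older or equal collection passes through to the normal init/upgrade path. Comparing integer tuples rather than raw strings avoids "0.4.21" sorting after "0.4.3".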

@cdvv7788
Contributor Author

However, that should be a hotfix in the v0.4 branch, right? That would belong in another PR. I can add the check to the current version, but your case won't be covered that way unless we backport it to v0.4 as well.

@cdvv7788
Contributor Author

cdvv7788 commented Nov 2, 2020

@pirate, for this fix, you want v0.4 to recognize when the index is from v0.5, right? Should I send a PR directly to that version's branch?

@pirate
Member

pirate commented Nov 23, 2020

Nah, we should just add the check to the current version, without patching 0.4. It will only be useful starting in the next release, with v0.6.

pirate merged commit 1b22f8e into ArchiveBox:master on Nov 28, 2020
cdvv7788 deleted the POC-setup-django-on-init branch on November 28, 2020 05:30