feat: Remove index.json and index.html generation #502

Merged
merged 5 commits into ArchiveBox:master from remove-static-indexes
Oct 23, 2020

Contributor

@cdvv7788 cdvv7788 commented Oct 8, 2020

Summary

At the end of the CLI commands (add, remove, etc.), the JSON and HTML indexes were still being generated. This is slow, and the cost grows with the number of archived links (the size of the archive). This PR removes that automatic generation, since the indexes can be generated on demand using the archivebox list command.

Related issues: #461

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Archived data layout on disk

@cdvv7788 cdvv7788 requested review from pirate and apkallum October 9, 2020 14:00
@pirate
Member

pirate commented Oct 10, 2020

Hmm this PR seems fine but I'd like to make the upgrade UX clearer.

Can we rename any index.html/index.json files in the root dir to put their creation dates in the name?

data/
    index.html      ->        2020-08-08_index_old.html
    index.json      ->        2020-08-09_index_old.json
    index.sqlite3
    ArchiveBox.conf
    ...

That should make it clear that users need to manually export somehow in order to update those files.
It will also ensure that any tools relying on those files existing in the old location break, forcing people to update those tools to match the new behavior and export process requirements.
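To illustrate the proposed rename, here is a minimal sketch of moving the old index files out of the way. It is not the code merged in this PR: the function name rename_old_indexes is hypothetical, and it assumes the file's modification time is an acceptable stand-in for its creation date, since a true creation time is not available on every platform.

```python
from datetime import datetime
from pathlib import Path
from typing import List


def rename_old_indexes(data_dir: Path) -> List[Path]:
    """Rename index.html/index.json in data_dir to <date>_index_old.<ext>.

    Hypothetical helper for illustration only, not ArchiveBox's implementation.
    """
    renamed = []
    for name in ("index.html", "index.json"):
        old_path = data_dir / name
        if not old_path.is_file():
            continue  # nothing to rename for this index type
        # Assumption: use the modification time as the "creation date".
        stamp = datetime.fromtimestamp(old_path.stat().st_mtime).strftime("%Y-%m-%d")
        new_path = data_dir / f"{stamp}_index_old{old_path.suffix}"
        old_path.rename(new_path)
        renamed.append(new_path)
    return renamed
```

Other files in the directory, such as index.sqlite3 and ArchiveBox.conf, are left untouched because only the two static index names are checked.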

@cdvv7788
Contributor Author

@pirate I renamed the indexes if they are present. Anything else missing?

@cdvv7788 cdvv7788 merged commit f330e64 into ArchiveBox:master Oct 23, 2020
@cdvv7788 cdvv7788 deleted the remove-static-indexes branch October 23, 2020 11:46
@kedorlaomer

I liked the index.html (although slow generation is clearly bad). Do you think you could add it as an option? A use case (an actual one) is statically hosting the archive directory. This won't work without extra work (basically gathering all files under archive/*/index.html and cobbling together an index.html)
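For reference, the manual gathering described here could look roughly like the sketch below. It is a workaround under stated assumptions, not an ArchiveBox feature: the function name build_static_index is hypothetical, and it assumes each snapshot lives in its own directory under archive/ with an index.html inside.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_static_index(data_dir: Path) -> str:
    """Return a bare HTML page linking to every archive/*/index.html.

    Hypothetical helper for illustration only.
    """
    # Collect the snapshot directory names that actually contain an index.html.
    names = sorted(p.parent.name for p in (data_dir / "archive").glob("*/index.html"))
    items = "\n".join(
        '<li><a href="archive/{}/index.html">{}</a></li>'.format(
            html.escape(quote(name), quote=True), html.escape(name)
        )
        for name in names
    )
    return "<!DOCTYPE html>\n<html><body><ul>\n" + items + "\n</ul></body></html>\n"
```

The links are relative, so the resulting page keeps working when the whole data directory is served statically from any base URL.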

@cdvv7788
Contributor Author

cdvv7788 commented Nov 16, 2020

@kedorlaomer you can still generate it using the list command. Something like archivebox list --html --index > index.html should work. We enabled that option, and disabled automatic generation.
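If the export needs to be scripted, for example from a cron job, the command above can be wrapped as in this sketch. The flags are copied from the comment as written; the function name export_index and its parameters are hypothetical, and it assumes archivebox is on PATH and is run from inside the data directory.

```python
import subprocess
from pathlib import Path
from typing import Sequence

# Command quoted in the comment above; the flags are taken from it as-is.
LIST_COMMAND = ("archivebox", "list", "--html", "--index")


def export_index(data_dir: Path, output_name: str = "index.html",
                 command: Sequence[str] = LIST_COMMAND) -> Path:
    """Run the list command inside data_dir and save its stdout to a file.

    Hypothetical wrapper for illustration only.
    """
    # check=True raises CalledProcessError on a non-zero exit status,
    # so the output file is only written after the command succeeds.
    result = subprocess.run(
        list(command), cwd=data_dir, capture_output=True, text=True, check=True
    )
    output_path = data_dir / output_name
    output_path.write_text(result.stdout, encoding="utf-8")
    return output_path
```

Unlike a plain shell redirect, which truncates index.html before the command runs, this leaves an existing file untouched when the command fails.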
